feat: character system v2 — schema upgrade, memory system, per-character TTS routing
Character schema v2: background, dialogue_style, appearance, skills, gaze_presets, with automatic v1→v2 migration. LLM-assisted character creation via the Character MCP server.

Two-tier memory system (personal per-character + general shared) with budget-based injection into the LLM system prompt.

Per-character TTS voice routing via a state file — the Wyoming TTS server reads the active config to route between Kokoro (local) and ElevenLabs (cloud, PCM 24 kHz).

Dashboard: memories page, conversation history, character profile on cards, auto-TTS engine selection from character config.

Also includes the VTube Studio expression bridge and a ComfyUI API guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
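The budget-based memory injection described above can be sketched roughly as follows. This is a minimal illustration under assumed names — `build_memory_block` and its fields are not the project's actual code:

```python
def build_memory_block(personal, general, budget_chars=2000):
    """Merge per-character and shared memories into a prompt block,
    personal memories first, stopping once the budget is exhausted."""
    lines = []
    used = 0
    for text in personal + general:
        cost = len(text) + 1  # +1 for the joining newline
        if used + cost > budget_chars:
            break
        lines.append(text)
        used += cost
    if not lines:
        return ""
    return "Relevant memories:\n" + "\n".join(lines)


# Hypothetical usage: prepend the block to the character's system prompt.
system_prompt = "You are Aria.\n" + build_memory_block(
    personal=["User's birthday is in March."],
    general=["The house runs Home Assistant."],
)
```

A character budget (rather than a token budget) keeps the sketch tokenizer-agnostic; a real implementation would likely count tokens with the serving model's tokenizer.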
TODO.md
@@ -26,7 +26,7 @@
- [x] Register local GGUF models via Modelfiles (no download): llama3.3:70b, qwen3:32b, codestral:22b, qwen2.5:7b
- [x] Register additional models: EVA-LLaMA-3.33-70B, Midnight-Miqu-70B, QwQ-32B, Qwen3.5-35B, Qwen3-Coder-30B, Qwen3-VL-30B, GLM-4.6V-Flash, DeepSeek-R1-8B, gemma-3-27b
- [x] Add qwen3.5:35b-a3b (MoE, Q8_0) — 26.7 tok/s, recommended for voice pipeline
- [x] Write model preload script + launchd service (keeps voice model in VRAM permanently)
- [x] Write model keep-warm daemon + launchd service (pins qwen2.5:7b + $HOMEAI_MEDIUM_MODEL in VRAM, checks every 5 min)
- [x] Deploy Open WebUI via Docker compose (port 3030)
- [x] Verify Open WebUI connected to Ollama, all models available
- [x] Run pipeline benchmark (homeai-voice/scripts/benchmark_pipeline.py) — STT/LLM/TTS latency profiled
@@ -82,7 +82,7 @@
- [x] Verify full voice → agent → HA action flow
- [x] Add OpenClaw to Uptime Kuma monitors (manual user action required)

### P5 · homeai-character *(can start alongside P4)*
### P5 · homeai-dashboard *(character system + dashboard)*

- [x] Define and write `schema/character.schema.json` (v1)
- [x] Write `characters/aria.json` — default character
@@ -100,6 +100,15 @@
- [x] Add character profile management to dashboard — store/switch character configs with attached profile images
- [x] Add TTS voice preview in character editor — Kokoro preview via OpenClaw bridge with loading state, custom text, stop control
- [x] Merge homeai-character + homeai-desktop into unified homeai-dashboard (services, chat, characters, editor)
- [x] Upgrade character schema to v2 — background, dialogue_style, appearance, skills, gaze_presets (auto-migrate v1)
- [x] Add LLM-assisted character creation via Character MCP server (Fandom/Wikipedia lookup)
- [x] Add character memory system — personal (per-character) + general (shared) memories with dashboard UI
- [x] Add conversation history with per-conversation persistence
- [x] Wire character_id through full pipeline (dashboard → bridge → LLM system prompt)
- [x] Add TTS text cleaning — strip tags, asterisks, emojis, markdown before synthesis
- [x] Add per-character TTS voice routing — bridge writes state file, Wyoming server reads it
- [x] Add ElevenLabs TTS support in Wyoming server — cloud voice synthesis via state file routing
- [x] Dashboard auto-selects character's TTS engine/voice (Kokoro or ElevenLabs)
- [ ] Deploy dashboard as Docker container or static site on Mac Mini
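The TTS text-cleaning item above can be sketched with a few regex passes. The specific patterns here are illustrative assumptions, not the project's actual rules:

```python
import re


def clean_for_tts(text: str) -> str:
    """Strip markup that should not be spoken aloud."""
    text = re.sub(r"<[^>]+>", "", text)                    # HTML/XML-style tags
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)   # markdown links -> link text
    text = re.sub(r"\*+", "", text)                        # asterisks / emphasis markers
    text = re.sub(r"`{1,3}", "", text)                     # inline/fenced code ticks
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)  # common emoji ranges
    return re.sub(r"\s+", " ", text).strip()               # collapse leftover whitespace
```

Ordering matters: links are unwrapped before asterisks are stripped so that emphasized link text survives intact.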
---
@@ -123,50 +132,71 @@
- [ ] Flash remaining units (bedroom, kitchen)
- [ ] Document MAC address → room name mapping

### P6b · homeai-rpi (Kitchen Satellite)

- [x] Set up Wyoming Satellite on Raspberry Pi 5 (SELBINA) with ReSpeaker 2-Mics pHAT
- [x] Write setup.sh — full Pi provisioning (venv, drivers, systemd, scripts)
- [x] Write deploy.sh — remote deploy/manage from Mac Mini (push-wrapper, test-logs, etc.)
- [x] Write satellite_wrapper.py — monkey-patches fixing TTS echo, writer race, streaming timeout
- [x] Test multi-command voice loop without freezing
---

## Phase 5 — Visual Layer

### P7 · homeai-visual

- [ ] Install VTube Studio (Mac App Store)
- [ ] Enable WebSocket API on port 8001
- [ ] Source/purchase a Live2D model (nizima.com or booth.pm)
- [ ] Load model in VTube Studio
- [ ] Create hotkeys for all 8 expression states
- [ ] Write `skills/vtube_studio` SKILL.md + implementation
- [ ] Run auth flow — click Allow in VTube Studio, save token
- [ ] Test all 8 expressions via test script
- [ ] Update `aria.json` with real VTube Studio hotkey IDs
- [ ] Write `lipsync.py` amplitude-based helper
- [ ] Integrate lip sync into OpenClaw TTS dispatch
- [ ] Test full pipeline: voice → thinking expression → speaking with lip sync

#### VTube Studio Expression Bridge
- [x] Write `vtube-bridge.py` — persistent WebSocket ↔ HTTP bridge daemon (port 8002)
- [x] Write `vtube-ctl` CLI wrapper + OpenClaw skill (`~/.openclaw/skills/vtube-studio/`)
- [x] Wire expression triggers into `openclaw-http-bridge.py` (thinking → idle, speaking → idle)
- [x] Add amplitude-based lip sync to `wyoming_kokoro_server.py` (RMS → MouthOpen parameter)
- [x] Write `test-expressions.py` — auth flow, expression cycle, lip sync sweep, latency test
- [x] Write launchd plist + setup.sh for venv creation and service registration
- [ ] Install VTube Studio from Mac App Store, enable WebSocket API (port 8001)
- [ ] Source/purchase Live2D model, load in VTube Studio
- [ ] Create 8 expression hotkeys, record UUIDs
- [ ] Run `setup.sh` to create venv, install websockets, load launchd service
- [ ] Run `vtube-ctl auth` — click Allow in VTube Studio
- [ ] Update `aria.json` with real hotkey UUIDs (replace placeholders)
- [ ] Run `test-expressions.py --all` — verify expressions + lip sync + latency
- [ ] Set up VTube Studio mobile (iPhone/iPad) on Tailnet
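The RMS → MouthOpen mapping behind the lip-sync items above can be sketched like this. The message shape follows the VTube Studio public API's `InjectParameterDataRequest`; the gain constant and function names are assumptions for illustration:

```python
import json
import math
import struct


def chunk_rms(pcm: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM chunk, normalized to 0..1."""
    if not pcm:
        return 0.0
    samples = struct.unpack("<%dh" % (len(pcm) // 2), pcm)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms / 32768.0


def mouth_open_message(rms: float, gain: float = 8.0) -> str:
    """Build a VTube Studio InjectParameterDataRequest driving MouthOpen.

    gain is an assumed tuning constant: speech RMS is usually well below
    full scale, so it is boosted before clamping to the 0..1 parameter range.
    """
    value = max(0.0, min(1.0, rms * gain))
    return json.dumps({
        "apiName": "VTubeStudioPublicAPI",
        "apiVersion": "1.0",
        "requestID": "lipsync",
        "messageType": "InjectParameterDataRequest",
        "data": {"parameterValues": [{"id": "MouthOpen", "value": value}]},
    })
```

In a real pipeline this message would be sent over the port-8001 WebSocket once per audio chunk, with the parameter decaying back toward zero between chunks.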
#### Web Visuals (Dashboard)
- [ ] Design PNG/GIF character visuals for web assistant (idle, thinking, speaking, etc.)
- [ ] Integrate animated visuals into homeai-dashboard chat view
- [ ] Sync visual state to voice pipeline events (listening, processing, responding)
- [ ] Add expression transitions and idle animations

### P8 · homeai-android

- [ ] Build Android companion app for mobile assistant access
- [ ] Integrate with OpenClaw bridge API (chat, TTS, STT)
- [ ] Add character visual display
- [ ] Push notification support via ntfy/FCM
---

## Phase 6 — Image Generation

### P8 · homeai-images
### P9 · homeai-images (ComfyUI)

- [ ] Clone ComfyUI to `~/ComfyUI/`, install deps in venv
- [ ] Verify MPS is detected at launch
- [ ] Write and load launchd plist (`com.homeai.comfyui.plist`)
- [ ] Download SDXL base model
- [ ] Download Flux.1-schnell
- [ ] Download ControlNet models (canny, depth)
- [ ] Download SDXL base model + Flux.1-schnell + ControlNet models
- [ ] Test generation via ComfyUI web UI (port 8188)
- [ ] Build and export `quick.json`, `portrait.json`, `scene.json`, `upscale.json` workflows
- [ ] Build and export workflow JSONs (quick, portrait, scene, upscale)
- [ ] Write `skills/comfyui` SKILL.md + implementation
- [ ] Test skill: "Generate a portrait of Aria looking happy"
- [ ] Collect character reference images for LoRA training
- [ ] Train SDXL LoRA with kohya_ss, verify character consistency
- [ ] Add ComfyUI to Uptime Kuma monitors
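Queuing a generation through ComfyUI's HTTP API (the `/prompt` endpoint on port 8188) can be sketched as follows. The node ID and workflow shape are placeholders standing in for a real exported API-format workflow, not project files:

```python
import json
import urllib.request


def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    """POST an exported API-format workflow to ComfyUI's /prompt endpoint.

    The response includes a prompt_id that can be polled via /history.
    """
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def set_positive_prompt(workflow: dict, node_id: str, text: str) -> dict:
    """Override the text input of a CLIPTextEncode node in an exported workflow."""
    workflow[node_id]["inputs"]["text"] = text
    return workflow
```

A skill implementation would load one of the exported workflow JSONs, patch the prompt text, and call `queue_prompt`.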
---

## Phase 7 — Extended Integrations & Polish

### P10 · Integrations & Polish

- [ ] Deploy Music Assistant (Docker), integrate with Home Assistant
- [ ] Write `skills/music` SKILL.md for OpenClaw
- [ ] Deploy Snapcast server on Mac Mini
@@ -183,10 +213,24 @@

---

## Stretch Goals

### Live2D / VTube Studio

- [ ] Learn Live2D modelling toolchain (Live2D Cubism Editor)
- [ ] Install VTube Studio (Mac App Store), enable WebSocket API on port 8001
- [ ] Source/commission a Live2D model (nizima.com or booth.pm)
- [ ] Create hotkeys for expression states
- [ ] Write `skills/vtube_studio` SKILL.md + implementation
- [ ] Write `lipsync.py` amplitude-based helper
- [ ] Integrate lip sync into OpenClaw TTS dispatch
- [ ] Set up VTube Studio mobile (iPhone/iPad) on Tailnet

---

## Open Decisions

- [ ] Confirm character name (determines wake word training)
- [ ] Live2D model: purchase off-the-shelf or commission custom?
- [ ] mem0 backend: Chroma (simple) vs Qdrant Docker (better semantic search)?
- [ ] Snapcast output: ESP32 built-in speakers or dedicated audio hardware per room?
- [ ] Authelia user store: local file vs LDAP?