feat: character system v2 — schema upgrade, memory system, per-character TTS routing
Character schema v2: background, dialogue_style, appearance, skills, gaze_presets, with automatic v1→v2 migration. LLM-assisted character creation via the Character MCP server.

Two-tier memory system (personal per-character + general shared) with budget-based injection into the LLM system prompt.

Per-character TTS voice routing via a state file — the Wyoming TTS server reads the active config to route between Kokoro (local) and ElevenLabs (cloud, PCM 24 kHz).

Dashboard: memories page, conversation history, character profile on cards, auto-TTS engine selection from character config.

Also includes the VTube Studio expression bridge and a ComfyUI API guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
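The budget-based memory injection described above can be sketched roughly as follows. This is a minimal illustration under assumed names — `build_memory_block` and its fields are not the project's actual code:

```python
def build_memory_block(personal, general, budget_chars=2000):
    """Merge per-character and shared memories into a prompt block,
    personal memories first, stopping once the budget is exhausted."""
    lines = []
    used = 0
    for text in personal + general:
        cost = len(text) + 1  # +1 for the joining newline
        if used + cost > budget_chars:
            break
        lines.append(text)
        used += cost
    if not lines:
        return ""
    return "Relevant memories:\n" + "\n".join(lines)


# Hypothetical usage: prepend the block to the character's system prompt.
system_prompt = "You are Aria.\n" + build_memory_block(
    personal=["User's birthday is in March."],
    general=["The house runs Home Assistant."],
)
```

A character budget (rather than a token budget) keeps the sketch tokenizer-agnostic; a real implementation would likely count tokens with the serving model's tokenizer.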
TODO.md
@@ -26,7 +26,7 @@
- [x] Register local GGUF models via Modelfiles (no download): llama3.3:70b, qwen3:32b, codestral:22b, qwen2.5:7b
- [x] Register additional models: EVA-LLaMA-3.33-70B, Midnight-Miqu-70B, QwQ-32B, Qwen3.5-35B, Qwen3-Coder-30B, Qwen3-VL-30B, GLM-4.6V-Flash, DeepSeek-R1-8B, gemma-3-27b
- [x] Add qwen3.5:35b-a3b (MoE, Q8_0) — 26.7 tok/s, recommended for voice pipeline
- [x] Write model preload script + launchd service (keeps voice model in VRAM permanently)
- [x] Write model keep-warm daemon + launchd service (pins qwen2.5:7b + $HOMEAI_MEDIUM_MODEL in VRAM, checks every 5 min)
- [x] Deploy Open WebUI via Docker compose (port 3030)
- [x] Verify Open WebUI connected to Ollama, all models available
- [x] Run pipeline benchmark (homeai-voice/scripts/benchmark_pipeline.py) — STT/LLM/TTS latency profiled
@@ -82,7 +82,7 @@
- [x] Verify full voice → agent → HA action flow
- [x] Add OpenClaw to Uptime Kuma monitors (manual user action required)

### P5 · homeai-character *(can start alongside P4)*
### P5 · homeai-dashboard *(character system + dashboard)*

- [x] Define and write `schema/character.schema.json` (v1)
- [x] Write `characters/aria.json` — default character
@@ -100,6 +100,15 @@
- [x] Add character profile management to dashboard — store/switch character configs with attached profile images
- [x] Add TTS voice preview in character editor — Kokoro preview via OpenClaw bridge with loading state, custom text, stop control
- [x] Merge homeai-character + homeai-desktop into unified homeai-dashboard (services, chat, characters, editor)
- [x] Upgrade character schema to v2 — background, dialogue_style, appearance, skills, gaze_presets (auto-migrate v1)
- [x] Add LLM-assisted character creation via Character MCP server (Fandom/Wikipedia lookup)
- [x] Add character memory system — personal (per-character) + general (shared) memories with dashboard UI
- [x] Add conversation history with per-conversation persistence
- [x] Wire character_id through full pipeline (dashboard → bridge → LLM system prompt)
- [x] Add TTS text cleaning — strip tags, asterisks, emojis, markdown before synthesis
- [x] Add per-character TTS voice routing — bridge writes state file, Wyoming server reads it
- [x] Add ElevenLabs TTS support in Wyoming server — cloud voice synthesis via state file routing
- [x] Dashboard auto-selects character's TTS engine/voice (Kokoro or ElevenLabs)
- [ ] Deploy dashboard as Docker container or static site on Mac Mini
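The TTS text-cleaning item above can be sketched with a few regex passes. The specific patterns here are illustrative assumptions, not the project's actual rules:

```python
import re


def clean_for_tts(text: str) -> str:
    """Strip markup that should not be spoken aloud."""
    text = re.sub(r"<[^>]+>", "", text)                    # HTML/XML-style tags
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)   # markdown links -> link text
    text = re.sub(r"\*+", "", text)                        # asterisks / emphasis markers
    text = re.sub(r"`{1,3}", "", text)                     # inline/fenced code ticks
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)  # common emoji ranges
    return re.sub(r"\s+", " ", text).strip()               # collapse leftover whitespace
```

Ordering matters: links are unwrapped before asterisks are stripped so that emphasized link text survives intact.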
---
@@ -123,50 +132,71 @@
- [ ] Flash remaining units (bedroom, kitchen)
- [ ] Document MAC address → room name mapping

### P6b · homeai-rpi (Kitchen Satellite)

- [x] Set up Wyoming Satellite on Raspberry Pi 5 (SELBINA) with ReSpeaker 2-Mics pHAT
- [x] Write setup.sh — full Pi provisioning (venv, drivers, systemd, scripts)
- [x] Write deploy.sh — remote deploy/manage from Mac Mini (push-wrapper, test-logs, etc.)
- [x] Write satellite_wrapper.py — monkey-patches fixing TTS echo, writer race, streaming timeout
- [x] Test multi-command voice loop without freezing
---

## Phase 5 — Visual Layer

### P7 · homeai-visual

- [ ] Install VTube Studio (Mac App Store)
- [ ] Enable WebSocket API on port 8001
- [ ] Source/purchase a Live2D model (nizima.com or booth.pm)
- [ ] Load model in VTube Studio
- [ ] Create hotkeys for all 8 expression states
- [ ] Write `skills/vtube_studio` SKILL.md + implementation
- [ ] Run auth flow — click Allow in VTube Studio, save token
- [ ] Test all 8 expressions via test script
- [ ] Update `aria.json` with real VTube Studio hotkey IDs
- [ ] Write `lipsync.py` amplitude-based helper
- [ ] Integrate lip sync into OpenClaw TTS dispatch
- [ ] Test full pipeline: voice → thinking expression → speaking with lip sync

#### VTube Studio Expression Bridge
- [x] Write `vtube-bridge.py` — persistent WebSocket ↔ HTTP bridge daemon (port 8002)
- [x] Write `vtube-ctl` CLI wrapper + OpenClaw skill (`~/.openclaw/skills/vtube-studio/`)
- [x] Wire expression triggers into `openclaw-http-bridge.py` (thinking → idle, speaking → idle)
- [x] Add amplitude-based lip sync to `wyoming_kokoro_server.py` (RMS → MouthOpen parameter)
- [x] Write `test-expressions.py` — auth flow, expression cycle, lip sync sweep, latency test
- [x] Write launchd plist + setup.sh for venv creation and service registration
- [ ] Install VTube Studio from Mac App Store, enable WebSocket API (port 8001)
- [ ] Source/purchase Live2D model, load in VTube Studio
- [ ] Create 8 expression hotkeys, record UUIDs
- [ ] Run `setup.sh` to create venv, install websockets, load launchd service
- [ ] Run `vtube-ctl auth` — click Allow in VTube Studio
- [ ] Update `aria.json` with real hotkey UUIDs (replace placeholders)
- [ ] Run `test-expressions.py --all` — verify expressions + lip sync + latency
- [ ] Set up VTube Studio mobile (iPhone/iPad) on Tailnet
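The RMS → MouthOpen mapping behind the lip-sync items above can be sketched like this. The message shape follows the VTube Studio public API's `InjectParameterDataRequest`; the gain constant and function names are assumptions for illustration:

```python
import json
import math
import struct


def chunk_rms(pcm: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM chunk, normalized to 0..1."""
    if not pcm:
        return 0.0
    samples = struct.unpack("<%dh" % (len(pcm) // 2), pcm)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms / 32768.0


def mouth_open_message(rms: float, gain: float = 8.0) -> str:
    """Build a VTube Studio InjectParameterDataRequest driving MouthOpen.

    gain is an assumed tuning constant: speech RMS is usually well below
    full scale, so it is boosted before clamping to the 0..1 parameter range.
    """
    value = max(0.0, min(1.0, rms * gain))
    return json.dumps({
        "apiName": "VTubeStudioPublicAPI",
        "apiVersion": "1.0",
        "requestID": "lipsync",
        "messageType": "InjectParameterDataRequest",
        "data": {"parameterValues": [{"id": "MouthOpen", "value": value}]},
    })
```

In a real pipeline this message would be sent over the port-8001 WebSocket once per audio chunk, with the parameter decaying back toward zero between chunks.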
#### Web Visuals (Dashboard)
- [ ] Design PNG/GIF character visuals for web assistant (idle, thinking, speaking, etc.)
- [ ] Integrate animated visuals into homeai-dashboard chat view
- [ ] Sync visual state to voice pipeline events (listening, processing, responding)
- [ ] Add expression transitions and idle animations

### P8 · homeai-android

- [ ] Build Android companion app for mobile assistant access
- [ ] Integrate with OpenClaw bridge API (chat, TTS, STT)
- [ ] Add character visual display
- [ ] Push notification support via ntfy/FCM
---

## Phase 6 — Image Generation

### P8 · homeai-images
### P9 · homeai-images (ComfyUI)

- [ ] Clone ComfyUI to `~/ComfyUI/`, install deps in venv
- [ ] Verify MPS is detected at launch
- [ ] Write and load launchd plist (`com.homeai.comfyui.plist`)
- [ ] Download SDXL base model
- [ ] Download Flux.1-schnell
- [ ] Download ControlNet models (canny, depth)
- [ ] Download SDXL base model + Flux.1-schnell + ControlNet models
- [ ] Test generation via ComfyUI web UI (port 8188)
- [ ] Build and export `quick.json`, `portrait.json`, `scene.json`, `upscale.json` workflows
- [ ] Build and export workflow JSONs (quick, portrait, scene, upscale)
- [ ] Write `skills/comfyui` SKILL.md + implementation
- [ ] Test skill: "Generate a portrait of Aria looking happy"
- [ ] Collect character reference images for LoRA training
- [ ] Train SDXL LoRA with kohya_ss, verify character consistency
- [ ] Add ComfyUI to Uptime Kuma monitors
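Queuing a generation through ComfyUI's HTTP API (the `/prompt` endpoint on port 8188) can be sketched as follows. The node ID and workflow shape are placeholders standing in for a real exported API-format workflow, not project files:

```python
import json
import urllib.request


def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    """POST an exported API-format workflow to ComfyUI's /prompt endpoint.

    The response includes a prompt_id that can be polled via /history.
    """
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def set_positive_prompt(workflow: dict, node_id: str, text: str) -> dict:
    """Override the text input of a CLIPTextEncode node in an exported workflow."""
    workflow[node_id]["inputs"]["text"] = text
    return workflow
```

A skill implementation would load one of the exported workflow JSONs, patch the prompt text, and call `queue_prompt`.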
---

## Phase 7 — Extended Integrations & Polish

### P10 · Integrations & Polish

- [ ] Deploy Music Assistant (Docker), integrate with Home Assistant
- [ ] Write `skills/music` SKILL.md for OpenClaw
- [ ] Deploy Snapcast server on Mac Mini
@@ -183,10 +213,24 @@

---

## Stretch Goals

### Live2D / VTube Studio

- [ ] Learn Live2D modelling toolchain (Live2D Cubism Editor)
- [ ] Install VTube Studio (Mac App Store), enable WebSocket API on port 8001
- [ ] Source/commission a Live2D model (nizima.com or booth.pm)
- [ ] Create hotkeys for expression states
- [ ] Write `skills/vtube_studio` SKILL.md + implementation
- [ ] Write `lipsync.py` amplitude-based helper
- [ ] Integrate lip sync into OpenClaw TTS dispatch
- [ ] Set up VTube Studio mobile (iPhone/iPad) on Tailnet

---

## Open Decisions

- [ ] Confirm character name (determines wake word training)
- [ ] Live2D model: purchase off-the-shelf or commission custom?
- [ ] mem0 backend: Chroma (simple) vs Qdrant Docker (better semantic search)?
- [ ] Snapcast output: ESP32 built-in speakers or dedicated audio hardware per room?
- [ ] Authelia user store: local file vs LDAP?