homeai/TODO.md
Aodhan Collins c31724c92b Complete P2 (LLM) and P3 (voice pipeline) implementation
P2 — homeai-llm:
- Fix ollama launchd plist path for Apple Silicon (/opt/homebrew/bin/ollama)
- Add Modelfiles for local GGUF models: llama3.3:70b, qwen3:32b, codestral:22b
  (registered via `ollama create` — no re-download needed)

P3 — homeai-voice:
- Wyoming STT: wyoming-faster-whisper, large-v3 model, port 10300
- Wyoming TTS: custom Kokoro ONNX server (wyoming_kokoro_server.py), port 10301
  Voice af_heart; models at ~/models/kokoro/
- Wake word: openWakeWord daemon (hey_jarvis), notifies OpenClaw at /wake
- launchd plists for all three services + load-all-launchd.sh helper
- Smoke test: wyoming/test-pipeline.sh — 3/3 passing

HA Wyoming integration pending manual UI config (STT 10.0.0.200:10300,
TTS 10.0.0.200:10301).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 23:28:22 +00:00


# HomeAI — Master TODO
> Track progress across all sub-projects. See each sub-project `PLAN.md` for detailed implementation notes.
> Status: `[ ]` pending · `[~]` in progress · `[x]` done
---
## Phase 1 — Foundation
### P1 · homeai-infra
- [x] Install Docker Desktop for Mac, enable launch at login
- [x] Create shared `homeai` Docker network
- [x] Create `~/server/docker/` directory structure
- [x] Write compose files: Uptime Kuma, code-server, n8n (HA, Portainer, Gitea are pre-existing on 10.0.0.199)
- [x] `docker compose up -d` — bring all services up
- [x] Home Assistant onboarding — long-lived access token generated, stored in `.env`
- [ ] Install Tailscale, verify all services reachable on Tailnet
- [ ] Gitea: initialise all 8 sub-project repos, configure SSH
- [ ] Uptime Kuma: add monitors for all services, configure mobile alerts
- [ ] Verify all containers survive a cold reboot
### P2 · homeai-llm
- [x] Install Ollama natively via brew
- [x] Write and load launchd plist (`com.homeai.ollama.plist`) — `/opt/homebrew/bin/ollama`
- [x] Register local GGUF models via Modelfiles (no download): llama3.3:70b, qwen3:32b, codestral:22b
- [x] Deploy Open WebUI via Docker compose (port 3030)
- [x] Verify Open WebUI connected to Ollama, all models available
- [ ] Run `scripts/benchmark.sh` — record results in `benchmark-results.md`
- [ ] Add Ollama + Open WebUI to Uptime Kuma monitors
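
> Sketch for the "all models available" check above, assuming Ollama is reachable on its default local address (`http://localhost:11434`) and that the models were registered under exactly the tags listed in this section. `missing_models` and `fetch_tags` are illustrative names, not part of this repo. Running `missing_models(fetch_tags(), EXPECTED_MODELS)` on the Mac should give an empty list once all three Modelfiles are registered.

```python
import json
import urllib.request

# Model tags as listed in the P2 checklist.
EXPECTED_MODELS = ["llama3.3:70b", "qwen3:32b", "codestral:22b"]

# Assumption: Ollama's default listen address.
OLLAMA_URL = "http://localhost:11434"


def missing_models(tags_payload: dict, expected: list) -> list:
    """Return the expected model names that are absent from an /api/tags response."""
    installed = {m.get("name") for m in tags_payload.get("models", [])}
    return [name for name in expected if name not in installed]


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """GET /api/tags, which lists the models currently registered with Ollama."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```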
---
## Phase 2 — Voice Pipeline
### P3 · homeai-voice
- [x] Install `wyoming-faster-whisper` — model: faster-whisper-large-v3 (auto-downloaded)
- [x] Install Kokoro ONNX TTS — models at `~/models/kokoro/`
- [x] Write Wyoming-Kokoro adapter server (`homeai-voice/tts/wyoming_kokoro_server.py`)
- [x] Write + load launchd plists for Wyoming STT (10300) and TTS (10301)
- [x] Install openWakeWord + pyaudio — model: hey_jarvis
- [x] Write + load openWakeWord launchd plist (`com.homeai.wakeword`)
- [x] Write `wyoming/test-pipeline.sh` — smoke test (3/3 passing)
- [~] Connect Home Assistant Wyoming integration (STT + TTS) — awaiting HA UI config
- [ ] Create HA Voice Assistant pipeline
- [ ] Test HA Assist via browser: type query → hear spoken response
- [ ] Install Chatterbox TTS (MPS build), test with sample `.wav`
- [ ] Install Qwen3-TTS via MLX (fallback)
- [ ] Train custom wake word using character name
- [ ] Add Wyoming STT/TTS to Uptime Kuma monitors
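
> Sketch of a reachability check for the two Wyoming ports before configuring the HA integration. This is not the contents of `wyoming/test-pipeline.sh`; the function names are illustrative, and the host is the one recorded in the commit notes (10.0.0.200). A successful TCP connect only shows that something is listening; it does not exercise the Wyoming protocol itself.

```python
import socket

# Host and ports as recorded for the Wyoming STT/TTS services.
WYOMING_SERVICES = {"stt": ("10.0.0.200", 10300), "tts": ("10.0.0.200", 10301)}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(services: dict) -> dict:
    """Map each service name to whether its port accepted a connection."""
    return {name: port_open(host, port) for name, (host, port) in services.items()}
```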
---
## Phase 3 — Agent & Character
### P5 · homeai-character *(no runtime deps — can start alongside P1)*
- [ ] Define and write `schema/character.schema.json` (v1)
- [ ] Write `characters/aria.json` — default character
- [ ] Set up Vite project in `src/`, install deps
- [ ] Integrate existing `character-manager.jsx` into Vite project
- [ ] Add schema validation on export (ajv)
- [ ] Add expression mapping UI section
- [ ] Add custom rules editor
- [ ] Test full edit → export → validate → load cycle
- [ ] Record or source voice reference audio for Aria (`~/voices/aria.wav`)
- [ ] Pre-process audio with ffmpeg, test with Chatterbox
- [ ] Update `aria.json` with voice clone path if quality is good
- [ ] Write `SchemaValidator.js` as standalone utility
### P4 · homeai-agent
- [ ] Confirm OpenClaw installation method and Ollama compatibility
- [ ] Install OpenClaw, write `~/.openclaw/config.yaml`
- [ ] Verify OpenClaw responds to basic text query via `/chat`
- [ ] Write `skills/home_assistant.py` — test lights on/off via voice
- [ ] Write `skills/memory.py` — test store and recall
- [ ] Write `skills/weather.py` — verify HA weather sensor data
- [ ] Write `skills/timer.py` — test set/fire a timer
- [ ] Write skill stubs: `music.py`, `vtube_studio.py`, `comfyui.py`
- [ ] Set up mem0 with Chroma backend, test semantic recall
- [ ] Write and load memory backup launchd job
- [ ] Symlink `homeai-agent/skills/` → `~/.openclaw/skills/`
- [ ] Build morning briefing n8n workflow
- [ ] Build notification router n8n workflow
- [ ] Verify full voice → agent → HA action flow
- [ ] Add OpenClaw to Uptime Kuma monitors
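
> Possible starting point for the lights on/off part of `skills/home_assistant.py`, using Home Assistant's REST service endpoint with the long-lived token from `.env`. Assumptions: HA is on its default port 8123 at the host noted in P1, the entity ID passed in is whatever exists in your HA instance, and the function names are illustrative. How OpenClaw discovers and calls a skill is still an open item, so no OpenClaw interface is shown.

```python
import json
import urllib.request

# Assumption: default HA port on the pre-existing host from P1.
HA_URL = "http://10.0.0.199:8123"


def build_service_request(base_url, token, domain, service, entity_id):
    """Build a POST to /api/services/<domain>/<service> targeting one entity."""
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/services/{domain}/{service}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def set_light(token: str, entity_id: str, on: bool, base_url: str = HA_URL) -> int:
    """Turn a light on or off and return the HTTP status code."""
    service = "turn_on" if on else "turn_off"
    req = build_service_request(base_url, token, "light", service, entity_id)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```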
---
## Phase 4 — Hardware Satellites
### P6 · homeai-esp32
- [ ] Install ESPHome: `pip install esphome`
- [ ] Write `esphome/secrets.yaml` (gitignored)
- [ ] Write `base.yaml`, `voice.yaml`, `display.yaml`, `animations.yaml`
- [ ] Write `s3-box-living-room.yaml` for first unit
- [ ] Flash first unit via USB
- [ ] Verify unit appears in HA device list
- [ ] Assign Wyoming voice pipeline to unit in HA
- [ ] Test full wake → STT → LLM → TTS → audio playback cycle
- [ ] Test LVGL face: idle → listening → thinking → speaking → error
- [ ] Verify OTA firmware update works wirelessly
- [ ] Flash remaining units (bedroom, kitchen, etc.)
- [ ] Document MAC address → room name mapping
---
## Phase 5 — Visual Layer
### P7 · homeai-visual
- [ ] Install VTube Studio (Mac App Store)
- [ ] Enable WebSocket API on port 8001
- [ ] Source/purchase a Live2D model (nizima.com or booth.pm)
- [ ] Load model in VTube Studio
- [ ] Create hotkeys for all 8 expression states
- [ ] Write `skills/vtube_studio.py` full implementation
- [ ] Run auth flow — click Allow in VTube Studio, save token
- [ ] Test all 8 expressions via test script
- [ ] Update `aria.json` with real VTube Studio hotkey IDs
- [ ] Write `lipsync.py` amplitude-based helper
- [ ] Integrate lip sync into OpenClaw TTS dispatch
- [ ] Symlink `skills/` → `~/.openclaw/skills/`
- [ ] Test full pipeline: voice → thinking expression → speaking with lip sync
- [ ] Set up VTube Studio mobile (iPhone/iPad) on Tailnet
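
> One way the amplitude-based `lipsync.py` helper could work: take a chunk of TTS audio, compute its RMS level, and scale that to a mouth-open value between 0 and 1. Assumptions: audio arrives as signed 16-bit little-endian PCM, and `gain` is a hypothetical tuning knob (speech rarely reaches full scale). Sending the value to VTube Studio is not covered here.

```python
import math
import struct


def rms_level(pcm: bytes) -> float:
    """RMS of signed 16-bit little-endian PCM samples, normalised to 0.0-1.0."""
    usable = len(pcm) - (len(pcm) % 2)  # ignore a trailing odd byte
    if usable == 0:
        return 0.0
    samples = [s for (s,) in struct.iter_unpack("<h", pcm[:usable])]
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


def mouth_open(pcm: bytes, gain: float = 4.0) -> float:
    """Map one PCM chunk to a mouth-open value clamped to the range 0.0-1.0."""
    return min(1.0, rms_level(pcm) * gain)
```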
---
## Phase 6 — Image Generation
### P8 · homeai-images
- [ ] Clone ComfyUI to `~/ComfyUI/`, install deps in venv
- [ ] Verify MPS is detected at launch
- [ ] Write and load launchd plist (`com.homeai.comfyui.plist`)
- [ ] Download SDXL base model
- [ ] Download Flux.1-schnell
- [ ] Download ControlNet models (canny, depth)
- [ ] Test generation via ComfyUI web UI (port 8188)
- [ ] Build and export `quick.json` workflow
- [ ] Build and export `portrait.json` workflow
- [ ] Build and export `scene.json` workflow (ControlNet)
- [ ] Build and export `upscale.json` workflow
- [ ] Write `skills/comfyui.py` full implementation
- [ ] Test skill: `comfyui.quick("test prompt")` → image file returned
- [ ] Collect character reference images for LoRA training
- [ ] Train SDXL LoRA with kohya_ss
- [ ] Load LoRA into `portrait.json`, verify character consistency
- [ ] Symlink `skills/` → `~/.openclaw/skills/`
- [ ] Test via OpenClaw: "Generate a portrait of Aria looking happy"
- [ ] Add ComfyUI to Uptime Kuma monitors
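
> Rough shape of the queueing step in `skills/comfyui.py`, assuming ComfyUI is reachable locally on port 8188 and that workflows such as `quick.json` are exported in ComfyUI's API format. The function names are illustrative. Substituting the prompt text into a workflow and fetching the finished image are left out because they depend on the node IDs in the exported files.

```python
import json
import urllib.request
import uuid

# Port from the checklist above; the local host is an assumption.
COMFYUI_URL = "http://127.0.0.1:8188"


def build_prompt_request(workflow: dict, base_url: str = COMFYUI_URL, client_id: str = ""):
    """Build a POST to /prompt that queues an API-format workflow."""
    payload = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def queue_workflow(workflow: dict, base_url: str = COMFYUI_URL) -> str:
    """Queue a workflow and return the prompt_id reported by ComfyUI."""
    req = build_prompt_request(workflow, base_url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["prompt_id"]
```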
---
## Phase 7 — Extended Integrations & Polish
- [ ] Deploy Music Assistant (Docker), integrate with Home Assistant
- [ ] Complete `skills/music.py` in OpenClaw
- [ ] Deploy Snapcast server on Mac Mini
- [ ] Configure Snapcast clients on ESP32 units for multi-room audio
- [ ] Configure Authelia as 2FA layer in front of web UIs
- [ ] Build advanced n8n workflows (calendar reminders, daily briefing v2)
- [ ] Create iOS Shortcuts to trigger OpenClaw from iPhone widget
- [ ] Configure ntfy/Pushover alerts in Uptime Kuma for all services
- [ ] Automate mem0 + character config backup to Gitea (daily)
- [ ] Train custom wake word using character's name
- [ ] Document all service URLs, ports, and credentials in a private Gitea wiki
- [ ] Tailscale ACL hardening — restrict which devices can reach which services
- [ ] Stress test: reboot Mac Mini, verify all services recover in <2 minutes
---
## Open Decisions
- [ ] Confirm character name (determines wake word training)
- [ ] Confirm OpenClaw version/fork and Ollama compatibility
- [ ] Live2D model: purchase off-the-shelf or commission custom?
- [ ] mem0 backend: Chroma (simple) vs Qdrant Docker (better semantic search)?
- [ ] Snapcast output: ESP32 built-in speakers or dedicated audio hardware per room?
- [ ] Authelia user store: local file vs LDAP?