Aodhan Collins c31724c92b Complete P2 (LLM) and P3 (voice pipeline) implementation
P2 — homeai-llm:
- Fix ollama launchd plist path for Apple Silicon (/opt/homebrew/bin/ollama)
- Add Modelfiles for local GGUF models: llama3.3:70b, qwen3:32b, codestral:22b
  (registered via `ollama create` — no re-download needed)

P3 — homeai-voice:
- Wyoming STT: wyoming-faster-whisper, large-v3 model, port 10300
- Wyoming TTS: custom Kokoro ONNX server (wyoming_kokoro_server.py), port 10301
  Voice af_heart; models at ~/models/kokoro/
- Wake word: openWakeWord daemon (hey_jarvis), notifies OpenClaw at /wake
- launchd plists for all three services + load-all-launchd.sh helper
- Smoke test: wyoming/test-pipeline.sh — 3/3 passing

HA Wyoming integration pending manual UI config (STT 10.0.0.200:10300,
TTS 10.0.0.200:10301).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 23:28:22 +00:00


HomeAI — Master TODO

Track progress across all sub-projects. See each sub-project's PLAN.md for detailed implementation notes. Status: [ ] pending · [~] in progress · [x] done


Phase 1 — Foundation

P1 · homeai-infra

  • Install Docker Desktop for Mac, enable launch at login
  • Create shared homeai Docker network
  • Create ~/server/docker/ directory structure
  • Write compose files: Uptime Kuma, code-server, n8n (HA, Portainer, Gitea are pre-existing on 10.0.0.199)
  • `docker compose up -d` — bring all services up
  • Home Assistant onboarding — long-lived access token generated, stored in .env
  • Install Tailscale, verify all services reachable on Tailnet
  • Gitea: initialise all 8 sub-project repos, configure SSH
  • Uptime Kuma: add monitors for all services, configure mobile alerts
  • Verify all containers survive a cold reboot
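The shared-network and compose steps above could look like the following sketch; the image tag, port, and volume path are placeholder assumptions, not the project's actual compose file:

```yaml
# docker-compose.yml sketch for one service on the shared network
# (image tag, port, and volume path are assumptions)
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    restart: unless-stopped
    ports:
      - "3001:3001"
    volumes:
      - ./uptime-kuma:/app/data
    networks:
      - homeai

networks:
  homeai:
    external: true   # created once with: docker network create homeai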

P2 · homeai-llm

  • Install Ollama natively via brew
  • Write and load launchd plist (com.homeai.ollama.plist) — /opt/homebrew/bin/ollama
  • Register local GGUF models via Modelfiles (no download): llama3.3:70b, qwen3:32b, codestral:22b
  • Deploy Open WebUI via Docker compose (port 3030)
  • Verify Open WebUI connected to Ollama, all models available
  • Run scripts/benchmark.sh — record results in benchmark-results.md
  • Add Ollama + Open WebUI to Uptime Kuma monitors
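Registering an already-downloaded GGUF without re-pulling it can be done with a minimal Modelfile; the file path below is a placeholder, not one of the project's actual model paths:

```
# Modelfile — FROM points at an existing local GGUF (path is a placeholder)
FROM /path/to/llama-3.3-70b-instruct-Q4_K_M.gguf
```

Then `ollama create llama3.3:70b -f Modelfile` registers the local weights with Ollama instead of downloading them again.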

Phase 2 — Voice Pipeline

P3 · homeai-voice

  • Install wyoming-faster-whisper — model: faster-whisper-large-v3 (auto-downloaded)
  • Install Kokoro ONNX TTS — models at ~/models/kokoro/
  • Write Wyoming-Kokoro adapter server (homeai-voice/tts/wyoming_kokoro_server.py)
  • Write + load launchd plists for Wyoming STT (10300) and TTS (10301)
  • Install openWakeWord + pyaudio — model: hey_jarvis
  • Write + load openWakeWord launchd plist (com.homeai.wakeword)
  • Write wyoming/test-pipeline.sh — smoke test (3/3 passing)
  • [~] Connect Home Assistant Wyoming integration (STT + TTS) — awaiting HA UI config
  • Create HA Voice Assistant pipeline
  • Test HA Assist via browser: type query → hear spoken response
  • Install Chatterbox TTS (MPS build), test with sample .wav
  • Install Qwen3-TTS via MLX (fallback)
  • Train custom wake word using the character's name
  • Add Wyoming STT/TTS to Uptime Kuma monitors
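For the smoke test, the Wyoming side can be exercised at the framing level. The sketch below encodes and decodes the newline-delimited JSON event framing the Wyoming protocol uses (a JSON header line, then an optional raw payload); the event names and fields are illustrative, not taken from test-pipeline.sh:

```python
import json

def encode_event(event_type, data=None, payload=b""):
    """Frame a Wyoming-style event: one JSON header line, then raw payload."""
    header = {"type": event_type}
    if data:
        header["data"] = data
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + payload

def decode_event(blob):
    """Split a framed event back into (header_dict, payload_bytes)."""
    line, _, rest = blob.partition(b"\n")
    header = json.loads(line)
    return header, rest[: header.get("payload_length", 0)]
```

A real client would write these frames over TCP to ports 10300/10301; this pure framing layer is the part a unit test can cover offline.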

Phase 3 — Agent & Character

P5 · homeai-character (no runtime deps — can start alongside P1)

  • Define and write schema/character.schema.json (v1)
  • Write characters/aria.json — default character
  • Set up Vite project in src/, install deps
  • Integrate existing character-manager.jsx into Vite project
  • Add schema validation on export (ajv)
  • Add expression mapping UI section
  • Add custom rules editor
  • Test full edit → export → validate → load cycle
  • Record or source voice reference audio for Aria (~/voices/aria.wav)
  • Pre-process audio with ffmpeg, test with Chatterbox
  • Update aria.json with voice clone path if quality is good
  • Write SchemaValidator.js as standalone utility
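The export-time validation step can be sketched as a toy stand-in for the ajv/JSON-schema check; the field names below are assumptions for illustration, not the real contents of schema/character.schema.json:

```python
# Hypothetical required fields — the real list lives in character.schema.json
REQUIRED_FIELDS = ("name", "voice", "expressions")

def validate_character(doc):
    """Return a list of error strings; an empty list means the export is valid."""
    errors = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in doc]
    if "expressions" in doc and not isinstance(doc["expressions"], dict):
        errors.append("expressions must be an object mapping state -> hotkey")
    return errors
```

The edit → export → validate → load cycle would call this (or the ajv equivalent) at the export step and refuse to write a file that returns any errors.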

P4 · homeai-agent

  • Confirm OpenClaw installation method and Ollama compatibility
  • Install OpenClaw, write ~/.openclaw/config.yaml
  • Verify OpenClaw responds to basic text query via /chat
  • Write skills/home_assistant.py — test lights on/off via voice
  • Write skills/memory.py — test store and recall
  • Write skills/weather.py — verify HA weather sensor data
  • Write skills/timer.py — test set/fire a timer
  • Write skill stubs: music.py, vtube_studio.py, comfyui.py
  • Set up mem0 with Chroma backend, test semantic recall
  • Write and load memory backup launchd job
  • Symlink homeai-agent/skills/ → ~/.openclaw/skills/
  • Build morning briefing n8n workflow
  • Build notification router n8n workflow
  • Verify full voice → agent → HA action flow
  • Add OpenClaw to Uptime Kuma monitors
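The store/recall contract for skills/memory.py can be sketched with a toy in-memory version; mem0 with Chroma would do semantic recall, whereas this stand-in only does case-insensitive substring matching, and the class/method names are assumptions:

```python
class MemorySkill:
    """Toy stand-in for skills/memory.py. The real skill would delegate to
    mem0 + Chroma for semantic recall; this sketch only substring-matches."""

    def __init__(self):
        self._facts = []

    def store(self, fact):
        """Persist a fact (in memory only, for this sketch)."""
        self._facts.append(fact)

    def recall(self, query):
        """Return all stored facts containing the query, case-insensitively."""
        q = query.lower()
        return [f for f in self._facts if q in f.lower()]
```

The "test store and recall" item above amounts to asserting that a stored fact comes back for a related query and that unrelated queries return nothing.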

Phase 4 — Hardware Satellites

P6 · homeai-esp32

  • Install ESPHome: `pip install esphome`
  • Write esphome/secrets.yaml (gitignored)
  • Write base.yaml, voice.yaml, display.yaml, animations.yaml
  • Write s3-box-living-room.yaml for first unit
  • Flash first unit via USB
  • Verify unit appears in HA device list
  • Assign Wyoming voice pipeline to unit in HA
  • Test full wake → STT → LLM → TTS → audio playback cycle
  • Test LVGL face: idle → listening → thinking → speaking → error
  • Verify OTA firmware update works wirelessly
  • Flash remaining units (bedroom, kitchen, etc.)
  • Document MAC address → room name mapping
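A first-unit config like s3-box-living-room.yaml would follow the usual ESPHome shape; the board id and the exact component list below are assumptions for illustration, not the project's actual files:

```yaml
# s3-box-living-room.yaml — hypothetical sketch, not the project's config
esphome:
  name: s3-box-living-room

esp32:
  board: esp32s3box

wifi:
  ssid: !secret wifi_ssid        # from the gitignored secrets.yaml
  password: !secret wifi_password

api:   # native API — makes the unit appear in HA's device list
ota:   # enables the wireless firmware updates tested above
```

The shared base.yaml/voice.yaml/display.yaml files would be pulled in via ESPHome packages so each room file stays a few lines long.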

Phase 5 — Visual Layer

P7 · homeai-visual

  • Install VTube Studio (Mac App Store)
  • Enable WebSocket API on port 8001
  • Source/purchase a Live2D model (nizima.com or booth.pm)
  • Load model in VTube Studio
  • Create hotkeys for all 8 expression states
  • Write skills/vtube_studio.py full implementation
  • Run auth flow — click Allow in VTube Studio, save token
  • Test all 8 expressions via test script
  • Update aria.json with real VTube Studio hotkey IDs
  • Write lipsync.py amplitude-based helper
  • Integrate lip sync into OpenClaw TTS dispatch
  • Symlink skills/ → ~/.openclaw/skills/
  • Test full pipeline: voice → thinking expression → speaking with lip sync
  • Set up VTube Studio mobile (iPhone/iPad) on Tailnet
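The amplitude-based lipsync helper reduces to mapping each audio frame's RMS level to a 0..1 mouth-open parameter for VTube Studio; the function name and gain constant below are assumptions, not taken from lipsync.py:

```python
import math

def mouth_open(samples, gain=4.0):
    """Map one audio frame (floats in [-1, 1]) to a 0..1 mouth-open value.

    RMS amplitude is scaled by a hypothetical gain and clipped, so silence
    closes the mouth and loud frames open it fully.
    """
    if not samples:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return min(1.0, rms * gain)
```

During TTS dispatch, the skill would call this per ~20 ms frame and send the value to the VTube Studio WebSocket API as a parameter injection.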

Phase 6 — Image Generation

P8 · homeai-images

  • Clone ComfyUI to ~/ComfyUI/, install deps in venv
  • Verify MPS is detected at launch
  • Write and load launchd plist (com.homeai.comfyui.plist)
  • Download SDXL base model
  • Download Flux.1-schnell
  • Download ControlNet models (canny, depth)
  • Test generation via ComfyUI web UI (port 8188)
  • Build and export quick.json workflow
  • Build and export portrait.json workflow
  • Build and export scene.json workflow (ControlNet)
  • Build and export upscale.json workflow
  • Write skills/comfyui.py full implementation
  • Test skill: `comfyui.quick("test prompt")` → image file returned
  • Collect character reference images for LoRA training
  • Train SDXL LoRA with kohya_ss
  • Load LoRA into portrait.json, verify character consistency
  • Symlink skills/ → ~/.openclaw/skills/
  • Test via OpenClaw: "Generate a portrait of Aria looking happy"
  • Add ComfyUI to Uptime Kuma monitors
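The skill's pure core — taking an exported API-format workflow like quick.json and swapping in the prompt text before POSTing it to ComfyUI — can be sketched as below; the node id "6" and the "text" input name depend entirely on how the workflow was exported and are only assumptions:

```python
import copy

def inject_prompt(workflow, text, node_id="6"):
    """Return a copy of an exported ComfyUI API-format workflow dict with
    the positive-prompt text replaced. node_id is an assumption: the actual
    id comes from the exported quick.json."""
    wf = copy.deepcopy(workflow)
    wf[node_id]["inputs"]["text"] = text
    return wf
```

`comfyui.quick(...)` would then submit the result to the ComfyUI HTTP API and wait for the image; the injection step above is the part that is testable offline.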

Phase 7 — Extended Integrations & Polish

  • Deploy Music Assistant (Docker), integrate with Home Assistant
  • Complete skills/music.py in OpenClaw
  • Deploy Snapcast server on Mac Mini
  • Configure Snapcast clients on ESP32 units for multi-room audio
  • Configure Authelia as 2FA layer in front of web UIs
  • Build advanced n8n workflows (calendar reminders, daily briefing v2)
  • Create iOS Shortcuts to trigger OpenClaw from iPhone widget
  • Configure ntfy/Pushover alerts in Uptime Kuma for all services
  • Automate mem0 + character config backup to Gitea (daily)
  • Document all service URLs, ports, and credentials in a private Gitea wiki
  • Tailscale ACL hardening — restrict which devices can reach which services
  • Stress test: reboot Mac Mini, verify all services recover in <2 minutes
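The reboot stress test can be scripted around a small polling helper; the sketch below is a hypothetical shape for it, and the host/port in the usage comment are assumptions:

```shell
# Hypothetical recovery check for the <2-minute reboot test:
# wait_for CMD [TIMEOUT_SECONDS] — retry CMD until it succeeds or time runs out.
wait_for() {
  cmd=$1
  deadline=$(( $(date +%s) + ${2:-120} ))
  until sh -c "$cmd" >/dev/null 2>&1; do
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep 2
  done
}

# Example usage after a reboot (host/port are placeholders):
# wait_for "curl -fsS http://10.0.0.200:3001" 120 && echo "Uptime Kuma up"
```

Running one `wait_for` per service and failing the test if any call returns non-zero gives a repeatable pass/fail for the recovery target.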

Open Decisions

  • Confirm character name (determines wake word training)
  • Confirm OpenClaw version/fork and Ollama compatibility
  • Live2D model: purchase off-the-shelf or commission custom?
  • mem0 backend: Chroma (simple) vs Qdrant Docker (better semantic search)?
  • Snapcast output: ESP32 built-in speakers or dedicated audio hardware per room?
  • Authelia user store: local file vs LDAP?