# HomeAI — Master TODO

> Track progress across all sub-projects. See each sub-project `PLAN.md` for detailed implementation notes.
> Status: `[ ]` pending · `[~]` in progress · `[x]` done

---

## Phase 1 — Foundation

### P1 · homeai-infra

- [x] Install Docker Desktop for Mac, enable launch at login
- [x] Create shared `homeai` Docker network
- [x] Create `~/server/docker/` directory structure
- [x] Write compose files: Uptime Kuma, code-server, n8n (HA, Portainer, Gitea are pre-existing on 10.0.0.199)
- [x] `docker compose up -d` — bring all services up
- [x] Home Assistant onboarding — long-lived access token generated, stored in `.env`
- [ ] Install Tailscale, verify all services reachable on Tailnet
- [ ] Uptime Kuma: add monitors for all services, configure mobile alerts
- [ ] Verify all containers survive a cold reboot

### P2 · homeai-llm

- [x] Install Ollama natively via brew
- [x] Write and load launchd plist (`com.homeai.ollama.plist`) — `/opt/homebrew/bin/ollama`
- [x] Register local GGUF models via Modelfiles (no download): llama3.3:70b, qwen3:32b, codestral:22b, qwen2.5:7b
- [x] Register additional models: EVA-LLaMA-3.33-70B, Midnight-Miqu-70B, QwQ-32B, Qwen3.5-35B, Qwen3-Coder-30B, Qwen3-VL-30B, GLM-4.6V-Flash, DeepSeek-R1-8B, gemma-3-27b
- [x] Add qwen3.5:35b-a3b (MoE, Q8_0) — 26.7 tok/s, recommended for voice pipeline
- [x] Write model keep-warm daemon + launchd service (pins qwen2.5:7b + $HOMEAI_MEDIUM_MODEL in VRAM, checks every 5 min)
- [x] Deploy Open WebUI via Docker compose (port 3030)
- [x] Verify Open WebUI connected to Ollama, all models available
- [x] Run pipeline benchmark (homeai-voice/scripts/benchmark_pipeline.py) — STT/LLM/TTS latency profiled
- [ ] Add Ollama + Open WebUI to Uptime Kuma monitors

---

## Phase 2 — Voice Pipeline

### P3 · homeai-voice

- [x] Install `wyoming-faster-whisper` — model: faster-whisper-large-v3 (auto-downloaded)
- [x] Upgrade STT to wyoming-mlx-whisper (whisper-large-v3-turbo, MLX Metal GPU) — 20x faster (8s → 400ms)
- [x] Install Kokoro ONNX TTS — models at `~/models/kokoro/`
- [x] Write Wyoming-Kokoro adapter server (`homeai-voice/tts/wyoming_kokoro_server.py`)
- [x] Write + load launchd plists for Wyoming STT (10300) and TTS (10301)
- [x] Install openWakeWord + pyaudio — model: hey_jarvis
- [x] Write + load openWakeWord launchd plist (`com.homeai.wakeword`) — DISABLED, replaced by Wyoming satellite
- [x] Write `wyoming/test-pipeline.sh` — smoke test (3/3 passing)
- [x] Install Wyoming satellite — handles wake word via HA voice pipeline
- [x] Install Wyoming satellite for Mac Mini (port 10700)
- [x] Write OpenClaw conversation custom component for Home Assistant
- [x] Connect Home Assistant Wyoming integration (STT + TTS + Satellite) — ready to configure in HA UI
- [x] Create HA Voice Assistant pipeline with OpenClaw conversation agent — component ready, needs HA UI setup
- [x] Test HA Assist via browser: type query → hear spoken response
- [x] Test full voice loop: wake word → STT → OpenClaw → TTS → audio playback
- [ ] Install Chatterbox TTS (MPS build), test with sample `.wav`
- [ ] Install Qwen3-TTS via MLX (fallback)
- [ ] Train custom wake word using character name
- [ ] Add Wyoming STT/TTS to Uptime Kuma monitors

---

## Phase 3 — Agent & Character

### P4 · homeai-agent

- [x] Install OpenClaw (npm global, v2026.3.2)
- [x] Configure Ollama provider (native API, `http://localhost:11434`)
- [x] Write + load launchd plist (`com.homeai.openclaw`) — gateway on port 8080
- [x] Fix context window: set `contextWindow=32768` for llama3.3:70b in `openclaw.json`
- [x] Fix Llama 3.3 Modelfile: add tool-calling TEMPLATE block
- [x] Verify `openclaw agent --message "..." --agent main` → completed
- [x] Write `skills/home-assistant` SKILL.md — HA REST API control via ha-ctl CLI
- [x] Write `skills/voice-assistant` SKILL.md — voice response style guide
- [x] Wire HASS_TOKEN — create `~/.homeai/hass_token` or set env in launchd plist
- [x] Fix HA tool calling: set commands.native=true, symlink ha-ctl to PATH, update TOOLS.md
- [x] Test home-assistant skill: "turn on/off the reading lamp" — verified exec→ha-ctl→HA action
- [x] Set up mem0 with Chroma backend, test semantic recall
- [x] Write memory backup launchd job
- [x] Build morning briefing n8n workflow
- [x] Build notification router n8n workflow
- [x] Verify full voice → agent → HA action flow
- [x] Add OpenClaw to Uptime Kuma monitors (manual user action required)

### P5 · homeai-dashboard *(character system + dashboard)*

- [x] Define and write `schema/character.schema.json` (v1)
- [x] Write `characters/aria.json` — default character
- [x] Set up Vite project in `src/`, install deps
- [x] Integrate existing `character-manager.jsx` into Vite project
- [x] Add schema validation on export (ajv)
- [x] Add expression mapping UI section
- [x] Add custom rules editor
- [x] Test full edit → export → validate → load cycle
- [x] Wire character system prompt into OpenClaw agent config
- [x] Record or source voice reference audio for Aria (`~/voices/aria.wav`)
- [x] Pre-process audio with ffmpeg, test with Chatterbox
- [x] Update `aria.json` with voice clone path if quality is good
- [x] Build unified HomeAI dashboard — dark-themed frontend showing live service status + links to individual UIs
- [x] Add character profile management to dashboard — store/switch character configs with attached profile images
- [x] Add TTS voice preview in character editor — Kokoro preview via OpenClaw bridge with loading state, custom text, stop control
- [x] Merge homeai-character + homeai-desktop into unified homeai-dashboard (services, chat, characters, editor)
- [x] Upgrade character schema to v2 — background, dialogue_style, appearance, skills, gaze_presets (auto-migrate v1)
- [x] Add LLM-assisted character creation via Character MCP server (Fandom/Wikipedia lookup)
- [x] Add character memory system — personal (per-character) + general (shared) memories with dashboard UI
- [x] Add conversation history with per-conversation persistence
- [x] Wire character_id through full pipeline (dashboard → bridge → LLM system prompt)
- [x] Add TTS text cleaning — strip tags, asterisks, emojis, markdown before synthesis
- [x] Add per-character TTS voice routing — bridge writes state file, Wyoming server reads it
- [x] Add ElevenLabs TTS support in Wyoming server — cloud voice synthesis via state file routing
- [x] Dashboard auto-selects character's TTS engine/voice (Kokoro or ElevenLabs)
- [ ] Deploy dashboard as Docker container or static site on Mac Mini

---

## Phase 4 — Hardware Satellites

### P6 · homeai-esp32

- [x] Install ESPHome in `~/homeai-esphome-env` (Python 3.12 venv)
- [x] Write `esphome/secrets.yaml` (gitignored)
- [x] Write `homeai-living-room.yaml` (based on official S3-BOX-3 reference config)
- [x] Generate placeholder face illustrations (7 PNGs, 320×240)
- [x] Write `setup.sh` with flash/ota/logs/validate commands
- [x] Write `deploy.sh` with OTA deploy, image management, multi-unit support
- [x] Flash first unit via USB (living room)
- [x] Verify unit appears in HA device list (requires HA 2026.x for ESPHome 2025.12+ compat)
- [x] Assign Wyoming voice pipeline to unit in HA
- [x] Test full wake → STT → LLM → TTS → audio playback cycle
- [x] Test display states: idle → listening → thinking → replying → error
- [x] Verify OTA firmware update works wirelessly (`deploy.sh --device OTA`)
- [ ] Flash remaining units (bedroom, kitchen)
- [ ] Document MAC address → room name mapping

### P6b · homeai-rpi (Kitchen Satellite)

- [x] Set up Wyoming Satellite on Raspberry Pi 5 (SELBINA) with ReSpeaker 2-Mics pHAT
- [x] Write setup.sh — full Pi provisioning (venv, drivers, systemd, scripts)
- [x] Write deploy.sh — remote deploy/manage from Mac Mini (push-wrapper, test-logs, etc.)
- [x] Write satellite_wrapper.py — monkey-patches fixing TTS echo, writer race, streaming timeout
- [x] Test multi-command voice loop without freezing

---

## Phase 5 — Visual Layer

### P7 · homeai-visual

#### VTube Studio Expression Bridge

- [x] Write `vtube-bridge.py` — persistent WebSocket ↔ HTTP bridge daemon (port 8002)
- [x] Write `vtube-ctl` CLI wrapper + OpenClaw skill (`~/.openclaw/skills/vtube-studio/`)
- [x] Wire expression triggers into `openclaw-http-bridge.py` (thinking → idle, speaking → idle)
- [x] Add amplitude-based lip sync to `wyoming_kokoro_server.py` (RMS → MouthOpen parameter)
- [x] Write `test-expressions.py` — auth flow, expression cycle, lip sync sweep, latency test
- [x] Write launchd plist + setup.sh for venv creation and service registration
- [ ] Install VTube Studio from Mac App Store, enable WebSocket API (port 8001)
- [ ] Source/purchase Live2D model, load in VTube Studio
- [ ] Create 8 expression hotkeys, record UUIDs
- [ ] Run `setup.sh` to create venv, install websockets, load launchd service
- [ ] Run `vtube-ctl auth` — click Allow in VTube Studio
- [ ] Update `aria.json` with real hotkey UUIDs (replace placeholders)
- [ ] Run `test-expressions.py --all` — verify expressions + lip sync + latency
- [ ] Set up VTube Studio mobile (iPhone/iPad) on Tailnet

#### Web Visuals (Dashboard)

- [ ] Design PNG/GIF character visuals for web assistant (idle, thinking, speaking, etc.)
- [ ] Integrate animated visuals into homeai-dashboard chat view
- [ ] Sync visual state to voice pipeline events (listening, processing, responding)
- [ ] Add expression transitions and idle animations

### P8 · homeai-android

- [ ] Build Android companion app for mobile assistant access
- [ ] Integrate with OpenClaw bridge API (chat, TTS, STT)
- [ ] Add character visual display
- [ ] Push notification support via ntfy/FCM

---

## Phase 6 — Image Generation

### P9 · homeai-images (ComfyUI)

- [ ] Clone ComfyUI to `~/ComfyUI/`, install deps in venv
- [ ] Verify MPS is detected at launch
- [ ] Write and load launchd plist (`com.homeai.comfyui.plist`)
- [ ] Download SDXL base model + Flux.1-schnell + ControlNet models
- [ ] Test generation via ComfyUI web UI (port 8188)
- [ ] Build and export workflow JSONs (quick, portrait, scene, upscale)
- [ ] Write `skills/comfyui` SKILL.md + implementation
- [ ] Collect character reference images for LoRA training
- [ ] Add ComfyUI to Uptime Kuma monitors

---

## Phase 7 — Extended Integrations & Polish

### P10 · Integrations & Polish

- [ ] Deploy Music Assistant (Docker), integrate with Home Assistant
- [ ] Write `skills/music` SKILL.md for OpenClaw
- [ ] Deploy Snapcast server on Mac Mini
- [ ] Configure Snapcast clients on ESP32 units for multi-room audio
- [ ] Configure Authelia as 2FA layer in front of web UIs
- [ ] Build advanced n8n workflows (calendar reminders, daily briefing v2)
- [ ] Create iOS Shortcuts to trigger OpenClaw from iPhone widget
- [ ] Configure ntfy/Pushover alerts in Uptime Kuma for all services
- [ ] Automate mem0 + character config backup to Gitea (daily)
- [ ] Train custom wake word using character's name
- [ ] Document all service URLs, ports, and credentials in a private Gitea wiki
- [ ] Tailscale ACL hardening — restrict which devices can reach which services
- [ ] Stress test: reboot Mac Mini, verify all services recover in <2 minutes

---

## Stretch Goals

### Live2D / VTube Studio

- [ ] Learn Live2D modelling toolchain (Live2D Cubism Editor)
- [ ] Install VTube Studio (Mac App Store), enable WebSocket API on port 8001
- [ ] Source/commission a Live2D model (nizima.com or booth.pm)
- [ ] Create hotkeys for expression states
- [ ] Write `skills/vtube_studio` SKILL.md + implementation
- [ ] Write `lipsync.py` amplitude-based helper
- [ ] Integrate lip sync into OpenClaw TTS dispatch
- [ ] Set up VTube Studio mobile (iPhone/iPad) on Tailnet

---

## Open Decisions

- [ ] Confirm character name (determines wake word training)
- [ ] mem0 backend: Chroma (simple) vs Qdrant Docker (better semantic search)?
- [ ] Snapcast output: ESP32 built-in speakers or dedicated audio hardware per room?
- [ ] Authelia user store: local file vs LDAP?
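---

### Appendix · TTS text-cleaning sketch

The P5 item "Add TTS text cleaning" (strip tags, asterisks, emojis, markdown before synthesis) is worth pinning down, since leftover markup read aloud is the most audible failure mode. A minimal sketch of that normalization pass, assuming a hypothetical `clean_for_tts` helper — the bridge's actual implementation and regexes may differ:

```python
import re

def clean_for_tts(text: str) -> str:
    """Strip markup that should not be spoken aloud (illustrative, not the real bridge code)."""
    text = re.sub(r"<[^>]+>", "", text)                    # HTML/XML-style tags
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)   # markdown links -> keep label only
    text = re.sub(r"`{1,3}[^`]*`{1,3}", "", text)          # inline code spans
    text = text.replace("*", "").replace("#", "")          # emphasis markers / heading hashes
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)  # common emoji ranges
    return re.sub(r"\s+", " ", text).strip()               # collapse leftover whitespace
```

Order matters: links must be unwrapped before bare brackets or asterisks are stripped, or the spoken label is mangled.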