homeai/TODO.md
Aodhan Collins 117254d560 feat: Music Assistant, Claude primary LLM, model tag in chat, setup.sh rewrite
- Deploy Music Assistant on Pi (10.0.0.199:8095) with host networking for
  Chromecast mDNS discovery, Spotify + SMB library support
- Switch primary LLM from Ollama to Claude Sonnet 4 (Anthropic API),
  local models remain as fallback
- Add model info tag under each assistant message in dashboard chat,
  persisted in conversation JSON
- Rewrite homeai-agent/setup.sh: loads .env, injects API keys into plists,
  symlinks plists to ~/Library/LaunchAgents/, smoke tests services
- Update install_service() in common.sh to use symlinks instead of copies
- Open UFW ports on Pi for Music Assistant (8095, 8097, 8927)
- Add ANTHROPIC_API_KEY to openclaw + bridge launchd plists

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 22:21:28 +00:00


# HomeAI — Master TODO
> Track progress across all sub-projects. See each sub-project's `PLAN.md` for detailed implementation notes.
> Status: `[ ]` pending · `[~]` in progress · `[x]` done
---
## Phase 1 — Foundation
### P1 · homeai-infra
- [x] Install Docker Desktop for Mac, enable launch at login
- [x] Create shared `homeai` Docker network
- [x] Create `~/server/docker/` directory structure
- [x] Write compose files: Uptime Kuma, code-server, n8n (HA, Portainer, Gitea are pre-existing on 10.0.0.199)
- [x] `docker compose up -d` — bring all services up
- [x] Home Assistant onboarding — long-lived access token generated, stored in `.env`
- [ ] Install Tailscale, verify all services reachable on Tailnet
- [x] Uptime Kuma: add monitors for all services, configure mobile alerts
- [ ] Verify all containers survive a cold reboot
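Cold-reboot survival mostly comes down to Docker Desktop launching at login (done above) plus a restart policy on every container. A hypothetical compose fragment for the Uptime Kuma service (paths and network name follow the structure above; adjust per service):

```yaml
# Hypothetical ~/server/docker/uptime-kuma/compose.yaml fragment.
# `restart: unless-stopped` is what brings the container back after a cold reboot.
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    restart: unless-stopped
    ports:
      - "3001:3001"
    volumes:
      - ./data:/app/data
    networks:
      - homeai

networks:
  homeai:
    external: true   # the shared `homeai` network created earlier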
### P2 · homeai-llm
- [x] Install Ollama natively via brew
- [x] Write and load launchd plist (`com.homeai.ollama.plist`) — `/opt/homebrew/bin/ollama`
- [x] Register local GGUF models via Modelfiles (no download): llama3.3:70b, qwen3:32b, codestral:22b, qwen2.5:7b
- [x] Register additional models: EVA-LLaMA-3.33-70B, Midnight-Miqu-70B, QwQ-32B, Qwen3.5-35B, Qwen3-Coder-30B, Qwen3-VL-30B, GLM-4.6V-Flash, DeepSeek-R1-8B, gemma-3-27b
- [x] Add qwen3.5:35b-a3b (MoE, Q8_0) — 26.7 tok/s, recommended for voice pipeline
- [x] Write model keep-warm daemon + launchd service (pins qwen2.5:7b + $HOMEAI_MEDIUM_MODEL in VRAM, checks every 5 min)
- [x] Deploy Open WebUI via Docker compose (port 3030)
- [x] Verify Open WebUI connected to Ollama, all models available
- [x] Run pipeline benchmark (homeai-voice/scripts/benchmark_pipeline.py) — STT/LLM/TTS latency profiled
- [x] Add Ollama + Open WebUI to Uptime Kuma monitors
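The keep-warm daemon's core trick, per Ollama's API, is an empty `/api/generate` request with `keep_alive` set — loading the model (if needed) and resetting its unload timer. A minimal sketch; model names come from the list above, and the env-var fallback is an assumption:

```python
import json
import os
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def keep_warm_payload(model: str, keep_alive: str = "10m") -> dict:
    """An empty generate request loads the model and resets its keep_alive timer."""
    return {"model": model, "prompt": "", "keep_alive": keep_alive}

def pin_models(models: list[str]) -> None:
    """POST one keep-warm request per model (launchd runs this every 5 min)."""
    for model in models:
        req = urllib.request.Request(
            OLLAMA_URL,
            data=json.dumps(keep_warm_payload(model)).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=30).read()

# Example (not run here): pin the small model plus the configured medium model.
# pin_models(["qwen2.5:7b", os.environ.get("HOMEAI_MEDIUM_MODEL", "qwen3.5:35b-a3b")])
```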
---
## Phase 2 — Voice Pipeline
### P3 · homeai-voice
- [x] Install `wyoming-faster-whisper` — model: faster-whisper-large-v3 (auto-downloaded)
- [x] Upgrade STT to wyoming-mlx-whisper (whisper-large-v3-turbo, MLX Metal GPU) — 20x faster (8s → 400ms)
- [x] Install Kokoro ONNX TTS — models at `~/models/kokoro/`
- [x] Write Wyoming-Kokoro adapter server (`homeai-voice/tts/wyoming_kokoro_server.py`)
- [x] Write + load launchd plists for Wyoming STT (10300) and TTS (10301)
- [x] Install openWakeWord + pyaudio — model: hey_jarvis
- [x] Write + load openWakeWord launchd plist (`com.homeai.wakeword`) — DISABLED, replaced by Wyoming satellite
- [x] Write `wyoming/test-pipeline.sh` — smoke test (3/3 passing)
- [x] Install Wyoming satellite — handles wake word via HA voice pipeline
- [x] Install Wyoming satellite for Mac Mini (port 10700)
- [x] Write OpenClaw conversation custom component for Home Assistant
- [x] Connect Home Assistant Wyoming integration (STT + TTS + Satellite) — ready to configure in HA UI
- [x] Create HA Voice Assistant pipeline with OpenClaw conversation agent — component ready, needs HA UI setup
- [x] Test HA Assist via browser: type query → hear spoken response
- [x] Test full voice loop: wake word → STT → OpenClaw → TTS → audio playback
- [ ] Install Chatterbox TTS (MPS build), test with sample `.wav`
- [ ] Install Qwen3-TTS via MLX (fallback)
- [ ] Train custom wake word using character name
- [x] Add Wyoming STT/TTS to Uptime Kuma monitors
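A smoke test like `wyoming/test-pipeline.sh` presumably starts by confirming each Wyoming service is listening; a hypothetical port check (ports taken from the items above):

```python
import socket

WYOMING_SERVICES = {
    "stt": ("localhost", 10300),        # wyoming-mlx-whisper
    "tts": ("localhost", 10301),        # Wyoming-Kokoro adapter
    "satellite": ("localhost", 10700),  # Mac Mini Wyoming satellite
}

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def smoke_test(services: dict = WYOMING_SERVICES) -> dict:
    """Map each service name to whether its Wyoming port is reachable."""
    return {name: port_open(host, port) for name, (host, port) in services.items()}
```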
---
## Phase 3 — Agent & Character
### P4 · homeai-agent
- [x] Install OpenClaw (npm global, v2026.3.2)
- [x] Configure Ollama provider (native API, `http://localhost:11434`)
- [x] Write + load launchd plist (`com.homeai.openclaw`) — gateway on port 8080
- [x] Fix context window: set `contextWindow=32768` for llama3.3:70b in `openclaw.json`
- [x] Fix Llama 3.3 Modelfile: add tool-calling TEMPLATE block
- [x] Verify `openclaw agent --message "..." --agent main` → completed
- [x] Write `skills/home-assistant` SKILL.md — HA REST API control via ha-ctl CLI
- [x] Write `skills/voice-assistant` SKILL.md — voice response style guide
- [x] Wire HASS_TOKEN — create `~/.homeai/hass_token` or set env in launchd plist
- [x] Fix HA tool calling: set commands.native=true, symlink ha-ctl to PATH, update TOOLS.md
- [x] Test home-assistant skill: "turn on/off the reading lamp" — verified exec→ha-ctl→HA action
- [x] Set up mem0 with Chroma backend, test semantic recall
- [x] Write memory backup launchd job
- [x] Build morning briefing n8n workflow
- [x] Build notification router n8n workflow
- [x] Verify full voice → agent → HA action flow
- [x] Add OpenClaw to Uptime Kuma monitors (Manual user action required)
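The `ha-ctl` CLI presumably wraps Home Assistant's REST API, where "turn on the reading lamp" reduces to one service POST. A sketch of that call — the HA host/port and the entity id are assumptions, not values from this repo:

```python
import json
import os
import urllib.request

HASS_URL = "http://10.0.0.199:8123"  # assumption: HA on the Pi, default port

def service_request(domain: str, service: str, entity_id: str, token: str) -> urllib.request.Request:
    """Build the HA REST call: POST /api/services/<domain>/<service>."""
    return urllib.request.Request(
        f"{HASS_URL}/api/services/{domain}/{service}",
        data=json.dumps({"entity_id": entity_id}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

def turn_on(entity_id: str) -> None:
    """e.g. turn_on("light.reading_lamp") — entity id is a placeholder."""
    token = os.environ.get("HASS_TOKEN", "")
    urllib.request.urlopen(service_request("light", "turn_on", entity_id, token), timeout=10).read()
```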
### P5 · homeai-dashboard *(character system + dashboard)*
- [x] Define and write `schema/character.schema.json` (v1)
- [x] Write `characters/aria.json` — default character
- [x] Set up Vite project in `src/`, install deps
- [x] Integrate existing `character-manager.jsx` into Vite project
- [x] Add schema validation on export (ajv)
- [x] Add expression mapping UI section
- [x] Add custom rules editor
- [x] Test full edit → export → validate → load cycle
- [x] Wire character system prompt into OpenClaw agent config
- [x] Record or source voice reference audio for Aria (`~/voices/aria.wav`)
- [x] Pre-process audio with ffmpeg, test with Chatterbox
- [x] Update `aria.json` with voice clone path if quality is good
- [x] Build unified HomeAI dashboard — dark-themed frontend showing live service status + links to individual UIs
- [x] Add character profile management to dashboard — store/switch character configs with attached profile images
- [x] Add TTS voice preview in character editor — Kokoro preview via OpenClaw bridge with loading state, custom text, stop control
- [x] Merge homeai-character + homeai-desktop into unified homeai-dashboard (services, chat, characters, editor)
- [x] Upgrade character schema to v2 — background, dialogue_style, appearance, skills, gaze_presets (auto-migrate v1)
- [x] Add LLM-assisted character creation via Character MCP server (Fandom/Wikipedia lookup)
- [x] Add character memory system — personal (per-character) + general (shared) memories with dashboard UI
- [x] Add conversation history with per-conversation persistence
- [x] Wire character_id through full pipeline (dashboard → bridge → LLM system prompt)
- [x] Add TTS text cleaning — strip tags, asterisks, emojis, markdown before synthesis
- [x] Add per-character TTS voice routing — bridge writes state file, Wyoming server reads it
- [x] Add ElevenLabs TTS support in Wyoming server — cloud voice synthesis via state file routing
- [x] Dashboard auto-selects character's TTS engine/voice (Kokoro or ElevenLabs)
- [ ] Deploy dashboard as Docker container or static site on Mac Mini
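The TTS text-cleaning step above (strip tags, asterisks, emojis, markdown) is essentially regex work; an illustrative sketch — the bridge's exact rules may differ:

```python
import re

def clean_for_tts(text: str) -> str:
    """Normalize LLM output before synthesis: drop markup the TTS would read aloud."""
    text = re.sub(r"<[^>]+>", "", text)                               # HTML/XML-style tags
    text = re.sub(r"```.*?```", "", text, flags=re.S)                 # fenced code blocks
    text = re.sub(r"\*{1,2}([^*]*)\*{1,2}", r"\1", text)              # *emphasis* / **bold**
    text = re.sub(r"#+\s*", "", text)                                 # markdown headings
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)              # [link](url) -> link
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)  # common emoji ranges
    return re.sub(r"\s+", " ", text).strip()                          # collapse whitespace
```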
---
## Phase 4 — Hardware Satellites
### P6 · homeai-esp32
- [x] Install ESPHome in `~/homeai-esphome-env` (Python 3.12 venv)
- [x] Write `esphome/secrets.yaml` (gitignored)
- [x] Write `homeai-living-room.yaml` (based on official S3-BOX-3 reference config)
- [x] Generate placeholder face illustrations (7 PNGs, 320×240)
- [x] Write `setup.sh` with flash/ota/logs/validate commands
- [x] Write `deploy.sh` with OTA deploy, image management, multi-unit support
- [x] Flash first unit via USB (living room)
- [x] Verify unit appears in HA device list (requires HA 2026.x for ESPHome 2025.12+ compat)
- [x] Assign Wyoming voice pipeline to unit in HA
- [x] Test full wake → STT → LLM → TTS → audio playback cycle
- [x] Test display states: idle → listening → thinking → replying → error
- [x] Verify OTA firmware update works wirelessly (`deploy.sh --device OTA`)
- [ ] Flash remaining units (bedroom, kitchen)
- [ ] Document MAC address → room name mapping
### P6b · homeai-rpi (Kitchen Satellite)
- [x] Set up Wyoming Satellite on Raspberry Pi 5 (SELBINA) with ReSpeaker 2-Mics pHAT
- [x] Write setup.sh — full Pi provisioning (venv, drivers, systemd, scripts)
- [x] Write deploy.sh — remote deploy/manage from Mac Mini (push-wrapper, test-logs, etc.)
- [x] Write satellite_wrapper.py — monkey-patches fixing TTS echo, writer race, streaming timeout
- [x] Test multi-command voice loop without freezing
---
## Phase 5 — Visual Layer
### P7 · homeai-visual
#### VTube Studio Expression Bridge
- [x] Write `vtube-bridge.py` — persistent WebSocket ↔ HTTP bridge daemon (port 8002)
- [x] Write `vtube-ctl` CLI wrapper + OpenClaw skill (`~/.openclaw/skills/vtube-studio/`)
- [x] Wire expression triggers into `openclaw-http-bridge.py` (thinking → idle, speaking → idle)
- [x] Add amplitude-based lip sync to `wyoming_kokoro_server.py` (RMS → MouthOpen parameter)
- [x] Write `test-expressions.py` — auth flow, expression cycle, lip sync sweep, latency test
- [x] Write launchd plist + setup.sh for venv creation and service registration
- [ ] Install VTube Studio from Mac App Store, enable WebSocket API (port 8001)
- [ ] Source/purchase Live2D model, load in VTube Studio
- [ ] Create 8 expression hotkeys, record UUIDs
- [ ] Run `setup.sh` to create venv, install websockets, load launchd service
- [ ] Run `vtube-ctl auth` — click Allow in VTube Studio
- [ ] Update `aria.json` with real hotkey UUIDs (replace placeholders)
- [ ] Run `test-expressions.py --all` — verify expressions + lip sync + latency
- [ ] Set up VTube Studio mobile (iPhone/iPad) on Tailnet
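The RMS → MouthOpen mapping in `wyoming_kokoro_server.py` is conceptually this; a sketch with a made-up gain constant (the real server presumably tunes it), with the resulting value injected via the VTube Studio API's parameter-injection request through the bridge:

```python
import math
import struct

def rms(chunk: bytes) -> float:
    """Root-mean-square of a 16-bit signed little-endian PCM chunk, normalized to 0..1."""
    if len(chunk) < 2:
        return 0.0
    n = len(chunk) // 2
    samples = struct.unpack(f"<{n}h", chunk[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n) / 32768.0

def mouth_open(chunk: bytes, gain: float = 4.0) -> float:
    """Map audio level to the MouthOpen parameter, clamped to 0..1.

    The gain of 4.0 is a guess: raw RMS of speech rarely approaches 1.0,
    so some amplification is needed for visible mouth movement.
    """
    return min(1.0, rms(chunk) * gain)
```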
#### Web Visuals (Dashboard)
- [ ] Design PNG/GIF character visuals for web assistant (idle, thinking, speaking, etc.)
- [ ] Integrate animated visuals into homeai-dashboard chat view
- [ ] Sync visual state to voice pipeline events (listening, processing, responding)
- [ ] Add expression transitions and idle animations
### P8 · homeai-android
- [ ] Build Android companion app for mobile assistant access
- [ ] Integrate with OpenClaw bridge API (chat, TTS, STT)
- [ ] Add character visual display
- [ ] Push notification support via ntfy/FCM
---
## Phase 6 — Image Generation
### P9 · homeai-images (ComfyUI)
- [ ] Clone ComfyUI to `~/ComfyUI/`, install deps in venv
- [ ] Verify MPS is detected at launch
- [ ] Write and load launchd plist (`com.homeai.comfyui.plist`)
- [ ] Download SDXL base model + Flux.1-schnell + ControlNet models
- [ ] Test generation via ComfyUI web UI (port 8188)
- [ ] Build and export workflow JSONs (quick, portrait, scene, upscale)
- [ ] Write `skills/comfyui` SKILL.md + implementation
- [ ] Collect character reference images for LoRA training
- [ ] Add ComfyUI to Uptime Kuma monitors
---
## Phase 7 — Extended Integrations & Polish
### P10 · Integrations & Polish
- [x] Deploy Music Assistant (Docker on Pi 10.0.0.199:8095), Spotify + SMB + Chromecast
- [x] Write `skills/music` SKILL.md for OpenClaw
- [ ] Deploy Snapcast server on Mac Mini
- [ ] Configure Snapcast clients on ESP32 units for multi-room audio
- [ ] Configure Authelia as 2FA layer in front of web UIs
- [ ] Build advanced n8n workflows (calendar reminders, daily briefing v2)
- [ ] Create iOS Shortcuts to trigger OpenClaw from iPhone widget
- [ ] Configure ntfy/Pushover alerts in Uptime Kuma for all services
- [ ] Automate mem0 + character config backup to Gitea (daily)
- [ ] Train custom wake word using character's name
- [ ] Document all service URLs, ports, and credentials in a private Gitea wiki
- [ ] Tailscale ACL hardening — restrict which devices can reach which services
- [ ] Stress test: reboot Mac Mini, verify all services recover in <2 minutes
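Uptime Kuma has a built-in ntfy notifier, but for custom jobs (n8n workflows, the backup task above) a push is one HTTP request against ntfy's documented API — the message is the POST body, metadata rides in headers. Sketch; the topic name is a placeholder:

```python
import urllib.request

NTFY_URL = "https://ntfy.sh/homeai-alerts"  # topic name is a placeholder

def alert_request(message: str, title: str = "HomeAI", priority: str = "default") -> urllib.request.Request:
    """ntfy takes the message as the raw POST body; Title/Priority/Tags go in headers."""
    return urllib.request.Request(
        NTFY_URL,
        data=message.encode(),
        headers={"Title": title, "Priority": priority, "Tags": "warning"},
    )

def send_alert(message: str, **kw) -> None:
    urllib.request.urlopen(alert_request(message, **kw), timeout=10).read()
```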
---
## Stretch Goals
### Live2D / VTube Studio
- [ ] Learn Live2D modelling toolchain (Live2D Cubism Editor)
- [ ] Install VTube Studio (Mac App Store), enable WebSocket API on port 8001
- [ ] Source/commission a Live2D model (nizima.com or booth.pm)
- [ ] Create hotkeys for expression states
- [ ] Write `skills/vtube_studio` SKILL.md + implementation
- [ ] Write `lipsync.py` amplitude-based helper
- [ ] Integrate lip sync into OpenClaw TTS dispatch
- [ ] Set up VTube Studio mobile (iPhone/iPad) on Tailnet
---
## Open Decisions
- [ ] Confirm character name (determines wake word training)
- [ ] mem0 backend: Chroma (simple) vs Qdrant Docker (better semantic search)?
- [ ] Snapcast output: ESP32 built-in speakers or dedicated audio hardware per room?
- [ ] Authelia user store: local file vs LDAP?