homeai/TODO.md
Aodhan Collins 117254d560 feat: Music Assistant, Claude primary LLM, model tag in chat, setup.sh rewrite
- Deploy Music Assistant on Pi (10.0.0.199:8095) with host networking for
  Chromecast mDNS discovery, Spotify + SMB library support
- Switch primary LLM from Ollama to Claude Sonnet 4 (Anthropic API),
  local models remain as fallback
- Add model info tag under each assistant message in dashboard chat,
  persisted in conversation JSON
- Rewrite homeai-agent/setup.sh: loads .env, injects API keys into plists,
  symlinks plists to ~/Library/LaunchAgents/, smoke tests services
- Update install_service() in common.sh to use symlinks instead of copies
- Open UFW ports on Pi for Music Assistant (8095, 8097, 8927)
- Add ANTHROPIC_API_KEY to openclaw + bridge launchd plists

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 22:21:28 +00:00


HomeAI — Master TODO

Track progress across all sub-projects. See each sub-project's PLAN.md for detailed implementation notes. Status: [ ] pending · [~] in progress · [x] done


Phase 1 — Foundation

P1 · homeai-infra

  • Install Docker Desktop for Mac, enable launch at login
  • Create shared homeai Docker network
  • Create ~/server/docker/ directory structure
  • Write compose files: Uptime Kuma, code-server, n8n (HA, Portainer, Gitea are pre-existing on 10.0.0.199)
  • docker compose up -d — bring all services up
  • Home Assistant onboarding — long-lived access token generated, stored in .env
  • Install Tailscale, verify all services reachable on Tailnet
  • Uptime Kuma: add monitors for all services, configure mobile alerts
  • Verify all containers survive a cold reboot

P2 · homeai-llm

  • Install Ollama natively via brew
  • Write and load launchd plist (com.homeai.ollama.plist) — /opt/homebrew/bin/ollama
  • Register local GGUF models via Modelfiles (no download): llama3.3:70b, qwen3:32b, codestral:22b, qwen2.5:7b
  • Register additional models: EVA-LLaMA-3.33-70B, Midnight-Miqu-70B, QwQ-32B, Qwen3.5-35B, Qwen3-Coder-30B, Qwen3-VL-30B, GLM-4.6V-Flash, DeepSeek-R1-8B, gemma-3-27b
  • Add qwen3.5:35b-a3b (MoE, Q8_0) — 26.7 tok/s, recommended for voice pipeline
  • Write model keep-warm daemon + launchd service (pins qwen2.5:7b + $HOMEAI_MEDIUM_MODEL in VRAM, checks every 5 min)
  • Deploy Open WebUI via Docker compose (port 3030)
  • Verify Open WebUI connected to Ollama, all models available
  • Run pipeline benchmark (homeai-voice/scripts/benchmark_pipeline.py) — STT/LLM/TTS latency profiled
  • Add Ollama + Open WebUI to Uptime Kuma monitors
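The keep-warm daemon above can be sketched as a loop that periodically reissues an empty generate request with a `keep_alive` window, which keeps a model resident in VRAM without producing output. The model list, interval, and keep-alive duration below are assumptions based on this TODO, not the daemon's actual code.

```python
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"   # default Ollama port
PINNED_MODELS = ["qwen2.5:7b", "qwen3.5:35b-a3b"]    # stand-ins for $HOMEAI_MEDIUM_MODEL

def keepwarm_payload(model: str, keep_alive: str = "10m") -> bytes:
    # An empty prompt with keep_alive loads/pins the model without generating text.
    return json.dumps({"model": model, "prompt": "", "keep_alive": keep_alive}).encode()

def touch(model: str) -> None:
    req = urllib.request.Request(OLLAMA_URL, data=keepwarm_payload(model),
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=30).read()

def run(interval: float = 300.0) -> None:
    # Launchd would keep this process alive; 300 s matches the 5-minute check above.
    while True:
        for model in PINNED_MODELS:
            try:
                touch(model)
            except OSError as exc:
                print(f"keep-warm failed for {model}: {exc}")
        time.sleep(interval)
```

A keep_alive longer than the check interval means a missed cycle (e.g. while Ollama restarts) does not immediately evict the model.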

Phase 2 — Voice Pipeline

P3 · homeai-voice

  • Install wyoming-faster-whisper — model: faster-whisper-large-v3 (auto-downloaded)
  • Upgrade STT to wyoming-mlx-whisper (whisper-large-v3-turbo, MLX Metal GPU) — 20x faster (8s → 400ms)
  • Install Kokoro ONNX TTS — models at ~/models/kokoro/
  • Write Wyoming-Kokoro adapter server (homeai-voice/tts/wyoming_kokoro_server.py)
  • Write + load launchd plists for Wyoming STT (10300) and TTS (10301)
  • Install openWakeWord + pyaudio — model: hey_jarvis
  • Write + load openWakeWord launchd plist (com.homeai.wakeword) — DISABLED, replaced by Wyoming satellite
  • Write wyoming/test-pipeline.sh — smoke test (3/3 passing)
  • Install Wyoming satellite — handles wake word via HA voice pipeline
  • Install Wyoming satellite for Mac Mini (port 10700)
  • Write OpenClaw conversation custom component for Home Assistant
  • Connect Home Assistant Wyoming integration (STT + TTS + Satellite) — ready to configure in HA UI
  • Create HA Voice Assistant pipeline with OpenClaw conversation agent — component ready, needs HA UI setup
  • Test HA Assist via browser: type query → hear spoken response
  • Test full voice loop: wake word → STT → OpenClaw → TTS → audio playback
  • Install Chatterbox TTS (MPS build), test with sample .wav
  • Install Qwen3-TTS via MLX (fallback)
  • Train custom wake word using character name
  • Add Wyoming STT/TTS to Uptime Kuma monitors
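The per-stage latency profiling done by benchmark_pipeline.py can be approximated with a small context-manager timer; the stage bodies below are stubs standing in for the real STT/LLM/TTS calls, and the stage names are illustrative.

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def stage(name: str):
    # Record wall-clock latency of one pipeline stage under `name`.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start

# Wrap each pipeline call; stubs stand in for whisper/LLM/Kokoro here.
with stage("stt"):
    text = "turn on the reading lamp"    # mlx-whisper transcription would go here
with stage("llm"):
    reply = f"Okay, {text}."             # Claude/Ollama completion would go here
with stage("tts"):
    audio = b"\x00" * 16000              # Kokoro synthesis would go here

print({name: f"{secs * 1000:.1f} ms" for name, secs in timings.items()})
```

Timing each stage separately is what makes regressions like the 8 s → 400 ms STT improvement visible in isolation.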

Phase 3 — Agent & Character

P4 · homeai-agent

  • Install OpenClaw (npm global, v2026.3.2)
  • Configure Ollama provider (native API, http://localhost:11434)
  • Write + load launchd plist (com.homeai.openclaw) — gateway on port 8080
  • Fix context window: set contextWindow=32768 for llama3.3:70b in openclaw.json
  • Fix Llama 3.3 Modelfile: add tool-calling TEMPLATE block
  • Verify openclaw agent --message "..." --agent main → completed
  • Write skills/home-assistant SKILL.md — HA REST API control via ha-ctl CLI
  • Write skills/voice-assistant SKILL.md — voice response style guide
  • Wire HASS_TOKEN — create ~/.homeai/hass_token or set env in launchd plist
  • Fix HA tool calling: set commands.native=true, symlink ha-ctl to PATH, update TOOLS.md
  • Test home-assistant skill: "turn on/off the reading lamp" — verified exec→ha-ctl→HA action
  • Set up mem0 with Chroma backend, test semantic recall
  • Write memory backup launchd job
  • Build morning briefing n8n workflow
  • Build notification router n8n workflow
  • Verify full voice → agent → HA action flow
  • Add OpenClaw to Uptime Kuma monitors (Manual user action required)
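The home-assistant skill drives HA through its standard REST API (`POST /api/services/<domain>/<service>` with a long-lived token). A minimal ha-ctl-style request builder might look like the sketch below; the host, port, and entity ID are assumptions, not values from ha-ctl itself.

```python
import json
import os
import urllib.request

# Host assumed from this TODO (HA runs on 10.0.0.199); 8123 is HA's default port.
HA_URL = os.environ.get("HASS_URL", "http://10.0.0.199:8123")
TOKEN = os.environ.get("HASS_TOKEN", "")  # long-lived token, e.g. from ~/.homeai/hass_token

def service_request(domain: str, service: str, entity_id: str) -> urllib.request.Request:
    # Calling an HA service is a POST with a Bearer token and an entity_id body.
    return urllib.request.Request(
        f"{HA_URL}/api/services/{domain}/{service}",
        data=json.dumps({"entity_id": entity_id}).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# e.g. urllib.request.urlopen(service_request("light", "turn_on", "light.reading_lamp"))
```

This is the call chain the "turn on the reading lamp" test exercises end-to-end: exec → ha-ctl → HA REST API.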

P5 · homeai-dashboard (character system + dashboard)

  • Define and write schema/character.schema.json (v1)
  • Write characters/aria.json — default character
  • Set up Vite project in src/, install deps
  • Integrate existing character-manager.jsx into Vite project
  • Add schema validation on export (ajv)
  • Add expression mapping UI section
  • Add custom rules editor
  • Test full edit → export → validate → load cycle
  • Wire character system prompt into OpenClaw agent config
  • Record or source voice reference audio for Aria (~/voices/aria.wav)
  • Pre-process audio with ffmpeg, test with Chatterbox
  • Update aria.json with voice clone path if quality is good
  • Build unified HomeAI dashboard — dark-themed frontend showing live service status + links to individual UIs
  • Add character profile management to dashboard — store/switch character configs with attached profile images
  • Add TTS voice preview in character editor — Kokoro preview via OpenClaw bridge with loading state, custom text, stop control
  • Merge homeai-character + homeai-desktop into unified homeai-dashboard (services, chat, characters, editor)
  • Upgrade character schema to v2 — background, dialogue_style, appearance, skills, gaze_presets (auto-migrate v1)
  • Add LLM-assisted character creation via Character MCP server (Fandom/Wikipedia lookup)
  • Add character memory system — personal (per-character) + general (shared) memories with dashboard UI
  • Add conversation history with per-conversation persistence
  • Wire character_id through full pipeline (dashboard → bridge → LLM system prompt)
  • Add TTS text cleaning — strip tags, asterisks, emojis, markdown before synthesis
  • Add per-character TTS voice routing — bridge writes state file, Wyoming server reads it
  • Add ElevenLabs TTS support in Wyoming server — cloud voice synthesis via state file routing
  • Dashboard auto-selects character's TTS engine/voice (Kokoro or ElevenLabs)
  • Deploy dashboard as Docker container or static site on Mac Mini
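The TTS text-cleaning step above (strip tags, asterisks, emojis, markdown) amounts to a few regex passes before synthesis. This is a minimal sketch, not the bridge's actual implementation; the emoji ranges cover only the common blocks.

```python
import re

def clean_for_tts(text: str) -> str:
    """Strip markup the TTS engine would otherwise read aloud."""
    text = re.sub(r"<[^>]+>", "", text)                     # HTML/XML-style tags
    text = re.sub(r"\*+", "", text)                         # *emphasis* / **action** markers
    text = re.sub(r"`{1,3}[^`]*`{1,3}", "", text)           # inline code spans
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)    # [label](url) -> label
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)  # common emoji blocks
    return re.sub(r"\s+", " ", text).strip()

print(clean_for_tts("**Sure!** Turning on the *reading lamp* 💡 [docs](http://x)"))
# -> Sure! Turning on the reading lamp docs
```

Running the cleaner last in the bridge, just before handing text to Kokoro or ElevenLabs, keeps the on-screen chat transcript untouched.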

Phase 4 — Hardware Satellites

P6 · homeai-esp32

  • Install ESPHome in ~/homeai-esphome-env (Python 3.12 venv)
  • Write esphome/secrets.yaml (gitignored)
  • Write homeai-living-room.yaml (based on official S3-BOX-3 reference config)
  • Generate placeholder face illustrations (7 PNGs, 320×240)
  • Write setup.sh with flash/ota/logs/validate commands
  • Write deploy.sh with OTA deploy, image management, multi-unit support
  • Flash first unit via USB (living room)
  • Verify unit appears in HA device list (requires HA 2026.x for ESPHome 2025.12+ compat)
  • Assign Wyoming voice pipeline to unit in HA
  • Test full wake → STT → LLM → TTS → audio playback cycle
  • Test display states: idle → listening → thinking → replying → error
  • Verify OTA firmware update works wirelessly (deploy.sh --device OTA)
  • Flash remaining units (bedroom, kitchen)
  • Document MAC address → room name mapping

P6b · homeai-rpi (Kitchen Satellite)

  • Set up Wyoming Satellite on Raspberry Pi 5 (SELBINA) with ReSpeaker 2-Mics pHAT
  • Write setup.sh — full Pi provisioning (venv, drivers, systemd, scripts)
  • Write deploy.sh — remote deploy/manage from Mac Mini (push-wrapper, test-logs, etc.)
  • Write satellite_wrapper.py — monkey-patches that fix TTS echo, a writer race, and a streaming timeout
  • Test multi-command voice loop without freezing
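satellite_wrapper.py patches wyoming-satellite at import time rather than forking it. The general shape of such a patch is sketched below with a hypothetical `EventHandler` class; the real wrapper targets internal classes of the wyoming_satellite package, and the 10-second timeout is an illustrative value.

```python
import asyncio
import functools

class EventHandler:
    # Stand-in for the wyoming-satellite class actually being patched.
    async def handle_event(self, event):
        return event

def with_timeout(seconds: float):
    """Wrap a coroutine method so a hung stream can't freeze the voice loop."""
    def deco(fn):
        @functools.wraps(fn)
        async def wrapper(self, event):
            return await asyncio.wait_for(fn(self, event), timeout=seconds)
        return wrapper
    return deco

# Apply before the satellite starts: rebind the method on the class in place.
EventHandler.handle_event = with_timeout(10.0)(EventHandler.handle_event)
```

Monkey-patching keeps the fixes (echo suppression, writer race, streaming timeout) in one file that survives upstream package upgrades, at the cost of depending on upstream internals.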

Phase 5 — Visual Layer

P7 · homeai-visual

VTube Studio Expression Bridge

  • Write vtube-bridge.py — persistent WebSocket ↔ HTTP bridge daemon (port 8002)
  • Write vtube-ctl CLI wrapper + OpenClaw skill (~/.openclaw/skills/vtube-studio/)
  • Wire expression triggers into openclaw-http-bridge.py (thinking → idle, speaking → idle)
  • Add amplitude-based lip sync to wyoming_kokoro_server.py (RMS → MouthOpen parameter)
  • Write test-expressions.py — auth flow, expression cycle, lip sync sweep, latency test
  • Write launchd plist + setup.sh for venv creation and service registration
  • Install VTube Studio from Mac App Store, enable WebSocket API (port 8001)
  • Source/purchase Live2D model, load in VTube Studio
  • Create 8 expression hotkeys, record UUIDs
  • Run setup.sh to create venv, install websockets, load launchd service
  • Run vtube-ctl auth — click Allow in VTube Studio
  • Update aria.json with real hotkey UUIDs (replace placeholders)
  • Run test-expressions.py --all — verify expressions + lip sync + latency
  • Set up VTube Studio mobile (iPhone/iPad) on Tailnet
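The amplitude-based lip sync (RMS → MouthOpen) reduces to mapping each 16-bit PCM chunk to a 0..1 parameter value, which the bridge then injects over the VTube Studio WebSocket API. A minimal sketch, where the `gain` factor is a tuning assumption rather than a VTS constant:

```python
import math
import struct

def mouth_open(chunk: bytes, gain: float = 4.0) -> float:
    """Map a 16-bit mono PCM chunk to a 0..1 MouthOpen value via RMS amplitude."""
    if not chunk:
        return 0.0
    samples = struct.unpack(f"<{len(chunk) // 2}h", chunk)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0
    return min(1.0, rms * gain)

# Silence maps to a closed mouth; loud chunks saturate at fully open.
print(mouth_open(b"\x00\x00" * 160))  # -> 0.0
```

Computing this per chunk inside wyoming_kokoro_server.py keeps the mouth movement in lockstep with the audio actually being played, with no phoneme analysis needed.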

Web Visuals (Dashboard)

  • Design PNG/GIF character visuals for web assistant (idle, thinking, speaking, etc.)
  • Integrate animated visuals into homeai-dashboard chat view
  • Sync visual state to voice pipeline events (listening, processing, responding)
  • Add expression transitions and idle animations

P8 · homeai-android

  • Build Android companion app for mobile assistant access
  • Integrate with OpenClaw bridge API (chat, TTS, STT)
  • Add character visual display
  • Push notification support via ntfy/FCM

Phase 6 — Image Generation

P9 · homeai-images (ComfyUI)

  • Clone ComfyUI to ~/ComfyUI/, install deps in venv
  • Verify MPS is detected at launch
  • Write and load launchd plist (com.homeai.comfyui.plist)
  • Download SDXL base model + Flux.1-schnell + ControlNet models
  • Test generation via ComfyUI web UI (port 8188)
  • Build and export workflow JSONs (quick, portrait, scene, upscale)
  • Write skills/comfyui SKILL.md + implementation
  • Collect character reference images for LoRA training
  • Add ComfyUI to Uptime Kuma monitors
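The exported workflow JSONs can be queued headlessly via ComfyUI's HTTP API: `POST /prompt` with a workflow graph saved from the web UI in API format. A minimal request builder, assuming ComfyUI on localhost at the port listed above:

```python
import json
import urllib.request

COMFY_URL = "http://localhost:8188"  # port from this TODO

def queue_prompt(workflow: dict) -> urllib.request.Request:
    # ComfyUI queues a generation job from a workflow graph exported with
    # "Save (API Format)" in the web UI.
    return urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=json.dumps({"prompt": workflow}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# e.g. urllib.request.urlopen(queue_prompt(json.load(open("workflows/quick.json"))))
```

This is the hook the comfyui skill would use to drive the quick/portrait/scene/upscale workflows from OpenClaw.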

Phase 7 — Extended Integrations & Polish

P10 · Integrations & Polish

  • Deploy Music Assistant (Docker on Pi 10.0.0.199:8095), Spotify + SMB + Chromecast
  • Write skills/music SKILL.md for OpenClaw
  • Deploy Snapcast server on Mac Mini
  • Configure Snapcast clients on ESP32 units for multi-room audio
  • Configure Authelia as 2FA layer in front of web UIs
  • Build advanced n8n workflows (calendar reminders, daily briefing v2)
  • Create iOS Shortcuts to trigger OpenClaw from iPhone widget
  • Configure ntfy/Pushover alerts in Uptime Kuma for all services
  • Automate mem0 + character config backup to Gitea (daily)
  • Train custom wake word using character's name
  • Document all service URLs, ports, and credentials in a private Gitea wiki
  • Tailscale ACL hardening — restrict which devices can reach which services
  • Stress test: reboot Mac Mini, verify all services recover in <2 minutes
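The reboot stress test can be scripted as a port poller that runs against the 2-minute budget; the service-to-port map below is assembled from ports mentioned in this TODO and is an assumption, not a canonical inventory.

```python
import socket
import time

# Ports taken from this TODO; adjust to the real service inventory.
SERVICES = {"ollama": 11434, "open-webui": 3030, "openclaw": 8080,
            "wyoming-stt": 10300, "wyoming-tts": 10301, "comfyui": 8188}

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def wait_for_recovery(host: str = "localhost", deadline: float = 120.0) -> dict[str, bool]:
    """Poll every service until it answers or the recovery budget is spent."""
    end = time.monotonic() + deadline
    status = {name: False for name in SERVICES}
    while time.monotonic() < end and not all(status.values()):
        for name, port in SERVICES.items():
            if not status[name]:
                status[name] = port_open(host, port)
        if not all(status.values()):
            time.sleep(5)
    return status
```

Any service still False after the deadline is one whose launchd plist or compose restart policy needs attention.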

Stretch Goals

Live2D / VTube Studio

  • Learn Live2D modelling toolchain (Live2D Cubism Editor)
  • Install VTube Studio (Mac App Store), enable WebSocket API on port 8001
  • Source/commission a Live2D model (nizima.com or booth.pm)
  • Create hotkeys for expression states
  • Write skills/vtube_studio SKILL.md + implementation
  • Write lipsync.py amplitude-based helper
  • Integrate lip sync into OpenClaw TTS dispatch
  • Set up VTube Studio mobile (iPhone/iPad) on Tailnet

Open Decisions

  • Confirm character name (determines wake word training)
  • mem0 backend: Chroma (simple) vs Qdrant Docker (better semantic search)?
  • Snapcast output: ESP32 built-in speakers or dedicated audio hardware per room?
  • Authelia user store: local file vs LDAP?