# HomeAI — Master TODO

Track progress across all sub-projects. See each sub-project's `PLAN.md` for detailed implementation notes.

Status: `[ ]` pending · `[~]` in progress · `[x]` done
## Phase 1 — Foundation

### P1 · homeai-infra
- Install Docker Desktop for Mac, enable launch at login
- Create shared `homeai` Docker network
- Create `~/server/docker/` directory structure
- Write compose files: Uptime Kuma, code-server, n8n (HA, Portainer, Gitea are pre-existing on 10.0.0.199)
- `docker compose up -d` — bring all services up
- Home Assistant onboarding — long-lived access token generated, stored in `.env`
- Install Tailscale, verify all services reachable on Tailnet
- Uptime Kuma: add monitors for all services, configure mobile alerts
- Verify all containers survive a cold reboot
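For reference, a compose file that joins the shared network could look like the following sketch (the service block and volume path are illustrative, not copied from the repo's actual files under `~/server/docker/`):

```yaml
# docker-compose.yml — Uptime Kuma joining the pre-created external network
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    restart: unless-stopped   # survives a cold reboot once Docker autostarts
    ports:
      - "3001:3001"
    volumes:
      - ./uptime-kuma-data:/app/data
    networks:
      - homeai

networks:
  homeai:
    external: true   # created once with: docker network create homeai
```

Marking the network `external` keeps every compose file attaching to the same `homeai` network instead of each creating its own.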
### P2 · homeai-llm
- Install Ollama natively via brew
- Write and load launchd plist (`com.homeai.ollama.plist`) — `/opt/homebrew/bin/ollama`
- Register local GGUF models via Modelfiles (no download): llama3.3:70b, qwen3:32b, codestral:22b, qwen2.5:7b
- Register additional models: EVA-LLaMA-3.33-70B, Midnight-Miqu-70B, QwQ-32B, Qwen3.5-35B, Qwen3-Coder-30B, Qwen3-VL-30B, GLM-4.6V-Flash, DeepSeek-R1-8B, gemma-3-27b
- Add qwen3.5:35b-a3b (MoE, Q8_0) — 26.7 tok/s, recommended for voice pipeline
- Write model keep-warm daemon + launchd service (pins qwen2.5:7b + $HOMEAI_MEDIUM_MODEL in VRAM, checks every 5 min)
- Deploy Open WebUI via Docker compose (port 3030)
- Verify Open WebUI connected to Ollama, all models available
- Run pipeline benchmark (homeai-voice/scripts/benchmark_pipeline.py) — STT/LLM/TTS latency profiled
- Add Ollama + Open WebUI to Uptime Kuma monitors
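Registering an already-downloaded GGUF with a Modelfile avoids a re-pull. A minimal sketch — the file path and quant are placeholders, not the actual model locations:

```
# Modelfile — register a local GGUF with Ollama (no download)
FROM /path/to/llama-3.3-70b-instruct.Q4_K_M.gguf

# match the context window configured on the client side
PARAMETER num_ctx 32768

# Llama 3.3 needs an explicit TEMPLATE block for tool calling;
# a reference template can be inspected with: ollama show llama3.3 --template
```

Then build the tag with `ollama create llama3.3:70b -f Modelfile`.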
## Phase 2 — Voice Pipeline

### P3 · homeai-voice
- Install `wyoming-faster-whisper` — model: faster-whisper-large-v3 (auto-downloaded)
- Upgrade STT to wyoming-mlx-whisper (whisper-large-v3-turbo, MLX Metal GPU) — 20× faster (8 s → 400 ms)
- Install Kokoro ONNX TTS — models at `~/models/kokoro/`
- Write Wyoming-Kokoro adapter server (`homeai-voice/tts/wyoming_kokoro_server.py`)
- Write + load launchd plists for Wyoming STT (10300) and TTS (10301)
- Install openWakeWord + pyaudio — model: hey_jarvis
- Write + load openWakeWord launchd plist (`com.homeai.wakeword`) — DISABLED, replaced by Wyoming satellite
- Write `wyoming/test-pipeline.sh` — smoke test (3/3 passing)
- Install Wyoming satellite — handles wake word via HA voice pipeline
- Install Wyoming satellite for Mac Mini (port 10700)
- Write OpenClaw conversation custom component for Home Assistant
- Connect Home Assistant Wyoming integration (STT + TTS + Satellite) — ready to configure in HA UI
- Create HA Voice Assistant pipeline with OpenClaw conversation agent — component ready, needs HA UI setup
- Test HA Assist via browser: type query → hear spoken response
- Test full voice loop: wake word → STT → OpenClaw → TTS → audio playback
- Install Chatterbox TTS (MPS build), test with sample `.wav`
- Install Qwen3-TTS via MLX (fallback)
- Train custom wake word using character name
- Add Wyoming STT/TTS to Uptime Kuma monitors
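In the spirit of `wyoming/test-pipeline.sh`, a port-level smoke check for the two Wyoming services might look like this (the port numbers come from the list above; the service names are labels, not real identifiers):

```python
"""Minimal reachability check for the Wyoming STT/TTS services."""
import socket

SERVICES = {"wyoming-stt": 10300, "wyoming-tts": 10301}


def check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if check("localhost", port) else "DOWN"
        print(f"{name} (:{port}): {state}")
```

This only proves the listeners are up; the real smoke test would also send a Wyoming `describe` event and check the response.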
## Phase 3 — Agent & Character

### P4 · homeai-agent
- Install OpenClaw (npm global, v2026.3.2)
- Configure Ollama provider (native API, `http://localhost:11434`)
- Write + load launchd plist (`com.homeai.openclaw`) — gateway on port 8080
- Fix context window: set `contextWindow=32768` for llama3.3:70b in `openclaw.json`
- Fix Llama 3.3 Modelfile: add tool-calling TEMPLATE block
- Verify `openclaw agent --message "..." --agent main` → completed
- Write `skills/home-assistant` SKILL.md — HA REST API control via ha-ctl CLI
- Write `skills/voice-assistant` SKILL.md — voice response style guide
- Wire HASS_TOKEN — create `~/.homeai/hass_token` or set env in launchd plist
- Fix HA tool calling: set `commands.native=true`, symlink ha-ctl to PATH, update TOOLS.md
- Test home-assistant skill: "turn on/off the reading lamp" — verified exec→ha-ctl→HA action
- Set up mem0 with Chroma backend, test semantic recall
- Write memory backup launchd job
- Build morning briefing n8n workflow
- Build notification router n8n workflow
- Verify full voice → agent → HA action flow
- Add OpenClaw to Uptime Kuma monitors (Manual user action required)
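Under the hood, a CLI like ha-ctl is a thin wrapper over Home Assistant's REST API (`POST /api/services/<domain>/<service>` with a bearer token). A sketch of that call — `build_service_call`, the env-var names, and the entity id are all illustrative, not ha-ctl's actual interface:

```python
"""Sketch of an HA service call, the kind of request ha-ctl issues."""
import json
import os
import urllib.request

# defaults are assumptions; the real values come from .env / launchd env
HA_URL = os.environ.get("HASS_URL", "http://10.0.0.199:8123")


def build_service_call(domain: str, service: str, entity_id: str) -> urllib.request.Request:
    """Build an authenticated POST to /api/services/<domain>/<service>."""
    token = os.environ.get("HASS_TOKEN", "")
    return urllib.request.Request(
        f"{HA_URL}/api/services/{domain}/{service}",
        data=json.dumps({"entity_id": entity_id}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_service_call("light", "turn_on", "light.reading_lamp")
    print(req.get_method(), req.full_url)
    # to actually send it: urllib.request.urlopen(req, timeout=5)
```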
### P5 · homeai-dashboard (character system + dashboard)
- Define and write `schema/character.schema.json` (v1)
- Write `characters/aria.json` — default character
- Set up Vite project in `src/`, install deps
- Integrate existing `character-manager.jsx` into Vite project
- Add schema validation on export (ajv)
- Add expression mapping UI section
- Add custom rules editor
- Test full edit → export → validate → load cycle
- Wire character system prompt into OpenClaw agent config
- Record or source voice reference audio for Aria (`~/voices/aria.wav`)
- Pre-process audio with ffmpeg, test with Chatterbox
- Update `aria.json` with voice clone path if quality is good
- Build unified HomeAI dashboard — dark-themed frontend showing live service status + links to individual UIs
- Add character profile management to dashboard — store/switch character configs with attached profile images
- Add TTS voice preview in character editor — Kokoro preview via OpenClaw bridge with loading state, custom text, stop control
- Merge homeai-character + homeai-desktop into unified homeai-dashboard (services, chat, characters, editor)
- Upgrade character schema to v2 — background, dialogue_style, appearance, skills, gaze_presets (auto-migrate v1)
- Add LLM-assisted character creation via Character MCP server (Fandom/Wikipedia lookup)
- Add character memory system — personal (per-character) + general (shared) memories with dashboard UI
- Add conversation history with per-conversation persistence
- Wire character_id through full pipeline (dashboard → bridge → LLM system prompt)
- Add TTS text cleaning — strip tags, asterisks, emojis, markdown before synthesis
- Add per-character TTS voice routing — bridge writes state file, Wyoming server reads it
- Add ElevenLabs TTS support in Wyoming server — cloud voice synthesis via state file routing
- Dashboard auto-selects character's TTS engine/voice (Kokoro or ElevenLabs)
- Deploy dashboard as Docker container or static site on Mac Mini
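The "TTS text cleaning" item above (strip tags, asterisks, emojis, markdown before synthesis) boils down to a pure text transform. A sketch of the idea — the exact rules live in the bridge, and these regexes are illustrative:

```python
import re


def clean_for_tts(text: str) -> str:
    """Strip markup that sounds wrong when spoken aloud."""
    text = re.sub(r"<[^>]+>", "", text)                    # HTML/XML-ish tags
    text = re.sub(r"\*[^*]*\*", "", text)                  # *stage directions*
    text = re.sub(r"`{1,3}([^`]*)`{1,3}", r"\1", text)     # keep code text, drop backticks
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)   # [link](url) -> link
    text = re.sub(r"[#>~]", "", text)                      # leftover markdown
    text = re.sub("[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)  # emoji ranges
    return re.sub(r"\s+", " ", text).strip()
```

Order matters: tags are removed before the asterisk rule so `<b>*hi*</b>` does not leave stray `*` behind; the final pass collapses the whitespace the deletions leave.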
## Phase 4 — Hardware Satellites

### P6 · homeai-esp32
- Install ESPHome in `~/homeai-esphome-env` (Python 3.12 venv)
- Write `esphome/secrets.yaml` (gitignored)
- Write `homeai-living-room.yaml` (based on official S3-BOX-3 reference config)
- Generate placeholder face illustrations (7 PNGs, 320×240)
- Write `setup.sh` with flash/ota/logs/validate commands
- Write `deploy.sh` with OTA deploy, image management, multi-unit support
- Flash first unit via USB (living room)
- Verify unit appears in HA device list (requires HA 2026.x for ESPHome 2025.12+ compat)
- Assign Wyoming voice pipeline to unit in HA
- Test full wake → STT → LLM → TTS → audio playback cycle
- Test display states: idle → listening → thinking → replying → error
- Verify OTA firmware update works wirelessly (`deploy.sh --device OTA`)
- Flash remaining units (bedroom, kitchen)
- Document MAC address → room name mapping
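A trimmed sketch of what `homeai-living-room.yaml` might contain, with all credentials routed through the gitignored `secrets.yaml` — this is an assumption, not the repo's file, and the board/display/voice sections should be copied from the official S3-BOX-3 reference config:

```yaml
# homeai-living-room.yaml (sketch)
substitutions:
  name: homeai-living-room
  friendly_name: HomeAI Living Room

esphome:
  name: ${name}
  friendly_name: ${friendly_name}

# esp32 / display / microphone / voice_assistant sections omitted —
# take them verbatim from the official S3-BOX-3 reference config

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

api:
  encryption:
    key: !secret api_encryption_key

ota:
  - platform: esphome
    password: !secret ota_password
```

Per-unit files (bedroom, kitchen) then differ only in `substitutions`, which also gives a natural place to record the MAC → room mapping as comments.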
### P6b · homeai-rpi (Kitchen Satellite)
- Set up Wyoming Satellite on Raspberry Pi 5 (SELBINA) with ReSpeaker 2-Mics pHAT
- Write setup.sh — full Pi provisioning (venv, drivers, systemd, scripts)
- Write deploy.sh — remote deploy/manage from Mac Mini (push-wrapper, test-logs, etc.)
- Write satellite_wrapper.py — monkey-patches fixing TTS echo, writer race, streaming timeout
- Test multi-command voice loop without freezing
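The systemd unit that setup.sh installs could be shaped like this sketch — user, paths, and the assumption that `satellite_wrapper.py` accepts wyoming-satellite's standard flags are all illustrative:

```ini
# /etc/systemd/system/wyoming-satellite.service (sketch)
[Unit]
Description=Wyoming satellite (kitchen, ReSpeaker 2-Mics pHAT)
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=pi
ExecStart=/home/pi/wyoming-satellite/.venv/bin/python \
    /home/pi/wyoming-satellite/satellite_wrapper.py \
    --name kitchen \
    --uri tcp://0.0.0.0:10700 \
    --mic-command "arecord -r 16000 -c 1 -f S16_LE -t raw" \
    --snd-command "aplay -r 22050 -c 1 -f S16_LE -t raw"
Restart=on-failure
RestartSec=3

[Install]
WantedBy=multi-user.target
```

`Restart=on-failure` gives a safety net for the freezes the wrapper's monkey-patches are meant to fix at the source.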
## Phase 5 — Visual Layer

### P7 · homeai-visual

#### VTube Studio Expression Bridge
- Write `vtube-bridge.py` — persistent WebSocket ↔ HTTP bridge daemon (port 8002)
- Write `vtube-ctl` CLI wrapper + OpenClaw skill (`~/.openclaw/skills/vtube-studio/`)
- Wire expression triggers into `openclaw-http-bridge.py` (thinking → idle, speaking → idle)
- Add amplitude-based lip sync to `wyoming_kokoro_server.py` (RMS → MouthOpen parameter)
- Write `test-expressions.py` — auth flow, expression cycle, lip sync sweep, latency test
- Write launchd plist + setup.sh for venv creation and service registration
- Install VTube Studio from Mac App Store, enable WebSocket API (port 8001)
- Source/purchase Live2D model, load in VTube Studio
- Create 8 expression hotkeys, record UUIDs
- Run `setup.sh` to create venv, install websockets, load launchd service
- Run `vtube-ctl auth` — click Allow in VTube Studio
- Update `aria.json` with real hotkey UUIDs (replace placeholders)
- Run `test-expressions.py --all` — verify expressions + lip sync + latency
- Set up VTube Studio mobile (iPhone/iPad) on Tailnet
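The amplitude-based lip sync (RMS → MouthOpen) reduces to mapping each PCM chunk to a 0..1 parameter. A sketch under stated assumptions — 16-bit little-endian mono audio, and floor/ceiling thresholds that would be tuned against actual Kokoro output:

```python
import math
import struct


def rms_to_mouth_open(frame: bytes, floor: float = 500.0, ceiling: float = 8000.0) -> float:
    """Map one chunk of 16-bit LE mono PCM to a 0..1 MouthOpen value."""
    if len(frame) < 2:
        return 0.0
    n = len(frame) // 2
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    # root-mean-square amplitude of the chunk
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # clamp into [floor, ceiling], then normalise to 0..1
    return max(0.0, min(1.0, (rms - floor) / (ceiling - floor)))
```

The noise floor keeps the mouth shut during silence; the ceiling makes normal speech reach a fully open mouth. The bridge would send the result to VTube Studio's `InjectParameterDataRequest` per audio chunk.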
#### Web Visuals (Dashboard)
- Design PNG/GIF character visuals for web assistant (idle, thinking, speaking, etc.)
- Integrate animated visuals into homeai-dashboard chat view
- Sync visual state to voice pipeline events (listening, processing, responding)
- Add expression transitions and idle animations
### P8 · homeai-android
- Build Android companion app for mobile assistant access
- Integrate with OpenClaw bridge API (chat, TTS, STT)
- Add character visual display
- Push notification support via ntfy/FCM
## Phase 6 — Image Generation

### P9 · homeai-images (ComfyUI)
- Clone ComfyUI to `~/ComfyUI/`, install deps in venv
- Verify MPS is detected at launch
- Write and load launchd plist (`com.homeai.comfyui.plist`)
- Download SDXL base model + Flux.1-schnell + ControlNet models
- Test generation via ComfyUI web UI (port 8188)
- Build and export workflow JSONs (quick, portrait, scene, upscale)
- Write `skills/comfyui` SKILL.md + implementation
- Collect character reference images for LoRA training
- Add ComfyUI to Uptime Kuma monitors
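The exported workflow JSONs plug straight into ComfyUI's HTTP API: a workflow saved via "Save (API Format)" is already the dict that `POST /prompt` expects. A sketch of how the skill could queue one (the wrapper function and `client_id` value are illustrative):

```python
import json
import urllib.request

COMFY_URL = "http://localhost:8188"  # port from the plan above


def build_prompt_request(workflow: dict, client_id: str = "homeai") -> urllib.request.Request:
    """Wrap an API-format workflow dict for ComfyUI's POST /prompt endpoint."""
    body = {"prompt": workflow, "client_id": client_id}
    return urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# queueing an exported workflow, e.g. the "quick" one:
#   with open("workflows/quick.json") as f:
#       urllib.request.urlopen(build_prompt_request(json.load(f)), timeout=10)
```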
## Phase 7 — Extended Integrations & Polish

### P10 · Integrations & Polish
- Deploy Music Assistant (Docker on Pi 10.0.0.199:8095), Spotify + SMB + Chromecast
- Write `skills/music` SKILL.md for OpenClaw
- Deploy Snapcast server on Mac Mini
- Configure Snapcast clients on ESP32 units for multi-room audio
- Configure Authelia as 2FA layer in front of web UIs
- Build advanced n8n workflows (calendar reminders, daily briefing v2)
- Create iOS Shortcuts to trigger OpenClaw from iPhone widget
- Configure ntfy/Pushover alerts in Uptime Kuma for all services
- Automate mem0 + character config backup to Gitea (daily)
- Train custom wake word using character's name
- Document all service URLs, ports, and credentials in a private Gitea wiki
- Tailscale ACL hardening — restrict which devices can reach which services
- Stress test: reboot Mac Mini, verify all services recover in <2 minutes
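The Tailscale ACL hardening could start from a sketch like this (HuJSON in the tailnet policy file; the tags and the dashboard port are assumptions, only HA's 8123 is standard):

```
{
  "tagOwners": {
    "tag:phone":  ["autogroup:admin"],
    "tag:admin":  ["autogroup:admin"],
    "tag:server": ["autogroup:admin"],
  },
  "acls": [
    // phones reach only HA and the dashboard
    {"action": "accept", "src": ["tag:phone"], "dst": ["tag:server:8123", "tag:server:3000"]},
    // the admin machine reaches everything
    {"action": "accept", "src": ["tag:admin"], "dst": ["*:*"]},
  ],
}
```

Tailscale ACLs are default-deny, so anything not matched above (e.g. phones reaching Portainer or code-server) is blocked automatically.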
## Stretch Goals

### Live2D / VTube Studio
- Learn Live2D modelling toolchain (Live2D Cubism Editor)
- Install VTube Studio (Mac App Store), enable WebSocket API on port 8001
- Source/commission a Live2D model (nizima.com or booth.pm)
- Create hotkeys for expression states
- Write `skills/vtube_studio` SKILL.md + implementation
- Write `lipsync.py` amplitude-based helper
- Integrate lip sync into OpenClaw TTS dispatch
- Set up VTube Studio mobile (iPhone/iPad) on Tailnet
## Open Decisions
- Confirm character name (determines wake word training)
- mem0 backend: Chroma (simple) vs Qdrant Docker (better semantic search)?
- Snapcast output: ESP32 built-in speakers or dedicated audio hardware per room?
- Authelia user store: local file vs LDAP?