# HomeAI — Master TODO

Track progress across all sub-projects. See each sub-project's `PLAN.md` for detailed implementation notes.

Status: `[ ]` pending · `[~]` in progress · `[x]` done
## Phase 1 — Foundation

### P1 · homeai-infra
- Install Docker Desktop for Mac, enable launch at login
- Create shared `homeai` Docker network
- Create `~/server/docker/` directory structure
- Write compose files: Uptime Kuma, code-server, n8n (HA, Portainer, Gitea are pre-existing on 10.0.0.199)
- `docker compose up -d` — bring all services up
- Home Assistant onboarding — long-lived access token generated, stored in `.env`
- Install Tailscale, verify all services reachable on Tailnet
- Uptime Kuma: add monitors for all services, configure mobile alerts
- Verify all containers survive a cold reboot
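For reference, a compose file that joins the shared network could look like the following sketch (the service block and volume path are illustrative, not copied from the repo's actual files under `~/server/docker/`):

```yaml
# docker-compose.yml — Uptime Kuma joining the pre-created external network
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    restart: unless-stopped   # survives a cold reboot once Docker autostarts
    ports:
      - "3001:3001"
    volumes:
      - ./uptime-kuma-data:/app/data
    networks:
      - homeai

networks:
  homeai:
    external: true   # created once with: docker network create homeai
```

Marking the network `external` keeps every compose file attaching to the same `homeai` network instead of each creating its own.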
### P2 · homeai-llm
- Install Ollama natively via brew
- Write and load launchd plist (`com.homeai.ollama.plist`) — `/opt/homebrew/bin/ollama`
- Register local GGUF models via Modelfiles (no download): llama3.3:70b, qwen3:32b, codestral:22b, qwen2.5:7b
- Register additional models: EVA-LLaMA-3.33-70B, Midnight-Miqu-70B, QwQ-32B, Qwen3.5-35B, Qwen3-Coder-30B, Qwen3-VL-30B, GLM-4.6V-Flash, DeepSeek-R1-8B, gemma-3-27b
- Add qwen3.5:35b-a3b (MoE, Q8_0) — 26.7 tok/s, recommended for voice pipeline
- Write model keep-warm daemon + launchd service (pins qwen2.5:7b + $HOMEAI_MEDIUM_MODEL in VRAM, checks every 5 min)
- Deploy Open WebUI via Docker compose (port 3030)
- Verify Open WebUI connected to Ollama, all models available
- Run pipeline benchmark (homeai-voice/scripts/benchmark_pipeline.py) — STT/LLM/TTS latency profiled
- Add Ollama + Open WebUI to Uptime Kuma monitors
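Registering an already-downloaded GGUF with a Modelfile avoids a re-pull. A minimal sketch — the file path and quant are placeholders, not the actual model locations:

```
# Modelfile — register a local GGUF with Ollama (no download)
FROM /path/to/llama-3.3-70b-instruct.Q4_K_M.gguf

# match the context window configured on the client side
PARAMETER num_ctx 32768

# Llama 3.3 needs an explicit TEMPLATE block for tool calling;
# a reference template can be inspected with: ollama show llama3.3 --template
```

Then build the tag with `ollama create llama3.3:70b -f Modelfile`.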
## Phase 2 — Voice Pipeline

### P3 · homeai-voice
- Install `wyoming-faster-whisper` — model: faster-whisper-large-v3 (auto-downloaded)
- Upgrade STT to wyoming-mlx-whisper (whisper-large-v3-turbo, MLX Metal GPU) — 20× faster (8 s → 400 ms)
- Install Kokoro ONNX TTS — models at `~/models/kokoro/`
- Write Wyoming-Kokoro adapter server (`homeai-voice/tts/wyoming_kokoro_server.py`)
- Write + load launchd plists for Wyoming STT (10300) and TTS (10301)
- Install openWakeWord + pyaudio — model: hey_jarvis
- Write + load openWakeWord launchd plist (`com.homeai.wakeword`) — DISABLED, replaced by Wyoming satellite
- Write `wyoming/test-pipeline.sh` — smoke test (3/3 passing)
- Install Wyoming satellite — handles wake word via HA voice pipeline
- Install Wyoming satellite for Mac Mini (port 10700)
- Write OpenClaw conversation custom component for Home Assistant
- Connect Home Assistant Wyoming integration (STT + TTS + Satellite) — ready to configure in HA UI
- Create HA Voice Assistant pipeline with OpenClaw conversation agent — component ready, needs HA UI setup
- Test HA Assist via browser: type query → hear spoken response
- Test full voice loop: wake word → STT → OpenClaw → TTS → audio playback
- Install Chatterbox TTS (MPS build), test with sample `.wav`
- Install Qwen3-TTS via MLX (fallback)
- Train custom wake word using character name
- Add Wyoming STT/TTS to Uptime Kuma monitors
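In the spirit of `wyoming/test-pipeline.sh`, a port-level smoke check for the two Wyoming services might look like this (the port numbers come from the list above; the service names are labels, not real identifiers):

```python
"""Minimal reachability check for the Wyoming STT/TTS services."""
import socket

SERVICES = {"wyoming-stt": 10300, "wyoming-tts": 10301}


def check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if check("localhost", port) else "DOWN"
        print(f"{name} (:{port}): {state}")
```

This only proves the listeners are up; the real smoke test would also send a Wyoming `describe` event and check the response.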
## Phase 3 — Agent & Character

### P4 · homeai-agent
- Install OpenClaw (npm global, v2026.3.2)
- Configure Ollama provider (native API, `http://localhost:11434`)
- Write + load launchd plist (`com.homeai.openclaw`) — gateway on port 8080
- Fix context window: set `contextWindow=32768` for llama3.3:70b in `openclaw.json`
- Fix Llama 3.3 Modelfile: add tool-calling TEMPLATE block
- Verify `openclaw agent --message "..." --agent main` → completed
- Write `skills/home-assistant` SKILL.md — HA REST API control via ha-ctl CLI
- Write `skills/voice-assistant` SKILL.md — voice response style guide
- Wire HASS_TOKEN — create `~/.homeai/hass_token` or set env in launchd plist
- Fix HA tool calling: set `commands.native=true`, symlink ha-ctl to PATH, update TOOLS.md
- Test home-assistant skill: "turn on/off the reading lamp" — verified exec→ha-ctl→HA action
- Set up mem0 with Chroma backend, test semantic recall
- Write memory backup launchd job
- Build morning briefing n8n workflow
- Build notification router n8n workflow
- Verify full voice → agent → HA action flow
- Add OpenClaw to Uptime Kuma monitors (Manual user action required)
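Under the hood, a CLI like ha-ctl is a thin wrapper over Home Assistant's REST API (`POST /api/services/<domain>/<service>` with a bearer token). A sketch of that call — `build_service_call`, the env-var names, and the entity id are all illustrative, not ha-ctl's actual interface:

```python
"""Sketch of an HA service call, the kind of request ha-ctl issues."""
import json
import os
import urllib.request

# defaults are assumptions; the real values come from .env / launchd env
HA_URL = os.environ.get("HASS_URL", "http://10.0.0.199:8123")


def build_service_call(domain: str, service: str, entity_id: str) -> urllib.request.Request:
    """Build an authenticated POST to /api/services/<domain>/<service>."""
    token = os.environ.get("HASS_TOKEN", "")
    return urllib.request.Request(
        f"{HA_URL}/api/services/{domain}/{service}",
        data=json.dumps({"entity_id": entity_id}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_service_call("light", "turn_on", "light.reading_lamp")
    print(req.get_method(), req.full_url)
    # to actually send it: urllib.request.urlopen(req, timeout=5)
```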
### P5 · homeai-dashboard (character system + dashboard)
- Define and write `schema/character.schema.json` (v1)
- Write `characters/aria.json` — default character
- Set up Vite project in `src/`, install deps
- Integrate existing `character-manager.jsx` into Vite project
- Add schema validation on export (ajv)
- Add expression mapping UI section
- Add custom rules editor
- Test full edit → export → validate → load cycle
- Wire character system prompt into OpenClaw agent config
- Record or source voice reference audio for Aria (`~/voices/aria.wav`)
- Pre-process audio with ffmpeg, test with Chatterbox
- Update `aria.json` with voice clone path if quality is good
- Build unified HomeAI dashboard — dark-themed frontend showing live service status + links to individual UIs
- Add character profile management to dashboard — store/switch character configs with attached profile images
- Add TTS voice preview in character editor — Kokoro preview via OpenClaw bridge with loading state, custom text, stop control
- Merge homeai-character + homeai-desktop into unified homeai-dashboard (services, chat, characters, editor)
- Upgrade character schema to v2 — background, dialogue_style, appearance, skills, gaze_presets (auto-migrate v1)
- Add LLM-assisted character creation via Character MCP server (Fandom/Wikipedia lookup)
- Add character memory system — personal (per-character) + general (shared) memories with dashboard UI
- Add conversation history with per-conversation persistence
- Wire character_id through full pipeline (dashboard → bridge → LLM system prompt)
- Add TTS text cleaning — strip tags, asterisks, emojis, markdown before synthesis
- Add per-character TTS voice routing — bridge writes state file, Wyoming server reads it
- Add ElevenLabs TTS support in Wyoming server — cloud voice synthesis via state file routing
- Dashboard auto-selects character's TTS engine/voice (Kokoro or ElevenLabs)
- Deploy dashboard as Docker container or static site on Mac Mini
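The "TTS text cleaning" item above (strip tags, asterisks, emojis, markdown before synthesis) boils down to a pure text transform. A sketch of the idea — the exact rules live in the bridge, and these regexes are illustrative:

```python
import re


def clean_for_tts(text: str) -> str:
    """Strip markup that sounds wrong when spoken aloud."""
    text = re.sub(r"<[^>]+>", "", text)                    # HTML/XML-ish tags
    text = re.sub(r"\*[^*]*\*", "", text)                  # *stage directions*
    text = re.sub(r"`{1,3}([^`]*)`{1,3}", r"\1", text)     # keep code text, drop backticks
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)   # [link](url) -> link
    text = re.sub(r"[#>~]", "", text)                      # leftover markdown
    text = re.sub("[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)  # emoji ranges
    return re.sub(r"\s+", " ", text).strip()
```

Order matters: tags are removed before the asterisk rule so `<b>*hi*</b>` does not leave stray `*` behind; the final pass collapses the whitespace the deletions leave.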
## Phase 4 — Hardware Satellites

### P6 · homeai-esp32
- Install ESPHome in `~/homeai-esphome-env` (Python 3.12 venv)
- Write `esphome/secrets.yaml` (gitignored)
- Write `homeai-living-room.yaml` (based on official S3-BOX-3 reference config)
- Generate placeholder face illustrations (7 PNGs, 320×240)
- Write `setup.sh` with flash/ota/logs/validate commands
- Write `deploy.sh` with OTA deploy, image management, multi-unit support
- Flash first unit via USB (living room)
- Verify unit appears in HA device list (requires HA 2026.x for ESPHome 2025.12+ compat)
- Assign Wyoming voice pipeline to unit in HA
- Test full wake → STT → LLM → TTS → audio playback cycle
- Test display states: idle → listening → thinking → replying → error
- Verify OTA firmware update works wirelessly (`deploy.sh --device OTA`)
- Flash remaining units (bedroom, kitchen)
- Document MAC address → room name mapping
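A trimmed sketch of what `homeai-living-room.yaml` might contain, with all credentials routed through the gitignored `secrets.yaml` — this is an assumption, not the repo's file, and the board/display/voice sections should be copied from the official S3-BOX-3 reference config:

```yaml
# homeai-living-room.yaml (sketch)
substitutions:
  name: homeai-living-room
  friendly_name: HomeAI Living Room

esphome:
  name: ${name}
  friendly_name: ${friendly_name}

# esp32 / display / microphone / voice_assistant sections omitted —
# take them verbatim from the official S3-BOX-3 reference config

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

api:
  encryption:
    key: !secret api_encryption_key

ota:
  - platform: esphome
    password: !secret ota_password
```

Per-unit files (bedroom, kitchen) then differ only in `substitutions`, which also gives a natural place to record the MAC → room mapping as comments.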
### P6b · homeai-rpi (Kitchen Satellite)
- Set up Wyoming Satellite on Raspberry Pi 5 (SELBINA) with ReSpeaker 2-Mics pHAT
- Write setup.sh — full Pi provisioning (venv, drivers, systemd, scripts)
- Write deploy.sh — remote deploy/manage from Mac Mini (push-wrapper, test-logs, etc.)
- Write satellite_wrapper.py — monkey-patches fixing TTS echo, writer race, streaming timeout
- Test multi-command voice loop without freezing
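The systemd unit that setup.sh installs could be shaped like this sketch — user, paths, and the assumption that `satellite_wrapper.py` accepts wyoming-satellite's standard flags are all illustrative:

```ini
# /etc/systemd/system/wyoming-satellite.service (sketch)
[Unit]
Description=Wyoming satellite (kitchen, ReSpeaker 2-Mics pHAT)
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=pi
ExecStart=/home/pi/wyoming-satellite/.venv/bin/python \
    /home/pi/wyoming-satellite/satellite_wrapper.py \
    --name kitchen \
    --uri tcp://0.0.0.0:10700 \
    --mic-command "arecord -r 16000 -c 1 -f S16_LE -t raw" \
    --snd-command "aplay -r 22050 -c 1 -f S16_LE -t raw"
Restart=on-failure
RestartSec=3

[Install]
WantedBy=multi-user.target
```

`Restart=on-failure` gives a safety net for the freezes the wrapper's monkey-patches are meant to fix at the source.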
## Phase 5 — Visual Layer

### P7 · homeai-visual

#### VTube Studio Expression Bridge
- Write `vtube-bridge.py` — persistent WebSocket ↔ HTTP bridge daemon (port 8002)
- Write `vtube-ctl` CLI wrapper + OpenClaw skill (`~/.openclaw/skills/vtube-studio/`)
- Wire expression triggers into `openclaw-http-bridge.py` (thinking → idle, speaking → idle)
- Add amplitude-based lip sync to `wyoming_kokoro_server.py` (RMS → MouthOpen parameter)
- Write `test-expressions.py` — auth flow, expression cycle, lip sync sweep, latency test
- Write launchd plist + setup.sh for venv creation and service registration
- Install VTube Studio from Mac App Store, enable WebSocket API (port 8001)
- Source/purchase Live2D model, load in VTube Studio
- Create 8 expression hotkeys, record UUIDs
- Run `setup.sh` to create venv, install websockets, load launchd service
- Run `vtube-ctl auth` — click Allow in VTube Studio
- Update `aria.json` with real hotkey UUIDs (replace placeholders)
- Run `test-expressions.py --all` — verify expressions + lip sync + latency
- Set up VTube Studio mobile (iPhone/iPad) on Tailnet
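The amplitude-based lip sync (RMS → MouthOpen) reduces to mapping each PCM chunk to a 0..1 parameter. A sketch under stated assumptions — 16-bit little-endian mono audio, and floor/ceiling thresholds that would be tuned against actual Kokoro output:

```python
import math
import struct


def rms_to_mouth_open(frame: bytes, floor: float = 500.0, ceiling: float = 8000.0) -> float:
    """Map one chunk of 16-bit LE mono PCM to a 0..1 MouthOpen value."""
    if len(frame) < 2:
        return 0.0
    n = len(frame) // 2
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    # root-mean-square amplitude of the chunk
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # clamp into [floor, ceiling], then normalise to 0..1
    return max(0.0, min(1.0, (rms - floor) / (ceiling - floor)))
```

The noise floor keeps the mouth shut during silence; the ceiling makes normal speech reach a fully open mouth. The bridge would send the result to VTube Studio's `InjectParameterDataRequest` per audio chunk.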
#### Web Visuals (Dashboard)
- Design PNG/GIF character visuals for web assistant (idle, thinking, speaking, etc.)
- Integrate animated visuals into homeai-dashboard chat view
- Sync visual state to voice pipeline events (listening, processing, responding)
- Add expression transitions and idle animations
### P8 · homeai-android
- Build Android companion app for mobile assistant access
- Integrate with OpenClaw bridge API (chat, TTS, STT)
- Add character visual display
- Push notification support via ntfy/FCM
## Phase 6 — Image Generation

### P9 · homeai-images (ComfyUI)
- Clone ComfyUI to `~/ComfyUI/`, install deps in venv
- Verify MPS is detected at launch
- Write and load launchd plist (`com.homeai.comfyui.plist`)
- Download SDXL base model + Flux.1-schnell + ControlNet models
- Test generation via ComfyUI web UI (port 8188)
- Build and export workflow JSONs (quick, portrait, scene, upscale)
- Write `skills/comfyui` SKILL.md + implementation
- Collect character reference images for LoRA training
- Add ComfyUI to Uptime Kuma monitors
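The exported workflow JSONs plug straight into ComfyUI's HTTP API: a workflow saved via "Save (API Format)" is already the dict that `POST /prompt` expects. A sketch of how the skill could queue one (the wrapper function and `client_id` value are illustrative):

```python
import json
import urllib.request

COMFY_URL = "http://localhost:8188"  # port from the plan above


def build_prompt_request(workflow: dict, client_id: str = "homeai") -> urllib.request.Request:
    """Wrap an API-format workflow dict for ComfyUI's POST /prompt endpoint."""
    body = {"prompt": workflow, "client_id": client_id}
    return urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# queueing an exported workflow, e.g. the "quick" one:
#   with open("workflows/quick.json") as f:
#       urllib.request.urlopen(build_prompt_request(json.load(f)), timeout=10)
```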
## Phase 7 — Extended Integrations & Polish

### P10 · Integrations & Polish
- Deploy Music Assistant (Docker on Pi 10.0.0.199:8095), Spotify + SMB + Chromecast
- Write `skills/music` SKILL.md for OpenClaw
- Deploy Snapcast server on Mac Mini
- Configure Snapcast clients on ESP32 units for multi-room audio
- Configure Authelia as 2FA layer in front of web UIs
- Build advanced n8n workflows (calendar reminders, daily briefing v2)
- Create iOS Shortcuts to trigger OpenClaw from iPhone widget
- Configure ntfy/Pushover alerts in Uptime Kuma for all services
- Automate mem0 + character config backup to Gitea (daily)
- Train custom wake word using character's name
- Document all service URLs, ports, and credentials in a private Gitea wiki
- Tailscale ACL hardening — restrict which devices can reach which services
- Stress test: reboot Mac Mini, verify all services recover in <2 minutes
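The Tailscale ACL hardening could start from a sketch like this (HuJSON in the tailnet policy file; the tags and the dashboard port are assumptions, only HA's 8123 is standard):

```
{
  "tagOwners": {
    "tag:phone":  ["autogroup:admin"],
    "tag:admin":  ["autogroup:admin"],
    "tag:server": ["autogroup:admin"],
  },
  "acls": [
    // phones reach only HA and the dashboard
    {"action": "accept", "src": ["tag:phone"], "dst": ["tag:server:8123", "tag:server:3000"]},
    // the admin machine reaches everything
    {"action": "accept", "src": ["tag:admin"], "dst": ["*:*"]},
  ],
}
```

Tailscale ACLs are default-deny, so anything not matched above (e.g. phones reaching Portainer or code-server) is blocked automatically.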
## Stretch Goals

### Live2D / VTube Studio
- Learn Live2D modelling toolchain (Live2D Cubism Editor)
- Install VTube Studio (Mac App Store), enable WebSocket API on port 8001
- Source/commission a Live2D model (nizima.com or booth.pm)
- Create hotkeys for expression states
- Write `skills/vtube_studio` SKILL.md + implementation
- Write `lipsync.py` amplitude-based helper
- Integrate lip sync into OpenClaw TTS dispatch
- Set up VTube Studio mobile (iPhone/iPad) on Tailnet
## Open Decisions
- Confirm character name (determines wake word training)
- mem0 backend: Chroma (simple) vs Qdrant Docker (better semantic search)?
- Snapcast output: ESP32 built-in speakers or dedicated audio hardware per room?
- Authelia user store: local file vs LDAP?