CLAUDE.md — Home AI Assistant Project

Project Overview

A self-hosted, always-on personal AI assistant running on a Mac Mini M4 Pro (64GB RAM, 1TB SSD). The goal is a modular, expandable system that replaces commercial smart home speakers (Google Home etc.) with a locally-run AI that has a defined personality, voice, visual representation, and full smart home integration.


Hardware

| Component | Spec |
| --- | --- |
| Chip | Apple M4 Pro |
| CPU | 14-core |
| GPU | 20-core |
| Neural Engine | 16-core |
| RAM | 64GB unified memory |
| Storage | 1TB SSD |
| Network | Gigabit Ethernet |

Primary LLM is Claude Sonnet 4 via Anthropic API. Local Ollama models available as fallback. All other inference (STT, TTS, image gen) runs locally.


Core Stack

AI & LLM

  • Claude Sonnet 4 — primary LLM via Anthropic API (anthropic/claude-sonnet-4-20250514), used for all agent interactions
  • Ollama — local LLM runtime (fallback models: Llama 3.3 70B, Qwen 3.5 35B-A3B, Qwen 2.5 7B)
  • Model keep-warm daemon — preload-models.sh runs as a loop, checks every 5 min, and re-pins evicted models with keep_alive=-1. Keeps qwen2.5:7b (small/fast) and $HOMEAI_MEDIUM_MODEL (default: qwen3.5:35b-a3b) always loaded in VRAM. The medium model is configurable via env var for per-persona model assignment.
  • Open WebUI — browser-based chat interface, runs as Docker container
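The keep-warm behaviour described above can be sketched as a small decision step plus a re-pin call. This is a hypothetical Python sketch (the real preload-models.sh is a shell script); the model names come from this document, and the endpoint/payload shape follows Ollama's public API:

```python
import json
import os
import urllib.request

# Models the daemon keeps pinned in VRAM (names taken from this document).
KEEP_WARM = ["qwen2.5:7b", os.environ.get("HOMEAI_MEDIUM_MODEL", "qwen3.5:35b-a3b")]

def models_to_repin(loaded, keep_warm=None):
    """Return the keep-warm models that have been evicted from VRAM
    (`loaded` is the set of model names reported by GET /api/ps)."""
    keep_warm = KEEP_WARM if keep_warm is None else keep_warm
    return [m for m in keep_warm if m not in set(loaded)]

def repin(model, base="http://localhost:11434"):
    """Reload `model` with keep_alive=-1 so Ollama never evicts it."""
    body = json.dumps({"model": model, "keep_alive": -1}).encode()
    req = urllib.request.Request(f"{base}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req).read()
```

Run every 5 minutes, this keeps both models resident without re-issuing load requests for models that are already warm.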

Image Generation

  • ComfyUI — primary image generation UI, node-based workflows
  • Target models: SDXL, Flux.1, ControlNet
  • Runs via Metal (Apple GPU API)

Speech

  • Whisper.cpp — speech-to-text, optimised for Apple Silicon/Neural Engine
  • Kokoro TTS — fast, lightweight text-to-speech (primary, low-latency, local)
  • ElevenLabs TTS — cloud voice cloning/synthesis (per-character voice ID, routed via state file)
  • Chatterbox TTS — voice cloning engine (Apple Silicon MPS optimised)
  • Qwen3-TTS — alternative voice cloning via MLX
  • openWakeWord — always-on wake word detection

Smart Home

  • Home Assistant — smart home control platform (Docker)
  • Wyoming Protocol — bridges Whisper STT + Kokoro/Piper TTS into Home Assistant
  • Music Assistant — self-hosted music control (Docker on Pi at 10.0.0.199:8095), Spotify + SMB library + Chromecast players
  • Snapcast — multi-room synchronised audio output

AI Agent / Orchestration

  • OpenClaw — primary AI agent layer; receives voice commands, calls tools, manages personality
  • OpenClaw Skills — 13 skills total: home-assistant, image-generation, voice-assistant, vtube-studio, memory, service-monitor, character, routine, music, workflow, gitea, calendar, mode
  • n8n — visual workflow automation (Docker), chains AI actions
  • Character Memory System — two-tier JSON-based memories (personal per-character + general shared), injected into LLM system prompt with budget truncation
  • Public/Private Mode — routes requests to local Ollama (private) or cloud LLMs (public) with per-category overrides via active-mode.json. Default primary model is Claude Sonnet 4.
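The exact schema of active-mode.json isn't reproduced in this document; a plausible shape, given the description above (a global mode plus per-category overrides), might be:

```json
{
  "mode": "public",
  "overrides": {
    "image-generation": "private",
    "calendar": "private"
  }
}
```

Field names here are illustrative assumptions, not the actual spec.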

Character & Personality

  • Character Schema v2 — JSON spec with background, dialogue_style, appearance, skills, gaze_presets (v1 auto-migrated)
  • HomeAI Dashboard — unified web app: character editor, chat, memory manager, service dashboard
  • Character MCP Server — LLM-assisted character creation via Fandom wiki/Wikipedia lookup (Docker)
  • Character config stored as JSON files in ~/homeai-data/characters/, consumed by bridge for system prompt construction

Visual Representation

  • VTube Studio — Live2D model display on desktop (macOS) and mobile (iOS/Android)
  • VTube Studio WebSocket API used to drive expressions from the AI pipeline
  • LVGL — simplified animated face on ESP32-S3-BOX-3 units
  • Live2D model: to be sourced/commissioned (nizima.com or booth.pm)

Room Presence (Smart Speaker Replacement)

  • ESP32-S3-BOX-3 units — one per room
  • Flashed with ESPHome
  • Acts as Wyoming Satellite (mic input → Mac Mini → TTS audio back)
  • LVGL display shows animated face + status info
  • Communicates over local WiFi

Infrastructure

  • Docker Desktop for Mac — containerises Home Assistant, Open WebUI, n8n, etc.
  • Tailscale — secure remote access to all services, no port forwarding
  • Authelia — 2FA authentication layer for exposed web UIs
  • Portainer — Docker container management UI
  • Uptime Kuma — service health monitoring and mobile alerts
  • Gitea — self-hosted Git server for all project code and configs
  • code-server — browser-based VS Code for remote development

Voice Pipeline (End-to-End)

ESP32-S3-BOX-3 (room)
  → Wake word detected (openWakeWord, runs locally on device or Mac Mini)
  → Audio streamed to Mac Mini via Wyoming Satellite
  → Whisper transcribes speech to text
  → HA conversation agent → OpenClaw HTTP Bridge
  → Bridge resolves character (satellite_id → character mapping)
  → Bridge builds system prompt (profile + memories) and writes TTS config to state file
  → Bridge checks active-mode.json for model routing (private=local, public=cloud)
  → OpenClaw CLI → LLM generates response (Claude Sonnet 4 default, Ollama fallback)
  → Response dispatched:
      → Wyoming TTS reads state file → routes to Kokoro (local) or ElevenLabs (cloud)
      → Audio sent back to ESP32-S3-BOX-3 (spoken response)
      → VTube Studio API triggered (expression + lip sync on desktop/mobile)
      → Home Assistant action called if applicable (lights, music, etc.)
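The bridge's per-request steps above can be sketched in Python. The state-file locations come from Key Paths & Conventions below; everything else (function name, the "tts" field, the route labels) is a hypothetical assumption, and the LLM hand-off and response dispatch are omitted:

```python
import json
from pathlib import Path

def handle_request(satellite_id, data_dir=None):
    """Sketch of one bridge request: resolve the character from the
    satellite map, publish its TTS config, and pick a model route."""
    data = Path(data_dir) if data_dir else Path.home() / "homeai-data"

    # satellite_id -> character mapping (satellite-map.json)
    sat_map = json.loads((data / "satellite-map.json").read_text())
    char_id = sat_map[satellite_id]
    character = json.loads((data / "characters" / f"{char_id}.json").read_text())

    # Write the active character's TTS config for the Wyoming TTS server
    (data / "active-tts-voice.json").write_text(json.dumps(character["tts"]))

    # active-mode.json routing: private -> local Ollama, public -> cloud
    mode = json.loads((data / "active-mode.json").read_text()).get("mode", "public")
    route = "ollama" if mode == "private" else "anthropic"
    return char_id, route
```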

Timeout Strategy

The HTTP bridge checks Ollama's /api/ps endpoint before each request to determine whether the LLM is already loaded:

| Layer | Warm (model loaded) | Cold (model loading) |
| --- | --- | --- |
| HA conversation component | 200s | 200s |
| OpenClaw HTTP bridge | 60s | 180s |
| OpenClaw agent | 60s | 60s |

The keep-warm daemon ensures models stay loaded, so cold starts should be rare (only after an Ollama restart or under VRAM pressure).
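The bridge-layer row of the table reduces to a one-line decision; a sketch, assuming `loaded_models` has already been parsed out of the GET /api/ps response:

```python
def bridge_timeout(model, loaded_models):
    """OpenClaw HTTP bridge timeout per the table above: 60s when the
    model is already loaded (warm), 180s when it must load (cold)."""
    return 60 if model in loaded_models else 180
```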


Character System

The AI assistant has a defined personality managed via the HomeAI Dashboard (character editor + memory manager).

Character Schema v2

Each character is a JSON file in ~/homeai-data/characters/ with:

  • System prompt — core personality, injected into every LLM request
  • Profile fields — background, appearance, dialogue_style, skills array
  • TTS config — engine (kokoro/elevenlabs), kokoro_voice, elevenlabs_voice_id, elevenlabs_model, speed
  • Gaze presets — gaze_presets array of {preset, trigger} for image generation styles
  • Custom prompt rules — trigger/response overrides for specific contexts
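A hypothetical character file illustrating the fields above. All values are invented; only the field names come from this document, and even those may differ from the actual schema:

```json
{
  "schema_version": 2,
  "system_prompt": "You are ...",
  "background": "...",
  "appearance": "...",
  "dialogue_style": "...",
  "skills": ["home-assistant", "music"],
  "tts": {
    "engine": "kokoro",
    "kokoro_voice": "af_heart",
    "elevenlabs_voice_id": "",
    "elevenlabs_model": "",
    "speed": 1.0
  },
  "gaze_presets": [
    { "preset": "portrait", "trigger": "selfie" }
  ],
  "custom_prompt_rules": [
    { "trigger": "weather", "response": "..." }
  ]
}
```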

Memory System

Two-tier memory stored as JSON in ~/homeai-data/memories/:

  • Personal memories (personal/{character_id}.json) — per-character, about user interactions
  • General memories (general.json) — shared operational knowledge (tool usage, device info, routines)

Memories are injected into the system prompt by the bridge with budget truncation (personal: 4000 chars, general: 3000 chars, newest first).
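The budget truncation above can be sketched as follows, assuming each memory file holds a list of text entries ordered oldest-to-newest (the actual entry shape may differ):

```python
def truncate_memories(entries, budget):
    """Keep the newest memories until the character budget is spent
    (personal: 4000 chars, general: 3000 chars, per the section above)."""
    kept, used = [], 0
    for text in reversed(entries):  # newest first
        if used + len(text) > budget:
            break
        kept.append(text)
        used += len(text)
    return "\n".join(kept)
```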

TTS Voice Routing

The bridge writes the active character's TTS config to ~/homeai-data/active-tts-voice.json before each request. The Wyoming TTS server reads this state file to determine which engine/voice to use:

  • Kokoro — local, fast, uses kokoro_voice field (e.g., af_heart)
  • ElevenLabs — cloud, uses elevenlabs_voice_id + elevenlabs_model, returns PCM 24kHz

This works for both ESP32/HA pipeline and dashboard chat.
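A plausible instance of ~/homeai-data/active-tts-voice.json for a Kokoro-voiced character. The field names are taken from the TTS config list in the schema above; the exact file layout is an assumption:

```json
{
  "engine": "kokoro",
  "kokoro_voice": "af_heart",
  "speed": 1.0
}
```

An ElevenLabs character would instead carry elevenlabs_voice_id and elevenlabs_model, which the Wyoming TTS server uses to request PCM 24kHz audio from the cloud API.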


Project Priorities

  1. Foundation — Docker stack up (Home Assistant, Open WebUI, Portainer, Uptime Kuma)
  2. LLM — Ollama running with target models, Open WebUI connected
  3. Voice pipeline — Whisper → Ollama → Kokoro → Wyoming → Home Assistant
  4. OpenClaw — installed, onboarded, connected to Ollama and Home Assistant
  5. ESP32-S3-BOX-3 — ESPHome flash, Wyoming Satellite, display faces
  6. Character system — schema v2, dashboard editor, memory system, per-character TTS routing
  7. OpenClaw skills expansion — 9 new skills (memory, service-monitor, character, routine, music, workflow, gitea, calendar, mode) + public/private mode routing
  8. Music Assistant — deployed on Pi (10.0.0.199:8095), Spotify + SMB + Chromecast players
  9. Animated visual — PNG/GIF character visual for the web assistant (initial visual layer)
  10. Android app — companion app for mobile access to the assistant
  11. ComfyUI — image generation online, character-consistent model workflows
  12. Extended integrations — Snapcast, code-server
  13. Polish — Authelia, Tailscale hardening, iOS widgets

Stretch Goals

  • Live2D / VTube Studio — full Live2D model with WebSocket API bridge (requires learning Live2D tooling)

Key Paths & Conventions

  • Launchd plists (source): homeai-*/launchd/ (symlinked to ~/Library/LaunchAgents/)
  • Docker compose (Mac Mini): homeai-infra/docker/docker-compose.yml
  • Docker compose (Pi/SELBINA): ~/docker/selbina/ on 10.0.0.199
  • OpenClaw skills: ~/.openclaw/skills/
  • OpenClaw workspace tools: ~/.openclaw/workspace/TOOLS.md
  • OpenClaw config: ~/.openclaw/openclaw.json
  • Character configs: ~/homeai-data/characters/
  • Character memories: ~/homeai-data/memories/
  • Conversation history: ~/homeai-data/conversations/
  • Active TTS state: ~/homeai-data/active-tts-voice.json
  • Active mode state: ~/homeai-data/active-mode.json
  • Satellite → character map: ~/homeai-data/satellite-map.json
  • Local routines: ~/homeai-data/routines/
  • Voice reminders: ~/homeai-data/reminders.json
  • Whisper models: ~/models/whisper/
  • Ollama models: managed by Ollama at ~/.ollama/models/
  • ComfyUI models: ~/ComfyUI/models/
  • Voice reference audio: ~/voices/
  • Gitea repos root: ~/gitea/
  • Music Assistant (Pi): ~/docker/selbina/music-assistant/ on 10.0.0.199
  • Skills user guide: homeai-agent/SKILLS_GUIDE.md

Notes for Planning

  • All services should survive a Mac Mini reboot (launchd or Docker restart policies)
  • ESP32-S3-BOX-3 units are dumb satellites — all intelligence stays on Mac Mini
  • The character JSON schema (from Character Manager) should be treated as a versioned spec; pipeline components read from it, never hardcode personality values
  • OpenClaw skills are the primary extension mechanism — new capabilities = new skills
  • Primary LLM is Claude Sonnet 4 (Anthropic API); local Ollama models are available as fallback
  • Launchd plists are symlinked from repo source to ~/Library/LaunchAgents/ — edit source, then bootout/bootstrap to reload
  • Music Assistant runs on Pi (10.0.0.199), not Mac Mini — needs host networking for Chromecast mDNS discovery
  • VTube Studio API bridge should be a standalone OpenClaw skill with clear event interface
  • mem0 memory store should be backed up as part of regular Gitea commits