# CLAUDE.md — Home AI Assistant Project

## Project Overview

A self-hosted, always-on personal AI assistant running on a **Mac Mini M4 Pro (64GB RAM, 1TB SSD)**. The goal is a modular, expandable system that replaces commercial smart home speakers (Google Home etc.) with a locally-run AI that has a defined personality, voice, visual representation, and full smart home integration.

---

## Hardware

| Component | Spec |
|---|---|
| Chip | Apple M4 Pro |
| CPU | 14-core |
| GPU | 20-core |
| Neural Engine | 16-core |
| RAM | 64GB unified memory |
| Storage | 1TB SSD |
| Network | Gigabit Ethernet |

Primary LLMs are Claude 4.5/4.6 family via Anthropic API (Haiku for quick commands, Sonnet for standard and creative, Opus for roleplay and storytelling). Local Ollama models are available as fallback. All other inference (STT, TTS, image gen) runs locally.

---

## Core Stack

### AI & LLM

- **Claude 4.5/4.6 family** — primary LLMs via Anthropic API, tiered per prompt style: Haiku 4.5 (quick commands), Sonnet 4.6 (standard/creative), Opus 4.6 (roleplay/storytelling)
- **Ollama** — local LLM runtime (fallback models: Llama 3.3 70B, Qwen 3.5 35B-A3B, Qwen 2.5 7B)
- **Model keep-warm daemon** — `preload-models.sh` runs as a loop, checks every 5 min, re-pins evicted models with `keep_alive=-1`. Keeps `qwen2.5:7b` (small/fast) and `$HOMEAI_MEDIUM_MODEL` (default: `qwen3.5:35b-a3b`) always loaded in VRAM. Medium model is configurable via env var for per-persona model assignment.
- **Open WebUI** — browser-based chat interface, runs as Docker container

### Image Generation

- **ComfyUI** — primary image generation UI, node-based workflows
- Target models: SDXL, Flux.1, ControlNet
- Runs via Metal (Apple GPU API)

### Speech

- **Whisper.cpp** — speech-to-text, optimised for Apple Silicon/Neural Engine
- **Kokoro TTS** — fast, lightweight text-to-speech (primary, low-latency, local)
- **ElevenLabs TTS** — cloud voice cloning/synthesis (per-character voice ID, routed via state file)
- **Chatterbox TTS** — voice cloning engine (Apple Silicon MPS optimised)
- **Qwen3-TTS** — alternative voice cloning via MLX
- **openWakeWord** — always-on wake word detection

### Smart Home

- **Home Assistant** — smart home control platform (Docker)
- **Wyoming Protocol** — bridges Whisper STT + Kokoro/Piper TTS into Home Assistant
- **Music Assistant** — self-hosted music control (Docker on Pi at 10.0.0.199:8095), Spotify + SMB library + Chromecast players
- **Snapcast** — multi-room synchronised audio output

### AI Agent / Orchestration

- **OpenClaw** — primary AI agent layer; receives voice commands, calls tools, manages personality
- **OpenClaw Skills** — 13 skills total: home-assistant, image-generation, voice-assistant, vtube-studio, memory, service-monitor, character, routine, music, workflow, gitea, calendar, mode
- **n8n** — visual workflow automation (Docker), chains AI actions
- **Character Memory System** — SQLite + sqlite-vec semantic search (personal per-character + general shared + follow-ups), injected into LLM system prompt with context-aware retrieval
- **Prompt Styles** — 6 styles (quick, standard, creative, roleplayer, game-master, storyteller) with per-style model routing, temperature, and section stripping. JSON templates in `homeai-agent/prompt-styles/`
- **Public/Private Mode** — routes requests to local Ollama (private) or cloud LLMs (public) with per-category overrides via `active-mode.json`.
  Default primary model is Claude Sonnet 4.6, with per-style model tiering (Haiku/Sonnet/Opus).

### Character & Personality

- **Character Schema v2** — JSON spec with background, dialogue_style, appearance, skills, gaze_presets, dream_id, gaze_character, prompt style overrides (v1 auto-migrated)
- **HomeAI Dashboard** — unified web app: character editor, chat, memory manager, service dashboard
- **Dream** — character management service (http://10.0.0.101:3000), REST API for character CRUD with GAZE integration for cover images
- **Character MCP Server** — LLM-assisted character creation via Fandom wiki/Wikipedia lookup (Docker)
- **GAZE** — image generation service (http://10.0.0.101:5782), REST API for presets, characters, and job-based image generation
- Character config stored as JSON files in `~/homeai-data/characters/`, consumed by bridge for system prompt construction

### Visual Representation

- **VTube Studio** — Live2D model display on desktop (macOS) and mobile (iOS/Android)
- VTube Studio WebSocket API used to drive expressions from the AI pipeline
- **LVGL** — simplified animated face on ESP32-S3-BOX-3 units
- Live2D model: to be sourced/commissioned (nizima.com or booth.pm)

### Room Presence (Smart Speaker Replacement)

- **ESP32-S3-BOX-3** units — one per room
- Flashed with **ESPHome**
- Acts as Wyoming Satellite (mic input → Mac Mini → TTS audio back)
- LVGL display shows animated face + status info
- Communicates over local WiFi

### Infrastructure

- **Docker Desktop for Mac** — containerises Home Assistant, Open WebUI, n8n, etc.
- **Tailscale** — secure remote access to all services, no port forwarding
- **Authelia** — 2FA authentication layer for exposed web UIs
- **Portainer** — Docker container management UI
- **Uptime Kuma** — service health monitoring and mobile alerts
- **Gitea** — self-hosted Git server for all project code and configs
- **code-server** — browser-based VS Code for remote development

---

## Voice Pipeline (End-to-End)

```
ESP32-S3-BOX-3 (room)
  → Wake word detected (openWakeWord, runs locally on device or Mac Mini)
  → Audio streamed to Mac Mini via Wyoming Satellite
  → Whisper MLX transcribes speech to text
  → HA conversation agent → OpenClaw HTTP Bridge
  → Bridge resolves character (satellite_id → character mapping)
  → Bridge builds system prompt (profile + memories) and writes TTS config to state file
  → Bridge checks active-mode.json for model routing (private=local, public=cloud)
  → OpenClaw CLI → LLM generates response (Claude Haiku/Sonnet/Opus per style, Ollama fallback)
  → Response dispatched:
      → Wyoming TTS reads state file → routes to Kokoro (local) or ElevenLabs (cloud)
      → Audio sent back to ESP32-S3-BOX-3 (spoken response)
      → VTube Studio API triggered (expression + lip sync on desktop/mobile)
      → Home Assistant action called if applicable (lights, music, etc.)
```

### Timeout Strategy

The HTTP bridge checks Ollama `/api/ps` before each request to determine if the LLM is already loaded:

| Layer | Warm (model loaded) | Cold (model loading) |
|---|---|---|
| HA conversation component | 200s | 200s |
| OpenClaw HTTP bridge | 60s | 180s |
| OpenClaw agent | 60s | 60s |

The keep-warm daemon ensures models stay loaded, so cold starts should be rare (only after Ollama restarts or VRAM pressure).

---

## Character System

The AI assistant has a defined personality managed via the HomeAI Dashboard (character editor + memory manager).
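
The character resolution step from the voice pipeline (satellite ID to character, then profile plus memories into a system prompt) can be sketched roughly as below. This is a minimal illustration, not the bridge's actual code: the `<character_id>.json` file naming, the `system_prompt` key, the shape of the satellite map, and all three function names are assumptions made for the example.

```python
import json
from pathlib import Path

# Assumed layout: one JSON file per character, named <character_id>.json.
CHARACTERS_DIR = Path.home() / "homeai-data" / "characters"


def resolve_character_id(satellite_map: dict, satellite_id: str,
                         default: str = "default") -> str:
    """Map a satellite ID to a character ID, falling back to a default.

    The flat {satellite_id: character_id} shape is hypothetical.
    """
    return satellite_map.get(satellite_id, default)


def load_character(character_id: str,
                   characters_dir: Path = CHARACTERS_DIR) -> dict:
    """Read one character JSON file from disk."""
    with open(characters_dir / f"{character_id}.json", encoding="utf-8") as fh:
        return json.load(fh)


def build_system_prompt(character: dict, memories: list[str]) -> str:
    """Join the core prompt, selected profile fields, and retrieved memories."""
    # "system_prompt" is an assumed key name for the core personality text.
    parts = [character.get("system_prompt", "")]
    for field in ("background", "dialogue_style", "appearance"):
        value = character.get(field)
        if value:
            parts.append(f"{field}: {value}")
    if memories:
        parts.append("Relevant memories:\n" + "\n".join(f"- {m}" for m in memories))
    # Drop empty sections so the prompt has no stray blank blocks.
    return "\n\n".join(p for p in parts if p)
```

Keeping the three steps as separate functions mirrors the pipeline stages, so each one can be swapped or tested without touching the others.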
### Character Schema v2

Each character is a JSON file in `~/homeai-data/characters/` with:

- **System prompt** — core personality, injected into every LLM request
- **Profile fields** — background, appearance, dialogue_style, skills array
- **TTS config** — engine (kokoro/elevenlabs), kokoro_voice, elevenlabs_voice_id, elevenlabs_model, speed
- **GAZE presets** — array of `{preset, trigger}` for image generation styles
- **Dream link** — `dream_id` for syncing character data from Dream service
- **GAZE link** — `gaze_character` for auto-assigned cover image and presets
- **Prompt style config** — `default_prompt_style`, `prompt_style_overrides` for per-style tuning
- **Custom prompt rules** — trigger/response overrides for specific contexts

### Memory System

SQLite + sqlite-vec database at `~/homeai-data/memories/memories.db`:

- **Personal memories** — per-character, semantic/episodic/relational/opinion types
- **General memories** — shared operational knowledge (character_id = "general")
- **Follow-ups** — LLM-driven questions injected into system prompt, auto-resolve after 2 surfacings or 48h
- **Privacy levels** — public, sensitive, local_only (local_only excluded from cloud model requests)
- **Semantic search** — sentence-transformers all-MiniLM-L6-v2 embeddings (384 dims) for context-aware retrieval
- Core module: `homeai-agent/memory_store.py` (imported by bridge + memory-ctl skill)

### Prompt Styles

Six response styles in `homeai-agent/prompt-styles/`, each a JSON template with model, temperature, and instructions:

- **quick** — Claude Haiku 4.5, low temp, brief responses, strips profile sections
- **standard** — Claude Sonnet 4.6, balanced
- **creative** — Claude Sonnet 4.6, higher temp, elaborative
- **roleplayer** — Claude Opus 4.6, full personality injection
- **game-master** — Claude Opus 4.6, narrative-focused
- **storyteller** — Claude Opus 4.6, story-centric

Style selection: dashboard chat has a style picker; characters can set
`default_prompt_style`; satellites use the global active style. Bridge resolves model per style → group → mode → default.

### TTS Voice Routing

The bridge writes the active character's TTS config to `~/homeai-data/active-tts-voice.json` before each request. The Wyoming TTS server reads this state file to determine which engine/voice to use:

- **Kokoro** — local, fast, uses `kokoro_voice` field (e.g., `af_heart`)
- **ElevenLabs** — cloud, uses `elevenlabs_voice_id` + `elevenlabs_model`, returns PCM 24kHz

This works for both ESP32/HA pipeline and dashboard chat.

---

## Project Priorities

1. **Foundation** — Docker stack up (Home Assistant, Open WebUI, Portainer, Uptime Kuma) ✅
2. **LLM** — Ollama running with target models, Open WebUI connected ✅
3. **Voice pipeline** — Whisper → Ollama → Kokoro → Wyoming → Home Assistant ✅
4. **OpenClaw** — installed, onboarded, connected to Ollama and Home Assistant ✅
5. **ESP32-S3-BOX-3** — ESPHome flash, Wyoming Satellite, display faces ✅
6. **Character system** — schema v2, dashboard editor, memory system, per-character TTS routing ✅
7. **OpenClaw skills expansion** — 9 new skills (memory, monitor, character, routine, music, workflow, gitea, calendar, mode) + public/private mode routing ✅
8. **Music Assistant** — deployed on Pi (10.0.0.199:8095), Spotify + SMB + Chromecast players ✅
9. **Memory v2 + Prompt Styles + Dream/GAZE** — SQLite memory with semantic search, 6 prompt styles with model tiering, Dream character import, GAZE character linking ✅
10. **Animated visual** — PNG/GIF character visual for the web assistant (initial visual layer)
11. **Android app** — companion app for mobile access to the assistant
12. **ComfyUI** — image generation online, character-consistent model workflows
13. **Extended integrations** — Snapcast, code-server
14. **Polish** — Authelia, Tailscale hardening, iOS widgets

### Stretch Goals

- **Live2D / VTube Studio** — full Live2D model with WebSocket API bridge (requires learning Live2D tooling)

---

## Key Paths & Conventions

- Launchd plists (source): `homeai-*/launchd/` (symlinked to `~/Library/LaunchAgents/`)
- Docker compose (Mac Mini): `homeai-infra/docker/docker-compose.yml`
- Docker compose (Pi/SELBINA): `~/docker/selbina/` on 10.0.0.199
- OpenClaw skills: `~/.openclaw/skills/`
- OpenClaw workspace tools: `~/.openclaw/workspace/TOOLS.md`
- OpenClaw config: `~/.openclaw/openclaw.json`
- Character configs: `~/homeai-data/characters/`
- Character memories DB: `~/homeai-data/memories/memories.db`
- Memory store module: `homeai-agent/memory_store.py`
- Prompt style templates: `homeai-agent/prompt-styles/`
- Active prompt style: `~/homeai-data/active-prompt-style.json`
- Conversation history: `~/homeai-data/conversations/`
- Active TTS state: `~/homeai-data/active-tts-voice.json`
- Active mode state: `~/homeai-data/active-mode.json`
- Satellite → character map: `~/homeai-data/satellite-map.json`
- Local routines: `~/homeai-data/routines/`
- Voice reminders: `~/homeai-data/reminders.json`
- Whisper models: `~/models/whisper/`
- Ollama models: managed by Ollama at `~/.ollama/models/`
- ComfyUI models: `~/ComfyUI/models/`
- Voice reference audio: `~/voices/`
- Gitea repos root: `~/gitea/`
- Music Assistant (Pi): `~/docker/selbina/music-assistant/` on 10.0.0.199
- Skills user guide: `homeai-agent/SKILLS_GUIDE.md`
- Dream service: `http://10.0.0.101:3000` (character management, REST API)
- GAZE service: `http://10.0.0.101:5782` (image generation, REST API)

---

## Notes for Planning

- All services should survive a Mac Mini reboot (launchd or Docker restart policies)
- ESP32-S3-BOX-3 units are dumb satellites — all intelligence stays on Mac Mini
- The character JSON schema (from Character Manager) should be treated as a versioned spec; pipeline components read from it, never
  hardcode personality values
- OpenClaw skills are the primary extension mechanism — new capabilities = new skills
- Primary LLMs are Claude 4.5/4.6 family (Anthropic API) with per-style tiering; local Ollama models are available as fallback
- Launchd plists are symlinked from repo source to `~/Library/LaunchAgents/` — edit source, then bootout/bootstrap to reload
- Music Assistant runs on Pi (10.0.0.199), not Mac Mini — needs host networking for Chromecast mDNS discovery
- VTube Studio API bridge should be a standalone OpenClaw skill with clear event interface
- Memory DB (`memories.db`) should be backed up as part of regular Gitea commits
- Dream characters can be linked to GAZE characters for cover image fallback and cross-referencing
- Prompt style selection hierarchy: explicit user pick → character default → global active style
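
The prompt style selection hierarchy in the last note can be sketched as a simple first-match lookup. This is an illustrative sketch only: `select_prompt_style` and `STYLE_MODELS` are hypothetical names, the model strings are the labels used in this document rather than Anthropic API model IDs, and both skipping unknown style names and falling back to `standard` are assumptions.

```python
from typing import Optional

# Style -> model tiering as listed under Prompt Styles.
# The values are descriptive labels, not real API model identifiers.
STYLE_MODELS = {
    "quick": "Claude Haiku 4.5",
    "standard": "Claude Sonnet 4.6",
    "creative": "Claude Sonnet 4.6",
    "roleplayer": "Claude Opus 4.6",
    "game-master": "Claude Opus 4.6",
    "storyteller": "Claude Opus 4.6",
}


def select_prompt_style(user_pick: Optional[str], character: dict,
                        global_active: str = "standard") -> str:
    """Return the first known style: user pick, character default, global."""
    candidates = (
        user_pick,                               # explicit pick (dashboard)
        character.get("default_prompt_style"),   # per-character default
        global_active,                           # global active style
    )
    for candidate in candidates:
        # Unknown or missing names are skipped (assumed behaviour).
        if candidate in STYLE_MODELS:
            return candidate
    return "standard"  # assumed last-resort fallback


style = select_prompt_style(None, {"default_prompt_style": "storyteller"})
print(style, "->", STYLE_MODELS[style])
```

A satellite request would typically pass `user_pick=None`, so the character default or the global active style decides the result.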