# CLAUDE.md — Home AI Assistant Project

## Project Overview
A self-hosted, always-on personal AI assistant running on a Mac Mini M4 Pro (64GB RAM, 1TB SSD). The goal is a modular, expandable system that replaces commercial smart home speakers (Google Home etc.) with a locally-run AI that has a defined personality, voice, visual representation, and full smart home integration.
## Hardware
| Component | Spec |
|---|---|
| Chip | Apple M4 Pro |
| CPU | 14-core |
| GPU | 20-core |
| Neural Engine | 16-core |
| RAM | 64GB unified memory |
| Storage | 1TB SSD |
| Network | Gigabit Ethernet |
Primary LLMs are Claude 4.5/4.6 family via Anthropic API (Haiku for quick, Sonnet for standard, Opus for creative/RP). Local Ollama models available as fallback. All other inference (STT, TTS, image gen) runs locally.
## Core Stack

### AI & LLM
- Claude 4.5/4.6 family — primary LLMs via Anthropic API, tiered per prompt style: Haiku 4.5 (quick commands), Sonnet 4.6 (standard/creative), Opus 4.6 (roleplay/storytelling)
- Ollama — local LLM runtime (fallback models: Llama 3.3 70B, Qwen 3.5 35B-A3B, Qwen 2.5 7B)
- Model keep-warm daemon — `preload-models.sh` runs as a loop, checks every 5 min, re-pins evicted models with `keep_alive=-1`. Keeps `qwen2.5:7b` (small/fast) and `$HOMEAI_MEDIUM_MODEL` (default: `qwen3.5:35b-a3b`) always loaded in VRAM. Medium model is configurable via env var for per-persona model assignment.
- Open WebUI — browser-based chat interface, runs as Docker container
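The daemon itself is a shell script, but its logic can be sketched in Python. This is a minimal sketch, assuming Ollama's documented `GET /api/ps` (lists loaded models) and `POST /api/generate` with `keep_alive: -1` (loads a model and pins it); the function names are illustrative, not the script's actual structure:

```python
import json
import os
import time
import urllib.request

OLLAMA = "http://localhost:11434"
# Models to keep pinned; HOMEAI_MEDIUM_MODEL mirrors the env var the daemon reads.
PINNED = ["qwen2.5:7b", os.environ.get("HOMEAI_MEDIUM_MODEL", "qwen3.5:35b-a3b")]

def loaded_models():
    """Return the set of model names currently resident in VRAM (GET /api/ps)."""
    with urllib.request.urlopen(f"{OLLAMA}/api/ps") as resp:
        return {m["name"] for m in json.load(resp).get("models", [])}

def models_to_repin(pinned, loaded):
    """Pure decision step: which pinned models were evicted and need reloading."""
    return [m for m in pinned if m not in loaded]

def repin(model):
    """Reload a model pinned forever (keep_alive=-1) via an empty generate call."""
    body = json.dumps({"model": model, "prompt": "", "keep_alive": -1}).encode()
    req = urllib.request.Request(f"{OLLAMA}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req).read()

def run_loop(interval=300):
    """Check every 5 minutes, matching the daemon's cadence."""
    while True:
        for model in models_to_repin(PINNED, loaded_models()):
            repin(model)
        time.sleep(interval)
```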
### Image Generation
- ComfyUI — primary image generation UI, node-based workflows
- Target models: SDXL, Flux.1, ControlNet
- Runs via Metal (Apple GPU API)
### Speech
- Whisper.cpp — speech-to-text, optimised for Apple Silicon/Neural Engine
- Kokoro TTS — fast, lightweight text-to-speech (primary, low-latency, local)
- ElevenLabs TTS — cloud voice cloning/synthesis (per-character voice ID, routed via state file)
- Chatterbox TTS — voice cloning engine (Apple Silicon MPS optimised)
- Qwen3-TTS — alternative voice cloning via MLX
- openWakeWord — always-on wake word detection
### Smart Home
- Home Assistant — smart home control platform (Docker)
- Wyoming Protocol — bridges Whisper STT + Kokoro/Piper TTS into Home Assistant
- Music Assistant — self-hosted music control (Docker on Pi at 10.0.0.199:8095), Spotify + SMB library + Chromecast players
- Snapcast — multi-room synchronised audio output
### AI Agent / Orchestration
- OpenClaw — primary AI agent layer; receives voice commands, calls tools, manages personality
- OpenClaw Skills — 13 skills total: home-assistant, image-generation, voice-assistant, vtube-studio, memory, service-monitor, character, routine, music, workflow, gitea, calendar, mode
- n8n — visual workflow automation (Docker), chains AI actions
- Character Memory System — SQLite + sqlite-vec semantic search (personal per-character + general shared + follow-ups), injected into LLM system prompt with context-aware retrieval
- Prompt Styles — 6 styles (quick, standard, creative, roleplayer, game-master, storyteller) with per-style model routing, temperature, and section stripping. JSON templates in `homeai-agent/prompt-styles/`
- Public/Private Mode — routes requests to local Ollama (private) or cloud LLMs (public) with per-category overrides via `active-mode.json`. Default primary model is Claude Sonnet 4.6, with per-style model tiering (Haiku/Sonnet/Opus).
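The mode resolution can be sketched as a pure lookup. The key names in the example state below are hypothetical (the real shape of `active-mode.json` may differ); the point is the precedence of per-category overrides over the global mode:

```python
# Hypothetical shape of ~/homeai-data/active-mode.json; actual field names may differ.
EXAMPLE_MODE = {
    "mode": "public",                      # "public" (cloud) or "private" (local)
    "overrides": {"image": "private"},     # per-category overrides
    "cloud_model": "claude-sonnet-4-6",    # default primary model (illustrative ID)
    "local_model": "qwen3.5:35b-a3b",
}

def resolve_route(state, category=None):
    """Resolve a request to (mode, model): a per-category override wins
    over the global mode; the mode then picks the cloud or local model."""
    mode = state.get("overrides", {}).get(category, state["mode"])
    model = state["cloud_model"] if mode == "public" else state["local_model"]
    return mode, model
```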
### Character & Personality
- Character Schema v2 — JSON spec with background, dialogue_style, appearance, skills, gaze_presets, dream_id, gaze_character, prompt style overrides (v1 auto-migrated)
- HomeAI Dashboard — unified web app: character editor, chat, memory manager, service dashboard
- Dream — character management service (http://10.0.0.101:3000), REST API for character CRUD with GAZE integration for cover images
- Character MCP Server — LLM-assisted character creation via Fandom wiki/Wikipedia lookup (Docker)
- GAZE — image generation service (http://10.0.0.101:5782), REST API for presets, characters, and job-based image generation
- Character config stored as JSON files in `~/homeai-data/characters/`, consumed by the bridge for system prompt construction
### Visual Representation
- VTube Studio — Live2D model display on desktop (macOS) and mobile (iOS/Android)
- VTube Studio WebSocket API used to drive expressions from the AI pipeline
- LVGL — simplified animated face on ESP32-S3-BOX-3 units
- Live2D model: to be sourced/commissioned (nizima.com or booth.pm)
### Room Presence (Smart Speaker Replacement)
- ESP32-S3-BOX-3 units — one per room
- Flashed with ESPHome
- Acts as Wyoming Satellite (mic input → Mac Mini → TTS audio back)
- LVGL display shows animated face + status info
- Communicates over local WiFi
### Infrastructure
- Docker Desktop for Mac — containerises Home Assistant, Open WebUI, n8n, etc.
- Tailscale — secure remote access to all services, no port forwarding
- Authelia — 2FA authentication layer for exposed web UIs
- Portainer — Docker container management UI
- Uptime Kuma — service health monitoring and mobile alerts
- Gitea — self-hosted Git server for all project code and configs
- code-server — browser-based VS Code for remote development
## Voice Pipeline (End-to-End)
ESP32-S3-BOX-3 (room)
→ Wake word detected (openWakeWord, runs locally on device or Mac Mini)
→ Audio streamed to Mac Mini via Wyoming Satellite
→ Whisper MLX transcribes speech to text
→ HA conversation agent → OpenClaw HTTP Bridge
→ Bridge resolves character (satellite_id → character mapping)
→ Bridge builds system prompt (profile + memories) and writes TTS config to state file
→ Bridge checks active-mode.json for model routing (private=local, public=cloud)
→ OpenClaw CLI → LLM generates response (Claude Haiku/Sonnet/Opus per style, Ollama fallback)
→ Response dispatched:
→ Wyoming TTS reads state file → routes to Kokoro (local) or ElevenLabs (cloud)
→ Audio sent back to ESP32-S3-BOX-3 (spoken response)
→ VTube Studio API triggered (expression + lip sync on desktop/mobile)
→ Home Assistant action called if applicable (lights, music, etc.)
## Timeout Strategy

The HTTP bridge checks Ollama's `/api/ps` endpoint before each request to determine whether the model is already loaded:
| Layer | Warm (model loaded) | Cold (model loading) |
|---|---|---|
| HA conversation component | 200s | 200s |
| OpenClaw HTTP bridge | 60s | 180s |
| OpenClaw agent | 60s | 60s |
The keep-warm daemon ensures models stay loaded, so cold starts should be rare (only after Ollama restarts or VRAM pressure).
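The bridge's warm/cold check can be sketched as follows. This is a minimal sketch assuming Ollama's `GET /api/ps` endpoint; the function names are illustrative, and the timeout values come from the bridge row of the table above:

```python
import json
import urllib.request

# Warm/cold bridge timeouts from the table above (seconds).
BRIDGE_TIMEOUT_WARM = 60
BRIDGE_TIMEOUT_COLD = 180

def pick_bridge_timeout(model, loaded):
    """Pure step: choose the bridge timeout based on whether the model is loaded."""
    return BRIDGE_TIMEOUT_WARM if model in loaded else BRIDGE_TIMEOUT_COLD

def bridge_timeout(model, ollama="http://localhost:11434"):
    """Query GET /api/ps for loaded models, then pick the matching timeout."""
    with urllib.request.urlopen(f"{ollama}/api/ps") as resp:
        loaded = {m["name"] for m in json.load(resp).get("models", [])}
    return pick_bridge_timeout(model, loaded)
```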
## Character System
The AI assistant has a defined personality managed via the HomeAI Dashboard (character editor + memory manager).
### Character Schema v2

Each character is a JSON file in `~/homeai-data/characters/` with:
- System prompt — core personality, injected into every LLM request
- Profile fields — background, appearance, dialogue_style, skills array
- TTS config — engine (kokoro/elevenlabs), `kokoro_voice`, `elevenlabs_voice_id`, `elevenlabs_model`, speed
- GAZE presets — array of `{preset, trigger}` for image generation styles
- Dream link — `dream_id` for syncing character data from Dream service
- GAZE link — `gaze_character` for auto-assigned cover image and presets
- Prompt style config — `default_prompt_style`, `prompt_style_overrides` for per-style tuning
- Custom prompt rules — trigger/response overrides for specific contexts
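Putting the fields above together, a character file might look roughly like this. The exact key layout is not confirmed by the schema spec; values are placeholders and the nesting (e.g. a `tts` object) is an assumption for illustration:

```json
{
  "schema_version": 2,
  "name": "Example",
  "system_prompt": "You are ...",
  "background": "...",
  "appearance": "...",
  "dialogue_style": "...",
  "skills": ["music", "calendar"],
  "tts": {
    "engine": "kokoro",
    "kokoro_voice": "af_heart",
    "speed": 1.0
  },
  "gaze_presets": [{"preset": "portrait", "trigger": "selfie"}],
  "dream_id": "...",
  "gaze_character": "...",
  "default_prompt_style": "standard",
  "prompt_style_overrides": {"creative": {"temperature": 0.9}}
}
```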
### Memory System

SQLite + sqlite-vec database at `~/homeai-data/memories/memories.db`:
- Personal memories — per-character, semantic/episodic/relational/opinion types
- General memories — shared operational knowledge (character_id = "general")
- Follow-ups — LLM-driven questions injected into system prompt, auto-resolve after 2 surfacings or 48h
- Privacy levels — `public`, `sensitive`, `local_only` (`local_only` excluded from cloud model requests)
- Semantic search — sentence-transformers all-MiniLM-L6-v2 embeddings (384 dims) for context-aware retrieval
- Core module: `homeai-agent/memory_store.py` (imported by bridge + memory-ctl skill)
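The retrieval and privacy-filtering step can be illustrated without the database. This is a pure-Python sketch of the ranking logic only: the real store uses sqlite-vec's vector search over MiniLM embeddings rather than in-memory cosine similarity, and the field names here are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(memories, query_vec, target="cloud", k=3):
    """Rank memories by embedding similarity against the query, excluding
    local_only rows when the request is routed to a cloud model."""
    visible = [m for m in memories
               if not (target == "cloud" and m["privacy"] == "local_only")]
    visible.sort(key=lambda m: cosine(m["embedding"], query_vec), reverse=True)
    return visible[:k]
```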
### Prompt Styles

Six response styles in `homeai-agent/prompt-styles/`, each a JSON template with model, temperature, and instructions:
- quick — Claude Haiku 4.5, low temp, brief responses, strips profile sections
- standard — Claude Sonnet 4.6, balanced
- creative — Claude Sonnet 4.6, higher temp, elaborative
- roleplayer — Claude Opus 4.6, full personality injection
- game-master — Claude Opus 4.6, narrative-focused
- storyteller — Claude Opus 4.6, story-centric
Style selection: the dashboard chat has a style picker; characters can set `default_prompt_style`; satellites use the global active style. The bridge resolves the model per style → group → mode → default.
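The selection hierarchy above can be sketched as two small lookups. The model ID strings are illustrative assumptions (the authoritative mapping lives in the JSON templates under `homeai-agent/prompt-styles/`), but the precedence order matches the text:

```python
# Illustrative style → model-tier mapping; real IDs come from the JSON templates.
STYLE_MODELS = {
    "quick": "claude-haiku-4-5",
    "standard": "claude-sonnet-4-6",
    "creative": "claude-sonnet-4-6",
    "roleplayer": "claude-opus-4-6",
    "game-master": "claude-opus-4-6",
    "storyteller": "claude-opus-4-6",
}

def resolve_style(user_pick, character_default, global_active):
    """Hierarchy: explicit user pick → character default → global active style."""
    return user_pick or character_default or global_active

def model_for(style, fallback="claude-sonnet-4-6"):
    """Map a style to its model tier, falling back to the default primary model."""
    return STYLE_MODELS.get(style, fallback)
```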
### TTS Voice Routing

The bridge writes the active character's TTS config to `~/homeai-data/active-tts-voice.json` before each request. The Wyoming TTS server reads this state file to determine which engine/voice to use:
- Kokoro — local, fast, uses `kokoro_voice` field (e.g., `af_heart`)
- ElevenLabs — cloud, uses `elevenlabs_voice_id` + `elevenlabs_model`, returns PCM 24kHz
This works for both the ESP32/HA pipeline and dashboard chat.
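Both sides of the state-file handoff can be sketched as follows. This is a hedged sketch: the exact keys written to `active-tts-voice.json` and the fallback voice are assumptions, but the flow (bridge writes before the request, Wyoming server reads to route) matches the description above:

```python
import json
from pathlib import Path

STATE_FILE = Path.home() / "homeai-data" / "active-tts-voice.json"

def write_tts_state(character, path=STATE_FILE):
    """Bridge side: persist the active character's TTS config before the request.
    Only the TTS-relevant fields are copied through (key names assumed)."""
    keys = ("engine", "kokoro_voice", "elevenlabs_voice_id", "elevenlabs_model", "speed")
    path.write_text(json.dumps({k: character[k] for k in keys if k in character}))

def read_tts_route(path=STATE_FILE):
    """Wyoming TTS side: decide (engine, voice) from the state file,
    defaulting to local Kokoro when no engine is set."""
    state = json.loads(path.read_text())
    if state.get("engine") == "elevenlabs":
        return "elevenlabs", state["elevenlabs_voice_id"]
    return "kokoro", state.get("kokoro_voice", "af_heart")
```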
## Project Priorities
- Foundation — Docker stack up (Home Assistant, Open WebUI, Portainer, Uptime Kuma) ✅
- LLM — Ollama running with target models, Open WebUI connected ✅
- Voice pipeline — Whisper → Ollama → Kokoro → Wyoming → Home Assistant ✅
- OpenClaw — installed, onboarded, connected to Ollama and Home Assistant ✅
- ESP32-S3-BOX-3 — ESPHome flash, Wyoming Satellite, display faces ✅
- Character system — schema v2, dashboard editor, memory system, per-character TTS routing ✅
- OpenClaw skills expansion — 9 new skills (memory, monitor, character, routine, music, workflow, gitea, calendar, mode) + public/private mode routing ✅
- Music Assistant — deployed on Pi (10.0.0.199:8095), Spotify + SMB + Chromecast players ✅
- Memory v2 + Prompt Styles + Dream/GAZE — SQLite memory with semantic search, 6 prompt styles with model tiering, Dream character import, GAZE character linking ✅
- Animated visual — PNG/GIF character visual for the web assistant (initial visual layer)
- Android app — companion app for mobile access to the assistant
- ComfyUI — image generation online, character-consistent model workflows
- Extended integrations — Snapcast, code-server
- Polish — Authelia, Tailscale hardening, iOS widgets
## Stretch Goals
- Live2D / VTube Studio — full Live2D model with WebSocket API bridge (requires learning Live2D tooling)
## Key Paths & Conventions
- Launchd plists (source): `homeai-*/launchd/` (symlinked to `~/Library/LaunchAgents/`)
- Docker compose (Mac Mini): `homeai-infra/docker/docker-compose.yml`
- Docker compose (Pi/SELBINA): `~/docker/selbina/` on 10.0.0.199
- OpenClaw skills: `~/.openclaw/skills/`
- OpenClaw workspace tools: `~/.openclaw/workspace/TOOLS.md`
- OpenClaw config: `~/.openclaw/openclaw.json`
- Character configs: `~/homeai-data/characters/`
- Character memories DB: `~/homeai-data/memories/memories.db`
- Memory store module: `homeai-agent/memory_store.py`
- Prompt style templates: `homeai-agent/prompt-styles/`
- Active prompt style: `~/homeai-data/active-prompt-style.json`
- Conversation history: `~/homeai-data/conversations/`
- Active TTS state: `~/homeai-data/active-tts-voice.json`
- Active mode state: `~/homeai-data/active-mode.json`
- Satellite → character map: `~/homeai-data/satellite-map.json`
- Local routines: `~/homeai-data/routines/`
- Voice reminders: `~/homeai-data/reminders.json`
- Whisper models: `~/models/whisper/`
- Ollama models: managed by Ollama at `~/.ollama/models/`
- ComfyUI models: `~/ComfyUI/models/`
- Voice reference audio: `~/voices/`
- Gitea repos root: `~/gitea/`
- Music Assistant (Pi): `~/docker/selbina/music-assistant/` on 10.0.0.199
- Skills user guide: `homeai-agent/SKILLS_GUIDE.md`
- Dream service: `http://10.0.0.101:3000` (character management, REST API)
- GAZE service: `http://10.0.0.101:5782` (image generation, REST API)
## Notes for Planning
- All services should survive a Mac Mini reboot (launchd or Docker restart policies)
- ESP32-S3-BOX-3 units are dumb satellites — all intelligence stays on Mac Mini
- The character JSON schema (from Character Manager) should be treated as a versioned spec; pipeline components read from it, never hardcode personality values
- OpenClaw skills are the primary extension mechanism — new capabilities = new skills
- Primary LLMs are Claude 4.5/4.6 family (Anthropic API) with per-style tiering; local Ollama models are available as fallback
- Launchd plists are symlinked from repo source to ~/Library/LaunchAgents/ — edit source, then bootout/bootstrap to reload
- Music Assistant runs on Pi (10.0.0.199), not Mac Mini — needs host networking for Chromecast mDNS discovery
- VTube Studio API bridge should be a standalone OpenClaw skill with clear event interface
- Memory DB (`memories.db`) should be backed up as part of regular Gitea commits
- Dream characters can be linked to GAZE characters for cover image fallback and cross-referencing
- Prompt style selection hierarchy: explicit user pick → character default → global active style