# CLAUDE.md — Home AI Assistant Project

## Project Overview
A self-hosted, always-on personal AI assistant running on a Mac Mini M4 Pro (64GB RAM, 1TB SSD). The goal is a modular, expandable system that replaces commercial smart home speakers (Google Home etc.) with a locally-run AI that has a defined personality, voice, visual representation, and full smart home integration.
## Hardware
| Component | Spec |
|---|---|
| Chip | Apple M4 Pro |
| CPU | 14-core |
| GPU | 20-core |
| Neural Engine | 16-core |
| RAM | 64GB unified memory |
| Storage | 1TB SSD |
| Network | Gigabit Ethernet |
Primary LLMs are Claude 4.5/4.6 family via Anthropic API (Haiku for quick, Sonnet for standard, Opus for creative/RP). Local Ollama models available as fallback. All other inference (STT, TTS, image gen) runs locally.
## Core Stack

### AI & LLM
- Claude 4.5/4.6 family — primary LLMs via Anthropic API, tiered per prompt style: Haiku 4.5 (quick commands), Sonnet 4.6 (standard/creative), Opus 4.6 (roleplay/storytelling)
- Ollama — local LLM runtime (fallback models: Llama 3.3 70B, Qwen 3.5 35B-A3B, Qwen 2.5 7B)
- Model keep-warm daemon — `preload-models.sh` runs as a loop, checks every 5 min, re-pins evicted models with `keep_alive=-1`. Keeps `qwen2.5:7b` (small/fast) and `$HOMEAI_MEDIUM_MODEL` (default: `qwen3.5:35b-a3b`) always loaded in VRAM. Medium model is configurable via env var for per-persona model assignment.
- Open WebUI — browser-based chat interface, runs as Docker container
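The daemon itself is a shell script, but its logic can be sketched in Python. This is a minimal sketch, assuming Ollama's documented `GET /api/ps` (lists loaded models) and `POST /api/generate` with `keep_alive: -1` (loads a model and pins it); the function names are illustrative, not the script's actual structure:

```python
import json
import os
import time
import urllib.request

OLLAMA = "http://localhost:11434"
# Models to keep pinned; HOMEAI_MEDIUM_MODEL mirrors the env var the daemon reads.
PINNED = ["qwen2.5:7b", os.environ.get("HOMEAI_MEDIUM_MODEL", "qwen3.5:35b-a3b")]

def loaded_models():
    """Return the set of model names currently resident in VRAM (GET /api/ps)."""
    with urllib.request.urlopen(f"{OLLAMA}/api/ps") as resp:
        return {m["name"] for m in json.load(resp).get("models", [])}

def models_to_repin(pinned, loaded):
    """Pure decision step: which pinned models were evicted and need reloading."""
    return [m for m in pinned if m not in loaded]

def repin(model):
    """Reload a model pinned forever (keep_alive=-1) via an empty generate call."""
    body = json.dumps({"model": model, "prompt": "", "keep_alive": -1}).encode()
    req = urllib.request.Request(f"{OLLAMA}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req).read()

def run_loop(interval=300):
    """Check every 5 minutes, matching the daemon's cadence."""
    while True:
        for model in models_to_repin(PINNED, loaded_models()):
            repin(model)
        time.sleep(interval)
```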
### Image Generation
- ComfyUI — primary image generation UI, node-based workflows
- Target models: SDXL, Flux.1, ControlNet
- Runs via Metal (Apple GPU API)
### Speech
- Whisper.cpp — speech-to-text, optimised for Apple Silicon/Neural Engine
- Kokoro TTS — fast, lightweight text-to-speech (primary, low-latency, local)
- ElevenLabs TTS — cloud voice cloning/synthesis (per-character voice ID, routed via state file)
- Chatterbox TTS — voice cloning engine (Apple Silicon MPS optimised)
- Qwen3-TTS — alternative voice cloning via MLX
- openWakeWord — always-on wake word detection
### Smart Home
- Home Assistant — smart home control platform (Docker)
- Wyoming Protocol — bridges Whisper STT + Kokoro/Piper TTS into Home Assistant
- Music Assistant — self-hosted music control (Docker on Pi at 10.0.0.199:8095), Spotify + SMB library + Chromecast players
- Snapcast — multi-room synchronised audio output
### AI Agent / Orchestration
- OpenClaw — primary AI agent layer; receives voice commands, calls tools, manages personality
- OpenClaw Skills — 13 skills total: home-assistant, image-generation, voice-assistant, vtube-studio, memory, service-monitor, character, routine, music, workflow, gitea, calendar, mode
- n8n — visual workflow automation (Docker), chains AI actions
- Character Memory System — SQLite + sqlite-vec semantic search (personal per-character + general shared + follow-ups), injected into LLM system prompt with context-aware retrieval
- Prompt Styles — 6 styles (quick, standard, creative, roleplayer, game-master, storyteller) with per-style model routing, temperature, and section stripping. JSON templates in `homeai-agent/prompt-styles/`
- Public/Private Mode — routes requests to local Ollama (private) or cloud LLMs (public) with per-category overrides via `active-mode.json`. Default primary model is Claude Sonnet 4.6, with per-style model tiering (Haiku/Sonnet/Opus).
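The mode resolution can be sketched as a pure lookup. The key names in the example state below are hypothetical (the real shape of `active-mode.json` may differ); the point is the precedence of per-category overrides over the global mode:

```python
# Hypothetical shape of ~/homeai-data/active-mode.json; actual field names may differ.
EXAMPLE_MODE = {
    "mode": "public",                      # "public" (cloud) or "private" (local)
    "overrides": {"image": "private"},     # per-category overrides
    "cloud_model": "claude-sonnet-4-6",    # default primary model (illustrative ID)
    "local_model": "qwen3.5:35b-a3b",
}

def resolve_route(state, category=None):
    """Resolve a request to (mode, model): a per-category override wins
    over the global mode; the mode then picks the cloud or local model."""
    mode = state.get("overrides", {}).get(category, state["mode"])
    model = state["cloud_model"] if mode == "public" else state["local_model"]
    return mode, model
```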
### Character & Personality
- Character Schema v2 — JSON spec with background, dialogue_style, appearance, skills, gaze_presets, dream_id, gaze_character, prompt style overrides (v1 auto-migrated)
- HomeAI Dashboard — unified web app: character editor, chat, memory manager, service dashboard
- Dream — character management service (http://10.0.0.101:3000), REST API for character CRUD with GAZE integration for cover images
- Character MCP Server — LLM-assisted character creation via Fandom wiki/Wikipedia lookup (Docker)
- GAZE — image generation service (http://10.0.0.101:5782), REST API for presets, characters, and job-based image generation
- Character config stored as JSON files in `~/homeai-data/characters/`, consumed by the bridge for system prompt construction
### Visual Representation
- VTube Studio — Live2D model display on desktop (macOS) and mobile (iOS/Android)
- VTube Studio WebSocket API used to drive expressions from the AI pipeline
- LVGL — simplified animated face on ESP32-S3-BOX-3 units
- Live2D model: to be sourced/commissioned (nizima.com or booth.pm)
### Room Presence (Smart Speaker Replacement)
- ESP32-S3-BOX-3 units — one per room
- Flashed with ESPHome
- Acts as Wyoming Satellite (mic input → Mac Mini → TTS audio back)
- LVGL display shows animated face + status info
- Communicates over local WiFi
### Infrastructure
- Docker Desktop for Mac — containerises Home Assistant, Open WebUI, n8n, etc.
- Tailscale — secure remote access to all services, no port forwarding
- Authelia — 2FA authentication layer for exposed web UIs
- Portainer — Docker container management UI
- Uptime Kuma — service health monitoring and mobile alerts
- Gitea — self-hosted Git server for all project code and configs
- code-server — browser-based VS Code for remote development
## Voice Pipeline (End-to-End)
ESP32-S3-BOX-3 (room)
→ Wake word detected (openWakeWord, runs locally on device or Mac Mini)
→ Audio streamed to Mac Mini via Wyoming Satellite
→ Whisper MLX transcribes speech to text
→ HA conversation agent → OpenClaw HTTP Bridge
→ Bridge resolves character (satellite_id → character mapping)
→ Bridge builds system prompt (profile + memories) and writes TTS config to state file
→ Bridge checks active-mode.json for model routing (private=local, public=cloud)
→ OpenClaw CLI → LLM generates response (Claude Haiku/Sonnet/Opus per style, Ollama fallback)
→ Response dispatched:
→ Wyoming TTS reads state file → routes to Kokoro (local) or ElevenLabs (cloud)
→ Audio sent back to ESP32-S3-BOX-3 (spoken response)
→ VTube Studio API triggered (expression + lip sync on desktop/mobile)
→ Home Assistant action called if applicable (lights, music, etc.)
## Timeout Strategy

The HTTP bridge checks Ollama's `/api/ps` endpoint before each request to determine whether the model is already loaded:
| Layer | Warm (model loaded) | Cold (model loading) |
|---|---|---|
| HA conversation component | 200s | 200s |
| OpenClaw HTTP bridge | 60s | 180s |
| OpenClaw agent | 60s | 60s |
The keep-warm daemon ensures models stay loaded, so cold starts should be rare (only after Ollama restarts or VRAM pressure).
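The bridge's warm/cold check can be sketched as follows. This is a minimal sketch assuming Ollama's `GET /api/ps` endpoint; the function names are illustrative, and the timeout values come from the bridge row of the table above:

```python
import json
import urllib.request

# Warm/cold bridge timeouts from the table above (seconds).
BRIDGE_TIMEOUT_WARM = 60
BRIDGE_TIMEOUT_COLD = 180

def pick_bridge_timeout(model, loaded):
    """Pure step: choose the bridge timeout based on whether the model is loaded."""
    return BRIDGE_TIMEOUT_WARM if model in loaded else BRIDGE_TIMEOUT_COLD

def bridge_timeout(model, ollama="http://localhost:11434"):
    """Query GET /api/ps for loaded models, then pick the matching timeout."""
    with urllib.request.urlopen(f"{ollama}/api/ps") as resp:
        loaded = {m["name"] for m in json.load(resp).get("models", [])}
    return pick_bridge_timeout(model, loaded)
```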
## Character System
The AI assistant has a defined personality managed via the HomeAI Dashboard (character editor + memory manager).
### Character Schema v2

Each character is a JSON file in `~/homeai-data/characters/` with:
- System prompt — core personality, injected into every LLM request
- Profile fields — background, appearance, dialogue_style, skills array
- TTS config — engine (kokoro/elevenlabs), `kokoro_voice`, `elevenlabs_voice_id`, `elevenlabs_model`, speed
- GAZE presets — array of `{preset, trigger}` for image generation styles
- Dream link — `dream_id` for syncing character data from Dream service
- GAZE link — `gaze_character` for auto-assigned cover image and presets
- Prompt style config — `default_prompt_style`, `prompt_style_overrides` for per-style tuning
- Custom prompt rules — trigger/response overrides for specific contexts
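Putting the fields above together, a character file might look roughly like this. The exact key layout is not confirmed by the schema spec; values are placeholders and the nesting (e.g. a `tts` object) is an assumption for illustration:

```json
{
  "schema_version": 2,
  "name": "Example",
  "system_prompt": "You are ...",
  "background": "...",
  "appearance": "...",
  "dialogue_style": "...",
  "skills": ["music", "calendar"],
  "tts": {
    "engine": "kokoro",
    "kokoro_voice": "af_heart",
    "speed": 1.0
  },
  "gaze_presets": [{"preset": "portrait", "trigger": "selfie"}],
  "dream_id": "...",
  "gaze_character": "...",
  "default_prompt_style": "standard",
  "prompt_style_overrides": {"creative": {"temperature": 0.9}}
}
```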
### Memory System

SQLite + sqlite-vec database at `~/homeai-data/memories/memories.db`:
- Personal memories — per-character, semantic/episodic/relational/opinion types
- General memories — shared operational knowledge (character_id = "general")
- Follow-ups — LLM-driven questions injected into system prompt, auto-resolve after 2 surfacings or 48h
- Privacy levels — `public`, `sensitive`, `local_only` (`local_only` excluded from cloud model requests)
- Semantic search — sentence-transformers all-MiniLM-L6-v2 embeddings (384 dims) for context-aware retrieval
- Core module: `homeai-agent/memory_store.py` (imported by bridge + memory-ctl skill)
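The retrieval and privacy-filtering step can be illustrated without the database. This is a pure-Python sketch of the ranking logic only: the real store uses sqlite-vec's vector search over MiniLM embeddings rather than in-memory cosine similarity, and the field names here are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(memories, query_vec, target="cloud", k=3):
    """Rank memories by embedding similarity against the query, excluding
    local_only rows when the request is routed to a cloud model."""
    visible = [m for m in memories
               if not (target == "cloud" and m["privacy"] == "local_only")]
    visible.sort(key=lambda m: cosine(m["embedding"], query_vec), reverse=True)
    return visible[:k]
```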
### Prompt Styles

Six response styles in `homeai-agent/prompt-styles/`, each a JSON template with model, temperature, and instructions:
- quick — Claude Haiku 4.5, low temp, brief responses, strips profile sections
- standard — Claude Sonnet 4.6, balanced
- creative — Claude Sonnet 4.6, higher temp, elaborative
- roleplayer — Claude Opus 4.6, full personality injection
- game-master — Claude Opus 4.6, narrative-focused
- storyteller — Claude Opus 4.6, story-centric
Style selection: the dashboard chat has a style picker; characters can set `default_prompt_style`; satellites use the global active style. The bridge resolves the model per style → group → mode → default.
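The selection hierarchy above can be sketched as two small lookups. The model ID strings are illustrative assumptions (the authoritative mapping lives in the JSON templates under `homeai-agent/prompt-styles/`), but the precedence order matches the text:

```python
# Illustrative style → model-tier mapping; real IDs come from the JSON templates.
STYLE_MODELS = {
    "quick": "claude-haiku-4-5",
    "standard": "claude-sonnet-4-6",
    "creative": "claude-sonnet-4-6",
    "roleplayer": "claude-opus-4-6",
    "game-master": "claude-opus-4-6",
    "storyteller": "claude-opus-4-6",
}

def resolve_style(user_pick, character_default, global_active):
    """Hierarchy: explicit user pick → character default → global active style."""
    return user_pick or character_default or global_active

def model_for(style, fallback="claude-sonnet-4-6"):
    """Map a style to its model tier, falling back to the default primary model."""
    return STYLE_MODELS.get(style, fallback)
```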
### TTS Voice Routing

The bridge writes the active character's TTS config to `~/homeai-data/active-tts-voice.json` before each request. The Wyoming TTS server reads this state file to determine which engine/voice to use:
- Kokoro — local, fast, uses `kokoro_voice` field (e.g., `af_heart`)
- ElevenLabs — cloud, uses `elevenlabs_voice_id` + `elevenlabs_model`, returns PCM 24kHz
This works for both the ESP32/HA pipeline and dashboard chat.
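Both sides of the state-file handoff can be sketched as follows. This is a hedged sketch: the exact keys written to `active-tts-voice.json` and the fallback voice are assumptions, but the flow (bridge writes before the request, Wyoming server reads to route) matches the description above:

```python
import json
from pathlib import Path

STATE_FILE = Path.home() / "homeai-data" / "active-tts-voice.json"

def write_tts_state(character, path=STATE_FILE):
    """Bridge side: persist the active character's TTS config before the request.
    Only the TTS-relevant fields are copied through (key names assumed)."""
    keys = ("engine", "kokoro_voice", "elevenlabs_voice_id", "elevenlabs_model", "speed")
    path.write_text(json.dumps({k: character[k] for k in keys if k in character}))

def read_tts_route(path=STATE_FILE):
    """Wyoming TTS side: decide (engine, voice) from the state file,
    defaulting to local Kokoro when no engine is set."""
    state = json.loads(path.read_text())
    if state.get("engine") == "elevenlabs":
        return "elevenlabs", state["elevenlabs_voice_id"]
    return "kokoro", state.get("kokoro_voice", "af_heart")
```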
## Project Priorities
- Foundation — Docker stack up (Home Assistant, Open WebUI, Portainer, Uptime Kuma) ✅
- LLM — Ollama running with target models, Open WebUI connected ✅
- Voice pipeline — Whisper → Ollama → Kokoro → Wyoming → Home Assistant ✅
- OpenClaw — installed, onboarded, connected to Ollama and Home Assistant ✅
- ESP32-S3-BOX-3 — ESPHome flash, Wyoming Satellite, display faces ✅
- Character system — schema v2, dashboard editor, memory system, per-character TTS routing ✅
- OpenClaw skills expansion — 9 new skills (memory, monitor, character, routine, music, workflow, gitea, calendar, mode) + public/private mode routing ✅
- Music Assistant — deployed on Pi (10.0.0.199:8095), Spotify + SMB + Chromecast players ✅
- Memory v2 + Prompt Styles + Dream/GAZE — SQLite memory with semantic search, 6 prompt styles with model tiering, Dream character import, GAZE character linking ✅
- Animated visual — PNG/GIF character visual for the web assistant (initial visual layer)
- Android app — companion app for mobile access to the assistant
- ComfyUI — image generation online, character-consistent model workflows
- Extended integrations — Snapcast, code-server
- Polish — Authelia, Tailscale hardening, iOS widgets
## Stretch Goals
- Live2D / VTube Studio — full Live2D model with WebSocket API bridge (requires learning Live2D tooling)
## Key Paths & Conventions
- Launchd plists (source): `homeai-*/launchd/` (symlinked to `~/Library/LaunchAgents/`)
- Docker compose (Mac Mini): `homeai-infra/docker/docker-compose.yml`
- Docker compose (Pi/SELBINA): `~/docker/selbina/` on 10.0.0.199
- OpenClaw skills: `~/.openclaw/skills/`
- OpenClaw workspace tools: `~/.openclaw/workspace/TOOLS.md`
- OpenClaw config: `~/.openclaw/openclaw.json`
- Character configs: `~/homeai-data/characters/`
- Character memories DB: `~/homeai-data/memories/memories.db`
- Memory store module: `homeai-agent/memory_store.py`
- Prompt style templates: `homeai-agent/prompt-styles/`
- Active prompt style: `~/homeai-data/active-prompt-style.json`
- Conversation history: `~/homeai-data/conversations/`
- Active TTS state: `~/homeai-data/active-tts-voice.json`
- Active mode state: `~/homeai-data/active-mode.json`
- Satellite → character map: `~/homeai-data/satellite-map.json`
- Local routines: `~/homeai-data/routines/`
- Voice reminders: `~/homeai-data/reminders.json`
- Whisper models: `~/models/whisper/`
- Ollama models: managed by Ollama at `~/.ollama/models/`
- ComfyUI models: `~/ComfyUI/models/`
- Voice reference audio: `~/voices/`
- Gitea repos root: `~/gitea/`
- Music Assistant (Pi): `~/docker/selbina/music-assistant/` on 10.0.0.199
- Skills user guide: `homeai-agent/SKILLS_GUIDE.md`
- Dream service: `http://10.0.0.101:3000` (character management, REST API)
- GAZE service: `http://10.0.0.101:5782` (image generation, REST API)
## Notes for Planning
- All services should survive a Mac Mini reboot (launchd or Docker restart policies)
- ESP32-S3-BOX-3 units are dumb satellites — all intelligence stays on Mac Mini
- The character JSON schema (from Character Manager) should be treated as a versioned spec; pipeline components read from it, never hardcode personality values
- OpenClaw skills are the primary extension mechanism — new capabilities = new skills
- Primary LLMs are Claude 4.5/4.6 family (Anthropic API) with per-style tiering; local Ollama models are available as fallback
- Launchd plists are symlinked from repo source to ~/Library/LaunchAgents/ — edit source, then bootout/bootstrap to reload
- Music Assistant runs on Pi (10.0.0.199), not Mac Mini — needs host networking for Chromecast mDNS discovery
- VTube Studio API bridge should be a standalone OpenClaw skill with clear event interface
- Memory DB (`memories.db`) should be backed up as part of regular Gitea commits
- Dream characters can be linked to GAZE characters for cover image fallback and cross-referencing
- Prompt style selection hierarchy: explicit user pick → character default → global active style