homeai/CLAUDE.md
Aodhan Collins 38247d7cc4 Initial project structure and planning docs
Full project plan across 8 sub-projects (homeai-infra, homeai-llm,
homeai-voice, homeai-agent, homeai-character, homeai-esp32,
homeai-visual, homeai-images). Includes per-project PLAN.md files,
top-level PROJECT_PLAN.md, and master TODO.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 01:11:37 +00:00


CLAUDE.md — Home AI Assistant Project

Project Overview

A self-hosted, always-on personal AI assistant running on a Mac Mini M4 Pro (64GB RAM, 1TB SSD). The goal is a modular, expandable system that replaces commercial smart speakers (Google Home, etc.) with a locally run AI that has a defined personality, voice, visual representation, and full smart home integration.


Hardware

| Component | Spec |
| --- | --- |
| Chip | Apple M4 Pro |
| CPU | 14-core |
| GPU | 20-core |
| Neural Engine | 16-core |
| RAM | 64GB unified memory |
| Storage | 1TB SSD |
| Network | Gigabit Ethernet |

All AI inference runs locally on this machine. No cloud dependency required (cloud APIs optional).


Core Stack

AI & LLM

  • Ollama — local LLM runtime (target models: Llama 3.3 70B, Qwen 2.5 72B)
  • Open WebUI — browser-based chat interface, runs as Docker container

Image Generation

  • ComfyUI — primary image generation UI, node-based workflows
  • Target models: SDXL and Flux.1, with ControlNet for guided generation
  • Runs on the Apple GPU via PyTorch's MPS backend (Metal)

Speech

  • Whisper.cpp — speech-to-text, optimised for Apple Silicon/Neural Engine
  • Kokoro TTS — fast, lightweight text-to-speech (primary, low-latency)
  • Chatterbox TTS — voice cloning engine (Apple Silicon MPS optimised)
  • Qwen3-TTS — alternative voice cloning via MLX
  • openWakeWord — always-on wake word detection

Smart Home

  • Home Assistant — smart home control platform (Docker)
  • Wyoming Protocol — bridges Whisper STT + Kokoro/Piper TTS into Home Assistant
  • Music Assistant — self-hosted music control, integrates with Home Assistant
  • Snapcast — multi-room synchronised audio output

AI Agent / Orchestration

  • OpenClaw — primary AI agent layer; receives voice commands, calls tools, manages personality
  • n8n — visual workflow automation (Docker), chains AI actions
  • mem0 — long-term memory layer for the AI character

Character & Personality

  • Character Manager (built — see character-manager.jsx) — single config UI for personality, prompts, models, Live2D mappings, and notes
  • Character config exports to JSON, consumed by OpenClaw system prompt and pipeline

Visual Representation

  • VTube Studio — Live2D model display on desktop (macOS) and mobile (iOS/Android)
  • VTube Studio WebSocket API used to drive expressions from the AI pipeline
  • LVGL — simplified animated face on ESP32-S3-BOX-3 units
  • Live2D model: to be sourced/commissioned (nizima.com or booth.pm)
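Driving expressions over the VTube Studio WebSocket API means sending JSON messages; a minimal sketch of building one (the expression file name is a placeholder for whatever the sourced Live2D model ships with, and real use also requires an authentication handshake):

```python
import json

# Minimal sketch of building a VTube Studio API message. The real
# protocol is JSON over WebSocket (default ws://localhost:8001).
API_NAME = "VTubeStudioPublicAPI"
API_VERSION = "1.0"

def vts_expression_request(expression_file: str, active: bool,
                           request_id: str = "homeai-1") -> str:
    """Build an ExpressionActivationRequest payload as a JSON string."""
    return json.dumps({
        "apiName": API_NAME,
        "apiVersion": API_VERSION,
        "requestID": request_id,
        "messageType": "ExpressionActivationRequest",
        "data": {"expressionFile": expression_file, "active": active},
    })

# Placeholder expression file; depends on the actual model.
msg = vts_expression_request("happy.exp3.json", True)
```

Keeping the payload construction separate from the WebSocket plumbing makes it easy to unit-test and to wrap as an OpenClaw skill later.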

Room Presence (Smart Speaker Replacement)

  • ESP32-S3-BOX-3 units — one per room
  • Flashed with ESPHome
  • Acts as Wyoming Satellite (mic input → Mac Mini → TTS audio back)
  • LVGL display shows animated face + status info
  • Communicates over local WiFi
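The satellite setup above can be sketched as an ESPHome config; key names follow ESPHome's voice_assistant and i2s_audio components, but the device name is a placeholder and the real BOX-3 audio wiring is more involved:

```yaml
# Hypothetical ESPHome sketch for an ESP32-S3-BOX-3 satellite.
esphome:
  name: living-room-satellite   # placeholder name

esp32:
  board: esp32s3box
  framework:
    type: esp-idf

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

api:   # native API connection to Home Assistant

microphone:
  - platform: i2s_audio
    id: box_mic

voice_assistant:
  microphone: box_mic
```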

Infrastructure

  • Docker Desktop for Mac — containerises Home Assistant, Open WebUI, n8n, etc.
  • Tailscale — secure remote access to all services, no port forwarding
  • Authelia — 2FA authentication layer for exposed web UIs
  • Portainer — Docker container management UI
  • Uptime Kuma — service health monitoring and mobile alerts
  • Gitea — self-hosted Git server for all project code and configs
  • code-server — browser-based VS Code for remote development

Voice Pipeline (End-to-End)

ESP32-S3-BOX-3 (room)
  → Wake word detected (openWakeWord, runs locally on device or Mac Mini)
  → Audio streamed to Mac Mini via Wyoming Satellite
  → Whisper.cpp transcribes speech to text
  → OpenClaw receives text + context
  → Ollama LLM generates response (with character persona from system prompt)
  → mem0 updates long-term memory
  → Response dispatched:
      → Kokoro/Chatterbox renders TTS audio
      → Audio sent back to ESP32-S3-BOX-3 (spoken response)
      → VTube Studio API triggered (expression + lip sync on desktop/mobile)
      → Home Assistant action called if applicable (lights, music, etc.)

Character System

The AI assistant has a defined personality managed via the Character Manager tool.

Key config surfaces:

  • System prompt — injected into every Ollama request
  • Voice clone — path to the reference .wav used by Chatterbox/Qwen3-TTS
  • Live2D expression mappings — idle, speaking, thinking, happy, error states
  • VTube Studio WebSocket triggers — JSON map of events to expressions
  • Custom prompt rules — trigger/response overrides for specific contexts
  • mem0 — persistent memory that evolves over time

Character config JSON (exported from Character Manager) is the single source of truth consumed by all pipeline components.
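A sketch of loading that exported JSON with loud failure on missing keys; the field names and the sample character are hypothetical, since the actual schema is defined by Character Manager:

```python
import json

# Hypothetical required fields; the real schema comes from
# Character Manager and should be treated as a versioned spec.
REQUIRED_KEYS = {"name", "system_prompt", "voice_reference", "expressions"}

def load_character(raw_json: str) -> dict:
    """Parse a character config export, failing loudly on missing keys."""
    config = json.loads(raw_json)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"character config missing keys: {sorted(missing)}")
    return config

# Entirely made-up sample character for illustration.
sample = json.dumps({
    "name": "Aria",
    "system_prompt": "You are Aria...",
    "voice_reference": "~/voices/aria.wav",
    "expressions": {"idle": "idle.exp3.json", "happy": "happy.exp3.json"},
})
character = load_character(sample)
```

Validating once at load time means every pipeline component can trust the same parsed config rather than re-checking fields.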


Project Priorities

  1. Foundation — Docker stack up (Home Assistant, Open WebUI, Portainer, Uptime Kuma)
  2. LLM — Ollama running with target models, Open WebUI connected
  3. Voice pipeline — Whisper → Ollama → Kokoro → Wyoming → Home Assistant
  4. OpenClaw — installed, onboarded, connected to Ollama and Home Assistant
  5. ESP32-S3-BOX-3 — ESPHome flash, Wyoming Satellite, LVGL face
  6. Character system — system prompt wired up, mem0 integrated, voice cloned
  7. VTube Studio — model loaded, WebSocket API bridge written as OpenClaw skill
  8. ComfyUI — image generation online, character-consistent model workflows
  9. Extended integrations — n8n workflows, Music Assistant, Snapcast, Gitea, code-server
  10. Polish — Authelia, Tailscale hardening, mobile companion, iOS widgets

Key Paths & Conventions

  • All Docker compose files: ~/server/docker/
  • OpenClaw skills: ~/.openclaw/skills/
  • Character configs: ~/.openclaw/characters/
  • Whisper models: ~/models/whisper/
  • Ollama models: managed by Ollama at ~/.ollama/models/
  • ComfyUI models: ~/ComfyUI/models/
  • Voice reference audio: ~/voices/
  • Gitea repos root: ~/gitea/

Notes for Planning

  • All services should survive a Mac Mini reboot (launchd or Docker restart policies)
  • ESP32-S3-BOX-3 units are dumb satellites — all intelligence stays on Mac Mini
  • The character JSON schema (from Character Manager) should be treated as a versioned spec; pipeline components read from it, never hardcode personality values
  • OpenClaw skills are the primary extension mechanism — new capabilities = new skills
  • Prefer local models; cloud API keys (Anthropic, OpenAI) are fallback only
  • VTube Studio API bridge should be a standalone OpenClaw skill with clear event interface
  • mem0 memory store should be backed up as part of regular Gitea commits
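For the reboot-survival note, Docker restart policies cover the containerised services; a minimal fragment (service name and paths illustrative, and Docker Desktop itself must be set to launch at login for this to kick in):

```yaml
# Hypothetical compose fragment: restart policy so services
# come back automatically after a Mac Mini reboot.
services:
  homeassistant:
    image: ghcr.io/home-assistant/home-assistant:stable
    restart: unless-stopped
    volumes:
      - ./homeassistant/config:/config
```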