Aodhan Collins 6a0bae2a0b feat(phase-04): Wyoming Satellite integration + OpenClaw HA components
## Voice Pipeline (P3)
- Replace openWakeWord daemon with Wyoming Satellite approach
- Add Wyoming Satellite service on port 10700 for HA voice pipeline
- Update setup.sh with cross-platform sed compatibility (macOS/Linux)
- Add version field to Kokoro TTS voice info
- Update launchd service loader to use Wyoming Satellite

## Home Assistant Integration (P4)
- Add custom conversation agent component (openclaw_conversation)
  - Fix: Use IntentResponse instead of plain strings (HA API requirement)
  - Support both HTTP API and CLI fallback modes
  - Config flow for easy HA UI setup
- Add OpenClaw bridge scripts (Python + Bash)
- Add ha-ctl utility for HA entity control
  - Fix: Use context manager for token file reading
- Add HA configuration examples and documentation

## Infrastructure
- Add mem0 backup automation (launchd + script)
- Add n8n workflow templates (morning briefing, notification router)
- Add VS Code workspace configuration
- Reorganize model files into categorized folders:
  - lmstudio-community/
  - mlx-community/
  - bartowski/
  - mradermacher/

## Documentation
- Update PROJECT_PLAN.md with Wyoming Satellite architecture
- Update TODO.md with completed Wyoming integration tasks
- Add OPENCLAW_INTEGRATION.md for HA setup guide

## Testing
- Verified Wyoming services running (STT:10300, TTS:10301, Satellite:10700)
- Verified OpenClaw CLI accessibility
- Confirmed cross-platform compatibility fixes
2026-03-08 02:06:37 +00:00


HomeAI — Full Project Plan

Last updated: 2026-03-04


Overview

This project builds a self-hosted, always-on AI assistant running entirely on a Mac Mini M4 Pro. It is decomposed into 8 sub-projects that can be developed in parallel where dependencies allow, then bridged via well-defined interfaces.

The guiding principle: each sub-project exposes a clean API/config surface. No project hard-codes knowledge of another's internals.


Sub-Project Map

| ID | Name | Description | Primary Language |
|----|------|-------------|------------------|
| P1 | homeai-infra | Docker stack, networking, monitoring, secrets | YAML / Shell |
| P2 | homeai-llm | Ollama + Open WebUI setup, model management | YAML / Shell |
| P3 | homeai-voice | STT, TTS, Wyoming bridge, wake word | Python / Shell |
| P4 | homeai-agent | OpenClaw config, skills, n8n workflows, mem0 | Python / JSON |
| P5 | homeai-character | Character Manager UI, persona JSON schema, voice clone | React / JSON |
| P6 | homeai-esp32 | ESPHome firmware, Wyoming Satellite, LVGL face | C++ / YAML |
| P7 | homeai-visual | VTube Studio bridge, Live2D expression mapping | Python / JSON |
| P8 | homeai-images | ComfyUI workflows, model management, ControlNet | Python / JSON |

All repos live under ~/gitea/homeai/ on the Mac Mini and are mirrored to the self-hosted Gitea instance (set up in P1).


Phase 1 — Foundation (P1 + P2)

Goal: Everything containerised, stable, accessible remotely. LLM responsive via browser.

P1: homeai-infra

Deliverables:

  • docker-compose.yml — master compose file (or per-service files under ~/server/docker/)
  • Services: Home Assistant, Portainer, Uptime Kuma, Gitea, code-server
  • Tailscale installed on Mac Mini, all services on Tailnet
  • Gitea repos initialised, SSH keys configured
  • Uptime Kuma monitors all service endpoints
  • Docker restart policies: unless-stopped on all containers
  • Documented .env file pattern (secrets never committed)

Key decisions:

  • Single docker-compose.yml vs per-service compose files — recommend per-service files in ~/server/docker/<service>/ orchestrated by a root Makefile
  • Tailscale as sole remote access method (no public port forwarding)
  • Authelia deferred to Phase 4 polish (internal LAN services don't need 2FA immediately)

Interface contract: Exposes service URLs as env vars (e.g. HA_URL, GITEA_URL) written to ~/server/.env.services — consumed by all other projects.
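As an illustration, a consumer project might read that file with a few lines of Python. The parser below is a sketch; the actual key names are whatever P1 writes to `.env.services`:

```python
from pathlib import Path

def load_service_env(path: str = "~/server/.env.services") -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in Path(path).expanduser().read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env
```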


P2: homeai-llm

Deliverables:

  • Ollama installed natively on Mac Mini (not Docker — needs Metal GPU access)
  • Models pulled: llama3.3:70b, qwen2.5:72b (and a fast small model: qwen2.5:7b for low-latency tasks)
  • Open WebUI running as Docker container, connected to Ollama
  • Model benchmark script — measures tokens/sec per model
  • ollama-models.txt — pinned model manifest for reproducibility
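The benchmark arithmetic is simple: Ollama's generate stats report `eval_count` (tokens produced) and `eval_duration` (nanoseconds), so the core of the benchmark script reduces to one division. Field names below follow Ollama's `/api/generate` response and should be verified against the installed version:

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """tok/s from Ollama generate stats: eval_count tokens over
    eval_duration nanoseconds."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)
```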

Key decisions:

  • Ollama runs as a launchd service (~/Library/LaunchAgents/) to survive reboots
  • Open WebUI exposed only on Tailnet
  • API endpoint: http://localhost:11434 (Ollama default)

Interface contract: Ollama OpenAI-compatible API at http://localhost:11434/v1 — used by P3, P4, P7.


Phase 2 — Voice Pipeline (P3)

Goal: Full end-to-end voice: speak → transcribe → LLM → TTS → hear response. No ESP32 yet — test with a USB mic on Mac Mini.

P3: homeai-voice

Deliverables:

  • Whisper.cpp compiled for Apple Silicon, model downloaded (medium.en or large-v3)
  • Kokoro TTS installed, tested, latency benchmarked
  • Chatterbox TTS installed (MPS optimised build), voice reference .wav ready
  • Qwen3-TTS via MLX installed as fallback
  • openWakeWord running on Mac Mini, detecting wake word (initial approach; superseded by the Wyoming satellite, see note below)
  • Wyoming protocol server running — bridges STT+TTS into Home Assistant
  • Home Assistant voice_assistant pipeline configured end-to-end
  • Test script: test_voice_pipeline.sh — mic in → spoken response out

Sub-components:

[Mic] → Wyoming Satellite (port 10700) → Home Assistant Voice Pipeline → Wyoming STT (Whisper)
                                                                      ↓
[Speaker] ← Wyoming TTS (Kokoro) ← OpenClaw Agent ← transcribed text

Note: The original openWakeWord daemon has been replaced by the Wyoming satellite approach, which handles wake word detection through Home Assistant's voice pipeline.

Key decisions:

  • Whisper.cpp runs as a Wyoming STT provider (via wyoming-faster-whisper)
  • Kokoro is primary TTS; Chatterbox used when voice cloning is active (P5)
  • Wyoming satellite runs on port 10700 — handles audio I/O and connects to HA voice pipeline
  • openWakeWord daemon disabled — wake word detection now handled by HA via Wyoming satellite
  • Wyoming server ports: 10300 (STT), 10301 (TTS), 10700 (Satellite) — standard Wyoming ports

Interface contract:

  • Wyoming STT: tcp://localhost:10300 (Whisper large-v3)
  • Wyoming TTS: tcp://localhost:10301 (Kokoro ONNX)
  • Wyoming Satellite: tcp://localhost:10700 (Mac Mini audio I/O)
  • Direct Python API for P4 (agent bypasses Wyoming for non-HA calls)
  • OpenClaw Bridge: homeai-agent/skills/home-assistant/openclaw_bridge.py (HA integration)

Phase 3 — AI Agent & Character (P4 + P5)

Goal: OpenClaw receives voice/text input, applies character persona, calls tools, returns rich responses.

P4: homeai-agent

Deliverables:

  • OpenClaw installed and configured
  • Connected to Ollama (llama3.3:70b as primary model)
  • Connected to Home Assistant (long-lived access token in config)
  • mem0 installed, configured with local storage backend
  • mem0 backup job: daily git commit to Gitea
  • Core skills written:
    • home_assistant.py — call HA services (lights, switches, scenes)
    • memory.py — read/write mem0 memories
    • weather.py — local weather via HA sensor data
    • timer.py — set timers/reminders
    • music.py — stub for Music Assistant (Phase 7)
  • n8n running as Docker container, webhook trigger from OpenClaw
  • Sample n8n workflow: morning briefing (time + weather + calendar)
  • System prompt template: loads character JSON from P5

Key decisions:

  • OpenClaw config at ~/.openclaw/config.yaml
  • Skills at ~/.openclaw/skills/ — one file per skill, auto-discovered
  • System prompt: ~/.openclaw/characters/<active>.json loaded at startup
  • mem0 store: local file backend at ~/.openclaw/memory/ (SQLite)
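Auto-discovery of a one-file-per-skill directory can be sketched with `importlib`. This is an illustrative pattern under the conventions above, not OpenClaw's actual loader:

```python
import importlib.util
from pathlib import Path

def discover_skills(skills_dir: str) -> dict:
    """Load every *.py file in the skills directory as a module,
    keyed by file stem (e.g. weather.py -> 'weather')."""
    skills = {}
    for path in sorted(Path(skills_dir).expanduser().glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs top-level skill code
        skills[path.stem] = module
    return skills
```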

Interface contract:

  • OpenClaw exposes a local HTTP API (default port 8080) — used by P3 (voice pipeline hands off transcribed text here)
  • Consumes character JSON from P5

P5: homeai-character

Deliverables:

  • Character Manager UI (character-manager.jsx) — already exists, needs wiring
  • Character JSON schema v1 defined and documented
  • Export produces ~/.openclaw/characters/<name>.json
  • Fields: name, system_prompt, voice_ref_path, tts_engine, live2d_expressions, vtube_ws_triggers, custom_rules, model_overrides
  • Validation: schema validator script rejects malformed exports
  • Sample character: aria.json (default assistant persona)
  • Voice clone: reference .wav recorded/sourced, placed at ~/voices/<name>.wav
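The validator deliverable could start as small as this sketch. The field list is taken from the schema fields above; the real validator should check against character.schema.json:

```python
REQUIRED_FIELDS = {
    "schema_version", "name", "system_prompt", "voice_ref_path",
    "tts_engine", "live2d_expressions", "vtube_ws_triggers",
    "custom_rules", "model_overrides",
}

def validate_character(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the export is valid."""
    problems = []
    if doc.get("schema_version") != 1:
        problems.append(f"unsupported schema_version: {doc.get('schema_version')!r}")
    missing = REQUIRED_FIELDS - doc.keys()
    problems.extend(f"missing field: {f}" for f in sorted(missing))
    return problems
```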

Key decisions:

  • JSON schema is versioned ("schema_version": 1) — pipeline components check version before loading
  • Character Manager is a local React app (served by Vite dev server or built to static files)
  • Single active character at a time; OpenClaw watches the file for changes (hot reload)

Interface contract:

  • Output: ~/.openclaw/characters/<name>.json — consumed by P4, P3 (TTS voice selection), P7 (expression mapping)
  • Schema published in homeai-character/schema/character.schema.json

Phase 4 — Hardware Satellites (P6)

Goal: ESP32-S3-BOX-3 units act as room presence nodes — wake word, mic input, audio output, animated face.

P6: homeai-esp32

Deliverables:

  • ESPHome config for ESP32-S3-BOX-3 (esphome/s3-box-living-room.yaml, etc.)
  • Wyoming Satellite component configured — streams mic audio to Mac Mini Wyoming STT
  • Audio playback: receives TTS audio from Mac Mini, plays via built-in speaker
  • LVGL face: animated idle/speaking/thinking states
  • Wake word: either on-device (microWakeWord via ESPHome) or streamed to the Mac Mini and handled by the HA voice pipeline (the standalone openWakeWord daemon was retired in P3)
  • OTA update mechanism configured
  • One unit per room — config templated with room name as variable

LVGL Face States:

| State | Animation |
|-------|-----------|
| Idle | Slow blink, gentle sway |
| Listening | Eyes wide, mic indicator |
| Thinking | Eyes narrow, loading dots |
| Speaking | Mouth animation synced to audio |
| Error | Red eyes, shake |

Key decisions:

  • Wake word on-device preferred (lower latency, no always-on network stream)
  • microWakeWord model: hey_jarvis or custom trained word
  • LVGL animations compiled into ESPHome firmware (no runtime asset loading)
  • Each unit has a unique device name for HA entity naming

Interface contract:

  • Wyoming Satellite → Mac Mini Wyoming STT server (tcp://<mac-mini-ip>:10300)
  • Receives audio back via Wyoming TTS response
  • LVGL state driven by Home Assistant entity state (HA → ESPHome event)

Phase 5 — Visual Layer (P7)

Goal: VTube Studio shows Live2D model on desktop/mobile; expressions driven by AI pipeline state.

P7: homeai-visual

Deliverables:

  • VTube Studio installed on Mac Mini (macOS app)
  • Live2D model loaded (sourced from nizima.com or booth.pm)
  • VTube Studio WebSocket API enabled (port 8001)
  • OpenClaw skill: vtube_studio.py
    • Connects to VTube Studio WebSocket
    • Auth token exchange and persistence
    • Methods: trigger_expression(name), trigger_hotkey(name), set_parameter(name, value)
  • Expression map in character JSON → VTube hotkey IDs
  • Lip sync: driven by audio envelope or TTS phoneme timing
  • Mobile: VTube Studio on iOS/Android connected to same model via Tailscale
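The trigger_expression / trigger_hotkey methods ultimately send a hotkey-trigger message over the WebSocket. A sketch of the message builder follows; the shape is based on the VTube Studio public API, so the messageType and field names should be verified against the installed version:

```python
import json

def build_hotkey_request(hotkey_id: str, request_id: str = "homeai-1") -> str:
    """JSON message for the VTube Studio WebSocket API (shape assumed
    from the public API docs; confirm before relying on it)."""
    return json.dumps({
        "apiName": "VTubeStudioPublicAPI",
        "apiVersion": "1.0",
        "requestID": request_id,
        "messageType": "HotkeyTriggerRequest",
        "data": {"hotkeyID": hotkey_id},
    })
```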

Key decisions:

  • Expression trigger events: idle, speaking, thinking, happy, sad, error
  • Lip sync approach: simple amplitude-based (fast) rather than phoneme-based (complex) initially
  • Auth token stored at ~/.openclaw/vtube_token.json
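Amplitude-based lip sync reduces to mapping each audio frame's RMS level onto a 0..1 mouth-open parameter. A minimal sketch, with an illustrative noise floor; real use would smooth the value across frames:

```python
import math

def mouth_open(samples: list[float], floor: float = 0.01) -> float:
    """Map an audio frame's RMS amplitude to a 0..1 mouth-open value."""
    if not samples:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms <= floor:
        return 0.0  # below the noise floor: mouth closed
    return min(1.0, rms)
```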

Interface contract:

  • OpenClaw calls vtube_studio.trigger_expression(event) from within response pipeline
  • Event names defined in character JSON live2d_expressions field

Phase 6 — Image Generation (P8)

Goal: ComfyUI online with character-consistent image generation workflows.

P8: homeai-images

Deliverables:

  • ComfyUI installed at ~/ComfyUI/, running via launchd
  • Models downloaded: SDXL base, Flux.1-dev (or schnell), ControlNet (canny, depth)
  • Character LoRA: trained on character reference images for consistent appearance
  • Saved workflows:
    • workflows/portrait.json — character portrait, controllable expression
    • workflows/scene.json — character in scene with ControlNet pose
    • workflows/quick.json — fast draft via Flux.1-schnell
  • OpenClaw skill: comfyui.py — submits workflow via ComfyUI REST API, returns image path
  • ComfyUI API port: 8188
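The comfyui.py skill essentially wraps ComfyUI's POST /prompt endpoint, which expects the workflow graph under a "prompt" key plus a client id. A sketch of the request-body builder, with the endpoint shape assumed from ComfyUI's HTTP API and worth confirming against the installed version:

```python
import json
from pathlib import Path

def build_prompt_submission(workflow_path: str, client_id: str = "homeai") -> dict:
    """Load a saved (API-format) workflow JSON and wrap it in the body
    ComfyUI's POST /prompt endpoint expects."""
    graph = json.loads(Path(workflow_path).expanduser().read_text())
    return {"prompt": graph, "client_id": client_id}
```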

Interface contract:

  • OpenClaw calls comfyui.generate(workflow_name, params) → returns local image path
  • ComfyUI REST API: http://localhost:8188

Phase 7 — Extended Integrations & Polish

Deliverables:

  • Music Assistant — Docker container, integrated with HA, OpenClaw music.py skill updated
  • Snapcast — server on Mac Mini, clients on ESP32 units (multi-room sync)
  • Authelia — 2FA in front of all web UIs exposed via Tailscale
  • n8n advanced workflows: daily briefing, calendar reminders, notification routing
  • iOS Shortcuts companion: trigger OpenClaw from iPhone widget
  • Uptime Kuma alerts: pushover/ntfy notifications on service down
  • Backup automation: daily Gitea commits of mem0, character configs, n8n workflows

Dependency Graph

P1 (infra) ─────────────────────────────┐
P2 (llm)   ──────────────────────┐      │
P3 (voice) ────────────────┐     │      │
P5 (character) ──────┐     │     │      │
                      ↓     ↓     ↓      ↓
                      P4 (agent) ─────→ HA
                      ↓
            P6 (esp32) ← Wyoming
            P7 (visual) ← vtube skill
            P8 (images) ← comfyui skill

Hard dependencies:

  • P4 requires P1 (HA URL), P2 (Ollama), P5 (character JSON)
  • P3 requires P2 (LLM), P4 (agent endpoint)
  • P6 requires P3 (Wyoming server), P1 (HA)
  • P7 requires P4 (OpenClaw skill runner), P5 (expression map)
  • P8 requires P4 (OpenClaw skill runner)

Can be done in parallel:

  • P1 + P5 (infra and character manager are independent)
  • P2 + P5 (LLM setup and character UI are independent)
  • P7 + P8 (visual and images are both P4 dependents but independent of each other)

Interface Contracts Summary

| Contract | Type | Defined In | Consumed By |
|----------|------|------------|-------------|
| ~/server/.env.services | env file | P1 | All |
| Ollama API (localhost:11434/v1) | HTTP (OpenAI compat) | P2 | P3, P4, P7 |
| Wyoming STT (localhost:10300) | TCP/Wyoming | P3 | P6, HA |
| Wyoming TTS (localhost:10301) | TCP/Wyoming | P3 | P6, HA |
| Wyoming Satellite (localhost:10700) | TCP/Wyoming | P3 | HA |
| OpenClaw API (localhost:8080) | HTTP | P4 | P3, P7, P8 |
| Character JSON (~/.openclaw/characters/) | JSON file | P5 | P4, P3, P7 |
| character.schema.json v1 | JSON Schema | P5 | P4, P3, P7 |
| VTube Studio WS (localhost:8001) | WebSocket | VTube Studio | P7 |
| ComfyUI API (localhost:8188) | HTTP | ComfyUI | P8 |
| Home Assistant API | HTTP/WS | P1 (HA) | P4, P6 |

Repo Structure (Gitea)

~/gitea/homeai/
├── homeai-infra/          # P1
│   ├── docker/            # per-service compose files
│   ├── scripts/           # setup/teardown helpers
│   └── Makefile
├── homeai-llm/            # P2
│   ├── ollama-models.txt
│   └── scripts/
├── homeai-voice/          # P3
│   ├── whisper/
│   ├── tts/
│   ├── wyoming/
│   └── scripts/
├── homeai-agent/          # P4
│   ├── skills/
│   ├── workflows/         # n8n exports
│   └── config/
├── homeai-character/      # P5
│   ├── src/               # React character manager
│   ├── schema/
│   └── characters/        # exported JSONs
├── homeai-esp32/          # P6
│   └── esphome/
├── homeai-visual/         # P7
│   └── skills/
└── homeai-images/         # P8
    ├── workflows/          # ComfyUI workflow JSONs
    └── skills/

Suggested Build Order

| Week | Focus | Projects |
|------|-------|----------|
| 1 | Infrastructure up, LLM running | P1, P2 |
| 2 | Voice pipeline end-to-end (desktop mic test) | P3 |
| 3 | Character Manager wired, OpenClaw connected | P4, P5 |
| 4 | ESP32 firmware, first satellite running | P6 |
| 5 | VTube Studio live, expressions working | P7 |
| 6 | ComfyUI online, character LoRA trained | P8 |
| 7+ | Extended integrations, polish, Authelia | Phase 7 |

Open Questions / Decisions Needed

  • Which OpenClaw version/fork to use? (confirm it supports Ollama natively)
  • Wake word: hey_jarvis vs custom trained word — what should the character's name be?
  • Live2D model: commission custom or buy from nizima.com? Budget?
  • Snapcast: output to ESP32 speakers or separate audio hardware per room?
  • n8n: self-hosted Docker vs n8n Cloud (given local-first preference → Docker)
  • Authelia: local user store or LDAP backend? (local store is simpler)
  • mem0: local SQLite or run Qdrant vector DB for better semantic search?