# HomeAI — Next Steps Plan

Created: 2026-03-07 | Priority: Voice Loop → Foundation Hardening → Character System
## Current State Summary
| Sub-Project | Status | Done / Total |
|---|---|---|
| P1 homeai-infra | Core done, tail items remain | 6 / 9 |
| P2 homeai-llm | Core done, tail items remain | 6 / 8 |
| P3 homeai-voice | STT + TTS + wake word running, HA integration pending | 7 / 13 |
| P4 homeai-agent | OpenClaw + HA skill working, mem0 + n8n pending | 10 / 16 |
| P5 homeai-character | Not started | 0 / 11 |
| P6–P8 | Not started | 0 / * |
**Key milestone reached:** OpenClaw can receive text, call qwen2.5:7b via Ollama, execute tool calls, and control Home Assistant entities. The voice pipeline components (STT, TTS, wake word) are all running as launchd services.

**Critical gap:** The voice pipeline is not yet connected through Home Assistant to the agent. The pieces exist, but the end-to-end flow is untested.
## Sprint 1 — Complete the Voice → Agent → HA Loop

**Goal:** Speak a command → hear a spoken response + see the HA action execute.

This is the highest-value work because it closes the core loop that every future feature builds on.
### Tasks
#### 1A. Finish HA Wyoming Integration (P3)

The Wyoming STT (port 10300) and TTS (port 10301) services are running. They need to be registered in Home Assistant.

- Open HA UI → Settings → Integrations → Add Integration → Wyoming Protocol
- Add STT provider: host `10.0.0.199` (or `localhost` if HA is on the same machine), port `10300`
- Add TTS provider: host `10.0.0.199`, port `10301`
- Verify both appear as STT/TTS providers in HA
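Before registering them, it is worth confirming that the Wyoming services are actually listening. A minimal TCP reachability check (an illustrative sketch; the host IP and ports are taken from the steps above):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Usage against the services above, e.g.:
#   port_open("10.0.0.199", 10300)  # Wyoming STT
#   port_open("10.0.0.199", 10301)  # Wyoming TTS
```

If a port check fails, HA's Wyoming integration will fail the same way, so this separates "service down" from "integration misconfigured".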
#### 1B. Create HA Voice Assistant Pipeline (P3)

- HA → Settings → Voice Assistants → Add Assistant
- Configure: STT = Wyoming Whisper, TTS = Wyoming Kokoro, Conversation Agent = Home Assistant default (or OpenClaw if wired)
- Set as default voice assistant pipeline
#### 1C. Test HA Assist via Browser (P3)
- Open HA dashboard → Assist panel
- Type a query (e.g. "What time is it?") → verify spoken response plays back
- Type a device command (e.g. "Turn on the reading lamp") → verify HA executes it
#### 1D. Set Up mem0 with Chroma Backend (P4)

- Install mem0: `pip install mem0ai`
- Install chromadb: `pip install chromadb`
- Pull embedding model: `ollama pull nomic-embed-text`
- Write mem0 config pointing at Ollama for LLM + embeddings, Chroma for vector store
- Test: store a memory, recall it via semantic search
- Verify mem0 data persists at `~/.openclaw/memory/chroma/`
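One possible shape for that config, following mem0's `Memory.from_config` dict format. This is a sketch: the model names, Ollama URL, and storage path are assumptions for this setup and should be adjusted.

```python
# Sketch of a mem0 configuration backed by local Ollama + Chroma.
# Model names, the base URL, and the path are assumptions for this setup.
from pathlib import Path

MEM0_CONFIG = {
    "llm": {
        "provider": "ollama",
        "config": {"model": "qwen2.5:7b", "ollama_base_url": "http://localhost:11434"},
    },
    "embedder": {
        "provider": "ollama",
        "config": {"model": "nomic-embed-text", "ollama_base_url": "http://localhost:11434"},
    },
    "vector_store": {
        "provider": "chroma",
        "config": {"path": str(Path.home() / ".openclaw/memory/chroma")},
    },
}

# With mem0 installed, usage would look roughly like:
#   from mem0 import Memory
#   m = Memory.from_config(MEM0_CONFIG)
#   m.add("User prefers the reading lamp at 40% brightness", user_id="default")
#   m.search("lamp brightness preference", user_id="default")
```

The store/recall pair in the comment is exactly the test described above: add one memory, then retrieve it via semantic search.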
#### 1E. Write Memory Backup launchd Job (P4)

- Create git repo at `~/.openclaw/memory/` (or a subdirectory)
- Write backup script: `git add . && git commit -m "mem0 backup $(date)" && git push`
- Write launchd plist: `com.homeai.mem0-backup.plist` — daily schedule
- Load plist, verify it runs
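A daily-schedule plist for that job might look like the following sketch (the script path and run time are assumptions; the username placeholder must be replaced):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.homeai.mem0-backup</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <!-- Assumed location of the backup script from the step above -->
        <string>/Users/YOUR_USER/.openclaw/memory/backup.sh</string>
    </array>
    <!-- Run once a day at 03:30 -->
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>3</integer>
        <key>Minute</key>
        <integer>30</integer>
    </dict>
    <key>StandardErrorPath</key>
    <string>/tmp/mem0-backup.err</string>
</dict>
</plist>
```

Load it with `launchctl load ~/Library/LaunchAgents/com.homeai.mem0-backup.plist` and check `/tmp/mem0-backup.err` after the first scheduled run.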
#### 1F. Build Morning Briefing n8n Workflow (P4)

- Verify n8n is running (Docker, deployed in P1)
- Create workflow: time trigger → fetch weather from HA → compose briefing text → POST to OpenClaw `/speak` endpoint
- Export workflow JSON to `homeai-agent/workflows/morning-briefing.json`
- Test: manually trigger → hear spoken briefing
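The "compose briefing text" step can be prototyped outside n8n before wiring it into the workflow. A sketch, where the weather field names (`temperature`, `condition`, `forecast_high`) are illustrative and must be mapped to whatever the HA weather entity actually exposes:

```python
def compose_briefing(weather: dict) -> str:
    """Turn an HA weather state dict into spoken briefing text.

    The field names here are assumptions; map them to the attributes
    of the real HA weather entity.
    """
    return (
        f"Good morning. It is {weather['temperature']} degrees and "
        f"{weather['condition']}, with a high of {weather['forecast_high']} today."
    )
```

The resulting string is what the workflow would POST to the OpenClaw `/speak` endpoint.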
#### 1G. Build Notification Router n8n Workflow (P4)

- Create workflow: HA webhook trigger → classify urgency → high: TTS immediately, low: queue
- Export to `homeai-agent/workflows/notification-router.json`
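The "classify urgency" node can start as a simple keyword rule before anything smarter is needed. An illustrative stand-in (the keyword list is an assumption; a local-LLM call could replace it later):

```python
# Illustrative stand-in for the n8n "classify urgency" node.
HIGH_URGENCY_KEYWORDS = {"alarm", "leak", "smoke", "door open", "offline"}

def classify_urgency(message: str) -> str:
    """Return 'high' (speak immediately via TTS) or 'low' (queue)."""
    text = message.lower()
    if any(keyword in text for keyword in HIGH_URGENCY_KEYWORDS):
        return "high"
    return "low"
```

In n8n this maps to a Function/Code node whose output drives an IF node: high → TTS branch, low → queue branch.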
#### 1H. Verify Full Voice → Agent → HA Action Flow (P3 + P4)
- Trigger wake word ("hey jarvis") via USB mic
- Speak a command: "Turn on the reading lamp"
- Verify: wake word detected → audio captured → STT transcribes → OpenClaw receives text → tool call to HA → lamp turns on → TTS response plays back
- Document any latency issues or failure points
### Sprint 1 Flow Diagram

```mermaid
flowchart LR
    A[USB Mic] -->|wake word| B[openWakeWord]
    B -->|audio stream| C[Wyoming STT - Whisper]
    C -->|transcribed text| D[Home Assistant Pipeline]
    D -->|text| E[OpenClaw Agent]
    E -->|tool call| F[HA REST API]
    F -->|action| G[Smart Device]
    E -->|response text| H[Wyoming TTS - Kokoro]
    H -->|audio| I[Speaker]
```
## Sprint 2 — Foundation Hardening

**Goal:** All services survive a reboot, are monitored, and are remotely accessible.

### Tasks
#### 2A. Install and Configure Tailscale (P1)

- Install Tailscale on Mac Mini: `brew install tailscale`
- Authenticate and join Tailnet
- Verify all services reachable via Tailscale IP (HA, Open WebUI, Portainer, Gitea, n8n, code-server)
- Document Tailscale IP → service URL mapping
#### 2B. Configure Uptime Kuma Monitors (P1 + P2)
- Add monitors for: Home Assistant, Portainer, Gitea, code-server, n8n
- Add monitors for: Ollama API (port 11434), Open WebUI (port 3030)
- Add monitors for: Wyoming STT (port 10300), Wyoming TTS (port 10301)
- Add monitor for: OpenClaw (port 8080)
- Configure mobile push alerts (ntfy or Pushover)
#### 2C. Cold Reboot Verification (P1)

- Reboot Mac Mini
- Verify all Docker containers come back up (restart policy: `unless-stopped`)
- Verify launchd services start: Ollama, Wyoming STT, Wyoming TTS, openWakeWord, OpenClaw
- Check Uptime Kuma — all monitors green within 2 minutes
- Document any services that failed to restart and fix them
#### 2D. Run LLM Benchmarks (P2)

- Run `homeai-llm/scripts/benchmark.sh`
- Record results: tokens/sec for each model (qwen2.5:7b, llama3.3:70b, etc.)
- Write results to `homeai-llm/benchmark-results.md`
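For the recording step, tokens/sec can be derived from the `eval_count` and `eval_duration` fields (the latter in nanoseconds) that Ollama includes in its final `/api/generate` response:

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Compute generation speed from Ollama's final response metrics.

    eval_count:       number of tokens generated
    eval_duration_ns: time spent generating, in nanoseconds
    """
    return eval_count / (eval_duration_ns / 1e9)

# Example with made-up numbers: 128 tokens in 4 seconds -> 32.0 tok/s
```

Recording this per model in `benchmark-results.md` makes it easy to compare, say, qwen2.5:7b against llama3.3:70b on the same prompt.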
## Sprint 3 — Character System (P5)

**Goal:** Character schema defined, default character created, Character Manager UI functional.

### Tasks
#### 3A. Define Character Schema (P5)

- Write `homeai-character/schema/character.schema.json` (v1) — based on the spec in PLAN.md
- Write `homeai-character/schema/README.md` documenting each field
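As a starting point, a minimal draft-07 skeleton for the schema might look like this. The field names (`system_prompt`, `expressions`, `custom_rules`) are assumptions inferred from the tasks in this plan, not the spec in PLAN.md, and should be replaced by the real v1 fields:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "character.schema.json",
  "title": "Character",
  "type": "object",
  "required": ["name", "system_prompt"],
  "properties": {
    "name": { "type": "string" },
    "system_prompt": { "type": "string" },
    "expressions": {
      "type": "object",
      "additionalProperties": { "type": "string" }
    },
    "custom_rules": {
      "type": "array",
      "items": { "type": "string" }
    }
  }
}
```

Keeping `required` small in v1 leaves room to add fields (voice, wake word, VTube Studio mappings) without breaking existing character files.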
#### 3B. Create Default Character (P5)

- Write `homeai-character/characters/aria.json` with placeholder expression IDs
- Validate aria.json against schema (manual or script)
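Until the ajv-based check exists in the UI, a standalone script can at least enforce required fields. A stdlib-only sketch, where the required field names are assumptions about the schema:

```python
import json

REQUIRED_FIELDS = {"name", "system_prompt"}  # assumed schema requirements

def validate_character(raw: str) -> list[str]:
    """Return a list of validation errors; an empty list means valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    return [
        f"missing required field: {field}"
        for field in sorted(REQUIRED_FIELDS - data.keys())
    ]
```

Running this over `aria.json` before every commit catches the most common breakage (a renamed or deleted field) early.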
#### 3C. Set Up Vite Project (P5)

- Initialize Vite + React project in `homeai-character/`
- Install deps: `npm install react react-dom ajv`
- Move existing `character-manager.jsx` into `src/`
- Verify dev server runs at `http://localhost:5173`
#### 3D. Wire Character Manager Features (P5)
- Integrate schema validation on export (ajv)
- Add expression mapping UI section
- Add custom rules editor
- Test full edit → export → validate → load cycle
#### 3E. Wire Character into OpenClaw (P4 + P5)

- Copy/symlink `aria.json` to `~/.openclaw/characters/aria.json`
- Configure OpenClaw to load system prompt from character JSON
- Verify OpenClaw uses Aria's system prompt in responses
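The loading step can be as small as the following sketch; the `system_prompt` field name is an assumption about the character schema:

```python
import json
from pathlib import Path

def load_system_prompt(character_path: Path) -> str:
    """Read a character JSON file and return its system prompt."""
    character = json.loads(character_path.read_text(encoding="utf-8"))
    return character["system_prompt"]

# e.g. load_system_prompt(Path.home() / ".openclaw/characters/aria.json")
```

A quick way to verify the wiring is to put a distinctive phrase in Aria's prompt (e.g. a signature greeting) and confirm it shows up in OpenClaw's responses.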
## Open Decisions to Resolve During These Sprints
| Decision | Options | Recommendation |
|---|---|---|
| Character name / wake word | "Aria" vs custom | Decide during Sprint 3 — affects wake word training later |
| mem0 backend | Chroma vs Qdrant | Start with Chroma (Sprint 1D) — migrate if recall quality is poor |
| HA conversation agent | Default HA vs OpenClaw | Test with HA default first, then wire OpenClaw as custom conversation agent |
## What This Unlocks
After these 3 sprints, the system will have:
- End-to-end voice control: speak → understand → act → respond
- Persistent memory: the assistant remembers across sessions
- Automated workflows: morning briefings, notification routing
- Monitoring: all services tracked, alerts on failure
- Remote access: everything reachable via Tailscale
- Character identity: Aria persona loaded into the agent pipeline
- Reboot resilience: everything survives a cold restart
This positions the project to move into Phase 4 (ESP32 hardware) and Phase 5 (VTube Studio visual layer) with confidence that the core pipeline is solid.