# HomeAI — Next Steps Plan
> Created: 2026-03-07 | Priority: Voice Loop → Foundation Hardening → Character System

---
## Current State Summary

| Sub-Project | Status | Done / Total |
|---|---|---|
| P1 homeai-infra | Core done, tail items remain | 6 / 9 |
| P2 homeai-llm | Core done, tail items remain | 6 / 8 |
| P3 homeai-voice | STT + TTS + wake word running, HA integration pending | 7 / 13 |
| P4 homeai-agent | OpenClaw + HA skill working, mem0 + n8n pending | 10 / 16 |
| P5 homeai-character | Not started | 0 / 11 |
| P6–P8 | Not started | 0 / * |
**Key milestone reached:** OpenClaw can receive text, call `qwen2.5:7b` via Ollama, execute tool calls, and control Home Assistant entities. The voice pipeline components (STT, TTS, wake word) are all running as launchd services.

**Critical gap:** The voice pipeline is not yet connected through Home Assistant to the agent. The pieces exist but the end-to-end flow is untested.

---
## Sprint 1 — Complete the Voice → Agent → HA Loop

**Goal:** Speak a command → hear a spoken response + see the HA action execute.

This is the highest-value work because it closes the core loop that every future feature builds on.

### Tasks
#### 1A. Finish HA Wyoming Integration (P3)

The Wyoming STT (port 10300) and TTS (port 10301) services are running. They need to be registered in Home Assistant.

- [ ] Open HA UI → Settings → Integrations → Add Integration → Wyoming Protocol
- [ ] Add STT provider: host `10.0.0.199` (or `localhost` if HA is on the same machine), port `10300`
- [ ] Add TTS provider: host `10.0.0.199`, port `10301`
- [ ] Verify both appear as STT/TTS providers in HA
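
Before adding the integrations in the HA UI, it can save a debugging round-trip to confirm the ports actually accept TCP connections. A minimal sketch (host and ports from the checklist above; adjust the host if HA runs elsewhere):

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Wyoming ports from the checklist; adjust host for your setup.
    for name, port in [("Wyoming STT", 10300), ("Wyoming TTS", 10301)]:
        state = "open" if port_open("10.0.0.199", port) else "CLOSED"
        print(f"{name} (port {port}): {state}")
```

If a port reports CLOSED, check the corresponding launchd service before blaming the HA integration.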
#### 1B. Create HA Voice Assistant Pipeline (P3)

- [ ] HA → Settings → Voice Assistants → Add Assistant
- [ ] Configure: STT = Wyoming Whisper, TTS = Wyoming Kokoro, Conversation Agent = Home Assistant default (or OpenClaw if wired)
- [ ] Set as default voice assistant pipeline
#### 1C. Test HA Assist via Browser (P3)

- [ ] Open HA dashboard → Assist panel
- [ ] Type a query (e.g. "What time is it?") → verify spoken response plays back
- [ ] Type a device command (e.g. "Turn on the reading lamp") → verify HA executes it
#### 1D. Set Up mem0 with Chroma Backend (P4)

- [ ] Install mem0: `pip install mem0ai`
- [ ] Install chromadb: `pip install chromadb`
- [ ] Pull embedding model: `ollama pull nomic-embed-text`
- [ ] Write mem0 config pointing at Ollama for LLM + embeddings, Chroma for vector store
- [ ] Test: store a memory, recall it via semantic search
- [ ] Verify mem0 data persists at `~/.openclaw/memory/chroma/`
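
A sketch of what that config could look like. The provider/field names follow mem0's documented config format but may differ across versions, so treat this as a starting point, not a verified recipe:

```python
# Sketch of a mem0 config wired to local services. Provider and field
# names follow mem0's documented config shape but may vary by version.
MEM0_CONFIG = {
    "llm": {
        "provider": "ollama",
        "config": {"model": "qwen2.5:7b"},
    },
    "embedder": {
        "provider": "ollama",
        "config": {"model": "nomic-embed-text"},
    },
    "vector_store": {
        "provider": "chroma",
        "config": {
            "collection_name": "homeai",
            # Persistence dir from the checklist above.
            "path": "~/.openclaw/memory/chroma",
        },
    },
}

# Usage (after `pip install mem0ai chromadb`):
# from mem0 import Memory
# m = Memory.from_config(MEM0_CONFIG)
# m.add("The reading lamp is in the study", user_id="home")
# print(m.search("where is the reading lamp?", user_id="home"))
```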
#### 1E. Write Memory Backup launchd Job (P4)

- [ ] Create git repo at `~/.openclaw/memory/` (or a subdirectory)
- [ ] Write backup script: `git add . && git commit -m "mem0 backup $(date)" && git push`
- [ ] Write launchd plist: `com.homeai.mem0-backup.plist` — daily schedule
- [ ] Load plist, verify it runs
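
A launchd job of roughly this shape should cover the daily schedule. The label matches the checklist; the script path and the 03:30 run time are placeholders to adjust:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.homeai.mem0-backup</string>
    <!-- Script path is a placeholder; point it at the backup script from above. -->
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>/path/to/mem0-backup.sh</string>
    </array>
    <!-- Run daily at 03:30. -->
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>3</integer>
        <key>Minute</key>
        <integer>30</integer>
    </dict>
    <key>StandardErrorPath</key>
    <string>/tmp/mem0-backup.err</string>
</dict>
</plist>
```

Load it with `launchctl load ~/Library/LaunchAgents/com.homeai.mem0-backup.plist`, then check `/tmp/mem0-backup.err` after the first run.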
#### 1F. Build Morning Briefing n8n Workflow (P4)

- [ ] Verify n8n is running (Docker, deployed in P1)
- [ ] Create workflow: time trigger → fetch weather from HA → compose briefing text → POST to OpenClaw `/speak` endpoint
- [ ] Export workflow JSON to `homeai-agent/workflows/morning-briefing.json`
- [ ] Test: manually trigger → hear spoken briefing
#### 1G. Build Notification Router n8n Workflow (P4)

- [ ] Create workflow: HA webhook trigger → classify urgency → high: TTS immediately, low: queue
- [ ] Export to `homeai-agent/workflows/notification-router.json`
#### 1H. Verify Full Voice → Agent → HA Action Flow (P3 + P4)

- [ ] Trigger wake word ("hey jarvis") via USB mic
- [ ] Speak a command: "Turn on the reading lamp"
- [ ] Verify: wake word detected → audio captured → STT transcribes → OpenClaw receives text → tool call to HA → lamp turns on → TTS response plays back
- [ ] Document any latency issues or failure points
### Sprint 1 Flow Diagram

```mermaid
flowchart LR
    A[USB Mic] -->|wake word| B[openWakeWord]
    B -->|audio stream| C[Wyoming STT - Whisper]
    C -->|transcribed text| D[Home Assistant Pipeline]
    D -->|text| E[OpenClaw Agent]
    E -->|tool call| F[HA REST API]
    F -->|action| G[Smart Device]
    E -->|response text| H[Wyoming TTS - Kokoro]
    H -->|audio| I[Speaker]
```

---
## Sprint 2 — Foundation Hardening

**Goal:** All services survive a reboot, are monitored, and are remotely accessible.

### Tasks
#### 2A. Install and Configure Tailscale (P1)

- [ ] Install Tailscale on Mac Mini: `brew install tailscale`
- [ ] Authenticate and join Tailnet
- [ ] Verify all services reachable via Tailscale IP (HA, Open WebUI, Portainer, Gitea, n8n, code-server)
- [ ] Document Tailscale IP → service URL mapping
#### 2B. Configure Uptime Kuma Monitors (P1 + P2)

- [ ] Add monitors for: Home Assistant, Portainer, Gitea, code-server, n8n
- [ ] Add monitors for: Ollama API (port 11434), Open WebUI (port 3030)
- [ ] Add monitors for: Wyoming STT (port 10300), Wyoming TTS (port 10301)
- [ ] Add monitor for: OpenClaw (port 8080)
- [ ] Configure mobile push alerts (ntfy or Pushover)
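
If ntfy is the chosen alert channel, it accepts a plain HTTP POST, which makes the channel easy to test before wiring it into Uptime Kuma. A minimal sketch — the topic name is an invented placeholder you should replace with a private one:

```python
from urllib import request

NTFY_TOPIC = "homeai-alerts"  # placeholder: pick your own private topic

def build_alert(message: str, title: str, priority: str = "default") -> request.Request:
    """Build an ntfy.sh push request: the POST body is the message text,
    metadata rides in the Title/Priority headers."""
    return request.Request(
        f"https://ntfy.sh/{NTFY_TOPIC}",
        data=message.encode("utf-8"),
        headers={"Title": title, "Priority": priority},
        method="POST",
    )

# To actually send (requires network):
# request.urlopen(build_alert("Home Assistant is DOWN", "Uptime Kuma", "urgent"))
```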
#### 2C. Cold Reboot Verification (P1)

- [ ] Reboot the Mac Mini
- [ ] Verify all Docker containers come back up (restart policy: `unless-stopped`)
- [ ] Verify launchd services start: Ollama, Wyoming STT, Wyoming TTS, openWakeWord, OpenClaw
- [ ] Check Uptime Kuma — all monitors green within 2 minutes
- [ ] Document any services that failed to restart and fix them
#### 2D. Run LLM Benchmarks (P2)

- [ ] Run `homeai-llm/scripts/benchmark.sh`
- [ ] Record results: tokens/sec for each model (qwen2.5:7b, llama3.3:70b, etc.)
- [ ] Write results to `homeai-llm/benchmark-results.md`
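
If you compute speeds yourself rather than relying on the script's output, Ollama's `/api/generate` response reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds spent generating), so tokens/sec is a one-line calculation:

```python
def tokens_per_sec(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from Ollama's /api/generate metrics:
    eval_count = tokens generated, eval_duration = time in nanoseconds."""
    return eval_count / (eval_duration_ns / 1e9)

# Illustrative numbers: 256 tokens generated in 8.0 s -> 32.0 tok/s
print(tokens_per_sec(256, 8_000_000_000))  # → 32.0
```

Prompt-processing speed works the same way from `prompt_eval_count` / `prompt_eval_duration`.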
---
## Sprint 3 — Character System (P5)

**Goal:** Character schema defined, default character created, Character Manager UI functional.

### Tasks
#### 3A. Define Character Schema (P5)

- [ ] Write `homeai-character/schema/character.schema.json` (v1) — based on the spec in PLAN.md
- [ ] Write `homeai-character/schema/README.md` documenting each field
#### 3B. Create Default Character (P5)

- [ ] Write `homeai-character/characters/aria.json` with placeholder expression IDs
- [ ] Validate aria.json against schema (manual or script)
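
For the "script" option, even a required-keys check catches most editing mistakes before the full ajv validation lands in 3D. The field names below are hypothetical until the 3A schema is written:

```python
import json

# Hypothetical required fields — replace with whatever character.schema.json defines.
REQUIRED_FIELDS = {"name", "system_prompt", "voice", "expressions"}

def validate_character(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    if not isinstance(data, dict):
        return ["top level must be an object"]
    return [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - data.keys())]

# Usage:
# print(validate_character(open("characters/aria.json").read()))
```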
#### 3C. Set Up Vite Project (P5)

- [ ] Initialize Vite + React project in `homeai-character/`
- [ ] Install deps: `npm install react react-dom ajv`
- [ ] Move existing `character-manager.jsx` into `src/`
- [ ] Verify dev server runs at `http://localhost:5173`
#### 3D. Wire Character Manager Features (P5)

- [ ] Integrate schema validation on export (ajv)
- [ ] Add expression mapping UI section
- [ ] Add custom rules editor
- [ ] Test full edit → export → validate → load cycle
#### 3E. Wire Character into OpenClaw (P4 + P5)

- [ ] Copy/symlink `aria.json` to `~/.openclaw/characters/aria.json`
- [ ] Configure OpenClaw to load system prompt from character JSON
- [ ] Verify OpenClaw uses Aria's system prompt in responses
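
How OpenClaw ingests the character file depends on its own config mechanism, but the loading step itself is small. An illustrative sketch, with the `name` and `system_prompt` field names assumed pending the 3A schema:

```python
import json
from pathlib import Path

def load_system_prompt(path: Path) -> str:
    """Read a character JSON and compose the system prompt, prefixed with the
    character's name so the model self-identifies. Field names are assumptions."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return f"You are {data['name']}. {data['system_prompt']}"

# Usage:
# prompt = load_system_prompt(Path.home() / ".openclaw/characters/aria.json")
```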
---
## Open Decisions to Resolve During These Sprints

| Decision | Options | Recommendation |
|---|---|---|
| Character name / wake word | "Aria" vs custom | Decide during Sprint 3 — affects wake word training later |
| mem0 backend | Chroma vs Qdrant | Start with Chroma (Sprint 1D) — migrate if recall quality is poor |
| HA conversation agent | Default HA vs OpenClaw | Test with HA default first, then wire OpenClaw as custom conversation agent |
---

---

## What This Unlocks

After these 3 sprints, the system will have:

- **End-to-end voice control**: speak → understand → act → respond
- **Persistent memory**: the assistant remembers across sessions
- **Automated workflows**: morning briefings, notification routing
- **Monitoring**: all services tracked, alerts on failure
- **Remote access**: everything reachable via Tailscale
- **Character identity**: Aria persona loaded into the agent pipeline
- **Reboot resilience**: everything survives a cold restart

This positions the project to move into **Phase 4 (ESP32 hardware)** and **Phase 5 (VTube Studio visual layer)** with confidence that the core pipeline is solid.