homeai/plans/next-steps.md
Aodhan Collins 6a0bae2a0b feat(phase-04): Wyoming Satellite integration + OpenClaw HA components
## Voice Pipeline (P3)
- Replace openWakeWord daemon with Wyoming Satellite approach
- Add Wyoming Satellite service on port 10700 for HA voice pipeline
- Update setup.sh with cross-platform sed compatibility (macOS/Linux)
- Add version field to Kokoro TTS voice info
- Update launchd service loader to use Wyoming Satellite

## Home Assistant Integration (P4)
- Add custom conversation agent component (openclaw_conversation)
  - Fix: Use IntentResponse instead of plain strings (HA API requirement)
  - Support both HTTP API and CLI fallback modes
  - Config flow for easy HA UI setup
- Add OpenClaw bridge scripts (Python + Bash)
- Add ha-ctl utility for HA entity control
  - Fix: Use context manager for token file reading
- Add HA configuration examples and documentation

## Infrastructure
- Add mem0 backup automation (launchd + script)
- Add n8n workflow templates (morning briefing, notification router)
- Add VS Code workspace configuration
- Reorganize model files into categorized folders:
  - lmstudio-community/
  - mlx-community/
  - bartowski/
  - mradermacher/

## Documentation
- Update PROJECT_PLAN.md with Wyoming Satellite architecture
- Update TODO.md with completed Wyoming integration tasks
- Add OPENCLAW_INTEGRATION.md for HA setup guide

## Testing
- Verified Wyoming services running (STT:10300, TTS:10301, Satellite:10700)
- Verified OpenClaw CLI accessibility
- Confirmed cross-platform compatibility fixes
2026-03-08 02:06:37 +00:00


# HomeAI — Next Steps Plan
> Created: 2026-03-07 | Priority: Voice Loop → Foundation Hardening → Character System
---
## Current State Summary
| Sub-Project | Status | Done / Total |
|---|---|---|
| P1 homeai-infra | Core done, tail items remain | 6 / 9 |
| P2 homeai-llm | Core done, tail items remain | 6 / 8 |
| P3 homeai-voice | STT + TTS + wake word running, HA integration pending | 7 / 13 |
| P4 homeai-agent | OpenClaw + HA skill working, mem0 + n8n pending | 10 / 16 |
| P5 homeai-character | Not started | 0 / 11 |
| P6–P8 | Not started | 0 / * |

**Key milestone reached:** OpenClaw can receive text, call `qwen2.5:7b` via Ollama, execute tool calls, and control Home Assistant entities. The voice pipeline components (STT, TTS, wake word) are all running as launchd services.

**Critical gap:** The voice pipeline is not yet connected through Home Assistant to the agent. The pieces exist but the end-to-end flow is untested.

---
## Sprint 1 — Complete the Voice → Agent → HA Loop
**Goal:** Speak a command → hear a spoken response + see the HA action execute.
This is the highest-value work because it closes the core loop that every future feature builds on.
### Tasks
#### 1A. Finish HA Wyoming Integration (P3)
The Wyoming STT (port 10300) and TTS (port 10301) services are running. They need to be registered in Home Assistant.
- [ ] Open HA UI → Settings → Integrations → Add Integration → Wyoming Protocol
- [ ] Add STT provider: host `10.0.0.199` (or `localhost` if HA is on same machine), port `10300`
- [ ] Add TTS provider: host `10.0.0.199`, port `10301`
- [ ] Verify both appear as STT/TTS providers in HA
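Before registering the providers in HA, it is worth confirming the ports answer at all. A minimal bash sketch (uses bash's `/dev/tcp`; the host and ports are the ones listed above):
```shell
# Reports whether a TCP port accepts connections. Note: a firewalled/unreachable
# host can hang here -- wrap in coreutils `timeout` if that is a concern.
check_port() {
  local host="$1" port="$2"
  if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
    echo "OK   ${host}:${port}"
  else
    echo "FAIL ${host}:${port}"
    return 1
  fi
}
# check_port 10.0.0.199 10300   # Wyoming STT (Whisper)
# check_port 10.0.0.199 10301   # Wyoming TTS (Kokoro)
check_port 127.0.0.1 10300 || true  # FAIL unless STT runs on this machine
```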
#### 1B. Create HA Voice Assistant Pipeline (P3)
- [ ] HA → Settings → Voice Assistants → Add Assistant
- [ ] Configure: STT = Wyoming Whisper, TTS = Wyoming Kokoro, Conversation Agent = Home Assistant default (or OpenClaw if wired)
- [ ] Set as default voice assistant pipeline
#### 1C. Test HA Assist via Browser (P3)
- [ ] Open HA dashboard → Assist panel
- [ ] Type a query (e.g. "What time is it?") → verify spoken response plays back
- [ ] Type a device command (e.g. "Turn on the reading lamp") → verify HA executes it
#### 1D. Set Up mem0 with Chroma Backend (P4)
- [ ] Install mem0: `pip install mem0ai`
- [ ] Install chromadb: `pip install chromadb`
- [ ] Pull embedding model: `ollama pull nomic-embed-text`
- [ ] Write mem0 config pointing at Ollama for LLM + embeddings, Chroma for vector store
- [ ] Test: store a memory, recall it via semantic search
- [ ] Verify mem0 data persists at `~/.openclaw/memory/chroma/`
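One possible shape for that config, written as JSON. The provider/config key names are assumptions to verify against the mem0 documentation; the model names and Chroma path come from this plan:
```shell
# Sketch of a mem0 config: Ollama for LLM + embeddings, Chroma for vectors.
# Key names are assumptions -- check them against the mem0 docs before use.
mkdir -p ~/.openclaw/memory
cat > ~/.openclaw/memory/mem0-config.json <<'EOF'
{
  "llm": {
    "provider": "ollama",
    "config": { "model": "qwen2.5:7b" }
  },
  "embedder": {
    "provider": "ollama",
    "config": { "model": "nomic-embed-text" }
  },
  "vector_store": {
    "provider": "chroma",
    "config": { "path": "~/.openclaw/memory/chroma" }
  }
}
EOF
```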
#### 1E. Write Memory Backup launchd Job (P4)
- [ ] Create git repo at `~/.openclaw/memory/` (or a subdirectory)
- [ ] Write backup script: `git add . && git commit -m "mem0 backup $(date)" && git push`
- [ ] Write launchd plist: `com.homeai.mem0-backup.plist` — daily schedule
- [ ] Load plist, verify it runs
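A sketch of the backup script itself (path and filename are placeholders). Committing only when something changed matters: a plain `git add . && git commit && git push` chain short-circuits before the push whenever there is nothing to commit:
```shell
# Writes a backup script that commits only when the tree changed, then pushes.
cat > /tmp/mem0-backup.sh <<'EOF'
#!/bin/bash
set -euo pipefail
REPO="${1:-$HOME/.openclaw/memory}"
cd "$REPO"
git add -A
# Skip the commit if nothing changed, so the push still runs
if ! git diff --cached --quiet; then
  git commit -m "mem0 backup $(date '+%Y-%m-%d %H:%M')"
fi
# Guard the push so a missing/unreachable remote doesn't fail the launchd job
git push 2>/dev/null || echo "push skipped (no remote reachable)"
EOF
chmod +x /tmp/mem0-backup.sh
```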
#### 1F. Build Morning Briefing n8n Workflow (P4)
- [ ] Verify n8n is running (Docker, deployed in P1)
- [ ] Create workflow: time trigger → fetch weather from HA → compose briefing text → POST to OpenClaw `/speak` endpoint
- [ ] Export workflow JSON to `homeai-agent/workflows/morning-briefing.json`
- [ ] Test: manually trigger → hear spoken briefing
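The "compose briefing text" step, sketched as a shell function for clarity (in n8n it would be a Function or Set node); the inputs stand in for the weather state and temperature fetched from HA:
```shell
# Hypothetical briefing composer -- the wording and inputs are placeholders.
compose_briefing() {
  local conditions="$1" temp="$2"
  printf 'Good morning. Weather today: %s, around %s degrees.\n' "$conditions" "$temp"
}
compose_briefing "partly cloudy" "18"
```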
#### 1G. Build Notification Router n8n Workflow (P4)
- [ ] Create workflow: HA webhook trigger → classify urgency → high: TTS immediately, low: queue
- [ ] Export to `homeai-agent/workflows/notification-router.json`
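The "classify urgency" branch could start as simple keyword rules; the keywords below are placeholders, and the real workflow might use HA notification metadata or an LLM call instead:
```shell
# Placeholder urgency classifier: keyword match on the notification text.
classify_urgency() {
  case "$(echo "$1" | tr '[:upper:]' '[:lower:]')" in
    *smoke*|*leak*|*alarm*) echo high ;;
    *) echo low ;;
  esac
}
classify_urgency "Smoke detected in kitchen"   # -> high
classify_urgency "Dishwasher cycle finished"   # -> low
```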
#### 1H. Verify Full Voice → Agent → HA Action Flow (P3 + P4)
- [ ] Trigger wake word ("hey jarvis") via USB mic
- [ ] Speak a command: "Turn on the reading lamp"
- [ ] Verify: wake word detected → audio captured → STT transcribes → OpenClaw receives text → tool call to HA → lamp turns on → TTS response plays back
- [ ] Document any latency issues or failure points
### Sprint 1 Flow Diagram
```mermaid
flowchart LR
A[USB Mic] -->|wake word| B[openWakeWord]
B -->|audio stream| C[Wyoming STT - Whisper]
C -->|transcribed text| D[Home Assistant Pipeline]
D -->|text| E[OpenClaw Agent]
E -->|tool call| F[HA REST API]
F -->|action| G[Smart Device]
E -->|response text| H[Wyoming TTS - Kokoro]
H -->|audio| I[Speaker]
```
---
## Sprint 2 — Foundation Hardening
**Goal:** All services survive a reboot, are monitored, and are remotely accessible.
### Tasks
#### 2A. Install and Configure Tailscale (P1)
- [ ] Install Tailscale on Mac Mini: `brew install tailscale`
- [ ] Authenticate and join Tailnet
- [ ] Verify all services reachable via Tailscale IP (HA, Open WebUI, Portainer, Gitea, n8n, code-server)
- [ ] Document Tailscale IP → service URL mapping
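A small helper for the mapping task: given the Tailscale IP, print the service URLs. Ports 3030, 11434, and 8080 appear elsewhere in this plan; 8123 (HA) is the usual default and is an assumption, as are any ports for Portainer, Gitea, n8n, and code-server you add:
```shell
# Prints a Tailscale-IP -> service URL table for the documentation step.
print_service_map() {
  local ip="$1"
  cat <<EOF
Home Assistant  http://${ip}:8123
Open WebUI      http://${ip}:3030
Ollama API      http://${ip}:11434
OpenClaw        http://${ip}:8080
EOF
}
print_service_map 100.64.0.1  # replace with your actual Tailscale IP
```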
#### 2B. Configure Uptime Kuma Monitors (P1 + P2)
- [ ] Add monitors for: Home Assistant, Portainer, Gitea, code-server, n8n
- [ ] Add monitors for: Ollama API (port 11434), Open WebUI (port 3030)
- [ ] Add monitors for: Wyoming STT (port 10300), Wyoming TTS (port 10301)
- [ ] Add monitor for: OpenClaw (port 8080)
- [ ] Configure mobile push alerts (ntfy or Pushover)
#### 2C. Cold Reboot Verification (P1)
- [ ] Reboot Mac Mini
- [ ] Verify all Docker containers come back up (restart policy: `unless-stopped`)
- [ ] Verify launchd services start: Ollama, Wyoming STT, Wyoming TTS, openWakeWord, OpenClaw
- [ ] Check Uptime Kuma — all monitors green within 2 minutes
- [ ] Document any services that failed to restart and fix
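The launchd check can be scripted: pipe `launchctl list` into a filter that reports which expected services are missing. The `com.homeai.*` label names below are assumptions modeled on the plist naming used elsewhere in this plan:
```shell
# Reads `launchctl list` output on stdin, reports missing HomeAI services.
check_services() {
  local listing missing=0
  listing="$(cat)"
  for label in com.homeai.ollama com.homeai.wyoming-stt com.homeai.wyoming-tts com.homeai.openclaw; do
    if ! printf '%s\n' "$listing" | grep -q "$label"; then
      echo "MISSING: $label"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "all services present"
  return $missing
}
# Usage on the Mac Mini: launchctl list | check_services
printf 'com.homeai.ollama\n' | check_services || true  # demo: prints MISSING lines
```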
#### 2D. Run LLM Benchmarks (P2)
- [ ] Run `homeai-llm/scripts/benchmark.sh`
- [ ] Record results: tokens/sec for each model (qwen2.5:7b, llama3.3:70b, etc.)
- [ ] Write results to `homeai-llm/benchmark-results.md`
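For recording results: Ollama's `/api/generate` response includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds), so tokens/sec is `eval_count / (eval_duration / 1e9)`:
```shell
# Computes tokens/sec from Ollama's eval_count and eval_duration fields.
tokens_per_sec() {
  awk -v count="$1" -v dur_ns="$2" 'BEGIN { printf "%.1f\n", count / (dur_ns / 1e9) }'
}
# Example: 200 tokens generated in 4.0 s
tokens_per_sec 200 4000000000   # -> 50.0
```
The two fields can be pulled out of the API response with `jq '.eval_count, .eval_duration'` if the benchmark script calls the API directly.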
---
## Sprint 3 — Character System (P5)
**Goal:** Character schema defined, default character created, Character Manager UI functional.
### Tasks
#### 3A. Define Character Schema (P5)
- [ ] Write `homeai-character/schema/character.schema.json` (v1) — based on the spec in PLAN.md
- [ ] Write `homeai-character/schema/README.md` documenting each field
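A minimal draft-07 starting point, written to a scratch path for illustration; every field name here is a placeholder, since the authoritative field list lives in PLAN.md:
```shell
# Placeholder skeleton for character.schema.json -- fields are illustrative only.
cat > /tmp/character.schema.json <<'EOF'
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "HomeAI Character",
  "type": "object",
  "required": ["name", "system_prompt", "expressions"],
  "properties": {
    "name": { "type": "string" },
    "system_prompt": { "type": "string" },
    "expressions": {
      "type": "object",
      "additionalProperties": { "type": "string" }
    }
  }
}
EOF
```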
#### 3B. Create Default Character (P5)
- [ ] Write `homeai-character/characters/aria.json` with placeholder expression IDs
- [ ] Validate aria.json against schema (manual or script)
#### 3C. Set Up Vite Project (P5)
- [ ] Initialize Vite + React project in `homeai-character/`
- [ ] Install deps: `npm install react react-dom ajv`
- [ ] Move existing `character-manager.jsx` into `src/`
- [ ] Verify dev server runs at `http://localhost:5173`
#### 3D. Wire Character Manager Features (P5)
- [ ] Integrate schema validation on export (ajv)
- [ ] Add expression mapping UI section
- [ ] Add custom rules editor
- [ ] Test full edit → export → validate → load cycle
#### 3E. Wire Character into OpenClaw (P4 + P5)
- [ ] Copy/symlink `aria.json` to `~/.openclaw/characters/aria.json`
- [ ] Configure OpenClaw to load system prompt from character JSON
- [ ] Verify OpenClaw uses Aria's system prompt in responses
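A sketch of the "load system prompt from character JSON" step; the `system_prompt` field name is hypothetical and should match whatever the character schema defines:
```shell
# Extracts the system prompt from a character JSON file (field name assumed).
get_system_prompt() {
  python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))["system_prompt"])' "$1"
}
# Demo with a stand-in character file:
printf '{"name": "Aria", "system_prompt": "You are Aria."}' > /tmp/aria.json
get_system_prompt /tmp/aria.json   # -> You are Aria.
```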
---
## Open Decisions to Resolve During These Sprints
| Decision | Options | Recommendation |
|---|---|---|
| Character name / wake word | "Aria" vs custom | Decide during Sprint 3 — affects wake word training later |
| mem0 backend | Chroma vs Qdrant | Start with Chroma (Sprint 1D) — migrate if recall quality is poor |
| HA conversation agent | Default HA vs OpenClaw | Test with HA default first, then wire OpenClaw as custom conversation agent |
---
## What This Unlocks
After these 3 sprints, the system will have:
- **End-to-end voice control**: speak → understand → act → respond
- **Persistent memory**: the assistant remembers across sessions
- **Automated workflows**: morning briefings, notification routing
- **Monitoring**: all services tracked, alerts on failure
- **Remote access**: everything reachable via Tailscale
- **Character identity**: Aria persona loaded into the agent pipeline
- **Reboot resilience**: everything survives a cold restart
This positions the project to move into **Phase 4 (ESP32 hardware)** and **Phase 5 (VTube Studio visual layer)** with confidence that the core pipeline is solid.