# Voice Pipeline Status Report

> Last Updated: 2026-03-08

---

## Executive Summary

The voice pipeline backend is **fully operational** on the Mac Mini. All services are running and tested:

- ✅ Wyoming STT (Whisper large-v3) - Port 10300
- ✅ Wyoming TTS (Kokoro ONNX) - Port 10301
- ✅ Wyoming Satellite (wake word + audio) - Port 10700
- ✅ OpenClaw Agent (LLM + skills) - Port 8080
- ✅ Ollama (local LLM runtime) - Port 11434

**Next Step**: Manual Home Assistant UI configuration to connect the pipeline.

---

## What's Working ✅

### 1. Speech-to-Text (STT)
- **Service**: Wyoming Faster Whisper
- **Model**: large-v3 (multilingual, high accuracy)
- **Port**: 10300
- **Status**: Running via launchd (`com.homeai.wyoming-stt`)
- **Test**: `nc -z localhost 10300` ✓

### 2. Text-to-Speech (TTS)
- **Service**: Wyoming Kokoro ONNX
- **Voice**: af_heart (default, configurable)
- **Port**: 10301
- **Status**: Running via launchd (`com.homeai.wyoming-tts`)
- **Test**: `nc -z localhost 10301` ✓

### 3. Wyoming Satellite
- **Function**: Wake word detection + audio capture/playback
- **Wake Word**: "hey_jarvis" (openWakeWord model)
- **Port**: 10700
- **Status**: Running via launchd (`com.homeai.wyoming-satellite`)
- **Test**: `nc -z localhost 10700` ✓

### 4. OpenClaw Agent
- **Function**: AI agent with tool calling (home automation, etc.)
- **Gateway**: WebSocket + CLI
- **Port**: 8080
- **Status**: Running via launchd (`com.homeai.openclaw`)
- **Skills**: home-assistant, voice-assistant
- **Test**: `openclaw agent --message "Hello" --agent main` ✓

### 5. Ollama LLM
- **Models**: llama3.3:70b, qwen2.5:7b, and others
- **Port**: 11434
- **Status**: Running natively
- **Test**: `ollama list` ✓

### 6. Home Assistant Integration
- **Custom Component**: OpenClaw Conversation agent created
- **Location**: `homeai-agent/custom_components/openclaw_conversation/`
- **Features**:
  - Full conversation agent implementation
  - Config flow for UI setup
  - CLI fallback if HTTP unavailable
  - Error handling and logging
- **Status**: Ready for installation

---

## What's Pending 🔄

### Manual Steps Required (Home Assistant UI)

These steps require access to the Home Assistant web interface at http://10.0.0.199:8123:

1. **Install OpenClaw Conversation Component**
   - Copy component to HA server's `/config/custom_components/`
   - Restart Home Assistant
   - See: [`homeai-voice/VOICE_PIPELINE_SETUP.md`](homeai-voice/VOICE_PIPELINE_SETUP.md)

2. **Add Wyoming Integrations**
   - Settings → Devices & Services → Add Integration → Wyoming Protocol
   - Add STT (10.0.0.199:10300)
   - Add TTS (10.0.0.199:10301)
   - Add Satellite (10.0.0.199:10700)

3. **Add OpenClaw Conversation**
   - Settings → Devices & Services → Add Integration → OpenClaw Conversation
   - Configure: host=10.0.0.199, port=8080, agent=main

4. **Create Voice Assistant Pipeline**
   - Settings → Voice Assistants → Add Assistant
   - Name: "HomeAI with OpenClaw"
   - STT: Mac Mini STT
   - Conversation: OpenClaw Conversation
   - TTS: Mac Mini TTS
   - Set as preferred

5. **Test the Pipeline**
   - Type test: "What time is it?" in HA Assist
   - Voice test: "Hey Jarvis, turn on the reading lamp"

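
The copy in step 1 can be scripted. The following is a minimal sketch, assuming SSH access to the HA host; the `HA_SSH` target (`root@10.0.0.199`) and the `install_component` helper are placeholders for illustration, not part of this setup. It only prints the command unless `RUN=1` is set, and Home Assistant still needs a restart afterwards.

```bash
# Hypothetical helper: copy the custom component into the HA config directory.
# HA_SSH is an assumed SSH target; adjust user and host for your HA server.
HA_SSH="${HA_SSH:-root@10.0.0.199}"
SRC="homeai-agent/custom_components/openclaw_conversation"
DEST="/config/custom_components/"

install_component() {
  # scp -r copies the whole component directory into custom_components/.
  local cmd=(scp -r "$SRC" "${HA_SSH}:${DEST}")
  if [ "${RUN:-0}" = "1" ]; then
    "${cmd[@]}"
  else
    echo "dry run: ${cmd[*]}"
  fi
}

install_component
```

Run it from the repository root so the relative `SRC` path resolves, and use `RUN=1 install_component` once the printed command looks right.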
### Future Enhancements

6. **Chatterbox TTS** - Voice cloning for character personality
7. **Qwen3-TTS** - Alternative voice synthesis via MLX
8. **Custom Wake Word** - Train with character's name
9. **Uptime Kuma** - Add monitoring for all services

---

## Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                       Mac Mini M4 Pro                        │
│                         (10.0.0.199)                         │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐           │
│  │   Wyoming   │  │   Wyoming   │  │   Wyoming   │           │
│  │     STT     │  │     TTS     │  │  Satellite  │           │
│  │   :10300    │  │   :10301    │  │   :10700    │           │
│  └─────────────┘  └─────────────┘  └─────────────┘           │
│                                                              │
│  ┌─────────────┐  ┌─────────────┐                            │
│  │  OpenClaw   │  │   Ollama    │                            │
│  │   Gateway   │  │     LLM     │                            │
│  │    :8080    │  │   :11434    │                            │
│  └─────────────┘  └─────────────┘                            │
│                                                              │
└──────────────────────────────────────────────────────────────┘
                                ▲
                                │ Wyoming Protocol + HTTP API
                                │
┌──────────────────────────────────────────────────────────────┐
│                    Home Assistant Server                     │
│                         (10.0.0.199)                         │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────────────────────────────────────────────┐     │
│  │              Voice Assistant Pipeline               │     │
│  │                                                     │     │
│  │  Wyoming STT → OpenClaw Conversation → Wyoming TTS  │     │
│  └─────────────────────────────────────────────────────┘     │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐     │
│  │       OpenClaw Conversation Custom Component        │     │
│  │      (Routes to OpenClaw Gateway on Mac Mini)       │     │
│  └─────────────────────────────────────────────────────┘     │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

---

## Voice Flow Example

**User**: "Hey Jarvis, turn on the reading lamp"

1. **Wake Word Detection** (Wyoming Satellite)
   - Detects "Hey Jarvis"
   - Starts recording audio

2. **Speech-to-Text** (Wyoming STT)
   - Transcribes: "turn on the reading lamp"
   - Sends text to Home Assistant

3. **Conversation Processing** (HA → OpenClaw)
   - HA Voice Pipeline receives text
   - Routes to OpenClaw Conversation agent
   - OpenClaw Gateway processes request

4. **LLM Processing** (Ollama)
   - llama3.3:70b generates response
   - Identifies intent: control light
   - Calls home-assistant skill

5. **Action Execution** (Home Assistant API)
   - OpenClaw calls HA REST API
   - Turns on "reading lamp" entity
   - Returns confirmation

6. **Text-to-Speech** (Wyoming TTS)
   - Generates audio: "I've turned on the reading lamp"
   - Sends to Wyoming Satellite

7. **Audio Playback** (Mac Mini Speaker)
   - Plays confirmation audio
   - User hears response

**Total Latency**: Target < 5 seconds

---

## Service Management

### Check All Services

```bash
# Quick health check
./homeai-voice/scripts/test-services.sh

# Individual service status
launchctl list | grep homeai
```

### Restart a Service

```bash
# Example: Restart STT
launchctl unload ~/Library/LaunchAgents/com.homeai.wyoming-stt.plist
launchctl load ~/Library/LaunchAgents/com.homeai.wyoming-stt.plist
```

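
The unload/load pair above generalizes to the other services. This is a minimal sketch, assuming every plist follows the `com.homeai.<name>.plist` naming seen in the STT example; `plist_for` and `restart_service` are hypothetical helper names, not scripts that ship with the project.

```bash
# Build the LaunchAgent plist path for a service name such as "wyoming-stt".
# Assumes all HomeAI plists live in ~/Library/LaunchAgents with this naming.
plist_for() {
  echo "$HOME/Library/LaunchAgents/com.homeai.$1.plist"
}

# Unload and reload one service; returns non-zero if the plist is missing.
restart_service() {
  local plist
  plist="$(plist_for "$1")"
  if [ ! -f "$plist" ]; then
    echo "no plist at $plist" >&2
    return 1
  fi
  launchctl unload "$plist"
  launchctl load "$plist"
}

# Usage: restart_service wyoming-tts
```

Checking for the plist first avoids a silent no-op when a service name is mistyped.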
### View Logs

```bash
# STT logs
tail -f /tmp/homeai-wyoming-stt.log

# TTS logs
tail -f /tmp/homeai-wyoming-tts.log

# Satellite logs
tail -f /tmp/homeai-wyoming-satellite.log

# OpenClaw logs
tail -f /tmp/homeai-openclaw.log
```

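
For a quick look across all four logs without tailing each one, a small scan can help. This is a sketch only: it assumes the services write the word "error" (in any case) on failure, which has not been verified for each service, and `count_errors` is a hypothetical helper name.

```bash
# Print, for each log file given, how many lines mention "error" (any case).
count_errors() {
  local f
  for f in "$@"; do
    if [ -r "$f" ]; then
      # grep -c prints the number of matching lines (0 if none).
      echo "$f: $(grep -ci 'error' "$f")"
    else
      echo "$f: not readable"
    fi
  done
}

count_errors /tmp/homeai-wyoming-stt.log /tmp/homeai-wyoming-tts.log \
             /tmp/homeai-wyoming-satellite.log /tmp/homeai-openclaw.log
```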
---

## Key Documentation

| Document | Purpose |
|----------|---------|
| [`homeai-voice/VOICE_PIPELINE_SETUP.md`](homeai-voice/VOICE_PIPELINE_SETUP.md) | Complete setup guide with step-by-step HA configuration |
| [`homeai-voice/RESUME_WORK.md`](homeai-voice/RESUME_WORK.md) | Quick reference for resuming work |
| [`homeai-agent/custom_components/openclaw_conversation/README.md`](homeai-agent/custom_components/openclaw_conversation/README.md) | Custom component documentation |
| [`plans/ha-voice-pipeline-implementation.md`](plans/ha-voice-pipeline-implementation.md) | Detailed implementation plan |
| [`plans/voice-loop-integration.md`](plans/voice-loop-integration.md) | Architecture options and decisions |

---

## Testing

### Automated Tests

```bash
# Service health check
./homeai-voice/scripts/test-services.sh

# OpenClaw test
openclaw agent --message "What time is it?" --agent main

# Home Assistant skill test
openclaw agent --message "Turn on the reading lamp" --agent main
```

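
The OpenClaw test can also be timed against the 5-second latency target. This is a rough sketch: it covers only the agent step (no wake word, STT, or TTS), uses bash's `SECONDS` counter with one-second resolution, and ignores the command's exit status. `within_budget` is a hypothetical helper name.

```bash
# Run a command and succeed only if it finishes in under BUDGET whole seconds.
# The command's own exit status is ignored; only elapsed time is checked.
within_budget() {
  local budget="$1"; shift
  local start=$SECONDS
  "$@" >/dev/null 2>&1
  local elapsed=$(( SECONDS - start ))
  echo "elapsed=${elapsed}s budget=${budget}s"
  [ "$elapsed" -lt "$budget" ]
}

within_budget 5 openclaw agent --message "What time is it?" --agent main \
  || echo "agent step exceeded the 5 s target"
```

Because the agent is only one stage of the pipeline, passing this check is necessary but not sufficient for the end-to-end target.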
### Manual Tests

1. **Type Test** (HA Assist)
   - Open HA UI → Click Assist icon
   - Type: "What time is it?"
   - Expected: Hear spoken response

2. **Voice Test** (Wyoming Satellite)
   - Say: "Hey Jarvis"
   - Wait for beep
   - Say: "What time is it?"
   - Expected: Hear spoken response

3. **Home Control Test**
   - Say: "Hey Jarvis"
   - Say: "Turn on the reading lamp"
   - Expected: Light turns on + confirmation

---

## Troubleshooting

### Services Not Running

```bash
# Check launchd
launchctl list | grep homeai

# Reload all services
./homeai-voice/scripts/load-all-launchd.sh
```

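
To see exactly which services are missing from the `launchctl list` output, the listing can be compared against the labels named in this report. This is a minimal sketch; `missing_labels` is a hypothetical helper, and it assumes the label is the last column of each `launchctl list` line.

```bash
# Labels taken from the status entries in this report.
EXPECTED_LABELS="com.homeai.wyoming-stt com.homeai.wyoming-tts com.homeai.wyoming-satellite com.homeai.openclaw"

# Read `launchctl list` output on stdin and print each expected label that is absent.
missing_labels() {
  local listing label
  listing="$(cat)"
  for label in $EXPECTED_LABELS; do
    # Compare against the last column only; -Fx requires an exact full match.
    if ! printf '%s\n' "$listing" | awk '{print $NF}' | grep -Fxq -- "$label"; then
      echo "$label"
    fi
  done
}

launchctl list 2>/dev/null | missing_labels
```

An empty result means all four labels are loaded; it does not prove the processes are healthy, only that launchd knows about them.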
### Network Issues

```bash
# Test from Mac Mini to HA
# (a 401 response without an access token still shows that HA is reachable)
curl http://10.0.0.199:8123/api/

# Test ports
nc -z localhost 10300  # STT
nc -z localhost 10301  # TTS
nc -z localhost 10700  # Satellite
nc -z localhost 8080   # OpenClaw
```

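
The individual port checks can be folded into one loop that prints a status line per service. This is a sketch under the assumption that `nc` is available; it is not the contents of `test-services.sh`, and `check_port` is a hypothetical helper name.

```bash
# Return 0 if something accepts TCP connections on host:port.
check_port() {
  nc -z "$1" "$2" >/dev/null 2>&1
}

# name:port pairs from the service list in this report.
for entry in stt:10300 tts:10301 satellite:10700 openclaw:8080 ollama:11434; do
  name="${entry%%:*}"   # text before the colon
  port="${entry##*:}"   # text after the colon
  if check_port localhost "$port"; then
    echo "OK    $name ($port)"
  else
    echo "DOWN  $name ($port)"
  fi
done
```

An open port only shows that a listener exists; it does not confirm the service behind it is responding correctly.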
### Audio Issues

```bash
# Test microphone (records 5 seconds; `rec` is part of SoX)
rec -r 16000 -c 1 test.wav trim 0 5

# Test speaker
afplay /System/Library/Sounds/Glass.aiff
```

---

## Next Actions

1. **Access Home Assistant UI** at http://10.0.0.199:8123
2. **Follow setup guide**: [`homeai-voice/VOICE_PIPELINE_SETUP.md`](homeai-voice/VOICE_PIPELINE_SETUP.md)
3. **Install OpenClaw component** (see Step 1 in setup guide)
4. **Configure Wyoming integrations** (see Step 2 in setup guide)
5. **Create voice pipeline** (see Step 4 in setup guide)
6. **Test end-to-end** (see Step 5 in setup guide)

---

## Success Metrics

- [ ] All services show green in health check
- [ ] Wyoming integrations appear in HA
- [ ] OpenClaw Conversation agent registered
- [ ] Voice pipeline created and set as default
- [ ] Typed query returns spoken response
- [ ] Voice query via satellite works
- [ ] Home control via voice works
- [ ] End-to-end latency < 5 seconds
- [ ] Services survive Mac Mini reboot

---

## Project Context

This is **Phase 2** of the HomeAI project. See [`TODO.md`](TODO.md) for the complete project roadmap.

**Previous Phase**: Phase 1 - Foundation (Infrastructure + LLM) ✅ Complete
**Current Phase**: Phase 2 - Voice Pipeline 🔄 Backend Complete, HA Integration Pending
**Next Phase**: Phase 3 - Agent & Character (mem0, character system, workflows)