homeai/VOICE_PIPELINE_STATUS.md
Aodhan Collins 664bb6d275 feat: OpenClaw HTTP bridge, HA conversation agent fixes, voice pipeline tooling
- Add openclaw-http-bridge.py: HTTP server translating POST requests to OpenClaw CLI calls
- Add launchd plist for HTTP bridge (port 8081, auto-start)
- Add install-to-docker-ha.sh: deploy custom component to Docker HA via SSH
- Add package-for-ha.sh: create distributable tarball of custom component
- Add test-services.sh: comprehensive voice pipeline service checker

Fixes from code review:
- Use OpenClawAgent (HTTP) in async_setup_entry instead of OpenClawCLIAgent
  (CLI agent fails inside Docker HA where openclaw binary doesn't exist)
- Update all port references from 8080 to 8081 (HTTP bridge port)
- Remove overly permissive CORS headers from HTTP bridge
- Fix zombie process leak: kill child process on CLI timeout
- Remove unused subprocess import in conversation.py
- Add version field to Kokoro TTS Wyoming info
- Update TODO.md with voice pipeline progress
2026-03-08 22:46:04 +00:00


# Voice Pipeline Status Report
> Last Updated: 2026-03-08
---
## Executive Summary
The voice pipeline backend is **fully operational** on the Mac Mini. All services are running and tested:
- ✅ Wyoming STT (Whisper large-v3) - Port 10300
- ✅ Wyoming TTS (Kokoro ONNX) - Port 10301
- ✅ Wyoming Satellite (wake word + audio) - Port 10700
- ✅ OpenClaw Agent (LLM + skills) - Port 8080
- ✅ Ollama (local LLM runtime) - Port 11434

**Next Step**: Manual Home Assistant UI configuration to connect the pipeline.

---
## What's Working ✅
### 1. Speech-to-Text (STT)
- **Service**: Wyoming Faster Whisper
- **Model**: large-v3 (multilingual, high accuracy)
- **Port**: 10300
- **Status**: Running via launchd (`com.homeai.wyoming-stt`)
- **Test**: `nc -z localhost 10300`
### 2. Text-to-Speech (TTS)
- **Service**: Wyoming Kokoro ONNX
- **Voice**: af_heart (default, configurable)
- **Port**: 10301
- **Status**: Running via launchd (`com.homeai.wyoming-tts`)
- **Test**: `nc -z localhost 10301`
### 3. Wyoming Satellite
- **Function**: Wake word detection + audio capture/playback
- **Wake Word**: "hey_jarvis" (openWakeWord model)
- **Port**: 10700
- **Status**: Running via launchd (`com.homeai.wyoming-satellite`)
- **Test**: `nc -z localhost 10700`
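
The `nc -z` checks above only prove that something is listening on each port. A stronger check is to speak the Wyoming protocol: a client writes a one-line JSON `describe` event, and a Wyoming server replies with an `info` event. The sketch below assumes the reply starts with a single JSON header line containing a `"type"` field (the layout used by recent `wyoming` releases) and extracts it with `awk` rather than `jq`. The helper names are hypothetical.

```bash
#!/bin/sh
# Hypothetical helpers: probe a Wyoming service with a "describe" event.

# Print the value of the first "type" key found in the first line of stdin.
wyoming_event_type() {
  awk 'NR == 1 {
    i = index($0, "\"type\"")
    if (i > 0) {
      rest = substr($0, i + 6)           # text after "type"
      sub(/^[ \t]*:[ \t]*"/, "", rest)   # drop the colon and opening quote
      sub(/".*$/, "", rest)              # drop everything from the closing quote
      print rest
    }
    exit
  }'
}

# Send {"type":"describe"} and check that the reply is an "info" event.
# nc -w 2 gives up after 2 idle seconds, because the server keeps the
# connection open after answering.
wyoming_probe() {
  host=$1
  port=$2
  reply_type=$(printf '{"type":"describe"}\n' | nc -w 2 "$host" "$port" | wyoming_event_type)
  if [ "$reply_type" = "info" ]; then
    echo "OK   $host:$port answered describe with info"
  else
    echo "FAIL $host:$port (reply type: ${reply_type:-none})"
    return 1
  fi
}
```

For example, `for p in 10300 10301 10700; do wyoming_probe localhost "$p"; done` probes STT, TTS, and the satellite in turn.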
### 4. OpenClaw Agent
- **Function**: AI agent with tool calling (home automation, etc.)
- **Gateway**: WebSocket + CLI
- **Port**: 8080
- **Status**: Running via launchd (`com.homeai.openclaw`)
- **Skills**: home-assistant, voice-assistant
- **Test**: `openclaw agent --message "Hello" --agent main`
### 5. Ollama LLM
- **Models**: llama3.3:70b, qwen2.5:7b, and others
- **Port**: 11434
- **Status**: Running natively
- **Test**: `ollama list`
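
`ollama list` prints a header row and then one pulled model per line, with the full `name:tag` in the first column. A check like the following (hypothetical helper name) confirms that the models this pipeline relies on are present before blaming the agent for a failure:

```bash
#!/bin/sh
# Hypothetical helper: succeed only if every model named in "$@" appears in
# the first column of `ollama list` output read from stdin (header skipped).
require_models() {
  ollama_out=$(cat)
  missing=0
  for model in "$@"; do
    if ! printf '%s\n' "$ollama_out" |
      awk -v m="$model" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'; then
      echo "missing model: $model" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as `ollama list | require_models llama3.3:70b qwen2.5:7b`. Names are compared exactly, so a model pulled without an explicit tag must be given as `name:latest`.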
### 6. Home Assistant Integration
- **Custom Component**: OpenClaw Conversation agent created
- **Location**: `homeai-agent/custom_components/openclaw_conversation/`
- **Features**:
  - Full conversation agent implementation
  - Config flow for UI setup
  - CLI fallback if HTTP unavailable
  - Error handling and logging
- **Status**: Ready for installation
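
Home Assistant loads a custom integration only if its folder contains `__init__.py` and `manifest.json`, and custom integrations must declare `domain` and `version` in the manifest. Before copying the component to `/config/custom_components/`, a local check along these lines can catch a broken package early. The function name is hypothetical, and the `grep` key test is a rough heuristic, not a JSON parse.

```bash
#!/bin/sh
# Hypothetical pre-install check for a Home Assistant custom component folder.
check_component_dir() {
  dir=$1
  status=0
  # Both files must exist for HA to load the integration.
  for f in __init__.py manifest.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      status=1
    fi
  done
  # Custom integrations must declare "domain" and "version" in manifest.json.
  for key in domain version; do
    if [ -f "$dir/manifest.json" ] && ! grep -q "\"$key\"[[:space:]]*:" "$dir/manifest.json"; then
      echo "manifest.json has no \"$key\" key" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: `check_component_dir homeai-agent/custom_components/openclaw_conversation`.
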
---
## What's Pending 🔄
### Manual Steps Required (Home Assistant UI)
These steps require access to the Home Assistant web interface at http://10.0.0.199:8123:
1. **Install OpenClaw Conversation Component**
   - Copy component to HA server's `/config/custom_components/`
   - Restart Home Assistant
   - See: [`homeai-voice/VOICE_PIPELINE_SETUP.md`](homeai-voice/VOICE_PIPELINE_SETUP.md)
2. **Add Wyoming Integrations**
   - Settings → Devices & Services → Add Integration → Wyoming Protocol
   - Add STT (10.0.0.199:10300)
   - Add TTS (10.0.0.199:10301)
   - Add Satellite (10.0.0.199:10700)
3. **Add OpenClaw Conversation**
   - Settings → Devices & Services → Add Integration → OpenClaw Conversation
   - Configure: host=10.0.0.199, port=8081 (OpenClaw HTTP bridge), agent=main
4. **Create Voice Assistant Pipeline**
   - Settings → Voice Assistants → Add Assistant
   - Name: "HomeAI with OpenClaw"
   - STT: Mac Mini STT
   - Conversation: OpenClaw Conversation
   - TTS: Mac Mini TTS
   - Set as preferred
5. **Test the Pipeline**
   - Type test: "What time is it?" in HA Assist
   - Voice test: "Hey Jarvis, turn on the reading lamp"
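
After steps 1 to 3, Home Assistant's REST API can confirm that the integrations actually loaded: `GET /api/config`, sent with a long-lived access token, returns JSON whose `components` array lists every loaded integration. This sketch assumes the token is exported in a hypothetical `HA_TOKEN` variable and that the component's domain is `openclaw_conversation` (HA requires the folder name to match the domain). It matches quoted names with a string search instead of `jq`.

```bash
#!/bin/sh
# Succeed if the JSON on stdin contains each component name in "$@" as a
# quoted string. A string search, not a JSON parse: fine for a manual check.
has_components() {
  json=$(cat)
  for c in "$@"; do
    case $json in
      *"\"$c\""*) ;;
      *) echo "component not loaded: $c" >&2; return 1 ;;
    esac
  done
  return 0
}

# Fetch /api/config from Home Assistant (HA_TOKEN is an assumed env var).
ha_config() {
  curl -fsS -H "Authorization: Bearer $HA_TOKEN" "http://10.0.0.199:8123/api/config"
}
```

`ha_config | has_components wyoming openclaw_conversation` prints nothing and succeeds when both are loaded.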
### Future Enhancements
6. **Chatterbox TTS** - Voice cloning for character personality
7. **Qwen3-TTS** - Alternative voice synthesis via MLX
8. **Custom Wake Word** - Train with character's name
9. **Uptime Kuma** - Add monitoring for all services
---
## Architecture
```
┌──────────────────────────────────────────────────────────────┐
│                       Mac Mini M4 Pro                        │
│                         (10.0.0.199)                         │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐     │
│  │   Wyoming   │     │   Wyoming   │     │   Wyoming   │     │
│  │     STT     │     │     TTS     │     │  Satellite  │     │
│  │   :10300    │     │   :10301    │     │   :10700    │     │
│  └─────────────┘     └─────────────┘     └─────────────┘     │
│                                                              │
│  ┌─────────────┐     ┌─────────────┐                         │
│  │  OpenClaw   │     │   Ollama    │                         │
│  │   Gateway   │     │     LLM     │                         │
│  │    :8080    │     │   :11434    │                         │
│  └─────────────┘     └─────────────┘                         │
│                                                              │
└──────────────────────────────────────────────────────────────┘
                               │ Wyoming Protocol + HTTP API
┌──────────────────────────────────────────────────────────────┐
│                    Home Assistant Server                     │
│                         (10.0.0.199)                         │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │                Voice Assistant Pipeline                │  │
│  │                                                        │  │
│  │   Wyoming STT → OpenClaw Conversation → Wyoming TTS    │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │         OpenClaw Conversation Custom Component         │  │
│  │        (Routes to OpenClaw Gateway on Mac Mini)        │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
---
## Voice Flow Example
**User**: "Hey Jarvis, turn on the reading lamp"
1. **Wake Word Detection** (Wyoming Satellite)
   - Detects "Hey Jarvis"
   - Starts recording audio
2. **Speech-to-Text** (Wyoming STT)
   - Transcribes: "turn on the reading lamp"
   - Sends text to Home Assistant
3. **Conversation Processing** (HA → OpenClaw)
   - HA Voice Pipeline receives text
   - Routes to OpenClaw Conversation agent
   - OpenClaw Gateway processes request
4. **LLM Processing** (Ollama)
   - llama3.3:70b generates response
   - Identifies intent: control light
   - Calls home-assistant skill
5. **Action Execution** (Home Assistant API)
   - OpenClaw calls HA REST API
   - Turns on "reading lamp" entity
   - Returns confirmation
6. **Text-to-Speech** (Wyoming TTS)
   - Generates audio: "I've turned on the reading lamp"
   - Sends to Wyoming Satellite
7. **Audio Playback** (Mac Mini Speaker)
   - Plays confirmation audio
   - User hears response

**Total Latency**: Target < 5 seconds

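
The 5 second target applies to the whole voice loop, but the text leg (OpenClaw agent plus Ollama) can be timed on its own from the CLI to see how much of the budget it uses. A minimal sketch: timestamps come from Perl's `Time::HiRes` because macOS `date` has no sub-second format, and the helper names are hypothetical.

```bash
#!/bin/sh
# Current time in seconds, with millisecond precision.
now() {
  perl -MTime::HiRes=time -e 'printf "%.3f\n", time'
}

# Print elapsed time; exit 0 if (end - start) <= budget seconds, else 1.
within_budget() {
  awk -v s="$1" -v e="$2" -v b="$3" 'BEGIN {
    elapsed = e - s
    printf "elapsed: %.2fs (budget %ss)\n", elapsed, b
    exit !(elapsed <= b)
  }'
}

# Run a command (output discarded) and check its wall time against a budget.
time_against_budget() {
  budget=$1
  shift
  start=$(now)
  "$@" >/dev/null
  end=$(now)
  within_budget "$start" "$end" "$budget"
}
```

For example, `time_against_budget 5 openclaw agent --message "What time is it?" --agent main` prints the elapsed time and fails if it exceeds 5 seconds. STT, TTS, and audio time are not included, so the real end-to-end figure will be higher.
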
---
## Service Management
### Check All Services
```bash
# Quick health check
./homeai-voice/scripts/test-services.sh
# Individual service status
launchctl list | grep homeai
```
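
`launchctl list` prints three columns: the PID (or `-` if the job is loaded but not running), the last exit status, and the label. Filtering that output for the `com.homeai.*` labels gives a quick view of which services have died. This hypothetical helper reports them and exits non-zero if any are stopped.

```bash
#!/bin/sh
# Read `launchctl list` output on stdin (PID, last exit status, label) and
# report each com.homeai.* job; exit 1 if any loaded job has no process.
report_stopped_jobs() {
  awk '$3 ~ /^com\.homeai\./ {
    if ($1 == "-") {
      print "STOPPED " $3 " (last exit status " $2 ")"
      bad = 1
    } else {
      print "running " $3 " (pid " $1 ")"
    }
  }
  END { exit bad }'
}
```

Run it as `launchctl list | report_stopped_jobs`. A job that was never loaded does not appear in `launchctl list` at all, so it will not be reported either.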
### Restart a Service
```bash
# Example: Restart STT
launchctl unload ~/Library/LaunchAgents/com.homeai.wyoming-stt.plist
launchctl load ~/Library/LaunchAgents/com.homeai.wyoming-stt.plist
```
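
`launchctl unload`/`load` still work but are listed as legacy subcommands on recent macOS. `launchctl kickstart -k` restarts a loaded job in place, killing the running instance first. A sketch, assuming the agents live in the logged-in user's `gui/<uid>` domain (the usual case for plists in `~/Library/LaunchAgents`). Helper names are hypothetical.

```bash
#!/bin/sh
# Build the launchd service target for a per-user agent,
# e.g. gui/501/com.homeai.wyoming-stt
service_target() {
  printf 'gui/%s/%s\n' "$(id -u)" "$1"
}

# Restart a loaded job; -k kills the running instance before restarting it.
restart_service() {
  launchctl kickstart -k "$(service_target "$1")"
}
```

For example: `restart_service com.homeai.wyoming-stt`. `kickstart` only acts on a job that is already loaded; use the `load` command above first if it is not.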
### View Logs
```bash
# STT logs
tail -f /tmp/homeai-wyoming-stt.log
# TTS logs
tail -f /tmp/homeai-wyoming-tts.log
# Satellite logs
tail -f /tmp/homeai-wyoming-satellite.log
# OpenClaw logs
tail -f /tmp/homeai-openclaw.log
```
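
`tail -f` accepts several files, so `tail -f /tmp/homeai-*.log` follows every service at once. For a quick post-mortem, the hypothetical helper below prints the most recent error-looking lines from each log. It assumes errors contain `ERROR`, `Traceback`, or `Exception`; adjust the pattern to the services' actual log formats.

```bash
#!/bin/sh
# Print up to N recent error-looking lines from each log, prefixed by its path.
# Usage: recent_errors N file...
recent_errors() {
  n=$1
  shift
  for log in "$@"; do
    [ -f "$log" ] || continue   # skip logs that do not exist yet
    grep -E 'ERROR|Traceback|Exception' "$log" | tail -n "$n" | sed "s|^|$log: |"
  done
}
```

For example: `recent_errors 20 /tmp/homeai-*.log`.
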
---
## Key Documentation
| Document | Purpose |
|----------|---------|
| [`homeai-voice/VOICE_PIPELINE_SETUP.md`](homeai-voice/VOICE_PIPELINE_SETUP.md) | Complete setup guide with step-by-step HA configuration |
| [`homeai-voice/RESUME_WORK.md`](homeai-voice/RESUME_WORK.md) | Quick reference for resuming work |
| [`homeai-agent/custom_components/openclaw_conversation/README.md`](homeai-agent/custom_components/openclaw_conversation/README.md) | Custom component documentation |
| [`plans/ha-voice-pipeline-implementation.md`](plans/ha-voice-pipeline-implementation.md) | Detailed implementation plan |
| [`plans/voice-loop-integration.md`](plans/voice-loop-integration.md) | Architecture options and decisions |
---
## Testing
### Automated Tests
```bash
# Service health check
./homeai-voice/scripts/test-services.sh
# OpenClaw test
openclaw agent --message "What time is it?" --agent main
# Home Assistant skill test
openclaw agent --message "Turn on the reading lamp" --agent main
```
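
The HA Assist type test can also be scripted through Home Assistant's `POST /api/conversation/process` endpoint, which takes a JSON body with `text` and optional `language` and `agent_id` fields. If `agent_id` is omitted, HA falls back to its built-in agent, so routing to OpenClaw needs the agent's ID. That value is not recorded in this document and is treated as an assumption below, as are the `HA_TOKEN` variable, the `en` language, and the helper names.

```bash
#!/bin/sh
# Escape backslashes and double quotes for a JSON string literal
# (control characters are not handled in this sketch).
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build the body for POST /api/conversation/process; $2 (agent_id) is optional.
conversation_body() {
  text=$(json_escape "$1")
  if [ -n "$2" ]; then
    printf '{"text":"%s","language":"en","agent_id":"%s"}\n' "$text" "$(json_escape "$2")"
  else
    printf '{"text":"%s","language":"en"}\n' "$text"
  fi
}

# Send a typed query to Home Assistant (HA_TOKEN is an assumed env var).
ha_converse() {
  conversation_body "$1" "$2" | curl -fsS -X POST \
    -H "Authorization: Bearer $HA_TOKEN" \
    -H "Content-Type: application/json" \
    --data @- "http://10.0.0.199:8123/api/conversation/process"
}
```

For example, `ha_converse "What time is it?" "<openclaw-agent-id>"` with a placeholder ID. The reply text should appear under `response.speech.plain.speech` in the returned JSON.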
### Manual Tests
1. **Type Test** (HA Assist)
   - Open HA UI → click the Assist icon
   - Type: "What time is it?"
   - Expected: Hear spoken response
2. **Voice Test** (Wyoming Satellite)
   - Say: "Hey Jarvis"
   - Wait for beep
   - Say: "What time is it?"
   - Expected: Hear spoken response
3. **Home Control Test**
   - Say: "Hey Jarvis"
   - Say: "Turn on the reading lamp"
   - Expected: Light turns on + confirmation
---
## Troubleshooting
### Services Not Running
```bash
# Check launchd
launchctl list | grep homeai
# Reload all services
./homeai-voice/scripts/load-all-launchd.sh
```
### Network Issues
```bash
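# Test from Mac Mini to HA (the REST API requires a long-lived access token;
# HA_TOKEN is an assumed environment variable holding one)
curl -H "Authorization: Bearer $HA_TOKEN" http://10.0.0.199:8123/api/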
# Test ports
nc -z localhost 10300 # STT
nc -z localhost 10301 # TTS
nc -z localhost 10700 # Satellite
nc -z localhost 8080 # OpenClaw
```
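
Home Assistant's `/api/` endpoint answers 200 when reachable with a valid token and 401 when the token is missing or wrong, while `curl -w '%{http_code}'` prints `000` when no HTTP response arrived at all. Mapping those codes to hints separates network problems from auth problems. A sketch with hypothetical helper names and the same assumed `HA_TOKEN` variable:

```bash
#!/bin/sh
# Map an HTTP status code from Home Assistant's /api/ endpoint to a hint.
explain_ha_status() {
  case $1 in
    200) echo "OK: API reachable and token accepted" ;;
    401) echo "Reachable, but the token is missing or invalid" ;;
    000) echo "No HTTP response: host down, wrong port, or firewall" ;;
    *)   echo "Unexpected HTTP status $1" ;;
  esac
}

# Print only the status code (curl reports 000 if no response was received).
ha_status_code() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $HA_TOKEN" "http://10.0.0.199:8123/api/"
}
```

For example: `explain_ha_status "$(ha_status_code)"`.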
### Audio Issues
```bash
# Test microphone (rec is part of SoX; records 5 s of 16 kHz mono audio)
rec -r 16000 -c 1 test.wav trim 0 5
# Test speaker
afplay /System/Library/Sounds/Glass.aiff
```
---
## Next Actions
1. **Access Home Assistant UI** at http://10.0.0.199:8123
2. **Follow setup guide**: [`homeai-voice/VOICE_PIPELINE_SETUP.md`](homeai-voice/VOICE_PIPELINE_SETUP.md)
3. **Install OpenClaw component** (see Step 1 in setup guide)
4. **Configure Wyoming integrations** (see Step 2 in setup guide)
5. **Create voice pipeline** (see Step 4 in setup guide)
6. **Test end-to-end** (see Step 5 in setup guide)
---
## Success Metrics
- [ ] All services show green in health check
- [ ] Wyoming integrations appear in HA
- [ ] OpenClaw Conversation agent registered
- [ ] Voice pipeline created and set as default
- [ ] Typed query returns spoken response
- [ ] Voice query via satellite works
- [ ] Home control via voice works
- [ ] End-to-end latency < 5 seconds
- [ ] Services survive Mac Mini reboot
---
## Project Context
This is **Phase 2** of the HomeAI project. See [`TODO.md`](TODO.md) for the complete project roadmap.

- **Previous Phase**: Phase 1 - Foundation (Infrastructure + LLM): Complete
- **Current Phase**: Phase 2 - Voice Pipeline 🔄: Backend Complete, HA Integration Pending
- **Next Phase**: Phase 3 - Agent & Character (mem0, character system, workflows)