feat: upgrade voice pipeline — MLX Whisper STT (20x faster), Qwen3.5 MoE LLM, fix HA tool calling
- Replace faster-whisper with wyoming-mlx-whisper (whisper-large-v3-turbo, MLX Metal GPU); STT latency: 8.4s → 400ms for short voice commands
- Add Qwen3.5-35B-A3B (MoE, 3B active params, Q8_0) to Ollama: 26.7 tok/s vs 5.4 tok/s (70B)
- Add model preload launchd service to pin the voice model in VRAM permanently
- Fix HA tool calling: set commands.native=true, symlink ha-ctl to PATH
- Add pipeline benchmark script (STT/LLM/TTS latency profiling)
- Add service restart buttons and STT endpoint to dashboard
- Bind Vite dev server to 0.0.0.0 for LAN access

Total estimated pipeline latency: ~27s → ~4s

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
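The pipeline benchmark script (`homeai-voice/scripts/benchmark_pipeline.py` in TODO.md) is not included in this diff. As a minimal sketch of per-stage latency profiling, assuming each STT, LLM, and TTS step can be wrapped in a zero-argument callable, the code below times every stage several times and sums the medians into an end-to-end estimate. `profile_stages` and `total_latency` are hypothetical names. Summing the stages is only valid if they run strictly one after another with no streaming overlap.

```python
import statistics
import time
from typing import Callable, Dict


def profile_stages(stages: Dict[str, Callable[[], object]], runs: int = 5) -> Dict[str, float]:
    """Return the median wall-clock latency (seconds) of each stage over `runs` calls."""
    medians: Dict[str, float] = {}
    for name, stage in stages.items():
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            stage()  # e.g. one STT transcription, one LLM reply, one TTS synthesis
            samples.append(time.perf_counter() - start)
        medians[name] = statistics.median(samples)
    return medians


def total_latency(medians: Dict[str, float]) -> float:
    """Assuming the stages run back to back, the end-to-end estimate is their sum."""
    return sum(medians.values())


if __name__ == "__main__":
    # Placeholder stages: swap the sleeps for real STT/LLM/TTS calls.
    fake_stages = {
        "stt": lambda: time.sleep(0.01),
        "llm": lambda: time.sleep(0.02),
        "tts": lambda: time.sleep(0.01),
    }
    medians = profile_stages(fake_stages, runs=3)
    for name, seconds in medians.items():
        print(f"{name}: {seconds * 1000:.0f} ms")
    print(f"total: {total_latency(medians) * 1000:.0f} ms")
```

Using the median rather than the mean keeps a single slow cold-start run (for example, one where a model still has to load) from skewing the result.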
TODO.md
@@ -25,9 +25,11 @@
- [x] Write and load launchd plist (`com.homeai.ollama.plist`) — `/opt/homebrew/bin/ollama`
- [x] Register local GGUF models via Modelfiles (no download): llama3.3:70b, qwen3:32b, codestral:22b, qwen2.5:7b
- [x] Register additional models: EVA-LLaMA-3.33-70B, Midnight-Miqu-70B, QwQ-32B, Qwen3.5-35B, Qwen3-Coder-30B, Qwen3-VL-30B, GLM-4.6V-Flash, DeepSeek-R1-8B, gemma-3-27b
- [x] Add qwen3.5:35b-a3b (MoE, Q8_0) — 26.7 tok/s, recommended for voice pipeline
- [x] Write model preload script + launchd service (keeps voice model in VRAM permanently)
- [x] Deploy Open WebUI via Docker compose (port 3030)
- [x] Verify Open WebUI connected to Ollama, all models available
- [ ] Run `scripts/benchmark.sh` — record results in `benchmark-results.md`
- [x] Run pipeline benchmark (homeai-voice/scripts/benchmark_pipeline.py) — STT/LLM/TTS latency profiled
- [ ] Add Ollama + Open WebUI to Uptime Kuma monitors
---
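The preload script and its launchd service are not part of this diff. One way to pin a model through Ollama's HTTP API, sketched here under the assumption that Ollama listens on its default `localhost:11434`, is to send a `/api/generate` request with no prompt, which only loads the model, and `keep_alive: -1`, which keeps it resident instead of unloading it after the idle timeout. A completed, non-streaming response from the same endpoint also reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds), which is enough to compute a tok/s figure like the ones quoted in the commit message. `preload_request` and `tokens_per_second` are hypothetical helper names, not the script this commit adds.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default listen address
VOICE_MODEL = "qwen3.5:35b-a3b"        # model tag taken from TODO.md


def preload_request(model: str = VOICE_MODEL, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a /api/generate call with no prompt, which makes Ollama just load the model.

    keep_alive=-1 asks Ollama to keep the model in memory indefinitely
    instead of unloading it after the default idle timeout.
    """
    body = json.dumps({"model": model, "keep_alive": -1}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_per_second(response: dict) -> float:
    """Generation throughput from a completed /api/generate response.

    eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    """
    return response["eval_count"] / response["eval_duration"] * 1e9


if __name__ == "__main__":
    # Requires a running Ollama server; loading a large model can take a while.
    with urllib.request.urlopen(preload_request(), timeout=600) as resp:
        print(resp.status)
```

Because the pin does not survive an Ollama restart, a preload job like this would likely need to rerun whenever the Ollama service itself restarts.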
@@ -37,6 +39,7 @@
### P3 · homeai-voice
- [x] Install `wyoming-faster-whisper` — model: faster-whisper-large-v3 (auto-downloaded)
- [x] Upgrade STT to wyoming-mlx-whisper (whisper-large-v3-turbo, MLX Metal GPU) — 20x faster (8s → 400ms)
- [x] Install Kokoro ONNX TTS — models at `~/models/kokoro/`
- [x] Write Wyoming-Kokoro adapter server (`homeai-voice/tts/wyoming_kokoro_server.py`)
- [x] Write + load launchd plists for Wyoming STT (10300) and TTS (10301)
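Before debugging the pipeline itself, a quick reachability check for the two Wyoming services (STT on 10300, TTS on 10301) can rule out a service that never started. The sketch below uses a hypothetical helper, `is_listening`, which only confirms that something accepts TCP connections on each port. It does not speak the Wyoming protocol, so a wedged server that still accepts connections would pass.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # Ports from TODO.md: Wyoming STT on 10300, Wyoming TTS on 10301.
    for name, port in {"stt": 10300, "tts": 10301}.items():
        state = "up" if is_listening("127.0.0.1", port) else "down"
        print(f"{name} ({port}): {state}")
```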
@@ -67,10 +70,11 @@
- [x] Fix context window: set `contextWindow=32768` for llama3.3:70b in `openclaw.json`
- [x] Fix Llama 3.3 Modelfile: add tool-calling TEMPLATE block
- [x] Verify `openclaw agent --message "..." --agent main` → completed
- [x] Write `skills/home-assistant` SKILL.md — HA REST API control
- [x] Write `skills/home-assistant` SKILL.md — HA REST API control via ha-ctl CLI
- [x] Write `skills/voice-assistant` SKILL.md — voice response style guide
- [x] Wire HASS_TOKEN — create `~/.homeai/hass_token` or set env in launchd plist
- [x] Test home-assistant skill: "turn on/off the reading lamp"
- [x] Fix HA tool calling: set commands.native=true, symlink ha-ctl to PATH, update TOOLS.md
- [x] Test home-assistant skill: "turn on/off the reading lamp" — verified exec→ha-ctl→HA action
- [x] Set up mem0 with Chroma backend, test semantic recall
- [x] Write memory backup launchd job
- [x] Build morning briefing n8n workflow
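The ha-ctl fix depends on the command resolving through `PATH` in the environment that actually runs it, and services started by launchd usually see a shorter `PATH` than an interactive shell. Assuming Python is available on the host, a minimal check can call `shutil.which` with an explicit search path and resolve the symlink to the script behind it. `resolve_tool` is a hypothetical name, and the directory list in the example is illustrative, not the `PATH` this setup actually uses.

```python
import os
import shutil
from typing import Optional


def resolve_tool(name: str = "ha-ctl", path: Optional[str] = None) -> Optional[str]:
    """Return the real file `name` resolves to on the search path, or None.

    `path` overrides os.environ["PATH"], so you can test the PATH a
    service will see rather than your interactive shell's.
    """
    found = shutil.which(name, path=path)
    # Follow the symlink so the output shows which script it points at.
    return os.path.realpath(found) if found else None


if __name__ == "__main__":
    # Illustrative search path; /opt/homebrew/bin is where ollama lives per TODO.md.
    service_like = os.pathsep.join(["/usr/bin", "/bin", "/usr/sbin", "/sbin", "/opt/homebrew/bin"])
    print(resolve_tool(path=service_like))
```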