feat: upgrade voice pipeline — MLX Whisper STT (20x faster), Qwen3.5 MoE LLM, fix HA tool calling
- Replace faster-whisper with wyoming-mlx-whisper (whisper-large-v3-turbo, MLX Metal GPU); STT latency for short voice commands: 8.4s → 400ms
- Add Qwen3.5-35B-A3B (MoE, 3B active params, Q8_0) to Ollama — 26.7 tok/s vs 5.4 tok/s (70B)
- Add model preload launchd service to pin the voice model in VRAM permanently
- Fix HA tool calling: set commands.native=true, symlink ha-ctl into PATH
- Add pipeline benchmark script (STT/LLM/TTS latency profiling)
- Add service restart buttons and STT endpoint to the dashboard
- Bind the Vite dev server to 0.0.0.0 for LAN access

Total estimated pipeline latency: ~27s → ~4s

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
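The 26.7 tok/s figure above is the kind of number Ollama's `/api/generate` endpoint (with `stream` set to false) lets you derive from its `eval_count` (tokens generated) and `eval_duration` (nanoseconds spent generating) fields. A minimal sketch of that calculation, as the benchmark script might perform it (the helper name is illustrative, not taken from this repo):

```python
def tokens_per_second(resp: dict) -> float:
    """Derive generation throughput from an Ollama /api/generate response.

    eval_count    -- number of tokens generated
    eval_duration -- time spent generating, in nanoseconds
    """
    return resp["eval_count"] / resp["eval_duration"] * 1e9

# Example: 267 tokens generated in 10 seconds of eval time -> 26.7 tok/s
print(tokens_per_second({"eval_count": 267, "eval_duration": 10_000_000_000}))
```

Averaging this over several prompts smooths out per-request variance such as prompt-processing time, which Ollama reports separately (`prompt_eval_duration`).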
@@ -8,21 +8,11 @@
 	<key>ProgramArguments</key>
 	<array>
-		<string>/Users/aodhan/homeai-voice-env/bin/wyoming-faster-whisper</string>
+		<string>/Users/aodhan/homeai-whisper-mlx-env/bin/wyoming-mlx-whisper</string>
 		<string>--uri</string>
 		<string>tcp://0.0.0.0:10300</string>
-		<string>--model</string>
-		<string>large-v3</string>
-		<string>--language</string>
-		<string>en</string>
-		<string>--device</string>
-		<string>cpu</string>
-		<string>--compute-type</string>
-		<string>int8</string>
-		<string>--data-dir</string>
-		<string>/Users/aodhan/models/whisper</string>
-		<string>--download-dir</string>
-		<string>/Users/aodhan/models/whisper</string>
 	</array>

 	<key>RunAtLoad</key>
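The "model preload launchd service" mentioned in the commit message is not shown in this hunk. One way such a job could be structured (label, model tag, and port are illustrative assumptions, not taken from this commit; an Ollama `/api/generate` call with no prompt and `keep_alive: -1` loads the model and keeps it resident indefinitely):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<!-- Hypothetical label; the service added in this commit may differ. -->
	<key>Label</key>
	<string>com.homeai.model-preload</string>
	<key>ProgramArguments</key>
	<array>
		<string>/usr/bin/curl</string>
		<string>-s</string>
		<string>http://127.0.0.1:11434/api/generate</string>
		<string>-d</string>
		<!-- Model tag is a placeholder; keep_alive: -1 pins the model in memory. -->
		<string>{"model": "qwen3.5:35b-a3b", "keep_alive": -1}</string>
	</array>
	<key>RunAtLoad</key>
	<true/>
</dict>
</plist>
```

Running this at load (and on any Ollama restart) would achieve the "pin voice model in VRAM permanently" behavior the commit message describes.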