6 Commits

Author SHA1 Message Date
Aodhan Collins
ffc2407289 Merge branch 'openclaw-skills': Music Assistant, Claude LLM, dashboard model tags, setup.sh rewrite 2026-03-18 22:22:09 +00:00
Aodhan Collins
117254d560 feat: Music Assistant, Claude primary LLM, model tag in chat, setup.sh rewrite
- Deploy Music Assistant on Pi (10.0.0.199:8095) with host networking for
  Chromecast mDNS discovery, Spotify + SMB library support
- Switch primary LLM from Ollama to Claude Sonnet 4 (Anthropic API),
  local models remain as fallback
- Add model info tag under each assistant message in dashboard chat,
  persisted in conversation JSON
- Rewrite homeai-agent/setup.sh: loads .env, injects API keys into plists,
  symlinks plists to ~/Library/LaunchAgents/, smoke tests services
- Update install_service() in common.sh to use symlinks instead of copies
- Open UFW ports on Pi for Music Assistant (8095, 8097, 8927)
- Add ANTHROPIC_API_KEY to openclaw + bridge launchd plists

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 22:21:28 +00:00
Aodhan Collins
60eb89ea42 feat: character system v2 — schema upgrade, memory system, per-character TTS routing
Character schema v2: background, dialogue_style, appearance, skills, gaze_presets
with automatic v1→v2 migration. LLM-assisted character creation via Character MCP
server. Two-tier memory system (personal per-character + general shared) with
budget-based injection into LLM system prompt. Per-character TTS voice routing via
state file — Wyoming TTS server reads active config to route between Kokoro (local)
and ElevenLabs (cloud PCM 24kHz). Dashboard: memories page, conversation history,
character profile on cards, auto-TTS engine selection from character config.
Also includes VTube Studio expression bridge and ComfyUI API guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 19:15:46 +00:00
Aodhan Collins
1e52c002c2 feat: Raspberry Pi 5 kitchen satellite — Wyoming voice satellite with ReSpeaker pHAT
Add full Pi 5 satellite setup with ReSpeaker 2-Mics pHAT for kitchen
voice control via Wyoming protocol. Includes satellite_wrapper.py that
monkey-patches WakeStreamingSatellite to fix three compounding bugs:

- TTS echo suppression: mutes wake word detection while speaker plays
- Server writer race fix: checks _writer before streaming, re-arms on None
- Streaming timeout: auto-recovers after 30s if pipeline hangs
- Error recovery: resets streaming state on server Error events

Also includes Pi 5 hardware workarounds (wm8960 overlay, stereo-only
audio wrappers, ALSA mixer calibration) and deploy.sh with fast
iteration commands (--push-wrapper, --test-logs).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 20:09:47 +00:00
Aodhan Collins
5f147cae61 Merge branch 'esp32': ESP32-S3-BOX-3 room satellite with voice pipeline
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 20:48:09 +00:00
Aodhan Collins
c4cecbd8dc feat: ESP32-S3-BOX-3 room satellite — ESPHome config, OTA deploy, placeholder faces
Living room unit fully working: on-device wake word (hey_jarvis), voice pipeline
via HA (Wyoming STT → OpenClaw → Wyoming TTS), static PNG display states, OTA
updates. Includes deploy.sh for quick OTA with custom image support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 20:48:03 +00:00
63 changed files with 7585 additions and 1096 deletions

@@ -9,6 +9,7 @@ OPENAI_API_KEY=
DEEPSEEK_API_KEY=
GEMINI_API_KEY=
ELEVENLABS_API_KEY=
GAZE_API_KEY=
# ─── Data & Paths ──────────────────────────────────────────────────────────────
DATA_DIR=${HOME}/homeai-data
@@ -40,10 +41,14 @@ OPEN_WEBUI_URL=http://localhost:3030
OLLAMA_PRIMARY_MODEL=llama3.3:70b
OLLAMA_FAST_MODEL=qwen2.5:7b
# Medium model kept warm for voice pipeline (override per persona)
# Used by preload-models.sh keep-warm daemon
HOMEAI_MEDIUM_MODEL=qwen3.5:35b-a3b
# ─── P3: Voice ─────────────────────────────────────────────────────────────────
WYOMING_STT_URL=tcp://localhost:10300
WYOMING_TTS_URL=tcp://localhost:10301
ELEVENLABS_API_KEY= # Create at elevenlabs.io if using elevenlabs TTS engine
# ELEVENLABS_API_KEY is set above in API Keys section
# ─── P4: Agent ─────────────────────────────────────────────────────────────────
OPENCLAW_URL=http://localhost:8080

CLAUDE.md
@@ -18,14 +18,16 @@ A self-hosted, always-on personal AI assistant running on a **Mac Mini M4 Pro (6
| Storage | 1TB SSD |
| Network | Gigabit Ethernet |
All AI inference runs locally on this machine. No cloud dependency required (cloud APIs optional).
Primary LLM is Claude Sonnet 4 via Anthropic API. Local Ollama models available as fallback. All other inference (STT, TTS, image gen) runs locally.
---
## Core Stack
### AI & LLM
- **Ollama** — local LLM runtime (target models: Llama 3.3 70B, Qwen 2.5 72B)
- **Claude Sonnet 4** — primary LLM via Anthropic API (`anthropic/claude-sonnet-4-20250514`), used for all agent interactions
- **Ollama** — local LLM runtime (fallback models: Llama 3.3 70B, Qwen 3.5 35B-A3B, Qwen 2.5 7B)
- **Model keep-warm daemon** — `preload-models.sh` runs as a loop, checks every 5 min, re-pins evicted models with `keep_alive=-1`. Keeps `qwen2.5:7b` (small/fast) and `$HOMEAI_MEDIUM_MODEL` (default: `qwen3.5:35b-a3b`) always loaded in VRAM. Medium model is configurable via env var for per-persona model assignment.
- **Open WebUI** — browser-based chat interface, runs as Docker container
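The keep-warm daemon's re-pin step can be sketched as follows. The real daemon is the `preload-models.sh` shell loop; this Python rendering of the same logic is illustrative only, though `keep_alive: -1` on `/api/generate` is Ollama's documented way to pin a model.

```python
import json
import urllib.request

PINNED = ["qwen2.5:7b", "qwen3.5:35b-a3b"]  # small/fast + medium voice model

def missing_models(pinned: list[str], loaded: list[str]) -> list[str]:
    """Pinned models that have been evicted and need re-pinning."""
    return [m for m in pinned if m not in loaded]

def repin(model: str, base_url: str = "http://localhost:11434") -> None:
    """An empty generate request with keep_alive=-1 loads and pins the model."""
    body = json.dumps({"model": model, "keep_alive": -1}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate", data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=600).read()  # cold load can be slow
```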
### Image Generation
@@ -35,7 +37,8 @@ All AI inference runs locally on this machine. No cloud dependency required (clo
### Speech
- **Whisper.cpp** — speech-to-text, optimised for Apple Silicon/Neural Engine
- **Kokoro TTS** — fast, lightweight text-to-speech (primary, low-latency)
- **Kokoro TTS** — fast, lightweight text-to-speech (primary, low-latency, local)
- **ElevenLabs TTS** — cloud voice cloning/synthesis (per-character voice ID, routed via state file)
- **Chatterbox TTS** — voice cloning engine (Apple Silicon MPS optimised)
- **Qwen3-TTS** — alternative voice cloning via MLX
- **openWakeWord** — always-on wake word detection
@@ -43,17 +46,21 @@ All AI inference runs locally on this machine. No cloud dependency required (clo
### Smart Home
- **Home Assistant** — smart home control platform (Docker)
- **Wyoming Protocol** — bridges Whisper STT + Kokoro/Piper TTS into Home Assistant
- **Music Assistant** — self-hosted music control, integrates with Home Assistant
- **Music Assistant** — self-hosted music control (Docker on Pi at 10.0.0.199:8095), Spotify + SMB library + Chromecast players
- **Snapcast** — multi-room synchronised audio output
### AI Agent / Orchestration
- **OpenClaw** — primary AI agent layer; receives voice commands, calls tools, manages personality
- **OpenClaw Skills** — 13 skills total: home-assistant, image-generation, voice-assistant, vtube-studio, memory, service-monitor, character, routine, music, workflow, gitea, calendar, mode
- **n8n** — visual workflow automation (Docker), chains AI actions
- **mem0** — long-term memory layer for the AI character
- **Character Memory System** — two-tier JSON-based memories (personal per-character + general shared), injected into LLM system prompt with budget truncation
- **Public/Private Mode** — routes requests to local Ollama (private) or cloud LLMs (public) with per-category overrides via `active-mode.json`. Default primary model is Claude Sonnet 4.
### Character & Personality
- **Character Manager** (built — see `character-manager.jsx`) — single config UI for personality, prompts, models, Live2D mappings, and notes
- Character config exports to JSON, consumed by OpenClaw system prompt and pipeline
- **Character Schema v2** — JSON spec with background, dialogue_style, appearance, skills, gaze_presets (v1 auto-migrated)
- **HomeAI Dashboard** — unified web app: character editor, chat, memory manager, service dashboard
- **Character MCP Server** — LLM-assisted character creation via Fandom wiki/Wikipedia lookup (Docker)
- Character config stored as JSON files in `~/homeai-data/characters/`, consumed by bridge for system prompt construction
### Visual Representation
- **VTube Studio** — Live2D model display on desktop (macOS) and mobile (iOS/Android)
@@ -85,60 +92,108 @@ All AI inference runs locally on this machine. No cloud dependency required (clo
ESP32-S3-BOX-3 (room)
→ Wake word detected (openWakeWord, runs locally on device or Mac Mini)
→ Audio streamed to Mac Mini via Wyoming Satellite
→ Whisper.cpp transcribes speech to text
OpenClaw receives text + context
Ollama LLM generates response (with character persona from system prompt)
mem0 updates long-term memory
→ Whisper MLX transcribes speech to text
HA conversation agent → OpenClaw HTTP Bridge
Bridge resolves character (satellite_id → character mapping)
Bridge builds system prompt (profile + memories) and writes TTS config to state file
→ Bridge checks active-mode.json for model routing (private=local, public=cloud)
→ OpenClaw CLI → LLM generates response (Claude Sonnet 4 default, Ollama fallback)
→ Response dispatched:
Kokoro/Chatterbox renders TTS audio
Wyoming TTS reads state file → routes to Kokoro (local) or ElevenLabs (cloud)
→ Audio sent back to ESP32-S3-BOX-3 (spoken response)
→ VTube Studio API triggered (expression + lip sync on desktop/mobile)
→ Home Assistant action called if applicable (lights, music, etc.)
```
### Timeout Strategy
The HTTP bridge checks Ollama `/api/ps` before each request to determine if the LLM is already loaded:
| Layer | Warm (model loaded) | Cold (model loading) |
|---|---|---|
| HA conversation component | 200s | 200s |
| OpenClaw HTTP bridge | 60s | 180s |
| OpenClaw agent | 60s | 60s |
The keep-warm daemon ensures models stay loaded, so cold starts should be rare (only after Ollama restarts or VRAM pressure).
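The warm/cold decision can be sketched as below; function names are illustrative, not the bridge's actual code, but the `/api/ps` endpoint and its `models` array are standard Ollama API.

```python
import json
import urllib.request

def loaded_ollama_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask Ollama which models are currently resident in memory."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]

def pick_bridge_timeout(model: str, loaded: list[str],
                        warm: int = 60, cold: int = 180) -> int:
    """Warm timeout if the model is already loaded, cold timeout otherwise."""
    return warm if model in loaded else cold
```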
---
## Character System
The AI assistant has a defined personality managed via the Character Manager tool.
The AI assistant has a defined personality managed via the HomeAI Dashboard (character editor + memory manager).
Key config surfaces:
- **System prompt** — injected into every Ollama request
- **Voice clone reference** — `.wav` file path for Chatterbox/Qwen3-TTS
- **Live2D expression mappings** — idle, speaking, thinking, happy, error states
- **VTube Studio WebSocket triggers** — JSON map of events to expressions
### Character Schema v2
Each character is a JSON file in `~/homeai-data/characters/` with:
- **System prompt** — core personality, injected into every LLM request
- **Profile fields** — background, appearance, dialogue_style, skills array
- **TTS config** — engine (kokoro/elevenlabs), kokoro_voice, elevenlabs_voice_id, elevenlabs_model, speed
- **GAZE presets** — array of `{preset, trigger}` for image generation styles
- **Custom prompt rules** — trigger/response overrides for specific contexts
- **mem0** — persistent memory that evolves over time
Character config JSON (exported from Character Manager) is the single source of truth consumed by all pipeline components.
### Memory System
Two-tier memory stored as JSON in `~/homeai-data/memories/`:
- **Personal memories** (`personal/{character_id}.json`) — per-character, about user interactions
- **General memories** (`general.json`) — shared operational knowledge (tool usage, device info, routines)
Memories are injected into the system prompt by the bridge with budget truncation (personal: 4000 chars, general: 3000 chars, newest first).
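A minimal sketch of budget-based injection, assuming a `{"text": ..., "created": ...}` entry shape; the bridge's real field names and truncation rule may differ.

```python
import json
from pathlib import Path

def load_memories(path: Path) -> list[dict]:
    """Read one memory file (personal or general); missing file = no memories."""
    return json.loads(path.read_text()) if path.exists() else []

def render_memories(memories: list[dict], budget: int) -> str:
    """Join memory texts newest-first until the character budget is spent."""
    lines: list[str] = []
    used = 0
    for mem in sorted(memories, key=lambda m: m.get("created", ""), reverse=True):
        line = f"- {mem['text']}"
        if used + len(line) > budget:
            break  # budget exhausted; older memories are dropped
        lines.append(line)
        used += len(line)
    return "\n".join(lines)
```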
### TTS Voice Routing
The bridge writes the active character's TTS config to `~/homeai-data/active-tts-voice.json` before each request. The Wyoming TTS server reads this state file to determine which engine/voice to use:
- **Kokoro** — local, fast, uses `kokoro_voice` field (e.g., `af_heart`)
- **ElevenLabs** — cloud, uses `elevenlabs_voice_id` + `elevenlabs_model`, returns PCM 24kHz
This works for both ESP32/HA pipeline and dashboard chat.
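A sketch of how the Wyoming server's engine selection might look. The JSON field names mirror the schema described above (`engine`, `kokoro_voice`, `elevenlabs_voice_id`) but are not guaranteed to match the real state file byte-for-byte.

```python
import json
from pathlib import Path

STATE_FILE = Path.home() / "homeai-data" / "active-tts-voice.json"

def read_state(path: Path = STATE_FILE) -> dict:
    """Load the active TTS config; fall back to defaults rather than fail."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}

def resolve_tts(state: dict) -> tuple[str, str]:
    """Return (engine, voice), with local Kokoro as the safe default."""
    if state.get("engine") == "elevenlabs":
        return "elevenlabs", state.get("elevenlabs_voice_id", "")
    return "kokoro", state.get("kokoro_voice", "af_heart")
```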
---
## Project Priorities
1. **Foundation** — Docker stack up (Home Assistant, Open WebUI, Portainer, Uptime Kuma)
2. **LLM** — Ollama running with target models, Open WebUI connected
3. **Voice pipeline** — Whisper → Ollama → Kokoro → Wyoming → Home Assistant
4. **OpenClaw** — installed, onboarded, connected to Ollama and Home Assistant
5. **ESP32-S3-BOX-3** — ESPHome flash, Wyoming Satellite, LVGL face
6. **Character system** — system prompt wired up, mem0 integrated, voice cloned
7. **VTube Studio** — model loaded, WebSocket API bridge written as OpenClaw skill
8. **ComfyUI** — image generation online, character-consistent model workflows
9. **Extended integrations** — n8n workflows, Music Assistant, Snapcast, Gitea, code-server
10. **Polish** — Authelia, Tailscale hardening, mobile companion, iOS widgets
1. **Foundation** — Docker stack up (Home Assistant, Open WebUI, Portainer, Uptime Kuma)
2. **LLM** — Ollama running with target models, Open WebUI connected
3. **Voice pipeline** — Whisper → Ollama → Kokoro → Wyoming → Home Assistant
4. **OpenClaw** — installed, onboarded, connected to Ollama and Home Assistant
5. **ESP32-S3-BOX-3** — ESPHome flash, Wyoming Satellite, display faces ✅
6. **Character system** — schema v2, dashboard editor, memory system, per-character TTS routing ✅
7. **OpenClaw skills expansion** — 9 new skills (memory, monitor, character, routine, music, workflow, gitea, calendar, mode) + public/private mode routing ✅
8. **Music Assistant** — deployed on Pi (10.0.0.199:8095), Spotify + SMB + Chromecast players ✅
9. **Animated visual** — PNG/GIF character visual for the web assistant (initial visual layer)
10. **Android app** — companion app for mobile access to the assistant
11. **ComfyUI** — image generation online, character-consistent model workflows
12. **Extended integrations** — Snapcast, code-server
13. **Polish** — Authelia, Tailscale hardening, iOS widgets
### Stretch Goals
- **Live2D / VTube Studio** — full Live2D model with WebSocket API bridge (requires learning Live2D tooling)
---
## Key Paths & Conventions
- All Docker compose files: `~/server/docker/`
- Launchd plists (source): `homeai-*/launchd/` (symlinked to `~/Library/LaunchAgents/`)
- Docker compose (Mac Mini): `homeai-infra/docker/docker-compose.yml`
- Docker compose (Pi/SELBINA): `~/docker/selbina/` on 10.0.0.199
- OpenClaw skills: `~/.openclaw/skills/`
- Character configs: `~/.openclaw/characters/`
- OpenClaw workspace tools: `~/.openclaw/workspace/TOOLS.md`
- OpenClaw config: `~/.openclaw/openclaw.json`
- Character configs: `~/homeai-data/characters/`
- Character memories: `~/homeai-data/memories/`
- Conversation history: `~/homeai-data/conversations/`
- Active TTS state: `~/homeai-data/active-tts-voice.json`
- Active mode state: `~/homeai-data/active-mode.json`
- Satellite → character map: `~/homeai-data/satellite-map.json`
- Local routines: `~/homeai-data/routines/`
- Voice reminders: `~/homeai-data/reminders.json`
- Whisper models: `~/models/whisper/`
- Ollama models: managed by Ollama at `~/.ollama/models/`
- ComfyUI models: `~/ComfyUI/models/`
- Voice reference audio: `~/voices/`
- Gitea repos root: `~/gitea/`
- Music Assistant (Pi): `~/docker/selbina/music-assistant/` on 10.0.0.199
- Skills user guide: `homeai-agent/SKILLS_GUIDE.md`
---
@@ -148,6 +203,8 @@ Character config JSON (exported from Character Manager) is the single source of
- ESP32-S3-BOX-3 units are dumb satellites — all intelligence stays on Mac Mini
- The character JSON schema (from Character Manager) should be treated as a versioned spec; pipeline components read from it, never hardcode personality values
- OpenClaw skills are the primary extension mechanism — new capabilities = new skills
- Prefer local models; cloud API keys (Anthropic, OpenAI) are fallback only
- Primary LLM is Claude Sonnet 4 (Anthropic API); local Ollama models are available as fallback
- Launchd plists are symlinked from repo source to ~/Library/LaunchAgents/ — edit source, then bootout/bootstrap to reload
- Music Assistant runs on Pi (10.0.0.199), not Mac Mini — needs host networking for Chromecast mDNS discovery
- VTube Studio API bridge should be a standalone OpenClaw skill with clear event interface
- mem0 memory store should be backed up as part of regular Gitea commits

TODO.md
@@ -16,7 +16,7 @@
- [x] `docker compose up -d` — bring all services up
- [x] Home Assistant onboarding — long-lived access token generated, stored in `.env`
- [ ] Install Tailscale, verify all services reachable on Tailnet
- [ ] Uptime Kuma: add monitors for all services, configure mobile alerts
- [x] Uptime Kuma: add monitors for all services, configure mobile alerts
- [ ] Verify all containers survive a cold reboot
### P2 · homeai-llm
@@ -26,11 +26,11 @@
- [x] Register local GGUF models via Modelfiles (no download): llama3.3:70b, qwen3:32b, codestral:22b, qwen2.5:7b
- [x] Register additional models: EVA-LLaMA-3.33-70B, Midnight-Miqu-70B, QwQ-32B, Qwen3.5-35B, Qwen3-Coder-30B, Qwen3-VL-30B, GLM-4.6V-Flash, DeepSeek-R1-8B, gemma-3-27b
- [x] Add qwen3.5:35b-a3b (MoE, Q8_0) — 26.7 tok/s, recommended for voice pipeline
- [x] Write model preload script + launchd service (keeps voice model in VRAM permanently)
- [x] Write model keep-warm daemon + launchd service (pins qwen2.5:7b + $HOMEAI_MEDIUM_MODEL in VRAM, checks every 5 min)
- [x] Deploy Open WebUI via Docker compose (port 3030)
- [x] Verify Open WebUI connected to Ollama, all models available
- [x] Run pipeline benchmark (homeai-voice/scripts/benchmark_pipeline.py) — STT/LLM/TTS latency profiled
- [ ] Add Ollama + Open WebUI to Uptime Kuma monitors
- [x] Add Ollama + Open WebUI to Uptime Kuma monitors
---
@@ -56,7 +56,7 @@
- [ ] Install Chatterbox TTS (MPS build), test with sample `.wav`
- [ ] Install Qwen3-TTS via MLX (fallback)
- [ ] Train custom wake word using character name
- [ ] Add Wyoming STT/TTS to Uptime Kuma monitors
- [x] Add Wyoming STT/TTS to Uptime Kuma monitors
---
@@ -82,7 +82,7 @@
- [x] Verify full voice → agent → HA action flow
- [x] Add OpenClaw to Uptime Kuma monitors (Manual user action required)
### P5 · homeai-character *(can start alongside P4)*
### P5 · homeai-dashboard *(character system + dashboard)*
- [x] Define and write `schema/character.schema.json` (v1)
- [x] Write `characters/aria.json` — default character
@@ -100,6 +100,15 @@
- [x] Add character profile management to dashboard — store/switch character configs with attached profile images
- [x] Add TTS voice preview in character editor — Kokoro preview via OpenClaw bridge with loading state, custom text, stop control
- [x] Merge homeai-character + homeai-desktop into unified homeai-dashboard (services, chat, characters, editor)
- [x] Upgrade character schema to v2 — background, dialogue_style, appearance, skills, gaze_presets (auto-migrate v1)
- [x] Add LLM-assisted character creation via Character MCP server (Fandom/Wikipedia lookup)
- [x] Add character memory system — personal (per-character) + general (shared) memories with dashboard UI
- [x] Add conversation history with per-conversation persistence
- [x] Wire character_id through full pipeline (dashboard → bridge → LLM system prompt)
- [x] Add TTS text cleaning — strip tags, asterisks, emojis, markdown before synthesis
- [x] Add per-character TTS voice routing — bridge writes state file, Wyoming server reads it
- [x] Add ElevenLabs TTS support in Wyoming server — cloud voice synthesis via state file routing
- [x] Dashboard auto-selects character's TTS engine/voice (Kokoro or ElevenLabs)
- [ ] Deploy dashboard as Docker container or static site on Mac Mini
---
@@ -108,65 +117,88 @@
### P6 · homeai-esp32
- [ ] Install ESPHome: `pip install esphome`
- [ ] Write `esphome/secrets.yaml` (gitignored)
- [ ] Write `base.yaml`, `voice.yaml`, `display.yaml`, `animations.yaml`
- [ ] Write `s3-box-living-room.yaml` for first unit
- [ ] Flash first unit via USB
- [ ] Verify unit appears in HA device list
- [ ] Assign Wyoming voice pipeline to unit in HA
- [ ] Test full wake → STT → LLM → TTS → audio playback cycle
- [ ] Test LVGL face: idle → listening → thinking → speaking → error
- [ ] Verify OTA firmware update works wirelessly
- [ ] Flash remaining units (bedroom, kitchen, etc.)
- [x] Install ESPHome in `~/homeai-esphome-env` (Python 3.12 venv)
- [x] Write `esphome/secrets.yaml` (gitignored)
- [x] Write `homeai-living-room.yaml` (based on official S3-BOX-3 reference config)
- [x] Generate placeholder face illustrations (7 PNGs, 320×240)
- [x] Write `setup.sh` with flash/ota/logs/validate commands
- [x] Write `deploy.sh` with OTA deploy, image management, multi-unit support
- [x] Flash first unit via USB (living room)
- [x] Verify unit appears in HA device list (requires HA 2026.x for ESPHome 2025.12+ compat)
- [x] Assign Wyoming voice pipeline to unit in HA
- [x] Test full wake → STT → LLM → TTS → audio playback cycle
- [x] Test display states: idle → listening → thinking → replying → error
- [x] Verify OTA firmware update works wirelessly (`deploy.sh --device OTA`)
- [ ] Flash remaining units (bedroom, kitchen)
- [ ] Document MAC address → room name mapping
### P6b · homeai-rpi (Kitchen Satellite)
- [x] Set up Wyoming Satellite on Raspberry Pi 5 (SELBINA) with ReSpeaker 2-Mics pHAT
- [x] Write setup.sh — full Pi provisioning (venv, drivers, systemd, scripts)
- [x] Write deploy.sh — remote deploy/manage from Mac Mini (push-wrapper, test-logs, etc.)
- [x] Write satellite_wrapper.py — monkey-patches fixing TTS echo, writer race, streaming timeout
- [x] Test multi-command voice loop without freezing
---
## Phase 5 — Visual Layer
### P7 · homeai-visual
- [ ] Install VTube Studio (Mac App Store)
- [ ] Enable WebSocket API on port 8001
- [ ] Source/purchase a Live2D model (nizima.com or booth.pm)
- [ ] Load model in VTube Studio
- [ ] Create hotkeys for all 8 expression states
- [ ] Write `skills/vtube_studio` SKILL.md + implementation
- [ ] Run auth flow — click Allow in VTube Studio, save token
- [ ] Test all 8 expressions via test script
- [ ] Update `aria.json` with real VTube Studio hotkey IDs
- [ ] Write `lipsync.py` amplitude-based helper
- [ ] Integrate lip sync into OpenClaw TTS dispatch
- [ ] Test full pipeline: voice → thinking expression → speaking with lip sync
#### VTube Studio Expression Bridge
- [x] Write `vtube-bridge.py` — persistent WebSocket ↔ HTTP bridge daemon (port 8002)
- [x] Write `vtube-ctl` CLI wrapper + OpenClaw skill (`~/.openclaw/skills/vtube-studio/`)
- [x] Wire expression triggers into `openclaw-http-bridge.py` (thinking → idle, speaking → idle)
- [x] Add amplitude-based lip sync to `wyoming_kokoro_server.py` (RMS → MouthOpen parameter)
- [x] Write `test-expressions.py` — auth flow, expression cycle, lip sync sweep, latency test
- [x] Write launchd plist + setup.sh for venv creation and service registration
- [ ] Install VTube Studio from Mac App Store, enable WebSocket API (port 8001)
- [ ] Source/purchase Live2D model, load in VTube Studio
- [ ] Create 8 expression hotkeys, record UUIDs
- [ ] Run `setup.sh` to create venv, install websockets, load launchd service
- [ ] Run `vtube-ctl auth` — click Allow in VTube Studio
- [ ] Update `aria.json` with real hotkey UUIDs (replace placeholders)
- [ ] Run `test-expressions.py --all` — verify expressions + lip sync + latency
- [ ] Set up VTube Studio mobile (iPhone/iPad) on Tailnet
#### Web Visuals (Dashboard)
- [ ] Design PNG/GIF character visuals for web assistant (idle, thinking, speaking, etc.)
- [ ] Integrate animated visuals into homeai-dashboard chat view
- [ ] Sync visual state to voice pipeline events (listening, processing, responding)
- [ ] Add expression transitions and idle animations
### P8 · homeai-android
- [ ] Build Android companion app for mobile assistant access
- [ ] Integrate with OpenClaw bridge API (chat, TTS, STT)
- [ ] Add character visual display
- [ ] Push notification support via ntfy/FCM
---
## Phase 6 — Image Generation
### P8 · homeai-images
### P9 · homeai-images (ComfyUI)
- [ ] Clone ComfyUI to `~/ComfyUI/`, install deps in venv
- [ ] Verify MPS is detected at launch
- [ ] Write and load launchd plist (`com.homeai.comfyui.plist`)
- [ ] Download SDXL base model
- [ ] Download Flux.1-schnell
- [ ] Download ControlNet models (canny, depth)
- [ ] Download SDXL base model + Flux.1-schnell + ControlNet models
- [ ] Test generation via ComfyUI web UI (port 8188)
- [ ] Build and export `quick.json`, `portrait.json`, `scene.json`, `upscale.json` workflows
- [ ] Build and export workflow JSONs (quick, portrait, scene, upscale)
- [ ] Write `skills/comfyui` SKILL.md + implementation
- [ ] Test skill: "Generate a portrait of Aria looking happy"
- [ ] Collect character reference images for LoRA training
- [ ] Train SDXL LoRA with kohya_ss, verify character consistency
- [ ] Add ComfyUI to Uptime Kuma monitors
---
## Phase 7 — Extended Integrations & Polish
- [ ] Deploy Music Assistant (Docker), integrate with Home Assistant
- [ ] Write `skills/music` SKILL.md for OpenClaw
### P10 · Integrations & Polish
- [x] Deploy Music Assistant (Docker on Pi 10.0.0.199:8095), Spotify + SMB + Chromecast
- [x] Write `skills/music` SKILL.md for OpenClaw
- [ ] Deploy Snapcast server on Mac Mini
- [ ] Configure Snapcast clients on ESP32 units for multi-room audio
- [ ] Configure Authelia as 2FA layer in front of web UIs
@@ -181,10 +213,24 @@
---
## Stretch Goals
### Live2D / VTube Studio
- [ ] Learn Live2D modelling toolchain (Live2D Cubism Editor)
- [ ] Install VTube Studio (Mac App Store), enable WebSocket API on port 8001
- [ ] Source/commission a Live2D model (nizima.com or booth.pm)
- [ ] Create hotkeys for expression states
- [ ] Write `skills/vtube_studio` SKILL.md + implementation
- [ ] Write `lipsync.py` amplitude-based helper
- [ ] Integrate lip sync into OpenClaw TTS dispatch
- [ ] Set up VTube Studio mobile (iPhone/iPad) on Tailnet
---
## Open Decisions
- [ ] Confirm character name (determines wake word training)
- [ ] Live2D model: purchase off-the-shelf or commission custom?
- [ ] mem0 backend: Chroma (simple) vs Qdrant Docker (better semantic search)?
- [ ] Snapcast output: ESP32 built-in speakers or dedicated audio hardware per room?
- [ ] Authelia user store: local file vs LDAP?

@@ -1,37 +1,38 @@
# P4: homeai-agent — AI Agent, Skills & Automation
> Phase 3 | Depends on: P1 (HA), P2 (Ollama), P3 (Wyoming/TTS), P5 (character JSON)
---
## Goal
OpenClaw running as the primary AI agent: receives voice/text input, loads character persona, calls tools (skills), manages memory (mem0), dispatches responses (TTS, HA actions, VTube expressions). n8n handles scheduled/automated workflows.
> Phase 4 | Depends on: P1 (HA), P2 (Ollama), P3 (Wyoming/TTS), P5 (character JSON)
> Status: **COMPLETE** (all skills implemented)
---
## Architecture
```
Voice input (text from P3 Wyoming STT)
Voice input (text from Wyoming STT via HA pipeline)
OpenClaw API (port 8080)
loads character JSON from P5
System prompt construction
Ollama LLM (P2) — llama3.3:70b
↓ response + tool calls
Skill dispatcher
├── home_assistant.py → HA REST API (P1)
├── memory.py → mem0 (local)
├── vtube_studio.py → VTube WS (P7)
├── comfyui.py → ComfyUI API (P8)
├── music.py → Music Assistant (Phase 7)
└── weather.py → HA sensor data
OpenClaw HTTP Bridge (port 8081)
resolves character, loads memories, checks mode
System prompt construction (profile + memories)
checks active-mode.json for model routing
OpenClaw CLI → LLM (Ollama local or cloud API)
↓ response + tool calls via exec
Skill dispatcher (CLIs on PATH)
├── ha-ctl → Home Assistant REST API
├── memory-ctl → JSON memory files
├── monitor-ctl → service health checks
├── character-ctl → character switching
├── routine-ctl → scenes, scripts, multi-step routines
├── music-ctl → media player control
├── workflow-ctl → n8n workflow triggering
├── gitea-ctl → Gitea repo/issue queries
├── calendar-ctl → HA calendar + voice reminders
├── mode-ctl → public/private LLM routing
├── gaze-ctl → image generation
└── vtube-ctl → VTube Studio expressions
↓ final response text
TTS dispatch:
├── Chatterbox (voice clone, if active)
└── Kokoro (via Wyoming, fallback)
TTS dispatch (via active-tts-voice.json):
├── Kokoro (local, Wyoming)
└── ElevenLabs (cloud API)
Audio playback to appropriate room
```
@@ -40,296 +41,148 @@ OpenClaw API (port 8080)
## OpenClaw Setup
### Installation
```bash
# Confirm OpenClaw supports Ollama — check repo for latest install method
pip install openclaw
# or
git clone https://github.com/<openclaw-repo>/openclaw
pip install -e .
```
**Key question:** Verify OpenClaw's Ollama/OpenAI-compatible backend support before installation. If OpenClaw doesn't support local Ollama natively, use a thin adapter layer pointing its OpenAI endpoint at `http://localhost:11434/v1`.
### Config — `~/.openclaw/config.yaml`
```yaml
version: 1
llm:
  provider: ollama              # or openai-compatible
  base_url: http://localhost:11434/v1
  model: llama3.3:70b
  fast_model: qwen2.5:7b        # used for quick intent classification
character:
  active: aria
  config_dir: ~/.openclaw/characters/
memory:
  provider: mem0
  store_path: ~/.openclaw/memory/
  embedding_model: nomic-embed-text
  embedding_url: http://localhost:11434/v1
api:
  host: 0.0.0.0
  port: 8080
tts:
  primary: chatterbox           # when voice clone active
  fallback: kokoro-wyoming      # Wyoming TTS endpoint
  wyoming_tts_url: tcp://localhost:10301
wake:
  endpoint: /wake               # openWakeWord POSTs here to trigger listening
```
- **Runtime:** Node.js global install at `/opt/homebrew/bin/openclaw` (v2026.3.2)
- **Config:** `~/.openclaw/openclaw.json`
- **Gateway:** port 8080, mode local, launchd: `com.homeai.openclaw`
- **Default model:** `ollama/qwen3.5:35b-a3b` (MoE, 35B total, 3B active, 26.7 tok/s)
- **Cloud models (public mode):** `anthropic/claude-sonnet-4-20250514`, `openai/gpt-4o`
- **Critical:** `commands.native: true` in config (enables exec tool for CLI skills)
- **Critical:** `contextWindow: 32768` for large models (prevents GPU OOM)
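A minimal sketch of the relevant `openclaw.json` fields; only the key names called out above are from the source, and the surrounding layout is assumed, so treat this as a shape hint rather than a copy of the real file:

```json
{
  "model": "ollama/qwen3.5:35b-a3b",
  "commands": { "native": true },
  "contextWindow": 32768,
  "gateway": { "port": 8080, "mode": "local" }
}
```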
---
## Skills
## Skills (13 total)
All skills live in `~/.openclaw/skills/` (symlinked from `homeai-agent/skills/`).
All skills follow the same pattern:
- `~/.openclaw/skills/<name>/SKILL.md` — metadata + agent instructions
- `~/.openclaw/skills/<name>/<tool>` — executable Python CLI (stdlib only)
- Symlinked to `/opt/homebrew/bin/` for PATH access
- Agent invokes via `exec` tool
- Documented in `~/.openclaw/workspace/TOOLS.md`
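A hypothetical skill CLI following that pattern (stdlib only, `exec`-friendly exit codes); `example-ctl` is an illustration, not one of the real 13 skills:

```python
#!/usr/bin/env python3
"""example-ctl — minimal skill CLI in the standard pattern (hypothetical)."""
import argparse
import json

def ping(target: str) -> dict:
    # A real skill would call a service (HA, Gitea, ...) here; this echoes.
    return {"ok": True, "target": target}

def main(argv=None) -> int:
    parser = argparse.ArgumentParser(prog="example-ctl")
    sub = parser.add_subparsers(dest="cmd", required=True)
    ping_cmd = sub.add_parser("ping", help="check a target is reachable")
    ping_cmd.add_argument("target")
    args = parser.parse_args(argv)
    if args.cmd == "ping":
        # JSON on stdout so the agent can parse the result from exec output
        print(json.dumps(ping(args.target)))
    return 0

if __name__ == "__main__":
    raise SystemExit(main())
```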
### `home_assistant.py`
### Existing Skills (4)
Wraps the HA REST API for common smart home actions.
| Skill | CLI | Description |
|-------|-----|-------------|
| home-assistant | `ha-ctl` | Smart home device control |
| image-generation | `gaze-ctl` | Image generation via ComfyUI/GAZE |
| voice-assistant | (none) | Voice pipeline handling |
| vtube-studio | `vtube-ctl` | VTube Studio expression control |
**Functions:**
- `turn_on(entity_id, **kwargs)` — lights, switches, media players
- `turn_off(entity_id)`
- `toggle(entity_id)`
- `set_light(entity_id, brightness=None, color_temp=None, rgb_color=None)`
- `run_scene(scene_id)`
- `get_state(entity_id)` → returns state + attributes
- `list_entities(domain=None)` → returns entity list
### New Skills (9) — Added 2026-03-17
Uses `HA_URL` and `HA_TOKEN` from `.env.services`.
| Skill | CLI | Description |
|-------|-----|-------------|
| memory | `memory-ctl` | Store/search/recall memories |
| service-monitor | `monitor-ctl` | Service health checks |
| character | `character-ctl` | Character switching |
| routine | `routine-ctl` | Scenes and multi-step routines |
| music | `music-ctl` | Media player control |
| workflow | `workflow-ctl` | n8n workflow management |
| gitea | `gitea-ctl` | Gitea repo/issue/PR queries |
| calendar | `calendar-ctl` | Calendar events and voice reminders |
| mode | `mode-ctl` | Public/private LLM routing |
### `memory.py`
Wraps mem0 for persistent long-term memory.
**Functions:**
- `remember(text, category=None)` — store a memory
- `recall(query, limit=5)` — semantic search over memories
- `forget(memory_id)` — delete a specific memory
- `list_recent(n=10)` — list most recent memories
mem0 uses `nomic-embed-text` via Ollama for embeddings.
### `weather.py`
Pulls weather data from Home Assistant sensors (local weather station or HA weather integration).
**Functions:**
- `get_current()` → temp, humidity, conditions
- `get_forecast(days=3)` → forecast array
### `timer.py`
Simple timer/reminder management.
**Functions:**
- `set_timer(duration_seconds, label=None)` → fires HA notification/TTS on expiry
- `set_reminder(datetime_str, message)` → schedules future TTS playback
- `list_timers()`
- `cancel_timer(timer_id)`
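A rough sketch of how `set_timer`/`cancel_timer` could be backed by `threading.Timer`; the `on_expiry` callback stands in for the HA notification/TTS call, and the in-memory registry is an assumption about the implementation:

```python
import threading
import time
import uuid

_timers: dict[str, dict] = {}  # timer_id -> {"label": ..., "timer": threading.Timer}

def set_timer(duration_seconds: float, label: str = None, on_expiry=print) -> str:
    """Schedule on_expiry(label) after duration_seconds; returns a timer id."""
    timer_id = uuid.uuid4().hex[:8]

    def _fire():
        _timers.pop(timer_id, None)       # remove before notifying
        on_expiry(label or f"timer {timer_id}")

    t = threading.Timer(duration_seconds, _fire)
    t.daemon = True
    _timers[timer_id] = {"label": label, "timer": t}
    t.start()
    return timer_id

def list_timers() -> list[str]:
    return list(_timers)

def cancel_timer(timer_id: str) -> bool:
    entry = _timers.pop(timer_id, None)
    if entry:
        entry["timer"].cancel()
        return True
    return False
```

`set_reminder` would differ only in computing the delay from an absolute datetime first.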
### `music.py` (stub — completed in Phase 7)
```python
def play(query: str): ... # "play jazz" → Music Assistant
def pause(): ...
def skip(): ...
def set_volume(level: int): ... # 0-100
```
### `vtube_studio.py` (implemented in P7)
Stub in P4, full implementation in P7:
```python
def trigger_expression(event: str): ... # "thinking", "happy", etc.
def set_parameter(name: str, value: float): ...
```
### `comfyui.py` (implemented in P8)
Stub in P4, full implementation in P8:
```python
def generate(workflow: str, params: dict) -> str: ... # returns image path
```
See `SKILLS_GUIDE.md` for full user documentation.
---
## HTTP Bridge
**File:** `openclaw-http-bridge.py` (runs in `homeai-voice-env`)
**Port:** 8081, launchd: `com.homeai.openclaw-bridge`
---
## mem0 — Long-Term Memory
### Setup
```bash
pip install mem0ai
```
### Config
```python
from mem0 import Memory
config = {
"llm": {
"provider": "ollama",
"config": {
"model": "llama3.3:70b",
"ollama_base_url": "http://localhost:11434",
}
},
"embedder": {
"provider": "ollama",
"config": {
"model": "nomic-embed-text",
"ollama_base_url": "http://localhost:11434",
}
},
"vector_store": {
"provider": "chroma",
"config": {
"collection_name": "homeai_memory",
"path": "~/.openclaw/memory/chroma",
}
}
}
memory = Memory.from_config(config)
```
> **Decision point:** Start with Chroma (local file-based). If semantic recall quality is poor, migrate to Qdrant (Docker container).
### Backup
Daily cron (via launchd) commits mem0 data to Gitea:
```bash
#!/usr/bin/env bash
set -euo pipefail
cd ~/.openclaw/memory
git add .
# Commit and push only when there is something new to back up
git diff --cached --quiet || {
    git commit -m "mem0 backup $(date +%Y-%m-%d)"
    git push origin main
}
```
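A daily launchd job for the script above might look like the sketch below; the label, script path, and 03:30 schedule are illustrative, not taken from the repo:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.homeai.memory-backup</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/bash</string>
    <string>/Users/aodhan/gitea/homeai/homeai-agent/scripts/memory-backup.sh</string>
  </array>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key>
    <integer>3</integer>
    <key>Minute</key>
    <integer>30</integer>
  </dict>
  <key>StandardErrorPath</key>
  <string>/tmp/homeai-memory-backup-error.log</string>
</dict>
</plist>
```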
---
## n8n Workflows
n8n runs in Docker (deployed in P1). Workflows exported as JSON and stored in `homeai-agent/workflows/`.
### Starter Workflows
**`morning-briefing.json`**
- Trigger: time-based (e.g., 7:30 AM on weekdays)
- Steps: fetch weather → fetch calendar events → compose briefing → POST to OpenClaw TTS → speak aloud
**`notification-router.json`**
- Trigger: HA webhook (new notification)
- Steps: classify urgency → if high: TTS immediately; if low: queue for next interaction
**`memory-backup.json`**
- Trigger: daily schedule
- Steps: commit mem0 data to Gitea
### n8n ↔ OpenClaw Integration
OpenClaw exposes a webhook endpoint that n8n can call to trigger TTS or run a skill:
```
POST http://localhost:8080/speak
{
"text": "Good morning. It is 7:30 and the weather is...",
"room": "all"
}
```
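For reference, a client can build that `/speak` call with nothing but the stdlib; `build_speak_request` is a hypothetical helper, not part of OpenClaw:

```python
import json
import urllib.request

OPENCLAW_URL = "http://localhost:8080"  # matches OPENCLAW_URL in .env.services

def build_speak_request(text: str, room: str = "all") -> urllib.request.Request:
    """Build the POST /speak request that n8n (or any client) would send."""
    return urllib.request.Request(
        f"{OPENCLAW_URL}/speak",
        data=json.dumps({"text": text, "room": room}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires OpenClaw to be running):
#   with urllib.request.urlopen(build_speak_request("Good morning"), timeout=10) as r:
#       print(r.status)
```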
---
## API Surface (OpenClaw)
Key endpoints consumed by other projects:
### Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/agent/message` | POST | Send message → LLM → response |
| `/api/tts` | POST | Text-to-speech (Kokoro or ElevenLabs) |
| `/api/stt` | POST | Speech-to-text (Wyoming/Whisper) |
| `/wake` | POST | Wake word trigger from openWakeWord |
| `/status` | GET | Health check |
| `/chat` | POST | Send text, get response (+ fires skills) |
| `/speak` | POST | TTS only — no LLM, just speak text |
| `/skill/<name>` | POST | Call a specific skill directly |
| `/memory` | GET/POST | Read/write memories |
---
### Request Flow
1. Resolve character: explicit `character_id` > `satellite_id` mapping > default
2. Build system prompt: profile fields + metadata + personal/general memories
3. Write TTS config to `active-tts-voice.json`
4. Load mode from `active-mode.json`, resolve model (private → local, public → cloud)
5. Call OpenClaw CLI with `--model` flag if public mode
6. Detect and re-prompt if the model promises an action but doesn't call the exec tool
7. Return response
### Timeout Strategy
| State | Timeout |
|-------|---------|
| Model warm (loaded in VRAM) | 120s |
| Model cold (loading) | 180s |
## Directory Layout
```
homeai-agent/
├── skills/
│   ├── home_assistant.py
│   ├── memory.py
│   ├── weather.py
│   ├── timer.py
│   ├── music.py          # stub
│   ├── vtube_studio.py   # stub
│   └── comfyui.py        # stub
├── workflows/
│   ├── morning-briefing.json
│   ├── notification-router.json
│   └── memory-backup.json
└── config/
    ├── config.yaml.example
    └── mem0-config.py
```
---
## Interface Contracts
**Consumes:**
- Ollama API: `http://localhost:11434/v1`
- HA API: `$HA_URL` with `$HA_TOKEN`
- Wyoming TTS: `tcp://localhost:10301`
- Character JSON: `~/.openclaw/characters/<active>.json` (from P5)
**Exposes:**
- OpenClaw HTTP API: `http://localhost:8080` — consumed by P3 (voice), P7 (visual triggers), P8 (image skill)
**Add to `.env.services`:**
```dotenv
OPENCLAW_URL=http://localhost:8080
```
## Daemons
| Daemon | Plist | Purpose |
|--------|-------|---------|
| `com.homeai.openclaw` | `launchd/com.homeai.openclaw.plist` | OpenClaw gateway (port 8080) |
| `com.homeai.openclaw-bridge` | `launchd/com.homeai.openclaw-bridge.plist` | HTTP bridge (port 8081) |
| `com.homeai.reminder-daemon` | `launchd/com.homeai.reminder-daemon.plist` | Voice reminder checker (60s interval) |
---
## Implementation Steps
- [ ] Confirm OpenClaw installation method and Ollama compatibility
- [ ] Install OpenClaw, write `config.yaml` pointing at Ollama and HA
- [ ] Verify OpenClaw responds to a basic text query via `/chat`
- [ ] Write `home_assistant.py` skill — test lights on/off via voice
- [ ] Write `memory.py` skill — test store and recall
- [ ] Write `weather.py` skill — verify HA weather sensor data
- [ ] Write `timer.py` skill — test set/fire a timer
- [ ] Write skill stubs: `music.py`, `vtube_studio.py`, `comfyui.py`
- [ ] Set up mem0 with Chroma backend, test semantic recall
- [ ] Write and test memory backup launchd job
- [ ] Deploy n8n via Docker (P1 task if not done)
- [ ] Build morning briefing n8n workflow
- [ ] Symlink `homeai-agent/skills/` → `~/.openclaw/skills/`
- [ ] Verify full voice → agent → HA action flow (with P3 pipeline)
## Data Files
| File | Purpose |
|------|---------|
| `~/homeai-data/memories/personal/*.json` | Per-character memories |
| `~/homeai-data/memories/general.json` | Shared general memories |
| `~/homeai-data/characters/*.json` | Character profiles (schema v2) |
| `~/homeai-data/satellite-map.json` | Satellite → character mapping |
| `~/homeai-data/active-tts-voice.json` | Current TTS engine/voice |
| `~/homeai-data/active-mode.json` | Public/private mode state |
| `~/homeai-data/routines/*.json` | Local routine definitions |
| `~/homeai-data/reminders.json` | Pending voice reminders |
| `~/homeai-data/conversations/*.json` | Chat conversation history |
---
## Success Criteria
- [ ] "Turn on the living room lights" → lights turn on via HA
- [ ] "Remember that I prefer jazz in the mornings" → mem0 stores it; "What do I like in the mornings?" → recalls it
- [ ] Morning briefing n8n workflow fires on schedule and speaks via TTS
- [ ] OpenClaw `/status` returns healthy
- [ ] OpenClaw survives Mac Mini reboot (launchd or Docker — TBD based on OpenClaw's preferred run method)
## Environment Variables (OpenClaw Plist)
| Variable | Purpose |
|----------|---------|
| `HASS_TOKEN` / `HA_TOKEN` | Home Assistant API token |
| `HA_URL` | Home Assistant URL |
| `GAZE_API_KEY` | Image generation API key |
| `N8N_API_KEY` | n8n automation API key |
| `GITEA_TOKEN` | Gitea API token |
| `ANTHROPIC_API_KEY` | Claude API key (public mode) |
| `OPENAI_API_KEY` | OpenAI API key (public mode) |
---
## Implementation Status
- [x] OpenClaw installed and configured
- [x] HTTP bridge with character resolution and memory injection
- [x] ha-ctl — smart home control
- [x] gaze-ctl — image generation
- [x] vtube-ctl — VTube Studio expressions
- [x] memory-ctl — memory store/search/recall
- [x] monitor-ctl — service health checks
- [x] character-ctl — character switching
- [x] routine-ctl — scenes and multi-step routines
- [x] music-ctl — media player control
- [x] workflow-ctl — n8n workflow triggering
- [x] gitea-ctl — Gitea integration
- [x] calendar-ctl — calendar + voice reminders
- [x] mode-ctl — public/private LLM routing
- [x] Bridge mode routing (active-mode.json → --model flag)
- [x] Cloud providers in openclaw.json (Anthropic, OpenAI)
- [x] Dashboard /api/mode endpoint
- [x] Reminder daemon (com.homeai.reminder-daemon)
- [x] TOOLS.md updated with all skills
- [ ] Set N8N_API_KEY (requires generating in n8n UI)
- [ ] Set GITEA_TOKEN (requires generating in Gitea UI)
- [ ] Set ANTHROPIC_API_KEY / OPENAI_API_KEY for public mode
- [ ] End-to-end voice test of each skill

View File

@@ -0,0 +1,386 @@
# OpenClaw Skills — User Guide
> All skills are invoked by voice or chat. Say a natural command and the AI agent will route it to the right tool automatically.
---
## Quick Reference
| Skill | CLI | What it does |
|-------|-----|-------------|
| Home Assistant | `ha-ctl` | Control lights, switches, sensors, climate |
| Image Generation | `gaze-ctl` | Generate images via ComfyUI/GAZE |
| Memory | `memory-ctl` | Store and recall things about you |
| Service Monitor | `monitor-ctl` | Check if services are running |
| Character Switcher | `character-ctl` | Switch AI personalities |
| Routines & Scenes | `routine-ctl` | Create and trigger multi-step automations |
| Music | `music-ctl` | Play, pause, skip, volume control |
| n8n Workflows | `workflow-ctl` | Trigger automation workflows |
| Gitea | `gitea-ctl` | Query repos, commits, issues |
| Calendar & Reminders | `calendar-ctl` | View calendar, set voice reminders |
| Public/Private Mode | `mode-ctl` | Route to local or cloud LLMs |
---
## Phase A — Core Skills
### Memory (`memory-ctl`)
The agent can remember things about you and recall them later. Memories persist across conversations and are visible in the dashboard.
**Voice examples:**
- "Remember that my favorite color is blue"
- "I take my coffee black"
- "What do you know about me?"
- "Forget that I said I like jazz"
**CLI usage:**
```bash
memory-ctl add personal "User's favorite color is blue" --category preference
memory-ctl add general "Living room speaker is a Sonos" --category fact
memory-ctl search "coffee"
memory-ctl list --type personal
memory-ctl delete <memory_id>
```
**Categories:** `preference`, `fact`, `routine`
**How it works:** Memories are stored as JSON in `~/homeai-data/memories/`. Personal memories are per-character (each character has their own relationship with you). General memories are shared across all characters.
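A simplified sketch of that storage layout (`personal/<character>.json` vs `general.json`); the entry fields and the substring search are assumptions, and the real `memory-ctl` may store and rank entries differently:

```python
import json
import time
import uuid
from pathlib import Path

MEMORIES_DIR = Path.home() / "homeai-data" / "memories"

def _store_path(mem_type: str, character_id: str = None) -> Path:
    """Personal memories live per-character; general memories in one shared file."""
    if mem_type == "personal":
        return MEMORIES_DIR / "personal" / f"{character_id}.json"
    return MEMORIES_DIR / "general.json"

def add_memory(text: str, mem_type: str = "general",
               character_id: str = None, category: str = "fact") -> dict:
    """Append a memory entry to the appropriate JSON file."""
    path = _store_path(mem_type, character_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    entries = json.loads(path.read_text()) if path.exists() else []
    entry = {"id": uuid.uuid4().hex[:8], "text": text,
             "category": category, "created": time.time()}
    entries.append(entry)
    path.write_text(json.dumps(entries, indent=2))
    return entry

def search_memories(query: str, mem_type: str = "general",
                    character_id: str = None) -> list[dict]:
    """Naive substring match; the real tool may rank results semantically."""
    path = _store_path(mem_type, character_id)
    if not path.exists():
        return []
    q = query.lower()
    return [e for e in json.loads(path.read_text()) if q in e["text"].lower()]
```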
---
### Service Monitor (`monitor-ctl`)
Ask the assistant if everything is healthy, check specific services, or see what models are loaded.
**Voice examples:**
- "Is everything running?"
- "What models are loaded?"
- "Is Home Assistant up?"
- "Show me the Docker containers"
**CLI usage:**
```bash
monitor-ctl status # Full health check (all services)
monitor-ctl check ollama # Single service
monitor-ctl ollama # Models loaded, VRAM usage
monitor-ctl docker # Docker container status
```
**Services checked:** Ollama, OpenClaw Bridge, OpenClaw Gateway, Wyoming STT, Wyoming TTS, Dashboard, n8n, Uptime Kuma, Home Assistant, Gitea
---
### Character Switcher (`character-ctl`)
Switch between AI personalities on the fly. Each character has their own voice, personality, and memories.
**Voice examples:**
- "Talk to Aria"
- "Switch to Sucy"
- "Who can I talk to?"
- "Who am I talking to?"
- "Tell me about Aria"
**CLI usage:**
```bash
character-ctl list # See all characters
character-ctl active # Who is the current default
character-ctl switch "Aria" # Switch (fuzzy name matching)
character-ctl info "Sucy" # Character profile
character-ctl map homeai-kitchen.local aria_123 # Map a satellite to a character
```
**How it works:** Switching updates the default character in `satellite-map.json` and writes the TTS voice config. The new character takes effect on the next request.
---
## Phase B — Home Assistant Extensions
### Routines & Scenes (`routine-ctl`)
Create and trigger Home Assistant scenes and multi-step routines by voice.
**Voice examples:**
- "Activate movie mode"
- "Run the bedtime routine"
- "What scenes do I have?"
- "Create a morning routine"
**CLI usage:**
```bash
routine-ctl list-scenes # HA scenes
routine-ctl list-scripts # HA scripts
routine-ctl trigger "movie_mode" # Activate scene/script
routine-ctl create-scene "cozy" --entities '[{"entity_id":"light.lamp","state":"on","brightness":80}]'
routine-ctl create-routine "bedtime" --steps '[
{"type":"ha","cmd":"off \"All Lights\""},
{"type":"delay","seconds":2},
{"type":"tts","text":"Good night!"}
]'
routine-ctl run "bedtime" # Execute routine
routine-ctl list-routines # List local routines
routine-ctl delete-routine "bedtime" # Remove routine
```
**Step types:**
| Type | Description | Fields |
|------|-------------|--------|
| `scene` | Trigger an HA scene | `target` (scene name) |
| `ha` | Run an ha-ctl command | `cmd` (e.g. `off "Lamp"`) |
| `delay` | Wait between steps | `seconds` |
| `tts` | Speak text aloud | `text` |
**Storage:** Routines are saved as JSON in `~/homeai-data/routines/`.
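The step table above can be sketched as a small runner; `ha_exec` and `tts` are injected callbacks standing in for the real ha-ctl invocation and TTS POST, which is an assumption about how `routine-ctl run` is wired:

```python
import time

def run_routine(steps: list[dict], ha_exec=print, tts=print) -> int:
    """Execute routine steps in order; returns the number of steps run."""
    done = 0
    for step in steps:
        kind = step.get("type")
        if kind == "scene":
            ha_exec(f'scene {step["target"]}')   # trigger an HA scene
        elif kind == "ha":
            ha_exec(step["cmd"])                  # arbitrary ha-ctl command
        elif kind == "delay":
            time.sleep(step["seconds"])           # pause between steps
        elif kind == "tts":
            tts(step["text"])                     # speak aloud
        else:
            continue                              # skip unknown step types
        done += 1
    return done

# The "bedtime" routine from the example above:
bedtime = [
    {"type": "ha", "cmd": 'off "All Lights"'},
    {"type": "delay", "seconds": 0.1},
    {"type": "tts", "text": "Good night!"},
]
```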
---
### Music Control (`music-ctl`)
Control music playback through Home Assistant media players — works with Spotify, Music Assistant, Chromecast, and any HA media player.
**Voice examples:**
- "Play some jazz"
- "Pause the music"
- "Next song"
- "What's playing?"
- "Turn the volume to 50"
- "Play Bohemian Rhapsody on the kitchen speaker"
- "Shuffle on"
**CLI usage:**
```bash
music-ctl players # List available players
music-ctl play "jazz" # Search and play
music-ctl play # Resume paused playback
music-ctl pause # Pause
music-ctl next # Skip to next
music-ctl prev # Go to previous
music-ctl volume 50 # Set volume (0-100)
music-ctl now-playing # Current track info
music-ctl shuffle on # Enable shuffle
music-ctl play "rock" --player media_player.kitchen # Target specific player
```
**How it works:** All commands go through HA's `media_player` services. If `--player` is omitted, commands target the first active (playing or paused) player. Multi-room audio works through Snapcast zones, which appear as separate `media_player` entities.
**Prerequisites:** At least one media player configured in Home Assistant (Spotify integration, Music Assistant, or Chromecast).
---
## Phase C — External Service Skills
### n8n Workflows (`workflow-ctl`)
List and trigger n8n automation workflows by voice.
**Voice examples:**
- "Run the backup workflow"
- "What workflows do I have?"
- "Did the last workflow succeed?"
**CLI usage:**
```bash
workflow-ctl list # All workflows
workflow-ctl trigger "backup" # Trigger by name (fuzzy match)
workflow-ctl trigger "abc123" --data '{"key":"val"}' # Trigger with data
workflow-ctl status <execution_id> # Check execution result
workflow-ctl history --limit 5 # Recent executions
```
**Setup required:**
1. Generate an API key in n8n: Settings → API → Create API Key
2. Set `N8N_API_KEY` in the OpenClaw launchd plist
3. Restart OpenClaw: `launchctl kickstart -k gui/501/com.homeai.openclaw`
---
### Gitea (`gitea-ctl`)
Query your self-hosted Gitea repositories, commits, issues, and pull requests.
**Voice examples:**
- "What repos do I have?"
- "Show recent commits for homeai"
- "Any open issues?"
- "Create an issue for the TTS bug"
**CLI usage:**
```bash
gitea-ctl repos # List all repos
gitea-ctl commits aodhan/homeai --limit 5 # Recent commits
gitea-ctl issues aodhan/homeai --state open # Open issues
gitea-ctl prs aodhan/homeai # Pull requests
gitea-ctl create-issue aodhan/homeai "Bug title" --body "Description here"
```
**Setup required:**
1. Generate a token in Gitea: Settings → Applications → Generate Token
2. Set `GITEA_TOKEN` in the OpenClaw launchd plist
3. Restart OpenClaw
---
### Calendar & Reminders (`calendar-ctl`)
Read calendar events from Home Assistant and set voice reminders that speak via TTS when due.
**Voice examples:**
- "What's on my calendar today?"
- "What's coming up this week?"
- "Remind me in 30 minutes to check the oven"
- "Remind me at 5pm to call mum"
- "What reminders do I have?"
- "Cancel that reminder"
**CLI usage:**
```bash
calendar-ctl today # Today's events
calendar-ctl upcoming --days 3 # Next 3 days
calendar-ctl add "Dentist" --start 2026-03-18T14:00:00 --end 2026-03-18T15:00:00
calendar-ctl remind "Check the oven" --at "in 30 minutes"
calendar-ctl remind "Call mum" --at "at 5pm"
calendar-ctl remind "Team standup" --at "tomorrow 9am"
calendar-ctl reminders # List pending
calendar-ctl cancel-reminder <id> # Cancel
```
**Supported time formats:**
| Format | Example |
|--------|---------|
| Relative | `in 30 minutes`, `in 2 hours` |
| Absolute | `at 5pm`, `at 17:00`, `at 5:30pm` |
| Tomorrow | `tomorrow 9am`, `tomorrow at 14:00` |
| Combined | `in 1 hour 30 minutes` |
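The formats in the table can be handled by a small stdlib-only parser; this is an illustrative sketch, not the actual calendar-ctl implementation:

```python
import re
from datetime import datetime, timedelta

def parse_when(spec: str, now: datetime = None) -> datetime:
    """Parse reminder times like 'in 30 minutes', 'at 5pm', 'tomorrow 9am'."""
    now = now or datetime.now()
    s = spec.strip().lower()
    tomorrow = s.startswith("tomorrow")
    if tomorrow:
        s = s[len("tomorrow"):].strip()
    # Relative: "in 30 minutes", "in 2 hours", "in 1 hour 30 minutes"
    if s.startswith("in "):
        mins = 0
        for qty, unit in re.findall(r"(\d+)\s*(hour|minute)", s):
            mins += int(qty) * (60 if unit == "hour" else 1)
        return now + timedelta(minutes=mins)
    # Absolute: "at 5pm", "at 17:00", "at 5:30pm", "9am"
    m = re.search(r"(\d{1,2})(?::(\d{2}))?\s*(am|pm)?", s)
    if not m:
        raise ValueError(f"unrecognized time: {spec!r}")
    hour, minute = int(m.group(1)), int(m.group(2) or 0)
    if m.group(3) == "pm" and hour < 12:
        hour += 12
    if m.group(3) == "am" and hour == 12:
        hour = 0
    when = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if tomorrow:
        when += timedelta(days=1)
    elif when <= now:
        when += timedelta(days=1)  # "at 5pm" after 5pm means tomorrow
    return when
```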
**How reminders work:** A background daemon (`com.homeai.reminder-daemon`) checks `~/homeai-data/reminders.json` every 60 seconds. When a reminder is due, it POSTs to the TTS bridge and speaks the reminder aloud. Fired reminders are automatically cleaned up after 24 hours.
**Prerequisites:** Calendar entity configured in Home Assistant (Google Calendar, CalDAV, or local calendar integration).
---
## Phase D — Public/Private Mode
### Mode Controller (`mode-ctl`)
Route AI requests to local LLMs (private, no data leaves the machine) or cloud LLMs (public, faster/more capable) with per-category overrides.
**Voice examples:**
- "Switch to public mode"
- "Go private"
- "What mode am I in?"
- "Use Claude for coding"
- "Keep health queries private"
**CLI usage:**
```bash
mode-ctl status # Current mode and overrides
mode-ctl private # All requests → local Ollama
mode-ctl public # All requests → cloud LLM
mode-ctl set-provider anthropic # Use Claude (default)
mode-ctl set-provider openai # Use GPT-4o
mode-ctl override coding public # Always use cloud for coding
mode-ctl override health private # Always keep health local
mode-ctl list-overrides # Show all category rules
```
**Default category rules:**
| Always Private | Always Public | Follows Global Mode |
|---------------|--------------|-------------------|
| Personal finance | Web search | General chat |
| Health | Coding help | Smart home |
| Passwords | Complex reasoning | Music |
| Private conversations | Translation | Calendar |
**How it works:** The HTTP bridge reads `~/homeai-data/active-mode.json` before each request. Based on the mode and any category overrides, it passes `--model` to the OpenClaw CLI to route to either `ollama/qwen3.5:35b-a3b` (private) or `anthropic/claude-sonnet-4-20250514` / `openai/gpt-4o` (public).
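The routing decision can be sketched as follows, mirroring the model mapping described above; the exact override-precedence semantics are an assumption:

```python
CLOUD_MODELS = {
    "anthropic": "anthropic/claude-sonnet-4-20250514",
    "openai": "openai/gpt-4o",
}
LOCAL_MODEL = None  # None means: omit --model, use the OpenClaw local default

def resolve_model(mode_data: dict, category: str = None):
    """Pick the --model value: a category override wins, then the global mode.
    Returns None for private/local routing (bridge passes no --model flag)."""
    effective = mode_data.get("overrides", {}).get(
        category, mode_data.get("mode", "private"))
    if effective == "private":
        return LOCAL_MODEL
    provider = mode_data.get("cloud_provider", "anthropic")
    return CLOUD_MODELS.get(provider, CLOUD_MODELS["anthropic"])
```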
**Setup required for public mode:**
1. Set `ANTHROPIC_API_KEY` and/or `OPENAI_API_KEY` in the OpenClaw launchd plist
2. Restart OpenClaw: `launchctl kickstart -k gui/501/com.homeai.openclaw`
**Dashboard:** The mode can also be toggled via the dashboard API at `GET/POST /api/mode`.
---
## Administration
### Adding API Keys
All API keys are stored in the OpenClaw launchd plist at:
```
~/gitea/homeai/homeai-agent/launchd/com.homeai.openclaw.plist
```
After editing, deploy and restart:
```bash
cp ~/gitea/homeai/homeai-agent/launchd/com.homeai.openclaw.plist ~/Library/LaunchAgents/
launchctl kickstart -k gui/501/com.homeai.openclaw
```
### Environment Variables
| Variable | Purpose | Required for |
|----------|---------|-------------|
| `HASS_TOKEN` | Home Assistant API token | ha-ctl, routine-ctl, music-ctl, calendar-ctl |
| `HA_URL` | Home Assistant URL | Same as above |
| `GAZE_API_KEY` | Image generation API key | gaze-ctl |
| `N8N_API_KEY` | n8n automation API key | workflow-ctl |
| `GITEA_TOKEN` | Gitea API token | gitea-ctl |
| `ANTHROPIC_API_KEY` | Claude API key | mode-ctl (public mode) |
| `OPENAI_API_KEY` | OpenAI API key | mode-ctl (public mode) |
### Skill File Locations
```
~/.openclaw/skills/
├── home-assistant/ ha-ctl → /opt/homebrew/bin/ha-ctl
├── image-generation/ gaze-ctl → /opt/homebrew/bin/gaze-ctl
├── memory/ memory-ctl → /opt/homebrew/bin/memory-ctl
├── service-monitor/ monitor-ctl → /opt/homebrew/bin/monitor-ctl
├── character/ character-ctl → /opt/homebrew/bin/character-ctl
├── routine/ routine-ctl → /opt/homebrew/bin/routine-ctl
├── music/ music-ctl → /opt/homebrew/bin/music-ctl
├── workflow/ workflow-ctl → /opt/homebrew/bin/workflow-ctl
├── gitea/ gitea-ctl → /opt/homebrew/bin/gitea-ctl
├── calendar/ calendar-ctl → /opt/homebrew/bin/calendar-ctl
├── mode/ mode-ctl → /opt/homebrew/bin/mode-ctl
├── voice-assistant/ (no CLI)
└── vtube-studio/ vtube-ctl → /opt/homebrew/bin/vtube-ctl
```
### Data File Locations
| File | Purpose |
|------|---------|
| `~/homeai-data/memories/personal/*.json` | Per-character memories |
| `~/homeai-data/memories/general.json` | Shared general memories |
| `~/homeai-data/characters/*.json` | Character profiles |
| `~/homeai-data/satellite-map.json` | Satellite → character mapping |
| `~/homeai-data/active-tts-voice.json` | Current TTS voice config |
| `~/homeai-data/active-mode.json` | Public/private mode state |
| `~/homeai-data/routines/*.json` | Local routine definitions |
| `~/homeai-data/reminders.json` | Pending voice reminders |
| `~/homeai-data/conversations/*.json` | Chat conversation history |
### Creating a New Skill
Every skill follows the same pattern:
1. Create directory: `~/.openclaw/skills/<name>/`
2. Write `SKILL.md` with YAML frontmatter (`name`, `description`) + usage docs
3. Create Python CLI (stdlib only: `urllib.request`, `json`, `os`, `sys`, `re`, `datetime`)
4. `chmod +x` the CLI and symlink to `/opt/homebrew/bin/`
5. Add env vars to the OpenClaw launchd plist if needed
6. Add a section to `~/.openclaw/workspace/TOOLS.md`
7. Restart OpenClaw: `launchctl kickstart -k gui/501/com.homeai.openclaw`
8. Test: `openclaw agent --message "test prompt" --agent main`
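A `SKILL.md` following step 2 might look like this; the `weather` skill, its `weather-ctl` CLI, and any fields beyond `name` and `description` are illustrative only:

```markdown
---
name: weather
description: Report current conditions and forecasts from Home Assistant sensors.
---

# Weather Skill

Use `weather-ctl` for anything weather-related.

- `weather-ctl current` — temperature, humidity, conditions
- `weather-ctl forecast --days 3` — upcoming forecast
```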
### Daemons
| Daemon | Plist | Purpose |
|--------|-------|---------|
| `com.homeai.reminder-daemon` | `homeai-agent/launchd/com.homeai.reminder-daemon.plist` | Fires TTS reminders when due |
| `com.homeai.openclaw` | `homeai-agent/launchd/com.homeai.openclaw.plist` | OpenClaw gateway |
| `com.homeai.openclaw-bridge` | `homeai-agent/launchd/com.homeai.openclaw-bridge.plist` | HTTP bridge (voice pipeline) |
| `com.homeai.preload-models` | `homeai-llm/scripts/preload-models.sh` | Keeps models warm in VRAM |

View File

@@ -12,7 +12,7 @@ CONF_TIMEOUT = "timeout"
DEFAULT_HOST = "10.0.0.101"
DEFAULT_PORT = 8081 # OpenClaw HTTP Bridge (not 8080 gateway)
DEFAULT_AGENT = "main"
-DEFAULT_TIMEOUT = 120
+DEFAULT_TIMEOUT = 200  # Must exceed bridge cold timeout (180s)
# API endpoints
OPENCLAW_API_PATH = "/api/agent/message"

View File

@@ -77,7 +77,11 @@ class OpenClawAgent(AbstractConversationAgent):
        _LOGGER.debug("Processing message: %s", text)
        try:
-            response_text = await self._call_openclaw(text)
+            response_text = await self._call_openclaw(
+                text,
+                satellite_id=getattr(user_input, "satellite_id", None),
+                device_id=getattr(user_input, "device_id", None),
+            )
            # Create proper IntentResponse for Home Assistant
            intent_response = IntentResponse(language=user_input.language or "en")
@@ -96,13 +100,14 @@ class OpenClawAgent(AbstractConversationAgent):
            conversation_id=conversation_id,
        )

-    async def _call_openclaw(self, message: str) -> str:
+    async def _call_openclaw(self, message: str, satellite_id: str = None, device_id: str = None) -> str:
        """Call OpenClaw API and return the response."""
        url = f"http://{self.host}:{self.port}{OPENCLAW_API_PATH}"
        payload = {
            "message": message,
            "agent": self.agent_name,
+            "satellite_id": satellite_id or device_id,
        }
        session = async_get_clientsession(self.hass)

View File

@@ -35,6 +35,10 @@
<dict>
<key>PATH</key>
<string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
<key>ELEVENLABS_API_KEY</key>
<string>sk_ec10e261c6190307a37aa161a9583504dcf25a0cabe5dbd5</string>
<key>ANTHROPIC_API_KEY</key>
<string>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</string>
</dict>
</dict>
</plist>

View File

@@ -28,6 +28,20 @@
<string>eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJmZGQ1NzZlYWNkMTU0ZTY2ODY1OTkzYTlhNTIxM2FmNyIsImlhdCI6MTc3MjU4ODYyOCwiZXhwIjoyMDg3OTQ4NjI4fQ.CTAU1EZgpVLp_aRnk4vg6cQqwS5N-p8jQkAAXTxFmLY</string>
<key>HASS_TOKEN</key>
<string>eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJmZGQ1NzZlYWNkMTU0ZTY2ODY1OTkzYTlhNTIxM2FmNyIsImlhdCI6MTc3MjU4ODYyOCwiZXhwIjoyMDg3OTQ4NjI4fQ.CTAU1EZgpVLp_aRnk4vg6cQqwS5N-p8jQkAAXTxFmLY</string>
<key>GAZE_API_KEY</key>
<string>e63401f17e4845e1059f830267f839fe7fc7b6083b1cb1730863318754d799f4</string>
<key>N8N_URL</key>
<string>http://localhost:5678</string>
<key>N8N_API_KEY</key>
<string></string>
<key>GITEA_URL</key>
<string>http://10.0.0.199:3000</string>
<key>GITEA_TOKEN</key>
<string></string>
<key>ANTHROPIC_API_KEY</key>
<string>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</string>
<key>OPENAI_API_KEY</key>
<string></string>
</dict>
<key>RunAtLoad</key>

View File

@@ -0,0 +1,30 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.homeai.reminder-daemon</string>
<key>ProgramArguments</key>
<array>
<string>/Users/aodhan/homeai-voice-env/bin/python3</string>
<string>/Users/aodhan/gitea/homeai/homeai-agent/reminder-daemon.py</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/homeai-reminder-daemon.log</string>
<key>StandardErrorPath</key>
<string>/tmp/homeai-reminder-daemon-error.log</string>
<key>ThrottleInterval</key>
<integer>10</integer>
</dict>
</plist>

View File

@@ -24,9 +24,12 @@ Endpoints:
import argparse
import json
import os
import subprocess
import sys
import asyncio
import urllib.request
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler
from socketserver import ThreadingMixIn
from urllib.parse import urlparse
@@ -40,19 +43,248 @@ from wyoming.asr import Transcribe, Transcript
from wyoming.audio import AudioStart, AudioChunk, AudioStop
from wyoming.info import Info
# Timeout settings (seconds)
TIMEOUT_WARM = 120 # Model already loaded in VRAM
TIMEOUT_COLD = 180 # Model needs loading first (~10-20s load + inference)
OLLAMA_PS_URL = "http://localhost:11434/api/ps"
VTUBE_BRIDGE_URL = "http://localhost:8002"
DEFAULT_MODEL = "anthropic/claude-sonnet-4-20250514"
-def load_character_prompt() -> str:
-    """Load the active character system prompt."""
-    character_path = Path.home() / ".openclaw" / "characters" / "aria.json"
def _vtube_fire_and_forget(path: str, data: dict):
    """Send a non-blocking POST to the VTube Studio bridge. Failures are silent."""
    def _post():
        try:
            body = json.dumps(data).encode()
            req = urllib.request.Request(
                f"{VTUBE_BRIDGE_URL}{path}",
                data=body,
                headers={"Content-Type": "application/json"},
                method="POST",
            )
            urllib.request.urlopen(req, timeout=2)
        except Exception:
            pass  # bridge may not be running — that's fine
    threading.Thread(target=_post, daemon=True).start()

def is_model_warm() -> bool:
    """Check if the default Ollama model is already loaded in VRAM."""
    try:
        req = urllib.request.Request(OLLAMA_PS_URL)
        with urllib.request.urlopen(req, timeout=2) as resp:
            data = json.loads(resp.read())
        return len(data.get("models", [])) > 0
    except Exception:
        # If we can't reach Ollama, assume cold (safer longer timeout)
        return False
CHARACTERS_DIR = Path("/Users/aodhan/homeai-data/characters")
SATELLITE_MAP_PATH = Path("/Users/aodhan/homeai-data/satellite-map.json")
MEMORIES_DIR = Path("/Users/aodhan/homeai-data/memories")
ACTIVE_TTS_VOICE_PATH = Path("/Users/aodhan/homeai-data/active-tts-voice.json")
ACTIVE_MODE_PATH = Path("/Users/aodhan/homeai-data/active-mode.json")

# Cloud provider model mappings for mode routing
CLOUD_MODELS = {
    "anthropic": "anthropic/claude-sonnet-4-20250514",
    "openai": "openai/gpt-4o",
}

def load_mode() -> dict:
    """Load the public/private mode configuration."""
    try:
        with open(ACTIVE_MODE_PATH) as f:
            return json.load(f)
    except Exception:
        return {"mode": "private", "cloud_provider": "anthropic", "overrides": {}}

def resolve_model(mode_data: dict) -> str | None:
    """Resolve which model to use based on mode. Returns None for default (private/local)."""
    mode = mode_data.get("mode", "private")
    if mode == "private":
        return None  # Use OpenClaw default (ollama/qwen3.5:35b-a3b)
    provider = mode_data.get("cloud_provider", "anthropic")
    return CLOUD_MODELS.get(provider, CLOUD_MODELS["anthropic"])
def clean_text_for_tts(text: str) -> str:
    """Strip content that shouldn't be spoken: tags, asterisks, emojis, markdown."""
    # Remove HTML/XML tags (the tags themselves, not their inner text)
    text = re.sub(r'<[^>]+>', '', text)
    # Remove content between asterisks (actions/emphasis markup like *sighs*)
    text = re.sub(r'\*[^*]+\*', '', text)
    # Remove markdown bold/italic markers that might remain
    text = re.sub(r'[*_]{1,3}', '', text)
    # Remove markdown headers
    text = re.sub(r'^#{1,6}\s+', '', text, flags=re.MULTILINE)
    # Remove markdown links [text](url) → keep text
    text = re.sub(r'\[([^\]]+)\]\([^)]+\)', r'\1', text)
    # Remove bare URLs
    text = re.sub(r'https?://\S+', '', text)
    # Remove code blocks and inline code
    text = re.sub(r'```[\s\S]*?```', '', text)
    text = re.sub(r'`[^`]+`', '', text)
    # Remove emojis
    text = re.sub(
        r'[\U0001F600-\U0001F64F\U0001F300-\U0001F5FF\U0001F680-\U0001F6FF'
        r'\U0001F1E0-\U0001F1FF\U0001F900-\U0001F9FF\U0001FA00-\U0001FAFF'
        r'\U00002702-\U000027B0\U0000FE00-\U0000FE0F\U0000200D'
        r'\U00002600-\U000026FF\U00002300-\U000023FF]+', '', text
    )
    # Collapse multiple spaces/newlines
    text = re.sub(r'\n{2,}', '\n', text)
    text = re.sub(r'[ \t]{2,}', ' ', text)
    return text.strip()
def load_satellite_map() -> dict:
    """Load the satellite-to-character mapping."""
    try:
        with open(SATELLITE_MAP_PATH) as f:
            return json.load(f)
    except Exception:
        return {"default": "aria_default", "satellites": {}}

def set_active_tts_voice(character_id: str, tts_config: dict):
    """Write the active TTS config to a state file for the Wyoming TTS server to read."""
    try:
        ACTIVE_TTS_VOICE_PATH.parent.mkdir(parents=True, exist_ok=True)
        state = {
            "character_id": character_id,
            "engine": tts_config.get("engine", "kokoro"),
            "kokoro_voice": tts_config.get("kokoro_voice", ""),
            "elevenlabs_voice_id": tts_config.get("elevenlabs_voice_id", ""),
            "elevenlabs_model": tts_config.get("elevenlabs_model", "eleven_multilingual_v2"),
            "speed": tts_config.get("speed", 1),
        }
        with open(ACTIVE_TTS_VOICE_PATH, "w") as f:
            json.dump(state, f)
    except Exception as e:
        print(f"[OpenClaw Bridge] Warning: could not write active TTS config: {e}")

def resolve_character_id(satellite_id: str = None) -> str:
    """Resolve a satellite ID to a character profile ID."""
    sat_map = load_satellite_map()
    if satellite_id and satellite_id in sat_map.get("satellites", {}):
        return sat_map["satellites"][satellite_id]
    return sat_map.get("default", "aria_default")
def load_character(character_id: str = None) -> dict:
    """Load a character profile by ID. Returns the full character data dict."""
    if not character_id:
        character_id = resolve_character_id()
    safe_id = character_id.replace("/", "_")
    character_path = CHARACTERS_DIR / f"{safe_id}.json"
    if not character_path.exists():
-        return ""
+        return {}
    try:
        with open(character_path) as f:
-            data = json.load(f)
-            return data.get("system_prompt", "")
+            profile = json.load(f)
+            return profile.get("data", {})
    except Exception:
        return {}
def load_character_prompt(satellite_id: str = None, character_id: str = None) -> str:
    """Load the full system prompt for a character, resolved by satellite or explicit ID.
    Builds a rich prompt from system_prompt + profile fields (background, dialogue_style, etc.)."""
    if not character_id:
        character_id = resolve_character_id(satellite_id)
    char = load_character(character_id)
    if not char:
        return ""
    sections = []
    # Core system prompt
    prompt = char.get("system_prompt", "")
    if prompt:
        sections.append(prompt)
    # Character profile fields
    profile_parts = []
    if char.get("background"):
        profile_parts.append(f"## Background\n{char['background']}")
    if char.get("appearance"):
        profile_parts.append(f"## Appearance\n{char['appearance']}")
    if char.get("dialogue_style"):
        profile_parts.append(f"## Dialogue Style\n{char['dialogue_style']}")
    if char.get("skills"):
        skills = char["skills"]
        if isinstance(skills, list):
            skills_text = ", ".join(skills[:15])
        else:
            skills_text = str(skills)
        profile_parts.append(f"## Skills & Interests\n{skills_text}")
    if profile_parts:
        sections.append("[Character Profile]\n" + "\n\n".join(profile_parts))
    # Character metadata
    meta_lines = []
    if char.get("display_name"):
        meta_lines.append(f"Your name is: {char['display_name']}")
    # Support both v1 (gaze_preset string) and v2 (gaze_presets array)
    gaze_presets = char.get("gaze_presets", [])
    if gaze_presets and isinstance(gaze_presets, list):
        for gp in gaze_presets:
            preset = gp.get("preset", "")
            trigger = gp.get("trigger", "self-portrait")
            if preset:
                meta_lines.append(f"GAZE preset '{preset}' — use for: {trigger}")
    elif char.get("gaze_preset"):
        meta_lines.append(f"Your gaze_preset for self-portraits is: {char['gaze_preset']}")
    if meta_lines:
        sections.append("[Character Metadata]\n" + "\n".join(meta_lines))
    # Memories (personal + general)
    personal, general = load_memories(character_id)
    if personal:
        sections.append("[Personal Memories]\n" + "\n".join(f"- {m}" for m in personal))
    if general:
        sections.append("[General Knowledge]\n" + "\n".join(f"- {m}" for m in general))
    return "\n\n".join(sections)

def load_memories(character_id: str) -> tuple[list[str], list[str]]:
    """Load personal (per-character) and general memories.
    Returns (personal_contents, general_contents) truncated to fit context budget."""
    PERSONAL_BUDGET = 4000  # max chars for personal memories in prompt
GENERAL_BUDGET = 3000 # max chars for general memories in prompt
def _read_memories(path: Path, budget: int) -> list[str]:
try:
with open(path) as f:
data = json.load(f)
except Exception:
return []
memories = data.get("memories", [])
# Sort newest first
memories.sort(key=lambda m: m.get("createdAt", ""), reverse=True)
result = []
used = 0
for m in memories:
content = m.get("content", "").strip()
if not content:
continue
if used + len(content) > budget:
break
result.append(content)
used += len(content)
return result
safe_id = character_id.replace("/", "_")
personal = _read_memories(MEMORIES_DIR / "personal" / f"{safe_id}.json", PERSONAL_BUDGET)
general = _read_memories(MEMORIES_DIR / "general.json", GENERAL_BUDGET)
return personal, general
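The budget logic in `_read_memories` can be sketched standalone. This is a minimal illustration of the same newest-first, stop-at-first-overflow behaviour; the function name and sample data are hypothetical:

```python
def pick_memories(memories, budget):
    """Newest-first selection that stops at the first memory that would overflow the budget."""
    ordered = sorted(memories, key=lambda m: m.get("createdAt", ""), reverse=True)
    result, used = [], 0
    for m in ordered:
        content = m.get("content", "").strip()
        if not content:
            continue
        if used + len(content) > budget:
            break  # same as the bridge: stop at the first overflow, don't skip and continue
        result.append(content)
        used += len(content)
    return result

mems = [
    {"content": "likes espresso", "createdAt": "2026-03-01"},
    {"content": "allergic to peanuts", "createdAt": "2026-03-10"},
    {"content": "x" * 50, "createdAt": "2026-02-01"},
]
```

Because selection breaks at the first overflow rather than skipping it, one long early memory can shut out shorter, older ones that would still fit.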
class OpenClawBridgeHandler(BaseHTTPRequestHandler):
"""HTTP request handler for OpenClaw bridge."""
@@ -95,7 +327,7 @@ class OpenClawBridgeHandler(BaseHTTPRequestHandler):
self._send_json_response(404, {"error": "Not found"})
def _handle_tts_request(self):
"""Handle TTS request and return wav audio."""
"""Handle TTS request and return audio. Routes to Kokoro or ElevenLabs based on engine."""
content_length = int(self.headers.get("Content-Length", 0))
if content_length == 0:
self._send_json_response(400, {"error": "Empty body"})
@@ -109,30 +341,64 @@ class OpenClawBridgeHandler(BaseHTTPRequestHandler):
return
text = data.get("text", "Hello, this is a test.")
# Strip emojis so TTS doesn't try to read them out
text = clean_text_for_tts(text)
voice = data.get("voice", "af_heart")
engine = data.get("engine", "kokoro")
try:
# Signal avatar: speaking
_vtube_fire_and_forget("/expression", {"event": "speaking"})
if engine == "elevenlabs":
audio_bytes, content_type = self._synthesize_elevenlabs(text, voice, data.get("model"))
else:
# Default: local Kokoro via Wyoming
audio_bytes = asyncio.run(self._synthesize_audio(text, voice))
content_type = "audio/wav"
# Signal avatar: idle
_vtube_fire_and_forget("/expression", {"event": "idle"})
self.send_response(200)
self.send_header("Content-Type", content_type)
# Allow CORS for local testing from Vite
self.send_header("Access-Control-Allow-Origin", "*")
self.end_headers()
self.wfile.write(audio_bytes)
except Exception as e:
_vtube_fire_and_forget("/expression", {"event": "error"})
self._send_json_response(500, {"error": str(e)})
def _synthesize_elevenlabs(self, text: str, voice_id: str, model: str = None) -> tuple[bytes, str]:
"""Call ElevenLabs TTS API and return (audio_bytes, content_type)."""
api_key = os.environ.get("ELEVENLABS_API_KEY", "")
if not api_key:
raise RuntimeError("ELEVENLABS_API_KEY not set in environment")
if not voice_id:
raise RuntimeError("No ElevenLabs voice ID provided")
model = model or "eleven_multilingual_v2"
url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
payload = json.dumps({
"text": text,
"model_id": model,
"voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
}).encode()
req = urllib.request.Request(
url,
data=payload,
headers={
"Content-Type": "application/json",
"xi-api-key": api_key,
"Accept": "audio/mpeg",
},
method="POST",
)
with urllib.request.urlopen(req, timeout=30) as resp:
audio_bytes = resp.read()
return audio_bytes, "audio/mpeg"
def do_OPTIONS(self):
"""Handle CORS preflight requests."""
self.send_response(204)
@@ -264,6 +530,46 @@ class OpenClawBridgeHandler(BaseHTTPRequestHandler):
print(f"[OpenClaw Bridge] Wake word detected: {wake_word_data.get('wake_word', 'unknown')}")
self._send_json_response(200, {"status": "ok", "message": "Wake word received"})
@staticmethod
def _call_openclaw(message: str, agent: str, timeout: int, model: str = None) -> str:
"""Call OpenClaw CLI and return stdout."""
cmd = ["/opt/homebrew/bin/openclaw", "agent", "--message", message, "--agent", agent]
if model:
cmd.extend(["--model", model])
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=timeout,
check=True,
)
return result.stdout.strip()
@staticmethod
def _needs_followup(response: str) -> bool:
"""Detect if the model promised to act but didn't actually do it.
Returns True if the response looks like a 'will do' without a result."""
if not response:
return False
resp_lower = response.lower()
# If the response contains a URL or JSON-like output, it probably completed
if "http://" in response or "https://" in response or '"status"' in response:
return False
# If it contains a tool result indicator (ha-ctl output, gaze-ctl output)
if any(kw in resp_lower for kw in ["image_url", "seed", "entity_id", "state:", "turned on", "turned off"]):
return False
# Detect promise-like language without substance
promise_phrases = [
"let me", "i'll ", "i will ", "sure thing", "sure,", "right away",
"generating", "one moment", "working on", "hang on", "just a moment",
"on it", "let me generate", "let me create",
]
has_promise = any(phrase in resp_lower for phrase in promise_phrases)
# Short responses with promise language are likely incomplete
if has_promise and len(response) < 200:
return True
return False
def _handle_agent_request(self):
"""Handle agent message request."""
content_length = int(self.headers.get("Content-Length", 0))
@@ -280,29 +586,72 @@ class OpenClawBridgeHandler(BaseHTTPRequestHandler):
message = data.get("message")
agent = data.get("agent", "main")
satellite_id = data.get("satellite_id")
explicit_character_id = data.get("character_id")
if not message:
self._send_json_response(400, {"error": "Message is required"})
return
# Resolve character: explicit ID > satellite mapping > default
if explicit_character_id:
character_id = explicit_character_id
else:
character_id = resolve_character_id(satellite_id)
system_prompt = load_character_prompt(character_id=character_id)
# Set the active TTS config for the Wyoming server to pick up
char = load_character(character_id)
tts_config = char.get("tts", {})
if tts_config:
set_active_tts_voice(character_id, tts_config)
engine = tts_config.get("engine", "kokoro")
voice_label = tts_config.get("kokoro_voice", "") if engine == "kokoro" else tts_config.get("elevenlabs_voice_id", "")
print(f"[OpenClaw Bridge] Active TTS: {engine} / {voice_label}")
if satellite_id:
print(f"[OpenClaw Bridge] Satellite: {satellite_id} → character: {character_id}")
elif explicit_character_id:
print(f"[OpenClaw Bridge] Character: {character_id}")
if system_prompt:
message = f"System Context: {system_prompt}\n\nUser Request: {message}"
# Load mode and resolve model routing
mode_data = load_mode()
model_override = resolve_model(mode_data)
active_model = model_override or DEFAULT_MODEL
if model_override:
print(f"[OpenClaw Bridge] Mode: PUBLIC → {model_override}")
else:
print(f"[OpenClaw Bridge] Mode: PRIVATE ({active_model})")
# Check if model is warm to set appropriate timeout
warm = is_model_warm()
timeout = TIMEOUT_WARM if warm else TIMEOUT_COLD
print(f"[OpenClaw Bridge] Model {'warm' if warm else 'cold'}, timeout={timeout}s")
# Signal avatar: thinking
_vtube_fire_and_forget("/expression", {"event": "thinking"})
# Call OpenClaw CLI (use full path for launchd compatibility)
try:
response_text = self._call_openclaw(message, agent, timeout, model=model_override)
# Re-prompt if the model promised to act but didn't call a tool.
# Detect "I'll do X" / "Let me X" responses that lack any result.
if self._needs_followup(response_text):
print(f"[OpenClaw Bridge] Response looks like a promise without action, re-prompting")
followup = (
"You just said you would do something but didn't actually call the exec tool. "
"Do NOT explain what you will do — call the tool NOW using exec and return the result."
)
response_text = self._call_openclaw(followup, agent, timeout, model=model_override)
# Signal avatar: idle (TTS handler will override to 'speaking' if voice is used)
_vtube_fire_and_forget("/expression", {"event": "idle"})
self._send_json_response(200, {"response": response_text, "model": active_model})
except subprocess.TimeoutExpired:
self._send_json_response(504, {"error": "OpenClaw command timed out"})
self._send_json_response(504, {"error": f"OpenClaw command timed out after {timeout}s (model was {'warm' if warm else 'cold'})"})
except subprocess.CalledProcessError as e:
error_msg = e.stderr.strip() if e.stderr else "OpenClaw command failed"
self._send_json_response(500, {"error": error_msg})
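Both bridge endpoints can be exercised from a small client. This sketch builds the POST for `/api/agent/message`; the URL and body fields match the handler above, while the helper name is illustrative:

```python
import json
import urllib.request

BRIDGE_URL = "http://localhost:8081"  # bridge port used throughout this repo

def build_agent_request(message, satellite_id=None, character_id=None, agent="main"):
    """Build the POST for /api/agent/message. Character resolution happens
    server-side: explicit character_id wins, then the satellite mapping, then default."""
    body = {"message": message, "agent": agent}
    if satellite_id:
        body["satellite_id"] = satellite_id
    if character_id:
        body["character_id"] = character_id
    return urllib.request.Request(
        f"{BRIDGE_URL}/api/agent/message",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_agent_request("turn on the kitchen lights", satellite_id="kitchen-pi")
```

Send it with `urllib.request.urlopen(req, timeout=130)` against a running bridge; the response JSON carries `response` and `model` keys.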

homeai-agent/reminder-daemon.py Executable file
View File

@@ -0,0 +1,90 @@
#!/usr/bin/env python3
"""
HomeAI Reminder Daemon — checks ~/homeai-data/reminders.json every 60s
and fires TTS via POST http://localhost:8081/api/tts when reminders are due.
"""
import json
import os
import time
import urllib.request
from datetime import datetime
REMINDERS_FILE = os.path.expanduser("~/homeai-data/reminders.json")
TTS_URL = "http://localhost:8081/api/tts"
CHECK_INTERVAL = 60 # seconds
def load_reminders():
try:
with open(REMINDERS_FILE) as f:
return json.load(f)
except (FileNotFoundError, json.JSONDecodeError):
return {"reminders": []}
def save_reminders(data):
with open(REMINDERS_FILE, "w") as f:
json.dump(data, f, indent=2)
def fire_tts(message):
"""Speak reminder via the OpenClaw bridge TTS endpoint."""
try:
payload = json.dumps({"text": f"Reminder: {message}"}).encode()
req = urllib.request.Request(
TTS_URL,
data=payload,
headers={"Content-Type": "application/json"},
method="POST"
)
urllib.request.urlopen(req, timeout=30)
print(f"[{datetime.now().isoformat()}] TTS fired: {message}")
return True
except Exception as e:
print(f"[{datetime.now().isoformat()}] TTS error: {e}")
return False
def check_reminders():
data = load_reminders()
now = datetime.now()
changed = False
for r in data.get("reminders", []):
if r.get("fired"):
continue
try:
due = datetime.fromisoformat(r["due_at"])
except (KeyError, ValueError):
continue
if now >= due:
print(f"[{now.isoformat()}] Reminder due: {r.get('message', '?')}")
fire_tts(r["message"])
r["fired"] = True
changed = True
if changed:
# Clean up fired reminders older than 24h
cutoff = (now.timestamp() - 86400) * 1000
data["reminders"] = [
r for r in data["reminders"]
if not r.get("fired") or int(r.get("id", "0")) > cutoff
]
save_reminders(data)
def main():
print(f"[{datetime.now().isoformat()}] Reminder daemon started (check every {CHECK_INTERVAL}s)")
while True:
try:
check_reminders()
except Exception as e:
print(f"[{datetime.now().isoformat()}] Error: {e}")
time.sleep(CHECK_INTERVAL)
if __name__ == "__main__":
main()
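A reminder can be queued from any script by appending to the JSON file the daemon polls. The field names match what `check_reminders` reads; the ms-epoch `id` convention is inferred from the cleanup logic, and the helper name and demo path are illustrative:

```python
import json
import tempfile
import time
from datetime import datetime, timedelta
from pathlib import Path

def add_reminder(path, message, due_at):
    """Append a reminder in the shape check_reminders expects."""
    p = Path(path)
    try:
        data = json.loads(p.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        data = {"reminders": []}
    data["reminders"].append({
        "id": str(int(time.time() * 1000)),  # ms epoch, used by the 24h cleanup
        "message": message,
        "due_at": due_at.isoformat(),        # ISO string, parsed with fromisoformat
        "fired": False,
    })
    p.write_text(json.dumps(data, indent=2))
    return data

demo_path = Path(tempfile.gettempdir()) / "reminders-demo.json"
data = add_reminder(demo_path, "take out the bins", datetime.now() + timedelta(minutes=5))
```

With a 60s poll interval, a reminder may fire up to a minute after `due_at`.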

homeai-agent/setup.sh Normal file → Executable file
View File

@@ -1,17 +1,20 @@
#!/usr/bin/env bash
# homeai-agent/setup.sh — OpenClaw agent, HTTP bridge, skills, reminder daemon
#
# Components:
# - OpenClaw gateway — AI agent runtime (port 8080)
# - OpenClaw HTTP bridge — HA ↔ OpenClaw translator (port 8081)
# - 13 skills — home-assistant, image-generation, voice-assistant,
# vtube-studio, memory, service-monitor, character,
# routine, music, workflow, gitea, calendar, mode
# - Reminder daemon — fires TTS when reminders are due
#
# Prerequisites:
# - Ollama running (port 11434)
# - Home Assistant reachable (HA_TOKEN set in .env)
# - Wyoming TTS running (port 10301)
# - homeai-voice-env venv exists (for bridge + reminder daemon)
# - At least one character JSON in ~/homeai-data/characters/
set -euo pipefail
@@ -19,47 +22,196 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_DIR="$(cd "${SCRIPT_DIR}/.." && pwd)"
source "${REPO_DIR}/scripts/common.sh"
log_section "P4: Agent (OpenClaw + skills + mem0)"
log_section "P4: Agent (OpenClaw + HTTP Bridge + Skills)"
detect_platform
# ─── Load environment ────────────────────────────────────────────────────────
ENV_FILE="${REPO_DIR}/.env"
if [[ -f "$ENV_FILE" ]]; then
log_info "Loading .env..."
load_env "$ENV_FILE"
else
log_warn "No .env found at ${ENV_FILE} — API keys may be missing"
fi
# ─── Prerequisite checks ────────────────────────────────────────────────────
log_info "Checking prerequisites..."
require_command node "brew install node"
require_command openclaw "npm install -g openclaw"
VOICE_ENV="${HOME}/homeai-voice-env"
if [[ ! -d "$VOICE_ENV" ]]; then
die "homeai-voice-env not found at $VOICE_ENV — run homeai-voice/setup.sh first"
fi
# Check key services (non-fatal)
for check in "http://localhost:11434:Ollama" "http://localhost:10301:Wyoming-TTS"; do
url="${check%%:*}"; name="${check##*:}"
if curl -sf "$url" -o /dev/null 2>/dev/null; then
log_success "$name reachable"
else
log_warn "$name not reachable at $url"
fi
done
# Check required env vars
MISSING_KEYS=()
[[ -z "${HA_TOKEN:-}" ]] && MISSING_KEYS+=("HA_TOKEN")
[[ -z "${ANTHROPIC_API_KEY:-}" ]] && MISSING_KEYS+=("ANTHROPIC_API_KEY")
if [[ ${#MISSING_KEYS[@]} -gt 0 ]]; then
log_warn "Missing env vars: ${MISSING_KEYS[*]} — set these in ${ENV_FILE}"
fi
# ─── Ensure data directories ─────────────────────────────────────────────────
DATA_DIR="${HOME}/homeai-data"
for dir in characters memories memories/personal conversations routines; do
mkdir -p "${DATA_DIR}/${dir}"
done
log_success "Data directories verified"
# ─── OpenClaw config ─────────────────────────────────────────────────────────
OPENCLAW_DIR="${HOME}/.openclaw"
OPENCLAW_CONFIG="${OPENCLAW_DIR}/openclaw.json"
if [[ ! -f "$OPENCLAW_CONFIG" ]]; then
die "OpenClaw config not found at $OPENCLAW_CONFIG — run: openclaw doctor --fix"
fi
log_success "OpenClaw config exists at $OPENCLAW_CONFIG"
# Verify Anthropic provider is configured
if ! grep -q '"anthropic"' "$OPENCLAW_CONFIG" 2>/dev/null; then
log_warn "Anthropic provider not found in openclaw.json — add it for Claude support"
fi
# ─── Install skills ──────────────────────────────────────────────────────────
SKILLS_SRC="${SCRIPT_DIR}/skills"
SKILLS_DEST="${OPENCLAW_DIR}/skills"
if [[ -d "$SKILLS_SRC" ]]; then
log_info "Syncing skills..."
mkdir -p "$SKILLS_DEST"
for skill_dir in "$SKILLS_SRC"/*/; do
skill_name="$(basename "$skill_dir")"
dest="${SKILLS_DEST}/${skill_name}"
if [[ -L "$dest" ]]; then
log_info " ${skill_name} (symlinked)"
elif [[ -d "$dest" ]]; then
# Replace copy with symlink
rm -rf "$dest"
ln -s "$skill_dir" "$dest"
log_step "${skill_name} → symlinked"
else
ln -s "$skill_dir" "$dest"
log_step "${skill_name} → installed"
fi
done
log_success "Skills synced ($(ls -d "$SKILLS_DEST"/*/ 2>/dev/null | wc -l | tr -d ' ') total)"
else
log_warn "No skills directory at $SKILLS_SRC"
fi
# ─── Install launchd services (macOS) ────────────────────────────────────────
if [[ "$OS_TYPE" == "macos" ]]; then
log_info "Installing launchd agents..."
LAUNCHD_DIR="${SCRIPT_DIR}/launchd"
AGENTS_DIR="${HOME}/Library/LaunchAgents"
mkdir -p "$AGENTS_DIR"
# Inject API keys into plists that need them
_inject_plist_key() {
local plist="$1" key="$2" value="$3"
if [[ -n "$value" ]] && grep -q "<key>${key}</key>" "$plist" 2>/dev/null; then
# Use python for reliable XML-safe replacement; pass key/value as argv so
# regex metacharacters, quotes, and backslashes in API keys survive intact
python3 - "$plist" "$key" "$value" <<'PYEOF'
import re, sys
path, key, value = sys.argv[1:4]
with open(path) as f: content = f.read()
pattern = rf'(<key>{re.escape(key)}</key>\s*<string>)[^<]*(</string>)'
content = re.sub(pattern, lambda m: m.group(1) + value + m.group(2), content)
with open(path, 'w') as f: f.write(content)
PYEOF
fi
}
# Update API keys in plist source files before linking
OPENCLAW_PLIST="${LAUNCHD_DIR}/com.homeai.openclaw.plist"
BRIDGE_PLIST="${LAUNCHD_DIR}/com.homeai.openclaw-bridge.plist"
if [[ -f "$OPENCLAW_PLIST" ]]; then
_inject_plist_key "$OPENCLAW_PLIST" "ANTHROPIC_API_KEY" "${ANTHROPIC_API_KEY:-}"
_inject_plist_key "$OPENCLAW_PLIST" "OPENAI_API_KEY" "${OPENAI_API_KEY:-}"
_inject_plist_key "$OPENCLAW_PLIST" "HA_TOKEN" "${HA_TOKEN:-}"
_inject_plist_key "$OPENCLAW_PLIST" "HASS_TOKEN" "${HA_TOKEN:-}"
_inject_plist_key "$OPENCLAW_PLIST" "GITEA_TOKEN" "${GITEA_TOKEN:-}"
_inject_plist_key "$OPENCLAW_PLIST" "N8N_API_KEY" "${N8N_API_KEY:-}"
fi
if [[ -f "$BRIDGE_PLIST" ]]; then
_inject_plist_key "$BRIDGE_PLIST" "ANTHROPIC_API_KEY" "${ANTHROPIC_API_KEY:-}"
_inject_plist_key "$BRIDGE_PLIST" "ELEVENLABS_API_KEY" "${ELEVENLABS_API_KEY:-}"
fi
# Symlink and load each plist
for plist in "$LAUNCHD_DIR"/*.plist; do
[[ ! -f "$plist" ]] && continue
plist_name="$(basename "$plist")"
plist_label="${plist_name%.plist}"
dest="${AGENTS_DIR}/${plist_name}"
# Unload if already running
launchctl bootout "gui/$(id -u)/${plist_label}" 2>/dev/null || true
# Symlink source → LaunchAgents
ln -sf "$(cd "$(dirname "$plist")" && pwd)/${plist_name}" "$dest"
# Load
launchctl bootstrap "gui/$(id -u)" "$dest" 2>/dev/null && \
log_success " ${plist_label} → loaded" || \
log_warn " ${plist_label} → failed to load (check: launchctl print gui/$(id -u)/${plist_label})"
done
fi
# ─── Smoke test ──────────────────────────────────────────────────────────────
log_info "Running smoke tests..."
sleep 2 # Give services a moment to start
# Check gateway
if curl -sf "http://localhost:8080" -o /dev/null 2>/dev/null; then
log_success "OpenClaw gateway responding on :8080"
else
log_warn "OpenClaw gateway not responding on :8080 — check: tail /tmp/homeai-openclaw.log"
fi
# Check bridge
if curl -sf "http://localhost:8081/status" -o /dev/null 2>/dev/null; then
log_success "HTTP bridge responding on :8081"
else
log_warn "HTTP bridge not responding on :8081 — check: tail /tmp/homeai-openclaw-bridge.log"
fi
# ─── Summary ─────────────────────────────────────────────────────────────────
print_summary "Agent Setup Complete" \
"OpenClaw gateway" "http://localhost:8080" \
"HTTP bridge" "http://localhost:8081" \
"OpenClaw config" "$OPENCLAW_CONFIG" \
"Skills directory" "$SKILLS_DEST" \
"Character data" "${DATA_DIR}/characters/" \
"Memory data" "${DATA_DIR}/memories/" \
"Reminder data" "${DATA_DIR}/reminders.json" \
"Gateway log" "/tmp/homeai-openclaw.log" \
"Bridge log" "/tmp/homeai-openclaw-bridge.log"
cat <<'EOF'
To reload a service after editing its plist:
launchctl bootout gui/$(id -u)/com.homeai.<service>
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.homeai.<service>.plist
To test the agent:
curl -X POST http://localhost:8081/api/agent/message \
-H 'Content-Type: application/json' \
-d '{"message":"say hello","agent":"main"}'
EOF
log_info "P4 is not yet implemented. See homeai-agent/PLAN.md for details."
exit 0

View File

@@ -24,6 +24,8 @@
<string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin</string>
<key>HOME</key>
<string>/Users/aodhan</string>
<key>GAZE_API_KEY</key>
<string>e63401f17e4845e1059f830267f839fe7fc7b6083b1cb1730863318754d799f4</string>
</dict>
<key>RunAtLoad</key>

View File

@@ -1,15 +1,24 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "HomeAI Character Config",
"version": "1",
"version": "2",
"type": "object",
"required": ["schema_version", "name", "system_prompt", "tts"],
"properties": {
"schema_version": { "type": "integer", "const": 1 },
"schema_version": { "type": "integer", "enum": [1, 2] },
"name": { "type": "string" },
"display_name": { "type": "string" },
"description": { "type": "string" },
"background": { "type": "string", "description": "Backstory, lore, or general prompt enrichment" },
"dialogue_style": { "type": "string", "description": "How the persona speaks or reacts, with example lines" },
"appearance": { "type": "string", "description": "Physical description, also used for image prompting" },
"skills": {
"type": "array",
"description": "Topics the persona specialises in or enjoys talking about",
"items": { "type": "string" }
},
"system_prompt": { "type": "string" },
"model_overrides": {
@@ -31,35 +40,21 @@
"voice_ref_path": { "type": "string" },
"kokoro_voice": { "type": "string" },
"elevenlabs_voice_id": { "type": "string" },
"elevenlabs_voice_name": { "type": "string" },
"elevenlabs_model": { "type": "string", "default": "eleven_monolingual_v1" },
"speed": { "type": "number", "default": 1.0 }
}
},
"live2d_expressions": {
"type": "object",
"description": "Maps semantic state to VTube Studio hotkey ID",
"properties": {
"idle": { "type": "string" },
"listening": { "type": "string" },
"thinking": { "type": "string" },
"speaking": { "type": "string" },
"happy": { "type": "string" },
"sad": { "type": "string" },
"surprised": { "type": "string" },
"error": { "type": "string" }
}
},
"vtube_ws_triggers": {
"type": "object",
"description": "VTube Studio WebSocket actions keyed by event name",
"additionalProperties": {
"gaze_presets": {
"type": "array",
"description": "GAZE image generation presets with trigger conditions",
"items": {
"type": "object",
"required": ["preset"],
"properties": {
"type": { "type": "string", "enum": ["hotkey", "parameter"] },
"id": { "type": "string" },
"value": { "type": "number" }
"preset": { "type": "string" },
"trigger": { "type": "string", "default": "self-portrait" }
}
}
},
@@ -78,5 +73,6 @@
},
"notes": { "type": "string" }
}
},
"additionalProperties": true
}
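A minimal character config that satisfies the v2 schema's required fields (`schema_version`, `name`, `system_prompt`, `tts`) might look like the following. All values are illustrative, and the bridge's `load_character` expects this object under a top-level `data` key in the stored file:

```json
{
  "schema_version": 2,
  "name": "aria",
  "display_name": "Aria",
  "system_prompt": "You are Aria, a warm and concise home assistant.",
  "background": "Lives in the house's speakers and knows the family's routines.",
  "dialogue_style": "Short, friendly sentences. Example: 'Done, lights are off.'",
  "appearance": "Silver-haired android with teal eyes.",
  "skills": ["cooking", "home automation", "music"],
  "gaze_presets": [
    { "preset": "aria_portrait", "trigger": "self-portrait" }
  ],
  "tts": {
    "engine": "kokoro",
    "kokoro_voice": "af_heart",
    "speed": 1.0
  }
}
```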

View File

@@ -3,6 +3,7 @@ import Dashboard from './pages/Dashboard';
import Chat from './pages/Chat';
import Characters from './pages/Characters';
import Editor from './pages/Editor';
import Memories from './pages/Memories';
function NavItem({ to, children, icon }) {
return (
@@ -77,6 +78,17 @@ function Layout({ children }) {
Characters
</NavItem>
<NavItem
to="/memories"
icon={
<svg className="w-5 h-5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={1.5}>
<path strokeLinecap="round" strokeLinejoin="round" d="M12 18v-5.25m0 0a6.01 6.01 0 001.5-.189m-1.5.189a6.01 6.01 0 01-1.5-.189m3.75 7.478a12.06 12.06 0 01-4.5 0m3.75 2.383a14.406 14.406 0 01-3 0M14.25 18v-.192c0-.983.658-1.823 1.508-2.316a7.5 7.5 0 10-7.517 0c.85.493 1.509 1.333 1.509 2.316V18" />
</svg>
}
>
Memories
</NavItem>
<NavItem
to="/editor"
icon={
@@ -113,6 +125,7 @@ function App() {
<Route path="/" element={<div className="flex-1 overflow-y-auto p-8"><div className="max-w-6xl mx-auto"><Dashboard /></div></div>} />
<Route path="/chat" element={<Chat />} />
<Route path="/characters" element={<div className="flex-1 overflow-y-auto p-8"><div className="max-w-6xl mx-auto"><Characters /></div></div>} />
<Route path="/memories" element={<div className="flex-1 overflow-y-auto p-8"><div className="max-w-6xl mx-auto"><Memories /></div></div>} />
<Route path="/editor" element={<div className="flex-1 overflow-y-auto p-8"><div className="max-w-6xl mx-auto"><Editor /></div></div>} />
</Routes>
</Layout>

View File

@@ -2,8 +2,10 @@ import { useEffect, useRef } from 'react'
import MessageBubble from './MessageBubble'
import ThinkingIndicator from './ThinkingIndicator'
export default function ChatPanel({ messages, isLoading, onReplay, character }) {
const bottomRef = useRef(null)
const name = character?.name || 'AI'
const image = character?.image || null
useEffect(() => {
bottomRef.current?.scrollIntoView({ behavior: 'smooth' })
@@ -13,10 +15,14 @@ export default function ChatPanel({ messages, isLoading, onReplay }) {
return (
<div className="flex-1 flex items-center justify-center">
<div className="text-center">
<div className="w-16 h-16 rounded-full bg-indigo-600/20 flex items-center justify-center mx-auto mb-4">
<span className="text-indigo-400 text-2xl">AI</span>
</div>
<h2 className="text-xl font-medium text-gray-200 mb-2">Hi, I'm Aria</h2>
{image ? (
<img src={image} alt={name} className="w-20 h-20 rounded-full object-cover mx-auto mb-4 ring-2 ring-indigo-500/30" />
) : (
<div className="w-20 h-20 rounded-full bg-indigo-600/20 flex items-center justify-center mx-auto mb-4">
<span className="text-indigo-400 text-2xl">{name[0]}</span>
</div>
)}
<h2 className="text-xl font-medium text-gray-200 mb-2">Hi, I'm {name}</h2>
<p className="text-gray-500 text-sm">Type a message or press the mic to talk</p>
</div>
</div>
@@ -26,9 +32,9 @@ export default function ChatPanel({ messages, isLoading, onReplay }) {
return (
<div className="flex-1 overflow-y-auto py-4">
{messages.map((msg) => (
<MessageBubble key={msg.id} message={msg} onReplay={onReplay} character={character} />
))}
{isLoading && <ThinkingIndicator character={character} />}
<div ref={bottomRef} />
</div>
)

View File

@@ -0,0 +1,70 @@
function timeAgo(dateStr) {
if (!dateStr) return ''
const diff = Date.now() - new Date(dateStr).getTime()
const mins = Math.floor(diff / 60000)
if (mins < 1) return 'just now'
if (mins < 60) return `${mins}m ago`
const hours = Math.floor(mins / 60)
if (hours < 24) return `${hours}h ago`
const days = Math.floor(hours / 24)
return `${days}d ago`
}
export default function ConversationList({ conversations, activeId, onCreate, onSelect, onDelete }) {
return (
<div className="w-72 border-r border-gray-800 flex flex-col bg-gray-950 shrink-0">
{/* New chat button */}
<div className="p-3 border-b border-gray-800">
<button
onClick={onCreate}
className="w-full flex items-center justify-center gap-2 px-3 py-2 bg-indigo-600 hover:bg-indigo-500 text-white text-sm rounded-lg transition-colors"
>
<svg className="w-4 h-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M12 4.5v15m7.5-7.5h-15" />
</svg>
New chat
</button>
</div>
{/* Conversation list */}
<div className="flex-1 overflow-y-auto">
{conversations.length === 0 ? (
<p className="text-xs text-gray-600 text-center py-6">No conversations yet</p>
) : (
conversations.map(conv => (
<div
key={conv.id}
onClick={() => onSelect(conv.id)}
className={`group flex items-start gap-2 px-3 py-2.5 cursor-pointer border-b border-gray-800/50 transition-colors ${
conv.id === activeId
? 'bg-gray-800 text-white'
: 'text-gray-400 hover:bg-gray-800/50 hover:text-gray-200'
}`}
>
<div className="flex-1 min-w-0">
<p className="text-sm truncate">
{conv.title || 'New conversation'}
</p>
<div className="flex items-center gap-2 mt-0.5">
{conv.characterName && (
<span className="text-xs text-indigo-400/70">{conv.characterName}</span>
)}
<span className="text-xs text-gray-600">{timeAgo(conv.updatedAt)}</span>
</div>
</div>
<button
onClick={(e) => { e.stopPropagation(); onDelete(conv.id) }}
className="opacity-0 group-hover:opacity-100 p-1 text-gray-500 hover:text-red-400 transition-all shrink-0 mt-0.5"
title="Delete"
>
<svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M14.74 9l-.346 9m-4.788 0L9.26 9m9.968-3.21c.342.052.682.107 1.022.166m-1.022-.165L18.16 19.673a2.25 2.25 0 01-2.244 2.077H8.084a2.25 2.25 0 01-2.244-2.077L4.772 5.79m14.456 0a48.108 48.108 0 00-3.478-.397m-12 .562c.34-.059.68-.114 1.022-.165m0 0a48.11 48.11 0 013.478-.397m7.5 0v-.916c0-1.18-.91-2.164-2.09-2.201a51.964 51.964 0 00-3.32 0c-1.18.037-2.09 1.022-2.09 2.201v.916m7.5 0a48.667 48.667 0 00-7.5 0" />
</svg>
</button>
</div>
))
)}
</div>
</div>
)
}

View File

@@ -1,14 +1,100 @@
import { useState } from 'react'
function Avatar({ character }) {
const name = character?.name || 'AI'
const image = character?.image || null
if (image) {
return <img src={image} alt={name} className="w-8 h-8 rounded-full object-cover shrink-0 mt-0.5 ring-1 ring-gray-700" />
}
return (
<div className="w-8 h-8 rounded-full bg-indigo-600/20 flex items-center justify-center shrink-0 mt-0.5">
<span className="text-indigo-400 text-sm">{name[0]}</span>
</div>
)
}
function ImageOverlay({ src, onClose }) {
return (
<div
className="fixed inset-0 z-50 bg-black/80 flex items-center justify-center cursor-zoom-out"
onClick={onClose}
>
<img
src={src}
alt="Full size"
className="max-w-[90vw] max-h-[90vh] object-contain rounded-lg shadow-2xl"
onClick={(e) => e.stopPropagation()}
/>
<button
onClick={onClose}
className="absolute top-4 right-4 text-white/70 hover:text-white transition-colors p-2"
>
<svg className="w-6 h-6" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M6 18L18 6M6 6l12 12" />
</svg>
</button>
</div>
)
}
const IMAGE_URL_RE = /(https?:\/\/[^\s]+\.(?:png|jpg|jpeg|gif|webp))/gi
function RichContent({ text }) {
const [overlayImage, setOverlayImage] = useState(null)
const parts = []
let lastIndex = 0
let match
IMAGE_URL_RE.lastIndex = 0
while ((match = IMAGE_URL_RE.exec(text)) !== null) {
if (match.index > lastIndex) {
parts.push({ type: 'text', value: text.slice(lastIndex, match.index) })
}
parts.push({ type: 'image', value: match[1] })
lastIndex = IMAGE_URL_RE.lastIndex
}
if (lastIndex < text.length) {
parts.push({ type: 'text', value: text.slice(lastIndex) })
}
if (parts.length === 1 && parts[0].type === 'text') {
return <>{text}</>
}
return (
<>
{parts.map((part, i) =>
part.type === 'image' ? (
<button
key={i}
onClick={() => setOverlayImage(part.value)}
className="block my-2 cursor-zoom-in"
>
<img
src={part.value}
alt="Generated image"
className="rounded-xl max-w-full max-h-80 object-contain"
loading="lazy"
/>
</button>
) : (
<span key={i}>{part.value}</span>
)
)}
{overlayImage && <ImageOverlay src={overlayImage} onClose={() => setOverlayImage(null)} />}
</>
)
}
export default function MessageBubble({ message, onReplay, character }) {
const isUser = message.role === 'user'
return (
<div className={`flex ${isUser ? 'justify-end' : 'justify-start'} px-4 py-1.5`}>
<div className={`flex items-start gap-3 max-w-[80%] ${isUser ? 'flex-row-reverse' : ''}`}>
{!isUser && (
<div className="w-8 h-8 rounded-full bg-indigo-600/20 flex items-center justify-center shrink-0 mt-0.5">
<span className="text-indigo-400 text-sm">AI</span>
</div>
)}
{!isUser && <Avatar character={character} />}
<div>
<div
className={`rounded-2xl px-4 py-2.5 text-sm leading-relaxed whitespace-pre-wrap ${
@@ -19,18 +105,27 @@ export default function MessageBubble({ message, onReplay }) {
: 'bg-gray-800 text-gray-100'
}`}
>
{message.content}
{isUser ? message.content : <RichContent text={message.content} />}
</div>
{!isUser && !message.isError && onReplay && (
<button
onClick={() => onReplay(message.content)}
className="mt-1 ml-1 text-gray-500 hover:text-indigo-400 transition-colors"
title="Replay audio"
>
<svg className="w-4 h-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M15.536 8.464a5 5 0 010 7.072M17.95 6.05a8 8 0 010 11.9M6.5 9H4a1 1 0 00-1 1v4a1 1 0 001 1h2.5l4 4V5l-4 4z" />
</svg>
</button>
{!isUser && (
<div className="flex items-center gap-2 mt-1 ml-1">
{message.model && (
<span className="text-[10px] text-gray-500 font-mono">
{message.model}
</span>
)}
{!message.isError && onReplay && (
<button
onClick={() => onReplay(message.content)}
className="text-gray-500 hover:text-indigo-400 transition-colors"
title="Replay audio"
>
<svg className="w-4 h-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M15.536 8.464a5 5 0 010 7.072M17.95 6.05a8 8 0 010 11.9M6.5 9H4a1 1 0 00-1 1v4a1 1 0 001 1h2.5l4 4V5l-4 4z" />
</svg>
</button>
)}
</div>
)}
</div>
</div>

View File

@@ -1,8 +1,10 @@
import { VOICES } from '../lib/constants'
import { VOICES, TTS_ENGINES } from '../lib/constants'
export default function SettingsDrawer({ isOpen, onClose, settings, onUpdate }) {
if (!isOpen) return null
const isKokoro = !settings.ttsEngine || settings.ttsEngine === 'kokoro'
return (
<>
<div className="fixed inset-0 bg-black/50 z-40" onClick={onClose} />
@@ -16,18 +18,48 @@ export default function SettingsDrawer({ isOpen, onClose, settings, onUpdate })
</button>
</div>
<div className="flex-1 overflow-y-auto p-4 space-y-5">
{/* TTS Engine */}
<div>
<label className="block text-xs font-medium text-gray-400 mb-1.5">TTS Engine</label>
<select
value={settings.ttsEngine || 'kokoro'}
onChange={(e) => onUpdate('ttsEngine', e.target.value)}
className="w-full bg-gray-800 text-gray-200 text-sm rounded-lg px-3 py-2 border border-gray-700 focus:outline-none focus:border-indigo-500"
>
{TTS_ENGINES.map((e) => (
<option key={e.id} value={e.id}>{e.label}</option>
))}
</select>
</div>
{/* Voice */}
<div>
<label className="block text-xs font-medium text-gray-400 mb-1.5">Voice</label>
<select
value={settings.voice}
onChange={(e) => onUpdate('voice', e.target.value)}
className="w-full bg-gray-800 text-gray-200 text-sm rounded-lg px-3 py-2 border border-gray-700 focus:outline-none focus:border-indigo-500"
>
{VOICES.map((v) => (
<option key={v.id} value={v.id}>{v.label}</option>
))}
</select>
{isKokoro ? (
<select
value={settings.voice}
onChange={(e) => onUpdate('voice', e.target.value)}
className="w-full bg-gray-800 text-gray-200 text-sm rounded-lg px-3 py-2 border border-gray-700 focus:outline-none focus:border-indigo-500"
>
{VOICES.map((v) => (
<option key={v.id} value={v.id}>{v.label}</option>
))}
</select>
) : (
<div>
<input
type="text"
value={settings.voice || ''}
className="w-full bg-gray-800 text-gray-200 text-sm rounded-lg px-3 py-2 border border-gray-700 focus:outline-none focus:border-indigo-500"
placeholder={settings.ttsEngine === 'elevenlabs' ? 'ElevenLabs voice ID' : 'Voice identifier'}
readOnly
/>
<p className="text-xs text-gray-500 mt-1">
Set via active character profile
</p>
</div>
)}
</div>
{/* Auto TTS */}

View File

@@ -1,9 +1,16 @@
export default function ThinkingIndicator() {
export default function ThinkingIndicator({ character }) {
const name = character?.name || 'AI'
const image = character?.image || null
return (
<div className="flex items-start gap-3 px-4 py-3">
<div className="w-8 h-8 rounded-full bg-indigo-600/20 flex items-center justify-center shrink-0">
<span className="text-indigo-400 text-sm">AI</span>
</div>
{image ? (
<img src={image} alt={name} className="w-8 h-8 rounded-full object-cover shrink-0 ring-1 ring-gray-700" />
) : (
<div className="w-8 h-8 rounded-full bg-indigo-600/20 flex items-center justify-center shrink-0">
<span className="text-indigo-400 text-sm">{name[0]}</span>
</div>
)}
<div className="flex items-center gap-1 pt-2.5">
<span className="w-2 h-2 rounded-full bg-gray-400 animate-[bounce_1.4s_ease-in-out_infinite]" />
<span className="w-2 h-2 rounded-full bg-gray-400 animate-[bounce_1.4s_ease-in-out_0.2s_infinite]" />

View File

@@ -0,0 +1,28 @@
import { useState, useEffect } from 'react'
const ACTIVE_KEY = 'homeai_active_character'
export function useActiveCharacter() {
const [character, setCharacter] = useState(null)
useEffect(() => {
const activeId = localStorage.getItem(ACTIVE_KEY)
if (!activeId) return
fetch(`/api/characters/${activeId}`)
.then(r => r.ok ? r.json() : null)
.then(profile => {
if (profile) {
setCharacter({
id: profile.id,
name: profile.data.display_name || profile.data.name || 'AI',
image: profile.image || null,
tts: profile.data.tts || null,
})
}
})
.catch(() => {})
}, [])
return character
}

View File

@@ -1,45 +1,125 @@
import { useState, useCallback } from 'react'
import { useState, useCallback, useEffect, useRef } from 'react'
import { sendMessage } from '../lib/api'
import { getConversation, saveConversation } from '../lib/conversationApi'
export function useChat() {
export function useChat(conversationId, conversationMeta, onConversationUpdate) {
const [messages, setMessages] = useState([])
const [isLoading, setIsLoading] = useState(false)
const [isLoadingConv, setIsLoadingConv] = useState(false)
const convRef = useRef(null)
const idRef = useRef(conversationId)
const send = useCallback(async (text) => {
// Keep idRef in sync
useEffect(() => { idRef.current = conversationId }, [conversationId])
// Load conversation from server when ID changes
useEffect(() => {
if (!conversationId) {
setMessages([])
convRef.current = null
return
}
let cancelled = false
setIsLoadingConv(true)
getConversation(conversationId).then(conv => {
if (cancelled) return
if (conv) {
convRef.current = conv
setMessages(conv.messages || [])
} else {
convRef.current = null
setMessages([])
}
setIsLoadingConv(false)
}).catch(() => {
if (!cancelled) {
convRef.current = null
setMessages([])
setIsLoadingConv(false)
}
})
return () => { cancelled = true }
}, [conversationId])
// Persist conversation to server
const persist = useCallback(async (updatedMessages, title, overrideId) => {
const id = overrideId || idRef.current
if (!id) return
const now = new Date().toISOString()
const conv = {
id,
title: title || convRef.current?.title || '',
characterId: conversationMeta?.characterId || convRef.current?.characterId || '',
characterName: conversationMeta?.characterName || convRef.current?.characterName || '',
createdAt: convRef.current?.createdAt || now,
updatedAt: now,
messages: updatedMessages,
}
convRef.current = conv
await saveConversation(conv).catch(() => {})
if (onConversationUpdate) {
onConversationUpdate(id, {
title: conv.title,
updatedAt: conv.updatedAt,
messageCount: conv.messages.length,
})
}
}, [conversationMeta, onConversationUpdate])
// send accepts an optional overrideId for when the conversation was just created
const send = useCallback(async (text, overrideId) => {
if (!text.trim() || isLoading) return null
const userMsg = { id: Date.now(), role: 'user', content: text.trim(), timestamp: new Date() }
setMessages((prev) => [...prev, userMsg])
const userMsg = { id: Date.now(), role: 'user', content: text.trim(), timestamp: new Date().toISOString() }
const isFirstMessage = messages.length === 0
const newMessages = [...messages, userMsg]
setMessages(newMessages)
setIsLoading(true)
try {
const response = await sendMessage(text.trim())
const { response, model } = await sendMessage(text.trim(), conversationMeta?.characterId || null)
const assistantMsg = {
id: Date.now() + 1,
role: 'assistant',
content: response,
timestamp: new Date(),
timestamp: new Date().toISOString(),
...(model && { model }),
}
setMessages((prev) => [...prev, assistantMsg])
const allMessages = [...newMessages, assistantMsg]
setMessages(allMessages)
const title = isFirstMessage
? text.trim().slice(0, 80) + (text.trim().length > 80 ? '...' : '')
: undefined
await persist(allMessages, title, overrideId)
return response
} catch (err) {
const errorMsg = {
id: Date.now() + 1,
role: 'assistant',
content: `Error: ${err.message}`,
timestamp: new Date(),
timestamp: new Date().toISOString(),
isError: true,
}
setMessages((prev) => [...prev, errorMsg])
const allMessages = [...newMessages, errorMsg]
setMessages(allMessages)
await persist(allMessages, undefined, overrideId)
return null
} finally {
setIsLoading(false)
}
}, [isLoading])
}, [isLoading, messages, persist])
const clearHistory = useCallback(() => {
const clearHistory = useCallback(async () => {
setMessages([])
}, [])
if (idRef.current) {
await persist([], undefined)
}
}, [persist])
return { messages, isLoading, send, clearHistory }
return { messages, isLoading, isLoadingConv, send, clearHistory }
}

View File

@@ -0,0 +1,66 @@
import { useState, useEffect, useCallback } from 'react'
import { listConversations, saveConversation, deleteConversation as deleteConv } from '../lib/conversationApi'
const ACTIVE_KEY = 'homeai_active_conversation'
export function useConversations() {
const [conversations, setConversations] = useState([])
const [activeId, setActiveId] = useState(() => localStorage.getItem(ACTIVE_KEY) || null)
const [isLoading, setIsLoading] = useState(true)
const loadList = useCallback(async () => {
try {
const list = await listConversations()
setConversations(list)
} catch {
setConversations([])
} finally {
setIsLoading(false)
}
}, [])
useEffect(() => { loadList() }, [loadList])
const select = useCallback((id) => {
setActiveId(id)
if (id) {
localStorage.setItem(ACTIVE_KEY, id)
} else {
localStorage.removeItem(ACTIVE_KEY)
}
}, [])
const create = useCallback(async (characterId, characterName) => {
const id = `conv_${Date.now()}`
const now = new Date().toISOString()
const conv = {
id,
title: '',
characterId: characterId || '',
characterName: characterName || '',
createdAt: now,
updatedAt: now,
messages: [],
}
await saveConversation(conv)
setConversations(prev => [{ ...conv, messageCount: 0 }, ...prev])
select(id)
return id
}, [select])
const remove = useCallback(async (id) => {
await deleteConv(id)
setConversations(prev => prev.filter(c => c.id !== id))
if (activeId === id) {
select(null)
}
}, [activeId, select])
const updateMeta = useCallback((id, updates) => {
setConversations(prev => prev.map(c =>
c.id === id ? { ...c, ...updates } : c
))
}, [])
return { conversations, activeId, isLoading, select, create, remove, updateMeta, refresh: loadList }
}

View File

@@ -1,7 +1,7 @@
import { useState, useRef, useCallback } from 'react'
import { synthesize } from '../lib/api'
export function useTtsPlayback(voice) {
export function useTtsPlayback(voice, engine = 'kokoro', model = null) {
const [isPlaying, setIsPlaying] = useState(false)
const audioCtxRef = useRef(null)
const sourceRef = useRef(null)
@@ -23,7 +23,7 @@ export function useTtsPlayback(voice) {
setIsPlaying(true)
try {
const audioData = await synthesize(text, voice)
const audioData = await synthesize(text, voice, engine, model)
const ctx = getAudioContext()
if (ctx.state === 'suspended') await ctx.resume()
@@ -42,7 +42,7 @@ export function useTtsPlayback(voice) {
console.error('TTS playback error:', err)
setIsPlaying(false)
}
}, [voice])
}, [voice, engine, model])
const stop = useCallback(() => {
if (sourceRef.current) {

View File

@@ -4,7 +4,43 @@ import schema from '../../schema/character.schema.json'
const ajv = new Ajv({ allErrors: true, strict: false })
const validate = ajv.compile(schema)
/**
* Migrate a v1 character config to v2 in-place.
* Removes live2d/vtube fields, converts gaze_preset to gaze_presets array,
* and initialises new persona fields.
*/
export function migrateV1toV2(config) {
config.schema_version = 2
// Remove deprecated fields
delete config.live2d_expressions
delete config.vtube_ws_triggers
// Convert single gaze_preset string → gaze_presets array
if ('gaze_preset' in config) {
const old = config.gaze_preset
config.gaze_presets = old ? [{ preset: old, trigger: 'self-portrait' }] : []
delete config.gaze_preset
}
if (!config.gaze_presets) {
config.gaze_presets = []
}
// Initialise new fields if absent
if (config.background === undefined) config.background = ''
if (config.dialogue_style === undefined) config.dialogue_style = ''
if (config.appearance === undefined) config.appearance = ''
if (config.skills === undefined) config.skills = []
return config
}
export function validateCharacter(config) {
// Auto-migrate v1 → v2
if (config.schema_version === 1 || config.schema_version === undefined) {
migrateV1toV2(config)
}
const valid = validate(config)
if (!valid) {
throw new Error(ajv.errorsText(validate.errors))

View File

@@ -1,22 +1,46 @@
export async function sendMessage(text) {
const res = await fetch('/api/agent/message', {
const MAX_RETRIES = 3
const RETRY_DELAY_MS = 2000
async function fetchWithRetry(url, options, retries = MAX_RETRIES) {
for (let attempt = 1; attempt <= retries; attempt++) {
try {
const res = await fetch(url, options)
if (res.status === 502 && attempt < retries) {
// Bridge unreachable — wait and retry
await new Promise(r => setTimeout(r, RETRY_DELAY_MS * attempt))
continue
}
return res
} catch (err) {
if (attempt >= retries) throw err
await new Promise(r => setTimeout(r, RETRY_DELAY_MS * attempt))
}
}
}
export async function sendMessage(text, characterId = null) {
const payload = { message: text, agent: 'main' }
if (characterId) payload.character_id = characterId
const res = await fetchWithRetry('/api/agent/message', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ message: text, agent: 'main' }),
body: JSON.stringify(payload),
})
if (!res.ok) {
const err = await res.json().catch(() => ({ error: 'Request failed' }))
throw new Error(err.error || `HTTP ${res.status}`)
}
const data = await res.json()
return data.response
return { response: data.response, model: data.model || null }
}
export async function synthesize(text, voice) {
export async function synthesize(text, voice, engine = 'kokoro', model = null) {
const payload = { text, voice, engine }
if (model) payload.model = model
const res = await fetch('/api/tts', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ text, voice }),
body: JSON.stringify(payload),
})
if (!res.ok) throw new Error('TTS failed')
return await res.arrayBuffer()

View File

@@ -30,7 +30,15 @@ export const VOICES = [
{ id: 'bm_lewis', label: 'Lewis (M, UK)' },
]
export const TTS_ENGINES = [
{ id: 'kokoro', label: 'Kokoro (local)' },
{ id: 'chatterbox', label: 'Chatterbox (voice clone)' },
{ id: 'qwen3', label: 'Qwen3 TTS' },
{ id: 'elevenlabs', label: 'ElevenLabs (cloud)' },
]
export const DEFAULT_SETTINGS = {
ttsEngine: 'kokoro',
voice: DEFAULT_VOICE,
autoTts: true,
sttMode: 'bridge',

View File

@@ -0,0 +1,25 @@
export async function listConversations() {
const res = await fetch('/api/conversations')
if (!res.ok) throw new Error(`Failed to list conversations: ${res.status}`)
return res.json()
}
export async function getConversation(id) {
const res = await fetch(`/api/conversations/${encodeURIComponent(id)}`)
if (!res.ok) return null
return res.json()
}
export async function saveConversation(conversation) {
const res = await fetch('/api/conversations', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(conversation),
})
if (!res.ok) throw new Error(`Failed to save conversation: ${res.status}`)
}
export async function deleteConversation(id) {
const res = await fetch(`/api/conversations/${encodeURIComponent(id)}`, { method: 'DELETE' })
if (!res.ok) throw new Error(`Failed to delete conversation: ${res.status}`)
}

View File

@@ -0,0 +1,45 @@
export async function getPersonalMemories(characterId) {
const res = await fetch(`/api/memories/personal/${encodeURIComponent(characterId)}`)
if (!res.ok) return { characterId, memories: [] }
return res.json()
}
export async function savePersonalMemory(characterId, memory) {
const res = await fetch(`/api/memories/personal/${encodeURIComponent(characterId)}`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(memory),
})
if (!res.ok) throw new Error(`Failed to save memory: ${res.status}`)
return res.json()
}
export async function deletePersonalMemory(characterId, memoryId) {
const res = await fetch(`/api/memories/personal/${encodeURIComponent(characterId)}/${encodeURIComponent(memoryId)}`, {
method: 'DELETE',
})
if (!res.ok) throw new Error(`Failed to delete memory: ${res.status}`)
}
export async function getGeneralMemories() {
const res = await fetch('/api/memories/general')
if (!res.ok) return { memories: [] }
return res.json()
}
export async function saveGeneralMemory(memory) {
const res = await fetch('/api/memories/general', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(memory),
})
if (!res.ok) throw new Error(`Failed to save memory: ${res.status}`)
return res.json()
}
export async function deleteGeneralMemory(memoryId) {
const res = await fetch(`/api/memories/general/${encodeURIComponent(memoryId)}`, {
method: 'DELETE',
})
if (!res.ok) throw new Error(`Failed to delete memory: ${res.status}`)
}

View File

@@ -1,23 +1,9 @@
import { useState, useEffect } from 'react';
import { useState, useEffect, useCallback } from 'react';
import { useNavigate } from 'react-router-dom';
import { validateCharacter } from '../lib/SchemaValidator';
const STORAGE_KEY = 'homeai_characters';
const ACTIVE_KEY = 'homeai_active_character';
function loadProfiles() {
try {
const raw = localStorage.getItem(STORAGE_KEY);
return raw ? JSON.parse(raw) : [];
} catch {
return [];
}
}
function saveProfiles(profiles) {
localStorage.setItem(STORAGE_KEY, JSON.stringify(profiles));
}
function getActiveId() {
return localStorage.getItem(ACTIVE_KEY) || null;
}
@@ -27,15 +13,52 @@ function setActiveId(id) {
}
export default function Characters() {
const [profiles, setProfiles] = useState(loadProfiles);
const [profiles, setProfiles] = useState([]);
const [activeId, setActive] = useState(getActiveId);
const [error, setError] = useState(null);
const [dragOver, setDragOver] = useState(false);
const [loading, setLoading] = useState(true);
const [satMap, setSatMap] = useState({ default: '', satellites: {} });
const [newSatId, setNewSatId] = useState('');
const [newSatChar, setNewSatChar] = useState('');
const navigate = useNavigate();
// Load profiles and satellite map on mount
useEffect(() => {
saveProfiles(profiles);
}, [profiles]);
Promise.all([
fetch('/api/characters').then(r => r.json()),
fetch('/api/satellite-map').then(r => r.json()),
])
.then(([chars, map]) => {
setProfiles(chars);
setSatMap(map);
setLoading(false);
})
.catch(err => { setError(`Failed to load: ${err.message}`); setLoading(false); });
}, []);
const saveSatMap = useCallback(async (updated) => {
setSatMap(updated);
await fetch('/api/satellite-map', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(updated),
});
}, []);
const saveProfile = useCallback(async (profile) => {
const res = await fetch('/api/characters', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(profile),
});
if (!res.ok) throw new Error('Failed to save profile');
}, []);
const deleteProfile = useCallback(async (id) => {
const safeId = id.replace(/[^a-zA-Z0-9_\-\.]/g, '_');
await fetch(`/api/characters/${safeId}`, { method: 'DELETE' });
}, []);
const handleImport = (e) => {
const files = Array.from(e.target?.files || []);
@@ -47,12 +70,14 @@ export default function Characters() {
files.forEach(file => {
if (!file.name.endsWith('.json')) return;
const reader = new FileReader();
reader.onload = (ev) => {
reader.onload = async (ev) => {
try {
const data = JSON.parse(ev.target.result);
validateCharacter(data);
const id = data.name + '_' + Date.now();
setProfiles(prev => [...prev, { id, data, image: null, addedAt: new Date().toISOString() }]);
const profile = { id, data, image: null, addedAt: new Date().toISOString() };
await saveProfile(profile);
setProfiles(prev => [...prev, profile]);
setError(null);
} catch (err) {
setError(`Import failed for ${file.name}: ${err.message}`);
@@ -73,15 +98,17 @@ export default function Characters() {
const file = e.target.files[0];
if (!file) return;
const reader = new FileReader();
reader.onload = (ev) => {
setProfiles(prev =>
prev.map(p => p.id === profileId ? { ...p, image: ev.target.result } : p)
);
reader.onload = async (ev) => {
const updated = profiles.map(p => p.id === profileId ? { ...p, image: ev.target.result } : p);
const profile = updated.find(p => p.id === profileId);
if (profile) await saveProfile(profile);
setProfiles(updated);
};
reader.readAsDataURL(file);
};
const removeProfile = (id) => {
const removeProfile = async (id) => {
await deleteProfile(id);
setProfiles(prev => prev.filter(p => p.id !== id));
if (activeId === id) {
setActive(null);
@@ -92,6 +119,28 @@ export default function Characters() {
const activateProfile = (id) => {
setActive(id);
setActiveId(id);
// Sync active character's TTS settings to chat settings
const profile = profiles.find(p => p.id === id);
if (profile?.data?.tts) {
const tts = profile.data.tts;
const engine = tts.engine || 'kokoro';
let voice;
if (engine === 'kokoro') voice = tts.kokoro_voice || 'af_heart';
else if (engine === 'elevenlabs') voice = tts.elevenlabs_voice_id || '';
else if (engine === 'chatterbox') voice = tts.voice_ref_path || '';
else voice = '';
try {
const raw = localStorage.getItem('homeai_dashboard_settings');
const settings = raw ? JSON.parse(raw) : {};
localStorage.setItem('homeai_dashboard_settings', JSON.stringify({
...settings,
ttsEngine: engine,
voice: voice,
}));
} catch { /* ignore */ }
}
};
const exportProfile = (profile) => {
@@ -125,13 +174,28 @@ export default function Characters() {
)}
</p>
</div>
<label className="flex items-center gap-2 px-4 py-2 bg-indigo-600 hover:bg-indigo-500 text-white rounded-lg cursor-pointer transition-colors">
<svg className="w-4 h-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M12 4.5v15m7.5-7.5h-15" />
</svg>
Import JSON
<input type="file" accept=".json" multiple className="hidden" onChange={handleImport} />
</label>
<div className="flex gap-3">
<button
onClick={() => {
sessionStorage.removeItem('edit_character');
sessionStorage.removeItem('edit_character_profile_id');
navigate('/editor');
}}
className="flex items-center gap-2 px-4 py-2 bg-indigo-600 hover:bg-indigo-500 text-white rounded-lg transition-colors"
>
<svg className="w-4 h-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M12 4.5v15m7.5-7.5h-15" />
</svg>
New Character
</button>
<label className="flex items-center gap-2 px-4 py-2 bg-gray-800 hover:bg-gray-700 text-gray-300 rounded-lg cursor-pointer border border-gray-700 transition-colors">
<svg className="w-4 h-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M3 16.5v2.25A2.25 2.25 0 005.25 21h13.5A2.25 2.25 0 0021 18.75V16.5m-13.5-9L12 3m0 0l4.5 4.5M12 3v13.5" />
</svg>
Import JSON
<input type="file" accept=".json" multiple className="hidden" onChange={handleImport} />
</label>
</div>
</div>
{error && (
@@ -158,7 +222,11 @@ export default function Characters() {
</div>
{/* Profile grid */}
{profiles.length === 0 ? (
{loading ? (
<div className="text-center py-16">
<p className="text-gray-500">Loading characters...</p>
</div>
) : profiles.length === 0 ? (
<div className="text-center py-16">
<svg className="w-16 h-16 mx-auto text-gray-700 mb-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={1}>
<path strokeLinecap="round" strokeLinejoin="round" d="M15.75 6a3.75 3.75 0 11-7.5 0 3.75 3.75 0 017.5 0zM4.501 20.118a7.5 7.5 0 0114.998 0A17.933 17.933 0 0112 21.75c-2.676 0-5.216-.584-7.499-1.632z" />
@@ -230,11 +298,32 @@ export default function Characters() {
<span className="px-2 py-0.5 bg-gray-700/70 text-gray-400 text-xs rounded-full">
{char.model_overrides?.primary || 'default'}
</span>
{char.tts?.kokoro_voice && (
{char.tts?.engine === 'kokoro' && char.tts?.kokoro_voice && (
<span className="px-2 py-0.5 bg-gray-700/70 text-gray-400 text-xs rounded-full">
{char.tts.kokoro_voice}
</span>
)}
{char.tts?.engine === 'elevenlabs' && char.tts?.elevenlabs_voice_id && (
<span className="px-2 py-0.5 bg-gray-700/70 text-gray-400 text-xs rounded-full" title={char.tts.elevenlabs_voice_id}>
{char.tts.elevenlabs_voice_name || char.tts.elevenlabs_voice_id.slice(0, 8) + '…'}
</span>
)}
{char.tts?.engine === 'chatterbox' && char.tts?.voice_ref_path && (
<span className="px-2 py-0.5 bg-gray-700/70 text-gray-400 text-xs rounded-full" title={char.tts.voice_ref_path}>
{char.tts.voice_ref_path.split('/').pop()}
</span>
)}
{(() => {
const defaultPreset = char.gaze_presets?.find(gp => gp.trigger === 'self-portrait')?.preset
|| char.gaze_presets?.[0]?.preset
|| char.gaze_preset
|| null;
return defaultPreset ? (
<span className="px-2 py-0.5 bg-violet-500/20 text-violet-300 text-xs rounded-full border border-violet-500/30" title={`GAZE: ${defaultPreset}`}>
{defaultPreset}
</span>
) : null;
})()}
</div>
<div className="flex gap-2 pt-1">
@@ -287,6 +376,96 @@ export default function Characters() {
})}
</div>
)}
{/* Satellite Assignment */}
{!loading && profiles.length > 0 && (
<div className="bg-gray-900 border border-gray-800 rounded-xl p-5 space-y-4">
<div>
<h2 className="text-lg font-semibold text-gray-200">Satellite Routing</h2>
<p className="text-xs text-gray-500 mt-1">Assign characters to voice satellites. Unmapped satellites use the default.</p>
</div>
{/* Default character */}
<div className="flex items-center gap-3">
<label className="text-sm text-gray-400 w-32 shrink-0">Default</label>
<select
value={satMap.default || ''}
onChange={(e) => saveSatMap({ ...satMap, default: e.target.value })}
className="flex-1 bg-gray-800 text-gray-200 text-sm rounded-lg px-3 py-2 border border-gray-700 focus:outline-none focus:border-indigo-500"
>
<option value="">-- None --</option>
{profiles.map(p => (
<option key={p.id} value={p.id}>{p.data.display_name || p.data.name}</option>
))}
</select>
</div>
{/* Per-satellite assignments */}
{Object.entries(satMap.satellites || {}).map(([satId, charId]) => (
<div key={satId} className="flex items-center gap-3">
<span className="text-sm text-gray-300 w-32 shrink-0 truncate font-mono" title={satId}>{satId}</span>
<select
value={charId}
onChange={(e) => {
const updated = { ...satMap, satellites: { ...satMap.satellites, [satId]: e.target.value } };
saveSatMap(updated);
}}
className="flex-1 bg-gray-800 text-gray-200 text-sm rounded-lg px-3 py-2 border border-gray-700 focus:outline-none focus:border-indigo-500"
>
{profiles.map(p => (
<option key={p.id} value={p.id}>{p.data.display_name || p.data.name}</option>
))}
</select>
<button
onClick={() => {
const { [satId]: _, ...rest } = satMap.satellites;
saveSatMap({ ...satMap, satellites: rest });
}}
className="px-2 py-1.5 bg-gray-700 hover:bg-red-600 text-gray-400 hover:text-white rounded-lg transition-colors"
title="Remove"
>
<svg className="w-4 h-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M6 18L18 6M6 6l12 12" />
</svg>
</button>
</div>
))}
{/* Add new satellite */}
<div className="flex items-center gap-3 pt-2 border-t border-gray-800">
<input
type="text"
value={newSatId}
onChange={(e) => setNewSatId(e.target.value)}
placeholder="Satellite ID (from bridge log)"
className="w-32 shrink-0 bg-gray-800 text-gray-200 text-sm rounded-lg px-3 py-2 border border-gray-700 focus:outline-none focus:border-indigo-500 font-mono"
/>
<select
value={newSatChar}
onChange={(e) => setNewSatChar(e.target.value)}
className="flex-1 bg-gray-800 text-gray-200 text-sm rounded-lg px-3 py-2 border border-gray-700 focus:outline-none focus:border-indigo-500"
>
<option value="">-- Select Character --</option>
{profiles.map(p => (
<option key={p.id} value={p.id}>{p.data.display_name || p.data.name}</option>
))}
</select>
<button
onClick={() => {
if (newSatId && newSatChar) {
saveSatMap({ ...satMap, satellites: { ...satMap.satellites, [newSatId]: newSatChar } });
setNewSatId('');
setNewSatChar('');
}
}}
disabled={!newSatId || !newSatChar}
className="px-3 py-1.5 bg-indigo-600 hover:bg-indigo-500 disabled:bg-gray-700 disabled:text-gray-500 text-white text-sm rounded-lg transition-colors"
>
Add
</button>
</div>
</div>
)}
</div>
);
}

View File

@@ -1,115 +1,146 @@
import { useState, useEffect, useCallback } from 'react'
import { useState, useCallback } from 'react'
import ChatPanel from '../components/ChatPanel'
import InputBar from '../components/InputBar'
import StatusIndicator from '../components/StatusIndicator'
import SettingsDrawer from '../components/SettingsDrawer'
import ConversationList from '../components/ConversationList'
import { useSettings } from '../hooks/useSettings'
import { useBridgeHealth } from '../hooks/useBridgeHealth'
import { useChat } from '../hooks/useChat'
import { useTtsPlayback } from '../hooks/useTtsPlayback'
import { useVoiceInput } from '../hooks/useVoiceInput'
import { useActiveCharacter } from '../hooks/useActiveCharacter'
import { useConversations } from '../hooks/useConversations'
export default function Chat() {
const { settings, updateSetting } = useSettings()
const isOnline = useBridgeHealth()
const { messages, isLoading, send, clearHistory } = useChat()
const { isPlaying, speak, stop } = useTtsPlayback(settings.voice)
const character = useActiveCharacter()
const {
conversations, activeId, isLoading: isLoadingList,
select, create, remove, updateMeta,
} = useConversations()
const convMeta = {
characterId: character?.id || '',
characterName: character?.name || '',
}
const { messages, isLoading, isLoadingConv, send, clearHistory } = useChat(activeId, convMeta, updateMeta)
// Use character's TTS config if available, fall back to global settings
const ttsEngine = character?.tts?.engine || settings.ttsEngine
const ttsVoice = ttsEngine === 'elevenlabs'
? (character?.tts?.elevenlabs_voice_id || settings.voice)
: (character?.tts?.kokoro_voice || settings.voice)
const ttsModel = ttsEngine === 'elevenlabs' ? (character?.tts?.elevenlabs_model || null) : null
const { isPlaying, speak, stop } = useTtsPlayback(ttsVoice, ttsEngine, ttsModel)
const { isRecording, isTranscribing, startRecording, stopRecording } = useVoiceInput(settings.sttMode)
const [settingsOpen, setSettingsOpen] = useState(false)
// Send a message and optionally speak the response
const handleSend = useCallback(async (text) => {
const response = await send(text)
// Auto-create a conversation if none is active
let newId = null
if (!activeId) {
newId = await create(convMeta.characterId, convMeta.characterName)
}
const response = await send(text, newId)
if (response && settings.autoTts) {
speak(response)
}
}, [send, settings.autoTts, speak])
}, [activeId, create, convMeta, send, settings.autoTts, speak])
// Toggle voice recording
const handleVoiceToggle = useCallback(async () => {
if (isRecording) {
const text = await stopRecording()
if (text) {
handleSend(text)
}
if (text) handleSend(text)
} else {
startRecording()
}
}, [isRecording, stopRecording, startRecording, handleSend])
// Space bar push-to-talk when input not focused
useEffect(() => {
const handleKeyDown = (e) => {
if (e.code === 'Space' && e.target.tagName !== 'TEXTAREA' && e.target.tagName !== 'INPUT') {
e.preventDefault()
handleVoiceToggle()
}
}
window.addEventListener('keydown', handleKeyDown)
return () => window.removeEventListener('keydown', handleKeyDown)
}, [handleVoiceToggle])
const handleNewChat = useCallback(() => {
create(convMeta.characterId, convMeta.characterName)
}, [create, convMeta])
return (
<div className="flex-1 flex flex-col min-h-0">
{/* Status bar */}
<header className="flex items-center justify-between px-4 py-2 border-b border-gray-800/50 shrink-0">
<div className="flex items-center gap-2">
<StatusIndicator isOnline={isOnline} />
<span className="text-xs text-gray-500">
{isOnline === null ? 'Connecting...' : isOnline ? 'Connected' : 'Offline'}
</span>
</div>
<div className="flex items-center gap-2">
{messages.length > 0 && (
<button
onClick={clearHistory}
className="text-xs text-gray-500 hover:text-gray-300 transition-colors px-2 py-1"
title="Clear conversation"
>
Clear
</button>
)}
{isPlaying && (
<button
onClick={stop}
className="text-xs text-indigo-400 hover:text-indigo-300 transition-colors px-2 py-1"
title="Stop speaking"
>
Stop audio
</button>
)}
<button
onClick={() => setSettingsOpen(true)}
className="text-gray-500 hover:text-gray-300 transition-colors p-1"
title="Settings"
>
<svg className="w-5 h-5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={1.5}>
<path strokeLinecap="round" strokeLinejoin="round" d="M9.594 3.94c.09-.542.56-.94 1.11-.94h2.593c.55 0 1.02.398 1.11.94l.213 1.281c.063.374.313.686.645.87.074.04.147.083.22.127.325.196.72.257 1.075.124l1.217-.456a1.125 1.125 0 011.37.49l1.296 2.247a1.125 1.125 0 01-.26 1.431l-1.003.827c-.293.241-.438.613-.43.992a7.723 7.723 0 010 .255c-.008.378.137.75.43.991l1.004.827c.424.35.534.955.26 1.43l-1.298 2.247a1.125 1.125 0 01-1.369.491l-1.217-.456c-.355-.133-.75-.072-1.076.124a6.47 6.47 0 01-.22.128c-.331.183-.581.495-.644.869l-.213 1.281c-.09.543-.56.941-1.11.941h-2.594c-.55 0-1.019-.398-1.11-.94l-.213-1.281c-.062-.374-.312-.686-.644-.87a6.52 6.52 0 01-.22-.127c-.325-.196-.72-.257-1.076-.124l-1.217.456a1.125 1.125 0 01-1.369-.49l-1.297-2.247a1.125 1.125 0 01.26-1.431l1.004-.827c.292-.24.437-.613.43-.991a6.932 6.932 0 010-.255c.007-.38-.138-.751-.43-.992l-1.004-.827a1.125 1.125 0 01-.26-1.43l1.297-2.247a1.125 1.125 0 011.37-.491l1.216.456c.356.133.751.072 1.076-.124.072-.044.146-.086.22-.128.332-.183.582-.495.644-.869l.214-1.28z" />
<path strokeLinecap="round" strokeLinejoin="round" d="M15 12a3 3 0 11-6 0 3 3 0 016 0z" />
</svg>
</button>
</div>
</header>
<div className="flex-1 flex min-h-0">
{/* Conversation sidebar */}
<ConversationList
conversations={conversations}
activeId={activeId}
onCreate={handleNewChat}
onSelect={select}
onDelete={remove}
/>
{/* Chat area */}
<ChatPanel messages={messages} isLoading={isLoading} onReplay={speak} />
<div className="flex-1 flex flex-col min-h-0 min-w-0">
{/* Status bar */}
<header className="flex items-center justify-between px-4 py-2 border-b border-gray-800/50 shrink-0">
<div className="flex items-center gap-2">
<StatusIndicator isOnline={isOnline} />
<span className="text-xs text-gray-500">
{isOnline === null ? 'Connecting...' : isOnline ? 'Connected' : 'Offline'}
</span>
</div>
<div className="flex items-center gap-2">
{messages.length > 0 && (
<button
onClick={clearHistory}
className="text-xs text-gray-500 hover:text-gray-300 transition-colors px-2 py-1"
title="Clear conversation"
>
Clear
</button>
)}
{isPlaying && (
<button
onClick={stop}
className="text-xs text-indigo-400 hover:text-indigo-300 transition-colors px-2 py-1"
title="Stop speaking"
>
Stop audio
</button>
)}
<button
onClick={() => setSettingsOpen(true)}
className="text-gray-500 hover:text-gray-300 transition-colors p-1"
title="Settings"
>
<svg className="w-5 h-5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={1.5}>
<path strokeLinecap="round" strokeLinejoin="round" d="M9.594 3.94c.09-.542.56-.94 1.11-.94h2.593c.55 0 1.02.398 1.11.94l.213 1.281c.063.374.313.686.645.87.074.04.147.083.22.127.325.196.72.257 1.075.124l1.217-.456a1.125 1.125 0 011.37.49l1.296 2.247a1.125 1.125 0 01-.26 1.431l-1.003.827c-.293.241-.438.613-.43.992a7.723 7.723 0 010 .255c-.008.378.137.75.43.991l1.004.827c.424.35.534.955.26 1.43l-1.298 2.247a1.125 1.125 0 01-1.369.491l-1.217-.456c-.355-.133-.75-.072-1.076.124a6.47 6.47 0 01-.22.128c-.331.183-.581.495-.644.869l-.213 1.281c-.09.543-.56.941-1.11.941h-2.594c-.55 0-1.019-.398-1.11-.94l-.213-1.281c-.062-.374-.312-.686-.644-.87a6.52 6.52 0 01-.22-.127c-.325-.196-.72-.257-1.076-.124l-1.217.456a1.125 1.125 0 01-1.369-.49l-1.297-2.247a1.125 1.125 0 01.26-1.431l1.004-.827c.292-.24.437-.613.43-.991a6.932 6.932 0 010-.255c.007-.38-.138-.751-.43-.992l-1.004-.827a1.125 1.125 0 01-.26-1.43l1.297-2.247a1.125 1.125 0 011.37-.491l1.216.456c.356.133.751.072 1.076-.124.072-.044.146-.086.22-.128.332-.183.582-.495.644-.869l.214-1.28z" />
<path strokeLinecap="round" strokeLinejoin="round" d="M15 12a3 3 0 11-6 0 3 3 0 016 0z" />
</svg>
</button>
</div>
</header>
{/* Input */}
<InputBar
onSend={handleSend}
onVoiceToggle={handleVoiceToggle}
isLoading={isLoading}
isRecording={isRecording}
isTranscribing={isTranscribing}
/>
{/* Messages */}
<ChatPanel
messages={messages}
isLoading={isLoading || isLoadingConv}
onReplay={speak}
character={character}
/>
{/* Settings drawer */}
<SettingsDrawer
isOpen={settingsOpen}
onClose={() => setSettingsOpen(false)}
settings={settings}
onUpdate={updateSetting}
/>
{/* Input */}
<InputBar
onSend={handleSend}
onVoiceToggle={handleVoiceToggle}
isLoading={isLoading}
isRecording={isRecording}
isTranscribing={isTranscribing}
/>
{/* Settings drawer */}
<SettingsDrawer
isOpen={settingsOpen}
onClose={() => setSettingsOpen(false)}
settings={settings}
onUpdate={updateSetting}
/>
</div>
</div>
)
}
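The TTS wiring in the Chat page above resolves engine, voice, and model from the character config first, falling back to global settings. A sketch of that resolution as a standalone function, using the same field names as the component (treat them as assumptions about the character schema):

```javascript
// Character-level TTS settings win over global settings; the voice field
// depends on which engine is selected, and a model only applies to
// ElevenLabs. Mirrors the ttsEngine/ttsVoice/ttsModel logic in Chat.
function resolveTts(character, settings) {
  const engine = character?.tts?.engine || settings.ttsEngine;
  const voice = engine === 'elevenlabs'
    ? (character?.tts?.elevenlabs_voice_id || settings.voice)
    : (character?.tts?.kokoro_voice || settings.voice);
  const model = engine === 'elevenlabs'
    ? (character?.tts?.elevenlabs_model || null)
    : null;
  return { engine, voice, model };
}

// No active character: global settings apply.
const globalOnly = resolveTts(null, { ttsEngine: 'kokoro', voice: 'af_heart' });

// Character overrides route to ElevenLabs with its own voice and model.
const charOverride = resolveTts(
  { tts: { engine: 'elevenlabs', elevenlabs_voice_id: 'v1', elevenlabs_model: 'm1' } },
  { ttsEngine: 'kokoro', voice: 'af_heart' }
);
```

Keeping this as a pure function would also make the fallback order easy to unit-test outside the component.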


@@ -1,14 +1,18 @@
import React, { useState, useEffect, useRef } from 'react';
import { validateCharacter } from '../lib/SchemaValidator';
import { validateCharacter, migrateV1toV2 } from '../lib/SchemaValidator';
const DEFAULT_CHARACTER = {
schema_version: 1,
name: "aria",
display_name: "Aria",
description: "Default HomeAI assistant persona",
system_prompt: "You are Aria, a warm, curious, and helpful AI assistant living in the home. You speak naturally and conversationally — never robotic. You are knowledgeable but never condescending. You remember the people you live with and build on those memories over time. Keep responses concise when controlling smart home devices; be more expressive in casual conversation. Never break character.",
schema_version: 2,
name: "",
display_name: "",
description: "",
background: "",
dialogue_style: "",
appearance: "",
skills: [],
system_prompt: "",
model_overrides: {
primary: "llama3.3:70b",
primary: "qwen3.5:35b-a3b",
fast: "qwen2.5:7b"
},
tts: {
@@ -16,24 +20,8 @@ const DEFAULT_CHARACTER = {
kokoro_voice: "af_heart",
speed: 1.0
},
live2d_expressions: {
idle: "expr_idle",
listening: "expr_listening",
thinking: "expr_thinking",
speaking: "expr_speaking",
happy: "expr_happy",
sad: "expr_sad",
surprised: "expr_surprised",
error: "expr_error"
},
vtube_ws_triggers: {
thinking: { type: "hotkey", id: "expr_thinking" },
speaking: { type: "hotkey", id: "expr_speaking" },
idle: { type: "hotkey", id: "expr_idle" }
},
custom_rules: [
{ trigger: "good morning", response: "Good morning! How did you sleep?", condition: "time_of_day == morning" }
],
gaze_presets: [],
custom_rules: [],
notes: ""
};
@@ -43,7 +31,12 @@ export default function Editor() {
if (editData) {
sessionStorage.removeItem('edit_character');
try {
return JSON.parse(editData);
const parsed = JSON.parse(editData);
// Auto-migrate v1 data
if (parsed.schema_version === 1 || !parsed.schema_version) {
migrateV1toV2(parsed);
}
return parsed;
} catch {
return DEFAULT_CHARACTER;
}
@@ -52,6 +45,7 @@ export default function Editor() {
});
const [error, setError] = useState(null);
const [saved, setSaved] = useState(false);
const isEditing = !!sessionStorage.getItem('edit_character_profile_id');
// TTS preview state
const [ttsState, setTtsState] = useState('idle');
@@ -65,6 +59,19 @@ export default function Editor() {
const [elevenLabsModels, setElevenLabsModels] = useState([]);
const [isLoadingElevenLabs, setIsLoadingElevenLabs] = useState(false);
// GAZE presets state (from API)
const [availableGazePresets, setAvailableGazePresets] = useState([]);
const [isLoadingGaze, setIsLoadingGaze] = useState(false);
// Character lookup state
const [lookupName, setLookupName] = useState('');
const [lookupFranchise, setLookupFranchise] = useState('');
const [isLookingUp, setIsLookingUp] = useState(false);
const [lookupDone, setLookupDone] = useState(false);
// Skills input state
const [newSkill, setNewSkill] = useState('');
const fetchElevenLabsData = async (key) => {
if (!key) return;
setIsLoadingElevenLabs(true);
@@ -95,6 +102,16 @@ export default function Editor() {
}
}, [character.tts.engine]);
// Fetch GAZE presets on mount
useEffect(() => {
setIsLoadingGaze(true);
fetch('/api/gaze/presets')
.then(r => r.ok ? r.json() : { presets: [] })
.then(data => setAvailableGazePresets(data.presets || []))
.catch(() => {})
.finally(() => setIsLoadingGaze(false));
}, []);
useEffect(() => {
return () => {
if (audioRef.current) { audioRef.current.pause(); audioRef.current = null; }
@@ -119,27 +136,35 @@ export default function Editor() {
}
};
const handleSaveToProfiles = () => {
const handleSaveToProfiles = async () => {
try {
validateCharacter(character);
setError(null);
const profileId = sessionStorage.getItem('edit_character_profile_id');
const storageKey = 'homeai_characters';
const raw = localStorage.getItem(storageKey);
let profiles = raw ? JSON.parse(raw) : [];
let profile;
if (profileId) {
profiles = profiles.map(p =>
p.id === profileId ? { ...p, data: character } : p
);
sessionStorage.removeItem('edit_character_profile_id');
const res = await fetch('/api/characters');
const profiles = await res.json();
const existing = profiles.find(p => p.id === profileId);
profile = existing
? { ...existing, data: character }
: { id: profileId, data: character, image: null, addedAt: new Date().toISOString() };
// Keep the profile ID in sessionStorage so subsequent saves update the same file
} else {
const id = character.name + '_' + Date.now();
profiles.push({ id, data: character, image: null, addedAt: new Date().toISOString() });
profile = { id, data: character, image: null, addedAt: new Date().toISOString() };
// Store the new ID so subsequent saves update the same file
sessionStorage.setItem('edit_character_profile_id', profile.id);
}
localStorage.setItem(storageKey, JSON.stringify(profiles));
await fetch('/api/characters', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(profile),
});
setSaved(true);
setTimeout(() => setSaved(false), 2000);
} catch (err) {
@@ -164,6 +189,59 @@ export default function Editor() {
reader.readAsText(file);
};
// Character lookup from MCP
const handleCharacterLookup = async () => {
if (!lookupName || !lookupFranchise) return;
setIsLookingUp(true);
setError(null);
try {
const res = await fetch('/api/character-lookup', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ name: lookupName, franchise: lookupFranchise }),
});
if (!res.ok) {
const err = await res.json().catch(() => ({ error: 'Lookup failed' }));
throw new Error(err.error || `Lookup returned ${res.status}`);
}
const data = await res.json();
// Build dialogue_style from personality + notable quotes
let dialogueStyle = data.personality || '';
if (data.notable_quotes?.length) {
dialogueStyle += '\n\nExample dialogue:\n' + data.notable_quotes.map(q => `"${q}"`).join('\n');
}
// Keep only substantial text abilities, dropping image-caption artifacts
const skills = (data.abilities || [])
.filter(a => a.length > 20 && !a.includes('.jpg') && !a.includes('.png'))
.slice(0, 10);
// Auto-generate system prompt
const promptName = character.display_name || lookupName;
const personality = data.personality ? data.personality.split('.').slice(0, 3).join('.') + '.' : '';
const systemPrompt = `You are ${promptName} from ${lookupFranchise}. ${personality} Stay in character at all times. Respond naturally and conversationally.`;
setCharacter(prev => ({
...prev,
name: prev.name || lookupName.toLowerCase().replace(/\s+/g, '_'),
display_name: prev.display_name || lookupName,
description: data.description ? data.description.split('.').slice(0, 2).join('.') + '.' : prev.description,
background: data.background || prev.background,
appearance: data.appearance || prev.appearance,
dialogue_style: dialogueStyle || prev.dialogue_style,
skills: skills.length > 0 ? skills : prev.skills,
system_prompt: prev.system_prompt || systemPrompt,
}));
setLookupDone(true);
} catch (err) {
setError(`Character lookup failed: ${err.message}`);
} finally {
setIsLookingUp(false);
}
};
const handleChange = (field, value) => {
setCharacter(prev => ({ ...prev, [field]: value }));
};
@@ -175,6 +253,50 @@ export default function Editor() {
}));
};
// Skills helpers
const addSkill = () => {
const trimmed = newSkill.trim();
if (!trimmed) return;
setCharacter(prev => ({
...prev,
skills: [...(prev.skills || []), trimmed]
}));
setNewSkill('');
};
const removeSkill = (index) => {
setCharacter(prev => {
const updated = [...(prev.skills || [])];
updated.splice(index, 1);
return { ...prev, skills: updated };
});
};
// GAZE preset helpers
const addGazePreset = () => {
setCharacter(prev => ({
...prev,
gaze_presets: [...(prev.gaze_presets || []), { preset: '', trigger: 'self-portrait' }]
}));
};
const removeGazePreset = (index) => {
setCharacter(prev => {
const updated = [...(prev.gaze_presets || [])];
updated.splice(index, 1);
return { ...prev, gaze_presets: updated };
});
};
const handleGazePresetChange = (index, field, value) => {
setCharacter(prev => {
const updated = [...(prev.gaze_presets || [])];
updated[index] = { ...updated[index], [field]: value };
return { ...prev, gaze_presets: updated };
});
};
// Custom rules helpers
const handleRuleChange = (index, field, value) => {
setCharacter(prev => {
const newRules = [...(prev.custom_rules || [])];
@@ -198,37 +320,40 @@ export default function Editor() {
});
};
// TTS preview
const stopPreview = () => {
if (audioRef.current) {
audioRef.current.pause();
audioRef.current = null;
}
if (objectUrlRef.current) {
URL.revokeObjectURL(objectUrlRef.current);
objectUrlRef.current = null;
}
if (audioRef.current) { audioRef.current.pause(); audioRef.current = null; }
if (objectUrlRef.current) { URL.revokeObjectURL(objectUrlRef.current); objectUrlRef.current = null; }
window.speechSynthesis.cancel();
setTtsState('idle');
};
const previewTTS = async () => {
stopPreview();
const text = previewText || `Hi, I am ${character.display_name}. This is a preview of my voice.`;
const text = previewText || `Hi, I am ${character.display_name || character.name}. This is a preview of my voice.`;
const engine = character.tts.engine;
if (character.tts.engine === 'kokoro') {
let bridgeBody = null;
if (engine === 'kokoro') {
bridgeBody = { text, voice: character.tts.kokoro_voice, engine: 'kokoro' };
} else if (engine === 'elevenlabs' && character.tts.elevenlabs_voice_id) {
bridgeBody = { text, voice: character.tts.elevenlabs_voice_id, engine: 'elevenlabs', model: character.tts.elevenlabs_model };
}
if (bridgeBody) {
setTtsState('loading');
let blob;
try {
const response = await fetch('/api/tts', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ text, voice: character.tts.kokoro_voice })
body: JSON.stringify(bridgeBody)
});
if (!response.ok) throw new Error('TTS bridge returned ' + response.status);
blob = await response.blob();
} catch (err) {
setTtsState('idle');
setError(`Kokoro preview failed: ${err.message}. Falling back to browser TTS.`);
setError(`${engine} preview failed: ${err.message}. Falling back to browser TTS.`);
runBrowserTTS(text);
return;
}
@@ -269,7 +394,9 @@ export default function Editor() {
<div>
<h1 className="text-3xl font-bold text-gray-100">Character Editor</h1>
<p className="text-sm text-gray-500 mt-1">
Editing: {character.display_name || character.name}
{character.display_name || character.name
? `Editing: ${character.display_name || character.name}`
: 'New character'}
</p>
</div>
<div className="flex gap-3">
@@ -311,6 +438,64 @@ export default function Editor() {
{error && (
<div className="bg-red-900/30 border border-red-500/50 text-red-300 px-4 py-3 rounded-lg text-sm">
{error}
<button onClick={() => setError(null)} className="ml-2 text-red-400 hover:text-red-300">&times;</button>
</div>
)}
{/* Character Lookup — auto-fill from fictional character wiki */}
{!isEditing && (
<div className={cardClass}>
<div className="flex items-center gap-2">
<svg className="w-5 h-5 text-indigo-400" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M21 21l-5.197-5.197m0 0A7.5 7.5 0 105.196 5.196a7.5 7.5 0 0010.607 10.607z" />
</svg>
<h2 className="text-lg font-semibold text-gray-200">Auto-fill from Character</h2>
</div>
<p className="text-xs text-gray-500">Fetch character data from Fandom/Wikipedia to auto-populate the fields below. You can edit everything afterwards.</p>
<div className="flex gap-3 items-end">
<div className="flex-1">
<label className={labelClass}>Character Name</label>
<input
type="text"
className={inputClass}
value={lookupName}
onChange={(e) => setLookupName(e.target.value)}
placeholder="e.g. Tifa Lockhart"
/>
</div>
<div className="flex-1">
<label className={labelClass}>Franchise / Series</label>
<input
type="text"
className={inputClass}
value={lookupFranchise}
onChange={(e) => setLookupFranchise(e.target.value)}
placeholder="e.g. Final Fantasy VII"
/>
</div>
<button
onClick={handleCharacterLookup}
disabled={isLookingUp || !lookupName || !lookupFranchise}
className={`flex items-center gap-2 px-5 py-2 rounded-lg text-white transition-colors whitespace-nowrap ${
isLookingUp
? 'bg-indigo-800 cursor-wait'
: lookupDone
? 'bg-emerald-600 hover:bg-emerald-500'
: 'bg-indigo-600 hover:bg-indigo-500 disabled:bg-gray-700 disabled:text-gray-500'
}`}
>
{isLookingUp && (
<svg className="w-4 h-4 animate-spin" viewBox="0 0 24 24" fill="none">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z" />
</svg>
)}
{isLookingUp ? 'Fetching...' : lookupDone ? 'Fetched' : 'Lookup'}
</button>
</div>
{lookupDone && (
<p className="text-xs text-emerald-400">Fields populated from wiki data. Review and edit below.</p>
)}
</div>
)}
@@ -324,11 +509,11 @@ export default function Editor() {
</div>
<div>
<label className={labelClass}>Display Name</label>
<input type="text" className={inputClass} value={character.display_name} onChange={(e) => handleChange('display_name', e.target.value)} />
<input type="text" className={inputClass} value={character.display_name || ''} onChange={(e) => handleChange('display_name', e.target.value)} />
</div>
<div>
<label className={labelClass}>Description</label>
<input type="text" className={inputClass} value={character.description} onChange={(e) => handleChange('description', e.target.value)} />
<input type="text" className={inputClass} value={character.description || ''} onChange={(e) => handleChange('description', e.target.value)} />
</div>
</div>
@@ -359,7 +544,14 @@ export default function Editor() {
<div>
<label className={labelClass}>Voice ID</label>
{elevenLabsVoices.length > 0 ? (
<select className={selectClass} value={character.tts.elevenlabs_voice_id || ''} onChange={(e) => handleNestedChange('tts', 'elevenlabs_voice_id', e.target.value)}>
<select className={selectClass} value={character.tts.elevenlabs_voice_id || ''} onChange={(e) => {
const voiceId = e.target.value;
const voice = elevenLabsVoices.find(v => v.voice_id === voiceId);
setCharacter(prev => ({
...prev,
tts: { ...prev.tts, elevenlabs_voice_id: voiceId, elevenlabs_voice_name: voice?.name || '' }
}));
}}>
<option value="">-- Select Voice --</option>
{elevenLabsVoices.map(v => (
<option key={v.voice_id} value={v.voice_id}>{v.name} ({v.category})</option>
@@ -439,7 +631,7 @@ export default function Editor() {
className={inputClass}
value={previewText}
onChange={(e) => setPreviewText(e.target.value)}
placeholder={`Hi, I am ${character.display_name}. This is a preview of my voice.`}
placeholder={`Hi, I am ${character.display_name || character.name || 'your character'}. This is a preview of my voice.`}
/>
</div>
<div className="flex gap-2">
@@ -474,7 +666,9 @@ export default function Editor() {
<p className="text-xs text-gray-600">
{character.tts.engine === 'kokoro'
? 'Previews via local Kokoro TTS bridge (port 8081).'
: 'Uses browser TTS for preview. Local TTS available with Kokoro engine.'}
: character.tts.engine === 'elevenlabs'
? 'Previews via the ElevenLabs API through the local TTS bridge.'
: 'Uses browser TTS for preview. Local TTS available with Kokoro engine.'}
</p>
</div>
</div>
@@ -483,25 +677,154 @@ export default function Editor() {
<div className={cardClass}>
<div className="flex justify-between items-center">
<h2 className="text-lg font-semibold text-gray-200">System Prompt</h2>
<span className="text-xs text-gray-600">{character.system_prompt.length} chars</span>
<span className="text-xs text-gray-600">{(character.system_prompt || '').length} chars</span>
</div>
<textarea
className={inputClass + " h-32 resize-y"}
value={character.system_prompt}
onChange={(e) => handleChange('system_prompt', e.target.value)}
placeholder="You are [character name]. Describe their personality, behaviour, and role..."
/>
</div>
{/* Character Profile — new v2 fields */}
<div className={cardClass}>
<h2 className="text-lg font-semibold text-gray-200">Character Profile</h2>
<div>
<label className={labelClass}>Background / Backstory</label>
<textarea
className={inputClass + " h-28 resize-y text-sm"}
value={character.background || ''}
onChange={(e) => handleChange('background', e.target.value)}
placeholder="Character history, origins, key life events..."
/>
</div>
<div>
<label className={labelClass}>Appearance</label>
<textarea
className={inputClass + " h-24 resize-y text-sm"}
value={character.appearance || ''}
onChange={(e) => handleChange('appearance', e.target.value)}
placeholder="Physical description — also used for image generation prompts..."
/>
</div>
<div>
<label className={labelClass}>Dialogue Style & Examples</label>
<textarea
className={inputClass + " h-24 resize-y text-sm"}
value={character.dialogue_style || ''}
onChange={(e) => handleChange('dialogue_style', e.target.value)}
placeholder="How the persona speaks, their tone, mannerisms, and example lines..."
/>
</div>
<div>
<label className={labelClass}>Skills & Interests</label>
<div className="flex flex-wrap gap-2 mb-2">
{(character.skills || []).map((skill, idx) => (
<span
key={idx}
className="inline-flex items-center gap-1 px-3 py-1 bg-indigo-500/20 text-indigo-300 text-sm rounded-full border border-indigo-500/30"
>
{skill.length > 80 ? skill.slice(0, 80) + '...' : skill}
<button
onClick={() => removeSkill(idx)}
className="ml-1 text-indigo-400 hover:text-red-400 transition-colors"
>
<svg className="w-3 h-3" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={3}>
<path strokeLinecap="round" strokeLinejoin="round" d="M6 18L18 6M6 6l12 12" />
</svg>
</button>
</span>
))}
</div>
<div className="flex gap-2">
<input
type="text"
className={inputClass + " text-sm"}
value={newSkill}
onChange={(e) => setNewSkill(e.target.value)}
onKeyDown={(e) => { if (e.key === 'Enter') { e.preventDefault(); addSkill(); } }}
placeholder="Add a skill or interest..."
/>
<button
onClick={addSkill}
disabled={!newSkill.trim()}
className="px-3 py-2 bg-indigo-600 hover:bg-indigo-500 disabled:bg-gray-700 disabled:text-gray-500 text-white text-sm rounded-lg transition-colors whitespace-nowrap"
>
Add
</button>
</div>
</div>
</div>
<div className="grid grid-cols-1 md:grid-cols-2 gap-6">
{/* Live2D Expressions */}
{/* Image Generation — GAZE presets */}
<div className={cardClass}>
<h2 className="text-lg font-semibold text-gray-200">Live2D Expressions</h2>
{Object.entries(character.live2d_expressions).map(([key, val]) => (
<div key={key} className="flex justify-between items-center gap-4">
<label className="text-sm font-medium text-gray-400 w-1/3 capitalize">{key}</label>
<input type="text" className={inputClass + " w-2/3"} value={val} onChange={(e) => handleNestedChange('live2d_expressions', key, e.target.value)} />
<div className="flex justify-between items-center">
<h2 className="text-lg font-semibold text-gray-200">GAZE Presets</h2>
<button onClick={addGazePreset} className="flex items-center gap-1 bg-indigo-600 hover:bg-indigo-500 text-white px-3 py-1.5 rounded-lg text-sm transition-colors">
<svg className="w-4 h-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M12 4.5v15m7.5-7.5h-15" />
</svg>
Add Preset
</button>
</div>
<p className="text-xs text-gray-500">Image generation presets with trigger conditions. Default trigger is "self-portrait".</p>
{(!character.gaze_presets || character.gaze_presets.length === 0) ? (
<p className="text-sm text-gray-600 italic">No GAZE presets configured.</p>
) : (
<div className="space-y-3">
{character.gaze_presets.map((gp, idx) => (
<div key={idx} className="flex items-center gap-2 border border-gray-700 p-3 rounded-lg bg-gray-800/50">
<div className="flex-1">
<label className="block text-xs text-gray-500 mb-1">Preset</label>
{isLoadingGaze ? (
<p className="text-sm text-gray-500">Loading...</p>
) : availableGazePresets.length > 0 ? (
<select
className={selectClass + " text-sm"}
value={gp.preset || ''}
onChange={(e) => handleGazePresetChange(idx, 'preset', e.target.value)}
>
<option value="">-- Select --</option>
{availableGazePresets.map(p => (
<option key={p.slug} value={p.slug}>{p.name} ({p.slug})</option>
))}
</select>
) : (
<input
type="text"
className={inputClass + " text-sm"}
value={gp.preset || ''}
onChange={(e) => handleGazePresetChange(idx, 'preset', e.target.value)}
placeholder="Preset slug"
/>
)}
</div>
<div className="flex-1">
<label className="block text-xs text-gray-500 mb-1">Trigger</label>
<input
type="text"
className={inputClass + " text-sm"}
value={gp.trigger || ''}
onChange={(e) => handleGazePresetChange(idx, 'trigger', e.target.value)}
placeholder="e.g. self-portrait, battle scene"
/>
</div>
<button
onClick={() => removeGazePreset(idx)}
className="mt-5 px-2 py-1.5 text-gray-500 hover:text-red-400 transition-colors"
title="Remove"
>
<svg className="w-4 h-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M6 18L18 6M6 6l12 12" />
</svg>
</button>
</div>
))}
</div>
))}
)}
</div>
{/* Model Overrides */}
@@ -509,7 +832,7 @@ export default function Editor() {
<h2 className="text-lg font-semibold text-gray-200">Model Overrides</h2>
<div>
<label className={labelClass}>Primary Model</label>
<select className={selectClass} value={character.model_overrides?.primary || 'llama3.3:70b'} onChange={(e) => handleNestedChange('model_overrides', 'primary', e.target.value)}>
<select className={selectClass} value={character.model_overrides?.primary || 'qwen3.5:35b-a3b'} onChange={(e) => handleNestedChange('model_overrides', 'primary', e.target.value)}>
<option value="llama3.3:70b">llama3.3:70b</option>
<option value="qwen3.5:35b-a3b">qwen3.5:35b-a3b</option>
<option value="qwen2.5:7b">qwen2.5:7b</option>
@@ -576,6 +899,17 @@ export default function Editor() {
</div>
)}
</div>
{/* Notes */}
<div className={cardClass}>
<h2 className="text-lg font-semibold text-gray-200">Notes</h2>
<textarea
className={inputClass + " h-20 resize-y text-sm"}
value={character.notes || ''}
onChange={(e) => handleChange('notes', e.target.value)}
placeholder="Internal notes, reminders, or references..."
/>
</div>
</div>
);
}
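The Editor above calls `migrateV1toV2(parsed)` in place when it loads v1 data. The real implementation lives in `../lib/SchemaValidator` and is not shown here; the following is only a hypothetical sketch of the shape change implied by the new `DEFAULT_CHARACTER` (v2 adds `background`, `dialogue_style`, `appearance`, `skills`, and `gaze_presets`, and bumps `schema_version`):

```javascript
// Hypothetical v1→v2 migration sketch. Mutates in place to match the call
// site in the Editor (migrateV1toV2(parsed)). Only backfills the new v2
// fields with empty defaults; the actual migration may do more.
function migrateV1toV2(character) {
  character.schema_version = 2;
  character.background = character.background || '';
  character.dialogue_style = character.dialogue_style || '';
  character.appearance = character.appearance || '';
  character.skills = character.skills || [];
  character.gaze_presets = character.gaze_presets || [];
  return character;
}

const legacy = { schema_version: 1, name: 'aria', display_name: 'Aria' };
migrateV1toV2(legacy);
```

Backfilling with empty defaults keeps old profiles loadable without touching fields the user already filled in.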


@@ -0,0 +1,346 @@
import { useState, useEffect, useCallback } from 'react';
import {
getPersonalMemories, savePersonalMemory, deletePersonalMemory,
getGeneralMemories, saveGeneralMemory, deleteGeneralMemory,
} from '../lib/memoryApi';
const PERSONAL_CATEGORIES = [
{ value: 'personal_info', label: 'Personal Info', color: 'bg-blue-500/20 text-blue-300 border-blue-500/30' },
{ value: 'preference', label: 'Preference', color: 'bg-amber-500/20 text-amber-300 border-amber-500/30' },
{ value: 'interaction', label: 'Interaction', color: 'bg-emerald-500/20 text-emerald-300 border-emerald-500/30' },
{ value: 'emotional', label: 'Emotional', color: 'bg-pink-500/20 text-pink-300 border-pink-500/30' },
{ value: 'other', label: 'Other', color: 'bg-gray-500/20 text-gray-300 border-gray-500/30' },
];
const GENERAL_CATEGORIES = [
{ value: 'system', label: 'System', color: 'bg-indigo-500/20 text-indigo-300 border-indigo-500/30' },
{ value: 'tool_usage', label: 'Tool Usage', color: 'bg-cyan-500/20 text-cyan-300 border-cyan-500/30' },
{ value: 'home_layout', label: 'Home Layout', color: 'bg-emerald-500/20 text-emerald-300 border-emerald-500/30' },
{ value: 'device', label: 'Device', color: 'bg-amber-500/20 text-amber-300 border-amber-500/30' },
{ value: 'routine', label: 'Routine', color: 'bg-purple-500/20 text-purple-300 border-purple-500/30' },
{ value: 'other', label: 'Other', color: 'bg-gray-500/20 text-gray-300 border-gray-500/30' },
];
const ACTIVE_KEY = 'homeai_active_character';
function CategoryBadge({ category, categories }) {
const cat = categories.find(c => c.value === category) || categories[categories.length - 1];
return (
<span className={`px-2 py-0.5 text-xs rounded-full border ${cat.color}`}>
{cat.label}
</span>
);
}
function MemoryCard({ memory, categories, onEdit, onDelete }) {
return (
<div className="border border-gray-700 rounded-lg p-4 bg-gray-800/50 space-y-2">
<div className="flex items-start justify-between gap-3">
<p className="text-sm text-gray-200 flex-1 whitespace-pre-wrap">{memory.content}</p>
<div className="flex gap-1 shrink-0">
<button
onClick={() => onEdit(memory)}
className="p-1.5 text-gray-500 hover:text-gray-300 transition-colors"
title="Edit"
>
<svg className="w-4 h-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M16.862 4.487l1.687-1.688a1.875 1.875 0 112.652 2.652L10.582 16.07a4.5 4.5 0 01-1.897 1.13L6 18l.8-2.685a4.5 4.5 0 011.13-1.897l8.932-8.931z" />
</svg>
</button>
<button
onClick={() => onDelete(memory.id)}
className="p-1.5 text-gray-500 hover:text-red-400 transition-colors"
title="Delete"
>
<svg className="w-4 h-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M14.74 9l-.346 9m-4.788 0L9.26 9m9.968-3.21c.342.052.682.107 1.022.166m-1.022-.165L18.16 19.673a2.25 2.25 0 01-2.244 2.077H8.084a2.25 2.25 0 01-2.244-2.077L4.772 5.79m14.456 0a48.108 48.108 0 00-3.478-.397m-12 .562c.34-.059.68-.114 1.022-.165m0 0a48.11 48.11 0 013.478-.397m7.5 0v-.916c0-1.18-.91-2.164-2.09-2.201a51.964 51.964 0 00-3.32 0c-1.18.037-2.09 1.022-2.09 2.201v.916m7.5 0a48.667 48.667 0 00-7.5 0" />
</svg>
</button>
</div>
</div>
<div className="flex items-center gap-2">
<CategoryBadge category={memory.category} categories={categories} />
<span className="text-xs text-gray-600">
{memory.createdAt ? new Date(memory.createdAt).toLocaleDateString() : ''}
</span>
</div>
</div>
);
}
function MemoryForm({ categories, editing, onSave, onCancel }) {
const [content, setContent] = useState(editing?.content || '');
const [category, setCategory] = useState(editing?.category || categories[0].value);
const handleSubmit = () => {
if (!content.trim()) return;
const memory = {
...(editing?.id ? { id: editing.id } : {}),
content: content.trim(),
category,
};
onSave(memory);
setContent('');
setCategory(categories[0].value);
};
return (
<div className="border border-indigo-500/30 rounded-lg p-4 bg-indigo-500/5 space-y-3">
<textarea
className="w-full bg-gray-800 border border-gray-700 text-gray-200 p-2 rounded-lg text-sm h-20 resize-y focus:border-indigo-500 focus:ring-1 focus:ring-indigo-500 outline-none"
value={content}
onChange={(e) => setContent(e.target.value)}
placeholder="Enter memory content..."
autoFocus
/>
<div className="flex items-center gap-3">
<select
className="bg-gray-800 border border-gray-700 text-gray-200 text-sm p-2 rounded-lg focus:border-indigo-500 outline-none"
value={category}
onChange={(e) => setCategory(e.target.value)}
>
{categories.map(c => (
<option key={c.value} value={c.value}>{c.label}</option>
))}
</select>
<div className="flex gap-2 ml-auto">
<button
onClick={onCancel}
className="px-3 py-1.5 bg-gray-700 hover:bg-gray-600 text-gray-300 text-sm rounded-lg transition-colors"
>
Cancel
</button>
<button
onClick={handleSubmit}
disabled={!content.trim()}
className="px-3 py-1.5 bg-indigo-600 hover:bg-indigo-500 disabled:bg-gray-700 disabled:text-gray-500 text-white text-sm rounded-lg transition-colors"
>
{editing?.id ? 'Update' : 'Add Memory'}
</button>
</div>
</div>
</div>
);
}
export default function Memories() {
const [tab, setTab] = useState('personal'); // 'personal' | 'general'
const [characters, setCharacters] = useState([]);
const [selectedCharId, setSelectedCharId] = useState('');
const [memories, setMemories] = useState([]);
const [loading, setLoading] = useState(false);
const [showForm, setShowForm] = useState(false);
const [editing, setEditing] = useState(null);
const [error, setError] = useState(null);
const [filter, setFilter] = useState('');
// Load characters list
useEffect(() => {
fetch('/api/characters')
.then(r => r.json())
.then(chars => {
setCharacters(chars);
const activeId = localStorage.getItem(ACTIVE_KEY);
if (activeId && chars.some(c => c.id === activeId)) {
setSelectedCharId(activeId);
} else if (chars.length > 0) {
setSelectedCharId(chars[0].id);
}
})
.catch(() => {});
}, []);
// Load memories when tab or selected character changes
const loadMemories = useCallback(async () => {
setLoading(true);
setError(null);
try {
if (tab === 'personal' && selectedCharId) {
const data = await getPersonalMemories(selectedCharId);
setMemories(data.memories || []);
} else if (tab === 'general') {
const data = await getGeneralMemories();
setMemories(data.memories || []);
} else {
setMemories([]);
}
} catch (err) {
setError(err.message);
} finally {
setLoading(false);
}
}, [tab, selectedCharId]);
useEffect(() => { loadMemories(); }, [loadMemories]);
const handleSave = async (memory) => {
try {
if (tab === 'personal') {
await savePersonalMemory(selectedCharId, memory);
} else {
await saveGeneralMemory(memory);
}
setShowForm(false);
setEditing(null);
await loadMemories();
} catch (err) {
setError(err.message);
}
};
const handleDelete = async (memoryId) => {
try {
if (tab === 'personal') {
await deletePersonalMemory(selectedCharId, memoryId);
} else {
await deleteGeneralMemory(memoryId);
}
await loadMemories();
} catch (err) {
setError(err.message);
}
};
const handleEdit = (memory) => {
setEditing(memory);
setShowForm(true);
};
const categories = tab === 'personal' ? PERSONAL_CATEGORIES : GENERAL_CATEGORIES;
const filteredMemories = filter
? memories.filter(m => m.content?.toLowerCase().includes(filter.toLowerCase()) || m.category === filter)
: memories;
// Sort newest first
const sortedMemories = [...filteredMemories].sort(
(a, b) => (b.createdAt || '').localeCompare(a.createdAt || '')
);
const selectedChar = characters.find(c => c.id === selectedCharId);
return (
<div className="space-y-6">
{/* Header */}
<div className="flex items-center justify-between">
<div>
<h1 className="text-3xl font-bold text-gray-100">Memories</h1>
<p className="text-sm text-gray-500 mt-1">
{sortedMemories.length} {tab} memor{sortedMemories.length !== 1 ? 'ies' : 'y'}
{tab === 'personal' && selectedChar && (
<span className="ml-1 text-indigo-400">
for {selectedChar.data?.display_name || selectedChar.data?.name || selectedCharId}
</span>
)}
</p>
</div>
<button
onClick={() => { setEditing(null); setShowForm(!showForm); }}
className="flex items-center gap-2 px-4 py-2 bg-indigo-600 hover:bg-indigo-500 text-white rounded-lg transition-colors"
>
<svg className="w-4 h-4" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
<path strokeLinecap="round" strokeLinejoin="round" d="M12 4.5v15m7.5-7.5h-15" />
</svg>
Add Memory
</button>
</div>
{error && (
<div className="bg-red-900/30 border border-red-500/50 text-red-300 px-4 py-3 rounded-lg text-sm">
{error}
<button onClick={() => setError(null)} className="ml-2 text-red-400 hover:text-red-300">&times;</button>
</div>
)}
{/* Tabs */}
<div className="flex gap-1 bg-gray-900 p-1 rounded-lg border border-gray-800 w-fit">
<button
onClick={() => { setTab('personal'); setShowForm(false); setEditing(null); }}
className={`px-4 py-2 text-sm font-medium rounded-md transition-colors ${
tab === 'personal'
? 'bg-gray-800 text-white'
: 'text-gray-400 hover:text-gray-200'
}`}
>
Personal
</button>
<button
onClick={() => { setTab('general'); setShowForm(false); setEditing(null); }}
className={`px-4 py-2 text-sm font-medium rounded-md transition-colors ${
tab === 'general'
? 'bg-gray-800 text-white'
: 'text-gray-400 hover:text-gray-200'
}`}
>
General
</button>
</div>
{/* Character selector (personal tab only) */}
{tab === 'personal' && (
<div className="flex items-center gap-3">
<label className="text-sm text-gray-400">Character</label>
<select
value={selectedCharId}
onChange={(e) => setSelectedCharId(e.target.value)}
className="bg-gray-800 border border-gray-700 text-gray-200 text-sm p-2 rounded-lg focus:border-indigo-500 outline-none"
>
{characters.map(c => (
<option key={c.id} value={c.id}>
{c.data?.display_name || c.data?.name || c.id}
</option>
))}
</select>
</div>
)}
{/* Search filter */}
<div>
<input
type="text"
className="w-full bg-gray-800 border border-gray-700 text-gray-200 p-2 rounded-lg text-sm focus:border-indigo-500 focus:ring-1 focus:ring-indigo-500 outline-none"
value={filter}
onChange={(e) => setFilter(e.target.value)}
placeholder="Search memories..."
/>
</div>
{/* Add/Edit form */}
{showForm && (
<MemoryForm
categories={categories}
editing={editing}
onSave={handleSave}
onCancel={() => { setShowForm(false); setEditing(null); }}
/>
)}
{/* Memory list */}
{loading ? (
<div className="text-center py-12">
<p className="text-gray-500">Loading memories...</p>
</div>
) : sortedMemories.length === 0 ? (
<div className="text-center py-12">
<svg className="w-12 h-12 mx-auto text-gray-700 mb-3" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={1}>
<path strokeLinecap="round" strokeLinejoin="round" d="M12 18v-5.25m0 0a6.01 6.01 0 001.5-.189m-1.5.189a6.01 6.01 0 01-1.5-.189m3.75 7.478a12.06 12.06 0 01-4.5 0m3.75 2.383a14.406 14.406 0 01-3 0M14.25 18v-.192c0-.983.658-1.823 1.508-2.316a7.5 7.5 0 10-7.517 0c.85.493 1.509 1.333 1.509 2.316V18" />
</svg>
<p className="text-gray-500 text-sm">
{filter ? 'No memories match your search.' : 'No memories yet. Add one to get started.'}
</p>
</div>
) : (
<div className="space-y-3">
{sortedMemories.map(memory => (
<MemoryCard
key={memory.id}
memory={memory}
categories={categories}
onEdit={handleEdit}
onDelete={handleDelete}
/>
))}
</div>
)}
</div>
);
}
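The memory CRUD helpers imported at the top of this file are not shown in this view; a plausible minimal sketch of `../lib/memoryApi` against the `/api/memories` dev-server endpoints (only the function names come from the import — URL shapes and everything else are assumptions, and the real module would `export` these):

```javascript
// Hypothetical sketch of ../lib/memoryApi — not the repo's actual file.
const BASE = '/api/memories';

// Pure URL builders, kept separate so the routing is easy to verify.
const personalUrl = (characterId, memoryId) =>
  `${BASE}/personal/${encodeURIComponent(characterId)}` +
  (memoryId ? `/${encodeURIComponent(memoryId)}` : '');
const generalUrl = (memoryId) =>
  `${BASE}/general` + (memoryId ? `/${encodeURIComponent(memoryId)}` : '');

// Shared response handler: reject on non-2xx, parse JSON otherwise.
const asJson = async (res) => {
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
};

const getPersonalMemories = (characterId) =>
  fetch(personalUrl(characterId)).then(asJson);
const savePersonalMemory = (characterId, memory) =>
  fetch(personalUrl(characterId), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(memory),
  }).then(asJson);
const deletePersonalMemory = (characterId, memoryId) =>
  fetch(personalUrl(characterId, memoryId), { method: 'DELETE' }).then(asJson);

const getGeneralMemories = () => fetch(generalUrl()).then(asJson);
const saveGeneralMemory = (memory) =>
  fetch(generalUrl(), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(memory),
  }).then(asJson);
const deleteGeneralMemory = (memoryId) =>
  fetch(generalUrl(memoryId), { method: 'DELETE' }).then(asJson);
```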



@@ -2,6 +2,268 @@ import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
import tailwindcss from '@tailwindcss/vite'
const CHARACTERS_DIR = '/Users/aodhan/homeai-data/characters'
const SATELLITE_MAP_PATH = '/Users/aodhan/homeai-data/satellite-map.json'
const CONVERSATIONS_DIR = '/Users/aodhan/homeai-data/conversations'
const MEMORIES_DIR = '/Users/aodhan/homeai-data/memories'
const MODE_PATH = '/Users/aodhan/homeai-data/active-mode.json'
const GAZE_HOST = 'http://10.0.0.101:5782'
const GAZE_API_KEY = process.env.GAZE_API_KEY || ''
function characterStoragePlugin() {
return {
name: 'character-storage',
configureServer(server) {
const ensureDir = async () => {
const { mkdir } = await import('fs/promises')
await mkdir(CHARACTERS_DIR, { recursive: true })
}
// GET /api/characters — list all profiles
server.middlewares.use('/api/characters', async (req, res, next) => {
if (req.method === 'OPTIONS') {
res.writeHead(204, { 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'GET,POST,DELETE', 'Access-Control-Allow-Headers': 'Content-Type' })
res.end()
return
}
const { readdir, readFile, writeFile, unlink } = await import('fs/promises')
await ensureDir()
// req.url has the mount prefix stripped by connect, so "/" means /api/characters
const url = new URL(req.url, 'http://localhost')
const subPath = url.pathname.replace(/^\/+/, '')
// GET /api/characters/:id — single profile
if (req.method === 'GET' && subPath) {
try {
const safeId = subPath.replace(/[^a-zA-Z0-9_\-\.]/g, '_')
const raw = await readFile(`${CHARACTERS_DIR}/${safeId}.json`, 'utf-8')
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(raw)
} catch {
res.writeHead(404, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ error: 'Not found' }))
}
return
}
if (req.method === 'GET' && !subPath) {
try {
const files = (await readdir(CHARACTERS_DIR)).filter(f => f.endsWith('.json'))
const profiles = []
for (const file of files) {
try {
const raw = await readFile(`${CHARACTERS_DIR}/${file}`, 'utf-8')
profiles.push(JSON.parse(raw))
} catch { /* skip corrupt files */ }
}
// Sort by addedAt descending
profiles.sort((a, b) => (b.addedAt || '').localeCompare(a.addedAt || ''))
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify(profiles))
} catch (err) {
res.writeHead(500, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: err.message }))
}
return
}
if (req.method === 'POST' && !subPath) {
try {
const chunks = []
for await (const chunk of req) chunks.push(chunk)
const profile = JSON.parse(Buffer.concat(chunks).toString())
if (!profile.id) {
res.writeHead(400, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: 'Missing profile id' }))
return
}
// Sanitize filename — only allow alphanumeric, underscore, dash, dot
const safeId = profile.id.replace(/[^a-zA-Z0-9_\-\.]/g, '_')
await writeFile(`${CHARACTERS_DIR}/${safeId}.json`, JSON.stringify(profile, null, 2))
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ ok: true }))
} catch (err) {
res.writeHead(500, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: err.message }))
}
return
}
if (req.method === 'DELETE' && subPath) {
try {
const safeId = subPath.replace(/[^a-zA-Z0-9_\-\.]/g, '_')
await unlink(`${CHARACTERS_DIR}/${safeId}.json`).catch(() => {})
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ ok: true }))
} catch (err) {
res.writeHead(500, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: err.message }))
}
return
}
next()
})
},
}
}
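The same filename-sanitization regex recurs in each storage plugin; factored into a standalone helper (a sketch — the plugins inline the regex, and `toSafeId` is a hypothetical name) it behaves like:

```javascript
// Replace anything outside [A-Za-z0-9_-.] with '_' so an id cannot inject
// path separators and escape the storage directory.
function toSafeId(id) {
  return id.replace(/[^a-zA-Z0-9_\-\.]/g, '_');
}
```

Dots survive sanitization, but because `/` is always replaced, a traversal attempt like `../etc/passwd` collapses to a single flat filename.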
function satelliteMapPlugin() {
return {
name: 'satellite-map',
configureServer(server) {
server.middlewares.use('/api/satellite-map', async (req, res, next) => {
if (req.method === 'OPTIONS') {
res.writeHead(204, { 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'GET,POST', 'Access-Control-Allow-Headers': 'Content-Type' })
res.end()
return
}
const { readFile, writeFile } = await import('fs/promises')
if (req.method === 'GET') {
try {
const raw = await readFile(SATELLITE_MAP_PATH, 'utf-8')
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(raw)
} catch {
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ default: 'aria_default', satellites: {} }))
}
return
}
if (req.method === 'POST') {
try {
const chunks = []
for await (const chunk of req) chunks.push(chunk)
const data = JSON.parse(Buffer.concat(chunks).toString())
await writeFile(SATELLITE_MAP_PATH, JSON.stringify(data, null, 2))
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ ok: true }))
} catch (err) {
res.writeHead(500, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: err.message }))
}
return
}
next()
})
},
}
}
function conversationStoragePlugin() {
return {
name: 'conversation-storage',
configureServer(server) {
const ensureDir = async () => {
const { mkdir } = await import('fs/promises')
await mkdir(CONVERSATIONS_DIR, { recursive: true })
}
server.middlewares.use('/api/conversations', async (req, res, next) => {
if (req.method === 'OPTIONS') {
res.writeHead(204, { 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'GET,POST,DELETE', 'Access-Control-Allow-Headers': 'Content-Type' })
res.end()
return
}
const { readdir, readFile, writeFile, unlink } = await import('fs/promises')
await ensureDir()
const url = new URL(req.url, 'http://localhost')
const subPath = url.pathname.replace(/^\/+/, '')
// GET /api/conversations/:id — single conversation with messages
if (req.method === 'GET' && subPath) {
try {
const safeId = subPath.replace(/[^a-zA-Z0-9_\-\.]/g, '_')
const raw = await readFile(`${CONVERSATIONS_DIR}/${safeId}.json`, 'utf-8')
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(raw)
} catch {
res.writeHead(404, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ error: 'Not found' }))
}
return
}
// GET /api/conversations — list metadata (no messages)
if (req.method === 'GET' && !subPath) {
try {
const files = (await readdir(CONVERSATIONS_DIR)).filter(f => f.endsWith('.json'))
const list = []
for (const file of files) {
try {
const raw = await readFile(`${CONVERSATIONS_DIR}/${file}`, 'utf-8')
const conv = JSON.parse(raw)
list.push({
id: conv.id,
title: conv.title || '',
characterId: conv.characterId || '',
characterName: conv.characterName || '',
createdAt: conv.createdAt || '',
updatedAt: conv.updatedAt || '',
messageCount: (conv.messages || []).length,
})
} catch { /* skip corrupt files */ }
}
list.sort((a, b) => (b.updatedAt || '').localeCompare(a.updatedAt || ''))
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify(list))
} catch (err) {
res.writeHead(500, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: err.message }))
}
return
}
// POST /api/conversations — create or update
if (req.method === 'POST' && !subPath) {
try {
const chunks = []
for await (const chunk of req) chunks.push(chunk)
const conv = JSON.parse(Buffer.concat(chunks).toString())
if (!conv.id) {
res.writeHead(400, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: 'Missing conversation id' }))
return
}
const safeId = conv.id.replace(/[^a-zA-Z0-9_\-\.]/g, '_')
await writeFile(`${CONVERSATIONS_DIR}/${safeId}.json`, JSON.stringify(conv, null, 2))
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ ok: true }))
} catch (err) {
res.writeHead(500, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: err.message }))
}
return
}
// DELETE /api/conversations/:id
if (req.method === 'DELETE' && subPath) {
try {
const safeId = subPath.replace(/[^a-zA-Z0-9_\-\.]/g, '_')
await unlink(`${CONVERSATIONS_DIR}/${safeId}.json`).catch(() => {})
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ ok: true }))
} catch (err) {
res.writeHead(500, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: err.message }))
}
return
}
next()
})
},
}
}
function healthCheckPlugin() {
return {
name: 'health-check-proxy',
@@ -121,6 +383,321 @@ function healthCheckPlugin() {
};
}
function gazeProxyPlugin() {
return {
name: 'gaze-proxy',
configureServer(server) {
server.middlewares.use('/api/gaze/presets', async (req, res) => {
if (!GAZE_API_KEY) {
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ presets: [] }))
return
}
try {
const http = await import('http')
const url = new URL(`${GAZE_HOST}/api/v1/presets`)
const proxyRes = await new Promise((resolve, reject) => {
const r = http.default.get(url, { headers: { 'X-API-Key': GAZE_API_KEY }, timeout: 5000 }, resolve)
r.on('error', reject)
r.on('timeout', () => { r.destroy(); reject(new Error('timeout')) })
})
const chunks = []
for await (const chunk of proxyRes) chunks.push(chunk)
res.writeHead(proxyRes.statusCode, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(Buffer.concat(chunks))
} catch {
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ presets: [] }))
}
})
},
}
}
function memoryStoragePlugin() {
return {
name: 'memory-storage',
configureServer(server) {
const ensureDirs = async () => {
const { mkdir } = await import('fs/promises')
await mkdir(`${MEMORIES_DIR}/personal`, { recursive: true })
}
const readJsonFile = async (path, fallback) => {
const { readFile } = await import('fs/promises')
try {
return JSON.parse(await readFile(path, 'utf-8'))
} catch {
return fallback
}
}
const writeJsonFile = async (path, data) => {
const { writeFile } = await import('fs/promises')
await writeFile(path, JSON.stringify(data, null, 2))
}
// Personal memories: /api/memories/personal/:characterId[/:memoryId]
server.middlewares.use('/api/memories/personal', async (req, res, next) => {
if (req.method === 'OPTIONS') {
res.writeHead(204, { 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'GET,POST,DELETE', 'Access-Control-Allow-Headers': 'Content-Type' })
res.end()
return
}
await ensureDirs()
const url = new URL(req.url, 'http://localhost')
const parts = url.pathname.replace(/^\/+/, '').split('/')
const characterId = parts[0] ? parts[0].replace(/[^a-zA-Z0-9_\-\.]/g, '_') : null
const memoryId = parts[1] || null
if (!characterId) {
res.writeHead(400, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: 'Missing character ID' }))
return
}
const filePath = `${MEMORIES_DIR}/personal/${characterId}.json`
if (req.method === 'GET') {
const data = await readJsonFile(filePath, { characterId, memories: [] })
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify(data))
return
}
if (req.method === 'POST') {
try {
const chunks = []
for await (const chunk of req) chunks.push(chunk)
const memory = JSON.parse(Buffer.concat(chunks).toString())
const data = await readJsonFile(filePath, { characterId, memories: [] })
if (memory.id) {
const idx = data.memories.findIndex(m => m.id === memory.id)
if (idx >= 0) {
data.memories[idx] = { ...data.memories[idx], ...memory }
} else {
data.memories.push(memory)
}
} else {
memory.id = 'm_' + Date.now()
memory.createdAt = memory.createdAt || new Date().toISOString()
data.memories.push(memory)
}
await writeJsonFile(filePath, data)
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ ok: true, memory }))
} catch (err) {
res.writeHead(500, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: err.message }))
}
return
}
if (req.method === 'DELETE' && memoryId) {
try {
const data = await readJsonFile(filePath, { characterId, memories: [] })
data.memories = data.memories.filter(m => m.id !== memoryId)
await writeJsonFile(filePath, data)
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ ok: true }))
} catch (err) {
res.writeHead(500, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: err.message }))
}
return
}
next()
})
// General memories: /api/memories/general[/:memoryId]
server.middlewares.use('/api/memories/general', async (req, res, next) => {
if (req.method === 'OPTIONS') {
res.writeHead(204, { 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'GET,POST,DELETE', 'Access-Control-Allow-Headers': 'Content-Type' })
res.end()
return
}
await ensureDirs()
const url = new URL(req.url, 'http://localhost')
const memoryId = url.pathname.replace(/^\/+/, '') || null
const filePath = `${MEMORIES_DIR}/general.json`
if (req.method === 'GET') {
const data = await readJsonFile(filePath, { memories: [] })
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify(data))
return
}
if (req.method === 'POST') {
try {
const chunks = []
for await (const chunk of req) chunks.push(chunk)
const memory = JSON.parse(Buffer.concat(chunks).toString())
const data = await readJsonFile(filePath, { memories: [] })
if (memory.id) {
const idx = data.memories.findIndex(m => m.id === memory.id)
if (idx >= 0) {
data.memories[idx] = { ...data.memories[idx], ...memory }
} else {
data.memories.push(memory)
}
} else {
memory.id = 'm_' + Date.now()
memory.createdAt = memory.createdAt || new Date().toISOString()
data.memories.push(memory)
}
await writeJsonFile(filePath, data)
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ ok: true, memory }))
} catch (err) {
res.writeHead(500, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: err.message }))
}
return
}
if (req.method === 'DELETE' && memoryId) {
try {
const data = await readJsonFile(filePath, { memories: [] })
data.memories = data.memories.filter(m => m.id !== memoryId)
await writeJsonFile(filePath, data)
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ ok: true }))
} catch (err) {
res.writeHead(500, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: err.message }))
}
return
}
next()
})
},
}
}
function characterLookupPlugin() {
return {
name: 'character-lookup',
configureServer(server) {
server.middlewares.use('/api/character-lookup', async (req, res) => {
if (req.method === 'OPTIONS') {
res.writeHead(204, { 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'POST', 'Access-Control-Allow-Headers': 'Content-Type' })
res.end()
return
}
if (req.method !== 'POST') {
res.writeHead(405, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: 'POST only' }))
return
}
try {
const chunks = []
for await (const chunk of req) chunks.push(chunk)
const { name, franchise } = JSON.parse(Buffer.concat(chunks).toString())
if (!name || !franchise) {
res.writeHead(400, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: 'Missing name or franchise' }))
return
}
const { execFile } = await import('child_process')
const { promisify } = await import('util')
const execFileAsync = promisify(execFile)
// Call the MCP fetcher inside the running Docker container
const safeName = name.replace(/'/g, "\\'")
const safeFranchise = franchise.replace(/'/g, "\\'")
const pyScript = `
import asyncio, json
from character_details.fetcher import fetch_character
c = asyncio.run(fetch_character('${safeName}', '${safeFranchise}'))
print(json.dumps(c.model_dump(), default=str))
`.trim()
const { stdout } = await execFileAsync(
'docker',
['exec', 'character-browser-character-mcp-1', 'python', '-c', pyScript],
{ timeout: 30000 }
)
const data = JSON.parse(stdout.trim())
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({
name: data.name || name,
franchise: data.franchise || franchise,
description: data.description || '',
background: data.background || '',
appearance: data.appearance || '',
personality: data.personality || '',
abilities: data.abilities || [],
notable_quotes: data.notable_quotes || [],
relationships: data.relationships || [],
sources: data.sources || [],
}))
} catch (err) {
console.error('[character-lookup] failed:', err?.message || err)
const status = err?.message?.includes('timeout') ? 504 : 500
res.writeHead(status, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ error: err?.message || 'Lookup failed' }))
}
})
},
}
}
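The inline Python above receives `name` and `franchise` by string interpolation, which is why the quote-escaping step exists. A hypothetical alternative (a sketch, not the repo's code) passes them as `argv` instead, so no escaping is needed — arguments after the `-c` script string become `sys.argv[1..]` inside Python:

```javascript
// Hypothetical refactor: user input travels as argv, never inside the source.
const PY_SCRIPT = [
  'import asyncio, json, sys',
  'from character_details.fetcher import fetch_character',
  'c = asyncio.run(fetch_character(sys.argv[1], sys.argv[2]))',
  'print(json.dumps(c.model_dump(), default=str))',
].join('\n');

// Argument array for execFile('docker', ...): no shell, no quoting rules.
function buildLookupArgs(name, franchise) {
  return ['exec', 'character-browser-character-mcp-1', 'python', '-c', PY_SCRIPT, name, franchise];
}
```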
function modePlugin() {
return {
name: 'mode-api',
configureServer(server) {
server.middlewares.use('/api/mode', async (req, res, next) => {
if (req.method === 'OPTIONS') {
res.writeHead(204, { 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'GET,POST', 'Access-Control-Allow-Headers': 'Content-Type' })
res.end()
return
}
const { readFile, writeFile } = await import('fs/promises')
const DEFAULT_MODE = { mode: 'private', cloud_provider: 'anthropic', cloud_model: 'claude-sonnet-4-20250514', local_model: 'ollama/qwen3.5:35b-a3b', overrides: {}, updated_at: '' }
if (req.method === 'GET') {
try {
const raw = await readFile(MODE_PATH, 'utf-8')
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(raw)
} catch {
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify(DEFAULT_MODE))
}
return
}
if (req.method === 'POST') {
try {
const chunks = []
for await (const chunk of req) chunks.push(chunk)
const data = JSON.parse(Buffer.concat(chunks).toString())
data.updated_at = new Date().toISOString()
await writeFile(MODE_PATH, JSON.stringify(data, null, 2))
res.writeHead(200, { 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*' })
res.end(JSON.stringify({ ok: true }))
} catch (err) {
res.writeHead(500, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: err.message }))
}
return
}
next()
})
},
}
}
function bridgeProxyPlugin() {
return {
name: 'bridge-proxy',
@@ -172,10 +749,11 @@ function bridgeProxyPlugin() {
proxyReq.write(body)
proxyReq.end()
})
} catch (err) {
console.error(`[bridge-proxy] ${targetPath} failed:`, err?.message || err)
if (!res.headersSent) {
res.writeHead(502, { 'Content-Type': 'application/json' })
res.end(JSON.stringify({ error: `Bridge unreachable: ${err?.message || 'unknown'}` }))
}
}
}
@@ -189,7 +767,14 @@ function bridgeProxyPlugin() {
export default defineConfig({
plugins: [
characterStoragePlugin(),
satelliteMapPlugin(),
conversationStoragePlugin(),
memoryStoragePlugin(),
gazeProxyPlugin(),
characterLookupPlugin(),
healthCheckPlugin(),
modePlugin(),
bridgeProxyPlugin(),
tailwindcss(),
react(),


@@ -6,7 +6,7 @@
## Goal
Flash ESP32-S3-BOX-3 units with ESPHome. Each unit acts as a dumb room satellite: always-on mic, on-device wake word detection, audio playback, and a display showing assistant state via static PNG face illustrations. All intelligence stays on the Mac Mini.
---
@@ -17,11 +17,12 @@ Flash ESP32-S3-BOX-3 units with ESPHome. Each unit acts as a dumb room satellite
| SoC | ESP32-S3 (dual-core Xtensa, 240MHz) |
| RAM | 512KB SRAM + 16MB PSRAM |
| Flash | 16MB |
| Display | 2.4" IPS LCD, 320×240, touchscreen (ILI9xxx, model S3BOX) |
| Audio ADC | ES7210 (dual mic array, 16kHz 16-bit) |
| Audio DAC | ES8311 (speaker output, 48kHz 16-bit) |
| Speaker | Built-in 1W |
| Connectivity | WiFi 802.11b/g/n (2.4GHz only), BT 5.0 |
| USB | USB-C (programming + power, native USB JTAG serial) |
---
@@ -29,273 +30,100 @@ Flash ESP32-S3-BOX-3 units with ESPHome. Each unit acts as a dumb room satellite
```
ESP32-S3-BOX-3
├── micro_wake_word (on-device, always listening)
│   └── "hey_jarvis" — triggers voice_assistant on wake detection
├── voice_assistant (ESPHome component)
│   ├── connects to Home Assistant via ESPHome API
│   ├── HA routes audio → Mac Mini Wyoming STT (10.0.0.101:10300)
│   ├── HA routes text → OpenClaw conversation agent (10.0.0.101:8081)
│   └── HA routes response → Mac Mini Wyoming TTS (10.0.0.101:10301)
├── Display (ili9xxx, model S3BOX, 320×240)
│   └── static PNG faces per state (idle, listening, thinking, replying, error)
└── ESPHome OTA
    └── firmware updates over WiFi
```
---
## Pin Map (ESP32-S3-BOX-3)
| Function | Pin(s) | Notes |
|---|---|---|
| I2S LRCLK | GPIO45 | strapping pin — warning ignored |
| I2S BCLK | GPIO17 | |
| I2S MCLK | GPIO2 | |
| I2S DIN (mic) | GPIO16 | ES7210 ADC input |
| I2S DOUT (speaker) | GPIO15 | ES8311 DAC output |
| Speaker enable | GPIO46 | strapping pin — warning ignored |
| I2C SCL | GPIO18 | audio codec control bus |
| I2C SDA | GPIO8 | audio codec control bus |
| SPI CLK (display) | GPIO7 | |
| SPI MOSI (display) | GPIO6 | |
| Display CS | GPIO5 | |
| Display DC | GPIO4 | |
| Display Reset | GPIO48 | inverted |
| Backlight | GPIO47 | LEDC PWM |
| Left top button | GPIO0 | strapping pin — mute toggle / factory reset |
| Sensor dock I2C SCL | GPIO40 | sensor bus (AHT-30, AT581x radar) |
| Sensor dock I2C SDA | GPIO41 | sensor bus (AHT-30, AT581x radar) |
| Radar presence output | GPIO21 | AT581x digital detection pin |
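As a cross-check, the bus wiring above maps onto ESPHome component config roughly as follows (a sketch — bus IDs and comments are illustrative, not the project's actual file):

```yaml
i2c:
  - id: codec_bus        # ES7210 / ES8311 control
    scl: GPIO18
    sda: GPIO8
  - id: sensor_bus       # AHT-30 + AT581x on the sensor dock
    scl: GPIO40
    sda: GPIO41

i2s_audio:
  - id: i2s_bus
    i2s_lrclk_pin: GPIO45   # strapping pin, boot warning expected
    i2s_bclk_pin: GPIO17
    i2s_mclk_pin: GPIO2
```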
---
## ESPHome Configuration
### Platform & Framework
```yaml
esphome:
  name: homeai-${room}
  friendly_name: "HomeAI ${room_display}"

esp32:
  board: esp32s3box
  flash_size: 16MB
  cpu_frequency: 240MHz
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    ssid: "HomeAI Fallback"

api:
  encryption:
    key: !secret api_key

ota:
  password: !secret ota_password

logger:
  level: INFO

psram:
  mode: octal
  speed: 80MHz
```
### Room-Specific Config
### Audio Stack
`esphome/s3-box-living-room.yaml`:
Uses `i2s_audio` platform with external ADC/DAC codec chips:
```yaml
substitutions:
room: living-room
room_display: "Living Room"
mac_mini_ip: "192.168.1.x" # or Tailscale IP
- **Microphone**: ES7210 ADC via I2S, 16kHz 16-bit mono
- **Speaker**: ES8311 DAC via I2S, 48kHz 16-bit mono (left channel)
- **Media player**: wraps speaker with volume control (min 50%, max 85%)
packages:
base: !include base.yaml
voice: !include voice.yaml
display: !include display.yaml
```
### Wake Word
One file per room, only the substitutions change.
On-device `micro_wake_word` component with `hey_jarvis` model. Can optionally be switched to Home Assistant streaming wake word via a selector entity.
### Voice / Wyoming Satellite — `esphome/voice.yaml`
### Display
```yaml
microphone:
- platform: esp_adf
id: mic
`ili9xxx` platform with model `S3BOX`. Uses `update_interval: never` — display updates are triggered by scripts on voice assistant state changes. Static 320×240 PNG images for each state are compiled into firmware. No text overlays — voice-only interaction.
speaker:
- platform: esp_adf
id: spk
Screen auto-dims after a configurable idle timeout (default 1 min, adjustable 160 min via HA entity). Wakes on voice activity or radar presence detection.
micro_wake_word:
model: hey_jarvis # or custom model path
on_wake_word_detected:
- voice_assistant.start:
### Sensor Dock (ESP32-S3-BOX-3-SENSOR)
voice_assistant:
microphone: mic
speaker: spk
noise_suppression_level: 2
auto_gain: 31dBFS
volume_multiplier: 2.0
Optional accessory dock connected via secondary I2C bus (GPIO40/41, 100kHz):
on_listening:
- display.page.show: page_listening
- script.execute: animate_face_listening
- **AHT-30** (temp/humidity) — `aht10` component with variant AHT20, 30s update interval
- **AT581x mmWave radar** — presence detection via GPIO21, I2C for settings config
- **Radar RF switch** — toggle radar on/off from HA
- Radar configured on boot: sensing_distance=600, trigger_keep=5s, hw_frontend_reset=true
on_stt_vad_end:
- display.page.show: page_thinking
- script.execute: animate_face_thinking
### Voice Assistant
on_tts_start:
- display.page.show: page_speaking
- script.execute: animate_face_speaking
on_end:
- display.page.show: page_idle
- script.execute: animate_face_idle
on_error:
- display.page.show: page_error
- script.execute: animate_face_error
```
**Note:** ESPHome's `voice_assistant` component connects to HA, which routes to Wyoming STT/TTS on the Mac Mini. This is the standard ESPHome → HA → Wyoming path.
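The device config's `draw_display` script routes the current voice assistant phase to a display page. That routing can be summarised as a lookup table — a Python sketch for illustration only (the phase IDs mirror the `voice_assist_*_phase_id` substitutions in the living-room config; nothing here runs on the ESP32):

```python
# Sketch of the draw_display routing: phase ID -> display page.
# Phase IDs are taken from the voice_assist_*_phase_id substitutions.
PHASE_TO_PAGE = {
    1: "idle_page",             # idle
    2: "listening_page",        # listening
    3: "thinking_page",         # thinking (after STT VAD end)
    4: "replying_page",         # replying (TTS playing)
    10: "no_ha_page",           # not ready (HA disconnected)
    11: "error_page",           # pipeline error
    12: "muted_page",           # mic muted
    20: "timer_finished_page",  # timer ringing
}

def page_for(phase: int, wifi_up: bool = True, ha_up: bool = True) -> str:
    """Return the display page for the current connection + assistant state."""
    if not wifi_up:
        return "no_wifi_page"
    if not ha_up:
        return "no_ha_page"
    return PHASE_TO_PAGE.get(phase, "idle_page")
```

Connection state takes priority over assistant phase, matching the nested `wifi.connected` / `api.connected` checks in the script.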
---
## Secrets File
`esphome/secrets.yaml` (gitignored):
```yaml
wifi_ssid: "YourNetwork"
wifi_password: "YourPassword"
api_key: "<32-byte base64 key>"
ota_password: "YourOTAPassword"
```
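The API encryption key is a 32-byte value encoded as base64 (44 characters). One way to generate it — a sketch using only the Python standard library (the ESPHome dashboard can also generate a key for you):

```python
import base64
import secrets

# 32 cryptographically random bytes, base64-encoded -> 44-character
# string suitable for api.encryption.key in secrets.yaml
key = base64.b64encode(secrets.token_bytes(32)).decode("ascii")
print(key)
```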
---
## Home Assistant Integration
After flashing:
1. HA discovers ESP32 automatically via mDNS
2. Add device in HA → Settings → Devices
3. Assign Wyoming voice assistant pipeline to the device
4. Set up room-specific automations (e.g., "Living Room" light control from that satellite)
ESPHome's `voice_assistant` component connects to HA via the ESPHome native API (not directly to Wyoming). HA orchestrates the pipeline:
1. Audio → Wyoming STT (Mac Mini) → text
2. Text → OpenClaw conversation agent → response
3. Response → Wyoming TTS (Mac Mini) → audio back to ESP32
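The three hops above can be sketched as a simple chain. These are hypothetical stubs standing in for the Wyoming STT/TTS services and the OpenClaw agent — in reality HA performs this orchestration over the Wyoming protocol and the ESPHome API:

```python
# Hypothetical stand-ins for the real services; stub return values
# are illustrative only.
def wyoming_stt(audio: bytes) -> str:           # Mac Mini :10300
    return "turn on the living room lights"     # stub transcription

def openclaw_agent(text: str) -> str:           # Mac Mini :8081
    return f"OK: {text}."                       # stub response

def wyoming_tts(text: str) -> bytes:            # Mac Mini :10301
    return text.encode("utf-8")                 # stub audio

def run_pipeline(audio: bytes) -> bytes:
    text = wyoming_stt(audio)       # 1. audio -> text
    reply = openclaw_agent(text)    # 2. text -> response
    return wyoming_tts(reply)       # 3. response -> audio back to ESP32
```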
---
```
homeai-esp32/
├── PLAN.md
├── setup.sh # env check + flash/ota/logs commands
├── deploy.sh # OTA deploy, image management, multi-unit support
└── esphome/
    ├── secrets.yaml # gitignored — WiFi + API key
    ├── homeai-living-room.yaml # first unit (full config)
    ├── homeai-bedroom.yaml # future: copy + change substitutions
    ├── homeai-kitchen.yaml # future: copy + change substitutions
    └── illustrations/ # 320×240 PNG face images
        ├── idle.png
        ├── loading.png
        ├── listening.png
        ├── thinking.png
        ├── replying.png
        ├── error.png
        └── timer_finished.png
```
---
## ESPHome Environment
```bash
# Dedicated venv (Python 3.12) — do NOT share with voice/whisper venvs
~/homeai-esphome-env/bin/esphome version # ESPHome 2026.2.4+
# Quick commands
cd ~/gitea/homeai/homeai-esp32
~/homeai-esphome-env/bin/esphome run esphome/homeai-living-room.yaml # compile + flash
~/homeai-esphome-env/bin/esphome logs esphome/homeai-living-room.yaml # stream logs
# Or use the setup script
./setup.sh flash # compile + USB flash
./setup.sh ota # compile + OTA update
./setup.sh logs # stream device logs
./setup.sh validate # check YAML without compiling
```
---
## Wake Word Options
| Option | Latency | Privacy | Effort |
|---|---|---|---|
| `hey_jarvis` (built-in micro_wake_word) | ~200ms | On-device | Zero |
| Custom word (trained model) | ~200ms | On-device | High — requires 50+ recordings |
| HA streaming wake word | ~500ms | On Mac Mini | Medium — stream all audio |
**Current**: `hey_jarvis` on-device. Train a custom word (character's name) once finalised.
---
## Implementation Steps
- [x] Install ESPHome in `~/homeai-esphome-env` (Python 3.12)
- [x] Write `esphome/secrets.yaml` (gitignored)
- [x] Write `homeai-living-room.yaml` (based on official S3-BOX-3 reference config)
- [x] Generate placeholder face illustrations (7 PNGs, 320×240)
- [x] Write `setup.sh` with flash/ota/logs/validate commands
- [x] Write `deploy.sh` with OTA deploy, image management, multi-unit support
- [x] Flash first unit via USB (living room)
- [x] Verify unit appears in HA device list
- [x] Assign Wyoming voice pipeline to unit in HA
- [x] Test: speak wake word → transcription → LLM response → spoken reply
- [x] Test: display cycles through idle → listening → thinking → replying
- [x] Verify OTA update works: change config, deploy wirelessly
- [ ] Write config templates for remaining rooms (bedroom, kitchen)
- [ ] Flash remaining units, verify each works independently
- [ ] Document final MAC address → room name mapping
---
## Success Criteria
- [ ] Wake word "hey jarvis" triggers pipeline reliably from 3m distance
- [ ] STT transcription accuracy >90% for clear speech in quiet room
- [ ] TTS audio plays clearly through ESP32 speaker
- [ ] Display shows correct state for idle / listening / thinking / replying / error / muted
- [ ] OTA firmware updates work without USB cable
- [ ] Unit reconnects automatically after WiFi drop
- [ ] Unit survives power cycle and resumes normal operation
---
## Known Constraints
- **Memory**: voice_assistant + micro_wake_word + display + sensor dock is near the limit. Do NOT add Bluetooth or LVGL widgets — they will cause crashes.
- **WiFi**: 2.4GHz only. 5GHz networks are not supported.
- **Speaker**: 1W built-in. Volume capped at 85% to avoid distortion.
- **Display**: Static PNGs compiled into firmware. To change images, reflash via OTA (~1-2 min).
- **First compile**: Downloads ESP-IDF toolchain (~500MB), takes 5-10 minutes. Incremental builds are 1-2 minutes.

---
## `homeai-esp32/deploy.sh` (executable)
#!/usr/bin/env bash
# homeai-esp32/deploy.sh — Quick OTA deploy for ESP32-S3-BOX-3 satellites
#
# Usage:
# ./deploy.sh — deploy config + images to living room (default)
# ./deploy.sh bedroom — deploy to bedroom unit
# ./deploy.sh --images-only — deploy existing PNGs from illustrations/ (no regen)
# ./deploy.sh --regen-images — regenerate placeholder PNGs then deploy
# ./deploy.sh --validate — validate config without deploying
# ./deploy.sh --all — deploy to all configured units
#
# Images are compiled into firmware, so any PNG changes require a reflash.
# To use custom images: drop 320x240 PNGs into esphome/illustrations/ then ./deploy.sh
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ESPHOME_DIR="${SCRIPT_DIR}/esphome"
ESPHOME_VENV="${HOME}/homeai-esphome-env"
ESPHOME="${ESPHOME_VENV}/bin/esphome"
PYTHON="${ESPHOME_VENV}/bin/python3"
ILLUSTRATIONS_DIR="${ESPHOME_DIR}/illustrations"
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
CYAN='\033[0;36m'
NC='\033[0m'
log_info() { echo -e "${BLUE}[INFO]${NC} $*"; }
log_ok() { echo -e "${GREEN}[OK]${NC} $*"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
log_error() { echo -e "${RED}[ERROR]${NC} $*"; exit 1; }
log_step() { echo -e "${CYAN}[STEP]${NC} $*"; }
# ─── Available units ──────────────────────────────────────────────────────────
UNIT_NAMES=(living-room bedroom kitchen)
DEFAULT_UNIT="living-room"
unit_config() {
case "$1" in
living-room) echo "homeai-living-room.yaml" ;;
bedroom) echo "homeai-bedroom.yaml" ;;
kitchen) echo "homeai-kitchen.yaml" ;;
*) echo "" ;;
esac
}
unit_list() {
echo "${UNIT_NAMES[*]}"
}
# ─── Face image generator ────────────────────────────────────────────────────
generate_faces() {
log_step "Generating face illustrations (320x240 PNG)..."
"${PYTHON}" << 'PYEOF'
from PIL import Image, ImageDraw
import os
WIDTH, HEIGHT = 320, 240
OUT = os.environ.get("ILLUSTRATIONS_DIR", "esphome/illustrations")
def draw_face(draw, eye_color, mouth_color, eye_height=40, eye_y=80, mouth_style="smile"):
ex1, ey1 = 95, eye_y
draw.ellipse([ex1-25, ey1-eye_height//2, ex1+25, ey1+eye_height//2], fill=eye_color)
ex2, ey2 = 225, eye_y
draw.ellipse([ex2-25, ey2-eye_height//2, ex2+25, ey2+eye_height//2], fill=eye_color)
if mouth_style == "smile":
draw.arc([110, 140, 210, 200], start=0, end=180, fill=mouth_color, width=3)
elif mouth_style == "open":
draw.ellipse([135, 150, 185, 190], fill=mouth_color)
elif mouth_style == "flat":
draw.line([120, 170, 200, 170], fill=mouth_color, width=3)
elif mouth_style == "frown":
draw.arc([110, 160, 210, 220], start=180, end=360, fill=mouth_color, width=3)
states = {
"idle": {"eye_color": "#FFFFFF", "mouth_color": "#FFFFFF", "eye_height": 40, "mouth_style": "smile"},
"loading": {"eye_color": "#6366F1", "mouth_color": "#6366F1", "eye_height": 30, "mouth_style": "flat"},
"listening": {"eye_color": "#00BFFF", "mouth_color": "#00BFFF", "eye_height": 50, "mouth_style": "open"},
"thinking": {"eye_color": "#A78BFA", "mouth_color": "#A78BFA", "eye_height": 20, "mouth_style": "flat"},
"replying": {"eye_color": "#10B981", "mouth_color": "#10B981", "eye_height": 40, "mouth_style": "open"},
"error": {"eye_color": "#EF4444", "mouth_color": "#EF4444", "eye_height": 40, "mouth_style": "frown"},
"timer_finished": {"eye_color": "#F59E0B", "mouth_color": "#F59E0B", "eye_height": 50, "mouth_style": "smile"},
}
os.makedirs(OUT, exist_ok=True)
for name, p in states.items():
img = Image.new("RGBA", (WIDTH, HEIGHT), (0, 0, 0, 255))
draw = ImageDraw.Draw(img)
draw_face(draw, p["eye_color"], p["mouth_color"], p["eye_height"], mouth_style=p["mouth_style"])
img.save(f"{OUT}/{name}.png")
print(f" {name}.png")
PYEOF
log_ok "Generated 7 face illustrations"
}
# ─── Check existing images ───────────────────────────────────────────────────
REQUIRED_IMAGES=(idle loading listening thinking replying error timer_finished)
check_images() {
local missing=()
for name in "${REQUIRED_IMAGES[@]}"; do
if [[ ! -f "${ILLUSTRATIONS_DIR}/${name}.png" ]]; then
missing+=("${name}.png")
fi
done
if [[ ${#missing[@]} -gt 0 ]]; then
log_error "Missing illustrations: ${missing[*]}
Place 320x240 PNGs in ${ILLUSTRATIONS_DIR}/ or use --regen-images to generate placeholders."
fi
# Resize any images that aren't 320x240
local resized=0
for name in "${REQUIRED_IMAGES[@]}"; do
local img_path="${ILLUSTRATIONS_DIR}/${name}.png"
local dims
dims=$("${PYTHON}" -c "from PIL import Image; im=Image.open('${img_path}'); print(f'{im.width}x{im.height}')")
if [[ "$dims" != "320x240" ]]; then
log_warn "${name}.png is ${dims}, resizing to 320x240..."
"${PYTHON}" -c "
from PIL import Image
im = Image.open('${img_path}')
im = im.resize((320, 240), Image.LANCZOS)
im.save('${img_path}')
"
resized=$((resized + 1))
fi
done
if [[ $resized -gt 0 ]]; then
log_ok "Resized ${resized} image(s) to 320x240"
fi
log_ok "All ${#REQUIRED_IMAGES[@]} illustrations present and 320x240"
for name in "${REQUIRED_IMAGES[@]}"; do
local size
size=$(wc -c < "${ILLUSTRATIONS_DIR}/${name}.png" | tr -d ' ')
echo -e " ${name}.png (${size} bytes)"
done
}
# ─── Deploy to a single unit ─────────────────────────────────────────────────
deploy_unit() {
local unit_name="$1"
local config
config="$(unit_config "$unit_name")"
if [[ -z "$config" ]]; then
log_error "Unknown unit: ${unit_name}. Available: $(unit_list)"
fi
local config_path="${ESPHOME_DIR}/${config}"
if [[ ! -f "$config_path" ]]; then
log_error "Config not found: ${config_path}"
fi
log_step "Validating ${config}..."
cd "${ESPHOME_DIR}"
"${ESPHOME}" config "${config}" > /dev/null
log_ok "Config valid"
log_step "Compiling + OTA deploying ${config}..."
"${ESPHOME}" run "${config}" --device OTA 2>&1
log_ok "Deployed to ${unit_name}"
}
# ─── Main ─────────────────────────────────────────────────────────────────────
IMAGES_ONLY=false
REGEN_IMAGES=false
VALIDATE_ONLY=false
DEPLOY_ALL=false
TARGET="${DEFAULT_UNIT}"
while [[ $# -gt 0 ]]; do
case "$1" in
--images-only) IMAGES_ONLY=true; shift ;;
--regen-images) REGEN_IMAGES=true; shift ;;
--validate) VALIDATE_ONLY=true; shift ;;
--all) DEPLOY_ALL=true; shift ;;
--help|-h)
echo "Usage: $0 [unit-name] [--images-only] [--regen-images] [--validate] [--all]"
echo ""
echo "Units: $(unit_list)"
echo ""
echo "Options:"
echo " --images-only Deploy existing PNGs from illustrations/ (for custom images)"
echo " --regen-images Regenerate placeholder face PNGs then deploy"
echo " --validate Validate config without deploying"
echo " --all Deploy to all configured units"
echo ""
echo "Examples:"
echo " $0 # deploy config to living-room"
echo " $0 bedroom # deploy to bedroom"
echo " $0 --images-only # deploy with current images (custom or generated)"
echo " $0 --regen-images # regenerate placeholder faces + deploy"
echo " $0 --all # deploy to all units"
echo ""
echo "Custom images: drop 320x240 PNGs into esphome/illustrations/"
echo "Required files: ${REQUIRED_IMAGES[*]}"
exit 0
;;
*)
if [[ -n "$(unit_config "$1")" ]]; then
TARGET="$1"
else
log_error "Unknown option or unit: $1. Use --help for usage."
fi
shift
;;
esac
done
# Check ESPHome
if [[ ! -x "${ESPHOME}" ]]; then
log_error "ESPHome not found at ${ESPHOME}. Run setup.sh first."
fi
# Regenerate placeholder images if requested
if $REGEN_IMAGES; then
export ILLUSTRATIONS_DIR
generate_faces
fi
# Check existing images (verify present + resize if not 320x240)
check_images
# Validate only
if $VALIDATE_ONLY; then
cd "${ESPHOME_DIR}"
for unit_name in "${UNIT_NAMES[@]}"; do
config="$(unit_config "$unit_name")"
if [[ -f "${config}" ]]; then
log_step "Validating ${config}..."
"${ESPHOME}" config "${config}" > /dev/null && log_ok "${config} valid" || log_warn "${config} invalid"
fi
done
exit 0
fi
# Deploy
if $DEPLOY_ALL; then
for unit_name in "${UNIT_NAMES[@]}"; do
config="$(unit_config "$unit_name")"
if [[ -f "${ESPHOME_DIR}/${config}" ]]; then
deploy_unit "$unit_name"
else
log_warn "Skipping ${unit_name}: ${config} not found"
fi
done
else
deploy_unit "$TARGET"
fi
echo ""
log_ok "Deploy complete!"

---
## `homeai-esp32/esphome/.gitignore`
# Gitignore settings for ESPHome
# This is an example and may include too much for your use-case.
# You can modify this file to suit your needs.
/.esphome/
/secrets.yaml

---
## `homeai-esp32/esphome/homeai-living-room.yaml`
---
# HomeAI Living Room Satellite — ESP32-S3-BOX-3
# Based on official ESPHome voice assistant config
# https://github.com/esphome/wake-word-voice-assistants
substitutions:
name: homeai-living-room
friendly_name: HomeAI Living Room
# Face illustrations — compiled into firmware (320x240 PNG)
loading_illustration_file: illustrations/loading.png
idle_illustration_file: illustrations/idle.png
listening_illustration_file: illustrations/listening.png
thinking_illustration_file: illustrations/thinking.png
replying_illustration_file: illustrations/replying.png
error_illustration_file: illustrations/error.png
timer_finished_illustration_file: illustrations/timer_finished.png
# Dark background for all states (matches HomeAI dashboard theme)
loading_illustration_background_color: "000000"
idle_illustration_background_color: "000000"
listening_illustration_background_color: "000000"
thinking_illustration_background_color: "000000"
replying_illustration_background_color: "000000"
error_illustration_background_color: "000000"
voice_assist_idle_phase_id: "1"
voice_assist_listening_phase_id: "2"
voice_assist_thinking_phase_id: "3"
voice_assist_replying_phase_id: "4"
voice_assist_not_ready_phase_id: "10"
voice_assist_error_phase_id: "11"
voice_assist_muted_phase_id: "12"
voice_assist_timer_finished_phase_id: "20"
font_family: Figtree
font_glyphsets: "GF_Latin_Core"
esphome:
name: ${name}
friendly_name: ${friendly_name}
min_version: 2025.5.0
name_add_mac_suffix: false
on_boot:
priority: 600
then:
- script.execute: draw_display
- at581x.settings:
id: radar
hw_frontend_reset: true
sensing_distance: 600
trigger_keep: 5000ms
- delay: 30s
- if:
condition:
lambda: return id(init_in_progress);
then:
- lambda: id(init_in_progress) = false;
- script.execute: draw_display
esp32:
board: esp32s3box
flash_size: 16MB
cpu_frequency: 240MHz
framework:
type: esp-idf
sdkconfig_options:
CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
psram:
mode: octal
speed: 80MHz
wifi:
ssid: !secret wifi_ssid
password: !secret wifi_password
ap:
ssid: "HomeAI Fallback"
on_connect:
- script.execute: draw_display
on_disconnect:
- script.execute: draw_display
captive_portal:
api:
encryption:
key: !secret api_key
# Prevent device from rebooting if HA connection drops temporarily
reboot_timeout: 0s
on_client_connected:
- script.execute: draw_display
on_client_disconnected:
# Debounce: wait 5s before showing "HA not found" to avoid flicker on brief drops
- delay: 5s
- if:
condition:
not:
api.connected:
then:
- script.execute: draw_display
ota:
- platform: esphome
id: ota_esphome
logger:
hardware_uart: USB_SERIAL_JTAG
button:
- platform: factory_reset
id: factory_reset_btn
internal: true
binary_sensor:
- platform: gpio
pin:
number: GPIO0
ignore_strapping_warning: true
mode: INPUT_PULLUP
inverted: true
id: left_top_button
internal: true
on_multi_click:
# Short press: dismiss timer / toggle mute
- timing:
- ON for at least 50ms
- OFF for at least 50ms
then:
- if:
condition:
switch.is_on: timer_ringing
then:
- switch.turn_off: timer_ringing
else:
- switch.toggle: mute
# Long press (10s): factory reset
- timing:
- ON for at least 10s
then:
- button.press: factory_reset_btn
- platform: gpio
pin: GPIO21
name: Presence
id: radar_presence
device_class: occupancy
on_press:
- script.execute: screen_wake
- script.execute: screen_idle_timer
# --- Display backlight ---
output:
- platform: ledc
pin: GPIO47
id: backlight_output
light:
- platform: monochromatic
id: led
name: Screen
icon: "mdi:television"
entity_category: config
output: backlight_output
restore_mode: RESTORE_DEFAULT_ON
default_transition_length: 250ms
# --- Audio hardware ---
i2c:
- id: audio_bus
scl: GPIO18
sda: GPIO8
- id: sensor_bus
scl: GPIO40
sda: GPIO41
frequency: 100kHz
i2s_audio:
- id: i2s_audio_bus
i2s_lrclk_pin:
number: GPIO45
ignore_strapping_warning: true
i2s_bclk_pin: GPIO17
i2s_mclk_pin: GPIO2
audio_adc:
- platform: es7210
id: es7210_adc
i2c_id: audio_bus
bits_per_sample: 16bit
sample_rate: 16000
audio_dac:
- platform: es8311
id: es8311_dac
i2c_id: audio_bus
bits_per_sample: 16bit
sample_rate: 48000
microphone:
- platform: i2s_audio
id: box_mic
sample_rate: 16000
i2s_din_pin: GPIO16
bits_per_sample: 16bit
adc_type: external
speaker:
- platform: i2s_audio
id: box_speaker
i2s_dout_pin: GPIO15
dac_type: external
sample_rate: 48000
bits_per_sample: 16bit
channel: left
audio_dac: es8311_dac
buffer_duration: 100ms
media_player:
- platform: speaker
name: None
id: speaker_media_player
volume_min: 0.5
volume_max: 0.85
announcement_pipeline:
speaker: box_speaker
format: FLAC
sample_rate: 48000
num_channels: 1
files:
- id: timer_finished_sound
file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/timer_finished.flac
on_announcement:
- if:
condition:
- microphone.is_capturing:
then:
- script.execute: stop_wake_word
- if:
condition:
- lambda: return id(wake_word_engine_location).current_option() == "In Home Assistant";
then:
- wait_until:
- not:
voice_assistant.is_running:
- if:
condition:
not:
voice_assistant.is_running:
then:
- lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
- script.execute: draw_display
on_idle:
- if:
condition:
not:
voice_assistant.is_running:
then:
- script.execute: start_wake_word
- script.execute: set_idle_or_mute_phase
- script.execute: draw_display
# --- Wake word (on-device) ---
micro_wake_word:
id: mww
models:
- hey_jarvis
on_wake_word_detected:
- voice_assistant.start:
wake_word: !lambda return wake_word;
# --- Voice assistant ---
voice_assistant:
id: va
microphone: box_mic
media_player: speaker_media_player
micro_wake_word: mww
noise_suppression_level: 2
auto_gain: 31dBFS
volume_multiplier: 2.0
on_listening:
- lambda: id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
- script.execute: draw_display
on_stt_vad_end:
- lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
- script.execute: draw_display
on_tts_start:
- lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
- script.execute: draw_display
on_end:
- wait_until:
condition:
- media_player.is_announcing:
timeout: 0.5s
- wait_until:
- and:
- not:
media_player.is_announcing:
- not:
speaker.is_playing:
- if:
condition:
- lambda: return id(wake_word_engine_location).current_option() == "On device";
then:
- lambda: id(va).set_use_wake_word(false);
- micro_wake_word.start:
- script.execute: set_idle_or_mute_phase
- script.execute: draw_display
on_error:
- if:
condition:
lambda: return !id(init_in_progress);
then:
- lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};
- script.execute: draw_display
- delay: 1s
- if:
condition:
switch.is_off: mute
then:
- lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
else:
- lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
- script.execute: draw_display
on_client_connected:
- lambda: id(init_in_progress) = false;
- script.execute: start_wake_word
- script.execute: set_idle_or_mute_phase
- script.execute: draw_display
on_client_disconnected:
- script.execute: stop_wake_word
- lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
- script.execute: draw_display
on_timer_started:
- script.execute: draw_display
on_timer_cancelled:
- script.execute: draw_display
on_timer_updated:
- script.execute: draw_display
on_timer_tick:
- script.execute: draw_display
on_timer_finished:
- switch.turn_on: timer_ringing
- wait_until:
media_player.is_announcing:
- lambda: id(voice_assistant_phase) = ${voice_assist_timer_finished_phase_id};
- script.execute: draw_display
# --- Scripts ---
script:
- id: draw_display
then:
- if:
condition:
lambda: return !id(init_in_progress);
then:
- if:
condition:
wifi.connected:
then:
- if:
condition:
api.connected:
then:
- lambda: |
switch(id(voice_assistant_phase)) {
case ${voice_assist_listening_phase_id}:
id(screen_wake).execute();
id(s3_box_lcd).show_page(listening_page);
id(s3_box_lcd).update();
break;
case ${voice_assist_thinking_phase_id}:
id(screen_wake).execute();
id(s3_box_lcd).show_page(thinking_page);
id(s3_box_lcd).update();
break;
case ${voice_assist_replying_phase_id}:
id(screen_wake).execute();
id(s3_box_lcd).show_page(replying_page);
id(s3_box_lcd).update();
break;
case ${voice_assist_error_phase_id}:
id(screen_wake).execute();
id(s3_box_lcd).show_page(error_page);
id(s3_box_lcd).update();
break;
case ${voice_assist_muted_phase_id}:
id(s3_box_lcd).show_page(muted_page);
id(s3_box_lcd).update();
id(screen_idle_timer).execute();
break;
case ${voice_assist_not_ready_phase_id}:
id(s3_box_lcd).show_page(no_ha_page);
id(s3_box_lcd).update();
break;
case ${voice_assist_timer_finished_phase_id}:
id(screen_wake).execute();
id(s3_box_lcd).show_page(timer_finished_page);
id(s3_box_lcd).update();
break;
default:
id(s3_box_lcd).show_page(idle_page);
id(s3_box_lcd).update();
id(screen_idle_timer).execute();
}
else:
- display.page.show: no_ha_page
- component.update: s3_box_lcd
else:
- display.page.show: no_wifi_page
- component.update: s3_box_lcd
else:
- display.page.show: initializing_page
- component.update: s3_box_lcd
- id: fetch_first_active_timer
then:
- lambda: |
const auto &timers = id(va).get_timers();
auto output_timer = timers.begin()->second;
for (const auto &timer : timers) {
if (timer.second.is_active && timer.second.seconds_left <= output_timer.seconds_left) {
output_timer = timer.second;
}
}
id(global_first_active_timer) = output_timer;
- id: check_if_timers_active
then:
- lambda: |
const auto &timers = id(va).get_timers();
bool output = false;
for (const auto &timer : timers) {
if (timer.second.is_active) { output = true; }
}
id(global_is_timer_active) = output;
- id: fetch_first_timer
then:
- lambda: |
const auto &timers = id(va).get_timers();
auto output_timer = timers.begin()->second;
for (const auto &timer : timers) {
if (timer.second.seconds_left <= output_timer.seconds_left) {
output_timer = timer.second;
}
}
id(global_first_timer) = output_timer;
- id: check_if_timers
then:
- lambda: |
const auto &timers = id(va).get_timers();
bool output = false;
for (const auto &timer : timers) {
if (timer.second.is_active) { output = true; }
}
id(global_is_timer) = output;
- id: draw_timer_timeline
then:
- lambda: |
id(check_if_timers_active).execute();
id(check_if_timers).execute();
if (id(global_is_timer_active)){
id(fetch_first_active_timer).execute();
int active_pixels = round( 320 * id(global_first_active_timer).seconds_left / max(id(global_first_active_timer).total_seconds, static_cast<uint32_t>(1)) );
if (active_pixels > 0){
id(s3_box_lcd).filled_rectangle(0, 225, 320, 15, Color::WHITE);
id(s3_box_lcd).filled_rectangle(0, 226, active_pixels, 13, id(active_timer_color));
}
} else if (id(global_is_timer)){
id(fetch_first_timer).execute();
int active_pixels = round( 320 * id(global_first_timer).seconds_left / max(id(global_first_timer).total_seconds, static_cast<uint32_t>(1)));
if (active_pixels > 0){
id(s3_box_lcd).filled_rectangle(0, 225, 320, 15, Color::WHITE);
id(s3_box_lcd).filled_rectangle(0, 226, active_pixels, 13, id(paused_timer_color));
}
}
- id: draw_active_timer_widget
then:
- lambda: |
id(check_if_timers_active).execute();
if (id(global_is_timer_active)){
id(s3_box_lcd).filled_rectangle(80, 40, 160, 50, Color::WHITE);
id(s3_box_lcd).rectangle(80, 40, 160, 50, Color::BLACK);
id(fetch_first_active_timer).execute();
int hours_left = floor(id(global_first_active_timer).seconds_left / 3600);
int minutes_left = floor((id(global_first_active_timer).seconds_left - hours_left * 3600) / 60);
int seconds_left = id(global_first_active_timer).seconds_left - hours_left * 3600 - minutes_left * 60;
auto display_hours = (hours_left < 10 ? "0" : "") + std::to_string(hours_left);
auto display_minute = (minutes_left < 10 ? "0" : "") + std::to_string(minutes_left);
auto display_seconds = (seconds_left < 10 ? "0" : "") + std::to_string(seconds_left);
std::string display_string = "";
if (hours_left > 0) {
display_string = display_hours + ":" + display_minute;
} else {
display_string = display_minute + ":" + display_seconds;
}
id(s3_box_lcd).printf(120, 47, id(font_timer), Color::BLACK, "%s", display_string.c_str());
}
- id: start_wake_word
then:
- if:
condition:
and:
- not:
- voice_assistant.is_running:
- lambda: return id(wake_word_engine_location).current_option() == "On device";
then:
- lambda: id(va).set_use_wake_word(false);
- micro_wake_word.start:
- if:
condition:
and:
- not:
- voice_assistant.is_running:
- lambda: return id(wake_word_engine_location).current_option() == "In Home Assistant";
then:
- lambda: id(va).set_use_wake_word(true);
- voice_assistant.start_continuous:
- id: stop_wake_word
then:
- if:
condition:
lambda: return id(wake_word_engine_location).current_option() == "In Home Assistant";
then:
- lambda: id(va).set_use_wake_word(false);
- voice_assistant.stop:
- if:
condition:
lambda: return id(wake_word_engine_location).current_option() == "On device";
then:
- micro_wake_word.stop:
- id: set_idle_or_mute_phase
then:
- if:
condition:
switch.is_off: mute
then:
- lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
else:
- lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
- id: screen_idle_timer
mode: restart
then:
- delay: !lambda return id(screen_off_delay).state * 60000;
- light.turn_off: led
- id: screen_wake
mode: restart
then:
- if:
condition:
light.is_off: led
then:
- light.turn_on:
id: led
brightness: 100%
# --- Switches ---
switch:
- platform: gpio
name: Speaker Enable
pin:
number: GPIO46
ignore_strapping_warning: true
restore_mode: RESTORE_DEFAULT_ON
entity_category: config
disabled_by_default: true
- platform: at581x
at581x_id: radar
name: Radar RF
entity_category: config
- platform: template
name: Mute
id: mute
icon: "mdi:microphone-off"
optimistic: true
restore_mode: RESTORE_DEFAULT_OFF
entity_category: config
on_turn_off:
- microphone.unmute:
- lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
- script.execute: draw_display
on_turn_on:
- microphone.mute:
- lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
- script.execute: draw_display
- platform: template
id: timer_ringing
optimistic: true
internal: true
restore_mode: ALWAYS_OFF
on_turn_off:
- lambda: |-
id(speaker_media_player)
->make_call()
.set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_REPEAT_OFF)
.set_announcement(true)
.perform();
id(speaker_media_player)->set_playlist_delay_ms(speaker::AudioPipelineType::ANNOUNCEMENT, 0);
- media_player.stop:
announcement: true
on_turn_on:
- lambda: |-
id(speaker_media_player)
->make_call()
.set_command(media_player::MediaPlayerCommand::MEDIA_PLAYER_COMMAND_REPEAT_ONE)
.set_announcement(true)
.perform();
id(speaker_media_player)->set_playlist_delay_ms(speaker::AudioPipelineType::ANNOUNCEMENT, 1000);
- media_player.speaker.play_on_device_media_file:
media_file: timer_finished_sound
announcement: true
- delay: 15min
- switch.turn_off: timer_ringing
# --- Wake word engine location selector ---
select:
- platform: template
entity_category: config
name: Wake word engine location
id: wake_word_engine_location
icon: "mdi:account-voice"
optimistic: true
restore_value: true
options:
- In Home Assistant
- On device
initial_option: On device
on_value:
- if:
condition:
lambda: return !id(init_in_progress);
then:
- wait_until:
lambda: return id(voice_assistant_phase) == ${voice_assist_muted_phase_id} || id(voice_assistant_phase) == ${voice_assist_idle_phase_id};
- if:
condition:
lambda: return x == "In Home Assistant";
then:
- micro_wake_word.stop
- delay: 500ms
- if:
condition:
switch.is_off: mute
then:
- lambda: id(va).set_use_wake_word(true);
- voice_assistant.start_continuous:
- if:
condition:
lambda: return x == "On device";
then:
- lambda: id(va).set_use_wake_word(false);
- voice_assistant.stop
- delay: 500ms
- if:
condition:
switch.is_off: mute
then:
- micro_wake_word.start
# --- Screen idle timeout (minutes) ---
number:
- platform: template
name: Screen off delay
id: screen_off_delay
icon: "mdi:timer-outline"
entity_category: config
unit_of_measurement: min
optimistic: true
restore_value: true
min_value: 1
max_value: 60
step: 1
initial_value: 1
# --- Sensor dock (ESP32-S3-BOX-3-SENSOR) ---
sensor:
- platform: aht10
variant: AHT20
i2c_id: sensor_bus
temperature:
name: Temperature
filters:
- sliding_window_moving_average:
window_size: 5
send_every: 5
humidity:
name: Humidity
filters:
- sliding_window_moving_average:
window_size: 5
send_every: 5
update_interval: 30s
at581x:
i2c_id: sensor_bus
id: radar
# --- Global variables ---
globals:
- id: init_in_progress
type: bool
restore_value: false
initial_value: "true"
- id: voice_assistant_phase
type: int
restore_value: false
initial_value: ${voice_assist_not_ready_phase_id}
- id: global_first_active_timer
type: voice_assistant::Timer
restore_value: false
- id: global_is_timer_active
type: bool
restore_value: false
- id: global_first_timer
type: voice_assistant::Timer
restore_value: false
- id: global_is_timer
type: bool
restore_value: false
# --- Display images ---
image:
- file: ${error_illustration_file}
id: casita_error
resize: 320x240
type: RGB
transparency: alpha_channel
- file: ${idle_illustration_file}
id: casita_idle
resize: 320x240
type: RGB
transparency: alpha_channel
- file: ${listening_illustration_file}
id: casita_listening
resize: 320x240
type: RGB
transparency: alpha_channel
- file: ${thinking_illustration_file}
id: casita_thinking
resize: 320x240
type: RGB
transparency: alpha_channel
- file: ${replying_illustration_file}
id: casita_replying
resize: 320x240
type: RGB
transparency: alpha_channel
- file: ${timer_finished_illustration_file}
id: casita_timer_finished
resize: 320x240
type: RGB
transparency: alpha_channel
- file: ${loading_illustration_file}
id: casita_initializing
resize: 320x240
type: RGB
transparency: alpha_channel
- file: https://github.com/esphome/wake-word-voice-assistants/raw/main/error_box_illustrations/error-no-wifi.png
id: error_no_wifi
resize: 320x240
type: RGB
transparency: alpha_channel
- file: https://github.com/esphome/wake-word-voice-assistants/raw/main/error_box_illustrations/error-no-ha.png
id: error_no_ha
resize: 320x240
type: RGB
transparency: alpha_channel
# --- Fonts (timer widget only) ---
font:
- file:
type: gfonts
family: ${font_family}
weight: 300
id: font_timer
size: 30
glyphsets:
- ${font_glyphsets}
# --- Colors ---
color:
- id: idle_color
hex: ${idle_illustration_background_color}
- id: listening_color
hex: ${listening_illustration_background_color}
- id: thinking_color
hex: ${thinking_illustration_background_color}
- id: replying_color
hex: ${replying_illustration_background_color}
- id: loading_color
hex: ${loading_illustration_background_color}
- id: error_color
hex: ${error_illustration_background_color}
- id: active_timer_color
hex: "26ed3a"
- id: paused_timer_color
hex: "3b89e3"
# --- SPI + Display ---
spi:
- id: spi_bus
clk_pin: 7
mosi_pin: 6
display:
- platform: ili9xxx
id: s3_box_lcd
model: S3BOX
invert_colors: false
data_rate: 40MHz
cs_pin: 5
dc_pin: 4
reset_pin:
number: 48
inverted: true
update_interval: never
pages:
- id: idle_page
lambda: |-
it.fill(id(idle_color));
it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_idle), ImageAlign::CENTER);
id(draw_timer_timeline).execute();
id(draw_active_timer_widget).execute();
- id: listening_page
lambda: |-
it.fill(id(listening_color));
it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_listening), ImageAlign::CENTER);
id(draw_timer_timeline).execute();
- id: thinking_page
lambda: |-
it.fill(id(thinking_color));
it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_thinking), ImageAlign::CENTER);
id(draw_timer_timeline).execute();
- id: replying_page
lambda: |-
it.fill(id(replying_color));
it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_replying), ImageAlign::CENTER);
id(draw_timer_timeline).execute();
- id: timer_finished_page
lambda: |-
it.fill(id(idle_color));
it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_timer_finished), ImageAlign::CENTER);
- id: error_page
lambda: |-
it.fill(id(error_color));
it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_error), ImageAlign::CENTER);
- id: no_ha_page
lambda: |-
it.image((it.get_width() / 2), (it.get_height() / 2), id(error_no_ha), ImageAlign::CENTER);
- id: no_wifi_page
lambda: |-
it.image((it.get_width() / 2), (it.get_height() / 2), id(error_no_wifi), ImageAlign::CENTER);
- id: initializing_page
lambda: |-
it.fill(id(loading_color));
it.image((it.get_width() / 2), (it.get_height() / 2), id(casita_initializing), ImageAlign::CENTER);
- id: muted_page
lambda: |-
it.fill(Color::BLACK);
id(draw_timer_timeline).execute();
id(draw_active_timer_widget).execute();

Binary files not shown: 7 PNG images added (76-91 KiB each).

View File

@@ -1,76 +1,177 @@
#!/usr/bin/env bash
# homeai-esp32/setup.sh — P6: ESPHome firmware for ESP32-S3-BOX-3
#
# Components:
# - ESPHome — firmware build + flash tool
# - base.yaml — shared device config
# - voice.yaml — Wyoming Satellite + microWakeWord
# - display.yaml — LVGL animated face
# - Per-room configs — s3-box-living-room.yaml, etc.
# Usage:
# ./setup.sh — check environment + validate config
# ./setup.sh flash — compile + flash via USB (first time)
# ./setup.sh ota — compile + flash via OTA (wireless)
# ./setup.sh logs — stream device logs
# ./setup.sh validate — validate YAML without compiling
#
# Prerequisites:
# - P1 (homeai-infra) — Home Assistant running
# - P3 (homeai-voice) — Wyoming STT/TTS running (ports 10300/10301)
# - Python 3.10+
# - USB-C cable for first flash (subsequent updates via OTA)
# - On Linux: ensure user is in the dialout group for USB access
# - ~/homeai-esphome-env — Python 3.12 venv with ESPHome
# - Home Assistant running on 10.0.0.199
# - Wyoming STT/TTS running on Mac Mini (ports 10300/10301)
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_DIR="$(cd "${SCRIPT_DIR}/.." && pwd)"
source "${REPO_DIR}/scripts/common.sh"
ESPHOME_VENV="${HOME}/homeai-esphome-env"
ESPHOME="${ESPHOME_VENV}/bin/esphome"
ESPHOME_DIR="${SCRIPT_DIR}/esphome"
DEFAULT_CONFIG="${ESPHOME_DIR}/homeai-living-room.yaml"
log_section "P6: ESP32 Firmware (ESPHome)"
detect_platform
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
# ─── Prerequisite check ────────────────────────────────────────────────────────
log_info "Checking prerequisites..."
log_info() { echo -e "${BLUE}[INFO]${NC} $*"; }
log_ok() { echo -e "${GREEN}[OK]${NC} $*"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
log_error() { echo -e "${RED}[ERROR]${NC} $*"; }
if ! command_exists python3; then
log_warn "python3 not found — required for ESPHome"
fi
# ─── Environment checks ──────────────────────────────────────────────────────
if ! command_exists esphome; then
log_info "ESPHome not installed. To install: pip install esphome"
fi
check_env() {
local ok=true
if [[ "$OS_TYPE" == "linux" ]]; then
if ! groups "$USER" | grep -q dialout; then
log_warn "User '$USER' not in 'dialout' group — USB flashing may fail."
log_warn "Fix: sudo usermod -aG dialout $USER (then log out and back in)"
log_info "Checking environment..."
# ESPHome venv
if [[ -x "${ESPHOME}" ]]; then
local version
version=$("${ESPHOME}" version 2>/dev/null)
log_ok "ESPHome: ${version}"
else
log_error "ESPHome not found at ${ESPHOME}"
echo " Install: /opt/homebrew/opt/python@3.12/bin/python3.12 -m venv ${ESPHOME_VENV}"
echo " ${ESPHOME_VENV}/bin/pip install 'esphome>=2025.5.0'"
ok=false
fi
fi
# Check P3 dependency
if ! curl -sf http://localhost:8123 -o /dev/null 2>/dev/null; then
log_warn "Home Assistant (P1) not reachable — ESP32 units won't auto-discover"
fi
# secrets.yaml
if [[ -f "${ESPHOME_DIR}/secrets.yaml" ]]; then
if grep -q "YOUR_" "${ESPHOME_DIR}/secrets.yaml" 2>/dev/null; then
log_warn "secrets.yaml contains placeholder values — edit before flashing"
ok=false
else
log_ok "secrets.yaml configured"
fi
else
log_error "secrets.yaml not found at ${ESPHOME_DIR}/secrets.yaml"
ok=false
fi
# ─── TODO: Implementation ──────────────────────────────────────────────────────
cat <<'EOF'
# Config file
if [[ -f "${DEFAULT_CONFIG}" ]]; then
log_ok "Config: $(basename "${DEFAULT_CONFIG}")"
else
log_error "Config not found: ${DEFAULT_CONFIG}"
ok=false
fi
┌─────────────────────────────────────────────────────────────────┐
│ P6: homeai-esp32 — NOT YET IMPLEMENTED │
│ │
│ Implementation steps: │
│ 1. pip install esphome │
│ 2. Create esphome/secrets.yaml (gitignored) │
│ 3. Create esphome/base.yaml (WiFi, API, OTA) │
│ 4. Create esphome/voice.yaml (Wyoming Satellite, wakeword) │
│ 5. Create esphome/display.yaml (LVGL face, 5 states) │
│ 6. Create esphome/animations.yaml (face state scripts) │
│ 7. Create per-room configs (s3-box-living-room.yaml, etc.) │
│ 8. First flash via USB: esphome run esphome/<room>.yaml │
│ 9. Subsequent OTA: esphome upload esphome/<room>.yaml │
│ 10. Add to Home Assistant → assign Wyoming voice pipeline │
│ │
│ Quick flash (once esphome/ is ready): │
│ esphome run esphome/s3-box-living-room.yaml │
│ esphome logs esphome/s3-box-living-room.yaml │
└─────────────────────────────────────────────────────────────────┘
# Illustrations
local illust_dir="${ESPHOME_DIR}/illustrations"
local illust_count
illust_count=$(find "${illust_dir}" -name "*.png" 2>/dev/null | wc -l | tr -d ' ')
if [[ "${illust_count}" -ge 7 ]]; then
log_ok "Illustrations: ${illust_count} PNGs in illustrations/"
else
log_warn "Missing illustrations (found ${illust_count}, need 7)"
fi
EOF
# Wyoming services on Mac Mini
if curl -sf "http://localhost:10300" -o /dev/null 2>/dev/null || nc -z localhost 10300 2>/dev/null; then
log_ok "Wyoming STT (port 10300) reachable"
else
log_warn "Wyoming STT (port 10300) not reachable"
fi
log_info "P6 is not yet implemented. See homeai-esp32/PLAN.md for details."
exit 0
if curl -sf "http://localhost:10301" -o /dev/null 2>/dev/null || nc -z localhost 10301 2>/dev/null; then
log_ok "Wyoming TTS (port 10301) reachable"
else
log_warn "Wyoming TTS (port 10301) not reachable"
fi
# Home Assistant
if curl -sk "https://10.0.0.199:8123" -o /dev/null 2>/dev/null; then
log_ok "Home Assistant (10.0.0.199:8123) reachable"
else
log_warn "Home Assistant not reachable — ESP32 won't be able to connect"
fi
if $ok; then
log_ok "Environment ready"
else
log_warn "Some issues found — fix before flashing"
fi
}
# ─── Commands ─────────────────────────────────────────────────────────────────
cmd_flash() {
local config="${1:-${DEFAULT_CONFIG}}"
log_info "Compiling + flashing via USB: $(basename "${config}")"
log_info "First compile downloads ESP-IDF toolchain (~500MB), takes 5-10 min..."
cd "${ESPHOME_DIR}"
"${ESPHOME}" run "$(basename "${config}")"
}
cmd_ota() {
local config="${1:-${DEFAULT_CONFIG}}"
log_info "Compiling + OTA upload: $(basename "${config}")"
cd "${ESPHOME_DIR}"
"${ESPHOME}" run "$(basename "${config}")"
}
cmd_logs() {
local config="${1:-${DEFAULT_CONFIG}}"
log_info "Streaming logs for: $(basename "${config}")"
cd "${ESPHOME_DIR}"
"${ESPHOME}" logs "$(basename "${config}")"
}
cmd_validate() {
local config="${1:-${DEFAULT_CONFIG}}"
log_info "Validating: $(basename "${config}")"
cd "${ESPHOME_DIR}"
"${ESPHOME}" config "$(basename "${config}")"
log_ok "Config valid"
}
# ─── Main ─────────────────────────────────────────────────────────────────────
case "${1:-}" in
flash)
check_env
echo ""
cmd_flash "${2:-}"
;;
ota)
check_env
echo ""
cmd_ota "${2:-}"
;;
logs)
cmd_logs "${2:-}"
;;
validate)
cmd_validate "${2:-}"
;;
*)
check_env
echo ""
echo "Usage: $0 {flash|ota|logs|validate} [config.yaml]"
echo ""
echo " flash Compile + flash via USB (first time)"
echo " ota Compile + flash via OTA (wireless, after first flash)"
echo " logs Stream device logs"
echo " validate Validate YAML config without compiling"
echo ""
echo "Default config: $(basename "${DEFAULT_CONFIG}")"
;;
esac

219
homeai-images/API_GUIDE.md Normal file
View File

@@ -0,0 +1,219 @@
# GAZE REST API Guide
## Setup
1. Open **Settings** in the GAZE web UI
2. Scroll to **REST API Key** and click **Regenerate**
3. Copy the key — you'll need it for all API requests
## Authentication
Every request must include your API key via one of:
- **Header (recommended):** `X-API-Key: <your-key>`
- **Query parameter:** `?api_key=<your-key>`
Responses for auth failures:
| Status | Meaning |
|--------|---------|
| `401` | Missing API key |
| `403` | Invalid API key |
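
Both styles can be exercised from the shell. A minimal sketch, assuming a local GAZE instance on port 5782 (the key is a placeholder; `|| true` just keeps the sketch from aborting if the server is not running):

```shell
API_KEY="your-key-here"
HOST="http://localhost:5782"   # adjust to your GAZE host

# Header auth (recommended)
curl -s --max-time 5 -H "X-API-Key: ${API_KEY}" "${HOST}/api/v1/presets" || true

# Query-parameter auth
curl -s --max-time 5 "${HOST}/api/v1/presets?api_key=${API_KEY}" || true
```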
## Endpoints
### List Presets
```
GET /api/v1/presets
```
Returns all available presets.
**Response:**
```json
{
"presets": [
{
"preset_id": "example_01",
"slug": "example_01",
"name": "Example Preset",
"has_cover": true
}
]
}
```
### Generate Image
```
POST /api/v1/generate/<preset_slug>
```
Queue one or more image generations using a preset's configuration. All body parameters are optional — when omitted, the preset's own settings are used.
**Request body (JSON):**
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `count` | int | `1` | Number of images to generate (1-20) |
| `checkpoint` | string | — | Override checkpoint path (e.g. `"Illustrious/model.safetensors"`) |
| `extra_positive` | string | `""` | Additional positive prompt tags appended to the generated prompt |
| `extra_negative` | string | `""` | Additional negative prompt tags |
| `seed` | int | random | Fixed seed for reproducible generation |
| `width` | int | — | Output width in pixels (must provide both width and height) |
| `height` | int | — | Output height in pixels (must provide both width and height) |
**Response (202):**
```json
{
"jobs": [
{ "job_id": "783f0268-ba85-4426-8ca2-6393c844c887", "status": "queued" }
]
}
```
**Errors:**
| Status | Cause |
|--------|-------|
| `400` | Invalid parameters (bad count, seed, or mismatched width/height) |
| `404` | Preset slug not found |
| `500` | Internal generation error |
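
A sketch of a request that overrides size and count (the 768x1024 dimensions and `example_01` slug are illustrative values, not part of the API; supplying only one of `width`/`height` would return a `400`):

```shell
API_KEY="your-key-here"
HOST="http://localhost:5782"

# width and height must be supplied together
BODY='{"count": 2, "width": 768, "height": 1024}'
curl -s --max-time 5 -X POST \
  -H "X-API-Key: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d "${BODY}" \
  "${HOST}/api/v1/generate/example_01" || true
```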
### Check Job Status
```
GET /api/v1/job/<job_id>
```
Poll this endpoint to track generation progress.
**Response:**
```json
{
"id": "783f0268-ba85-4426-8ca2-6393c844c887",
"label": "Preset: Example Preset preview",
"status": "done",
"error": null,
"result": {
"image_url": "/static/uploads/presets/example_01/gen_1773601346.png",
"relative_path": "presets/example_01/gen_1773601346.png",
"seed": 927640517599332
}
}
```
**Job statuses:**
| Status | Meaning |
|--------|---------|
| `pending` | Waiting in queue |
| `processing` | Currently generating |
| `done` | Complete — `result` contains image info |
| `failed` | Error occurred — check `error` field |
The `result` object is only present when status is `done`. Use `seed` from the result to reproduce the exact same image later.
**Retrieving the image:** The `image_url` is a path relative to the server root. Fetch it directly:
```
GET http://<host>:5782/static/uploads/presets/example_01/gen_1773601346.png
```
Image retrieval does not require authentication.
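
Putting that together (the path below is the sample value from the response above, not a real file):

```shell
HOST="http://localhost:5782"
IMAGE_PATH="/static/uploads/presets/example_01/gen_1773601346.png"

# No API key needed for static image fetches
curl -s --max-time 5 -o image.png "${HOST}${IMAGE_PATH}" || true
```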
## Examples
### Generate a single image and wait for it
```bash
API_KEY="your-key-here"
HOST="http://localhost:5782"
# Queue generation
JOB_ID=$(curl -s -X POST \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{}' \
"$HOST/api/v1/generate/example_01" | python3 -c "import sys,json; print(json.load(sys.stdin)['jobs'][0]['job_id'])")
echo "Job: $JOB_ID"
# Poll until done
while true; do
RESULT=$(curl -s -H "X-API-Key: $API_KEY" "$HOST/api/v1/job/$JOB_ID")
STATUS=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin)['status'])")
echo "Status: $STATUS"
if [ "$STATUS" = "done" ] || [ "$STATUS" = "failed" ]; then
echo "$RESULT" | python3 -m json.tool
break
fi
sleep 5
done
```
### Generate 3 images with extra prompts
```bash
curl -X POST \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"count": 3,
"extra_positive": "smiling, outdoors",
"extra_negative": "blurry"
}' \
"$HOST/api/v1/generate/example_01"
```
### Reproduce a specific image
```bash
curl -X POST \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{"seed": 927640517599332}' \
"$HOST/api/v1/generate/example_01"
```
### Python example
```python
import requests
import time
HOST = "http://localhost:5782"
API_KEY = "your-key-here"
HEADERS = {"X-API-Key": API_KEY, "Content-Type": "application/json"}
# List presets
presets = requests.get(f"{HOST}/api/v1/presets", headers=HEADERS).json()
print(f"Available presets: {[p['name'] for p in presets['presets']]}")
# Generate
resp = requests.post(
f"{HOST}/api/v1/generate/{presets['presets'][0]['slug']}",
headers=HEADERS,
json={"count": 1},
).json()
job_id = resp["jobs"][0]["job_id"]
print(f"Queued job: {job_id}")
# Poll
while True:
status = requests.get(f"{HOST}/api/v1/job/{job_id}", headers=HEADERS).json()
print(f"Status: {status['status']}")
if status["status"] in ("done", "failed"):
break
time.sleep(5)
if status["status"] == "done":
image_url = f"{HOST}{status['result']['image_url']}"
print(f"Image: {image_url}")
print(f"Seed: {status['result']['seed']}")
```

View File

@@ -85,6 +85,7 @@ services:
- homeai.service=n8n
- homeai.url=http://localhost:5678
networks:
homeai:
external: true

View File

@@ -12,17 +12,27 @@
<string>/Users/aodhan/gitea/homeai/homeai-llm/scripts/preload-models.sh</string>
</array>
<key>EnvironmentVariables</key>
<dict>
<!-- Override to change which medium model stays warm -->
<key>HOMEAI_MEDIUM_MODEL</key>
<string>qwen3.5:35b-a3b</string>
</dict>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/homeai-preload-models.log</string>
<key>StandardErrorPath</key>
<string>/tmp/homeai-preload-models-error.log</string>
<!-- Delay 15s to let Ollama start first -->
<!-- If the script exits/crashes, wait 30s before restarting -->
<key>ThrottleInterval</key>
<integer>15</integer>
<integer>30</integer>
</dict>
</plist>

View File

@@ -1,19 +1,73 @@
#!/bin/bash
# Pre-load voice pipeline models into Ollama with infinite keep_alive.
# Run after Ollama starts (called by launchd or manually).
# Keep voice pipeline models warm in Ollama VRAM.
# Runs as a loop — checks every 5 minutes, re-pins any model that got evicted.
# Only pins lightweight/MoE models — large dense models (70B) use default expiry.
OLLAMA_URL="http://localhost:11434"
CHECK_INTERVAL=300 # seconds between checks
# Wait for Ollama to be ready
for i in $(seq 1 30); do
curl -sf "$OLLAMA_URL/api/tags" > /dev/null 2>&1 && break
sleep 2
# Medium model can be overridden via env var (e.g. by persona config)
HOMEAI_MEDIUM_MODEL="${HOMEAI_MEDIUM_MODEL:-qwen3.5:35b-a3b}"
# Models to keep warm: "name|description"
MODELS=(
"qwen2.5:7b|small (4.7GB) — fast fallback"
"${HOMEAI_MEDIUM_MODEL}|medium — persona default"
)
wait_for_ollama() {
for i in $(seq 1 30); do
curl -sf "$OLLAMA_URL/api/tags" > /dev/null 2>&1 && return 0
sleep 2
done
return 1
}
is_model_loaded() {
local model="$1"
curl -sf "$OLLAMA_URL/api/ps" 2>/dev/null \
| python3 -c "
import json, sys
data = json.load(sys.stdin)
names = [m['name'] for m in data.get('models', [])]
sys.exit(0 if '$model' in names else 1)
" 2>/dev/null
}
pin_model() {
local model="$1"
local desc="$2"
if is_model_loaded "$model"; then
echo "[keepwarm] $model already loaded — skipping"
return 0
fi
echo "[keepwarm] Loading $model ($desc) with keep_alive=-1..."
curl -sf "$OLLAMA_URL/api/generate" \
-d "{\"model\":\"$model\",\"prompt\":\"ready\",\"stream\":false,\"keep_alive\":-1,\"options\":{\"num_ctx\":512}}" \
> /dev/null 2>&1
if [ $? -eq 0 ]; then
echo "[keepwarm] $model pinned in VRAM"
else
echo "[keepwarm] ERROR: failed to load $model"
fi
}
# --- Main loop ---
echo "[keepwarm] Starting model keep-warm daemon (interval: ${CHECK_INTERVAL}s)"
# Initial wait for Ollama
if ! wait_for_ollama; then
echo "[keepwarm] ERROR: Ollama not reachable after 60s, exiting"
exit 1
fi
echo "[keepwarm] Ollama is online"
while true; do
for entry in "${MODELS[@]}"; do
IFS='|' read -r model desc <<< "$entry"
pin_model "$model" "$desc"
done
sleep "$CHECK_INTERVAL"
done
# Pin qwen3.5:35b-a3b (MoE, 38.7GB VRAM, voice pipeline default)
echo "[preload] Loading qwen3.5:35b-a3b with keep_alive=-1..."
curl -sf "$OLLAMA_URL/api/generate" \
-d '{"model":"qwen3.5:35b-a3b","prompt":"ready","stream":false,"keep_alive":-1,"options":{"num_ctx":512}}' \
> /dev/null 2>&1
echo "[preload] qwen3.5:35b-a3b pinned in memory"

159
homeai-rpi/deploy.sh Executable file
View File

@@ -0,0 +1,159 @@
#!/usr/bin/env bash
# homeai-rpi/deploy.sh — Deploy/manage Wyoming Satellite on Raspberry Pi from Mac Mini
#
# Usage:
# ./deploy.sh — full setup (push + install on Pi)
# ./deploy.sh --status — check satellite status
# ./deploy.sh --restart — restart satellite service
# ./deploy.sh --logs — tail satellite logs
# ./deploy.sh --test-audio — record 3s from mic, play back through speaker
# ./deploy.sh --update — update Python packages only
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# ─── Pi connection ──────────────────────────────────────────────────────────
PI_HOST="SELBINA.local"
PI_USER="aodhan"
PI_SSH="${PI_USER}@${PI_HOST}"
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
CYAN='\033[0;36m'
NC='\033[0m'
log_info() { echo -e "${BLUE}[INFO]${NC} $*"; }
log_ok() { echo -e "${GREEN}[OK]${NC} $*"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
log_error() { echo -e "${RED}[ERROR]${NC} $*"; exit 1; }
log_step() { echo -e "${CYAN}[STEP]${NC} $*"; }
# ─── SSH helpers ────────────────────────────────────────────────────────────
pi_ssh() {
ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=accept-new "${PI_SSH}" "$@"
}
pi_scp() {
scp -o ConnectTimeout=5 -o StrictHostKeyChecking=accept-new "$@"
}
check_connectivity() {
log_step "Checking connectivity to ${PI_HOST}..."
if ! ping -c 1 -t 3 "${PI_HOST}" &>/dev/null; then
log_error "Cannot reach ${PI_HOST}. Is the Pi on?"
fi
if ! pi_ssh "echo ok" &>/dev/null; then
log_error "SSH to ${PI_SSH} failed. Set up SSH keys:
ssh-copy-id ${PI_SSH}"
fi
log_ok "Connected to ${PI_HOST}"
}
# ─── Commands ───────────────────────────────────────────────────────────────
cmd_setup() {
check_connectivity
log_step "Pushing setup script to Pi..."
pi_scp "${SCRIPT_DIR}/setup.sh" "${PI_SSH}:~/homeai-satellite-setup.sh"
log_step "Running setup on Pi..."
pi_ssh "chmod +x ~/homeai-satellite-setup.sh && ~/homeai-satellite-setup.sh"
log_ok "Setup complete!"
}
cmd_status() {
check_connectivity
log_step "Satellite status:"
pi_ssh "systemctl status homeai-satellite.service --no-pager" || true
}
cmd_restart() {
check_connectivity
log_step "Restarting satellite..."
pi_ssh "sudo systemctl restart homeai-satellite.service"
sleep 2
pi_ssh "systemctl is-active homeai-satellite.service" && log_ok "Satellite running" || log_warn "Satellite not active"
}
cmd_logs() {
check_connectivity
log_info "Tailing satellite logs (Ctrl+C to stop)..."
pi_ssh "journalctl -u homeai-satellite.service -f --no-hostname"
}
cmd_test_audio() {
check_connectivity
log_step "Recording 3 seconds from mic..."
pi_ssh "arecord -D plughw:2,0 -d 3 -f S16_LE -r 16000 -c 1 /tmp/homeai-test.wav 2>/dev/null"
log_step "Playing back through speaker..."
pi_ssh "aplay -D plughw:2,0 /tmp/homeai-test.wav 2>/dev/null"
log_ok "Audio test complete. Did you hear yourself?"
}
cmd_update() {
check_connectivity
log_step "Updating Python packages on Pi..."
pi_ssh "source ~/homeai-satellite/venv/bin/activate && pip install --upgrade wyoming-satellite openwakeword -q"
log_step "Pushing latest scripts..."
pi_scp "${SCRIPT_DIR}/satellite_wrapper.py" "${PI_SSH}:~/homeai-satellite/satellite_wrapper.py"
pi_ssh "sudo systemctl restart homeai-satellite.service"
log_ok "Updated and restarted"
}
cmd_push_wrapper() {
check_connectivity
log_step "Pushing satellite_wrapper.py..."
pi_scp "${SCRIPT_DIR}/satellite_wrapper.py" "${PI_SSH}:~/homeai-satellite/satellite_wrapper.py"
log_step "Restarting satellite..."
pi_ssh "sudo systemctl restart homeai-satellite.service"
sleep 2
pi_ssh "systemctl is-active homeai-satellite.service" && log_ok "Satellite running" || log_warn "Satellite not active — check logs"
}
cmd_test_logs() {
check_connectivity
log_info "Filtered satellite logs — key events only (Ctrl+C to stop)..."
pi_ssh "journalctl -u homeai-satellite.service -f --no-hostname" \
| grep --line-buffered -iE \
'Waiting for wake|Streaming audio|transcript|synthesize|Speaker active|unmute|_writer|timeout|error|Error|Wake word detected|re-arming|resetting'
}
# ─── Main ───────────────────────────────────────────────────────────────────
case "${1:-}" in
--status) cmd_status ;;
--restart) cmd_restart ;;
--logs) cmd_logs ;;
--test-audio) cmd_test_audio ;;
--test-logs) cmd_test_logs ;;
--update) cmd_update ;;
--push-wrapper) cmd_push_wrapper ;;
--help|-h)
echo "Usage: $0 [command]"
echo ""
echo "Commands:"
echo " (none) Full setup — push and install satellite on Pi"
echo " --status Check satellite service status"
echo " --restart Restart satellite service"
echo " --logs Tail satellite logs (live, all)"
echo " --test-logs Tail filtered logs (key events only)"
echo " --test-audio Record 3s from mic, play back on speaker"
echo " --push-wrapper Push satellite_wrapper.py and restart (fast iteration)"
echo " --update Update packages and restart"
echo " --help Show this help"
echo ""
echo "Pi: ${PI_SSH} (${PI_HOST})"
;;
"") cmd_setup ;;
*) log_error "Unknown command: $1. Use --help for usage." ;;
esac

View File

@@ -0,0 +1,203 @@
#!/usr/bin/env python3
"""Wyoming Satellite wrapper — echo suppression, writer resilience, streaming timeout.
Monkey-patches WakeStreamingSatellite to fix four compounding bugs that cause
the satellite to freeze after the first voice command:
1. TTS Echo: Mic picks up speaker audio → false wake word trigger → Whisper
hallucinates on silence. Fix: mute mic→wake forwarding while speaker is active.
2. Server Writer Race: HA disconnects after first command, _writer becomes None.
If wake word fires before HA reconnects, _send_run_pipeline() silently drops
the event → satellite stuck in is_streaming=True forever.
Fix: check _writer before entering streaming mode; re-arm wake if no server.
3. No Streaming Timeout: Once stuck in streaming mode, there's no recovery.
Fix: auto-reset after 30s if no Transcript arrives.
4. Error events don't reset streaming state in upstream code.
Fix: reset is_streaming on Error events from server.
Usage: python3 satellite_wrapper.py <same args as wyoming_satellite>
"""
import asyncio
import logging
import time
from wyoming.audio import AudioChunk, AudioStart, AudioStop
from wyoming.error import Error
from wyoming.wake import Detection
from wyoming_satellite.satellite import WakeStreamingSatellite
_LOGGER = logging.getLogger()
# ─── Tuning constants ────────────────────────────────────────────────────────
# How long to keep wake muted after the last AudioStop from the server.
# Must be long enough for sox→aplay buffer to drain (~1-2s) plus audio decay.
_GRACE_SECONDS = 5.0
# Safety valve — unmute even if no AudioStop arrives (e.g. long TTS response).
_MAX_MUTE_SECONDS = 45.0
# Max time in streaming mode without receiving a Transcript or Error.
# Prevents permanent freeze if server never responds.
_STREAMING_TIMEOUT = 30.0
# ─── Save original methods ───────────────────────────────────────────────────
_orig_event_from_server = WakeStreamingSatellite.event_from_server
_orig_event_from_mic = WakeStreamingSatellite.event_from_mic
_orig_event_from_wake = WakeStreamingSatellite.event_from_wake
_orig_trigger_detection = WakeStreamingSatellite.trigger_detection
_orig_trigger_transcript = WakeStreamingSatellite.trigger_transcript
# ─── Patch A: Mute wake on awake.wav ─────────────────────────────────────────
async def _patched_trigger_detection(self, detection):
"""Mute wake word detection when awake.wav starts playing."""
self._speaker_mute_start = time.monotonic()
self._speaker_active = True
_LOGGER.debug("Speaker active (awake.wav) — wake detection muted")
await _orig_trigger_detection(self, detection)
# ─── Patch B: Mute wake on done.wav ──────────────────────────────────────────
async def _patched_trigger_transcript(self, transcript):
"""Keep muted through done.wav playback."""
self._speaker_active = True
_LOGGER.debug("Speaker active (done.wav) — wake detection muted")
await _orig_trigger_transcript(self, transcript)
# ─── Patch C: Echo tracking + error recovery ─────────────────────────────────
async def _patched_event_from_server(self, event):
"""Track TTS audio for echo suppression; reset streaming on errors."""
# Echo suppression: track when speaker is active
if AudioStart.is_type(event.type):
self._speaker_active = True
self._speaker_mute_start = time.monotonic()
_LOGGER.debug("Speaker active (TTS) — wake detection muted")
elif AudioStop.is_type(event.type):
self._speaker_unmute_at = time.monotonic() + _GRACE_SECONDS
_LOGGER.debug(
"TTS finished — will unmute wake in %.1fs", _GRACE_SECONDS
)
# Error recovery: reset streaming state if server reports an error
if Error.is_type(event.type) and self.is_streaming:
_LOGGER.warning("Error from server while streaming — resetting")
self.is_streaming = False
# Call original handler (plays done.wav, forwards TTS audio, etc.)
await _orig_event_from_server(self, event)
# After original handler: if Error arrived, re-arm wake detection
if Error.is_type(event.type) and not self.is_streaming:
await self.trigger_streaming_stop()
await self._send_wake_detect()
_LOGGER.info("Waiting for wake word (after error)")
# ─── Patch D: Echo suppression + streaming timeout ───────────────────────────
async def _patched_event_from_mic(self, event, audio_bytes=None):
"""Drop mic audio during speaker playback; timeout stuck streaming."""
# --- Streaming timeout ---
if self.is_streaming:
elapsed = time.monotonic() - getattr(self, "_streaming_start_time", 0)
if elapsed > _STREAMING_TIMEOUT:
_LOGGER.warning(
"Streaming timeout (%.0fs) — no Transcript received, resetting",
elapsed,
)
self.is_streaming = False
# Tell server we're done sending audio
await self.event_to_server(AudioStop().event())
await self.trigger_streaming_stop()
await self._send_wake_detect()
_LOGGER.info("Waiting for wake word (after timeout)")
return
# --- Echo suppression ---
if getattr(self, "_speaker_active", False) and not self.is_streaming:
now = time.monotonic()
# Check if grace period has elapsed after AudioStop
unmute_at = getattr(self, "_speaker_unmute_at", None)
if unmute_at and now >= unmute_at:
self._speaker_active = False
self._speaker_unmute_at = None
_LOGGER.debug("Wake detection unmuted (grace period elapsed)")
# Safety valve — don't stay muted forever
elif now - getattr(self, "_speaker_mute_start", now) > _MAX_MUTE_SECONDS:
self._speaker_active = False
self._speaker_unmute_at = None
_LOGGER.warning("Wake detection force-unmuted (max mute timeout)")
elif AudioChunk.is_type(event.type):
# Drop this mic chunk — don't feed speaker audio to wake word
return
await _orig_event_from_mic(self, event, audio_bytes)
# ─── Patch E: Writer check before streaming (THE CRITICAL FIX) ───────────────
async def _patched_event_from_wake(self, event):
"""Check server connection before entering streaming mode."""
if self.is_streaming:
return
if Detection.is_type(event.type):
# THE FIX: If no server connection, don't enter streaming mode.
# Without this, _send_run_pipeline() silently drops the RunPipeline
# event, and the satellite is stuck in is_streaming=True forever.
if self._writer is None:
_LOGGER.warning(
"Wake word detected but no server connection — re-arming"
)
await self._send_wake_detect()
return
self.is_streaming = True
self._streaming_start_time = time.monotonic()
_LOGGER.debug("Streaming audio")
await self._send_run_pipeline()
await self.forward_event(event)
await self.trigger_detection(Detection.from_event(event))
await self.trigger_streaming_start()
# ─── Apply patches ───────────────────────────────────────────────────────────
WakeStreamingSatellite.event_from_server = _patched_event_from_server
WakeStreamingSatellite.event_from_mic = _patched_event_from_mic
WakeStreamingSatellite.event_from_wake = _patched_event_from_wake
WakeStreamingSatellite.trigger_detection = _patched_trigger_detection
WakeStreamingSatellite.trigger_transcript = _patched_trigger_transcript
# Instance attributes (set as class defaults so they exist before __init__)
WakeStreamingSatellite._speaker_active = False
WakeStreamingSatellite._speaker_unmute_at = None
WakeStreamingSatellite._speaker_mute_start = 0.0
WakeStreamingSatellite._streaming_start_time = 0.0
# ─── Run the original main ───────────────────────────────────────────────────
if __name__ == "__main__":
from wyoming_satellite.__main__ import main
try:
asyncio.run(main())
except KeyboardInterrupt:
pass
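The grace-period logic in Patch D above is the subtle part of the wrapper. A standalone sketch of the same mute / grace / force-unmute transitions — class and method names here are hypothetical, not part of the wrapper itself:

```python
GRACE_SECONDS = 5.0        # same tuning as _GRACE_SECONDS above
MAX_MUTE_SECONDS = 45.0    # same tuning as _MAX_MUTE_SECONDS above

class EchoGate:
    """Tracks speaker activity and decides whether a mic chunk is an echo."""

    def __init__(self):
        self.speaker_active = False
        self.unmute_at = None    # set when AudioStop arrives (grace deadline)
        self.mute_start = 0.0    # set when AudioStart arrives

    def on_audio_start(self, now: float) -> None:
        self.speaker_active = True
        self.mute_start = now

    def on_audio_stop(self, now: float) -> None:
        self.unmute_at = now + GRACE_SECONDS

    def should_drop(self, now: float) -> bool:
        """True if a mic chunk arriving at `now` should be discarded."""
        if not self.speaker_active:
            return False
        if self.unmute_at is not None and now >= self.unmute_at:
            self.speaker_active = False   # grace period elapsed — unmute
            self.unmute_at = None
            return False
        if now - self.mute_start > MAX_MUTE_SECONDS:
            self.speaker_active = False   # safety valve — never mute forever
            self.unmute_at = None
            return False
        return True
```

Driving this with synthetic timestamps makes it easy to confirm that chunks are dropped during playback and through the grace window, but never past the safety valve.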

527
homeai-rpi/setup.sh Executable file
View File

@@ -0,0 +1,527 @@
#!/usr/bin/env bash
# homeai-rpi/setup.sh — Bootstrap a Raspberry Pi as a Wyoming Satellite
#
# Run this ON the Pi (or push via deploy.sh from Mac Mini):
# curl -sL http://10.0.0.101:3000/aodhan/homeai/raw/branch/main/homeai-rpi/setup.sh | bash
# — or —
# ./setup.sh
#
# Prerequisites:
# - Raspberry Pi 5 with Raspberry Pi OS (Bookworm)
# - ReSpeaker 2-Mics pHAT installed and driver loaded (card shows in aplay -l)
# - Network connectivity to Mac Mini (10.0.0.101)
set -euo pipefail
# ─── Configuration ──────────────────────────────────────────────────────────
SATELLITE_NAME="homeai-kitchen"
SATELLITE_AREA="Kitchen"
MAC_MINI_IP="10.0.0.101"
# ReSpeaker 2-Mics pHAT — card 2 on Pi 5
# Using plughw for automatic format conversion (sample rate, channels)
MIC_DEVICE="plughw:2,0"
SPK_DEVICE="plughw:2,0"
# Wyoming satellite port (unique per satellite if running multiple)
SATELLITE_PORT="10700"
# Directories
INSTALL_DIR="${HOME}/homeai-satellite"
VENV_DIR="${INSTALL_DIR}/venv"
SOUNDS_DIR="${INSTALL_DIR}/sounds"
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
CYAN='\033[0;36m'
NC='\033[0m'
log_info() { echo -e "${BLUE}[INFO]${NC} $*"; }
log_ok() { echo -e "${GREEN}[OK]${NC} $*"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
log_error() { echo -e "${RED}[ERROR]${NC} $*"; exit 1; }
log_step() { echo -e "${CYAN}[STEP]${NC} $*"; }
# ─── Preflight checks ──────────────────────────────────────────────────────
log_step "Preflight checks..."
# Check we're on a Pi
if ! grep -qi "raspberry\|bcm" /proc/cpuinfo 2>/dev/null; then
log_warn "This doesn't look like a Raspberry Pi — proceeding anyway"
fi
# Check ReSpeaker is available
if ! aplay -l 2>/dev/null | grep -q "seeed-2mic-voicecard"; then
log_error "ReSpeaker 2-Mics pHAT not found in aplay -l. Is the driver loaded?"
fi
log_ok "ReSpeaker 2-Mics pHAT detected"
# Check Python 3
if ! command -v python3 &>/dev/null; then
log_error "python3 not found. Install with: sudo apt install python3 python3-venv python3-pip"
fi
log_ok "Python $(python3 --version | cut -d' ' -f2)"
# ─── Install system dependencies ───────────────────────────────────────────
log_step "Installing system dependencies..."
sudo apt-get update -qq
# Allow non-zero exit — pre-existing DKMS/kernel issues (e.g. seeed-voicecard
# failing to build against a pending kernel update) can cause apt to return
# errors even though our packages installed successfully.
sudo apt-get install -y -qq \
python3-venv \
python3-pip \
alsa-utils \
sox \
libsox-fmt-all \
libopenblas0 \
2>/dev/null || log_warn "apt-get returned errors (likely pre-existing kernel/DKMS issue — continuing)"
# Verify the packages we actually need are present
for cmd in sox arecord aplay; do
command -v "$cmd" &>/dev/null || log_error "${cmd} not found after install"
done
log_ok "System dependencies installed"
# ─── Create install directory ───────────────────────────────────────────────
log_step "Setting up ${INSTALL_DIR}..."
mkdir -p "${INSTALL_DIR}" "${SOUNDS_DIR}"
# ─── Create Python venv ────────────────────────────────────────────────────
if [[ ! -d "${VENV_DIR}" ]]; then
log_step "Creating Python virtual environment..."
python3 -m venv "${VENV_DIR}"
fi
source "${VENV_DIR}/bin/activate"
pip install --upgrade pip setuptools wheel -q
# ─── Install Wyoming Satellite + openWakeWord ──────────────────────────────
log_step "Installing Wyoming Satellite..."
pip install wyoming-satellite -q
log_step "Installing openWakeWord..."
pip install openwakeword -q
log_step "Installing numpy..."
pip install numpy -q
log_ok "All Python packages installed"
# ─── Copy wakeword command script ──────────────────────────────────────────
log_step "Installing wake word detection script..."
cat > "${INSTALL_DIR}/wakeword_command.py" << 'PYEOF'
#!/usr/bin/env python3
"""Wake word detection command for Wyoming Satellite.
The satellite feeds raw 16kHz 16-bit mono audio via stdin.
This script reads that audio, runs openWakeWord, and prints
the wake word name to stdout when detected.
Usage (called by wyoming-satellite --wake-command):
python wakeword_command.py [--wake-word hey_jarvis] [--threshold 0.5]
"""
import argparse
import sys
import numpy as np
import logging
_LOGGER = logging.getLogger(__name__)
SAMPLE_RATE = 16000
CHUNK_SIZE = 1280 # ~80ms at 16kHz — recommended by openWakeWord
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--wake-word", default="hey_jarvis")
parser.add_argument("--threshold", type=float, default=0.5)
parser.add_argument("--cooldown", type=float, default=3.0)
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
logging.basicConfig(
level=logging.DEBUG if args.debug else logging.WARNING,
format="%(asctime)s %(levelname)s %(message)s",
stream=sys.stderr,
)
import openwakeword
from openwakeword.model import Model
oww = Model(
wakeword_models=[args.wake_word],
inference_framework="onnx",
)
import time
last_trigger = 0.0
bytes_per_chunk = CHUNK_SIZE * 2 # 16-bit = 2 bytes per sample
_LOGGER.debug("Wake word command ready, reading audio from stdin")
try:
while True:
raw = sys.stdin.buffer.read(bytes_per_chunk)
if not raw:
break
if len(raw) < bytes_per_chunk:
raw = raw + b'\x00' * (bytes_per_chunk - len(raw))
chunk = np.frombuffer(raw, dtype=np.int16)
oww.predict(chunk)
for ww, scores in oww.prediction_buffer.items():
score = scores[-1] if scores else 0.0
if score >= args.threshold:
now = time.time()
if now - last_trigger >= args.cooldown:
last_trigger = now
print(ww, flush=True)
_LOGGER.debug("Wake word detected: %s (score=%.3f)", ww, score)
except (KeyboardInterrupt, BrokenPipeError):
pass
if __name__ == "__main__":
main()
PYEOF
chmod +x "${INSTALL_DIR}/wakeword_command.py"
log_ok "Wake word script installed"
# ─── Copy satellite wrapper ──────────────────────────────────────────────
log_step "Installing satellite wrapper (echo suppression + writer resilience)..."
cat > "${INSTALL_DIR}/satellite_wrapper.py" << 'WRAPEOF'
#!/usr/bin/env python3
"""Wyoming Satellite wrapper — echo suppression, writer resilience, streaming timeout.
Monkey-patches WakeStreamingSatellite to fix four compounding bugs that cause
the satellite to freeze after the first voice command:
1. TTS Echo: Mic picks up speaker audio → false wake word trigger
2. Server Writer Race: _writer is None when wake word fires → silent drop
3. No Streaming Timeout: stuck in is_streaming=True forever
4. Error events don't reset streaming state in upstream code
"""
import asyncio
import logging
import time
from wyoming.audio import AudioChunk, AudioStart, AudioStop
from wyoming.error import Error
from wyoming.wake import Detection
from wyoming_satellite.satellite import WakeStreamingSatellite
_LOGGER = logging.getLogger()
_GRACE_SECONDS = 5.0
_MAX_MUTE_SECONDS = 45.0
_STREAMING_TIMEOUT = 30.0
_orig_event_from_server = WakeStreamingSatellite.event_from_server
_orig_event_from_mic = WakeStreamingSatellite.event_from_mic
_orig_event_from_wake = WakeStreamingSatellite.event_from_wake
_orig_trigger_detection = WakeStreamingSatellite.trigger_detection
_orig_trigger_transcript = WakeStreamingSatellite.trigger_transcript
async def _patched_trigger_detection(self, detection):
self._speaker_mute_start = time.monotonic()
self._speaker_active = True
_LOGGER.debug("Speaker active (awake.wav) — wake detection muted")
await _orig_trigger_detection(self, detection)
async def _patched_trigger_transcript(self, transcript):
self._speaker_active = True
_LOGGER.debug("Speaker active (done.wav) — wake detection muted")
await _orig_trigger_transcript(self, transcript)
async def _patched_event_from_server(self, event):
if AudioStart.is_type(event.type):
self._speaker_active = True
self._speaker_mute_start = time.monotonic()
_LOGGER.debug("Speaker active (TTS) — wake detection muted")
elif AudioStop.is_type(event.type):
self._speaker_unmute_at = time.monotonic() + _GRACE_SECONDS
_LOGGER.debug("TTS finished — will unmute wake in %.1fs", _GRACE_SECONDS)
if Error.is_type(event.type) and self.is_streaming:
_LOGGER.warning("Error from server while streaming — resetting")
self.is_streaming = False
await _orig_event_from_server(self, event)
if Error.is_type(event.type) and not self.is_streaming:
await self.trigger_streaming_stop()
await self._send_wake_detect()
_LOGGER.info("Waiting for wake word (after error)")
async def _patched_event_from_mic(self, event, audio_bytes=None):
if self.is_streaming:
elapsed = time.monotonic() - getattr(self, "_streaming_start_time", 0)
if elapsed > _STREAMING_TIMEOUT:
_LOGGER.warning(
"Streaming timeout (%.0fs) — no Transcript received, resetting",
elapsed,
)
self.is_streaming = False
await self.event_to_server(AudioStop().event())
await self.trigger_streaming_stop()
await self._send_wake_detect()
_LOGGER.info("Waiting for wake word (after timeout)")
return
if getattr(self, "_speaker_active", False) and not self.is_streaming:
now = time.monotonic()
unmute_at = getattr(self, "_speaker_unmute_at", None)
if unmute_at and now >= unmute_at:
self._speaker_active = False
self._speaker_unmute_at = None
_LOGGER.debug("Wake detection unmuted (grace period elapsed)")
elif now - getattr(self, "_speaker_mute_start", now) > _MAX_MUTE_SECONDS:
self._speaker_active = False
self._speaker_unmute_at = None
_LOGGER.warning("Wake detection force-unmuted (max mute timeout)")
elif AudioChunk.is_type(event.type):
return
await _orig_event_from_mic(self, event, audio_bytes)
async def _patched_event_from_wake(self, event):
if self.is_streaming:
return
if Detection.is_type(event.type):
if self._writer is None:
_LOGGER.warning(
"Wake word detected but no server connection — re-arming"
)
await self._send_wake_detect()
return
self.is_streaming = True
self._streaming_start_time = time.monotonic()
_LOGGER.debug("Streaming audio")
await self._send_run_pipeline()
await self.forward_event(event)
await self.trigger_detection(Detection.from_event(event))
await self.trigger_streaming_start()
WakeStreamingSatellite.event_from_server = _patched_event_from_server
WakeStreamingSatellite.event_from_mic = _patched_event_from_mic
WakeStreamingSatellite.event_from_wake = _patched_event_from_wake
WakeStreamingSatellite.trigger_detection = _patched_trigger_detection
WakeStreamingSatellite.trigger_transcript = _patched_trigger_transcript
WakeStreamingSatellite._speaker_active = False
WakeStreamingSatellite._speaker_unmute_at = None
WakeStreamingSatellite._speaker_mute_start = 0.0
WakeStreamingSatellite._streaming_start_time = 0.0
if __name__ == "__main__":
from wyoming_satellite.__main__ import main
try:
asyncio.run(main())
except KeyboardInterrupt:
pass
WRAPEOF
chmod +x "${INSTALL_DIR}/satellite_wrapper.py"
log_ok "Satellite wrapper installed"
# ─── Download wake word model ──────────────────────────────────────────────
log_step "Downloading hey_jarvis wake word model..."
"${VENV_DIR}/bin/python3" -c "
import openwakeword
openwakeword.utils.download_models(model_names=['hey_jarvis'])
print('Model downloaded')
" 2>&1 | grep -v "device_discovery"
log_ok "Wake word model ready"
# ─── Create mic capture wrapper ────────────────────────────────────────────
log_step "Creating mic capture wrapper (stereo → mono conversion)..."
cat > "${INSTALL_DIR}/mic-capture.sh" << 'MICEOF'
#!/bin/bash
# Record stereo from ReSpeaker WM8960, convert to mono 16kHz 16-bit for Wyoming
arecord -D plughw:2,0 -r 16000 -c 2 -f S16_LE -t raw -q - | sox -t raw -r 16000 -c 2 -b 16 -e signed-integer - -t raw -r 16000 -c 1 -b 16 -e signed-integer -
MICEOF
chmod +x "${INSTALL_DIR}/mic-capture.sh"
log_ok "Mic capture wrapper installed"
# ─── Create speaker playback wrapper ──────────────────────────────────────
log_step "Creating speaker playback wrapper (mono → stereo conversion)..."
cat > "${INSTALL_DIR}/speaker-playback.sh" << 'SPKEOF'
#!/bin/bash
# Convert mono 24kHz 16-bit input to stereo for WM8960 playback
sox -t raw -r 24000 -c 1 -b 16 -e signed-integer - -t raw -r 24000 -c 2 -b 16 -e signed-integer - | aplay -D plughw:2,0 -r 24000 -c 2 -f S16_LE -t raw -q -
SPKEOF
chmod +x "${INSTALL_DIR}/speaker-playback.sh"
log_ok "Speaker playback wrapper installed"
# ─── Fix ReSpeaker overlay for Pi 5 ────────────────────────────────────────
log_step "Configuring wm8960-soundcard overlay (Pi 5 compatible)..."
# Disable the seeed-voicecard service (loads wrong overlay for Pi 5)
if systemctl is-enabled seeed-voicecard.service &>/dev/null; then
sudo systemctl disable seeed-voicecard.service 2>/dev/null || true
log_info "Disabled seeed-voicecard service"
fi
# Add upstream wm8960-soundcard overlay to config.txt if not present
if ! grep -q "dtoverlay=wm8960-soundcard" /boot/firmware/config.txt 2>/dev/null; then
sudo bash -c 'echo "dtoverlay=wm8960-soundcard" >> /boot/firmware/config.txt'
log_info "Added wm8960-soundcard overlay to /boot/firmware/config.txt"
fi
# Load overlay now if not already active
if ! dtoverlay -l 2>/dev/null | grep -q wm8960-soundcard; then
sudo dtoverlay -r seeed-2mic-voicecard 2>/dev/null || true
sudo dtoverlay wm8960-soundcard 2>/dev/null || true
fi
log_ok "Audio overlay configured"
# ─── Generate feedback sounds ──────────────────────────────────────────────
log_step "Generating feedback sounds..."
# Must be plain 16-bit PCM WAV — Python wave module can't read WAVE_FORMAT_EXTENSIBLE
# Awake chime — short rising tone
sox -n -r 16000 -b 16 -c 1 -e signed-integer "${SOUNDS_DIR}/awake.wav" \
synth 0.15 sin 800 fade t 0.01 0.15 0.05 \
vol 0.5 \
2>/dev/null || log_warn "Could not generate awake.wav (sox issue)"
# Done chime — short falling tone
sox -n -r 16000 -b 16 -c 1 -e signed-integer "${SOUNDS_DIR}/done.wav" \
synth 0.15 sin 600 fade t 0.01 0.15 0.05 \
vol 0.5 \
2>/dev/null || log_warn "Could not generate done.wav (sox issue)"
log_ok "Feedback sounds ready"
# ─── Set ALSA mixer defaults ───────────────────────────────────────────────
log_step "Configuring ALSA mixer for ReSpeaker..."
# Playback — 80% volume, unmute
amixer -c 2 sset 'Playback' 80% unmute 2>/dev/null || true
amixer -c 2 sset 'Speaker' 80% unmute 2>/dev/null || true
# Capture — max out capture volume
amixer -c 2 sset 'Capture' 100% cap 2>/dev/null || true
# Enable mic input boost (critical — without this, signal is near-silent)
amixer -c 2 cset name='Left Input Mixer Boost Switch' on 2>/dev/null || true
amixer -c 2 cset name='Right Input Mixer Boost Switch' on 2>/dev/null || true
# Mic preamp boost to +13dB (setting 1 on the 0-3 scale — higher settings cause clipping)
amixer -c 2 cset name='Left Input Boost Mixer LINPUT1 Volume' 1 2>/dev/null || true
amixer -c 2 cset name='Right Input Boost Mixer RINPUT1 Volume' 1 2>/dev/null || true
# ADC capture volume — moderate to avoid clipping (max=255)
amixer -c 2 cset name='ADC PCM Capture Volume' 180,180 2>/dev/null || true
log_ok "ALSA mixer configured"
# ─── Install systemd service ───────────────────────────────────────────────
log_step "Installing systemd service..."
sudo tee /etc/systemd/system/homeai-satellite.service > /dev/null << SVCEOF
[Unit]
Description=HomeAI Wyoming Satellite (${SATELLITE_AREA})
After=network-online.target sound.target
Wants=network-online.target
[Service]
Type=simple
User=${USER}
WorkingDirectory=${INSTALL_DIR}
ExecStart=${VENV_DIR}/bin/python3 ${INSTALL_DIR}/satellite_wrapper.py \\
--uri tcp://0.0.0.0:${SATELLITE_PORT} \\
--name "${SATELLITE_NAME}" \\
--area "${SATELLITE_AREA}" \\
--mic-command ${INSTALL_DIR}/mic-capture.sh \\
--snd-command ${INSTALL_DIR}/speaker-playback.sh \\
--mic-command-rate 16000 \\
--mic-command-width 2 \\
--mic-command-channels 1 \\
--snd-command-rate 24000 \\
--snd-command-width 2 \\
--snd-command-channels 1 \\
--wake-command "${VENV_DIR}/bin/python3 ${INSTALL_DIR}/wakeword_command.py --wake-word hey_jarvis --threshold 0.5" \\
--wake-command-rate 16000 \\
--wake-command-width 2 \\
--wake-command-channels 1 \\
--awake-wav ${SOUNDS_DIR}/awake.wav \\
--done-wav ${SOUNDS_DIR}/done.wav
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
SVCEOF
sudo systemctl daemon-reload
sudo systemctl enable homeai-satellite.service
sudo systemctl restart homeai-satellite.service
log_ok "systemd service installed and started"
# ─── Verify ────────────────────────────────────────────────────────────────
log_step "Verifying satellite..."
sleep 2
if systemctl is-active --quiet homeai-satellite.service; then
log_ok "Satellite is running!"
else
log_warn "Satellite may not have started cleanly. Check logs:"
echo " journalctl -u homeai-satellite.service -f"
fi
echo ""
echo -e "${GREEN}═══════════════════════════════════════════════════════════════${NC}"
echo -e "${GREEN} HomeAI Kitchen Satellite — Setup Complete${NC}"
echo -e "${GREEN}═══════════════════════════════════════════════════════════════${NC}"
echo ""
echo " Satellite: ${SATELLITE_NAME} (${SATELLITE_AREA})"
echo " Port: ${SATELLITE_PORT}"
echo " Mic: ${MIC_DEVICE} (ReSpeaker 2-Mics)"
echo " Speaker: ${SPK_DEVICE} (ReSpeaker 3.5mm)"
echo " Wake word: hey_jarvis"
echo ""
echo " Next steps:"
echo " 1. In Home Assistant, go to Settings → Devices & Services → Add Integration"
echo " 2. Search for 'Wyoming Protocol'"
echo " 3. Enter host: $(hostname -I | awk '{print $1}') port: ${SATELLITE_PORT}"
echo " 4. Assign the HomeAI voice pipeline to this satellite"
echo ""
echo " Useful commands:"
echo " journalctl -u homeai-satellite.service -f # live logs"
echo " sudo systemctl restart homeai-satellite # restart"
echo " sudo systemctl status homeai-satellite # status"
echo " arecord -D ${MIC_DEVICE} -d 3 -f S16_LE -r 16000 /tmp/test.wav # test mic"
echo " aplay -D ${SPK_DEVICE} /tmp/test.wav # test speaker"
echo ""
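wakeword_command.py in the heredoc above reads fixed-size chunks from stdin and zero-pads the final short read. The framing arithmetic it relies on can be checked in isolation — the `frame` helper below is hypothetical, mirroring the script's read loop rather than copied from it:

```python
import struct

SAMPLE_RATE = 16000
CHUNK_SIZE = 1280                    # samples — ~80 ms at 16 kHz
BYTES_PER_CHUNK = CHUNK_SIZE * 2     # 16-bit PCM = 2 bytes per sample

def frame(raw: bytes) -> tuple:
    """Zero-pad a (possibly short) stdin read to a full chunk and unpack
    it as signed 16-bit little-endian samples, as wakeword_command.py does."""
    if len(raw) < BYTES_PER_CHUNK:
        raw = raw + b"\x00" * (BYTES_PER_CHUNK - len(raw))
    return struct.unpack(f"<{CHUNK_SIZE}h", raw)
```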

3
homeai-rpi/speaker-playback.sh Executable file
View File

@@ -0,0 +1,3 @@
#!/bin/bash
# Convert mono 24kHz 16-bit input to stereo for WM8960 playback
sox -t raw -r 24000 -c 1 -b 16 -e signed-integer - -t raw -r 24000 -c 2 -b 16 -e signed-integer - | aplay -D plughw:2,0 -r 24000 -c 2 -f S16_LE -t raw -q -

View File

@@ -0,0 +1,40 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.homeai.vtube-bridge</string>
<key>ProgramArguments</key>
<array>
<string>/Users/aodhan/homeai-visual-env/bin/python3</string>
<string>/Users/aodhan/gitea/homeai/homeai-visual/vtube-bridge.py</string>
<string>--port</string>
<string>8002</string>
<string>--character</string>
<string>/Users/aodhan/gitea/homeai/homeai-dashboard/characters/aria.json</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/homeai-vtube-bridge.log</string>
<key>StandardErrorPath</key>
<string>/tmp/homeai-vtube-bridge-error.log</string>
<key>ThrottleInterval</key>
<integer>10</integer>
<key>EnvironmentVariables</key>
<dict>
<key>PATH</key>
<string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
</dict>
</dict>
</plist>

View File

@@ -0,0 +1,170 @@
#!/usr/bin/env python3
"""
Test script for VTube Studio Expression Bridge.
Usage:
python3 test-expressions.py # test all expressions
python3 test-expressions.py --auth # run auth flow first
python3 test-expressions.py --lipsync # test lip sync parameter
python3 test-expressions.py --latency # measure round-trip latency
Requires the vtube-bridge to be running on port 8002.
"""
import argparse
import json
import sys
import time
import urllib.request
BRIDGE_URL = "http://localhost:8002"
EXPRESSIONS = ["idle", "listening", "thinking", "speaking", "happy", "sad", "surprised", "error"]
def _post(path: str, data: dict | None = None) -> dict:
body = json.dumps(data or {}).encode()
req = urllib.request.Request(
f"{BRIDGE_URL}{path}",
data=body,
headers={"Content-Type": "application/json"},
method="POST",
)
with urllib.request.urlopen(req, timeout=10) as resp:
return json.loads(resp.read())
def _get(path: str) -> dict:
req = urllib.request.Request(f"{BRIDGE_URL}{path}")
with urllib.request.urlopen(req, timeout=10) as resp:
return json.loads(resp.read())
def check_bridge():
"""Verify bridge is running and connected."""
try:
status = _get("/status")
print(f"Bridge status: connected={status['connected']}, authenticated={status['authenticated']}")
print(f" Expressions: {', '.join(status.get('expressions', []))}")
if not status["connected"]:
print("\n WARNING: Not connected to VTube Studio. Is it running?")
if not status["authenticated"]:
print(" WARNING: Not authenticated. Run with --auth to initiate auth flow.")
return status
except Exception as e:
print(f"ERROR: Cannot reach bridge at {BRIDGE_URL}: {e}")
print(" Is vtube-bridge.py running?")
sys.exit(1)
def run_auth():
"""Initiate auth flow — user must click Allow in VTube Studio."""
print("Requesting authentication token...")
print(" >>> Click 'Allow' in VTube Studio when prompted <<<")
result = _post("/auth")
print(f" Result: {json.dumps(result, indent=2)}")
return result
def test_expressions(delay: float = 2.0):
"""Cycle through all expressions with a pause between each."""
print(f"\nCycling through {len(EXPRESSIONS)} expressions ({delay}s each):\n")
for expr in EXPRESSIONS:
print(f"{expr}...", end=" ", flush=True)
t0 = time.monotonic()
result = _post("/expression", {"event": expr})
dt = (time.monotonic() - t0) * 1000
if result.get("ok"):
print(f"OK ({dt:.0f}ms)")
else:
print(f"FAILED: {result.get('error', 'unknown')}")
time.sleep(delay)
# Return to idle
_post("/expression", {"event": "idle"})
print("\n Returned to idle.")
def test_lipsync(duration: float = 3.0):
"""Simulate lip sync by sweeping MouthOpen 0→1→0."""
import math
print(f"\nTesting lip sync (MouthOpen sweep, {duration}s)...\n")
fps = 20
frames = int(duration * fps)
for i in range(frames):
t = i / frames
# Sine wave for smooth open/close
value = abs(math.sin(t * math.pi * 4))
value = round(value, 3)
_post("/parameter", {"name": "MouthOpen", "value": value})
print(f"\r MouthOpen = {value:.3f}", end="", flush=True)
time.sleep(1.0 / fps)
_post("/parameter", {"name": "MouthOpen", "value": 0.0})
print("\r MouthOpen = 0.000 (done) ")
def test_latency(iterations: int = 20):
"""Measure expression trigger round-trip latency."""
print(f"\nMeasuring latency ({iterations} iterations)...\n")
times = []
for i in range(iterations):
expr = "thinking" if i % 2 == 0 else "idle"
t0 = time.monotonic()
_post("/expression", {"event": expr})
dt = (time.monotonic() - t0) * 1000
times.append(dt)
print(f" {i+1:2d}. {expr:10s}{dt:.1f}ms")
avg = sum(times) / len(times)
mn = min(times)
mx = max(times)
print(f"\n Avg: {avg:.1f}ms Min: {mn:.1f}ms Max: {mx:.1f}ms")
if avg < 100:
print(" PASS: Average latency under 100ms target")
else:
print(" WARNING: Average latency exceeds 100ms target")
# Return to idle
_post("/expression", {"event": "idle"})
def main():
parser = argparse.ArgumentParser(description="VTube Studio Expression Bridge Tester")
parser.add_argument("--auth", action="store_true", help="Run auth flow")
parser.add_argument("--lipsync", action="store_true", help="Test lip sync parameter sweep")
parser.add_argument("--latency", action="store_true", help="Measure round-trip latency")
parser.add_argument("--delay", type=float, default=2.0, help="Delay between expressions (default: 2s)")
parser.add_argument("--all", action="store_true", help="Run all tests")
args = parser.parse_args()
print("VTube Studio Expression Bridge Tester")
print("=" * 42)
status = check_bridge()
if args.auth:
run_auth()
print()
status = check_bridge()
if not status.get("authenticated") and not args.auth:
print("\nNot authenticated — skipping expression tests.")
print("Run with --auth to authenticate, or start VTube Studio first.")
return
if args.all:
test_expressions(args.delay)
test_lipsync()
test_latency()
elif args.lipsync:
test_lipsync()
elif args.latency:
test_latency()
else:
test_expressions(args.delay)
if __name__ == "__main__":
main()
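test_lipsync() above drives MouthOpen with a synthetic sine sweep; for real audio, an amplitude-based mapping is the usual approach. A minimal sketch of mapping a PCM chunk to a 0–1 MouthOpen value via RMS — the function name and gain constant are hypothetical, not taken from the actual lipsync implementation:

```python
import math
import struct

def mouth_open(pcm: bytes, gain: float = 4.0) -> float:
    """Map a 16-bit mono little-endian PCM chunk to a 0..1 MouthOpen value.

    Computes RMS amplitude, normalises against 16-bit full scale, applies a
    gain so normal speech reaches the top of the range, and clamps to 0..1.
    """
    if not pcm:
        return 0.0
    samples = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return min(1.0, rms / 32768.0 * gain)
```

The resulting value could be posted to the bridge's `/parameter` endpoint per frame, the same way the sweep test does.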

View File

@@ -1,17 +1,16 @@
#!/usr/bin/env bash
# homeai-visual/setup.sh — P7: VTube Studio bridge + Live2D expressions
# homeai-visual/setup.sh — P7: VTube Studio Expression Bridge
#
# Components:
# - vtube_studio.py — WebSocket client skill for OpenClaw
# - lipsync.py — amplitude-based lip sync
# - auth.py — VTube Studio token management
# Sets up:
# - Python venv with websockets
# - vtube-bridge daemon (HTTP ↔ WebSocket bridge)
# - vtube-ctl CLI (symlinked to PATH)
# - launchd service
#
# Prerequisites:
# - P4 (homeai-agent) — OpenClaw running
# - P5 (homeai-character) — aria.json with live2d_expressions set
# - macOS: VTube Studio installed (Mac App Store)
# - Linux: N/A — VTube Studio is macOS/Windows/iOS only
# Linux dev can test the skill code but not the VTube Studio side
# - VTube Studio installed (Mac App Store) with WebSocket API enabled
set -euo pipefail
@@ -19,42 +18,61 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_DIR="$(cd "${SCRIPT_DIR}/.." && pwd)"
source "${REPO_DIR}/scripts/common.sh"
log_section "P7: VTube Studio Bridge"
detect_platform
VENV_DIR="$HOME/homeai-visual-env"
PLIST_SRC="${SCRIPT_DIR}/launchd/com.homeai.vtube-bridge.plist"
PLIST_DST="$HOME/Library/LaunchAgents/com.homeai.vtube-bridge.plist"
VTUBE_CTL_SRC="$HOME/.openclaw/skills/vtube-studio/scripts/vtube-ctl"
if [[ "$OS_TYPE" == "linux" ]]; then
log_warn "VTube Studio is not available on Linux."
log_warn "This sub-project requires macOS (Mac Mini)."
log_section "P7: VTube Studio Expression Bridge"
# ─── Python venv ──────────────────────────────────────────────────────────────
if [[ ! -d "$VENV_DIR" ]]; then
log_info "Creating Python venv at $VENV_DIR..."
python3 -m venv "$VENV_DIR"
fi
# ─── TODO: Implementation ──────────────────────────────────────────────────────
cat <<'EOF'
log_info "Installing dependencies..."
"$VENV_DIR/bin/pip" install --upgrade pip -q
"$VENV_DIR/bin/pip" install websockets -q
log_ok "Python venv ready ($(${VENV_DIR}/bin/python3 --version))"
┌─────────────────────────────────────────────────────────────────┐
│ P7: homeai-visual — NOT YET IMPLEMENTED │
│ │
│ macOS only (VTube Studio is macOS/iOS/Windows) │
│ │
│ Implementation steps: │
│ 1. Install VTube Studio from Mac App Store │
│ 2. Enable WebSocket API in VTube Studio (Settings → port 8001) │
│ 3. Source/purchase Live2D model │
│ 4. Create expression hotkeys for 8 states │
│ 5. Implement skills/vtube_studio.py (WebSocket client) │
│ 6. Implement skills/lipsync.py (amplitude → MouthOpen param) │
│ 7. Implement skills/auth.py (token request + persistence) │
│ 8. Register vtube_studio skill with OpenClaw │
│ 9. Update aria.json live2d_expressions with hotkey IDs │
│ 10. Test all 8 expression states │
│ │
│ On Linux: implement Python skills, test WebSocket protocol │
│ with a mock server before connecting to real VTube Studio. │
│ │
│ Interface contracts: │
│ VTUBE_WS_URL=ws://localhost:8001 │
└─────────────────────────────────────────────────────────────────┘
# ─── vtube-ctl symlink ───────────────────────────────────────────────────────
EOF
if [[ -f "$VTUBE_CTL_SRC" ]]; then
chmod +x "$VTUBE_CTL_SRC"
ln -sf "$VTUBE_CTL_SRC" /opt/homebrew/bin/vtube-ctl
log_ok "vtube-ctl symlinked to /opt/homebrew/bin/vtube-ctl"
else
log_warn "vtube-ctl not found at $VTUBE_CTL_SRC — skipping symlink"
fi
log_info "P7 is not yet implemented. See homeai-visual/PLAN.md for details."
exit 0
# ─── launchd service ─────────────────────────────────────────────────────────
if [[ -f "$PLIST_SRC" ]]; then
# Unload if already loaded
launchctl bootout "gui/$(id -u)/com.homeai.vtube-bridge" 2>/dev/null || true
cp "$PLIST_SRC" "$PLIST_DST"
launchctl bootstrap "gui/$(id -u)" "$PLIST_DST"
log_ok "launchd service loaded: com.homeai.vtube-bridge"
else
log_warn "Plist not found at $PLIST_SRC — skipping launchd setup"
fi
# ─── Status ──────────────────────────────────────────────────────────────────
echo ""
log_info "VTube Bridge setup complete."
log_info ""
log_info "Next steps:"
log_info " 1. Install VTube Studio from Mac App Store"
log_info " 2. Enable WebSocket API: Settings > WebSocket API > port 8001"
log_info " 3. Load a Live2D model"
log_info " 4. Create expression hotkeys (idle, listening, thinking, speaking, happy, sad, surprised, error)"
log_info " 5. Run: vtube-ctl auth (click Allow in VTube Studio)"
log_info " 6. Run: python3 ${SCRIPT_DIR}/scripts/test-expressions.py --all"
log_info " 7. Update aria.json with real hotkey UUIDs"
log_info ""
log_info "Logs: /tmp/homeai-vtube-bridge.log"
log_info "Bridge: http://localhost:8002/status"


@@ -0,0 +1,454 @@
#!/usr/bin/env python3
"""
VTube Studio Expression Bridge — persistent WebSocket ↔ HTTP bridge.
Maintains a long-lived WebSocket connection to VTube Studio and exposes
a simple HTTP API so other HomeAI components can trigger expressions and
inject parameters (lip sync) without managing their own WS connections.
HTTP API (port 8002):
POST /expression {"event": "thinking"} → trigger hotkey
POST /parameter {"name": "MouthOpen", "value": 0.5} → inject param
POST /parameters [{"name": "MouthOpen", "value": 0.5}, ...]
POST /auth {} → request new token
GET /status → connection info
GET /expressions → list available expressions
Requires: pip install websockets
"""
import argparse
import asyncio
import json
import logging
import signal
import sys
import time
from http import HTTPStatus
from pathlib import Path
try:
import websockets
from websockets.exceptions import ConnectionClosed
except ImportError:
print("ERROR: 'websockets' package required. Install with: pip install websockets", file=sys.stderr)
sys.exit(1)
# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
DEFAULT_VTUBE_WS_URL = "ws://localhost:8001"
DEFAULT_HTTP_PORT = 8002
TOKEN_PATH = Path.home() / ".openclaw" / "vtube_token.json"
DEFAULT_CHARACTER_PATH = (
Path.home() / "gitea" / "homeai" / "homeai-dashboard" / "characters" / "aria.json"
)
logger = logging.getLogger("vtube-bridge")
# ---------------------------------------------------------------------------
# VTube Studio WebSocket Client
# ---------------------------------------------------------------------------
class VTubeClient:
"""Persistent async WebSocket client for VTube Studio API."""
def __init__(self, ws_url: str, character_path: Path):
self.ws_url = ws_url
self.character_path = character_path
self._ws = None
self._token: str | None = None
self._authenticated = False
self._current_expression: str | None = None
self._connected = False
self._request_id = 0
self._lock = asyncio.Lock()
self._load_token()
self._load_character()
# ── Character config ──────────────────────────────────────────────
def _load_character(self):
"""Load expression mappings from character JSON."""
self.expression_map: dict[str, str] = {}
self.ws_triggers: dict = {}
try:
if self.character_path.exists():
cfg = json.loads(self.character_path.read_text())
self.expression_map = cfg.get("live2d_expressions", {})
self.ws_triggers = cfg.get("vtube_ws_triggers", {})
logger.info("Loaded %d expressions from %s", len(self.expression_map), self.character_path.name)
else:
logger.warning("Character file not found: %s", self.character_path)
except Exception as e:
logger.error("Failed to load character config: %s", e)
def reload_character(self):
"""Hot-reload character config without restarting."""
self._load_character()
return {"expressions": self.expression_map, "triggers": self.ws_triggers}
# ── Token persistence ─────────────────────────────────────────────
def _load_token(self):
try:
if TOKEN_PATH.exists():
data = json.loads(TOKEN_PATH.read_text())
self._token = data.get("token")
logger.info("Loaded auth token from %s", TOKEN_PATH)
except Exception as e:
logger.warning("Could not load token: %s", e)
def _save_token(self, token: str):
TOKEN_PATH.parent.mkdir(parents=True, exist_ok=True)
TOKEN_PATH.write_text(json.dumps({"token": token}, indent=2))
self._token = token
logger.info("Saved auth token to %s", TOKEN_PATH)
# ── WebSocket comms ───────────────────────────────────────────────
def _next_id(self) -> str:
self._request_id += 1
return f"homeai-{self._request_id}"
async def _send(self, message_type: str, data: dict | None = None) -> dict:
"""Send a VTube Studio API message and return the response."""
payload = {
"apiName": "VTubeStudioPublicAPI",
"apiVersion": "1.0",
"requestID": self._next_id(),
"messageType": message_type,
"data": data or {},
}
await self._ws.send(json.dumps(payload))
resp = json.loads(await asyncio.wait_for(self._ws.recv(), timeout=10))
return resp
# ── Connection lifecycle ──────────────────────────────────────────
async def connect(self):
"""Connect and authenticate to VTube Studio."""
try:
self._ws = await websockets.connect(self.ws_url, ping_interval=20, ping_timeout=10)
self._connected = True
logger.info("Connected to VTube Studio at %s", self.ws_url)
if self._token:
await self._authenticate()
else:
logger.warning("No auth token — call POST /auth to initiate authentication")
except Exception as e:
self._connected = False
self._authenticated = False
logger.error("Connection failed: %s", e)
raise
async def _authenticate(self):
"""Authenticate with an existing token."""
resp = await self._send("AuthenticationRequest", {
"pluginName": "HomeAI",
"pluginDeveloper": "HomeAI",
"authenticationToken": self._token,
})
self._authenticated = resp.get("data", {}).get("authenticated", False)
if self._authenticated:
logger.info("Authenticated successfully")
else:
logger.warning("Token rejected — request a new one via POST /auth")
self._authenticated = False
async def request_new_token(self) -> dict:
"""Request a new auth token. User must click Allow in VTube Studio."""
if not self._connected:
return {"error": "Not connected to VTube Studio"}
resp = await self._send("AuthenticationTokenRequest", {
"pluginName": "HomeAI",
"pluginDeveloper": "HomeAI",
"pluginIcon": None,
})
token = resp.get("data", {}).get("authenticationToken")
if token:
self._save_token(token)
await self._authenticate()
return {"authenticated": self._authenticated, "token_saved": True}
return {"error": "No token received", "response": resp}
async def disconnect(self):
if self._ws:
await self._ws.close()
self._connected = False
self._authenticated = False
async def ensure_connected(self):
"""Reconnect if the connection dropped."""
if not self._connected or self._ws is None or self._ws.closed:
logger.info("Reconnecting...")
await self.connect()
# ── Expression & parameter API ────────────────────────────────────
async def trigger_expression(self, event: str) -> dict:
"""Trigger a named expression from the character config."""
async with self._lock:
await self.ensure_connected()
if not self._authenticated:
return {"error": "Not authenticated"}
hotkey_id = self.expression_map.get(event)
if not hotkey_id:
return {"error": f"Unknown expression: {event}", "available": list(self.expression_map.keys())}
resp = await self._send("HotkeyTriggerRequest", {"hotkeyID": hotkey_id})
self._current_expression = event
return {"ok": True, "expression": event, "hotkey_id": hotkey_id}
async def set_parameter(self, name: str, value: float, weight: float = 1.0) -> dict:
"""Inject a single VTube Studio parameter value."""
async with self._lock:
await self.ensure_connected()
if not self._authenticated:
return {"error": "Not authenticated"}
resp = await self._send("InjectParameterDataRequest", {
"parameterValues": [{"id": name, "value": value, "weight": weight}],
})
return {"ok": True, "name": name, "value": value}
async def set_parameters(self, params: list[dict]) -> dict:
"""Inject multiple VTube Studio parameters at once."""
async with self._lock:
await self.ensure_connected()
if not self._authenticated:
return {"error": "Not authenticated"}
param_values = [
{"id": p["name"], "value": p["value"], "weight": p.get("weight", 1.0)}
for p in params
]
resp = await self._send("InjectParameterDataRequest", {
"parameterValues": param_values,
})
return {"ok": True, "count": len(param_values)}
async def list_hotkeys(self) -> dict:
"""List all hotkeys available in the current model."""
async with self._lock:
await self.ensure_connected()
if not self._authenticated:
return {"error": "Not authenticated"}
resp = await self._send("HotkeysInCurrentModelRequest", {})
return resp.get("data", {})
async def list_parameters(self) -> dict:
"""List all input parameters for the current model."""
async with self._lock:
await self.ensure_connected()
if not self._authenticated:
return {"error": "Not authenticated"}
resp = await self._send("InputParameterListRequest", {})
return resp.get("data", {})
def status(self) -> dict:
return {
"connected": self._connected,
"authenticated": self._authenticated,
"ws_url": self.ws_url,
"current_expression": self._current_expression,
"expression_count": len(self.expression_map),
"expressions": list(self.expression_map.keys()),
}
# ---------------------------------------------------------------------------
# HTTP Server (asyncio-based, no external deps)
# ---------------------------------------------------------------------------
class BridgeHTTPHandler:
"""Simple async HTTP request handler for the bridge API."""
def __init__(self, client: VTubeClient):
self.client = client
async def handle(self, reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
try:
request_line = await asyncio.wait_for(reader.readline(), timeout=5)
if not request_line:
writer.close()
return
method, path, _ = request_line.decode().strip().split(" ", 2)
path = path.split("?")[0] # strip query params
# Read headers
content_length = 0
while True:
line = await reader.readline()
if line == b"\r\n" or not line:
break
if line.lower().startswith(b"content-length:"):
content_length = int(line.split(b":")[1].strip())
# Read body
body = None
if content_length > 0:
body = await reader.read(content_length)
# Route
try:
result = await self._route(method, path, body)
await self._respond(writer, 200, result)
except Exception as e:
logger.error("Handler error: %s", e, exc_info=True)
await self._respond(writer, 500, {"error": str(e)})
except asyncio.TimeoutError:
writer.close()
except Exception as e:
logger.error("Connection error: %s", e)
try:
writer.close()
except Exception:
pass
async def _route(self, method: str, path: str, body: bytes | None) -> dict:
data = {}
if body:
try:
data = json.loads(body)
except json.JSONDecodeError:
return {"error": "Invalid JSON"}
if method == "GET" and path == "/status":
return self.client.status()
if method == "GET" and path == "/expressions":
return {
"expressions": self.client.expression_map,
"triggers": self.client.ws_triggers,
}
if method == "GET" and path == "/hotkeys":
return await self.client.list_hotkeys()
if method == "GET" and path == "/parameters":
return await self.client.list_parameters()
if method == "POST" and path == "/expression":
event = data.get("event")
if not event:
return {"error": "Missing 'event' field"}
return await self.client.trigger_expression(event)
if method == "POST" and path == "/parameter":
name = data.get("name")
value = data.get("value")
if name is None or value is None:
return {"error": "Missing 'name' or 'value' field"}
return await self.client.set_parameter(name, float(value), float(data.get("weight", 1.0)))
if method == "POST" and path == "/parameters":
if not isinstance(data, list):
return {"error": "Expected JSON array of {name, value} objects"}
return await self.client.set_parameters(data)
if method == "POST" and path == "/auth":
return await self.client.request_new_token()
if method == "POST" and path == "/reload":
return self.client.reload_character()
return {"error": f"Unknown route: {method} {path}"}
async def _respond(self, writer: asyncio.StreamWriter, status: int, data: dict):
body = json.dumps(data, indent=2).encode()
status_text = HTTPStatus(status).phrase
header = (
f"HTTP/1.1 {status} {status_text}\r\n"
f"Content-Type: application/json\r\n"
f"Content-Length: {len(body)}\r\n"
f"Access-Control-Allow-Origin: *\r\n"
f"Access-Control-Allow-Methods: GET, POST, OPTIONS\r\n"
f"Access-Control-Allow-Headers: Content-Type\r\n"
f"\r\n"
)
writer.write(header.encode() + body)
await writer.drain()
writer.close()
# ---------------------------------------------------------------------------
# Auto-reconnect loop
# ---------------------------------------------------------------------------
async def reconnect_loop(client: VTubeClient, interval: float = 5.0):
"""Background task that keeps the VTube Studio connection alive."""
while True:
try:
if not client._connected or client._ws is None or client._ws.closed:
logger.info("Connection lost — attempting reconnect...")
await client.connect()
except Exception as e:
logger.debug("Reconnect failed: %s (retrying in %.0fs)", e, interval)
await asyncio.sleep(interval)
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
async def main(args):
logging.basicConfig(
level=logging.DEBUG if args.verbose else logging.INFO,
format="%(asctime)s [%(name)s] %(levelname)s: %(message)s",
datefmt="%H:%M:%S",
)
character_path = Path(args.character)
client = VTubeClient(args.vtube_url, character_path)
# Try initial connection (don't fail if VTube Studio isn't running yet)
try:
await client.connect()
except Exception as e:
logger.warning("Initial connection failed: %s (will keep retrying)", e)
# Start reconnect loop
reconnect_task = asyncio.create_task(reconnect_loop(client, interval=5.0))
# Start HTTP server
handler = BridgeHTTPHandler(client)
server = await asyncio.start_server(handler.handle, "0.0.0.0", args.port)
logger.info("HTTP API listening on http://0.0.0.0:%d", args.port)
logger.info("Endpoints: /status /expression /parameter /parameters /auth /reload /hotkeys")
# Graceful shutdown
stop = asyncio.Event()
def _signal_handler():
logger.info("Shutting down...")
stop.set()
loop = asyncio.get_running_loop()
for sig in (signal.SIGINT, signal.SIGTERM):
loop.add_signal_handler(sig, _signal_handler)
async with server:
await stop.wait()
reconnect_task.cancel()
await client.disconnect()
logger.info("Goodbye.")
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="VTube Studio Expression Bridge")
parser.add_argument("--port", type=int, default=DEFAULT_HTTP_PORT, help="HTTP API port (default: 8002)")
parser.add_argument("--vtube-url", default=DEFAULT_VTUBE_WS_URL, help="VTube Studio WebSocket URL")
parser.add_argument("--character", default=str(DEFAULT_CHARACTER_PATH), help="Path to character JSON")
parser.add_argument("--verbose", "-v", action="store_true", help="Debug logging")
args = parser.parse_args()
asyncio.run(main(args))
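For reference, calling the bridge from another component can be as small as the sketch below. The endpoint path and default port come from the docstring above; the helper names themselves are illustrative, not part of the bridge.

```python
import json
import urllib.request

BRIDGE_URL = "http://localhost:8002"  # default --port of the bridge

def build_expression_request(event: str) -> urllib.request.Request:
    """Build the POST /expression request without sending it."""
    body = json.dumps({"event": event}).encode()
    return urllib.request.Request(
        f"{BRIDGE_URL}/expression",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def trigger_expression(event: str) -> dict:
    """Fire a named expression, e.g. trigger_expression("thinking")."""
    with urllib.request.urlopen(build_expression_request(event), timeout=2) as resp:
        return json.load(resp)
```

Because the bridge owns the single WebSocket connection, callers like this stay stateless: any failure surfaces as an `{"error": ...}` JSON body rather than a dropped socket.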


@@ -18,6 +18,12 @@
<string>1.0</string>
</array>
<key>EnvironmentVariables</key>
<dict>
<key>ELEVENLABS_API_KEY</key>
<string>sk_ec10e261c6190307a37aa161a9583504dcf25a0cabe5dbd5</string>
</dict>
<key>RunAtLoad</key>
<true/>


@@ -7,8 +7,10 @@ Usage:
import argparse
import asyncio
import json
import logging
import os
import urllib.request
import numpy as np
@@ -20,10 +22,76 @@ from wyoming.tts import Synthesize
_LOGGER = logging.getLogger(__name__)
ACTIVE_TTS_VOICE_PATH = os.path.expanduser("~/homeai-data/active-tts-voice.json")
SAMPLE_RATE = 24000
SAMPLE_WIDTH = 2 # int16
CHANNELS = 1
CHUNK_SECONDS = 1 # stream in 1-second chunks
VTUBE_BRIDGE_URL = "http://localhost:8002"
LIPSYNC_ENABLED = True
LIPSYNC_FRAME_SAMPLES = 1200 # 50ms frames at 24kHz → 20 updates/sec
LIPSYNC_SCALE = 10.0 # amplitude multiplier (tuned for Kokoro output levels)
def _send_lipsync(value: float):
"""Fire-and-forget POST to vtube-bridge with mouth open value."""
try:
body = json.dumps({"name": "MouthOpen", "value": value}).encode()
req = urllib.request.Request(
f"{VTUBE_BRIDGE_URL}/parameter",
data=body,
headers={"Content-Type": "application/json"},
method="POST",
)
urllib.request.urlopen(req, timeout=0.5)
except Exception:
pass # bridge may not be running
def _compute_lipsync_frames(samples_int16: np.ndarray) -> list[float]:
"""Compute per-frame RMS amplitude scaled to 0-1 for lip sync."""
frames = []
for i in range(0, len(samples_int16), LIPSYNC_FRAME_SAMPLES):
frame = samples_int16[i : i + LIPSYNC_FRAME_SAMPLES].astype(np.float32)
rms = np.sqrt(np.mean(frame ** 2)) / 32768.0
mouth = min(rms * LIPSYNC_SCALE, 1.0)
frames.append(round(mouth, 3))
return frames
def _get_active_tts_config() -> dict | None:
"""Read the active TTS config set by the OpenClaw bridge."""
try:
with open(ACTIVE_TTS_VOICE_PATH) as f:
return json.load(f)
except Exception:
return None
def _synthesize_elevenlabs(text: str, voice_id: str, model: str = "eleven_multilingual_v2") -> bytes:
"""Call ElevenLabs TTS API and return raw PCM audio bytes (24kHz 16-bit mono)."""
api_key = os.environ.get("ELEVENLABS_API_KEY", "")
if not api_key:
raise RuntimeError("ELEVENLABS_API_KEY not set")
url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}?output_format=pcm_24000"
payload = json.dumps({
"text": text,
"model_id": model,
"voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
}).encode()
req = urllib.request.Request(
url,
data=payload,
headers={
"Content-Type": "application/json",
"xi-api-key": api_key,
},
method="POST",
)
with urllib.request.urlopen(req, timeout=30) as resp:
return resp.read()
def _load_kokoro():
@@ -76,26 +144,53 @@ class KokoroEventHandler(AsyncEventHandler):
synthesize = Synthesize.from_event(event)
text = synthesize.text
voice = self._default_voice
use_elevenlabs = False
if synthesize.voice and synthesize.voice.name:
# Bridge state file takes priority (set per-request by OpenClaw bridge)
tts_config = _get_active_tts_config()
if tts_config and tts_config.get("engine") == "elevenlabs":
use_elevenlabs = True
voice = tts_config.get("elevenlabs_voice_id", "")
_LOGGER.debug("Synthesizing %r with ElevenLabs voice=%s", text, voice)
elif tts_config and tts_config.get("kokoro_voice"):
voice = tts_config["kokoro_voice"]
elif synthesize.voice and synthesize.voice.name:
voice = synthesize.voice.name
_LOGGER.debug("Synthesizing %r with voice=%s speed=%.1f", text, voice, self._speed)
try:
loop = asyncio.get_event_loop()
samples, sample_rate = await loop.run_in_executor(
None, lambda: self._tts.create(text, voice=voice, speed=self._speed)
)
samples_int16 = (np.clip(samples, -1.0, 1.0) * 32767).astype(np.int16)
audio_bytes = samples_int16.tobytes()
if use_elevenlabs and voice:
# ElevenLabs returns PCM 24kHz 16-bit mono
model = tts_config.get("elevenlabs_model", "eleven_multilingual_v2")
_LOGGER.info("Using ElevenLabs TTS (model=%s, voice=%s)", model, voice)
pcm_bytes = await loop.run_in_executor(
None, lambda: _synthesize_elevenlabs(text, voice, model)
)
samples_int16 = np.frombuffer(pcm_bytes, dtype=np.int16)
audio_bytes = pcm_bytes
else:
_LOGGER.debug("Synthesizing %r with Kokoro voice=%s speed=%.1f", text, voice, self._speed)
samples, sample_rate = await loop.run_in_executor(
None, lambda: self._tts.create(text, voice=voice, speed=self._speed)
)
samples_int16 = (np.clip(samples, -1.0, 1.0) * 32767).astype(np.int16)
audio_bytes = samples_int16.tobytes()
# Pre-compute lip sync frames for the entire utterance
lipsync_frames = []
if LIPSYNC_ENABLED:
lipsync_frames = _compute_lipsync_frames(samples_int16)
await self.write_event(
AudioStart(rate=SAMPLE_RATE, width=SAMPLE_WIDTH, channels=CHANNELS).event()
)
chunk_size = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS * CHUNK_SECONDS
lipsync_idx = 0
samples_per_chunk = SAMPLE_RATE * CHUNK_SECONDS
frames_per_chunk = samples_per_chunk // LIPSYNC_FRAME_SAMPLES
for i in range(0, len(audio_bytes), chunk_size):
await self.write_event(
AudioChunk(
@@ -106,8 +201,22 @@ class KokoroEventHandler(AsyncEventHandler):
).event()
)
# Send lip sync frames for this audio chunk
if LIPSYNC_ENABLED and lipsync_frames:
chunk_frames = lipsync_frames[lipsync_idx : lipsync_idx + frames_per_chunk]
for mouth_val in chunk_frames:
await asyncio.get_event_loop().run_in_executor(
None, _send_lipsync, mouth_val
)
lipsync_idx += frames_per_chunk
# Close mouth after speech
if LIPSYNC_ENABLED:
await asyncio.get_event_loop().run_in_executor(None, _send_lipsync, 0.0)
await self.write_event(AudioStop().event())
_LOGGER.info("Synthesized %.1fs of audio", len(samples) / sample_rate)
duration = len(samples_int16) / SAMPLE_RATE
_LOGGER.info("Synthesized %.1fs of audio (%d lipsync frames)", duration, len(lipsync_frames))
except Exception:
_LOGGER.exception("Synthesis error")

plans/OPENCLAW_SKILLS.md (new file, 389 lines)

@@ -0,0 +1,389 @@
# OpenClaw Skills Expansion Plan
## Context
The HomeAI project has 4 custom OpenClaw skills (home-assistant, voice-assistant, image-generation, vtube-studio) and a working voice pipeline. The user wants to build 8 new skills plus a Public/Private mode system to dramatically expand what the assistant can do via voice and chat.
## Skill Format Convention
Every skill follows the established pattern from ha-ctl:
- Lives in `~/.openclaw/skills/<name>/`
- `SKILL.md` with YAML frontmatter (name, description) + agent instructions
- Optional Python CLI (stdlib only: `urllib.request`, `json`, `os`, `sys`, `re`, `datetime`)
- CLI symlinked to `/opt/homebrew/bin/` for PATH access
- Agent invokes via `exec` tool
- Entry added to `~/.openclaw/workspace/TOOLS.md` for reinforcement
- New env vars added to `homeai-agent/launchd/com.homeai.openclaw.plist`
---
## Phase A — Core Skills (no new services needed)
### 1. Memory Recall (`memory-ctl`)
**Purpose:** Let the agent actively store, search, and recall memories mid-conversation.
**Files:**
- `~/.openclaw/skills/memory/SKILL.md`
- `~/.openclaw/skills/memory/memory-ctl` → symlink `/opt/homebrew/bin/memory-ctl`
**Commands:**
```
memory-ctl add <personal|general> "<content>" [--category preference|fact|routine] [--character-id ID]
memory-ctl search "<query>" [--type personal|general] [--character-id ID]
memory-ctl list [--type personal|general] [--character-id ID] [--limit 10]
memory-ctl delete <memory_id> [--type personal|general] [--character-id ID]
```
**Details:**
- Reads/writes existing files: `~/homeai-data/memories/personal/{char_id}.json` and `general.json`
- Matches existing schema: `{"memories": [{"id": "m_<timestamp>", "content": "...", "category": "...", "createdAt": "..."}]}`
- Search: keyword token matching (split query, score by substring hits in content)
- `--character-id` defaults to `HOMEAI_CHARACTER_ID` env var or satellite-map default
- Dashboard memory UI will immediately reflect agent-created memories (same files)
- **Env vars:** `HOMEAI_CHARACTER_ID` (optional, set by bridge)
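The keyword-token matching described above can be sketched as follows (function names are illustrative, not the final CLI internals):

```python
def score(content: str, query: str) -> int:
    """Count query tokens that appear as substrings of the memory content."""
    haystack = content.lower()
    return sum(1 for tok in query.lower().split() if tok in haystack)

def search(memories: list[dict], query: str, limit: int = 10) -> list[dict]:
    """Return memories with at least one token hit, best-scoring first."""
    scored = sorted(
        ((score(m["content"], query), m) for m in memories),
        key=lambda pair: -pair[0],
    )
    return [m for s, m in scored if s > 0][:limit]
```

Substring matching keeps it stdlib-only and fast for a few hundred memories; anything fancier (stemming, embeddings) would break the "no dependencies" convention.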
---
### 2. Service Monitor (`monitor-ctl`)
**Purpose:** "Is everything running?" → spoken health report.
**Files:**
- `~/.openclaw/skills/service-monitor/SKILL.md`
- `~/.openclaw/skills/service-monitor/monitor-ctl` → symlink `/opt/homebrew/bin/monitor-ctl`
**Commands:**
```
monitor-ctl status # All services summary
monitor-ctl check <service> # Single service (ollama, bridge, ha, tts, stt, dashboard, n8n, gitea, kuma)
monitor-ctl ollama # Ollama-specific: loaded models, VRAM
monitor-ctl docker # Docker container status (runs: docker ps --format json)
```
**Details:**
- Checks hardcoded service endpoints with 3s timeout:
- Ollama (`localhost:11434/api/ps`), Bridge (`localhost:8081/status`), Gateway (`localhost:8080/status`)
- Wyoming STT (TCP `localhost:10300`), TTS (TCP `localhost:10301`)
- Dashboard (`localhost:5173`), n8n (`localhost:5678`), Kuma (`localhost:3001`)
- HA (`10.0.0.199:8123/api/`), Gitea (`10.0.0.199:3000`)
- `ollama` subcommand parses `/api/ps` for model names, sizes, expiry
- `docker` runs `docker ps --format '{{json .}}'` via subprocess
- Pure stdlib (`urllib.request` + `socket.create_connection` for TCP)
- **Env vars:** Uses existing `HASS_TOKEN`, `HA_URL`
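The two probe styles above (HTTP endpoint vs bare TCP port for the Wyoming services) might look like this, with the 3s timeout as default:

```python
import socket
import urllib.request

def check_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if something is listening (used for Wyoming STT/TTS ports)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_http(url: str, timeout: float = 3.0) -> bool:
    """True if the endpoint answers with a non-5xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except Exception:
        return False
```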
---
### 3. Character Switcher (`character-ctl`)
**Purpose:** "Talk to Aria" → swap persona, TTS voice, system prompt.
**Files:**
- `~/.openclaw/skills/character/SKILL.md`
- `~/.openclaw/skills/character/character-ctl` → symlink `/opt/homebrew/bin/character-ctl`
**Commands:**
```
character-ctl list # All characters (name, id, tts engine)
character-ctl active # Current default character
character-ctl switch "<name_or_id>" # Set as default
character-ctl info "<name_or_id>" # Profile summary
character-ctl map <satellite_id> <character_id> # Map satellite → character
```
**Details:**
- Reads character JSONs from `~/homeai-data/characters/`
- `switch` updates `satellite-map.json` default + writes `active-tts-voice.json`
- Fuzzy name resolution: case-insensitive match on `display_name``name``id` → partial match
- Switch takes effect on next bridge request (`SKILL.md` tells agent to inform user)
- **Env vars:** None new
---
## Phase B — Home Assistant Extensions
### 4. Routine/Scene Builder (`routine-ctl`)
**Purpose:** Create and trigger multi-device scenes and routines from voice.
**Files:**
- `~/.openclaw/skills/routine/SKILL.md`
- `~/.openclaw/skills/routine/routine-ctl` → symlink `/opt/homebrew/bin/routine-ctl`
**Commands:**
```
routine-ctl list-scenes # HA scenes
routine-ctl list-scripts # HA scripts
routine-ctl trigger "<scene_or_script>" # Activate
routine-ctl create-scene "<name>" --entities '[{"entity_id":"light.x","state":"on","brightness":128}]'
routine-ctl list-routines # Local multi-step routines
routine-ctl create-routine "<name>" --steps '[{"type":"scene","target":"movie_mode"},{"type":"delay","seconds":5},{"type":"ha","cmd":"off \"TV Backlight\""}]'
routine-ctl run "<routine_name>" # Execute steps sequentially
```
**Details:**
- HA scenes via REST API: `POST /api/services/scene/turn_on`, `POST /api/services/scene/create`
- Local routines stored in `~/homeai-data/routines/*.json`
- Step types: `scene` (trigger HA scene), `ha` (subprocess call to ha-ctl), `delay` (sleep), `tts` (curl to bridge `/api/tts`)
- `run` executes steps sequentially, reports progress
- **New data path:** `~/homeai-data/routines/`
- **Env vars:** Uses existing `HASS_TOKEN`, `HA_URL`
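A sketch of the sequential step executor, covering three of the step types above (`tts` omitted for brevity; `ha-ctl` is the existing CLI, and the helper names are illustrative):

```python
import json
import shlex
import subprocess
import time
import urllib.request

def scene_request(ha_url: str, token: str, target: str) -> urllib.request.Request:
    """Build the HA scene.turn_on call for a `scene` step."""
    return urllib.request.Request(
        f"{ha_url}/api/services/scene/turn_on",
        data=json.dumps({"entity_id": f"scene.{target}"}).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def run_routine(steps: list[dict], ha_url: str, token: str) -> None:
    """Execute steps in order, as `routine-ctl run` would."""
    for step in steps:
        if step["type"] == "scene":
            urllib.request.urlopen(scene_request(ha_url, token, step["target"]), timeout=10)
        elif step["type"] == "ha":
            # shlex keeps quoted args like 'off "TV Backlight"' intact
            subprocess.run(["ha-ctl", *shlex.split(step["cmd"])], check=True)
        elif step["type"] == "delay":
            time.sleep(step["seconds"])
```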
---
### 5. Music Control (`music-ctl`)
**Purpose:** Play/control music with multi-room and Spotify support.
**Files:**
- `~/.openclaw/skills/music/SKILL.md`
- `~/.openclaw/skills/music/music-ctl` → symlink `/opt/homebrew/bin/music-ctl`
**Commands:**
```
music-ctl players # List media_player entities
music-ctl play ["query"] [--player ID] # Play/resume (search + play if query given)
music-ctl pause [--player ID] # Pause
music-ctl next / prev [--player ID] # Skip tracks
music-ctl volume <0-100> [--player ID] # Set volume
music-ctl now-playing [--player ID] # Current track info
music-ctl queue [--player ID] # Queue contents
music-ctl shuffle <on|off> [--player ID] # Toggle shuffle
music-ctl search "<query>" # Search library
```
**Details:**
- All commands go through HA `media_player` services (same API pattern as ha-ctl)
- `play` with query uses `media_player/play_media` with `media_content_type: music`
- Spotify appears as a `media_player` entity via HA Spotify integration — no separate API needed
- `players` lists all `media_player` entities (Music Assistant zones, Spotify Connect, Chromecast, etc.)
- `--player` defaults to first active player or a configurable default
- Multi-room: Snapcast zones appear as separate `media_player` entities
- `now-playing` reads state attributes: `media_title`, `media_artist`, `media_album`, `media_position`
- **Env vars:** Uses existing `HASS_TOKEN`, `HA_URL`
- **Prerequisite:** Music Assistant Docker container configured + HA integration, OR Spotify HA integration
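Every subcommand reduces to one HA service call; for example, `play` with a query might build the following request (endpoint per the HA REST API, function names illustrative):

```python
import json
import urllib.request

def ha_service_request(ha_url: str, token: str, domain: str, service: str,
                       data: dict) -> urllib.request.Request:
    """Generic HA service call, shared by every music-ctl subcommand."""
    return urllib.request.Request(
        f"{ha_url}/api/services/{domain}/{service}",
        data=json.dumps(data).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def play_request(ha_url: str, token: str, query: str,
                 player: str) -> urllib.request.Request:
    """`music-ctl play "query" --player ID` -> media_player.play_media."""
    return ha_service_request(ha_url, token, "media_player", "play_media", {
        "entity_id": player,
        "media_content_id": query,
        "media_content_type": "music",
    })
```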
---
## Phase C — External Service Skills
### 6. n8n Workflow Trigger (`workflow-ctl`)
**Purpose:** List and trigger n8n workflows by voice.
**Files:**
- `~/.openclaw/skills/workflow/SKILL.md`
- `~/.openclaw/skills/workflow/workflow-ctl` → symlink `/opt/homebrew/bin/workflow-ctl`
**Commands:**
```
workflow-ctl list # All workflows (name, active, id)
workflow-ctl trigger "<name_or_id>" [--data '{"key":"val"}'] # Fire webhook
workflow-ctl status <execution_id> # Execution status
workflow-ctl history [--limit 10] # Recent executions
```
**Details:**
- n8n REST API at `http://localhost:5678/api/v1/`
- Auth via API key header: `X-N8N-API-KEY`
- `trigger` prefers webhook trigger (`POST /webhook/<path>`), falls back to `POST /api/v1/workflows/<id>/execute`
- Fuzzy name matching on workflow names
- **Env vars (new):** `N8N_URL` (default `http://localhost:5678`), `N8N_API_KEY` (generate in n8n Settings → API)
---
### 7. Gitea Integration (`gitea-ctl`)
**Purpose:** Query self-hosted repos, commits, issues, PRs.
**Files:**
- `~/.openclaw/skills/gitea/SKILL.md`
- `~/.openclaw/skills/gitea/gitea-ctl` → symlink `/opt/homebrew/bin/gitea-ctl`
**Commands:**
```
gitea-ctl repos [--limit 20] # List repos
gitea-ctl commits <owner/repo> [--limit 10] # Recent commits
gitea-ctl issues <owner/repo> [--state open] # List issues
gitea-ctl prs <owner/repo> [--state open] # List PRs
gitea-ctl create-issue <owner/repo> "<title>" [--body TEXT]
```
**Details:**
- Gitea REST API v1 at `http://10.0.0.199:3000/api/v1/`
- Auth: `Authorization: token <GITEA_TOKEN>`
- Pure stdlib `urllib.request`
- **Env vars (new):** `GITEA_URL` (default `http://10.0.0.199:3000`), `GITEA_TOKEN` (generate in Gitea → Settings → Applications)
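A minimal sketch of the request builder and the commit summarizer (the response shape assumed in `summarize_commits` follows Gitea's `/repos/{owner}/{repo}/commits` payload; helper names are illustrative):

```python
import json
import urllib.request

def gitea_request(path: str, token: str,
                  base: str = "http://10.0.0.199:3000") -> urllib.request.Request:
    """Build a Gitea API v1 GET with token auth."""
    return urllib.request.Request(
        f"{base}/api/v1{path}",
        headers={"Authorization": f"token {token}"},
    )

def summarize_commits(commits: list[dict], limit: int = 10) -> list[str]:
    """Short SHA plus first line of each commit message."""
    return [f"{c['sha'][:10]} {c['commit']['message'].splitlines()[0]}"
            for c in commits[:limit]]
```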
---
### 8. Calendar/Reminders (`calendar-ctl`)
**Purpose:** Read calendar, create events, set voice reminders.
**Files:**
- `~/.openclaw/skills/calendar/SKILL.md`
- `~/.openclaw/skills/calendar/calendar-ctl` → symlink `/opt/homebrew/bin/calendar-ctl`
**Commands:**
```
calendar-ctl today [--calendar ID] # Today's events
calendar-ctl upcoming [--days 7] # Next N days
calendar-ctl add "<summary>" --start <ISO> --end <ISO> [--calendar ID]
calendar-ctl remind "<message>" --at "<time>" # Set reminder (e.g. "in 30 minutes", "at 5pm", "tomorrow 9am")
calendar-ctl reminders # List pending reminders
calendar-ctl cancel-reminder <id> # Cancel reminder
```
**Details:**
- Calendar read: `GET /api/calendars/<entity_id>?start=<ISO>&end=<ISO>` via HA API
- Calendar write: `POST /api/services/calendar/create_event`
- Reminders stored locally in `~/homeai-data/reminders.json`
- Relative time parsing with `datetime` + `re` (stdlib): "in 30 minutes", "at 5pm", "tomorrow 9am"
- Reminder daemon (`com.homeai.reminder-daemon`): Python script checking `reminders.json` every 60s, fires TTS via `POST http://localhost:8081/api/tts` when due
- **New data path:** `~/homeai-data/reminders.json`
- **New daemon:** `homeai-agent/reminder-daemon.py` + `homeai-agent/launchd/com.homeai.reminder-daemon.plist`
- **Env vars:** Uses existing `HASS_TOKEN`, `HA_URL`
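The stdlib relative-time parsing for the three example phrasings might look like this (a sketch, not the final grammar; past clock times roll to the next day):

```python
import re
from datetime import datetime, timedelta

def parse_when(text: str, now: datetime) -> datetime:
    """Parse 'in 30 minutes', 'at 5pm', 'tomorrow 9am' relative to now."""
    text = text.strip().lower()
    m = re.match(r"in (\d+) (minute|hour)s?", text)
    if m:
        n = int(m.group(1))
        return now + (timedelta(minutes=n) if m.group(2) == "minute"
                      else timedelta(hours=n))
    m = re.match(r"(tomorrow )?(?:at )?(\d{1,2})(?::(\d{2}))?\s*(am|pm)?", text)
    if m:
        hour = int(m.group(2)) % 12 + (12 if m.group(4) == "pm" else 0)
        minute = int(m.group(3) or 0)
        when = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        if m.group(1):
            when += timedelta(days=1)
        elif when <= now:
            when += timedelta(days=1)  # "at 9am" said at 10am means tomorrow
        return when
    raise ValueError(f"Cannot parse time: {text!r}")
```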
---
## Phase D — Public/Private Mode System
### 9. Mode Controller (`mode-ctl`)
**Purpose:** Route requests to cloud LLMs (speed/power) or local LLMs (privacy) with per-task rules and manual toggle.
**Files:**
- `~/.openclaw/skills/mode/SKILL.md`
- `~/.openclaw/skills/mode/mode-ctl` → symlink `/opt/homebrew/bin/mode-ctl`
**Commands:**
```
mode-ctl status # Current mode + overrides
mode-ctl private # Switch to local-only
mode-ctl public # Switch to cloud LLMs
mode-ctl set-provider <anthropic|openai> # Preferred cloud provider
mode-ctl override <category> <private|public> # Per-category routing
mode-ctl list-overrides # Show all overrides
```
**State file:** `~/homeai-data/active-mode.json`
```json
{
"mode": "private",
"cloud_provider": "anthropic",
"cloud_model": "claude-sonnet-4-20250514",
"overrides": {
"web_search": "public",
"coding": "public",
"personal_finance": "private",
"health": "private"
},
"updated_at": "2026-03-17T..."
}
```
**How model routing works — Bridge modification:**
The HTTP bridge (`openclaw-http-bridge.py`) is modified to:
1. New function `load_mode()` reads `active-mode.json`
2. New function `resolve_model(mode, category=None)` returns model string
3. In `_handle_agent_request()`, after character resolution, check mode → pass `--model` flag to OpenClaw CLI
- **Private:** `ollama/qwen3.5:35b-a3b` (current default, no change)
- **Public:** `anthropic/claude-sonnet-4-20250514` or `openai/gpt-4o` (per provider setting)
**OpenClaw config changes (`openclaw.json`):** Add cloud providers to `models.providers`:
```json
"anthropic": {
"baseUrl": "https://api.anthropic.com/v1",
"apiKey": "${ANTHROPIC_API_KEY}",
"api": "anthropic",
"models": [{"id": "claude-sonnet-4-20250514", "contextWindow": 200000, "maxTokens": 8192}]
},
"openai": {
"baseUrl": "https://api.openai.com/v1",
"apiKey": "${OPENAI_API_KEY}",
"api": "openai",
"models": [{"id": "gpt-4o", "contextWindow": 128000, "maxTokens": 4096}]
}
```
**Per-task classification:** The `SKILL.md` provides a category reference table. The agent self-classifies each request and checks overrides. Default categories:
- **Always private:** personal finance, health, passwords, private conversations
- **Always public:** web search, coding help, complex reasoning, translation
- **Follow global mode:** general chat, smart home, music, calendar
**Dashboard integration:** Add a mode toggle to the dashboard sidebar via a new Vite middleware endpoint, `GET/POST /api/mode`, that reads and writes `active-mode.json`.
- **Env vars (new):** `ANTHROPIC_API_KEY`, `OPENAI_API_KEY` (add to OpenClaw plist)
- **Bridge file modified:** `homeai-agent/openclaw-http-bridge.py` — add ~40 lines for mode loading + model resolution
---
## Implementation Order
| # | Skill | Complexity | Dependencies |
|---|-------|-----------|-------------|
| 1 | `memory-ctl` | Simple | None |
| 2 | `monitor-ctl` | Simple | None |
| 3 | `character-ctl` | Simple | None |
| 4 | `routine-ctl` | Medium | ha-ctl existing |
| 5 | `music-ctl` | Medium | Music Assistant or Spotify in HA |
| 6 | `workflow-ctl` | Simple | n8n API key |
| 7 | `gitea-ctl` | Simple | Gitea API token |
| 8 | `calendar-ctl` | Medium | HA calendar + new reminder daemon |
| 9 | `mode-ctl` | High | Cloud API keys + bridge modification |
## Per-Skill Implementation Steps
For each skill:
1. Create `SKILL.md` with frontmatter + agent instructions + examples
2. Create Python CLI (`chmod +x`), stdlib only
3. Symlink to `/opt/homebrew/bin/`
4. Test CLI standalone: `<tool> --help`, `<tool> <command>`
5. Add env vars to `com.homeai.openclaw.plist` if needed
6. Restart OpenClaw: `launchctl kickstart -k gui/501/com.homeai.openclaw`
7. Add section to `~/.openclaw/workspace/TOOLS.md`
8. Test via: `openclaw agent --message "test prompt" --agent main`
9. Test via voice: wake word + spoken command
## Verification
- **Unit test each CLI:** Run each command manually and verify JSON output
- **Agent test:** `openclaw agent --message "remember that my favorite color is blue"` (memory-ctl)
- **Voice test:** Wake word → "Is everything running?" → spoken health report (monitor-ctl)
- **Mode test:** `mode-ctl public` → send a complex query → verify it routes to cloud model in bridge logs
- **Dashboard test:** Check memory UI shows agent-created memories, mode toggle works
- **Cross-skill test:** "Switch to Sucy and play some jazz" → character-ctl + music-ctl in one turn
## Critical Files to Modify
| File | Changes |
|------|---------|
| `~/.openclaw/workspace/TOOLS.md` | Add sections for all 9 new skills |
| `homeai-agent/openclaw-http-bridge.py` | Mode routing (Phase D only) |
| `homeai-agent/launchd/com.homeai.openclaw.plist` | New env vars |
| `~/.openclaw/openclaw.json` | Add anthropic + openai providers (Phase D) |
| `homeai-dashboard/vite.config.js` | `/api/mode` endpoint (Phase D) |
## New Files Created
- `~/.openclaw/skills/memory/` (SKILL.md + memory-ctl)
- `~/.openclaw/skills/service-monitor/` (SKILL.md + monitor-ctl)
- `~/.openclaw/skills/character/` (SKILL.md + character-ctl)
- `~/.openclaw/skills/routine/` (SKILL.md + routine-ctl)
- `~/.openclaw/skills/music/` (SKILL.md + music-ctl)
- `~/.openclaw/skills/workflow/` (SKILL.md + workflow-ctl)
- `~/.openclaw/skills/gitea/` (SKILL.md + gitea-ctl)
- `~/.openclaw/skills/calendar/` (SKILL.md + calendar-ctl + reminder-daemon.py)
- `~/.openclaw/skills/mode/` (SKILL.md + mode-ctl)
- `~/homeai-data/routines/` (directory)
- `~/homeai-data/reminders.json` (file)
- `~/homeai-data/active-mode.json` (file)
- `homeai-agent/reminder-daemon.py` + launchd plist


@@ -228,11 +228,19 @@ install_service() {
         log_warn "No launchd plist at $launchd_file — skipping service install."
         return
     fi
 
-    local plist_dest="${HOME}/Library/LaunchAgents/$(basename "$launchd_file")"
+    local plist_name
+    plist_name="$(basename "$launchd_file")"
+    local plist_dest="${HOME}/Library/LaunchAgents/${plist_name}"
+    local plist_label="${plist_name%.plist}"
+    local abs_source
+    abs_source="$(cd "$(dirname "$launchd_file")" && pwd)/$(basename "$launchd_file")"
     log_step "Installing launchd agent: $name"
-    cp "$launchd_file" "$plist_dest"
-    launchctl load -w "$plist_dest"
-    log_success "LaunchAgent '$name' installed and loaded."
+    # Unload existing service if running
+    launchctl bootout "gui/$(id -u)/${plist_label}" 2>/dev/null || true
+    # Symlink so edits to repo source take effect on reload
+    ln -sf "$abs_source" "$plist_dest"
+    launchctl bootstrap "gui/$(id -u)" "$plist_dest"
+    log_success "LaunchAgent '$name' symlinked and loaded."
     fi
 }