# P6: homeai-esp32 — Room Satellite Hardware

> Phase 4 | Depends on: P1 (HA running), P3 (Wyoming STT/TTS servers running)

---

## Goal

Flash ESP32-S3-BOX-3 units with ESPHome. Each unit acts as a dumb room satellite: always-on mic, local wake word detection, audio playback, and an LVGL animated face showing assistant state. All intelligence stays on the Mac Mini.

---

## Hardware: ESP32-S3-BOX-3

| Feature | Spec |
|---|---|
| SoC | ESP32-S3 (dual-core Xtensa, 240 MHz) |
| RAM | 512 KB SRAM + 16 MB PSRAM |
| Flash | 16 MB |
| Display | 2.4" IPS LCD, 320×240, touchscreen |
| Mic | Dual microphone array |
| Speaker | Built-in 1 W speaker |
| Connectivity | WiFi 802.11 b/g/n, BT 5.0 |
| USB | USB-C (programming + power) |

---

## Architecture Per Unit

```
ESP32-S3-BOX-3
├── microWakeWord (on-device, always listening)
│   └── triggers Wyoming Satellite on wake detection
├── Wyoming Satellite
│   ├── streams mic audio → Mac Mini Wyoming STT (port 10300)
│   └── receives TTS audio ← Mac Mini Wyoming TTS (port 10301)
├── LVGL Display
│   └── animated face, driven by HA entity state
└── ESPHome OTA
    └── firmware updates over WiFi
```

---

## ESPHome Configuration

### Base Config Template

`esphome/base.yaml` — shared across all units:

```yaml
esphome:
  name: homeai-${room}
  friendly_name: "HomeAI ${room_display}"
  platform: esp32
  board: esp32-s3-box-3

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    ssid: "HomeAI Fallback"

api:
  encryption:
    key: !secret api_key

ota:
  password: !secret ota_password

logger:
  level: INFO
```

### Room-Specific Config

`esphome/s3-box-living-room.yaml`:

```yaml
substitutions:
  room: living-room
  room_display: "Living Room"
  mac_mini_ip: "192.168.1.x"  # or Tailscale IP

packages:
  base: !include base.yaml
  voice: !include voice.yaml
  display: !include display.yaml
```

One file per room; only the substitutions change.
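Since the room files differ only in their substitutions, they can be stamped out from a template. A minimal sketch (the room list and output layout below are assumptions matching this plan, not an existing script):

```python
import pathlib

# Assumed room list for this plan; extend as hardware arrives.
ROOMS = {
    "living-room": "Living Room",
    "bedroom": "Bedroom",
    "kitchen": "Kitchen",
}
MAC_MINI_IP = "192.168.1.x"  # placeholder, as in the template above

TEMPLATE = """\
substitutions:
  room: {room}
  room_display: "{room_display}"
  mac_mini_ip: "{mac_mini_ip}"

packages:
  base: !include base.yaml
  voice: !include voice.yaml
  display: !include display.yaml
"""

def render_room_config(room: str, room_display: str) -> str:
    """Return the YAML text for one room's s3-box-<room>.yaml file."""
    return TEMPLATE.format(room=room, room_display=room_display,
                           mac_mini_ip=MAC_MINI_IP)

def write_all(out_dir: str = "esphome") -> list[str]:
    """Write every room config into out_dir and return the paths written."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for room, display in ROOMS.items():
        path = out / f"s3-box-{room}.yaml"
        path.write_text(render_room_config(room, display))
        written.append(str(path))
    return written
```

Regenerating the files from one template keeps the per-room configs from drifting apart as the shared packages evolve.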
### Voice / Wyoming Satellite — `esphome/voice.yaml`

```yaml
microphone:
  - platform: esp_adf
    id: mic

speaker:
  - platform: esp_adf
    id: spk

micro_wake_word:
  model: hey_jarvis  # or custom model path
  on_wake_word_detected:
    - voice_assistant.start:

voice_assistant:
  microphone: mic
  speaker: spk
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  on_listening:
    - display.page.show: page_listening
    - script.execute: animate_face_listening
  on_stt_vad_end:
    - display.page.show: page_thinking
    - script.execute: animate_face_thinking
  on_tts_start:
    - display.page.show: page_speaking
    - script.execute: animate_face_speaking
  on_end:
    - display.page.show: page_idle
    - script.execute: animate_face_idle
  on_error:
    - display.page.show: page_error
    - script.execute: animate_face_error
```

**Note:** ESPHome's `voice_assistant` component connects to HA, which routes to Wyoming STT/TTS on the Mac Mini. This is the standard ESPHome → HA → Wyoming path.

### LVGL Display — `esphome/display.yaml`

```yaml
display:
  - platform: ili9xxx
    model: ILI9341
    id: lcd
    cs_pin: GPIO5
    dc_pin: GPIO4
    reset_pin: GPIO48

touchscreen:
  - platform: tt21100
    id: touch

lvgl:
  displays:
    - lcd
  touchscreens:
    - touch
  # Face widget — centered on screen
  widgets:
    - obj:
        id: face_container
        width: 320
        height: 240
        bg_color: 0x000000
        children:
          # Eyes (two circles)
          - obj:
              id: eye_left
              x: 90
              y: 90
              width: 50
              height: 50
              radius: 25
              bg_color: 0xFFFFFF
          - obj:
              id: eye_right
              x: 180
              y: 90
              width: 50
              height: 50
              radius: 25
              bg_color: 0xFFFFFF
          # Mouth (line/arc)
          - arc:
              id: mouth
              x: 110
              y: 160
              width: 100
              height: 40
              start_angle: 180
              end_angle: 360
              arc_color: 0xFFFFFF
  pages:
    - id: page_idle
    - id: page_listening
    - id: page_thinking
    - id: page_speaking
    - id: page_error
```

### LVGL Face State Animations — `esphome/animations.yaml`

```yaml
script:
  - id: animate_face_idle
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 50  # normal open
      - lvgl.widget.update:
          id: eye_right
          height: 50
      - lvgl.widget.update:
          id: mouth
          arc_color: 0xFFFFFF
  - id: animate_face_listening
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 60  # wider eyes
      - lvgl.widget.update:
          id: eye_right
          height: 60
      - lvgl.widget.update:
          id: mouth
          arc_color: 0x00BFFF  # blue tint
  - id: animate_face_thinking
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 20  # squinting
      - lvgl.widget.update:
          id: eye_right
          height: 20
  - id: animate_face_speaking
    then:
      - lvgl.widget.update:
          id: mouth
          arc_color: 0x00FF88  # green speaking indicator
  - id: animate_face_error
    then:
      - lvgl.widget.update:
          id: eye_left
          bg_color: 0xFF2200  # red eyes
      - lvgl.widget.update:
          id: eye_right
          bg_color: 0xFF2200
```

> **Note:** True lip-sync animation (mouth moving with audio) is complex on ESP32. Phase 1: static states. Phase 2: amplitude-driven mouth height using speaker volume feedback.

---

## Secrets File

`esphome/secrets.yaml` (gitignored):

```yaml
wifi_ssid: "YourNetwork"
wifi_password: "YourPassword"
api_key: "<32-byte base64 key>"
ota_password: "YourOTAPassword"
```

---

## Flash & Deployment Workflow

```bash
# Install ESPHome
pip install esphome

# Compile + flash via USB (first time)
esphome run esphome/s3-box-living-room.yaml

# OTA update (subsequent)
esphome upload esphome/s3-box-living-room.yaml --device homeai-living-room.local

# View logs
esphome logs esphome/s3-box-living-room.yaml
```

---

## Home Assistant Integration

After flashing:

1. HA discovers the ESP32 automatically via mDNS
2. Add the device in HA → Settings → Devices
3. Assign a Wyoming voice assistant pipeline to the device
4. Set up room-specific automations (e.g., "Living Room" light control from that satellite)

---

## Directory Layout

```
homeai-esp32/
└── esphome/
    ├── base.yaml
    ├── voice.yaml
    ├── display.yaml
    ├── animations.yaml
    ├── s3-box-living-room.yaml
    ├── s3-box-bedroom.yaml   # template, fill in when hardware available
    ├── s3-box-kitchen.yaml   # template
    └── secrets.yaml          # gitignored
```

---

## Wake Word Decisions

| Option | Latency | Privacy | Effort |
|---|---|---|---|
| `hey_jarvis` (built-in microWakeWord) | ~200ms | On-device | Zero |
| Custom word (trained model) | ~200ms | On-device | High — requires 50+ recordings |
| Mac Mini openWakeWord (stream audio) | ~500ms | On Mac | Medium |

**Recommendation:** Start with `hey_jarvis`. Train a custom wake word once the character's name is finalised.

---

## Implementation Steps

- [ ] Install ESPHome: `pip install esphome`
- [ ] Write `esphome/secrets.yaml` (gitignored)
- [ ] Write `base.yaml`, `voice.yaml`, `display.yaml`, `animations.yaml`
- [ ] Write `s3-box-living-room.yaml` for the first unit
- [ ] Flash first unit via USB: `esphome run s3-box-living-room.yaml`
- [ ] Verify unit appears in HA device list
- [ ] Assign Wyoming voice pipeline to unit in HA
- [ ] Test: speak wake word → transcription → LLM response → spoken reply
- [ ] Test: LVGL face cycles through idle → listening → thinking → speaking
- [ ] Verify OTA update works: change an LVGL color, deploy wirelessly
- [ ] Write config templates for remaining rooms (bedroom, kitchen)
- [ ] Flash remaining units, verify each works independently
- [ ] Document final MAC address → room name mapping

---

## Success Criteria

- [ ] Wake word "hey jarvis" triggers the pipeline reliably from 3 m distance
- [ ] STT transcription accuracy >90% for clear speech in a quiet room
- [ ] TTS audio plays clearly through the ESP32 speaker
- [ ] LVGL face shows the correct state for idle / listening / thinking / speaking / error
- [ ] OTA firmware updates work without a USB cable
- [ ] Unit reconnects automatically after a WiFi drop
- [ ] Unit survives a power cycle and resumes normal operation
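The reconnect and fleet-health criteria can be spot-checked from the Mac Mini. A minimal sketch that probes each satellite on the ESPHome native API port (6053), deriving the `homeai-<room>.local` mDNS hostnames from the `name: homeai-${room}` field in `base.yaml` (the room list here is an assumption):

```python
import socket

ROOMS = ["living-room", "bedroom", "kitchen"]  # assumed room list

def satellite_hostname(room: str) -> str:
    """mDNS hostname derived from the ESPHome `name:` field."""
    return f"homeai-{room}.local"

def api_reachable(host: str, port: int = 6053, timeout: float = 2.0) -> bool:
    """True if a TCP connection to the ESPHome native API port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, timeout, and DNS failures
        return False

if __name__ == "__main__":
    for room in ROOMS:
        host = satellite_hostname(room)
        status = "ok" if api_reachable(host) else "UNREACHABLE"
        print(f"{host}: {status}")
```

Running this after a deliberate WiFi drop or power cycle gives a quick pass/fail per unit without opening HA.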