homeai/homeai-esp32/PLAN.md
Aodhan Collins 38247d7cc4 Initial project structure and planning docs
Full project plan across 8 sub-projects (homeai-infra, homeai-llm,
homeai-voice, homeai-agent, homeai-character, homeai-esp32,
homeai-visual, homeai-images). Includes per-project PLAN.md files,
top-level PROJECT_PLAN.md, and master TODO.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 01:11:37 +00:00

# P6: homeai-esp32 — Room Satellite Hardware
> Phase 4 | Depends on: P1 (HA running), P3 (Wyoming STT/TTS servers running)
---
## Goal
Flash ESP32-S3-BOX-3 units with ESPHome. Each unit acts as a dumb room satellite: always-on mic, local wake word detection, audio playback, and an LVGL animated face showing assistant state. All intelligence stays on the Mac Mini.

---
## Hardware: ESP32-S3-BOX-3
| Feature | Spec |
|---|---|
| SoC | ESP32-S3 (dual-core Xtensa, 240MHz) |
| RAM | 512KB SRAM + 16MB PSRAM |
| Flash | 16MB |
| Display | 2.4" IPS LCD, 320×240, touchscreen |
| Mic | Dual microphone array |
| Speaker | Built-in 1W speaker |
| Connectivity | WiFi 802.11b/g/n, BT 5.0 |
| USB | USB-C (programming + power) |
---
## Architecture Per Unit
```
ESP32-S3-BOX-3
├── microWakeWord (on-device, always listening)
│   └── triggers Wyoming Satellite on wake detection
├── Wyoming Satellite
│   ├── streams mic audio → Mac Mini Wyoming STT (port 10300)
│   └── receives TTS audio ← Mac Mini Wyoming TTS (port 10301)
├── LVGL Display
│   └── animated face, driven by HA entity state
└── ESPHome OTA
    └── firmware updates over WiFi
```
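Since this phase depends on the Wyoming STT and TTS servers already listening on the Mac Mini, a quick reachability check before flashing may save a debugging round. The sketch below is a hypothetical helper (`port_open` is not part of ESPHome or Wyoming) built on bash's `/dev/tcp` redirection. The ports are the ones in the diagram above; the IP address in the usage comment is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.
# Note: an unreachable host can block until the OS connect timeout expires.
port_open() {
  local host="$1" port="$2"
  # Open the socket in a subshell so the file descriptor closes on exit.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Usage (replace the placeholder address with the Mac Mini's):
#   port_open 192.168.1.10 10300 && echo "STT reachable"
#   port_open 192.168.1.10 10301 && echo "TTS reachable"
```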
---
## ESPHome Configuration
### Base Config Template
`esphome/base.yaml` — shared across all units:
```yaml
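esphome:
  name: homeai-${room}
  friendly_name: "HomeAI ${room_display}"

# Current ESPHome takes platform and board in a separate `esp32:` block.
# `esp32s3box` is the board ID assumed here for the S3-BOX-3; verify before flashing.
esp32:
  board: esp32s3box
  framework:
    type: esp-idf  # micro_wake_word requires the ESP-IDF framework

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    ssid: "HomeAI Fallback"

api:
  encryption:
    key: !secret api_key

ota:
  - platform: esphome  # list form, required by ESPHome 2024.6 and later
    password: !secret ota_password

logger:
  level: INFO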
```
### Room-Specific Config
`esphome/s3-box-living-room.yaml`:
```yaml
substitutions:
  room: living-room
  room_display: "Living Room"
  mac_mini_ip: "192.168.1.x"  # or Tailscale IP

packages:
  base: !include base.yaml
  voice: !include voice.yaml
  display: !include display.yaml
  animations: !include animations.yaml  # scripts called from voice.yaml
```
One file per room, only the substitutions change.
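
Because each room file differs only in its substitutions, new rooms can be scaffolded rather than copied by hand. The following is a minimal sketch, assuming the substitution keys and package file names used in this plan; `new_room_config` is a hypothetical helper name, and `animations.yaml` is included on the assumption that the scripts referenced from `voice.yaml` must be loaded as a package too.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a room config for a slug and a display name.
# Third argument (Mac Mini address) is optional and defaults to the
# placeholder used in this plan.
new_room_config() {
  local room="$1" room_display="$2" mac_mini_ip="${3:-192.168.1.x}"
  cat <<EOF
substitutions:
  room: ${room}
  room_display: "${room_display}"
  mac_mini_ip: "${mac_mini_ip}"  # or Tailscale IP

packages:
  base: !include base.yaml
  voice: !include voice.yaml
  display: !include display.yaml
  animations: !include animations.yaml
EOF
}

# Usage:
#   new_room_config bedroom "Bedroom" > esphome/s3-box-bedroom.yaml
#   new_room_config kitchen "Kitchen" > esphome/s3-box-kitchen.yaml
```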
### Voice / Wyoming Satellite — `esphome/voice.yaml`
```yaml
# NOTE: `esp_adf` is not a core ESPHome platform. It needs an external
# component, or can be swapped for the built-in `i2s_audio` platforms.
microphone:
  - platform: esp_adf
    id: mic

speaker:
  - platform: esp_adf
    id: spk

micro_wake_word:
  models:
    - model: hey_jarvis  # or custom model path
  on_wake_word_detected:
    - voice_assistant.start:

voice_assistant:
  microphone: mic
  speaker: spk
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  # Pages are LVGL pages, so they are switched with lvgl.page.show
  on_listening:
    - lvgl.page.show: page_listening
    - script.execute: animate_face_listening
  on_stt_vad_end:
    - lvgl.page.show: page_thinking
    - script.execute: animate_face_thinking
  on_tts_start:
    - lvgl.page.show: page_speaking
    - script.execute: animate_face_speaking
  on_end:
    - lvgl.page.show: page_idle
    - script.execute: animate_face_idle
  on_error:
    - lvgl.page.show: page_error
    - script.execute: animate_face_error
```
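**Note:** ESPHome's `voice_assistant` component connects to HA over the ESPHome native API, and HA's voice pipeline routes audio to Wyoming STT/TTS on the Mac Mini. The unit itself never opens the Wyoming ports; HA does. This is the standard ESPHome → HA → Wyoming path.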
### LVGL Display — `esphome/display.yaml`
```yaml
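# Assumes `spi:` and `i2c:` buses are defined elsewhere in the config.
display:
  - platform: ili9xxx
    model: S3BOX  # ili9xxx preset for this 320x240 panel (plan originally said ILI9341)
    id: lcd
    cs_pin: GPIO5
    dc_pin: GPIO4
    reset_pin: GPIO48
    auto_clear_enabled: false  # LVGL owns the framebuffer
    update_interval: never

touchscreen:
  - platform: gt911  # assumed BOX-3 controller; tt21100 is the original S3-BOX part
    id: touch

lvgl:
  displays:
    - lcd
  touchscreens:
    - touch
  # The face lives on top_layer so it stays visible on every page
  # (top-level `widgets:` is not meant to be combined with `pages:`).
  top_layer:
    widgets:
      - obj:
          id: face_container
          width: 320
          height: 240
          bg_color: 0x000000
          widgets:  # child widgets are nested under `widgets:`
            # Eyes (two circles)
            - obj:
                id: eye_left
                x: 90
                y: 90
                width: 50
                height: 50
                radius: 25
                bg_color: 0xFFFFFF
            - obj:
                id: eye_right
                x: 180
                y: 90
                width: 50
                height: 50
                radius: 25
                bg_color: 0xFFFFFF
            # Mouth (arc). LVGL angles run clockwise from 3 o'clock,
            # so 0 to 180 is the lower half (a smile); 180 to 360 would frown.
            - arc:
                id: mouth
                x: 110
                y: 160
                width: 100
                height: 40
                start_angle: 0
                end_angle: 180
                arc_color: 0xFFFFFF
  pages:
    - id: page_idle
    - id: page_listening
    - id: page_thinking
    - id: page_speaking
    - id: page_error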
```
### LVGL Face State Animations — `esphome/animations.yaml`
```yaml
# ESPHome's action for changing widget properties is lvgl.widget.update.
script:
  - id: animate_face_idle
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 50  # normal open
          bg_color: 0xFFFFFF  # reset after the error state
      - lvgl.widget.update:
          id: eye_right
          height: 50
          bg_color: 0xFFFFFF
      - lvgl.widget.update:
          id: mouth
          arc_color: 0xFFFFFF
  - id: animate_face_listening
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 60  # wider eyes
      - lvgl.widget.update:
          id: eye_right
          height: 60
      - lvgl.widget.update:
          id: mouth
          arc_color: 0x00BFFF  # blue tint
  - id: animate_face_thinking
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 20  # squinting
      - lvgl.widget.update:
          id: eye_right
          height: 20
  - id: animate_face_speaking
    then:
      - lvgl.widget.update:
          id: mouth
          arc_color: 0x00FF88  # green speaking indicator
  - id: animate_face_error
    then:
      - lvgl.widget.update:
          id: eye_left
          bg_color: 0xFF2200  # red eyes
      - lvgl.widget.update:
          id: eye_right
          bg_color: 0xFF2200
```
> **Note:** True lip-sync animation (mouth moving with audio) is complex on ESP32. Phase 1: static states. Phase 2: amplitude-driven mouth height using speaker volume feedback.
---
## Secrets File
`esphome/secrets.yaml` (gitignored):
```yaml
wifi_ssid: "YourNetwork"
wifi_password: "YourPassword"
api_key: "<32-byte base64 key>"
ota_password: "YourOTAPassword"
```
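ESPHome's native API encryption expects a base64-encoded 32-byte key for `api_key`. One way to produce such a value is sketched below, assuming a system with `/dev/urandom` and the `base64` utility; `gen_api_key` is a hypothetical helper name, not an ESPHome command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: emit 32 random bytes, base64-encoded (44 characters).
gen_api_key() {
  head -c 32 /dev/urandom | base64
}

# Usage: paste the output into secrets.yaml as the api_key value.
#   gen_api_key
```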
---
## Flash & Deployment Workflow
```bash
# Install ESPHome
pip install esphome
# Compile + flash via USB (first time)
esphome run esphome/s3-box-living-room.yaml
# OTA update (subsequent): compile, then upload the fresh build over WiFi
esphome compile esphome/s3-box-living-room.yaml
esphome upload esphome/s3-box-living-room.yaml --device <device-ip>
# View logs
esphome logs esphome/s3-box-living-room.yaml
```
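With several room files sharing the same packages, it can help to validate every config before flashing any of them. The sketch below loops over the room files and runs `esphome config` on each. `validate_all` and the `ESPHOME_BIN` override are hypothetical names introduced here for illustration; the `s3-box-*.yaml` pattern follows this plan's file naming.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate every room config in a directory.
# Returns 0 if all configs pass, 1 if any fails.
# ESPHOME_BIN can override the executable (useful for dry runs).
validate_all() {
  local dir="${1:-esphome}" esphome_bin="${ESPHOME_BIN:-esphome}" failed=0 cfg
  for cfg in "$dir"/s3-box-*.yaml; do
    [ -e "$cfg" ] || continue  # no matches: the glob stays literal, skip it
    if ! "$esphome_bin" config "$cfg" > /dev/null; then
      echo "invalid: $cfg" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   validate_all esphome && esphome run esphome/s3-box-living-room.yaml
```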
---
## Home Assistant Integration
After flashing:
1. HA discovers ESP32 automatically via mDNS
2. Add device in HA → Settings → Devices & Services
3. Assign Wyoming voice assistant pipeline to the device
4. Set up room-specific automations (e.g., "Living Room" light control from that satellite)
---
## Directory Layout
```
homeai-esp32/
└── esphome/
    ├── base.yaml
    ├── voice.yaml
    ├── display.yaml
    ├── animations.yaml
    ├── s3-box-living-room.yaml
    ├── s3-box-bedroom.yaml    # template, fill in when hardware available
    ├── s3-box-kitchen.yaml    # template
    └── secrets.yaml           # gitignored
```
---
## Wake Word Decisions
| Option | Latency | Privacy | Effort |
|---|---|---|---|
| `hey_jarvis` (built-in microWakeWord) | ~200ms | On-device | Zero |
| Custom word (trained model) | ~200ms | On-device | High — requires 50+ recordings |
| Mac Mini openWakeWord (stream audio) | ~500ms | On Mac | Medium |

**Recommendation:** Start with `hey_jarvis`. Train a custom word (character's name) once character name is finalised.

---
## Implementation Steps
- [ ] Install ESPHome: `pip install esphome`
- [ ] Write `esphome/secrets.yaml` (gitignored)
- [ ] Write `base.yaml`, `voice.yaml`, `display.yaml`, `animations.yaml`
- [ ] Write `s3-box-living-room.yaml` for first unit
- [ ] Flash first unit via USB: `esphome run esphome/s3-box-living-room.yaml`
- [ ] Verify unit appears in HA device list
- [ ] Assign Wyoming voice pipeline to unit in HA
- [ ] Test: speak wake word → transcription → LLM response → spoken reply
- [ ] Test: LVGL face cycles through idle → listening → thinking → speaking
- [ ] Verify OTA update works: change LVGL color, deploy wirelessly
- [ ] Write config templates for remaining rooms (bedroom, kitchen)
- [ ] Flash remaining units, verify each works independently
- [ ] Document final MAC address → room name mapping
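
For the later steps that roll firmware out to the remaining rooms, the per-room OTA commands can be derived from the file names. This is a rough sketch under two assumptions: each file's slug matches its `room` substitution (as in `s3-box-living-room.yaml`), and each unit is reachable at the mDNS name implied by `name: homeai-${room}`. Both helper names are hypothetical, and the second function only prints commands rather than running them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a room config path to its assumed mDNS hostname.
room_hostname() {
  local cfg="$1" room
  room="$(basename "$cfg" .yaml)"  # e.g. s3-box-living-room
  room="${room#s3-box-}"           # e.g. living-room
  printf 'homeai-%s.local\n' "$room"
}

# Hypothetical helper: print (dry run) the OTA command for every room config.
ota_commands() {
  local dir="${1:-esphome}" cfg
  for cfg in "$dir"/s3-box-*.yaml; do
    [ -e "$cfg" ] || continue  # skip when the glob matches nothing
    printf 'esphome run %s --device %s\n' "$cfg" "$(room_hostname "$cfg")"
  done
}

# Usage: review the list, then pipe it to a shell to execute.
#   ota_commands esphome
```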
---
## Success Criteria
- [ ] Wake word "hey jarvis" triggers pipeline reliably from 3m distance
- [ ] STT transcription accuracy >90% for clear speech in quiet room
- [ ] TTS audio plays clearly through ESP32 speaker
- [ ] LVGL face shows correct state for idle / listening / thinking / speaking / error
- [ ] OTA firmware updates work without USB cable
- [ ] Unit reconnects automatically after WiFi drop
- [ ] Unit survives power cycle and resumes normal operation