Full project plan across 8 sub-projects (homeai-infra, homeai-llm, homeai-voice, homeai-agent, homeai-character, homeai-esp32, homeai-visual, homeai-images). Includes per-project PLAN.md files, top-level PROJECT_PLAN.md, and master TODO.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# P6: homeai-esp32 — Room Satellite Hardware

> Phase 4 | Depends on: P1 (HA running), P3 (Wyoming STT/TTS servers running)

---

## Goal

Flash ESP32-S3-BOX-3 units with ESPHome. Each unit acts as a dumb room satellite: always-on mic, local wake word detection, audio playback, and an LVGL animated face showing assistant state. All intelligence stays on the Mac Mini.

---

## Hardware: ESP32-S3-BOX-3

| Feature | Spec |
|---|---|
| SoC | ESP32-S3 (dual-core Xtensa, 240 MHz) |
| RAM | 512 KB SRAM + 16 MB PSRAM |
| Flash | 16 MB |
| Display | 2.4" IPS LCD, 320×240, touchscreen |
| Mic | Dual microphone array |
| Speaker | Built-in 1 W speaker |
| Connectivity | WiFi 802.11 b/g/n, BT 5.0 |
| USB | USB-C (programming + power) |

---

## Architecture Per Unit

```
ESP32-S3-BOX-3
├── microWakeWord (on-device, always listening)
│   └── triggers Wyoming Satellite on wake detection
├── Wyoming Satellite
│   ├── streams mic audio → Mac Mini Wyoming STT (port 10300)
│   └── receives TTS audio ← Mac Mini Wyoming TTS (port 10301)
├── LVGL Display
│   └── animated face, driven by HA entity state
└── ESPHome OTA
    └── firmware updates over WiFi
```

---

## ESPHome Configuration

### Base Config Template

`esphome/base.yaml` — shared across all units:

```yaml
esphome:
  name: homeai-${room}
  friendly_name: "HomeAI ${room_display}"

# The target platform lives in its own block; current ESPHome releases
# no longer accept `platform:` / `board:` under `esphome:`.
esp32:
  board: esp32s3box        # PlatformIO board ID commonly used for the S3-BOX family
  framework:
    type: esp-idf          # micro_wake_word requires the ESP-IDF framework

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    ssid: "HomeAI Fallback"

api:
  encryption:
    key: !secret api_key

# Newer ESPHome releases expect the list form with an explicit platform.
ota:
  - platform: esphome
    password: !secret ota_password

logger:
  level: INFO
```

### Room-Specific Config

`esphome/s3-box-living-room.yaml`:

```yaml
substitutions:
  room: living-room
  room_display: "Living Room"
  mac_mini_ip: "192.168.1.x"  # or Tailscale IP

packages:
  base: !include base.yaml
  voice: !include voice.yaml
  display: !include display.yaml
  animations: !include animations.yaml  # defines the animate_face_* scripts that voice.yaml calls
```

One file per room; only the substitutions change.

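Because only the substitutions differ between rooms, a new room file can be derived mechanically from an existing one. The sketch below is one way to do that with `sed`. The function name `new_room_config` is hypothetical (it is not part of ESPHome), and the sketch assumes that `room:` and `room_display:` each sit on their own line, as in the living-room file.

```bash
#!/usr/bin/env bash

# Print a copy of an existing room config with only the `room` and
# `room_display` substitutions rewritten. Everything else is left untouched.
# (Hypothetical helper; display names containing `|` or `&` are not handled.)
new_room_config() {
  local template="$1" room="$2" display="$3"
  sed -E \
    -e "s|^([[:space:]]*room:).*|\1 ${room}|" \
    -e "s|^([[:space:]]*room_display:).*|\1 \"${display}\"|" \
    "$template"
}
```

For example, `new_room_config esphome/s3-box-living-room.yaml bedroom "Bedroom" > esphome/s3-box-bedroom.yaml` would write the bedroom file next to the living-room one.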
### Voice / Wyoming Satellite — `esphome/voice.yaml`

```yaml
microphone:
  - platform: esp_adf
    id: mic

speaker:
  - platform: esp_adf
    id: spk

micro_wake_word:
  model: hey_jarvis  # or custom model path
  on_wake_word_detected:
    - voice_assistant.start:

voice_assistant:
  microphone: mic
  speaker: spk
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0

  # The pages are declared under `lvgl:` in display.yaml, so they are
  # switched with `lvgl.page.show` rather than `display.page.show`.
  on_listening:
    - lvgl.page.show: page_listening
    - script.execute: animate_face_listening

  on_stt_vad_end:
    - lvgl.page.show: page_thinking
    - script.execute: animate_face_thinking

  on_tts_start:
    - lvgl.page.show: page_speaking
    - script.execute: animate_face_speaking

  on_end:
    - lvgl.page.show: page_idle
    - script.execute: animate_face_idle

  on_error:
    - lvgl.page.show: page_error
    - script.execute: animate_face_error
```

**Note:** ESPHome's `voice_assistant` component talks to HA over the native API rather than to the Wyoming ports directly. HA's voice pipeline then routes audio to Wyoming STT/TTS on the Mac Mini. This is the standard ESPHome → HA → Wyoming path.

### LVGL Display — `esphome/display.yaml`

```yaml
display:
  - platform: ili9xxx
    model: ILI9341
    id: lcd
    cs_pin: GPIO5
    dc_pin: GPIO4
    reset_pin: GPIO48

touchscreen:
  - platform: tt21100
    id: touch

lvgl:
  displays:
    - lcd
  touchscreens:
    - touch

  # Face widget, full screen. It sits on the top layer so it stays visible
  # on every page (assumption: top-level `widgets:` cannot be combined
  # with `pages:` in ESPHome's LVGL component).
  top_layer:
    widgets:
      - obj:
          id: face_container
          width: 320
          height: 240
          bg_color: 0x000000
          widgets:  # nested widgets are listed under `widgets:`
            # Eyes (two circles)
            - obj:
                id: eye_left
                x: 90
                y: 90
                width: 50
                height: 50
                radius: 25
                bg_color: 0xFFFFFF
            - obj:
                id: eye_right
                x: 180
                y: 90
                width: 50
                height: 50
                radius: 25
                bg_color: 0xFFFFFF
            # Mouth (line/arc)
            - arc:
                id: mouth
                x: 110
                y: 160
                width: 100
                height: 40
                start_angle: 180
                end_angle: 360
                arc_color: 0xFFFFFF

  pages:
    - id: page_idle
    - id: page_listening
    - id: page_thinking
    - id: page_speaking
    - id: page_error
```

### LVGL Face State Animations — `esphome/animations.yaml`

```yaml
# Widget properties are changed with ESPHome's `lvgl.widget.update` action.
script:
  - id: animate_face_idle
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 50          # normal open
          bg_color: 0xFFFFFF  # clear the red left behind by the error state
      - lvgl.widget.update:
          id: eye_right
          height: 50
          bg_color: 0xFFFFFF
      - lvgl.widget.update:
          id: mouth
          arc_color: 0xFFFFFF

  - id: animate_face_listening
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 60  # wider eyes
      - lvgl.widget.update:
          id: eye_right
          height: 60
      - lvgl.widget.update:
          id: mouth
          arc_color: 0x00BFFF  # blue tint

  - id: animate_face_thinking
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 20  # squinting
      - lvgl.widget.update:
          id: eye_right
          height: 20

  - id: animate_face_speaking
    then:
      - lvgl.widget.update:
          id: mouth
          arc_color: 0x00FF88  # green speaking indicator

  - id: animate_face_error
    then:
      - lvgl.widget.update:
          id: eye_left
          bg_color: 0xFF2200  # red eyes
      - lvgl.widget.update:
          id: eye_right
          bg_color: 0xFF2200
```

> **Note:** True lip-sync animation (mouth moving with audio) is complex on ESP32. Phase 1: static states. Phase 2: amplitude-driven mouth height using speaker volume feedback.

---

## Secrets File

`esphome/secrets.yaml` (gitignored):

```yaml
wifi_ssid: "YourNetwork"
wifi_password: "YourPassword"
api_key: "<32-byte base64 key>"
ota_password: "YourOTAPassword"
```

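The `api_key` placeholder needs a random 32-byte value encoded as base64. A minimal sketch for generating one from the shell, assuming `head` and `base64` are available on the machine (the function name `gen_api_key` is illustrative only). The 44-character result is what goes into `secrets.yaml`; running `gen_api_key` prints a fresh key each time.

```bash
#!/usr/bin/env bash

# Read 32 random bytes from the kernel and base64-encode them.
# 32 bytes always encode to 44 base64 characters (including padding).
gen_api_key() {
  head -c 32 /dev/urandom | base64
}
```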
---

## Flash & Deployment Workflow

```bash
# Install ESPHome
pip install esphome

# Compile + flash via USB (first time)
esphome run esphome/s3-box-living-room.yaml

# OTA update (subsequent)
esphome upload esphome/s3-box-living-room.yaml --device <device-ip>

# View logs
esphome logs esphome/s3-box-living-room.yaml
```

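Once several rooms are deployed, the OTA step can be looped over every room config. The sketch below is one possible wrapper, not an ESPHome feature: `room_hostname` and `ota_all_rooms` are hypothetical helpers, and the `.local` hostname is an assumption that each unit is reachable over mDNS under the `homeai-${room}` name set in `base.yaml`. Calling `ota_all_rooms esphome` would attempt an upload for each `s3-box-*.yaml` file and report any that fail.

```bash
#!/usr/bin/env bash

# Map a config path such as esphome/s3-box-living-room.yaml to the mDNS
# hostname implied by `name: homeai-${room}` (assumed: homeai-<room>.local).
room_hostname() {
  local file room
  file="$(basename "$1" .yaml)"   # s3-box-living-room
  room="${file#s3-box-}"          # living-room
  printf 'homeai-%s.local\n' "$room"
}

# Push firmware over the air to every room config in a directory,
# continuing past units that are offline.
ota_all_rooms() {
  local dir="${1:-esphome}" cfg
  for cfg in "$dir"/s3-box-*.yaml; do
    [ -e "$cfg" ] || continue
    esphome upload "$cfg" --device "$(room_hostname "$cfg")" \
      || echo "OTA failed: $cfg" >&2
  done
}
```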
---

## Home Assistant Integration

After flashing:

1. HA discovers the ESP32 automatically via mDNS.
2. Add the device in HA under Settings → Devices & Services.
3. Assign the Wyoming voice assistant pipeline to the device.
4. Set up room-specific automations (e.g., "Living Room" light control from that satellite).

---

## Directory Layout

```
homeai-esp32/
└── esphome/
    ├── base.yaml
    ├── voice.yaml
    ├── display.yaml
    ├── animations.yaml
    ├── s3-box-living-room.yaml
    ├── s3-box-bedroom.yaml       # template, fill in when hardware available
    ├── s3-box-kitchen.yaml       # template
    └── secrets.yaml              # gitignored
```

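The layout above can be scaffolded in one step. This is a rough sketch under two assumptions: the helper name `scaffold_esp32` is made up for illustration, and the ignore rule is written to a `.gitignore` inside `homeai-esp32/` (Git honours nested `.gitignore` files, with paths relative to that file). Existing files are left alone, so `scaffold_esp32 homeai-esp32` is safe to re-run.

```bash
#!/usr/bin/env bash

# Create the esphome/ directory with empty placeholder files and make sure
# secrets.yaml is ignored by Git. Files that already exist are not touched.
scaffold_esp32() {
  local root="${1:-homeai-esp32}" f
  mkdir -p "$root/esphome"
  for f in base voice display animations \
           s3-box-living-room s3-box-bedroom s3-box-kitchen secrets; do
    [ -e "$root/esphome/$f.yaml" ] || : > "$root/esphome/$f.yaml"
  done
  # Append the ignore rule only if it is not already present.
  grep -qxF 'esphome/secrets.yaml' "$root/.gitignore" 2>/dev/null \
    || echo 'esphome/secrets.yaml' >> "$root/.gitignore"
}
```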
---

## Wake Word Decisions

| Option | Latency | Privacy | Effort |
|---|---|---|---|
| `hey_jarvis` (built-in microWakeWord) | ~200ms | On-device | Zero |
| Custom word (trained model) | ~200ms | On-device | High — requires 50+ recordings |
| Mac Mini openWakeWord (stream audio) | ~500ms | On Mac | Medium |

**Recommendation:** Start with `hey_jarvis`. Train a custom word (the character's name) once the character name is finalised.

---

## Implementation Steps

- [ ] Install ESPHome: `pip install esphome`
- [ ] Write `esphome/secrets.yaml` (gitignored)
- [ ] Write `base.yaml`, `voice.yaml`, `display.yaml`, `animations.yaml`
- [ ] Write `s3-box-living-room.yaml` for first unit
- [ ] Flash first unit via USB: `esphome run esphome/s3-box-living-room.yaml`
- [ ] Verify unit appears in HA device list
- [ ] Assign Wyoming voice pipeline to unit in HA
- [ ] Test: speak wake word → transcription → LLM response → spoken reply
- [ ] Test: LVGL face cycles through idle → listening → thinking → speaking
- [ ] Verify OTA update works: change LVGL color, deploy wirelessly
- [ ] Write config templates for remaining rooms (bedroom, kitchen)
- [ ] Flash remaining units, verify each works independently
- [ ] Document final MAC address → room name mapping

---

## Success Criteria

- [ ] Wake word "hey jarvis" triggers pipeline reliably from 3 m distance
- [ ] STT transcription accuracy >90% for clear speech in quiet room
- [ ] TTS audio plays clearly through ESP32 speaker
- [ ] LVGL face shows correct state for idle / listening / thinking / speaking / error
- [ ] OTA firmware updates work without USB cable
- [ ] Unit reconnects automatically after WiFi drop
- [ ] Unit survives power cycle and resumes normal operation