homeai/homeai-esp32/PLAN.md
Aodhan Collins 1e52c002c2 feat: Raspberry Pi 5 kitchen satellite — Wyoming voice satellite with ReSpeaker pHAT
2026-03-14 20:09:47 +00:00
# P6: homeai-esp32 — Room Satellite Hardware
> Phase 4 | Depends on: P1 (HA running), P3 (Wyoming STT/TTS servers running)
---
## Goal
Flash ESP32-S3-BOX-3 units with ESPHome. Each unit acts as a dumb room satellite: always-on mic, on-device wake word detection, audio playback, and a display showing assistant state via static PNG face illustrations. All intelligence stays on the Mac Mini.

---
## Hardware: ESP32-S3-BOX-3
| Feature | Spec |
|---|---|
| SoC | ESP32-S3 (dual-core Xtensa, 240MHz) |
| RAM | 512KB SRAM + 16MB PSRAM |
| Flash | 16MB |
| Display | 2.4" IPS LCD, 320×240, touchscreen (ILI9xxx, model S3BOX) |
| Audio ADC | ES7210 (dual mic array, 16kHz 16-bit) |
| Audio DAC | ES8311 (speaker output, 48kHz 16-bit) |
| Speaker | Built-in 1W |
| Connectivity | WiFi 802.11b/g/n (2.4GHz only), BT 5.0 |
| USB | USB-C (programming + power, native USB JTAG serial) |
---
## Architecture Per Unit
```
ESP32-S3-BOX-3
├── micro_wake_word (on-device, always listening)
│   └── "hey_jarvis" (triggers voice_assistant on wake detection)
├── voice_assistant (ESPHome component)
│   ├── connects to Home Assistant via ESPHome API
│   ├── HA routes audio → Mac Mini Wyoming STT (10.0.0.101:10300)
│   ├── HA routes text → OpenClaw conversation agent (10.0.0.101:8081)
│   └── HA routes response → Mac Mini Wyoming TTS (10.0.0.101:10301)
├── Display (ili9xxx, model S3BOX, 320×240)
│   └── static PNG faces per state (idle, listening, thinking, replying, error)
└── ESPHome OTA
    └── firmware updates over WiFi
```
---
## Pin Map (ESP32-S3-BOX-3)
| Function | Pin(s) | Notes |
|---|---|---|
| I2S LRCLK | GPIO45 | strapping pin — warning ignored |
| I2S BCLK | GPIO17 | |
| I2S MCLK | GPIO2 | |
| I2S DIN (mic) | GPIO16 | ES7210 ADC input |
| I2S DOUT (speaker) | GPIO15 | ES8311 DAC output |
| Speaker enable | GPIO46 | strapping pin — warning ignored |
| I2C SCL | GPIO18 | audio codec control bus |
| I2C SDA | GPIO8 | audio codec control bus |
| SPI CLK (display) | GPIO7 | |
| SPI MOSI (display) | GPIO6 | |
| Display CS | GPIO5 | |
| Display DC | GPIO4 | |
| Display Reset | GPIO48 | inverted |
| Backlight | GPIO47 | LEDC PWM |
| Left top button | GPIO0 | strapping pin — mute toggle / factory reset |
| Sensor dock I2C SCL | GPIO40 | sensor bus (AHT-30, AT581x radar) |
| Sensor dock I2C SDA | GPIO41 | sensor bus (AHT-30, AT581x radar) |
| Radar presence output | GPIO21 | AT581x digital detection pin |
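
A minimal sketch of how the shared buses in this pin map could be declared in ESPHome YAML. The bus ids (`audio_bus`, `sensor_bus`, `i2s_audio_bus`, `display_spi`) are hypothetical placeholders, not necessarily the names used in `homeai-living-room.yaml`.

```yaml
# Sketch: bus definitions derived from the pin map (ids are hypothetical)
i2c:
  - id: audio_bus          # ES7210 / ES8311 codec control
    sda: GPIO8
    scl: GPIO18
  - id: sensor_bus         # sensor dock: AHT-30 + AT581x radar
    sda: GPIO41
    scl: GPIO40
    frequency: 100kHz

i2s_audio:
  - id: i2s_audio_bus
    i2s_lrclk_pin: GPIO45  # strapping pin; ESPHome warns at compile time
    i2s_bclk_pin: GPIO17
    i2s_mclk_pin: GPIO2

spi:
  - id: display_spi
    clk_pin: GPIO7
    mosi_pin: GPIO6
```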
---
## ESPHome Configuration
### Platform & Framework
```yaml
esp32:
  board: esp32s3box
  flash_size: 16MB
  cpu_frequency: 240MHz
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"

psram:
  mode: octal
  speed: 80MHz
```
### Audio Stack
Uses `i2s_audio` platform with external ADC/DAC codec chips:
- **Microphone**: ES7210 ADC via I2S, 16kHz 16-bit mono
- **Speaker**: ES8311 DAC via I2S, 48kHz 16-bit mono (left channel)
- **Media player**: wraps the speaker and adds volume control (min 50%, max 85%)
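
To illustrate the chain above (ES7210 → microphone, ES8311 → speaker → media player), here is a sketch using ESPHome's `audio_adc`, `audio_dac`, `i2s_audio` microphone/speaker, and `speaker` media player platforms. It assumes an I2C bus `audio_bus` and an I2S bus `i2s_audio_bus` are already defined; all ids are hypothetical, and the official S3-BOX-3 reference config may place mixer or resampler speakers between the DAC and the media player.

```yaml
# Sketch of the audio chain (ids are hypothetical)
audio_adc:
  - platform: es7210
    id: es7210_adc
    i2c_id: audio_bus
    bits_per_sample: 16bit
    sample_rate: 16000

audio_dac:
  - platform: es8311
    id: es8311_dac
    i2c_id: audio_bus
    bits_per_sample: 16bit
    sample_rate: 48000

microphone:
  - platform: i2s_audio
    id: box_mic
    i2s_audio_id: i2s_audio_bus
    adc_type: external
    i2s_din_pin: GPIO16
    sample_rate: 16000
    bits_per_sample: 16bit

speaker:
  - platform: i2s_audio
    id: box_speaker
    i2s_audio_id: i2s_audio_bus
    dac_type: external
    i2s_dout_pin: GPIO15
    audio_dac: es8311_dac
    sample_rate: 48000
    bits_per_sample: 16bit
    channel: left            # mono output on the left channel

media_player:
  - platform: speaker
    id: box_media_player
    name: None
    volume_min: 0.5          # 50% floor
    volume_max: 0.85         # 85% ceiling to avoid distortion
    announcement_pipeline:
      speaker: box_speaker
      format: FLAC
      num_channels: 1
      sample_rate: 48000
```

The `volume_min` / `volume_max` pair is what bounds output to the 50% to 85% range, so the HA volume control cannot drive the 1W speaker into distortion.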
### Wake Word
On-device `micro_wake_word` component with the `hey_jarvis` model. Detection can optionally be switched to Home Assistant's streaming wake word via a `select` entity.
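
A sketch of the on-device wake word wired to the voice assistant, plus a template `select` that moves detection to Home Assistant. The ids `mww` and `va` and the option labels are hypothetical. The `set_use_wake_word()` lambda follows the pattern in ESPHome's published wake-word voice assistant configs; verify it against the ESPHome version actually installed.

```yaml
# Sketch: wake word engine wiring (ids and option labels are hypothetical)
micro_wake_word:
  id: mww
  models:
    - model: hey_jarvis      # a custom model manifest could replace this later
  on_wake_word_detected:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;

select:
  - platform: template
    id: wake_word_engine_location
    name: Wake word engine location
    entity_category: config
    optimistic: true
    restore_value: true
    options:
      - On device
      - In Home Assistant
    initial_option: On device
    on_value:
      - if:
          condition:
            lambda: return x == "In Home Assistant";
          then:
            # Stop local detection, then stream audio to HA continuously
            - micro_wake_word.stop:
            - delay: 500ms
            - lambda: id(va).set_use_wake_word(true);
            - voice_assistant.start_continuous:
          else:
            # Stop streaming, then resume on-device detection
            - lambda: id(va).set_use_wake_word(false);
            - voice_assistant.stop:
            - delay: 500ms
            - micro_wake_word.start:
```

Selecting "In Home Assistant" corresponds to the higher-latency, lower-privacy row of the Wake Word Options table, since all microphone audio then leaves the device.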
### Display
`ili9xxx` platform with model `S3BOX`. Uses `update_interval: never` — display updates are triggered by scripts on voice assistant state changes. Static 320×240 PNG images for each state are compiled into firmware. No text overlays — voice-only interaction.
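
A sketch of the image-per-state approach: each PNG is compiled in as an `image`, each state gets a display page, and because `update_interval: never` is set, the screen redraws only when a script calls `component.update`. The ids, the `RGB565` image type, and `spi_id: display_spi` are assumptions, and only three states are shown.

```yaml
# Sketch: one compiled-in image and one page per state (ids hypothetical)
image:
  - file: illustrations/idle.png
    id: face_idle
    resize: 320x240
    type: RGB565
  - file: illustrations/listening.png
    id: face_listening
    resize: 320x240
    type: RGB565
  - file: illustrations/thinking.png
    id: face_thinking
    resize: 320x240
    type: RGB565

display:
  - platform: ili9xxx
    id: box_lcd
    model: S3BOX
    spi_id: display_spi
    cs_pin: GPIO5
    dc_pin: GPIO4
    reset_pin:
      number: GPIO48
      inverted: true
    invert_colors: false
    update_interval: never   # redrawn only when a script asks
    pages:
      - id: idle_page
        lambda: it.image(0, 0, id(face_idle));
      - id: listening_page
        lambda: it.image(0, 0, id(face_listening));
      - id: thinking_page
        lambda: it.image(0, 0, id(face_thinking));

script:
  # One such script per state; voice_assistant triggers call them
  - id: show_listening
    then:
      - display.page.show: listening_page
      - component.update: box_lcd
```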

Screen auto-dims after a configurable idle timeout (default 1 min, adjustable from 1 to 60 min via an HA entity). It wakes on voice activity or radar presence detection.
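
One way to implement the configurable timeout, sketched with a template `number` for the 1 to 60 minute range and a `mode: restart` script that voice-activity or radar handlers would call. All ids are hypothetical, and dimming to 10% rather than switching the backlight off is an assumption.

```yaml
# Sketch: backlight dimming after an adjustable idle timeout (ids hypothetical)
output:
  - platform: ledc
    id: backlight_output
    pin: GPIO47

light:
  - platform: monochromatic
    id: backlight
    name: LCD Backlight
    output: backlight_output
    restore_mode: RESTORE_DEFAULT_ON
    default_transition_length: 250ms

number:
  - platform: template
    id: screen_idle_timeout
    name: Screen idle timeout
    unit_of_measurement: min
    min_value: 1
    max_value: 60
    step: 1
    initial_value: 1
    optimistic: true
    restore_value: true

script:
  # Call on voice activity or radar presence to wake the screen
  - id: screen_wake
    mode: restart            # a new call cancels the pending delay
    then:
      - light.turn_on:
          id: backlight
          brightness: 100%
      # Convert the timeout from minutes to milliseconds
      - delay: !lambda return (uint32_t) (id(screen_idle_timeout).state * 60.0f * 1000.0f);
      - light.turn_on:
          id: backlight
          brightness: 10%    # assumed "dimmed" level
```

Because the script runs in `restart` mode, each call restarts the countdown, so the screen dims only after a full timeout with no new activity.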
### Sensor Dock (ESP32-S3-BOX-3-SENSOR)
Optional accessory dock connected via secondary I2C bus (GPIO40/41, 100kHz):
- **AHT-30** (temp/humidity) — `aht10` component with variant AHT20, 30s update interval
- **AT581x mmWave radar** — presence detection via GPIO21, I2C for settings config
- **Radar RF switch** — toggle radar on/off from HA
- Radar configured on boot: sensing_distance=600, trigger_keep=5s, hw_frontend_reset=true
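
A sketch of the dock configuration matching the list above. It assumes an I2C bus with the hypothetical id `sensor_bus` on GPIO41 (SDA) / GPIO40 (SCL) at 100kHz; entity names and the `radar` id are also placeholders.

```yaml
# Sketch: ESP32-S3-BOX-3-SENSOR dock (ids and names hypothetical)
sensor:
  - platform: aht10
    i2c_id: sensor_bus
    variant: AHT20           # AHT-30 is driven with the AHT20 variant
    update_interval: 30s
    temperature:
      name: Temperature
    humidity:
      name: Humidity

at581x:
  id: radar
  i2c_id: sensor_bus         # I2C is only used for settings

binary_sensor:
  - platform: gpio
    id: radar_presence
    name: Radar Presence
    pin: GPIO21              # digital detection output
    device_class: occupancy

switch:
  - platform: at581x
    at581x_id: radar
    name: Radar RF           # toggle radar on/off from HA

esphome:
  on_boot:
    - at581x.settings:
        id: radar
        sensing_distance: 600
        trigger_keep: 5s
        hw_frontend_reset: true
```

Applying the settings in `on_boot` means they are pushed to the radar on every start, matching the "configured on boot" behaviour described above.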
### Voice Assistant
ESPHome's `voice_assistant` component connects to HA via the ESPHome native API (not directly to Wyoming). HA orchestrates the pipeline:
1. Audio → Wyoming STT (Mac Mini) → text
2. Text → OpenClaw conversation agent → response
3. Response → Wyoming TTS (Mac Mini) → audio back to ESP32
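
On the device side, the component only needs a microphone, an output, and triggers that track pipeline state. A sketch, assuming hypothetical ids `box_mic` and `box_media_player` and hypothetical `show_*` scripts that switch display pages; the tuning values are illustrative.

```yaml
# Sketch: pipeline state triggers driving the display (ids hypothetical)
voice_assistant:
  id: va
  microphone: box_mic
  media_player: box_media_player
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  on_listening:              # audio is streaming to HA for STT
    - script.execute: show_listening
  on_stt_vad_end:            # speech ended, HA is transcribing/thinking
    - script.execute: show_thinking
  on_tts_start:              # response text received, TTS begins
    - script.execute: show_replying
  on_end:
    - script.execute: show_idle
  on_error:
    - script.execute: show_error
```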
---
## Directory Layout
```
homeai-esp32/
├── PLAN.md
├── setup.sh                      # env check + flash/ota/logs commands
└── esphome/
    ├── secrets.yaml              # gitignored: WiFi + API key
    ├── homeai-living-room.yaml   # first unit (full config)
    ├── homeai-bedroom.yaml       # future: copy + change substitutions
    ├── homeai-kitchen.yaml       # future: copy + change substitutions
    └── illustrations/            # 320×240 PNG face images
        ├── idle.png
        ├── loading.png
        ├── listening.png
        ├── thinking.png
        ├── replying.png
        ├── error.png
        └── timer_finished.png
```
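
A sketch of the per-room header that would change when copying the living-room file. The substitution names and the secret keys (`wifi_ssid`, `wifi_password`, `api_encryption_key`) are hypothetical; they must match whatever `secrets.yaml` actually defines.

```yaml
# Sketch: top of homeai-bedroom.yaml; only these values differ per room
substitutions:
  name: homeai-bedroom
  friendly_name: HomeAI Bedroom

esphome:
  name: ${name}
  friendly_name: ${friendly_name}

wifi:
  ssid: !secret wifi_ssid          # 2.4GHz network only
  password: !secret wifi_password

api:
  encryption:
    key: !secret api_encryption_key

ota:
  - platform: esphome
```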
---
## ESPHome Environment
```bash
# Dedicated venv (Python 3.12) — do NOT share with voice/whisper venvs
~/homeai-esphome-env/bin/esphome version # ESPHome 2026.2.4+
# Quick commands
cd ~/gitea/homeai/homeai-esp32
~/homeai-esphome-env/bin/esphome run esphome/homeai-living-room.yaml # compile + flash
~/homeai-esphome-env/bin/esphome logs esphome/homeai-living-room.yaml # stream logs
# Or use the setup script
./setup.sh flash # compile + USB flash
./setup.sh ota # compile + OTA update
./setup.sh logs # stream device logs
./setup.sh validate # check YAML without compiling
```
---
## Wake Word Options
| Option | Latency | Privacy | Effort |
|---|---|---|---|
| `hey_jarvis` (built-in micro_wake_word) | ~200ms | On-device | Zero |
| Custom word (trained model) | ~200ms | On-device | High — requires 50+ recordings |
| HA streaming wake word | ~500ms | On Mac Mini | Medium (streams all audio) |

**Current**: `hey_jarvis` on-device. Train a custom word (character's name) once finalised.

---
## Implementation Steps
- [x] Install ESPHome in `~/homeai-esphome-env` (Python 3.12)
- [x] Write `esphome/secrets.yaml` (gitignored)
- [x] Write `homeai-living-room.yaml` (based on official S3-BOX-3 reference config)
- [x] Generate placeholder face illustrations (7 PNGs, 320×240)
- [x] Write `setup.sh` with flash/ota/logs/validate commands
- [x] Write `deploy.sh` with OTA deploy, image management, multi-unit support
- [x] Flash first unit via USB (living room)
- [x] Verify unit appears in HA device list
- [x] Assign Wyoming voice pipeline to unit in HA
- [x] Test: speak wake word → transcription → LLM response → spoken reply
- [x] Test: display cycles through idle → listening → thinking → replying
- [x] Verify OTA update works: change config, deploy wirelessly
- [ ] Write config templates for remaining rooms (bedroom, kitchen)
- [ ] Flash remaining units, verify each works independently
- [ ] Document final MAC address → room name mapping
---
## Success Criteria
- [ ] Wake word "hey jarvis" triggers pipeline reliably from 3m distance
- [ ] STT transcription accuracy >90% for clear speech in quiet room
- [ ] TTS audio plays clearly through ESP32 speaker
- [ ] Display shows correct state for idle / listening / thinking / replying / error / muted
- [ ] OTA firmware updates work without USB cable
- [ ] Unit reconnects automatically after WiFi drop
- [ ] Unit survives power cycle and resumes normal operation
---
## Known Constraints
- **Memory**: voice_assistant + micro_wake_word + display + sensor dock is near the limit. Do NOT add Bluetooth or LVGL widgets — they will cause crashes.
- **WiFi**: 2.4GHz only. 5GHz networks are not supported.
- **Speaker**: 1W built-in. Volume capped at 85% to avoid distortion.
- **Display**: Static PNGs compiled into firmware. To change images, reflash via OTA (~1-2 min).
- **First compile**: Downloads ESP-IDF toolchain (~500MB), takes 5-10 minutes. Incremental builds are 1-2 minutes.