Initial project structure and planning docs
Full project plan across 8 sub-projects (homeai-infra, homeai-llm, homeai-voice, homeai-agent, homeai-character, homeai-esp32, homeai-visual, homeai-images). Includes per-project PLAN.md files, top-level PROJECT_PLAN.md, and master TODO.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
homeai-esp32/PLAN.md
@@ -0,0 +1,357 @@
# P6: homeai-esp32 — Room Satellite Hardware

> Phase 4 | Depends on: P1 (HA running), P3 (Wyoming STT/TTS servers running)

---

## Goal

Flash ESP32-S3-BOX-3 units with ESPHome. Each unit acts as a dumb room satellite: always-on mic, local wake word detection, audio playback, and an LVGL animated face showing assistant state. All intelligence stays on the Mac Mini.

---

## Hardware: ESP32-S3-BOX-3

| Feature | Spec |
|---|---|
| SoC | ESP32-S3 (dual-core Xtensa, 240 MHz) |
| RAM | 512 KB SRAM + 16 MB PSRAM |
| Flash | 16 MB |
| Display | 2.4" IPS LCD, 320×240, touchscreen |
| Mic | Dual microphone array |
| Speaker | Built-in 1 W speaker |
| Connectivity | Wi-Fi 802.11 b/g/n, Bluetooth 5 (LE) |
| USB | USB-C (programming + power) |

---

## Architecture Per Unit

```
ESP32-S3-BOX-3
├── microWakeWord (on-device, always listening)
│   └── starts the ESPHome voice_assistant on wake detection
├── Voice assistant (ESPHome → HA → Wyoming)
│   ├── streams mic audio → HA → Mac Mini Wyoming STT (port 10300)
│   └── receives TTS audio ← HA ← Mac Mini Wyoming TTS (port 10301)
├── LVGL Display
│   └── animated face, driven by voice_assistant state triggers
└── ESPHome OTA
    └── firmware updates over WiFi
```

---

## ESPHome Configuration

### Base Config Template

`esphome/base.yaml` — shared across all units:

```yaml
esphome:
  name: homeai-${room}
  friendly_name: "HomeAI ${room_display}"

# Current ESPHome releases take the platform and board from a separate esp32: block.
# esp32s3box is the PlatformIO board ID commonly used for the S3-BOX family;
# verify it against your toolchain before flashing.
esp32:
  board: esp32s3box
  framework:
    type: esp-idf

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    ssid: "HomeAI Fallback"

api:
  encryption:
    key: !secret api_key

ota:
  - platform: esphome  # list form expected by ESPHome 2024.6 and later
    password: !secret ota_password

logger:
  level: INFO
```

### Room-Specific Config

`esphome/s3-box-living-room.yaml`:

```yaml
substitutions:
  room: living-room
  room_display: "Living Room"
  mac_mini_ip: "192.168.1.x"  # or Tailscale IP

packages:
  base: !include base.yaml
  voice: !include voice.yaml
  display: !include display.yaml
  animations: !include animations.yaml  # scripts called from voice.yaml
```

One file per room; only the substitutions change.
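
Because only the substitutions differ between rooms, the per-room files could be generated rather than copied by hand. The following is a minimal Python sketch under that assumption; `render_room_config`, `write_room_configs`, and `DEFAULT_PACKAGES` are illustrative names, not part of ESPHome, and the package list is a parameter so it can match whatever your room files actually include.

```python
from pathlib import Path

# Assumed package names; each one maps to "<name>.yaml" next to the room file.
DEFAULT_PACKAGES = ("base", "voice", "display")


def render_room_config(room_display, mac_mini_ip, packages=DEFAULT_PACKAGES):
    """Return (filename, yaml_text) for one room, e.g. "Living Room"."""
    # "Living Room" -> "living-room", used in the device name and file name.
    room = "-".join(room_display.lower().split())
    lines = [
        "substitutions:",
        f"  room: {room}",
        f'  room_display: "{room_display}"',
        f'  mac_mini_ip: "{mac_mini_ip}"',
        "",
        "packages:",
    ]
    lines += [f"  {name}: !include {name}.yaml" for name in packages]
    return f"s3-box-{room}.yaml", "\n".join(lines) + "\n"


def write_room_configs(room_displays, mac_mini_ip, out_dir="esphome"):
    """Write one YAML file per room and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for room_display in room_displays:
        filename, text = render_room_config(room_display, mac_mini_ip)
        path = out / filename
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```

For example, `write_room_configs(["Living Room", "Bedroom", "Kitchen"], "192.168.1.x")` would write `s3-box-living-room.yaml`, `s3-box-bedroom.yaml`, and `s3-box-kitchen.yaml`, leaving the IP placeholder for you to fill in.
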

### Voice / Wyoming Satellite — `esphome/voice.yaml`

```yaml
microphone:
  - platform: esp_adf
    id: mic

speaker:
  - platform: esp_adf
    id: spk

micro_wake_word:
  model: hey_jarvis  # or custom model path
  on_wake_word_detected:
    - voice_assistant.start:

voice_assistant:
  microphone: mic
  speaker: spk
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0

  # The pages are declared under lvgl: in display.yaml, so they are switched
  # with lvgl.page.show rather than display.page.show.
  on_listening:
    - lvgl.page.show: page_listening
    - script.execute: animate_face_listening

  on_stt_vad_end:
    - lvgl.page.show: page_thinking
    - script.execute: animate_face_thinking

  on_tts_start:
    - lvgl.page.show: page_speaking
    - script.execute: animate_face_speaking

  on_end:
    - lvgl.page.show: page_idle
    - script.execute: animate_face_idle

  on_error:
    - lvgl.page.show: page_error
    - script.execute: animate_face_error
```

**Note:** ESPHome's `voice_assistant` component connects to HA, which routes to Wyoming STT/TTS on the Mac Mini. This is the standard ESPHome → HA → Wyoming path, so the satellite itself only needs its API connection to HA.
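
Since HA, not the satellite, talks to the Wyoming servers, a quick reachability check run from the HA host can rule out network problems before debugging the ESP32. This is a minimal sketch, assuming the STT and TTS ports from the architecture diagram (10300 and 10301); `port_open` and `check_wyoming` are hypothetical helper names, and a successful TCP connect only shows that something is listening, not that it speaks Wyoming.

```python
import socket

# Ports taken from the architecture diagram; adjust if your servers differ.
WYOMING_PORTS = {"stt": 10300, "tts": 10301}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_wyoming(host, ports=WYOMING_PORTS):
    """Return a {service_name: reachable} mapping for the given host."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling `check_wyoming("<mac-mini-ip>")` with the real address returns something like `{"stt": True, "tts": False}`, which narrows a silent satellite down to the TTS side.
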

### LVGL Display — `esphome/display.yaml`

```yaml
display:
  - platform: ili9xxx
    model: ILI9341
    id: lcd
    cs_pin: GPIO5
    dc_pin: GPIO4
    reset_pin: GPIO48

touchscreen:
  - platform: tt21100  # verify the touch controller for your board revision
    id: touch

lvgl:
  displays:
    - lcd
  touchscreens:
    - touch

  # Face widget, centered on screen. It sits on top_layer so it stays visible
  # on every page; a root-level widgets: list cannot be combined with pages:.
  top_layer:
    widgets:
      - obj:
          id: face_container
          width: 320
          height: 240
          bg_color: 0x000000
          widgets:
            # Eyes (two circles)
            - obj:
                id: eye_left
                x: 90
                y: 90
                width: 50
                height: 50
                radius: 25
                bg_color: 0xFFFFFF
            - obj:
                id: eye_right
                x: 180
                y: 90
                width: 50
                height: 50
                radius: 25
                bg_color: 0xFFFFFF
            # Mouth (line/arc)
            - arc:
                id: mouth
                x: 110
                y: 160
                width: 100
                height: 40
                start_angle: 180
                end_angle: 360
                arc_color: 0xFFFFFF

  pages:
    - id: page_idle
    - id: page_listening
    - id: page_thinking
    - id: page_speaking
    - id: page_error
```

### LVGL Face State Animations — `esphome/animations.yaml`

```yaml
# ESPHome's LVGL action for changing widget properties is lvgl.widget.update.
script:
  - id: animate_face_idle
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 50  # normal open
          bg_color: 0xFFFFFF  # clear the red error state
      - lvgl.widget.update:
          id: eye_right
          height: 50
          bg_color: 0xFFFFFF
      - lvgl.widget.update:
          id: mouth
          arc_color: 0xFFFFFF

  - id: animate_face_listening
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 60  # wider eyes
      - lvgl.widget.update:
          id: eye_right
          height: 60
      - lvgl.widget.update:
          id: mouth
          arc_color: 0x00BFFF  # blue tint

  - id: animate_face_thinking
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 20  # squinting
      - lvgl.widget.update:
          id: eye_right
          height: 20

  - id: animate_face_speaking
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 50  # reopen after the thinking squint
      - lvgl.widget.update:
          id: eye_right
          height: 50
      - lvgl.widget.update:
          id: mouth
          arc_color: 0x00FF88  # green speaking indicator

  - id: animate_face_error
    then:
      - lvgl.widget.update:
          id: eye_left
          bg_color: 0xFF2200  # red eyes
      - lvgl.widget.update:
          id: eye_right
          bg_color: 0xFF2200
```

> **Note:** True lip-sync animation (mouth moving with audio) is complex on ESP32. Phase 1: static states. Phase 2: amplitude-driven mouth height using speaker volume feedback.

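The Phase 2 idea of an amplitude-driven mouth can be prototyped off-device before it is ported to an ESPHome lambda. The sketch below is one possible mapping, not a planned implementation: it assumes 16-bit little-endian mono PCM, takes the 40 px upper bound from the mouth widget height in `display.yaml`, and uses a hypothetical 10 px closed-mouth height. `rms_level` and `mouth_height` are illustrative names.

```python
import math
import struct

MOUTH_MIN_PX = 10   # assumed closed-mouth height (hypothetical)
MOUTH_MAX_PX = 40   # mouth widget height from display.yaml
FULL_SCALE = 32768  # magnitude of the largest 16-bit signed sample


def rms_level(pcm_bytes):
    """Return the RMS amplitude of 16-bit mono PCM, normalised to 0.0-1.0."""
    count = len(pcm_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm_bytes[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    return min(math.sqrt(mean_square) / FULL_SCALE, 1.0)


def mouth_height(level, lo=MOUTH_MIN_PX, hi=MOUTH_MAX_PX):
    """Map a 0.0-1.0 level linearly onto a mouth height in pixels."""
    level = max(0.0, min(level, 1.0))  # clamp out-of-range input
    return round(lo + (hi - lo) * level)


# Silence keeps the mouth at its minimum; a full-scale chunk opens it fully.
silence = struct.pack("<4h", 0, 0, 0, 0)
loud = struct.pack("<4h", 32767, -32767, 32767, -32767)
heights = (mouth_height(rms_level(silence)), mouth_height(rms_level(loud)))
```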
---

## Secrets File

`esphome/secrets.yaml` (gitignored):

```yaml
wifi_ssid: "YourNetwork"
wifi_password: "YourPassword"
api_key: "<32-byte base64 key>"
ota_password: "YourOTAPassword"
```
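
The `api_key` placeholder calls for 32 random bytes encoded as base64. One way to produce such a value is sketched below using only the Python standard library; `generate_api_key` is an illustrative name, and the printed line is meant to be pasted into `secrets.yaml` by hand.

```python
import base64
import secrets


def generate_api_key():
    """Return 32 cryptographically random bytes as a base64 string."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


if __name__ == "__main__":
    # Prints a line in the same shape as the secrets.yaml entry.
    print(f'api_key: "{generate_api_key()}"')
```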

---

## Flash & Deployment Workflow

```bash
# Install ESPHome
pip install esphome

# Compile + flash via USB (first time)
esphome run esphome/s3-box-living-room.yaml

# OTA update (subsequent)
esphome upload esphome/s3-box-living-room.yaml --device <device-ip>

# View logs
esphome logs esphome/s3-box-living-room.yaml
```

---

## Home Assistant Integration

After flashing:
1. HA discovers ESP32 automatically via mDNS
2. Add device in HA → Settings → Devices
3. Assign Wyoming voice assistant pipeline to the device
4. Set up room-specific automations (e.g., "Living Room" light control from that satellite)

---

## Directory Layout

```
homeai-esp32/
└── esphome/
    ├── base.yaml
    ├── voice.yaml
    ├── display.yaml
    ├── animations.yaml
    ├── s3-box-living-room.yaml
    ├── s3-box-bedroom.yaml       # template, fill in when hardware available
    ├── s3-box-kitchen.yaml       # template
    └── secrets.yaml              # gitignored
```

---

## Wake Word Decisions

| Option | Latency | Privacy | Effort |
|---|---|---|---|
| `hey_jarvis` (built-in microWakeWord) | ~200 ms | On-device | Zero |
| Custom word (trained model) | ~200 ms | On-device | High — requires 50+ recordings |
| Mac Mini openWakeWord (stream audio) | ~500 ms | On Mac Mini | Medium |

**Recommendation:** Start with `hey_jarvis`. Train a custom word (character's name) once character name is finalised.

---

## Implementation Steps

- [ ] Install ESPHome: `pip install esphome`
- [ ] Write `esphome/secrets.yaml` (gitignored)
- [ ] Write `base.yaml`, `voice.yaml`, `display.yaml`, `animations.yaml`
- [ ] Write `s3-box-living-room.yaml` for first unit
- [ ] Flash first unit via USB: `esphome run esphome/s3-box-living-room.yaml`
- [ ] Verify unit appears in HA device list
- [ ] Assign Wyoming voice pipeline to unit in HA
- [ ] Test: speak wake word → transcription → LLM response → spoken reply
- [ ] Test: LVGL face cycles through idle → listening → thinking → speaking
- [ ] Verify OTA update works: change LVGL color, deploy wirelessly
- [ ] Write config templates for remaining rooms (bedroom, kitchen)
- [ ] Flash remaining units, verify each works independently
- [ ] Document final MAC address → room name mapping

---

## Success Criteria

- [ ] Wake word "hey jarvis" triggers pipeline reliably from 3m distance
- [ ] STT transcription accuracy >90% for clear speech in quiet room
- [ ] TTS audio plays clearly through ESP32 speaker
- [ ] LVGL face shows correct state for idle / listening / thinking / speaking / error
- [ ] OTA firmware updates work without USB cable
- [ ] Unit reconnects automatically after WiFi drop
- [ ] Unit survives power cycle and resumes normal operation
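
The ">90%" STT criterion needs a concrete metric before it can be checked. The plan does not define one, so the sketch below assumes word accuracy, computed as 1 minus the word error rate, where the error count is the word-level edit distance between what was said and what was transcribed. `word_accuracy` is an illustrative helper, not part of any tool named in this plan.

```python
def word_accuracy(reference, hypothesis):
    """Return 1 - WER for two sentences, floored at 0.0."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 1.0 if not hyp else 0.0
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return max(0.0, 1.0 - prev[-1] / len(ref))


# One wrong word out of six gives 5/6, which would miss a 90% target.
score = word_accuracy("turn on the living room light",
                      "turn on the living room lights")
```

Averaging this score over a fixed list of test phrases spoken in a quiet room gives a repeatable number to compare against the target.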