Initial project structure and planning docs

Full project plan across 8 sub-projects (homeai-infra, homeai-llm,
homeai-voice, homeai-agent, homeai-character, homeai-esp32,
homeai-visual, homeai-images). Includes per-project PLAN.md files,
top-level PROJECT_PLAN.md, and master TODO.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Aodhan Collins
2026-03-04 01:11:37 +00:00
commit 38247d7cc4
11 changed files with 3060 additions and 0 deletions

homeai-esp32/PLAN.md Normal file

@@ -0,0 +1,357 @@
# P6: homeai-esp32 — Room Satellite Hardware
> Phase 4 | Depends on: P1 (HA running), P3 (Wyoming STT/TTS servers running)
---
## Goal
Flash ESP32-S3-BOX-3 units with ESPHome. Each unit acts as a dumb room satellite: always-on mic, local wake word detection, audio playback, and an LVGL animated face showing assistant state. All intelligence stays on the Mac Mini.
---
## Hardware: ESP32-S3-BOX-3
| Feature | Spec |
|---|---|
| SoC | ESP32-S3 (dual-core Xtensa, 240MHz) |
| RAM | 512KB SRAM + 16MB PSRAM |
| Flash | 16MB |
| Display | 2.4" IPS LCD, 320×240, touchscreen |
| Mic | Dual microphone array |
| Speaker | Built-in 1W speaker |
| Connectivity | WiFi 802.11b/g/n, BT 5.0 |
| USB | USB-C (programming + power) |
---
## Architecture Per Unit
```
ESP32-S3-BOX-3
├── microWakeWord (on-device, always listening)
│   └── starts the voice_assistant component on wake detection
├── ESPHome voice_assistant (satellite role)
│   ├── streams mic audio → HA → Mac Mini Wyoming STT (port 10300)
│   └── receives TTS audio ← HA ← Mac Mini Wyoming TTS (port 10301)
├── LVGL Display
│   └── animated face, driven by HA entity state
└── ESPHome OTA
    └── firmware updates over WiFi
```
---
## ESPHome Configuration
### Base Config Template
`esphome/base.yaml` — shared across all units:
```yaml
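esphome:
  name: homeai-${room}
  friendly_name: "HomeAI ${room_display}"

# Current ESPHome releases take the target in a top-level esp32: block
# rather than platform/board keys under esphome:.
esp32:
  board: esp32s3box  # PlatformIO board ID for the S3-BOX family
  framework:
    type: esp-idf    # needed by micro_wake_word and esp_adf

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    ssid: "HomeAI Fallback"

api:
  encryption:
    key: !secret api_key

ota:
  - platform: esphome
    password: !secret ota_password

logger:
  level: INFO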
```
### Room-Specific Config
`esphome/s3-box-living-room.yaml`:
```yaml
substitutions:
  room: living-room
  room_display: "Living Room"
  mac_mini_ip: "192.168.1.x"  # or Tailscale IP

packages:
  base: !include base.yaml
  voice: !include voice.yaml
  display: !include display.yaml
  animations: !include animations.yaml  # scripts called from voice.yaml
```
One file per room, only the substitutions change.
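Since only the substitutions differ between rooms, new room files can be stamped out rather than copied by hand. The following is a minimal sketch, not part of the plan itself: `new_room_config` is a hypothetical helper name, and the generated file also pulls in `animations.yaml` on the assumption that `voice.yaml` needs the face scripts defined there.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write esphome/s3-box-<room>.yaml with per-room substitutions.
new_room_config() {
  local room="$1"                # slug used in the hostname, e.g. "bedroom"
  local room_display="$2"        # human-readable name, e.g. "Bedroom"
  local out_dir="${3:-esphome}"  # target directory (defaults to the plan's layout)

  mkdir -p "$out_dir"
  cat > "$out_dir/s3-box-$room.yaml" <<EOF
substitutions:
  room: $room
  room_display: "$room_display"
  mac_mini_ip: "192.168.1.x"  # placeholder from the plan; or Tailscale IP

packages:
  base: !include base.yaml
  voice: !include voice.yaml
  display: !include display.yaml
  animations: !include animations.yaml
EOF
}
```

For example, `new_room_config kitchen "Kitchen"` would write `esphome/s3-box-kitchen.yaml`; the `mac_mini_ip` placeholder still has to be filled in by hand.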
### Voice / Wyoming Satellite — `esphome/voice.yaml`
```yaml
microphone:
  - platform: esp_adf
    id: mic

speaker:
  - platform: esp_adf
    id: spk

micro_wake_word:
  model: hey_jarvis  # or custom model path
  on_wake_word_detected:
    - voice_assistant.start:

voice_assistant:
  microphone: mic
  speaker: spk
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  # Pages are defined under lvgl:, so they are switched with lvgl.page.show
  # (display.page.show only applies to pages of a display component).
  on_listening:
    - lvgl.page.show: page_listening
    - script.execute: animate_face_listening
  on_stt_vad_end:
    - lvgl.page.show: page_thinking
    - script.execute: animate_face_thinking
  on_tts_start:
    - lvgl.page.show: page_speaking
    - script.execute: animate_face_speaking
  on_end:
    - lvgl.page.show: page_idle
    - script.execute: animate_face_idle
  on_error:
    - lvgl.page.show: page_error
    - script.execute: animate_face_error
```
**Note:** ESPHome's `voice_assistant` component connects to HA, which routes to Wyoming STT/TTS on the Mac Mini. This is the standard ESPHome → HA → Wyoming path.
### LVGL Display — `esphome/display.yaml`
```yaml
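display:
  - platform: ili9xxx
    model: S3BOX  # ili9xxx preset for the S3-BOX panel; verify on your unit
    id: lcd
    cs_pin: GPIO5
    dc_pin: GPIO4
    reset_pin: GPIO48

touchscreen:
  # The BOX-3 is expected to use a GT911 controller (tt21100 is the
  # original S3-BOX part); confirm against your hardware revision.
  - platform: gt911
    id: touch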
lvgl:
  displays:
    - lcd
  touchscreens:
    - touch
  # Face widgets sit on top_layer so they stay visible on every page;
  # a root-level widgets: list cannot be combined with pages:.
  top_layer:
    widgets:
      - obj:
          id: face_container
          width: 320
          height: 240
          bg_color: 0x000000
          widgets:  # child widgets are nested under widgets:
            # Eyes (two circles)
            - obj:
                id: eye_left
                x: 90
                y: 90
                width: 50
                height: 50
                radius: 25
                bg_color: 0xFFFFFF
            - obj:
                id: eye_right
                x: 180
                y: 90
                width: 50
                height: 50
                radius: 25
                bg_color: 0xFFFFFF
            # Mouth (line/arc)
            - arc:
                id: mouth
                x: 110
                y: 160
                width: 100
                height: 40
                start_angle: 180
                end_angle: 360
                arc_color: 0xFFFFFF
  pages:
    - id: page_idle
    - id: page_listening
    - id: page_thinking
    - id: page_speaking
    - id: page_error
```
### LVGL Face State Animations — `esphome/animations.yaml`
```yaml
# Widget properties are changed with lvgl.widget.update.
script:
  - id: animate_face_idle
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 50          # normal open
          bg_color: 0xFFFFFF  # reset after the error state
      - lvgl.widget.update:
          id: eye_right
          height: 50
          bg_color: 0xFFFFFF
      - lvgl.widget.update:
          id: mouth
          arc_color: 0xFFFFFF

  - id: animate_face_listening
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 60  # wider eyes
      - lvgl.widget.update:
          id: eye_right
          height: 60
      - lvgl.widget.update:
          id: mouth
          arc_color: 0x00BFFF  # blue tint

  - id: animate_face_thinking
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 20  # squinting
      - lvgl.widget.update:
          id: eye_right
          height: 20

  - id: animate_face_speaking
    then:
      - lvgl.widget.update:
          id: mouth
          arc_color: 0x00FF88  # green speaking indicator

  - id: animate_face_error
    then:
      - lvgl.widget.update:
          id: eye_left
          bg_color: 0xFF2200  # red eyes
      - lvgl.widget.update:
          id: eye_right
          bg_color: 0xFF2200
```
> **Note:** True lip-sync animation (mouth moving with audio) is complex on ESP32. Phase 1: static states. Phase 2: amplitude-driven mouth height using speaker volume feedback.
---
## Secrets File
`esphome/secrets.yaml` (gitignored):
```yaml
wifi_ssid: "YourNetwork"
wifi_password: "YourPassword"
api_key: "<32-byte base64 key>"
ota_password: "YourOTAPassword"
```
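The `api_key` placeholder calls for a 32-byte base64 value, and the file must never reach the repository. As a rough sketch, the two helpers below generate such a key and check that the secrets path is actually ignored. The function names are illustrative only, and `secrets_ignored` assumes it is run from inside the git working tree.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print 32 random bytes, base64-encoded (44 characters including padding).
generate_api_key() {
  head -c 32 /dev/urandom | base64
}

# Succeed only if the given path is matched by a .gitignore rule.
# Defaults to the secrets path used in this plan.
secrets_ignored() {
  git check-ignore -q "${1:-esphome/secrets.yaml}"
}
```

Running `generate_api_key` once and pasting the result into `secrets.yaml` is enough; `secrets_ignored || echo "secrets.yaml is NOT ignored"` makes a cheap pre-commit sanity check.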
---
## Flash & Deployment Workflow
```bash
# Install ESPHome
pip install esphome
# Compile + flash via USB (first time)
esphome run esphome/s3-box-living-room.yaml
# OTA update (subsequent)
esphome upload esphome/s3-box-living-room.yaml --device <device-ip>
# View logs
esphome logs esphome/s3-box-living-room.yaml
```
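Before flashing several units, it can help to validate and build every room config in one pass. The sketch below assumes the `s3-box-*.yaml` naming from this plan; `check_all_rooms` is a hypothetical helper, and it relies only on the `esphome config` and `esphome compile` subcommands, neither of which uploads anything.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Validate, then compile, every per-room config without flashing.
check_all_rooms() {
  local dir="${1:-esphome}"
  local cfg
  for cfg in "$dir"/s3-box-*.yaml; do
    # An unmatched glob stays literal, so bail out if nothing exists.
    [ -e "$cfg" ] || { echo "no room configs found in $dir" >&2; return 1; }
    echo "==> $cfg"
    esphome config "$cfg" > /dev/null  # schema and !secret validation only
    esphome compile "$cfg"             # full build, no upload
  done
}
```

Because of `set -e`, the loop stops at the first config that fails validation or compilation, which keeps a broken shared file such as `base.yaml` from producing a long wall of repeated errors.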
---
## Home Assistant Integration
After flashing:
1. HA discovers ESP32 automatically via mDNS
2. Add device in HA → Settings → Devices
3. Assign Wyoming voice assistant pipeline to the device
4. Set up room-specific automations (e.g., "Living Room" light control from that satellite)
---
## Directory Layout
```
homeai-esp32/
└── esphome/
    ├── base.yaml
    ├── voice.yaml
    ├── display.yaml
    ├── animations.yaml
    ├── s3-box-living-room.yaml
    ├── s3-box-bedroom.yaml      # template, fill in when hardware available
    ├── s3-box-kitchen.yaml      # template
    └── secrets.yaml             # gitignored
```
---
## Wake Word Decisions
| Option | Latency | Privacy | Effort |
|---|---|---|---|
| `hey_jarvis` (built-in microWakeWord) | ~200ms | On-device | Zero |
| Custom word (trained model) | ~200ms | On-device | High — requires 50+ recordings |
| Mac Mini openWakeWord (stream audio) | ~500ms | On Mac | Medium |
**Recommendation:** Start with `hey_jarvis`. Train a custom word (character's name) once character name is finalised.
---
## Implementation Steps
- [ ] Install ESPHome: `pip install esphome`
- [ ] Write `esphome/secrets.yaml` (gitignored)
- [ ] Write `base.yaml`, `voice.yaml`, `display.yaml`, `animations.yaml`
- [ ] Write `s3-box-living-room.yaml` for first unit
- [ ] Flash first unit via USB: `esphome run esphome/s3-box-living-room.yaml`
- [ ] Verify unit appears in HA device list
- [ ] Assign Wyoming voice pipeline to unit in HA
- [ ] Test: speak wake word → transcription → LLM response → spoken reply
- [ ] Test: LVGL face cycles through idle → listening → thinking → speaking
- [ ] Verify OTA update works: change LVGL color, deploy wirelessly
- [ ] Write config templates for remaining rooms (bedroom, kitchen)
- [ ] Flash remaining units, verify each works independently
- [ ] Document final MAC address → room name mapping
---
## Success Criteria
- [ ] Wake word "hey jarvis" triggers pipeline reliably from 3m distance
- [ ] STT transcription accuracy >90% for clear speech in quiet room
- [ ] TTS audio plays clearly through ESP32 speaker
- [ ] LVGL face shows correct state for idle / listening / thinking / speaking / error
- [ ] OTA firmware updates work without USB cable
- [ ] Unit reconnects automatically after WiFi drop
- [ ] Unit survives power cycle and resumes normal operation