Initial project structure and planning docs

Full project plan across 8 sub-projects (homeai-infra, homeai-llm, homeai-voice, homeai-agent, homeai-character, homeai-esp32, homeai-visual, homeai-images). Includes per-project PLAN.md files, top-level PROJECT_PLAN.md, and master TODO.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 01:11:37 +00:00
commit 38247d7cc4
11 changed files with 3060 additions and 0 deletions
--- a/homeai-esp32/PLAN.md
+++ b/homeai-esp32/PLAN.md
@@ -0,0 +1,357 @@
+# P6: homeai-esp32 — Room Satellite Hardware
+
+> Phase 4 | Depends on: P1 (HA running), P3 (Wyoming STT/TTS servers running)
+
+---
+
+## Goal
+
+Flash ESP32-S3-BOX-3 units with ESPHome. Each unit acts as a dumb room satellite: always-on mic, local wake word detection, audio playback, and an LVGL animated face showing assistant state. All intelligence stays on the Mac Mini.
+
+---
+
+## Hardware: ESP32-S3-BOX-3
+
+| Feature | Spec |
+|---|---|
+| SoC | ESP32-S3 (dual-core Xtensa, 240MHz) |
+| RAM | 512KB SRAM + 16MB PSRAM |
+| Flash | 16MB |
+| Display | 2.4" IPS LCD, 320×240, touchscreen |
+| Mic | Dual microphone array |
+| Speaker | Built-in 1W speaker |
+| Connectivity | Wi-Fi 802.11 b/g/n (2.4 GHz), Bluetooth 5 (LE) |
+| USB | USB-C (programming + power) |
+
+---
+
+## Architecture Per Unit
+
+```
+ESP32-S3-BOX-3
+├── microWakeWord (on-device, always listening)
+│   └── starts the voice_assistant pipeline on wake detection
+├── voice_assistant (ESPHome native API → HA Assist pipeline)
+│   ├── streams mic audio → HA → Mac Mini Wyoming STT (port 10300)
+│   └── plays TTS audio ← HA ← Mac Mini Wyoming TTS (port 10301)
+├── LVGL Display
+│   └── animated face, driven by voice_assistant state triggers
+└── ESPHome OTA
+    └── firmware updates over WiFi
+```
+
+---
+
+## ESPHome Configuration
+
+### Base Config Template
+
+`esphome/base.yaml` — shared across all units:
+
+```yaml
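+esphome:
+  name: homeai-${room}
+  friendly_name: "HomeAI ${room_display}"
+
+# Target goes in a top-level esp32: block (platform:/board: under esphome: is legacy syntax)
+esp32:
+  board: esp32s3box            # PlatformIO board ID; verify against your ESPHome version
+  framework:
+    type: esp-idf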
+
+wifi:
+  ssid: !secret wifi_ssid
+  password: !secret wifi_password
+  ap:
+    ssid: "HomeAI Fallback"
+
+api:
+  encryption:
+    key: !secret api_key
+
+ota:
+  - platform: esphome          # list form with explicit platform (ESPHome 2024.6+)
+    password: !secret ota_password
+
+logger:
+  level: INFO
+```
+
+### Room-Specific Config
+
+`esphome/s3-box-living-room.yaml`:
+
+```yaml
+substitutions:
+  room: living-room
+  room_display: "Living Room"
+  mac_mini_ip: "192.168.1.x"    # or Tailscale IP
+
+packages:
+  base: !include base.yaml
+  voice: !include voice.yaml
+  display: !include display.yaml
+  animations: !include animations.yaml   # animate_face_* scripts called from voice.yaml
+
+One file per room; only the substitutions change.
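+
+As a sketch of that pattern, a second room file might look like the following. The `bedroom` values are illustrative placeholders, not finalised settings:
+
+```yaml
+# esphome/s3-box-bedroom.yaml (illustrative; values are placeholders)
+substitutions:
+  room: bedroom
+  room_display: "Bedroom"
+  mac_mini_ip: "192.168.1.x"    # same Mac Mini as the living-room unit
+
+packages:
+  base: !include base.yaml
+  voice: !include voice.yaml
+  display: !include display.yaml
+  animations: !include animations.yaml   # provides the animate_face_* scripts voice.yaml calls
+```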
+
+### Voice / Wyoming Satellite — `esphome/voice.yaml`
+
+```yaml
+# esp_adf is not bundled with mainline ESPHome; it has to be pulled in via external_components
+microphone:
+  - platform: esp_adf
+    id: mic
+
+speaker:
+  - platform: esp_adf
+    id: spk
+
+micro_wake_word:
+  models:                      # current ESPHome expects a list of models
+    - model: hey_jarvis        # or custom model path
+  on_wake_word_detected:
+    - voice_assistant.start:
+
+voice_assistant:
+  microphone: mic
+  speaker: spk
+  noise_suppression_level: 2
+  auto_gain: 31dBFS
+  volume_multiplier: 2.0
+
+  # Pages are LVGL pages, so they are switched with lvgl.page.show
+  on_listening:
+    - lvgl.page.show: page_listening
+    - script.execute: animate_face_listening
+
+  on_stt_vad_end:
+    - lvgl.page.show: page_thinking
+    - script.execute: animate_face_thinking
+
+  on_tts_start:
+    - lvgl.page.show: page_speaking
+    - script.execute: animate_face_speaking
+
+  on_end:
+    - lvgl.page.show: page_idle
+    - script.execute: animate_face_idle
+
+  on_error:
+    - lvgl.page.show: page_error
+    - script.execute: animate_face_error
+```
+
+**Note:** ESPHome's `voice_assistant` component connects to HA, which routes to Wyoming STT/TTS on the Mac Mini. This is the standard ESPHome → HA → Wyoming path.
+
+### LVGL Display — `esphome/display.yaml`
+
+```yaml
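+# Requires an spi: bus for the LCD and an i2c: bus for the touch controller, defined elsewhere
+display:
+  - platform: ili9xxx
+    model: S3BOX                 # ili9xxx preset for the S3-BOX panel
+    id: lcd
+    cs_pin: GPIO5
+    dc_pin: GPIO4
+    reset_pin: GPIO48
+    auto_clear_enabled: false    # LVGL owns the display, so disable ESPHome's own redraws
+    update_interval: never
+
+touchscreen:
+  - platform: gt911              # the BOX-3 uses a GT911 controller (TT21100 is the original BOX)
+    id: touch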
+
+lvgl:
+  displays:
+    - lcd
+  touchscreens:
+    - touch
+
+  # Face widget, centered on screen. It lives on top_layer so it stays visible on
+  # every page (a top-level widgets: list cannot be combined with pages:).
+  top_layer:
+    widgets:
+      - obj:
+          id: face_container
+          width: 320
+          height: 240
+          bg_color: 0x000000
+          widgets:               # child widgets are nested under widgets:
+            # Eyes (two circles)
+            - obj:
+                id: eye_left
+                x: 90
+                y: 90
+                width: 50
+                height: 50
+                radius: 25
+                bg_color: 0xFFFFFF
+            - obj:
+                id: eye_right
+                x: 180
+                y: 90
+                width: 50
+                height: 50
+                radius: 25
+                bg_color: 0xFFFFFF
+            # Mouth (line/arc)
+            - arc:
+                id: mouth
+                x: 110
+                y: 160
+                width: 100
+                height: 40
+                start_angle: 180
+                end_angle: 360
+                arc_color: 0xFFFFFF
+
+  pages:
+    - id: page_idle
+    - id: page_listening
+    - id: page_thinking
+    - id: page_speaking
+    - id: page_error
+```
+
+### LVGL Face State Animations — `esphome/animations.yaml`
+
+```yaml
+# Widget properties are changed at runtime with lvgl.widget.update
+script:
+  - id: animate_face_idle
+    then:
+      - lvgl.widget.update:
+          id: eye_left
+          height: 50           # normal open
+          bg_color: 0xFFFFFF   # reset in case the previous state was error
+      - lvgl.widget.update:
+          id: eye_right
+          height: 50
+          bg_color: 0xFFFFFF
+      - lvgl.widget.update:
+          id: mouth
+          arc_color: 0xFFFFFF
+
+  - id: animate_face_listening
+    then:
+      - lvgl.widget.update:
+          id: eye_left
+          height: 60     # wider eyes
+      - lvgl.widget.update:
+          id: eye_right
+          height: 60
+      - lvgl.widget.update:
+          id: mouth
+          arc_color: 0x00BFFF  # blue tint
+
+  - id: animate_face_thinking
+    then:
+      - lvgl.widget.update:
+          id: eye_left
+          height: 20     # squinting
+      - lvgl.widget.update:
+          id: eye_right
+          height: 20
+
+  - id: animate_face_speaking
+    then:
+      - lvgl.widget.update:
+          id: mouth
+          arc_color: 0x00FF88  # green speaking indicator
+
+  - id: animate_face_error
+    then:
+      - lvgl.widget.update:
+          id: eye_left
+          bg_color: 0xFF2200  # red eyes
+      - lvgl.widget.update:
+          id: eye_right
+          bg_color: 0xFF2200
+```
+
+> **Note:** True lip-sync animation (mouth moving with audio) is complex on ESP32. Phase 1: static states. Phase 2: amplitude-driven mouth height using speaker volume feedback.
+
+---
+
+## Secrets File
+
+`esphome/secrets.yaml` (gitignored):
+
+```yaml
+wifi_ssid: "YourNetwork"
+wifi_password: "YourPassword"
+api_key: "<32-byte base64 key>"
+ota_password: "YourOTAPassword"
+```
+
+---
+
+## Flash & Deployment Workflow
+
+```bash
+# Install ESPHome
+pip install esphome
+
+# Compile + flash via USB (first time)
+esphome run esphome/s3-box-living-room.yaml
+
+# OTA update (subsequent)
+esphome upload esphome/s3-box-living-room.yaml --device <device-ip>
+
+# View logs
+esphome logs esphome/s3-box-living-room.yaml
+```
+
+---
+
+## Home Assistant Integration
+
+After flashing:
+1. HA discovers the ESP32 automatically via mDNS
+2. Add the device in HA → Settings → Devices & services, entering the API encryption key when prompted
+3. Assign the Assist pipeline that uses the Wyoming STT/TTS servers to the device
+4. Set up room-specific automations (e.g., "Living Room" light control from that satellite)
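+
+One possible shape for a room-specific automation in step 4 is sketched below. It assumes the satellite shows up in HA as an `assist_satellite` entity. Both entity IDs are hypothetical and need to be replaced with the ones HA actually creates:
+
+```yaml
+# Home Assistant automations.yaml entry (sketch; entity IDs are hypothetical)
+- alias: "Living room: lamp on while the satellite is listening"
+  trigger:
+    - platform: state
+      entity_id: assist_satellite.homeai_living_room   # hypothetical satellite entity
+      to: "listening"
+  action:
+    - service: light.turn_on
+      target:
+        entity_id: light.living_room_lamp              # hypothetical light entity
+```
+
+Assigning the satellite to the "Living Room" area in HA is likely still needed so that spoken commands such as "turn on the lights" resolve to that room.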
+
+---
+
+## Directory Layout
+
+```
+homeai-esp32/
+└── esphome/
+    ├── base.yaml
+    ├── voice.yaml
+    ├── display.yaml
+    ├── animations.yaml
+    ├── s3-box-living-room.yaml
+    ├── s3-box-bedroom.yaml       # template, fill in when hardware available
+    ├── s3-box-kitchen.yaml       # template
+    └── secrets.yaml              # gitignored
+```
+
+---
+
+## Wake Word Decisions
+
+| Option | Latency | Privacy | Effort |
+|---|---|---|---|
+| `hey_jarvis` (built-in microWakeWord) | ~200ms | On-device | Zero |
+| Custom word (trained model) | ~200ms | On-device | High — requires 50+ recordings |
+| Mac Mini openWakeWord (stream audio) | ~500ms | On Mac | Medium |
+
+**Recommendation:** Start with `hey_jarvis`. Train a custom word (character's name) once character name is finalised.
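+
+Switching to a custom word later should only require changing the model reference in `voice.yaml`. The sketch below assumes the trained model is stored as a JSON manifest next to the config; the path is a placeholder, since no model exists yet:
+
+```yaml
+# voice.yaml excerpt (sketch; the model path is a hypothetical placeholder)
+micro_wake_word:
+  models:
+    - model: models/custom_wake_word.json   # replace once the character name is finalised
+  on_wake_word_detected:
+    - voice_assistant.start:
+```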
+
+---
+
+## Implementation Steps
+
+- [ ] Install ESPHome: `pip install esphome`
+- [ ] Write `esphome/secrets.yaml` (gitignored)
+- [ ] Write `base.yaml`, `voice.yaml`, `display.yaml`, `animations.yaml`
+- [ ] Write `s3-box-living-room.yaml` for first unit
+- [ ] Flash first unit via USB: `esphome run esphome/s3-box-living-room.yaml`
+- [ ] Verify unit appears in HA device list
+- [ ] Assign Wyoming voice pipeline to unit in HA
+- [ ] Test: speak wake word → transcription → LLM response → spoken reply
+- [ ] Test: LVGL face cycles through idle → listening → thinking → speaking
+- [ ] Verify OTA update works: change LVGL color, deploy wirelessly
+- [ ] Write config templates for remaining rooms (bedroom, kitchen)
+- [ ] Flash remaining units, verify each works independently
+- [ ] Document final MAC address → room name mapping
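+
+To make the MAC address mapping in the last step easier to collect, each unit could publish its own address to HA. This is a minimal sketch using ESPHome's `wifi_info` text sensor; the entity names are illustrative, and placing it in `base.yaml` is an assumption:
+
+```yaml
+# Possible addition to base.yaml (sketch; entity names are illustrative)
+text_sensor:
+  - platform: wifi_info
+    mac_address:
+      name: "MAC Address"
+    ip_address:
+      name: "IP Address"
+```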
+
+---
+
+## Success Criteria
+
+- [ ] Wake word "hey jarvis" triggers pipeline reliably from 3m distance
+- [ ] STT transcription accuracy >90% for clear speech in quiet room
+- [ ] TTS audio plays clearly through ESP32 speaker
+- [ ] LVGL face shows correct state for idle / listening / thinking / speaking / error
+- [ ] OTA firmware updates work without USB cable
+- [ ] Unit reconnects automatically after WiFi drop
+- [ ] Unit survives power cycle and resumes normal operation
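+
+The reconnect and power-cycle criteria are easier to check if each unit reports link quality and uptime to HA. A minimal sketch, assuming these sensors are added to the shared `base.yaml` (names and the update interval are illustrative):
+
+```yaml
+# Possible diagnostics for base.yaml (sketch; names and interval are illustrative)
+sensor:
+  - platform: wifi_signal
+    name: "WiFi Signal"
+    update_interval: 60s
+  - platform: uptime
+    name: "Uptime"        # resets to zero after a reboot or power cycle
+```
+
+An uptime that keeps counting through a WiFi drop suggests the unit reconnected without rebooting.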