# P6: homeai-esp32 — Room Satellite Hardware

**Phase 4** | Depends on: P1 (HA running), P3 (Wyoming STT/TTS servers running)

## Goal
Flash ESP32-S3-BOX-3 units with ESPHome. Each unit acts as a dumb room satellite: always-on mic, local wake word detection, audio playback, and an LVGL animated face showing assistant state. All intelligence stays on the Mac Mini.
## Hardware: ESP32-S3-BOX-3
| Feature | Spec |
|---|---|
| SoC | ESP32-S3 (dual-core Xtensa, 240MHz) |
| RAM | 512KB SRAM + 16MB PSRAM |
| Flash | 16MB |
| Display | 2.4" IPS LCD, 320×240, touchscreen |
| Mic | Dual microphone array |
| Speaker | Built-in 1W speaker |
| Connectivity | WiFi 802.11b/g/n, BT 5.0 |
| USB | USB-C (programming + power) |
## Architecture Per Unit

```
ESP32-S3-BOX-3
├── microWakeWord (on-device, always listening)
│   └── triggers Wyoming Satellite on wake detection
├── Wyoming Satellite
│   ├── streams mic audio → Mac Mini Wyoming STT (port 10300)
│   └── receives TTS audio ← Mac Mini Wyoming TTS (port 10301)
├── LVGL Display
│   └── animated face, driven by HA entity state
└── ESPHome OTA
    └── firmware updates over WiFi
```
## ESPHome Configuration

### Base Config Template

`esphome/base.yaml` — shared across all units:

```yaml
esphome:
  name: homeai-${room}
  friendly_name: "HomeAI ${room_display}"

# micro_wake_word and voice_assistant require the ESP-IDF framework
esp32:
  board: esp32s3box  # PlatformIO board id for the S3-BOX family
  framework:
    type: esp-idf

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    ssid: "HomeAI Fallback"

api:
  encryption:
    key: !secret api_key

ota:
  - platform: esphome  # platform key required since ESPHome 2024.6
    password: !secret ota_password

logger:
  level: INFO
```
### Room-Specific Config

`esphome/s3-box-living-room.yaml`:

```yaml
substitutions:
  room: living-room
  room_display: "Living Room"
  mac_mini_ip: "192.168.1.x"  # or Tailscale IP

packages:
  base: !include base.yaml
  voice: !include voice.yaml
  display: !include display.yaml
```
One file per room, only the substitutions change.
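As a concrete illustration, a hypothetical second room file would differ only in its `substitutions` block (the bedroom values below are placeholders, not finalized config):

```yaml
# esphome/s3-box-bedroom.yaml — same packages, different substitutions
substitutions:
  room: bedroom
  room_display: "Bedroom"
  mac_mini_ip: "192.168.1.x"  # same Mac Mini as the living-room unit

packages:
  base: !include base.yaml
  voice: !include voice.yaml
  display: !include display.yaml
```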
### Voice / Wyoming Satellite — esphome/voice.yaml

```yaml
microphone:
  - platform: esp_adf
    id: mic

speaker:
  - platform: esp_adf
    id: spk

micro_wake_word:
  model: hey_jarvis  # or custom model path
  on_wake_word_detected:
    - voice_assistant.start:

voice_assistant:
  microphone: mic
  speaker: spk
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  on_listening:
    - lvgl.page.show: page_listening
    - script.execute: animate_face_listening
  on_stt_vad_end:
    - lvgl.page.show: page_thinking
    - script.execute: animate_face_thinking
  on_tts_start:
    - lvgl.page.show: page_speaking
    - script.execute: animate_face_speaking
  on_end:
    - lvgl.page.show: page_idle
    - script.execute: animate_face_idle
  on_error:
    - lvgl.page.show: page_error
    - script.execute: animate_face_error
```

(The pages here are LVGL pages, so the action is `lvgl.page.show`; `display.page.show` only applies to pages defined on the `display:` component.)
Note: ESPHome's voice_assistant component connects to HA, which routes to Wyoming STT/TTS on the Mac Mini. This is the standard ESPHome → HA → Wyoming path.
### LVGL Display — esphome/display.yaml

```yaml
display:
  - platform: ili9xxx
    model: ILI9341
    id: lcd
    cs_pin: GPIO5
    dc_pin: GPIO4
    reset_pin: GPIO48

touchscreen:
  - platform: tt21100
    id: touch

lvgl:
  displays:
    - lcd
  touchscreens:
    - touch
  # Face widget — centered on screen
  widgets:
    - obj:
        id: face_container
        width: 320
        height: 240
        bg_color: 0x000000
        widgets:  # ESPHome LVGL nests children under `widgets:`
          # Eyes (two circles)
          - obj:
              id: eye_left
              x: 90
              y: 90
              width: 50
              height: 50
              radius: 25
              bg_color: 0xFFFFFF
          - obj:
              id: eye_right
              x: 180
              y: 90
              width: 50
              height: 50
              radius: 25
              bg_color: 0xFFFFFF
          # Mouth (line/arc)
          - arc:
              id: mouth
              x: 110
              y: 160
              width: 100
              height: 40
              start_angle: 180
              end_angle: 360
              arc_color: 0xFFFFFF
  pages:
    - id: page_idle
    - id: page_listening
    - id: page_thinking
    - id: page_speaking
    - id: page_error
```
### LVGL Face State Animations — esphome/animations.yaml

```yaml
script:
  - id: animate_face_idle
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 50  # normal open
      - lvgl.widget.update:
          id: eye_right
          height: 50
      - lvgl.widget.update:
          id: mouth
          arc_color: 0xFFFFFF

  - id: animate_face_listening
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 60  # wider eyes
      - lvgl.widget.update:
          id: eye_right
          height: 60
      - lvgl.widget.update:
          id: mouth
          arc_color: 0x00BFFF  # blue tint

  - id: animate_face_thinking
    then:
      - lvgl.widget.update:
          id: eye_left
          height: 20  # squinting
      - lvgl.widget.update:
          id: eye_right
          height: 20

  - id: animate_face_speaking
    then:
      - lvgl.widget.update:
          id: mouth
          arc_color: 0x00FF88  # green speaking indicator

  - id: animate_face_error
    then:
      - lvgl.widget.update:
          id: eye_left
          bg_color: 0xFF2200  # red eyes
      - lvgl.widget.update:
          id: eye_right
          bg_color: 0xFF2200
```

(ESPHome's LVGL action for changing widget properties is `lvgl.widget.update`.)
Note: True lip-sync animation (mouth moving with audio) is complex on ESP32. Phase 1: static states. Phase 2: amplitude-driven mouth height using speaker volume feedback.
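A minimal phase-2 sketch of the amplitude-driven idea, assuming a `speaking` global flag toggled in the voice assistant's `on_tts_start`/`on_tts_end` triggers (the flag name, interval, and height range are placeholders, not decisions from this plan):

```yaml
globals:
  - id: speaking
    type: bool
    initial_value: "false"

interval:
  - interval: 150ms
    then:
      - if:
          condition:
            lambda: 'return id(speaking);'
          then:
            # Vary mouth height pseudo-randomly while TTS plays to fake
            # lip movement; real amplitude feedback can replace rand() later.
            - lvgl.widget.update:
                id: mouth
                height: !lambda 'return 20 + (rand() % 25);'
```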
## Secrets File

`esphome/secrets.yaml` (gitignored):

```yaml
wifi_ssid: "YourNetwork"
wifi_password: "YourPassword"
api_key: "<32-byte base64 key>"
ota_password: "YourOTAPassword"
```
## Flash & Deployment Workflow

```bash
# Install ESPHome
pip install esphome

# Compile + flash via USB (first time)
esphome run esphome/s3-box-living-room.yaml

# OTA update (subsequent)
esphome upload esphome/s3-box-living-room.yaml --device <device-ip>

# View logs
esphome logs esphome/s3-box-living-room.yaml
```
## Home Assistant Integration

After flashing:

- HA discovers the ESP32 automatically via mDNS
- Add the device in HA → Settings → Devices
- Assign the Wyoming voice assistant pipeline to the device
- Set up room-specific automations (e.g., "Living Room" light control from that satellite)
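As an illustration of the last step, a hedged sketch of one room-scoped automation — announcing through the satellite when the room light turns on. All entity ids below are assumptions; substitute the ids HA assigns after discovery.

```yaml
# automations.yaml (HA side) — hypothetical entity ids throughout
automation:
  - alias: "Living room satellite announcement"
    trigger:
      - platform: state
        entity_id: light.living_room   # assumed light entity
        to: "on"
    action:
      - service: tts.speak
        target:
          entity_id: tts.piper         # assumed Wyoming/Piper TTS entity
        data:
          media_player_entity_id: media_player.homeai_living_room
          message: "Lights on."
```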
## Directory Layout

```
homeai-esp32/
└── esphome/
    ├── base.yaml
    ├── voice.yaml
    ├── display.yaml
    ├── animations.yaml
    ├── s3-box-living-room.yaml
    ├── s3-box-bedroom.yaml     # template, fill in when hardware available
    ├── s3-box-kitchen.yaml     # template
    └── secrets.yaml            # gitignored
```
## Wake Word Decisions

| Option | Latency | Privacy | Effort |
|---|---|---|---|
| `hey_jarvis` (built-in microWakeWord) | ~200ms | On-device | Zero |
| Custom word (trained model) | ~200ms | On-device | High — requires 50+ recordings |
| Mac Mini openWakeWord (stream audio) | ~500ms | On Mac | Medium |
**Recommendation:** Start with `hey_jarvis`. Train a custom wake word (the character's name) once the character name is finalised.
## Implementation Steps

- Install ESPHome: `pip install esphome`
- Write `esphome/secrets.yaml` (gitignored)
- Write `base.yaml`, `voice.yaml`, `display.yaml`, `animations.yaml`
- Write `s3-box-living-room.yaml` for first unit
- Flash first unit via USB: `esphome run s3-box-living-room.yaml`
- Verify unit appears in HA device list
- Assign Wyoming voice pipeline to unit in HA
- Test: speak wake word → transcription → LLM response → spoken reply
- Test: LVGL face cycles through idle → listening → thinking → speaking
- Verify OTA update works: change LVGL color, deploy wirelessly
- Write config templates for remaining rooms (bedroom, kitchen)
- Flash remaining units, verify each works independently
- Document final MAC address → room name mapping
## Success Criteria
- Wake word "hey jarvis" triggers pipeline reliably from 3m distance
- STT transcription accuracy >90% for clear speech in quiet room
- TTS audio plays clearly through ESP32 speaker
- LVGL face shows correct state for idle / listening / thinking / speaking / error
- OTA firmware updates work without USB cable
- Unit reconnects automatically after WiFi drop
- Unit survives power cycle and resumes normal operation