Initial project structure and planning docs

Full project plan across 8 sub-projects (homeai-infra, homeai-llm,
homeai-voice, homeai-agent, homeai-character, homeai-esp32,
homeai-visual, homeai-images). Includes per-project PLAN.md files,
top-level PROJECT_PLAN.md, and master TODO.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author: Aodhan Collins
Date: 2026-03-04 01:11:37 +00:00
Commit: 38247d7cc4
11 changed files with 3060 additions and 0 deletions

homeai-llm/PLAN.md

@@ -0,0 +1,202 @@
# P2: homeai-llm — Local LLM Runtime
> Phase 1 | Depends on: P1 (infra up) | Blocked by: nothing
---
## Goal
Ollama running natively on Mac Mini with target models available. Open WebUI connected and accessible. LLM API ready for all downstream consumers (P3, P4, P7).
---
## Why Native (not Docker)
Ollama must run natively — not in Docker — because:
- Docker on Mac cannot access Apple Metal GPU (runs in a Linux VM)
- Native Ollama uses Metal for GPU acceleration, reportedly up to 35× faster than CPU-only inference
- Ollama's launchd integration keeps it alive across reboots
---
## Deliverables
### 1. Ollama Installation
```bash
# Install
brew install ollama
# Or direct install (Linux only — on macOS use brew or the Ollama.app download)
curl -fsSL https://ollama.com/install.sh | sh
```
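After installing, a quick sanity check confirms the binary is reachable (this assumes `ollama` lands on `PATH` with either install method):

```shell
#!/usr/bin/env bash
# Verify the ollama binary is installed and report its version.
# Prints a hint instead of failing hard if it is not on PATH yet.
if command -v ollama >/dev/null 2>&1; then
  ollama --version
else
  echo "ollama not found on PATH; check the install step"
fi
```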
Ollama runs as a background process. Configure as a launchd service for reboot survival.
**launchd plist:** `~/Library/LaunchAgents/com.ollama.ollama.plist`
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.ollama.ollama</string>
<key>ProgramArguments</key>
<array>
		<!-- Homebrew installs to /opt/homebrew/bin/ollama on Apple Silicon; adjust accordingly -->
		<string>/usr/local/bin/ollama</string>
<string>serve</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/ollama.log</string>
<key>StandardErrorPath</key>
<string>/tmp/ollama.err</string>
</dict>
</plist>
```
Load: `launchctl load ~/Library/LaunchAgents/com.ollama.ollama.plist`
### 2. Model Manifest — `ollama-models.txt`
Pinned models pulled to Mac Mini:
```
# Primary — high quality responses
llama3.3:70b
qwen2.5:72b
# Fast — low-latency tasks (timers, quick queries, TTS pre-processing)
qwen2.5:7b
# Code — for n8n/skill writing assistance
qwen2.5-coder:32b
# Embedding — for mem0 semantic search
nomic-embed-text
```
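Since the pull script and any future tooling will parse this manifest, a small validation pass can catch malformed entries before a long download session. The `validate_manifest` helper below is a sketch, not part of the plan's scripts; the `name[:tag]` pattern is an assumption about how Ollama model references look:

```shell
#!/usr/bin/env bash
# Sketch: validate ollama-models.txt entries before pulling.
# Assumes the manifest format above: one model per line, '#' comments,
# blank lines allowed, model refs shaped like name[:tag].
set -euo pipefail

validate_manifest() {
  local file="$1" line n=0
  while IFS= read -r line; do
    # Skip comments and blank lines, same as the pull script
    [[ "$line" =~ ^#.*$ || -z "$line" ]] && continue
    # Accept letters, digits, dots, dashes, underscores, slashes, one optional :tag
    if [[ ! "$line" =~ ^[A-Za-z0-9._/-]+(:[A-Za-z0-9._-]+)?$ ]]; then
      echo "bad entry: $line" >&2
      return 1
    fi
    n=$((n + 1))
  done < "$file"
  echo "$n models"
}

# Demo against an inline copy of part of the manifest
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# Primary
llama3.3:70b
qwen2.5:7b
nomic-embed-text
EOF
validate_manifest "$tmp"   # prints: 3 models
rm -f "$tmp"
```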
Pull script (`scripts/pull-models.sh`):
```bash
#!/usr/bin/env bash
set -euo pipefail
# Resolve the manifest relative to this script, not the caller's cwd
MANIFEST="$(cd "$(dirname "$0")/.." && pwd)/ollama-models.txt"
while IFS= read -r model; do
  # Skip comments and blank lines
  [[ "$model" =~ ^#.*$ || -z "$model" ]] && continue
  echo "Pulling $model..."
  ollama pull "$model"
done < "$MANIFEST"
```
### 3. Open WebUI — Docker
Open WebUI connects to Ollama over the Docker-to-host bridge (`host.docker.internal`):
**`docker/open-webui/docker-compose.yml`:**
```yaml
services:
open-webui:
image: ghcr.io/open-webui/open-webui:main
container_name: open-webui
restart: unless-stopped
volumes:
- ./open-webui-data:/app/backend/data
environment:
- OLLAMA_BASE_URL=http://host.docker.internal:11434
ports:
- "3030:8080"
networks:
- homeai
extra_hosts:
- "host.docker.internal:host-gateway"
networks:
homeai:
external: true
```
Port `3030` chosen to avoid conflict with Gitea (3000).
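After `docker compose up -d`, the container takes a moment to become responsive; a deploy step can poll before declaring success. This `wait_for_http` helper is an illustrative sketch (the URL and retry counts are examples, not fixed values from the plan):

```shell
#!/usr/bin/env bash
# Sketch: wait for Open WebUI to answer over HTTP before marking deploy done.

wait_for_http() {
  local url="$1" attempts="${2:-30}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f makes curl fail on HTTP errors; -o /dev/null discards the body
    if curl -fsS --max-time 2 -o /dev/null "$url"; then
      echo "up after $i attempt(s)"
      return 0
    fi
    sleep 1
  done
  echo "no response from $url" >&2
  return 1
}

# e.g. wait_for_http http://localhost:3030 60
```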
### 4. Benchmark Script — `scripts/benchmark.sh`
Measures tokens/sec for each model to inform model selection per task:
```bash
#!/usr/bin/env bash
PROMPT="Tell me a joke about computers."
for model in llama3.3:70b qwen2.5:72b qwen2.5:7b; do
  echo "=== $model ==="
  # --verbose reports prompt/eval timings, including eval rate in tokens/s
  ollama run "$model" "$PROMPT" --verbose --nowordwrap
done
```
Results documented in `scripts/benchmark-results.md`.
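To turn the raw runs into the numbers recorded in `benchmark-results.md`, the tokens/s figure can be scraped from the timing output. The exact `eval rate:` line format below is an assumption based on typical Ollama verbose output; adjust the pattern if your version prints differently:

```shell
#!/usr/bin/env bash
# Sketch: pull the tokens/s figure out of `ollama run --verbose` timing output.

extract_eval_rate() {
  # Match the "eval rate:" line and print the number before "tokens/s"
  awk '/^eval rate:/ { print $(NF-1) }'
}

# Demo with a captured sample (a real run's timing output would be piped in)
sample='eval count:           64 token(s)
eval rate:            31.85 tokens/s'
echo "$sample" | extract_eval_rate   # prints: 31.85
```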
### 5. API Verification
```bash
# Check Ollama is running
curl http://localhost:11434/api/tags
# Test OpenAI-compatible endpoint (used by P3, P4)
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5:7b",
"messages": [{"role": "user", "content": "Hello"}]
}'
```
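The `/api/tags` response is JSON with a `models` array of `{"name": ...}` objects, so a verification step can cross-check pulled models against the manifest. The extraction below is a jq-free sketch; the sample response is illustrative:

```shell
#!/usr/bin/env bash
# Sketch: list model names from an /api/tags response.
# A real check would pipe in: curl -s http://localhost:11434/api/tags

list_model_names() {
  # Crude extraction without jq: grab every "name":"..." value
  grep -o '"name":"[^"]*"' | cut -d'"' -f4
}

sample='{"models":[{"name":"qwen2.5:7b"},{"name":"nomic-embed-text:latest"}]}'
echo "$sample" | list_model_names
# prints:
# qwen2.5:7b
# nomic-embed-text:latest
```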
### 6. Model Selection Guide
Document in `scripts/benchmark-results.md` after benchmarking:
| Task | Model | Reason |
|---|---|---|
| Main conversation | `llama3.3:70b` | Best quality |
| Quick/real-time tasks | `qwen2.5:7b` | Lowest latency |
| Code generation (skills) | `qwen2.5-coder:32b` | Best code quality |
| Embeddings (mem0) | `nomic-embed-text` | Compact, fast |
---
## Interface Contract
- **Ollama API:** `http://localhost:11434` (native Ollama)
- **OpenAI-compatible API:** `http://localhost:11434/v1` — used by P3, P4, P7
- **Open WebUI:** `http://localhost:3030`
Add to `~/server/.env.services`:
```dotenv
OLLAMA_URL=http://localhost:11434
OLLAMA_API_URL=http://localhost:11434/v1
OPEN_WEBUI_URL=http://localhost:3030
```
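Downstream consumers (P3, P4, P7) can source this file and build endpoint URLs from the variables. A minimal sketch of that pattern, using a stand-in file with the values above:

```shell
#!/usr/bin/env bash
# Sketch: how a downstream script might consume .env.services.
set -euo pipefail

# Stand-in for ~/server/.env.services with the values above
envfile=$(mktemp)
cat > "$envfile" <<'EOF'
OLLAMA_URL=http://localhost:11434
OLLAMA_API_URL=http://localhost:11434/v1
OPEN_WEBUI_URL=http://localhost:3030
EOF

set -a            # auto-export everything the file defines
source "$envfile"
set +a
rm -f "$envfile"

echo "chat endpoint: ${OLLAMA_API_URL}/chat/completions"
# prints: chat endpoint: http://localhost:11434/v1/chat/completions
```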
---
## Implementation Steps
- [ ] Install Ollama via brew
- [ ] Verify `ollama serve` starts and responds at port 11434
- [ ] Write launchd plist, load it, verify auto-start on reboot
- [ ] Write `ollama-models.txt` with model list
- [ ] Run `scripts/pull-models.sh` — pull all models (allow time for large downloads)
- [ ] Run `scripts/benchmark.sh` — record results in `benchmark-results.md`
- [ ] Deploy Open WebUI via Docker compose
- [ ] Verify Open WebUI can chat with all models
- [ ] Add `OLLAMA_URL` and `OPEN_WEBUI_URL` to `.env.services`
- [ ] Add Ollama and Open WebUI monitors to Uptime Kuma
---
## Success Criteria
- [ ] `curl http://localhost:11434/api/tags` returns all expected models
- [ ] `llama3.3:70b` generates a coherent response in Open WebUI
- [ ] Ollama survives Mac Mini reboot without manual intervention
- [ ] Benchmark results documented — at least one model achieving >10 tok/s
- [ ] Open WebUI accessible on port 3030, both locally and over Tailscale