Initial project structure and planning docs

Full project plan across 8 sub-projects (homeai-infra, homeai-llm,
homeai-voice, homeai-agent, homeai-character, homeai-esp32,
homeai-visual, homeai-images). Includes per-project PLAN.md files,
top-level PROJECT_PLAN.md, and master TODO.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author: Aodhan Collins
Date: 2026-03-04 01:11:37 +00:00
Commit: 38247d7cc4
11 changed files with 3060 additions and 0 deletions

homeai-llm/PLAN.md

@@ -0,0 +1,202 @@
# P2: homeai-llm — Local LLM Runtime
> Phase 1 | Depends on: P1 (infra up) | Blocked by: nothing
---
## Goal
Ollama running natively on Mac Mini with target models available. Open WebUI connected and accessible. LLM API ready for all downstream consumers (P3, P4, P7).
---
## Why Native (not Docker)
Ollama must run natively — not in Docker — because:
- Docker on Mac cannot access Apple Metal GPU (runs in a Linux VM)
- Native Ollama uses Metal for GPU acceleration, reportedly up to 35× faster than CPU-only inference
- Ollama's launchd integration keeps it alive across reboots
---
## Deliverables
### 1. Ollama Installation
```bash
# Install
brew install ollama
# Or direct install (Linux only — on macOS use brew or the Ollama.app download)
curl -fsSL https://ollama.com/install.sh | sh
```
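After installing, a quick sanity check confirms the binary is reachable (this assumes `ollama` lands on `PATH` with either install method):

```shell
#!/usr/bin/env bash
# Verify the ollama binary is installed and report its version.
# Prints a hint instead of failing hard if it is not on PATH yet.
if command -v ollama >/dev/null 2>&1; then
  ollama --version
else
  echo "ollama not found on PATH; check the install step"
fi
```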
Ollama runs as a background process. Configure as a launchd service for reboot survival.
**launchd plist:** `~/Library/LaunchAgents/com.ollama.ollama.plist`
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.ollama.ollama</string>
<key>ProgramArguments</key>
<array>
		<!-- Homebrew installs to /opt/homebrew/bin/ollama on Apple Silicon; adjust accordingly -->
		<string>/usr/local/bin/ollama</string>
<string>serve</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/ollama.log</string>
<key>StandardErrorPath</key>
<string>/tmp/ollama.err</string>
</dict>
</plist>
```
Load: `launchctl load ~/Library/LaunchAgents/com.ollama.ollama.plist`
### 2. Model Manifest — `ollama-models.txt`
Pinned models pulled to Mac Mini:
```
# Primary — high quality responses
llama3.3:70b
qwen2.5:72b
# Fast — low-latency tasks (timers, quick queries, TTS pre-processing)
qwen2.5:7b
# Code — for n8n/skill writing assistance
qwen2.5-coder:32b
# Embedding — for mem0 semantic search
nomic-embed-text
```
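Since the pull script and any future tooling will parse this manifest, a small validation pass can catch malformed entries before a long download session. The `validate_manifest` helper below is a sketch, not part of the plan's scripts; the `name[:tag]` pattern is an assumption about how Ollama model references look:

```shell
#!/usr/bin/env bash
# Sketch: validate ollama-models.txt entries before pulling.
# Assumes the manifest format above: one model per line, '#' comments,
# blank lines allowed, model refs shaped like name[:tag].
set -euo pipefail

validate_manifest() {
  local file="$1" line n=0
  while IFS= read -r line; do
    # Skip comments and blank lines, same as the pull script
    [[ "$line" =~ ^#.*$ || -z "$line" ]] && continue
    # Accept letters, digits, dots, dashes, underscores, slashes, one optional :tag
    if [[ ! "$line" =~ ^[A-Za-z0-9._/-]+(:[A-Za-z0-9._-]+)?$ ]]; then
      echo "bad entry: $line" >&2
      return 1
    fi
    n=$((n + 1))
  done < "$file"
  echo "$n models"
}

# Demo against an inline copy of part of the manifest
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# Primary
llama3.3:70b
qwen2.5:7b
nomic-embed-text
EOF
validate_manifest "$tmp"   # prints: 3 models
rm -f "$tmp"
```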
Pull script (`scripts/pull-models.sh`):
```bash
#!/usr/bin/env bash
set -euo pipefail
# Resolve the manifest relative to this script, not the caller's cwd
MANIFEST="$(cd "$(dirname "$0")/.." && pwd)/ollama-models.txt"
while IFS= read -r model; do
  # Skip comments and blank lines
  [[ "$model" =~ ^#.*$ || -z "$model" ]] && continue
  echo "Pulling $model..."
  ollama pull "$model"
done < "$MANIFEST"
```
### 3. Open WebUI — Docker
Open WebUI connects to Ollama over the Docker-to-host bridge (`host.docker.internal`):
**`docker/open-webui/docker-compose.yml`:**
```yaml
services:
open-webui:
image: ghcr.io/open-webui/open-webui:main
container_name: open-webui
restart: unless-stopped
volumes:
- ./open-webui-data:/app/backend/data
environment:
- OLLAMA_BASE_URL=http://host.docker.internal:11434
ports:
- "3030:8080"
networks:
- homeai
extra_hosts:
- "host.docker.internal:host-gateway"
networks:
homeai:
external: true
```
Port `3030` chosen to avoid conflict with Gitea (3000).
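After `docker compose up -d`, the container takes a moment to become responsive; a deploy step can poll before declaring success. This `wait_for_http` helper is an illustrative sketch (the URL and retry counts are examples, not fixed values from the plan):

```shell
#!/usr/bin/env bash
# Sketch: wait for Open WebUI to answer over HTTP before marking deploy done.

wait_for_http() {
  local url="$1" attempts="${2:-30}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f makes curl fail on HTTP errors; -o /dev/null discards the body
    if curl -fsS --max-time 2 -o /dev/null "$url"; then
      echo "up after $i attempt(s)"
      return 0
    fi
    sleep 1
  done
  echo "no response from $url" >&2
  return 1
}

# e.g. wait_for_http http://localhost:3030 60
```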
### 4. Benchmark Script — `scripts/benchmark.sh`
Measures tokens/sec for each model to inform model selection per task:
```bash
#!/usr/bin/env bash
PROMPT="Tell me a joke about computers."
for model in llama3.3:70b qwen2.5:72b qwen2.5:7b; do
  echo "=== $model ==="
  # --verbose reports prompt/eval timings, including eval rate in tokens/s
  ollama run "$model" "$PROMPT" --verbose --nowordwrap
done
```
Results documented in `scripts/benchmark-results.md`.
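To turn the raw runs into the numbers recorded in `benchmark-results.md`, the tokens/s figure can be scraped from the timing output. The exact `eval rate:` line format below is an assumption based on typical Ollama verbose output; adjust the pattern if your version prints differently:

```shell
#!/usr/bin/env bash
# Sketch: pull the tokens/s figure out of `ollama run --verbose` timing output.

extract_eval_rate() {
  # Match the "eval rate:" line and print the number before "tokens/s"
  awk '/^eval rate:/ { print $(NF-1) }'
}

# Demo with a captured sample (a real run's timing output would be piped in)
sample='eval count:           64 token(s)
eval rate:            31.85 tokens/s'
echo "$sample" | extract_eval_rate   # prints: 31.85
```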
### 5. API Verification
```bash
# Check Ollama is running
curl http://localhost:11434/api/tags
# Test OpenAI-compatible endpoint (used by P3, P4)
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5:7b",
"messages": [{"role": "user", "content": "Hello"}]
}'
```
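The `/api/tags` response is JSON with a `models` array of `{"name": ...}` objects, so a verification step can cross-check pulled models against the manifest. The extraction below is a jq-free sketch; the sample response is illustrative:

```shell
#!/usr/bin/env bash
# Sketch: list model names from an /api/tags response.
# A real check would pipe in: curl -s http://localhost:11434/api/tags

list_model_names() {
  # Crude extraction without jq: grab every "name":"..." value
  grep -o '"name":"[^"]*"' | cut -d'"' -f4
}

sample='{"models":[{"name":"qwen2.5:7b"},{"name":"nomic-embed-text:latest"}]}'
echo "$sample" | list_model_names
# prints:
# qwen2.5:7b
# nomic-embed-text:latest
```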
### 6. Model Selection Guide
Document in `scripts/benchmark-results.md` after benchmarking:
| Task | Model | Reason |
|---|---|---|
| Main conversation | `llama3.3:70b` | Best quality |
| Quick/real-time tasks | `qwen2.5:7b` | Lowest latency |
| Code generation (skills) | `qwen2.5-coder:32b` | Best code quality |
| Embeddings (mem0) | `nomic-embed-text` | Compact, fast |
---
## Interface Contract
- **Ollama API:** `http://localhost:11434` (native Ollama)
- **OpenAI-compatible API:** `http://localhost:11434/v1` — used by P3, P4, P7
- **Open WebUI:** `http://localhost:3030`
Add to `~/server/.env.services`:
```dotenv
OLLAMA_URL=http://localhost:11434
OLLAMA_API_URL=http://localhost:11434/v1
OPEN_WEBUI_URL=http://localhost:3030
```
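Downstream consumers (P3, P4, P7) can source this file and build endpoint URLs from the variables. A minimal sketch of that pattern, using a stand-in file with the values above:

```shell
#!/usr/bin/env bash
# Sketch: how a downstream script might consume .env.services.
set -euo pipefail

# Stand-in for ~/server/.env.services with the values above
envfile=$(mktemp)
cat > "$envfile" <<'EOF'
OLLAMA_URL=http://localhost:11434
OLLAMA_API_URL=http://localhost:11434/v1
OPEN_WEBUI_URL=http://localhost:3030
EOF

set -a            # auto-export everything the file defines
source "$envfile"
set +a
rm -f "$envfile"

echo "chat endpoint: ${OLLAMA_API_URL}/chat/completions"
# prints: chat endpoint: http://localhost:11434/v1/chat/completions
```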
---
## Implementation Steps
- [ ] Install Ollama via brew
- [ ] Verify `ollama serve` starts and responds at port 11434
- [ ] Write launchd plist, load it, verify auto-start on reboot
- [ ] Write `ollama-models.txt` with model list
- [ ] Run `scripts/pull-models.sh` — pull all models (allow time for large downloads)
- [ ] Run `scripts/benchmark.sh` — record results in `benchmark-results.md`
- [ ] Deploy Open WebUI via Docker compose
- [ ] Verify Open WebUI can chat with all models
- [ ] Add `OLLAMA_URL` and `OPEN_WEBUI_URL` to `.env.services`
- [ ] Add Ollama and Open WebUI monitors to Uptime Kuma
---
## Success Criteria
- [ ] `curl http://localhost:11434/api/tags` returns all expected models
- [ ] `llama3.3:70b` generates a coherent response in Open WebUI
- [ ] Ollama survives Mac Mini reboot without manual intervention
- [ ] Benchmark results documented — at least one model achieving >10 tok/s
- [ ] Open WebUI accessible on port 3030, both locally and over Tailscale