Initial project structure and planning docs
Full project plan across 8 sub-projects (homeai-infra, homeai-llm, homeai-voice, homeai-agent, homeai-character, homeai-esp32, homeai-visual, homeai-images). Includes per-project PLAN.md files, top-level PROJECT_PLAN.md, and master TODO.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
homeai-llm/PLAN.md
# P2: homeai-llm — Local LLM Runtime

> Phase 1 | Depends on: P1 (infra up) | Blocked by: nothing

---

## Goal

Ollama running natively on Mac Mini with target models available. Open WebUI connected and accessible. LLM API ready for all downstream consumers (P3, P4, P7).

---

## Why Native (not Docker)

Ollama must run natively — not in Docker — because:

- Docker on Mac cannot access the Apple Metal GPU (Docker Desktop runs containers inside a Linux VM)
- Native Ollama uses Metal for GPU acceleration, giving 3–5× faster inference
- Ollama's launchd integration keeps it alive across reboots

---

## Deliverables

### 1. Ollama Installation

```bash
# Install via Homebrew
brew install ollama

# Or direct install (note: ollama.com/install.sh targets Linux;
# on macOS use brew or the app download from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh
```

Ollama runs as a background process. Configure it as a launchd service so it survives reboots.

**launchd plist:** `~/Library/LaunchAgents/com.ollama.ollama.plist`

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.ollama.ollama</string>
    <key>ProgramArguments</key>
    <array>
        <!-- On Apple Silicon, Homebrew installs to /opt/homebrew/bin/ollama -->
        <string>/usr/local/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/ollama.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/ollama.err</string>
</dict>
</plist>
```

Load: `launchctl load ~/Library/LaunchAgents/com.ollama.ollama.plist`

### 2. Model Manifest — `ollama-models.txt`

Pinned models pulled to Mac Mini:

```
# Primary — high quality responses
llama3.3:70b
qwen2.5:72b

# Fast — low-latency tasks (timers, quick queries, TTS pre-processing)
qwen2.5:7b

# Code — for n8n/skill writing assistance
qwen2.5-coder:32b

# Embedding — for mem0 semantic search
nomic-embed-text
```

Pull script (`scripts/pull-models.sh`):
```bash
#!/usr/bin/env bash
set -euo pipefail

# Run from scripts/ — the manifest lives one level up.
# Skip comment lines and blank lines, pull everything else.
while IFS= read -r model; do
  [[ "$model" =~ ^#.*$ || -z "$model" ]] && continue
  echo "Pulling $model..."
  ollama pull "$model"
done < ../ollama-models.txt
```

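The comment/blank-line filter in the pull script can be sanity-checked on an inline sample before kicking off multi-gigabyte downloads. The sample list below is illustrative only:

```shell
# Same skip-comments/skip-blanks filter as scripts/pull-models.sh,
# applied to an inline sample instead of the real manifest
sample='# Primary
llama3.3:70b
qwen2.5:72b

# Embedding
nomic-embed-text'

to_pull=()
while IFS= read -r model; do
  [[ "$model" =~ ^#.*$ || -z "$model" ]] && continue
  to_pull+=("$model")
done <<< "$sample"

printf 'would pull: %s\n' "${to_pull[@]}"
```

This prints one `would pull:` line per real model name and nothing for comments or blanks.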
### 3. Open WebUI — Docker

Open WebUI connects to Ollama over the Docker-to-host bridge (`host.docker.internal`):

**`docker/open-webui/docker-compose.yml`:**

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    volumes:
      - ./open-webui-data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    ports:
      - "3030:8080"
    networks:
      - homeai
    extra_hosts:
      - "host.docker.internal:host-gateway"

networks:
  homeai:
    external: true
```

The `extra_hosts` entry matters on Linux engines; Docker Desktop for Mac resolves `host.docker.internal` natively, so it is harmless there. Port `3030` chosen to avoid conflict with Gitea (3000).

### 4. Benchmark Script — `scripts/benchmark.sh`

Measures tokens/sec for each model to inform model selection per task:

```bash
#!/usr/bin/env bash
PROMPT="Tell me a joke about computers."

for model in llama3.3:70b qwen2.5:72b qwen2.5:7b; do
  echo "=== $model ==="
  # --verbose makes ollama print timing stats including eval rate (tokens/sec);
  # `time` alone only reports wall-clock time
  time ollama run "$model" "$PROMPT" --verbose --nowordwrap
done
```

Results documented in `scripts/benchmark-results.md`.

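Tokens/sec can also be computed from the API directly: Ollama's `/api/generate` responses report `eval_count` (tokens generated) and `eval_duration` (in nanoseconds). A minimal sketch with made-up values:

```shell
# Hypothetical values as they would appear in an /api/generate response
eval_count=128                # tokens generated
eval_duration_ns=8000000000   # generation time in nanoseconds (8 s)

# tok/s = tokens / seconds; stay in integer arithmetic
tok_per_s=$(( eval_count * 1000000000 / eval_duration_ns ))
echo "${tok_per_s} tok/s"   # prints: 16 tok/s
```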
### 5. API Verification

```bash
# Check Ollama is running
curl http://localhost:11434/api/tags

# Test OpenAI-compatible endpoint (used by P3, P4)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:7b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

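When `jq` isn't installed, model names can be scraped out of the `/api/tags` JSON with plain `grep`/`cut`. Shown here on an inlined, abridged response rather than a live call:

```shell
# Abridged /api/tags payload (illustrative; a live call would be:
#   curl -s http://localhost:11434/api/tags)
response='{"models":[{"name":"qwen2.5:7b"},{"name":"nomic-embed-text:latest"}]}'

# Each "name":"..." pair becomes one line; field 4 of a "-split is the value
names=$(printf '%s\n' "$response" | grep -o '"name":"[^"]*"' | cut -d'"' -f4)
printf '%s\n' "$names"
```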
### 6. Model Selection Guide

Document in `scripts/benchmark-results.md` after benchmarking:

| Task | Model | Reason |
|---|---|---|
| Main conversation | `llama3.3:70b` | Best quality |
| Quick/real-time tasks | `qwen2.5:7b` | Lowest latency |
| Code generation (skills) | `qwen2.5-coder:32b` | Best code quality |
| Embeddings (mem0) | `nomic-embed-text` | Compact, fast |

---

## Interface Contract

- **Ollama API:** `http://localhost:11434` (native Ollama)
- **OpenAI-compatible API:** `http://localhost:11434/v1` — used by P3, P4, P7
- **Open WebUI:** `http://localhost:3030`

Add to `~/server/.env.services`:

```dotenv
OLLAMA_URL=http://localhost:11434
OLLAMA_API_URL=http://localhost:11434/v1
OPEN_WEBUI_URL=http://localhost:3030
```

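Downstream consumers can load these variables in plain bash. A minimal sketch, assuming simple `KEY=VALUE` lines with no quoting or interpolation; a temp file stands in for `~/server/.env.services`:

```shell
# Write a stand-in for ~/server/.env.services
envfile=$(mktemp)
cat > "$envfile" <<'EOF'
OLLAMA_URL=http://localhost:11434
OLLAMA_API_URL=http://localhost:11434/v1
OPEN_WEBUI_URL=http://localhost:3030
EOF

set -a                # auto-export every variable assigned while sourcing
source "$envfile"
set +a

echo "$OLLAMA_API_URL"   # prints: http://localhost:11434/v1
```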
---

## Implementation Steps

- [ ] Install Ollama via brew
- [ ] Verify `ollama serve` starts and responds at port 11434
- [ ] Write launchd plist, load it, verify auto-start on reboot
- [ ] Write `ollama-models.txt` with model list
- [ ] Run `scripts/pull-models.sh` — pull all models (allow time for large downloads)
- [ ] Run `scripts/benchmark.sh` — record results in `benchmark-results.md`
- [ ] Deploy Open WebUI via Docker compose
- [ ] Verify Open WebUI can chat with all models
- [ ] Add `OLLAMA_URL` and `OPEN_WEBUI_URL` to `.env.services`
- [ ] Add Ollama and Open WebUI monitors to Uptime Kuma

---

## Success Criteria

- [ ] `curl http://localhost:11434/api/tags` returns all expected models
- [ ] `llama3.3:70b` generates a coherent response in Open WebUI
- [ ] Ollama survives a Mac Mini reboot without manual intervention
- [ ] Benchmark results documented, with at least one model achieving >10 tok/s
- [ ] Open WebUI reachable at `http://localhost:3030` locally and over Tailscale from other devices