# P2: homeai-llm — Local LLM Runtime

> Phase 1 | Depends on: P1 (infra up) | Blocked by: nothing

---

## Goal

Ollama running natively on the Mac Mini with the target models available. Open WebUI connected and accessible. LLM API ready for all downstream consumers (P3, P4, P7).

---

## Why Native (not Docker)

Ollama must run natively, not in Docker, because:

- Docker on Mac cannot access the Apple Metal GPU (containers run in a Linux VM)
- Native Ollama uses Metal for GPU acceleration, giving 3–5× faster inference
- Ollama's launchd integration keeps it alive across reboots

---

## Deliverables

### 1. Ollama Installation

```bash
# Install
brew install ollama

# Or direct install
curl -fsSL https://ollama.com/install.sh | sh
```

Ollama runs as a background process. Configure it as a launchd service so that it survives reboots.

**launchd plist:** `~/Library/LaunchAgents/com.ollama.ollama.plist`

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.ollama.ollama</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/ollama</string>
    <string>serve</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/ollama.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/ollama.err</string>
</dict>
</plist>
```

The binary path must match the actual install location. Homebrew on Apple Silicon uses the `/opt/homebrew` prefix by default, so check with `which ollama` and adjust the first `ProgramArguments` entry if needed.

Load: `launchctl load ~/Library/LaunchAgents/com.ollama.ollama.plist`

### 2. Model Manifest — `ollama-models.txt`

Pinned models pulled to the Mac Mini:

```
# Primary — high quality responses
llama3.3:70b
qwen2.5:72b

# Fast — low-latency tasks (timers, quick queries, TTS pre-processing)
qwen2.5:7b

# Code — for n8n/skill writing assistance
qwen2.5-coder:32b

# Embedding — for mem0 semantic search
nomic-embed-text
```

Pull script (`scripts/pull-models.sh`):

```bash
#!/usr/bin/env bash
# Pull every model in ollama-models.txt; '#' lines and blank lines are skipped.
# The manifest is resolved relative to this script, not the caller's directory.
MANIFEST="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)/ollama-models.txt"

while IFS= read -r model || [[ -n "$model" ]]; do
  [[ "$model" =~ ^[[:space:]]*(#|$) ]] && continue
  echo "Pulling $model..."
  ollama pull "$model"
done < "$MANIFEST"
```

### 3. Open WebUI — Docker

Open WebUI connects to Ollama over the Docker-to-host bridge (`host.docker.internal`):

**`docker/open-webui/docker-compose.yml`:**

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    volumes:
      - ./open-webui-data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    ports:
      - "3030:8080"
    networks:
      - homeai
    extra_hosts:
      - "host.docker.internal:host-gateway"

networks:
  homeai:
    external: true
```

Port `3030` was chosen to avoid a conflict with Gitea (3000).

### 4. Benchmark Script — `scripts/benchmark.sh`

Measures tokens/sec for each model to inform model selection per task:

```bash
#!/usr/bin/env bash
# Run one fixed prompt per model. --verbose prints timing stats after the
# response, including the eval rate in tokens/s; `time` adds wall-clock time.
PROMPT="Tell me a joke about computers."

for model in llama3.3:70b qwen2.5:72b qwen2.5:7b; do
  echo "=== $model ==="
  time ollama run --verbose --nowordwrap "$model" "$PROMPT"
done
```

Results are documented in `scripts/benchmark-results.md`.

### 5. API Verification

```bash
# Check Ollama is running
curl http://localhost:11434/api/tags

# Test OpenAI-compatible endpoint (used by P3, P4)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:7b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

### 6. Model Selection Guide

Document in `scripts/benchmark-results.md` after benchmarking:

| Task | Model | Reason |
|---|---|---|
| Main conversation | `llama3.3:70b` | Best quality |
| Quick/real-time tasks | `qwen2.5:7b` | Lowest latency |
| Code generation (skills) | `qwen2.5-coder:32b` | Best code quality |
| Embeddings (mem0) | `nomic-embed-text` | Compact, fast |

---

## Interface Contract

- **Ollama API:** `http://localhost:11434` (native Ollama)
- **OpenAI-compatible API:** `http://localhost:11434/v1` — used by P3, P4, P7
- **Open WebUI:** `http://localhost:3030`

Add to `~/server/.env.services`:

```dotenv
OLLAMA_URL=http://localhost:11434
OLLAMA_API_URL=http://localhost:11434/v1
OPEN_WEBUI_URL=http://localhost:3030
```

---

## Implementation Steps

- [ ] Install Ollama via brew
- [ ] Verify `ollama serve` starts and responds at port 11434
- [ ] Write launchd plist, load it, verify auto-start on reboot
- [ ] Write `ollama-models.txt` with model list
- [ ] Run `scripts/pull-models.sh` — pull all models (allow time for large downloads)
- [ ] Run `scripts/benchmark.sh` — record results in `benchmark-results.md`
- [ ] Deploy Open WebUI via Docker compose
- [ ] Verify Open WebUI can chat with all models
- [ ] Add `OLLAMA_URL`, `OLLAMA_API_URL`, and `OPEN_WEBUI_URL` to `.env.services`
- [ ] Add Ollama and Open WebUI monitors to Uptime Kuma

---

## Success Criteria

- [ ] `curl http://localhost:11434/api/tags` returns all expected models
- [ ] `llama3.3:70b` generates a coherent response in Open WebUI
- [ ] Ollama survives Mac Mini reboot without manual intervention
- [ ] Benchmark results documented — at least one model achieving >10 tok/s
- [ ] Open WebUI accessible on port 3030, both at `http://localhost:3030` and via Tailscale
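
The first success criterion (every expected model is present) can be checked mechanically rather than by eye. The following is a minimal sketch and not part of the original plan: the file name `scripts/check-models.sh` and the three function names are hypothetical. It assumes that `ollama list` prints a header row followed by one model per line with the name in the first column, and that a model pulled without a tag (such as `nomic-embed-text`) is listed as `name:latest`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (scripts/check-models.sh): report manifest models that
# Ollama does not list as installed. Defines functions only; see usage below.

# Print the model names from the manifest file given as $1: strip comments
# and whitespace, drop empty lines, append ":latest" to untagged names.
manifest_models() {
  sed -e 's/#.*//' -e 's/[[:space:]]//g' -e '/^$/d' "$1" |
    awk '{ print (index($0, ":") ? $0 : $0 ":latest") }'
}

# Read `ollama list` output on stdin and print the first column of every
# non-empty line after the header row.
installed_models() {
  awk 'NR > 1 && NF { print $1 }'
}

# Print each manifest model that is missing from the installed list.
# $1 = manifest path, $2 = file (or process substitution) holding the
# output of `ollama list`.
missing_models() {
  local installed model
  installed="$(installed_models < "$2")"
  manifest_models "$1" | while IFS= read -r model; do
    # -x matches whole lines only, -F treats the name as a literal string.
    printf '%s\n' "$installed" | grep -qxF -- "$model" || printf '%s\n' "$model"
  done
}
```

After sourcing the file, `missing_models ollama-models.txt <(ollama list)` prints one line per model that still needs to be pulled. Empty output means the criterion is met. Comparing against `ollama list` instead of the JSON from `/api/tags` keeps the sketch free of a JSON parser dependency.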