Initial project structure and planning docs
Full project plan across 8 sub-projects (homeai-infra, homeai-llm, homeai-voice, homeai-agent, homeai-character, homeai-esp32, homeai-visual, homeai-images). Includes per-project PLAN.md files, top-level PROJECT_PLAN.md, and master TODO.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
File: homeai-images/PLAN.md (new file, +393 lines)
# P8: homeai-images — Image Generation

> Phase 6 | Depends on: P4 (OpenClaw skill runner) | Independent of P6, P7

---

## Goal

ComfyUI running natively on the Mac Mini with SDXL and Flux.1 models. A character LoRA trained for consistent appearance. An OpenClaw skill exposes image generation as a callable tool. Saved workflows cover the most common use cases.

---

## Why Native (not Docker)

Same reasoning as Ollama: ComfyUI needs Metal GPU acceleration, which Docker on macOS cannot provide, so ComfyUI runs natively as a launchd service.

---

## Installation
```bash
# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI ~/ComfyUI
cd ~/ComfyUI

# Install dependencies (Python 3.11+, venv recommended)
python3 -m venv venv
source venv/bin/activate
# --pre is required: the nightly index only carries pre-release wheels
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip install -r requirements.txt

# Launch
python main.py --listen 0.0.0.0 --port 8188
```
**Note:** ComfyUI auto-detects the PyTorch MPS backend on Apple Silicon — no extra configuration is needed. Verify by checking the startup logs for `Using device: mps`, or directly from the venv:

```python
import torch
print(torch.backends.mps.is_available())  # True on Apple Silicon with a recent PyTorch
```
### launchd plist — `com.homeai.comfyui.plist`

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.homeai.comfyui</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/<username>/ComfyUI/venv/bin/python</string>
        <string>/Users/<username>/ComfyUI/main.py</string>
        <string>--listen</string>
        <string>0.0.0.0</string>
        <string>--port</string>
        <string>8188</string>
    </array>
    <key>WorkingDirectory</key>
    <string>/Users/<username>/ComfyUI</string>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/comfyui.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/comfyui.err</string>
</dict>
</plist>
```
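To activate the service, the plist goes into `~/Library/LaunchAgents` and is loaded with `launchctl` (a sketch of the macOS-only commands, assuming the plist file sits in the current directory):

```shell
# Install and start the ComfyUI launchd service
cp com.homeai.comfyui.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.homeai.comfyui.plist

# Confirm the job is registered
launchctl list | grep com.homeai.comfyui

# Follow the service logs
tail -f /tmp/comfyui.log
```

With `RunAtLoad` and `KeepAlive` set, launchd restarts ComfyUI on crash and on reboot.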
---

## Model Downloads

### Model Manifest

`~/ComfyUI/models/` structure:

```
checkpoints/
├── sd_xl_base_1.0.safetensors        # SDXL base
├── flux1-dev.safetensors             # Flux.1-dev (high quality)
└── flux1-schnell.safetensors         # Flux.1-schnell (fast drafts)

vae/
├── sdxl_vae.safetensors
└── ae.safetensors                    # Flux VAE

clip/
├── clip_l.safetensors
└── t5xxl_fp16.safetensors            # Flux text encoder

controlnet/
├── controlnet-canny-sdxl.safetensors
└── controlnet-depth-sdxl.safetensors

loras/
└── aria-v1.safetensors               # Character LoRA (trained locally)
```
### Download Script — `scripts/download-models.sh`

```bash
#!/usr/bin/env bash
set -euo pipefail

MODELS_DIR="$HOME/ComfyUI/models"

# HuggingFace downloads (requires huggingface_hub)
pip install huggingface_hub

python3 - "$MODELS_DIR" <<'EOF'
import sys
from huggingface_hub import hf_hub_download

models_dir = sys.argv[1]

downloads = [
    ('stabilityai/stable-diffusion-xl-base-1.0', 'sd_xl_base_1.0.safetensors', 'checkpoints'),
    ('black-forest-labs/FLUX.1-schnell', 'flux1-schnell.safetensors', 'checkpoints'),
]

for repo, filename, subdir in downloads:
    hf_hub_download(
        repo_id=repo,
        filename=filename,
        local_dir=f'{models_dir}/{subdir}',
    )
EOF
```
> Flux.1-dev requires accepting the HuggingFace license agreement. Download it manually if the script fails.

---
## Saved Workflows

All workflows are stored as ComfyUI JSON in `homeai-images/workflows/`.

### `portrait.json` — Character Portrait

Standard character portrait with expression control.

Key nodes:
- **CheckpointLoader:** SDXL base
- **LoraLoader:** aria character LoRA
- **CLIPTextEncode:** positive prompt includes character description + expression
- **KSampler:** 25 steps, DPM++ 2M, CFG 7
- **VAEDecode → SaveImage**

Positive prompt template:
```
aria, (character lora), 1girl, solo, portrait, looking at viewer,
soft lighting, detailed face, high quality, masterpiece,
<EXPRESSION_PLACEHOLDER>
```
### `scene.json` — Character in Scene with ControlNet

Uses ControlNet depth/canny for pose control.

Key nodes:
- **LoadImage:** input pose reference image
- **ControlNetLoader:** canny or depth model
- **ControlNetApply:** apply to conditioning
- **KSampler** with ControlNet guidance
### `quick.json` — Fast Draft via Flux.1-schnell

Low-step, fast generation for quick previews.

Key nodes:
- **CheckpointLoader:** flux1-schnell
- **KSampler:** 4 steps, Euler, CFG 1 (Flux uses CFG=1)
- Output: 512×512 or 768×768
### `upscale.json` — 2× Upscale

Takes an existing image and upscales it 2× with detail enhancement.

Key nodes:
- **LoadImage**
- **UpscaleModelLoader:** `4x_NMKD-Siax_200k.pth` (download separately)
- **ImageUpscaleWithModel**
- **KSampler img2img** for detail pass

---
## `comfyui.py` Skill — OpenClaw Integration

Full implementation (replaces the stub from P4).

File: `homeai-images/skills/comfyui.py`

```python
"""
ComfyUI image generation skill for OpenClaw.
Submits workflow JSON via the ComfyUI REST API and returns the generated image path.
"""

import json
import time
import uuid
from pathlib import Path

import requests

COMFYUI_URL = "http://localhost:8188"
WORKFLOWS_DIR = Path(__file__).parent.parent / "workflows"
OUTPUT_DIR = Path.home() / "ComfyUI" / "output"


def generate(workflow_name: str, params: dict | None = None) -> str:
    """
    Submit a named workflow to ComfyUI.

    Args:
        workflow_name: Name of workflow JSON (without .json extension)
        params: Dict of node overrides, e.g. {"positive_prompt": "..."}

    Returns:
        Absolute path to the generated image file
    """
    workflow_path = WORKFLOWS_DIR / f"{workflow_name}.json"
    if not workflow_path.exists():
        raise ValueError(f"Workflow '{workflow_name}' not found at {workflow_path}")

    workflow = json.loads(workflow_path.read_text())

    # Apply param overrides
    if params:
        workflow = _apply_params(workflow, params)

    # Submit to ComfyUI queue
    client_id = str(uuid.uuid4())
    prompt_id = _queue_prompt(workflow, client_id)

    # Poll for completion
    image_path = _wait_for_output(prompt_id)
    return str(image_path)


def _queue_prompt(workflow: dict, client_id: str) -> str:
    resp = requests.post(
        f"{COMFYUI_URL}/prompt",
        json={"prompt": workflow, "client_id": client_id},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["prompt_id"]


def _wait_for_output(prompt_id: str, timeout: int = 120) -> Path:
    start = time.time()
    while time.time() - start < timeout:
        resp = requests.get(f"{COMFYUI_URL}/history/{prompt_id}", timeout=10)
        resp.raise_for_status()
        history = resp.json()
        if prompt_id in history:
            outputs = history[prompt_id]["outputs"]
            for node_output in outputs.values():
                if "images" in node_output:
                    img = node_output["images"][0]
                    return OUTPUT_DIR / img["subfolder"] / img["filename"]
        time.sleep(2)
    raise TimeoutError(f"ComfyUI generation timed out after {timeout}s")


def _apply_params(workflow: dict, params: dict) -> dict:
    """
    Apply parameter overrides to workflow nodes.
    Nodes are addressed by their '_meta.title' field; only the 'text' input of a
    matching node is overridden. e.g., params={"positive_prompt": "new prompt"}
    updates the node titled "positive_prompt".
    """
    for node in workflow.values():
        title = node.get("_meta", {}).get("title", "")
        if title in params:
            node["inputs"]["text"] = params[title]
    return workflow


# Convenience wrappers for OpenClaw
def portrait(expression: str = "neutral", extra_prompt: str = "") -> str:
    return generate("portrait", {"positive_prompt": f"aria, {expression}, {extra_prompt}"})


def quick(prompt: str) -> str:
    return generate("quick", {"positive_prompt": prompt})


def scene(prompt: str, controlnet_image_path: str | None = None) -> str:
    params = {"positive_prompt": prompt}
    if controlnet_image_path:
        params["controlnet_image"] = controlnet_image_path
    return generate("scene", params)
```
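The override mechanism can be exercised without a running ComfyUI server. A minimal sketch of what the title-based override does to API-format workflow JSON (the node id, title, and inputs here are made up for illustration):

```python
# A stub of one node from an exported API-format workflow
workflow = {
    "6": {
        "class_type": "CLIPTextEncode",
        "_meta": {"title": "positive_prompt"},
        "inputs": {"text": "old prompt", "clip": ["4", 1]},
    },
}

def apply_params(workflow: dict, params: dict) -> dict:
    # Same logic as the skill's _apply_params: match nodes by _meta.title,
    # replace their 'text' input
    for node in workflow.values():
        title = node.get("_meta", {}).get("title", "")
        if title in params:
            node["inputs"]["text"] = params[title]
    return workflow

out = apply_params(workflow, {"positive_prompt": "aria, happy"})
print(out["6"]["inputs"]["text"])  # aria, happy
```

This is why each workflow's prompt node must be given a stable title in the ComfyUI editor before exporting.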
---

## Character LoRA Training

A LoRA trains the model to consistently generate the character's appearance.

### Dataset Preparation

1. Collect 20–50 reference images of the character (or commission a character sheet)
2. Consistent style, multiple angles/expressions
3. Resize to 1024×1024, square crop
4. Write captions: `aria, 1girl, solo, <specific description>`
5. Store in `~/lora-training/aria/`
### Training

Use **kohya_ss** or **SimpleTuner** for LoRA training on Apple Silicon:

```bash
# kohya_ss (SDXL LoRA)
git clone https://github.com/bmaltais/kohya_ss
cd kohya_ss
pip install -r requirements.txt

# Training config — key params for MPS
# ($HOME is used because bash does not tilde-expand inside --flag=~ arguments)
python train_network.py \
  --pretrained_model_name_or_path="$HOME/ComfyUI/models/checkpoints/sd_xl_base_1.0.safetensors" \
  --train_data_dir="$HOME/lora-training/aria" \
  --output_dir="$HOME/ComfyUI/models/loras" \
  --output_name=aria-v1 \
  --network_module=networks.lora \
  --network_dim=32 \
  --network_alpha=16 \
  --max_train_epochs=10 \
  --learning_rate=1e-4
```
> Training on M4 Pro via MPS: expect 1–4 hours for a 20-image dataset at 10 epochs.
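For scheduling purposes, the epoch count above translates into a rough optimizer-step estimate (assuming batch size 1 and no per-image repeats, which are the conservative defaults):

```python
# Rough optimizer-step count for the training run above
# (assumptions: batch size 1, no per-image dataset repeats)
images, epochs, batch_size = 20, 10, 1
steps = images * epochs // batch_size
print(steps)  # 200
```

kohya-style repeat multipliers or a larger batch size change this linearly, so it is worth recomputing before committing to a multi-hour run.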
---
## Directory Layout

```
homeai-images/
├── workflows/
│   ├── portrait.json
│   ├── scene.json
│   ├── quick.json
│   └── upscale.json
└── skills/
    └── comfyui.py
```

---
## Interface Contracts

**Consumes:**
- ComfyUI REST API: `http://localhost:8188`
- Workflows from `homeai-images/workflows/`
- Character LoRA from `~/ComfyUI/models/loras/aria-v1.safetensors`

**Exposes:**
- `comfyui.generate(workflow, params)` → image path — called by P4 OpenClaw

**Add to `.env.services`:**
```dotenv
COMFYUI_URL=http://localhost:8188
```

---
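Consumers can resolve the endpoint with a standard environment lookup, falling back to the same default the skill hardcodes (a one-line sketch):

```python
import os

# Prefer .env.services value when exported; fall back to the local default
COMFYUI_URL = os.environ.get("COMFYUI_URL", "http://localhost:8188")
print(COMFYUI_URL)
```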
## Implementation Steps

- [ ] Clone ComfyUI to `~/ComfyUI/`, install deps in venv
- [ ] Verify MPS is detected at launch (`Using device: mps` in logs)
- [ ] Write and load launchd plist
- [ ] Download SDXL base model via `scripts/download-models.sh`
- [ ] Download Flux.1-schnell
- [ ] Test basic generation via ComfyUI web UI (browse to port 8188)
- [ ] Build and save `quick.json` workflow in ComfyUI UI, export JSON
- [ ] Build and save `portrait.json` workflow, export JSON
- [ ] Build and save `scene.json` workflow with ControlNet, export JSON
- [ ] Write `skills/comfyui.py` full implementation
- [ ] Test skill: `comfyui.quick("a cat sitting on a couch")` → image file
- [ ] Collect character reference images for LoRA training
- [ ] Train SDXL LoRA with kohya_ss
- [ ] Load LoRA in `portrait.json` workflow, verify character consistency
- [ ] Symlink `skills/` to `~/.openclaw/skills/`
- [ ] Test via OpenClaw: "Generate a portrait of Aria looking happy"
---

## Success Criteria

- [ ] ComfyUI UI accessible at `http://localhost:8188` after reboot
- [ ] `quick.json` workflow generates an image in <30s on M4 Pro
- [ ] `portrait.json` with character LoRA produces consistent character appearance
- [ ] `comfyui.generate("quick", {"positive_prompt": "test"})` returns a valid image path
- [ ] Generated images are saved to `~/ComfyUI/output/`
- [ ] ComfyUI survives Mac Mini reboot via launchd