Full project plan across 8 sub-projects (homeai-infra, homeai-llm, homeai-voice, homeai-agent, homeai-character, homeai-esp32, homeai-visual, homeai-images). Includes per-project PLAN.md files, top-level PROJECT_PLAN.md, and master TODO.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# P8: homeai-images — Image Generation

> Phase 6 | Depends on: P4 (OpenClaw skill runner) | Independent of P6, P7

---

## Goal

ComfyUI running natively on Mac Mini with SDXL and Flux.1 models. A character LoRA trained for consistent appearance. An OpenClaw skill exposes image generation as a callable tool. Saved workflows cover the most common use cases.

---

## Why Native (not Docker)

Same reasoning as Ollama: ComfyUI needs Metal GPU acceleration, and Docker on macOS cannot pass the GPU through to containers, so ComfyUI runs natively as a launchd service.

---

## Installation

```bash
# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI ~/ComfyUI
cd ~/ComfyUI

# Install dependencies (Python 3.11+, venv recommended)
python3 -m venv venv
source venv/bin/activate
# --pre is required to pull the nightly (pre-release) PyTorch builds
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip install -r requirements.txt

# Launch
python main.py --listen 0.0.0.0 --port 8188
```
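
Once the server is up, it can be probed before wiring anything else to it. A minimal stdlib sketch (the helper name is ours; `/system_stats` is ComfyUI's lightweight status endpoint):

```python
import urllib.request
import urllib.error

def comfyui_is_up(base_url: str = "http://localhost:8188", timeout: float = 2.0) -> bool:
    """Return True if the ComfyUI HTTP endpoint responds at all."""
    try:
        with urllib.request.urlopen(f"{base_url}/system_stats", timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False
```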

**Note:** ComfyUI picks up the PyTorch MPS backend on Apple Silicon automatically:

```python
# ComfyUI auto-detects MPS — no extra config needed
# Verify by checking ComfyUI startup logs for "Using device: mps"
```

### launchd plist — `com.homeai.comfyui.plist`

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.homeai.comfyui</string>
  <key>ProgramArguments</key>
  <array>
    <string>/Users/<username>/ComfyUI/venv/bin/python</string>
    <string>/Users/<username>/ComfyUI/main.py</string>
    <string>--listen</string>
    <string>0.0.0.0</string>
    <string>--port</string>
    <string>8188</string>
  </array>
  <key>WorkingDirectory</key>
  <string>/Users/<username>/ComfyUI</string>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/comfyui.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/comfyui.err</string>
</dict>
</plist>
```

---

## Model Downloads

### Model Manifest

`~/ComfyUI/models/` structure:

```
checkpoints/
├── sd_xl_base_1.0.safetensors      # SDXL base
├── flux1-dev.safetensors           # Flux.1-dev (high quality)
└── flux1-schnell.safetensors       # Flux.1-schnell (fast drafts)

vae/
├── sdxl_vae.safetensors
└── ae.safetensors                  # Flux VAE

clip/
├── clip_l.safetensors
└── t5xxl_fp16.safetensors          # Flux text encoder

controlnet/
├── controlnet-canny-sdxl.safetensors
└── controlnet-depth-sdxl.safetensors

loras/
└── aria-v1.safetensors             # Character LoRA (trained locally)
```

### Download Script — `scripts/download-models.sh`

```bash
#!/usr/bin/env bash
set -euo pipefail

MODELS_DIR="$HOME/ComfyUI/models"

# HuggingFace downloads (requires the huggingface_hub package)
pip install huggingface_hub

python3 - "$MODELS_DIR" <<'EOF'
import sys
from huggingface_hub import hf_hub_download

models_dir = sys.argv[1]

downloads = [
    ('stabilityai/stable-diffusion-xl-base-1.0', 'sd_xl_base_1.0.safetensors', 'checkpoints'),
    ('black-forest-labs/FLUX.1-schnell', 'flux1-schnell.safetensors', 'checkpoints'),
]

for repo, filename, subdir in downloads:
    hf_hub_download(
        repo_id=repo,
        filename=filename,
        local_dir=f'{models_dir}/{subdir}',
    )
EOF
```
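
After the script runs, a quick existence check against the manifest above catches failed or license-gated downloads. A small sketch (the helper name is ours):

```python
from pathlib import Path

def check_models(models_dir: str, expected: list[str]) -> dict[str, bool]:
    """Map each expected relative path to whether it exists under models_dir."""
    root = Path(models_dir)
    return {rel: (root / rel).exists() for rel in expected}

# e.g. check_models(str(Path.home() / "ComfyUI" / "models"),
#                   ["checkpoints/sd_xl_base_1.0.safetensors",
#                    "checkpoints/flux1-schnell.safetensors"])
```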

> Flux.1-dev requires accepting the HuggingFace license agreement; download it manually if the script fails.

---

## Saved Workflows

All workflows are stored as ComfyUI JSON in `homeai-images/workflows/`.

### `portrait.json` — Character Portrait

Standard character portrait with expression control.

Key nodes:
- **CheckpointLoader:** SDXL base
- **LoraLoader:** aria character LoRA
- **CLIPTextEncode:** positive prompt includes character description + expression
- **KSampler:** 25 steps, DPM++ 2M, CFG 7
- **VAEDecode → SaveImage**

Positive prompt template:
```
aria, (character lora), 1girl, solo, portrait, looking at viewer,
soft lighting, detailed face, high quality, masterpiece,
<EXPRESSION_PLACEHOLDER>
```
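
When a workflow like this is exported with ComfyUI's "Save (API Format)" option, it becomes a flat dict of numbered nodes, and the skill below addresses nodes by their `_meta.title`. A minimal sketch of that addressing (node IDs, class types, and values are illustrative):

```python
# Two nodes from a hypothetical API-format export; a param override
# targets the node whose _meta.title matches the param key.
workflow = {
    "6": {
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "placeholder", "clip": ["4", 1]},
        "_meta": {"title": "positive_prompt"},
    },
    "3": {
        "class_type": "KSampler",
        "inputs": {"steps": 25, "cfg": 7.0, "model": ["4", 0]},
        "_meta": {"title": "sampler"},
    },
}

# Override the positive prompt by title, leaving other nodes untouched
for node in workflow.values():
    if node.get("_meta", {}).get("title") == "positive_prompt":
        node["inputs"]["text"] = "aria, smiling, portrait"
```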

### `scene.json` — Character in Scene with ControlNet

Uses ControlNet depth/canny for pose control.

Key nodes:
- **LoadImage:** input pose reference image
- **ControlNetLoader:** canny or depth model
- **ControlNetApply:** apply to conditioning
- **KSampler** with ControlNet guidance

### `quick.json` — Fast Draft via Flux.1-schnell

Low-step, fast generation for quick previews.

Key nodes:
- **CheckpointLoader:** flux1-schnell
- **KSampler:** 4 steps, Euler, CFG 1 (Flux uses CFG=1)
- Output: 512×512 or 768×768

### `upscale.json` — 2× Upscale

Takes an existing image and upscales it 2× with detail enhancement.

Key nodes:
- **LoadImage**
- **UpscaleModelLoader:** `4x_NMKD-Siax_200k.pth` (download separately)
- **ImageUpscaleWithModel**
- **KSampler img2img** for detail pass

---

## `comfyui.py` Skill — OpenClaw Integration

Full implementation (replaces the stub from P4).

File: `homeai-images/skills/comfyui.py`

```python
"""
ComfyUI image generation skill for OpenClaw.
Submits workflow JSON via the ComfyUI REST API and returns the generated image path.
"""

import json
import time
import uuid
import requests
from pathlib import Path

COMFYUI_URL = "http://localhost:8188"
WORKFLOWS_DIR = Path(__file__).parent.parent / "workflows"
OUTPUT_DIR = Path.home() / "ComfyUI" / "output"


def generate(workflow_name: str, params: dict | None = None) -> str:
    """
    Submit a named workflow to ComfyUI.
    Returns the path of the generated image.

    Args:
        workflow_name: Name of workflow JSON (without .json extension)
        params: Node overrides keyed by node title, e.g. {"positive_prompt": "..."}

    Returns:
        Absolute path to the generated image file
    """
    workflow_path = WORKFLOWS_DIR / f"{workflow_name}.json"
    if not workflow_path.exists():
        raise ValueError(f"Workflow '{workflow_name}' not found at {workflow_path}")

    workflow = json.loads(workflow_path.read_text())

    # Apply param overrides
    if params:
        workflow = _apply_params(workflow, params)

    # Submit to ComfyUI queue
    client_id = str(uuid.uuid4())
    prompt_id = _queue_prompt(workflow, client_id)

    # Poll for completion
    image_path = _wait_for_output(prompt_id, client_id)
    return str(image_path)


def _queue_prompt(workflow: dict, client_id: str) -> str:
    resp = requests.post(
        f"{COMFYUI_URL}/prompt",
        json={"prompt": workflow, "client_id": client_id},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["prompt_id"]


def _wait_for_output(prompt_id: str, client_id: str, timeout: int = 120) -> Path:
    # client_id is unused here but kept for a future websocket-based progress API
    start = time.time()
    while time.time() - start < timeout:
        resp = requests.get(f"{COMFYUI_URL}/history/{prompt_id}", timeout=10)
        history = resp.json()
        if prompt_id in history:
            outputs = history[prompt_id]["outputs"]
            for node_output in outputs.values():
                if "images" in node_output:
                    img = node_output["images"][0]
                    return OUTPUT_DIR / img["subfolder"] / img["filename"]
        time.sleep(2)
    raise TimeoutError(f"ComfyUI generation timed out after {timeout}s")


def _apply_params(workflow: dict, params: dict) -> dict:
    """
    Apply parameter overrides to workflow nodes.
    Expects workflow nodes to carry a '_meta.title' field for addressing.
    e.g., params={"positive_prompt": "new prompt"} updates the node titled "positive_prompt"
    """
    # Which input field an override targets: text-encode nodes take "text",
    # image loaders (e.g. the ControlNet reference) take "image"
    field_by_title = {"controlnet_image": "image"}
    for node in workflow.values():
        title = node.get("_meta", {}).get("title", "")
        if title in params:
            node["inputs"][field_by_title.get(title, "text")] = params[title]
    return workflow


# Convenience wrappers for OpenClaw
def portrait(expression: str = "neutral", extra_prompt: str = "") -> str:
    return generate("portrait", {"positive_prompt": f"aria, {expression}, {extra_prompt}"})


def quick(prompt: str) -> str:
    return generate("quick", {"positive_prompt": prompt})


def scene(prompt: str, controlnet_image_path: str | None = None) -> str:
    params = {"positive_prompt": prompt}
    if controlnet_image_path:
        params["controlnet_image"] = controlnet_image_path
    return generate("scene", params)
```
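
The request and response shapes the skill relies on can be seen without a running server. A sketch with illustrative values (only the fields the skill actually reads are shown, not the full API response):

```python
import uuid

# Payload _queue_prompt sends to POST /prompt
payload = {
    "prompt": {"3": {"class_type": "KSampler", "inputs": {}}},  # API-format workflow dict
    "client_id": str(uuid.uuid4()),
}

# The slice of GET /history/<prompt_id> that _wait_for_output reads
# once the job finishes (filename/subfolder values are illustrative):
history_entry = {
    "outputs": {
        "9": {"images": [{"filename": "ComfyUI_00001_.png", "subfolder": ""}]}
    }
}

# Same extraction logic as the skill: first node output that has images
img = next(
    o["images"][0] for o in history_entry["outputs"].values() if "images" in o
)
```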

---

## Character LoRA Training

A LoRA trains the model to consistently generate the character's appearance.

### Dataset Preparation

1. Collect 20–50 reference images of the character (or commission a character sheet)
2. Consistent style, multiple angles/expressions
3. Resize to 1024×1024, square crop
4. Write captions: `aria, 1girl, solo, <specific description>`
5. Store in `~/lora-training/aria/`
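
Step 4's captions live as one `.txt` sidecar per image in the dataset folder. A small helper to stub them out for hand-editing (the helper name is ours; the trigger word and tags follow the caption template above):

```python
from pathlib import Path

def write_caption_stubs(dataset_dir: str, trigger: str = "aria") -> int:
    """Create a sidecar caption file for every image that lacks one.

    Returns the number of caption files created.
    """
    root = Path(dataset_dir)
    count = 0
    for img in sorted(root.glob("*.png")) + sorted(root.glob("*.jpg")):
        caption = img.with_suffix(".txt")
        if not caption.exists():
            caption.write_text(f"{trigger}, 1girl, solo, ")  # finish by hand
            count += 1
    return count
```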

### Training

Use **kohya_ss** or **SimpleTuner** for LoRA training on Apple Silicon:

```bash
# kohya_ss (SDXL LoRA)
git clone https://github.com/bmaltais/kohya_ss
cd kohya_ss
pip install -r requirements.txt

# Training config — key params for MPS
# ($HOME instead of ~ so the paths expand reliably inside --opt=value args)
python train_network.py \
  --pretrained_model_name_or_path="$HOME/ComfyUI/models/checkpoints/sd_xl_base_1.0.safetensors" \
  --train_data_dir="$HOME/lora-training/aria" \
  --output_dir="$HOME/ComfyUI/models/loras" \
  --output_name=aria-v1 \
  --network_module=networks.lora \
  --network_dim=32 \
  --network_alpha=16 \
  --max_train_epochs=10 \
  --learning_rate=1e-4
```

> Training on M4 Pro via MPS: expect 1–4 hours for a 20-image dataset at 10 epochs.

---

## Directory Layout

```
homeai-images/
├── workflows/
│   ├── portrait.json
│   ├── scene.json
│   ├── quick.json
│   └── upscale.json
└── skills/
    └── comfyui.py
```

---

## Interface Contracts

**Consumes:**
- ComfyUI REST API: `http://localhost:8188`
- Workflows from `homeai-images/workflows/`
- Character LoRA from `~/ComfyUI/models/loras/aria-v1.safetensors`

**Exposes:**
- `comfyui.generate(workflow, params)` → image path — called by P4 OpenClaw

**Add to `.env.services`:**
```dotenv
COMFYUI_URL=http://localhost:8188
```
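
The skill hardcodes its endpoint; a consumer could resolve it from the environment instead. A minimal sketch, assuming `.env.services` has been loaded into the process environment:

```python
import os

def comfyui_url() -> str:
    # Falls back to the local default when .env.services has not been loaded
    return os.environ.get("COMFYUI_URL", "http://localhost:8188")
```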

---

## Implementation Steps

- [ ] Clone ComfyUI to `~/ComfyUI/`, install deps in venv
- [ ] Verify MPS is detected at launch (`Using device: mps` in logs)
- [ ] Write and load launchd plist
- [ ] Download SDXL base model via `scripts/download-models.sh`
- [ ] Download Flux.1-schnell
- [ ] Test basic generation via ComfyUI web UI (browse to port 8188)
- [ ] Build and save `quick.json` workflow in ComfyUI UI, export JSON
- [ ] Build and save `portrait.json` workflow, export JSON
- [ ] Build and save `scene.json` workflow with ControlNet, export JSON
- [ ] Write `skills/comfyui.py` full implementation
- [ ] Test skill: `comfyui.quick("a cat sitting on a couch")` → image file
- [ ] Collect character reference images for LoRA training
- [ ] Train SDXL LoRA with kohya_ss
- [ ] Load LoRA in `portrait.json` workflow, verify character consistency
- [ ] Symlink `skills/` to `~/.openclaw/skills/`
- [ ] Test via OpenClaw: "Generate a portrait of Aria looking happy"

---

## Success Criteria

- [ ] ComfyUI UI accessible at `http://localhost:8188` after reboot
- [ ] `quick.json` workflow generates an image in <30s on M4 Pro
- [ ] `portrait.json` with character LoRA produces consistent character appearance
- [ ] `comfyui.generate("quick", {"positive_prompt": "test"})` returns a valid image path
- [ ] Generated images are saved to `~/ComfyUI/output/`
- [ ] ComfyUI survives Mac Mini reboot via launchd