Initial project structure and planning docs
Full project plan across 8 sub-projects (homeai-infra, homeai-llm, homeai-voice, homeai-agent, homeai-character, homeai-esp32, homeai-visual, homeai-images). Includes per-project PLAN.md files, top-level PROJECT_PLAN.md, and master TODO.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
File: homeai-images/PLAN.md (new file, +393 lines)
# P8: homeai-images — Image Generation

> Phase 6 | Depends on: P4 (OpenClaw skill runner) | Independent of P6, P7

---

## Goal

ComfyUI running natively on the Mac Mini with SDXL and Flux.1 models. A character LoRA trained for consistent appearance. An OpenClaw skill exposes image generation as a callable tool. Saved workflows cover the most common use cases.

---

## Why Native (not Docker)

Same reasoning as Ollama: ComfyUI needs Metal GPU acceleration, which Docker on macOS cannot provide, so ComfyUI runs natively as a launchd service.

---

## Installation
```bash
# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI ~/ComfyUI
cd ~/ComfyUI

# Install dependencies (Python 3.11+, venv recommended)
python3 -m venv venv
source venv/bin/activate
# --pre is required: the nightly index only carries pre-release wheels
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip install -r requirements.txt

# Launch
python main.py --listen 0.0.0.0 --port 8188
```
**Note:** ComfyUI auto-detects the PyTorch MPS backend on Apple Silicon — no extra configuration is needed. Verify by checking the startup logs for `Using device: mps`, or directly from the venv:

```python
import torch
print(torch.backends.mps.is_available())  # True on Apple Silicon with a recent PyTorch
```
### launchd plist — `com.homeai.comfyui.plist`

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.homeai.comfyui</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/<username>/ComfyUI/venv/bin/python</string>
        <string>/Users/<username>/ComfyUI/main.py</string>
        <string>--listen</string>
        <string>0.0.0.0</string>
        <string>--port</string>
        <string>8188</string>
    </array>
    <key>WorkingDirectory</key>
    <string>/Users/<username>/ComfyUI</string>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/comfyui.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/comfyui.err</string>
</dict>
</plist>
```
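To activate the service, the plist goes into `~/Library/LaunchAgents` and is loaded with `launchctl` (a sketch of the macOS-only commands, assuming the plist file sits in the current directory):

```shell
# Install and start the ComfyUI launchd service
cp com.homeai.comfyui.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.homeai.comfyui.plist

# Confirm the job is registered
launchctl list | grep com.homeai.comfyui

# Follow the service logs
tail -f /tmp/comfyui.log
```

With `RunAtLoad` and `KeepAlive` set, launchd restarts ComfyUI on crash and on reboot.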
---

## Model Downloads

### Model Manifest

`~/ComfyUI/models/` structure:

```
checkpoints/
├── sd_xl_base_1.0.safetensors        # SDXL base
├── flux1-dev.safetensors             # Flux.1-dev (high quality)
└── flux1-schnell.safetensors         # Flux.1-schnell (fast drafts)

vae/
├── sdxl_vae.safetensors
└── ae.safetensors                    # Flux VAE

clip/
├── clip_l.safetensors
└── t5xxl_fp16.safetensors            # Flux text encoder

controlnet/
├── controlnet-canny-sdxl.safetensors
└── controlnet-depth-sdxl.safetensors

loras/
└── aria-v1.safetensors               # Character LoRA (trained locally)
```
### Download Script — `scripts/download-models.sh`

```bash
#!/usr/bin/env bash
set -euo pipefail

MODELS_DIR="$HOME/ComfyUI/models"

# HuggingFace downloads (requires huggingface_hub)
pip install huggingface_hub

python3 - "$MODELS_DIR" <<'EOF'
import sys
from huggingface_hub import hf_hub_download

models_dir = sys.argv[1]

downloads = [
    ('stabilityai/stable-diffusion-xl-base-1.0', 'sd_xl_base_1.0.safetensors', 'checkpoints'),
    ('black-forest-labs/FLUX.1-schnell', 'flux1-schnell.safetensors', 'checkpoints'),
]

for repo, filename, subdir in downloads:
    hf_hub_download(
        repo_id=repo,
        filename=filename,
        local_dir=f'{models_dir}/{subdir}',
    )
EOF
```
> Flux.1-dev requires accepting the HuggingFace license agreement. Download it manually if the script fails.

---
## Saved Workflows

All workflows are stored as ComfyUI JSON in `homeai-images/workflows/`.

### `portrait.json` — Character Portrait

Standard character portrait with expression control.

Key nodes:
- **CheckpointLoader:** SDXL base
- **LoraLoader:** aria character LoRA
- **CLIPTextEncode:** positive prompt includes character description + expression
- **KSampler:** 25 steps, DPM++ 2M, CFG 7
- **VAEDecode → SaveImage**

Positive prompt template:
```
aria, (character lora), 1girl, solo, portrait, looking at viewer,
soft lighting, detailed face, high quality, masterpiece,
<EXPRESSION_PLACEHOLDER>
```
### `scene.json` — Character in Scene with ControlNet

Uses ControlNet depth/canny for pose control.

Key nodes:
- **LoadImage:** input pose reference image
- **ControlNetLoader:** canny or depth model
- **ControlNetApply:** apply to conditioning
- **KSampler** with ControlNet guidance
### `quick.json` — Fast Draft via Flux.1-schnell

Low-step, fast generation for quick previews.

Key nodes:
- **CheckpointLoader:** flux1-schnell
- **KSampler:** 4 steps, Euler, CFG 1 (Flux uses CFG=1)
- Output: 512×512 or 768×768
### `upscale.json` — 2× Upscale

Takes an existing image and upscales it 2× with detail enhancement.

Key nodes:
- **LoadImage**
- **UpscaleModelLoader:** `4x_NMKD-Siax_200k.pth` (download separately)
- **ImageUpscaleWithModel**
- **KSampler img2img** for detail pass

---
## `comfyui.py` Skill — OpenClaw Integration

Full implementation (replaces the stub from P4).

File: `homeai-images/skills/comfyui.py`

```python
"""
ComfyUI image generation skill for OpenClaw.
Submits workflow JSON via the ComfyUI REST API and returns the generated image path.
"""

import json
import time
import uuid
from pathlib import Path

import requests

COMFYUI_URL = "http://localhost:8188"
WORKFLOWS_DIR = Path(__file__).parent.parent / "workflows"
OUTPUT_DIR = Path.home() / "ComfyUI" / "output"


def generate(workflow_name: str, params: dict | None = None) -> str:
    """
    Submit a named workflow to ComfyUI.

    Args:
        workflow_name: Name of workflow JSON (without .json extension)
        params: Dict of node overrides, e.g. {"positive_prompt": "..."}

    Returns:
        Absolute path to the generated image file
    """
    workflow_path = WORKFLOWS_DIR / f"{workflow_name}.json"
    if not workflow_path.exists():
        raise ValueError(f"Workflow '{workflow_name}' not found at {workflow_path}")

    workflow = json.loads(workflow_path.read_text())

    # Apply param overrides
    if params:
        workflow = _apply_params(workflow, params)

    # Submit to ComfyUI queue
    client_id = str(uuid.uuid4())
    prompt_id = _queue_prompt(workflow, client_id)

    # Poll for completion
    image_path = _wait_for_output(prompt_id)
    return str(image_path)


def _queue_prompt(workflow: dict, client_id: str) -> str:
    resp = requests.post(
        f"{COMFYUI_URL}/prompt",
        json={"prompt": workflow, "client_id": client_id},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["prompt_id"]


def _wait_for_output(prompt_id: str, timeout: int = 120) -> Path:
    start = time.time()
    while time.time() - start < timeout:
        resp = requests.get(f"{COMFYUI_URL}/history/{prompt_id}", timeout=10)
        resp.raise_for_status()
        history = resp.json()
        if prompt_id in history:
            outputs = history[prompt_id]["outputs"]
            for node_output in outputs.values():
                if "images" in node_output:
                    img = node_output["images"][0]
                    return OUTPUT_DIR / img["subfolder"] / img["filename"]
        time.sleep(2)
    raise TimeoutError(f"ComfyUI generation timed out after {timeout}s")


def _apply_params(workflow: dict, params: dict) -> dict:
    """
    Apply parameter overrides to workflow nodes.
    Nodes are addressed by their '_meta.title' field; only the 'text' input of a
    matching node is overridden. e.g., params={"positive_prompt": "new prompt"}
    updates the node titled "positive_prompt".
    """
    for node in workflow.values():
        title = node.get("_meta", {}).get("title", "")
        if title in params:
            node["inputs"]["text"] = params[title]
    return workflow


# Convenience wrappers for OpenClaw
def portrait(expression: str = "neutral", extra_prompt: str = "") -> str:
    return generate("portrait", {"positive_prompt": f"aria, {expression}, {extra_prompt}"})


def quick(prompt: str) -> str:
    return generate("quick", {"positive_prompt": prompt})


def scene(prompt: str, controlnet_image_path: str | None = None) -> str:
    params = {"positive_prompt": prompt}
    if controlnet_image_path:
        params["controlnet_image"] = controlnet_image_path
    return generate("scene", params)
```
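The override mechanism can be exercised without a running ComfyUI server. A minimal sketch of what the title-based override does to API-format workflow JSON (the node id, title, and inputs here are made up for illustration):

```python
# A stub of one node from an exported API-format workflow
workflow = {
    "6": {
        "class_type": "CLIPTextEncode",
        "_meta": {"title": "positive_prompt"},
        "inputs": {"text": "old prompt", "clip": ["4", 1]},
    },
}

def apply_params(workflow: dict, params: dict) -> dict:
    # Same logic as the skill's _apply_params: match nodes by _meta.title,
    # replace their 'text' input
    for node in workflow.values():
        title = node.get("_meta", {}).get("title", "")
        if title in params:
            node["inputs"]["text"] = params[title]
    return workflow

out = apply_params(workflow, {"positive_prompt": "aria, happy"})
print(out["6"]["inputs"]["text"])  # aria, happy
```

This is why each workflow's prompt node must be given a stable title in the ComfyUI editor before exporting.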
---

## Character LoRA Training

A LoRA trains the model to consistently generate the character's appearance.

### Dataset Preparation

1. Collect 20–50 reference images of the character (or commission a character sheet)
2. Consistent style, multiple angles/expressions
3. Resize to 1024×1024, square crop
4. Write captions: `aria, 1girl, solo, <specific description>`
5. Store in `~/lora-training/aria/`
### Training

Use **kohya_ss** or **SimpleTuner** for LoRA training on Apple Silicon:

```bash
# kohya_ss (SDXL LoRA)
git clone https://github.com/bmaltais/kohya_ss
cd kohya_ss
pip install -r requirements.txt

# Training config — key params for MPS
# ($HOME is used because bash does not tilde-expand inside --flag=~ arguments)
python train_network.py \
  --pretrained_model_name_or_path="$HOME/ComfyUI/models/checkpoints/sd_xl_base_1.0.safetensors" \
  --train_data_dir="$HOME/lora-training/aria" \
  --output_dir="$HOME/ComfyUI/models/loras" \
  --output_name=aria-v1 \
  --network_module=networks.lora \
  --network_dim=32 \
  --network_alpha=16 \
  --max_train_epochs=10 \
  --learning_rate=1e-4
```
> Training on M4 Pro via MPS: expect 1–4 hours for a 20-image dataset at 10 epochs.
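For scheduling purposes, the epoch count above translates into a rough optimizer-step estimate (assuming batch size 1 and no per-image repeats, which are the conservative defaults):

```python
# Rough optimizer-step count for the training run above
# (assumptions: batch size 1, no per-image dataset repeats)
images, epochs, batch_size = 20, 10, 1
steps = images * epochs // batch_size
print(steps)  # 200
```

kohya-style repeat multipliers or a larger batch size change this linearly, so it is worth recomputing before committing to a multi-hour run.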
---
## Directory Layout

```
homeai-images/
├── workflows/
│   ├── portrait.json
│   ├── scene.json
│   ├── quick.json
│   └── upscale.json
└── skills/
    └── comfyui.py
```

---
## Interface Contracts

**Consumes:**
- ComfyUI REST API: `http://localhost:8188`
- Workflows from `homeai-images/workflows/`
- Character LoRA from `~/ComfyUI/models/loras/aria-v1.safetensors`

**Exposes:**
- `comfyui.generate(workflow, params)` → image path — called by P4 OpenClaw

**Add to `.env.services`:**
```dotenv
COMFYUI_URL=http://localhost:8188
```

---
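Consumers can resolve the endpoint with a standard environment lookup, falling back to the same default the skill hardcodes (a one-line sketch):

```python
import os

# Prefer .env.services value when exported; fall back to the local default
COMFYUI_URL = os.environ.get("COMFYUI_URL", "http://localhost:8188")
print(COMFYUI_URL)
```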
## Implementation Steps

- [ ] Clone ComfyUI to `~/ComfyUI/`, install deps in venv
- [ ] Verify MPS is detected at launch (`Using device: mps` in logs)
- [ ] Write and load launchd plist
- [ ] Download SDXL base model via `scripts/download-models.sh`
- [ ] Download Flux.1-schnell
- [ ] Test basic generation via ComfyUI web UI (browse to port 8188)
- [ ] Build and save `quick.json` workflow in ComfyUI UI, export JSON
- [ ] Build and save `portrait.json` workflow, export JSON
- [ ] Build and save `scene.json` workflow with ControlNet, export JSON
- [ ] Write `skills/comfyui.py` full implementation
- [ ] Test skill: `comfyui.quick("a cat sitting on a couch")` → image file
- [ ] Collect character reference images for LoRA training
- [ ] Train SDXL LoRA with kohya_ss
- [ ] Load LoRA in `portrait.json` workflow, verify character consistency
- [ ] Symlink `skills/` to `~/.openclaw/skills/`
- [ ] Test via OpenClaw: "Generate a portrait of Aria looking happy"
---

## Success Criteria

- [ ] ComfyUI UI accessible at `http://localhost:8188` after reboot
- [ ] `quick.json` workflow generates an image in <30s on M4 Pro
- [ ] `portrait.json` with character LoRA produces consistent character appearance
- [ ] `comfyui.generate("quick", {"positive_prompt": "test"})` returns a valid image path
- [ ] Generated images are saved to `~/ComfyUI/output/`
- [ ] ComfyUI survives Mac Mini reboot via launchd