Full project plan across 8 sub-projects (homeai-infra, homeai-llm, homeai-voice, homeai-agent, homeai-character, homeai-esp32, homeai-visual, homeai-images). Includes per-project PLAN.md files, top-level PROJECT_PLAN.md, and master TODO.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
P8: homeai-images — Image Generation
Phase 6 | Depends on: P4 (OpenClaw skill runner) | Independent of P6, P7
Goal
ComfyUI running natively on Mac Mini with SDXL and Flux.1 models. A character LoRA trained for consistent appearance. OpenClaw skill exposes image generation as a callable tool. Saved workflows cover the most common use cases.
Why Native (not Docker)
Same reasoning as Ollama: ComfyUI needs Metal GPU acceleration. Docker on Mac can't access the GPU. ComfyUI runs natively as a launchd service.
Installation
```bash
# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI ~/ComfyUI
cd ~/ComfyUI

# Install dependencies (Python 3.11+, venv recommended)
python3 -m venv venv
source venv/bin/activate
# On Apple Silicon, the macOS wheels on this index include MPS support
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install -r requirements.txt

# Launch
python main.py --listen 0.0.0.0 --port 8188
```
Note: Use the PyTorch MPS backend for Apple Silicon:
```bash
# ComfyUI auto-detects MPS — no extra config needed
# Verify by checking ComfyUI startup logs for "Using device: mps"
```
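Besides the startup log, a running instance can be queried over HTTP. This is a sketch against ComfyUI's `/system_stats` endpoint; the exact response fields (`devices`, `type`) are assumptions based on observed payloads and may drift between versions:

```python
import json
import urllib.request

def device_types(stats: dict) -> list:
    """Extract device type strings from a ComfyUI /system_stats payload."""
    return [d.get("type", "") for d in stats.get("devices", [])]

def check_mps(url: str = "http://localhost:8188") -> bool:
    """Query a running ComfyUI instance; True if an MPS device is reported."""
    with urllib.request.urlopen(f"{url}/system_stats") as resp:
        stats = json.load(resp)
    return "mps" in device_types(stats)

# Usage (with ComfyUI running on the Mac Mini): check_mps()
```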
launchd plist — com.homeai.comfyui.plist
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.homeai.comfyui</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/<username>/ComfyUI/venv/bin/python</string>
        <string>/Users/<username>/ComfyUI/main.py</string>
        <string>--listen</string>
        <string>0.0.0.0</string>
        <string>--port</string>
        <string>8188</string>
    </array>
    <key>WorkingDirectory</key>
    <string>/Users/<username>/ComfyUI</string>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/comfyui.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/comfyui.err</string>
</dict>
</plist>
```
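Loading and checking the service might look like the following, assuming the plist is saved under `~/Library/LaunchAgents/` (macOS-only commands; `launchctl bootstrap` is the modern replacement for `launchctl load`):

```shell
# Install the plist into the per-user agents directory
cp com.homeai.comfyui.plist ~/Library/LaunchAgents/

# Load it into the current user's GUI domain
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.homeai.comfyui.plist

# Confirm the job is registered and watch its log
launchctl print gui/$(id -u)/com.homeai.comfyui | head
tail -f /tmp/comfyui.log
```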
Model Downloads
Model Manifest
~/ComfyUI/models/ structure:
```
checkpoints/
├── sd_xl_base_1.0.safetensors    # SDXL base
├── flux1-dev.safetensors         # Flux.1-dev (high quality)
└── flux1-schnell.safetensors     # Flux.1-schnell (fast drafts)
vae/
├── sdxl_vae.safetensors
└── ae.safetensors                # Flux VAE
clip/
├── clip_l.safetensors
└── t5xxl_fp16.safetensors        # Flux text encoder
controlnet/
├── controlnet-canny-sdxl.safetensors
└── controlnet-depth-sdxl.safetensors
loras/
└── aria-v1.safetensors           # Character LoRA (trained locally)
```
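Before first launch it is worth confirming the manifest is complete. A small sketch with a hypothetical `missing_models` helper; the manifest list mirrors the tree above and the checker only tests file existence, not integrity:

```python
from pathlib import Path

# Expected files, relative to the models root (subset of the manifest above)
MANIFEST = [
    "checkpoints/sd_xl_base_1.0.safetensors",
    "checkpoints/flux1-schnell.safetensors",
    "vae/sdxl_vae.safetensors",
    "loras/aria-v1.safetensors",
]

def missing_models(root: Path, manifest=MANIFEST) -> list:
    """Return manifest entries that are absent under root."""
    return [rel for rel in manifest if not (root / rel).exists()]

# Usage: for rel in missing_models(Path.home() / "ComfyUI" / "models"): print("missing:", rel)
```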
Download Script — scripts/download-models.sh
```bash
#!/usr/bin/env bash
set -euo pipefail
export MODELS_DIR="$HOME/ComfyUI/models"

# HuggingFace downloads via the huggingface_hub Python API
pip install huggingface_hub
python3 - <<'EOF'
import os
from huggingface_hub import hf_hub_download

models_dir = os.environ["MODELS_DIR"]
downloads = [
    ("stabilityai/stable-diffusion-xl-base-1.0", "sd_xl_base_1.0.safetensors", "checkpoints"),
    ("black-forest-labs/FLUX.1-schnell", "flux1-schnell.safetensors", "checkpoints"),
]
for repo, filename, subdir in downloads:
    hf_hub_download(repo_id=repo, filename=filename, local_dir=f"{models_dir}/{subdir}")
EOF
```
Flux.1-dev requires accepting the HuggingFace license agreement; download it manually if the script fails.
Saved Workflows
All workflows stored as ComfyUI JSON in homeai-images/workflows/.
portrait.json — Character Portrait
Standard character portrait with expression control.
Key nodes:
- CheckpointLoader: SDXL base
- LoraLoader: aria character LoRA
- CLIPTextEncode: positive prompt includes character description + expression
- KSampler: 25 steps, DPM++ 2M, CFG 7
- VAEDecode → SaveImage
Positive prompt template:
```
aria, (character lora), 1girl, solo, portrait, looking at viewer,
soft lighting, detailed face, high quality, masterpiece,
<EXPRESSION_PLACEHOLDER>
```
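The placeholder can be filled programmatically when the skill builds the prompt. A minimal sketch; the constant and helper name are illustrative, not part of the skill API:

```python
# Illustrative helper: substitute the expression slot in the portrait template
PORTRAIT_TEMPLATE = (
    "aria, (character lora), 1girl, solo, portrait, looking at viewer, "
    "soft lighting, detailed face, high quality, masterpiece, {expression}"
)

def build_portrait_prompt(expression: str = "neutral") -> str:
    return PORTRAIT_TEMPLATE.format(expression=expression)
```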
scene.json — Character in Scene with ControlNet
Uses ControlNet depth/canny for pose control.
Key nodes:
- LoadImage: input pose reference image
- ControlNetLoader: canny or depth model
- ControlNetApply: apply to conditioning
- KSampler with ControlNet guidance
quick.json — Fast Draft via Flux.1-schnell
Low-step, fast generation for quick previews.
Key nodes:
- CheckpointLoader: flux1-schnell
- KSampler: 4 steps, Euler, CFG 1 (Flux uses CFG=1)
- Output: 512×512 or 768×768
upscale.json — 2× Upscale
Takes existing image, upscales 2× with detail enhancement.
Key nodes:
- LoadImage
- UpscaleModelLoader: 4x_NMKD-Siax_200k.pth (download separately)
- ImageUpscaleWithModel
- KSampler img2img for detail pass
comfyui.py Skill — OpenClaw Integration
Full implementation (replaces stub from P4).
File: homeai-images/skills/comfyui.py
"""
ComfyUI image generation skill for OpenClaw.
Submits workflow JSON via ComfyUI REST API and returns generated image path.
"""
import json
import time
import uuid
import requests
from pathlib import Path
COMFYUI_URL = "http://localhost:8188"
WORKFLOWS_DIR = Path(__file__).parent.parent / "workflows"
OUTPUT_DIR = Path.home() / "ComfyUI" / "output"
def generate(workflow_name: str, params: dict = None) -> str:
"""
Submit a named workflow to ComfyUI.
Returns the path of the generated image.
Args:
workflow_name: Name of workflow JSON (without .json extension)
params: Dict of node overrides, e.g. {"positive_prompt": "...", "steps": 20}
Returns:
Absolute path to generated image file
"""
workflow_path = WORKFLOWS_DIR / f"{workflow_name}.json"
if not workflow_path.exists():
raise ValueError(f"Workflow '{workflow_name}' not found at {workflow_path}")
workflow = json.loads(workflow_path.read_text())
# Apply param overrides
if params:
workflow = _apply_params(workflow, params)
# Submit to ComfyUI queue
client_id = str(uuid.uuid4())
prompt_id = _queue_prompt(workflow, client_id)
# Poll for completion
image_path = _wait_for_output(prompt_id, client_id)
return str(image_path)
def _queue_prompt(workflow: dict, client_id: str) -> str:
resp = requests.post(
f"{COMFYUI_URL}/prompt",
json={"prompt": workflow, "client_id": client_id}
)
resp.raise_for_status()
return resp.json()["prompt_id"]
def _wait_for_output(prompt_id: str, client_id: str, timeout: int = 120) -> Path:
start = time.time()
while time.time() - start < timeout:
resp = requests.get(f"{COMFYUI_URL}/history/{prompt_id}")
history = resp.json()
if prompt_id in history:
outputs = history[prompt_id]["outputs"]
for node_output in outputs.values():
if "images" in node_output:
img = node_output["images"][0]
return OUTPUT_DIR / img["subfolder"] / img["filename"]
time.sleep(2)
raise TimeoutError(f"ComfyUI generation timed out after {timeout}s")
def _apply_params(workflow: dict, params: dict) -> dict:
"""
Apply parameter overrides to workflow nodes.
Expects workflow nodes to have a 'title' field for addressing.
e.g., params={"positive_prompt": "new prompt"} updates node titled "positive_prompt"
"""
for node_id, node in workflow.items():
title = node.get("_meta", {}).get("title", "")
if title in params:
node["inputs"]["text"] = params[title]
return workflow
# Convenience wrappers for OpenClaw
def portrait(expression: str = "neutral", extra_prompt: str = "") -> str:
return generate("portrait", {"positive_prompt": f"aria, {expression}, {extra_prompt}"})
def quick(prompt: str) -> str:
return generate("quick", {"positive_prompt": prompt})
def scene(prompt: str, controlnet_image_path: str = None) -> str:
params = {"positive_prompt": prompt}
if controlnet_image_path:
params["controlnet_image"] = controlnet_image_path
return generate("scene", params)
Character LoRA Training
A LoRA trains the model to consistently generate the character's appearance.
Dataset Preparation
- Collect 20–50 reference images of the character (or commission a character sheet)
- Consistent style, multiple angles/expressions
- Resize to 1024×1024, square crop
- Write captions: aria, 1girl, solo, <specific description>
- Store in ~/lora-training/aria/
Training
Use kohya_ss or SimpleTuner for LoRA training on Apple Silicon:
```bash
# kohya_ss (SDXL LoRA)
git clone https://github.com/bmaltais/kohya_ss
cd kohya_ss
pip install -r requirements.txt

# Training config — key params for MPS
# (SDXL LoRA uses the sdxl_train_network.py entry point in kohya's sd-scripts)
python sdxl_train_network.py \
  --pretrained_model_name_or_path="$HOME/ComfyUI/models/checkpoints/sd_xl_base_1.0.safetensors" \
  --train_data_dir="$HOME/lora-training/aria" \
  --output_dir="$HOME/ComfyUI/models/loras" \
  --output_name=aria-v1 \
  --network_module=networks.lora \
  --network_dim=32 \
  --network_alpha=16 \
  --max_train_epochs=10 \
  --learning_rate=1e-4
```
Training on M4 Pro via MPS: expect 1–4 hours for a 20-image dataset at 10 epochs.
Directory Layout
```
homeai-images/
├── workflows/
│   ├── portrait.json
│   ├── scene.json
│   ├── quick.json
│   └── upscale.json
└── skills/
    └── comfyui.py
```
Interface Contracts
Consumes:
- ComfyUI REST API: http://localhost:8188
- Workflows from homeai-images/workflows/
- Character LoRA from ~/ComfyUI/models/loras/aria-v1.safetensors
Exposes:
- comfyui.generate(workflow, params) → image path — called by P4 OpenClaw
Add to .env.services:
COMFYUI_URL=http://localhost:8188
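Consumers can then resolve the endpoint from the environment with a localhost fallback. A small sketch (the helper name is illustrative):

```python
import os

def comfyui_url() -> str:
    # Falls back to the default local endpoint when .env.services is not loaded
    return os.environ.get("COMFYUI_URL", "http://localhost:8188")
```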
Implementation Steps
- Clone ComfyUI to ~/ComfyUI/, install deps in venv
- Verify MPS is detected at launch (Using device: mps in logs)
- Write and load launchd plist
- Download SDXL base model via scripts/download-models.sh
- Download Flux.1-schnell
- Test basic generation via ComfyUI web UI (browse to port 8188)
- Build and save quick.json workflow in ComfyUI UI, export JSON
- Build and save portrait.json workflow, export JSON
- Build and save scene.json workflow with ControlNet, export JSON
- Write skills/comfyui.py full implementation
- Test skill: comfyui.quick("a cat sitting on a couch") → image file
- Collect character reference images for LoRA training
- Train SDXL LoRA with kohya_ss
- Load LoRA in portrait.json workflow, verify character consistency
- Symlink skills/ to ~/.openclaw/skills/
- Test via OpenClaw: "Generate a portrait of Aria looking happy"
Success Criteria
- ComfyUI UI accessible at http://localhost:8188 after reboot
- quick.json workflow generates an image in <30s on M4 Pro
- portrait.json with character LoRA produces consistent character appearance
- comfyui.generate("quick", {"positive_prompt": "test"}) returns a valid image path
- Generated images are saved to ~/ComfyUI/output/
- ComfyUI survives Mac Mini reboot via launchd