homeai/homeai-visual/PLAN.md
Aodhan Collins 38247d7cc4 Initial project structure and planning docs
Full project plan across 8 sub-projects (homeai-infra, homeai-llm,
homeai-voice, homeai-agent, homeai-character, homeai-esp32,
homeai-visual, homeai-images). Includes per-project PLAN.md files,
top-level PROJECT_PLAN.md, and master TODO.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 01:11:37 +00:00


P7: homeai-visual — VTube Studio Visual Layer

Phase 5 | Depends on: P4 (OpenClaw skill runner), P5 (character expression map)


Goal

VTube Studio displays a Live2D model on Mac Mini desktop and mobile. Expressions are driven by the AI pipeline state (thinking, speaking, happy, etc.) via an OpenClaw skill that talks to VTube Studio's WebSocket API. Lip sync follows audio amplitude.


Architecture

OpenClaw pipeline state
        ↓ (during LLM response generation)
vtube_studio.py skill
        ↓ WebSocket (port 8001)
VTube Studio (macOS app)
        ↓
Live2D model renders expression
        ↓
Displayed on:
  - Mac Mini desktop (primary)
  - iPhone/iPad (VTube Studio mobile, same model via Tailscale)

VTube Studio Setup

Installation

  1. Download VTube Studio from the Mac App Store
  2. Launch, go through initial setup
  3. Enable WebSocket API: Settings → WebSocket API → Enable (port 8001)
  4. Load Live2D model (see Model section below)
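Before wiring up the skill, it's worth sanity-checking that the WebSocket API is actually listening on port 8001. A minimal sketch using only the standard library (the helper name is ours, not part of any API):

```python
import socket

def vtube_ws_port_open(host: str = "localhost", port: int = 8001,
                       timeout: float = 1.0) -> bool:
    """Return True if something is accepting TCP connections on the
    VTube Studio WebSocket API port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("VTube Studio API reachable:", vtube_ws_port_open())
```

This only proves the port is open, not that the API is enabled; the auth flow below is the real end-to-end check.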

WebSocket API Authentication

VTube Studio uses a token-based auth flow:

import asyncio
import websockets
import json

async def authenticate():
    async with websockets.connect("ws://localhost:8001") as ws:
        # Step 1: request authentication token
        await ws.send(json.dumps({
            "apiName": "VTubeStudioPublicAPI",
            "apiVersion": "1.0",
            "requestID": "auth-req",
            "messageType": "AuthenticationTokenRequest",
            "data": {
                "pluginName": "HomeAI",
                "pluginDeveloper": "HomeAI",
                "pluginIcon": None
            }
        }))
        # VTube Studio shows an "Allow" dialog; the response containing
        # the token only arrives once the user clicks Allow.
        response = json.loads(await ws.recv())
        token = response["data"]["authenticationToken"]

        # Step 2: authenticate with token
        await ws.send(json.dumps({
            "apiName": "VTubeStudioPublicAPI",
            "apiVersion": "1.0",
            "requestID": "auth",
            "messageType": "AuthenticationRequest",
            "data": {
                "pluginName": "HomeAI",
                "pluginDeveloper": "HomeAI",
                "authenticationToken": token
            }
        }))
        auth_resp = json.loads(await ws.recv())
        print("Authenticated:", auth_resp["data"]["authenticated"])
        return token

Token is persisted to ~/.openclaw/vtube_token.json.


vtube_studio.py Skill

Full implementation (replaces the stub from P4).

File: homeai-visual/skills/vtube_studio.py (symlinked to ~/.openclaw/skills/)

"""
VTube Studio WebSocket skill for OpenClaw.
Drives Live2D model expressions based on AI pipeline state.
"""

import asyncio
import json
import websockets
from pathlib import Path

VTUBE_WS_URL = "ws://localhost:8001"
TOKEN_PATH = Path.home() / ".openclaw" / "vtube_token.json"

class VTubeStudioSkill:
    def __init__(self, character_config: dict):
        self.expression_map = character_config.get("live2d_expressions", {})
        self.ws_triggers = character_config.get("vtube_ws_triggers", {})
        self.token = self._load_token()
        self._ws = None

    def _load_token(self) -> str | None:
        if TOKEN_PATH.exists():
            return json.loads(TOKEN_PATH.read_text()).get("token")
        return None

    def _save_token(self, token: str):
        TOKEN_PATH.write_text(json.dumps({"token": token}))

    async def connect(self):
        self._ws = await websockets.connect(VTUBE_WS_URL)
        if self.token:
            await self._authenticate()
        else:
            await self._request_new_token()

    async def _authenticate(self):
        await self._send({
            "messageType": "AuthenticationRequest",
            "data": {
                "pluginName": "HomeAI",
                "pluginDeveloper": "HomeAI",
                "authenticationToken": self.token
            }
        })
        resp = await self._recv()
        if not resp["data"].get("authenticated"):
            # Token expired — request a new one
            await self._request_new_token()

    async def _request_new_token(self):
        await self._send({
            "messageType": "AuthenticationTokenRequest",
            "data": {
                "pluginName": "HomeAI",
                "pluginDeveloper": "HomeAI",
                "pluginIcon": None
            }
        })
        resp = await self._recv()
        token = resp["data"]["authenticationToken"]
        self._save_token(token)
        self.token = token
        await self._authenticate()

    async def trigger_expression(self, event: str):
        """Trigger a named expression state (idle, thinking, speaking, etc.)"""
        hotkey_id = self.expression_map.get(event)
        if not hotkey_id:
            return
        await self._trigger_hotkey(hotkey_id)

    async def _trigger_hotkey(self, hotkey_id: str):
        await self._send({
            "messageType": "HotkeyTriggerRequest",
            "data": {"hotkeyID": hotkey_id}
        })
        await self._recv()

    async def set_parameter(self, name: str, value: float):
        """Set a VTube Studio parameter (e.g., mouth open for lip sync)"""
        await self._send({
            "messageType": "InjectParameterDataRequest",
            "data": {
                "parameterValues": [
                    {"id": name, "value": value}
                ]
            }
        })
        await self._recv()

    async def _send(self, payload: dict):
        full = {
            "apiName": "VTubeStudioPublicAPI",
            "apiVersion": "1.0",
            "requestID": "homeai",
            **payload
        }
        await self._ws.send(json.dumps(full))

    async def _recv(self) -> dict:
        return json.loads(await self._ws.recv())

    async def close(self):
        if self._ws:
            await self._ws.close()


# OpenClaw skill entry point — synchronous wrapper
def trigger_expression(event: str, character_config: dict):
    skill = VTubeStudioSkill(character_config)
    asyncio.run(_run(skill, event))

async def _run(skill, event):
    await skill.connect()
    await skill.trigger_expression(event)
    await skill.close()
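A one-off smoke test for the skill might look like this (the hotkey UUIDs in the config are placeholders; real ones come from the VTube Studio Hotkeys UI, as described in the Model section below):

```python
import asyncio

# Hypothetical character config. The UUIDs are placeholders standing in
# for real hotkey IDs copied out of VTube Studio.
DEMO_CONFIG = {
    "live2d_expressions": {
        "thinking": "00000000-0000-0000-0000-000000000001",
        "speaking": "00000000-0000-0000-0000-000000000002",
    }
}

async def demo():
    skill = VTubeStudioSkill(DEMO_CONFIG)  # class defined above
    await skill.connect()                  # first run pops the "Allow" dialog
    await skill.trigger_expression("thinking")
    await asyncio.sleep(2)                 # watch the model change
    await skill.trigger_expression("speaking")
    await skill.close()

# Run with VTube Studio open: asyncio.run(demo())
```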

Lip Sync

Phase 1: Amplitude-Based (Simple)

During TTS audio playback, sample audio amplitude and map to mouth open parameter:

import asyncio
import numpy as np
import sounddevice as sd

async def stream_with_lipsync(audio_data: np.ndarray, sample_rate: int,
                              vtube: VTubeStudioSkill):
    """Play 16-bit PCM TTS audio in chunks, driving MouthOpen from amplitude."""
    chunk_size = 1024
    for i in range(0, len(audio_data), chunk_size):
        chunk = audio_data[i:i + chunk_size]
        amplitude = float(np.abs(chunk).mean()) / 32768.0  # normalise 16-bit PCM
        mouth_value = min(amplitude * 10, 1.0)  # boost quiet speech, clamp to [0, 1]
        await vtube.set_parameter("MouthOpen", mouth_value)
        sd.play(chunk, sample_rate, blocking=True)
    await vtube.set_parameter("MouthOpen", 0.0)  # close mouth after playback

The helper is async so it can reuse the skill's open WebSocket; calling asyncio.run() per chunk would create a fresh event loop (and lose the connection) every 1024 samples.

Phase 2: Phoneme-Based (Future)

Parse TTS phoneme timing from Kokoro/Chatterbox output and drive expression per phoneme. More accurate but significantly more complex. Defer to after Phase 5.


Live2D Model

Options

| Option | Cost | Effort | Quality |
|---|---|---|---|
| Free models (VTube Studio sample packs) | Free | Low | Generic |
| Purchase from nizima.com or booth.pm | ¥3,000–¥30,000 | Low | High |
| Commission custom model | ¥50,000–¥200,000+ | Low (for you) | Unique |

Recommendation: Start with a purchased model from nizima.com or booth.pm that matches the character's aesthetic. Commission custom later once personality is locked in.

Model Setup

  1. Download .vtube.model3.json + associated assets
  2. Place in ~/Documents/Live2DModels/ (VTube Studio default)
  3. Load in VTube Studio: Model tab → Add Model
  4. Map hotkeys: VTube Studio → Hotkeys → create one per expression state
  5. Record hotkey IDs, update aria.json live2d_expressions mapping
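
The resulting mapping in aria.json might look like this (the live2d_expressions field name matches what the skill reads; the UUID values are placeholders to be replaced with real hotkey IDs):

```json
{
  "live2d_expressions": {
    "idle": "<hotkey-uuid>",
    "listening": "<hotkey-uuid>",
    "thinking": "<hotkey-uuid>",
    "speaking": "<hotkey-uuid>",
    "happy": "<hotkey-uuid>",
    "sad": "<hotkey-uuid>",
    "surprised": "<hotkey-uuid>",
    "error": "<hotkey-uuid>"
  }
}
```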

Expression Hotkey Mapping Workflow

  1. Launch VTube Studio, load model
  2. Go to Hotkeys → add hotkeys for each state: idle, listening, thinking, speaking, happy, sad, surprised, error
  3. VTube Studio assigns a UUID to each hotkey — copy these
  4. Open Character Manager (P5), paste UUIDs into expression mapping UI
  5. Export updated aria.json
  6. Restart OpenClaw — new expression map loaded
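
Copying UUIDs out of the UI by hand is error-prone. The public API also exposes a HotkeysInCurrentModelRequest whose response lists each hotkey's name and ID; a sketch of a helper that reuses the skill's _send/_recv (an assumption-laden convenience, not a finished tool):

```python
def parse_hotkeys(resp: dict) -> dict:
    """Map hotkey name -> hotkeyID from a HotkeysInCurrentModelRequest response."""
    return {hk["name"]: hk["hotkeyID"] for hk in resp["data"]["availableHotkeys"]}

async def list_hotkeys(skill) -> dict:
    """Fetch the loaded model's hotkeys via a connected, authenticated
    VTubeStudioSkill instance."""
    await skill._send({"messageType": "HotkeysInCurrentModelRequest", "data": {}})
    return parse_hotkeys(await skill._recv())
```

Naming each hotkey after its expression state in VTube Studio makes the returned dict drop straight into the live2d_expressions mapping.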

Mobile Setup

  1. Install VTube Studio on iPhone/iPad
  2. On same Tailscale network, VTube Studio mobile discovers Mac Mini model
  3. Mirror mode: mobile shows same model as desktop
  4. Useful as a bedside or kitchen display while the Mac Mini desktop remains the primary

Directory Layout

homeai-visual/
└── skills/
    ├── vtube_studio.py      ← full implementation
    ├── lipsync.py           ← amplitude-based lip sync helper
    └── auth.py              ← token management utility

Implementation Steps

  • Install VTube Studio (Mac App Store)
  • Enable WebSocket API on port 8001
  • Source/purchase a Live2D model
  • Load model in VTube Studio, verify it renders
  • Create hotkeys in VTube Studio for all 8 expression states
  • Write vtube_studio.py full implementation
  • Run auth flow — click "Allow" in VTube Studio UI, save token
  • Test trigger_expression("thinking") → model shows expression
  • Test all 8 expressions via a simple test script
  • Update aria.json with real VTube Studio hotkey IDs
  • Write lipsync.py amplitude-based helper
  • Integrate lip sync into TTS dispatch in OpenClaw
  • Symlink skills/ → ~/.openclaw/skills/
  • Test full pipeline: voice query → thinking expression → LLM → speaking expression with lip sync
  • Set up VTube Studio on iPhone (optional, do last)
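
The "test all 8 expressions" step can be sketched as a small cycle script; the timings it collects feed directly into the latency success criterion below (the 1.5 s pause is arbitrary, just long enough to see each expression render):

```python
import asyncio
import time

EXPRESSION_STATES = [
    "idle", "listening", "thinking", "speaking",
    "happy", "sad", "surprised", "error",
]

async def cycle_expressions(skill) -> dict:
    """Trigger every expression state once on a connected VTubeStudioSkill,
    returning per-state round-trip latency in milliseconds."""
    timings = {}
    for state in EXPRESSION_STATES:
        start = time.perf_counter()
        await skill.trigger_expression(state)
        timings[state] = (time.perf_counter() - start) * 1000.0
        await asyncio.sleep(1.5)  # let the expression render before the next one
    return timings
```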

Success Criteria

  • All 8 expression states trigger correctly via trigger_expression()
  • Lip sync is visibly responding to TTS audio (even if imperfect)
  • VTube Studio token survives app restart (token file persists)
  • Expression triggers are fast enough to feel responsive (<100ms from call to render)
  • Model stays loaded and connected after Mac Mini sleep/wake