
# Danbooru MCP Tag Validator — User Guide

This guide explains how to integrate and use the `danbooru-mcp` server with an LLM to generate valid, high-quality prompts for Illustrious / Stable Diffusion models trained on Danbooru data.

---

## Table of Contents

1. [What is this?](#what-is-this)
2. [Quick Start](#quick-start)
3. [Tool Reference](#tool-reference)
   - [search_tags](#search_tags)
   - [validate_tags](#validate_tags)
   - [suggest_tags](#suggest_tags)
4. [Prompt Engineering Workflow](#prompt-engineering-workflow)
5. [Category Reference](#category-reference)
6. [Best Practices](#best-practices)
7. [Common Scenarios](#common-scenarios)
8. [Troubleshooting](#troubleshooting)

---

## What is this?

Illustrious (and similar Danbooru-trained Stable Diffusion models) uses **Danbooru tags** as its prompt language.
Tags like `1girl`, `blue_hair`, `looking_at_viewer` are meaningful because the model was trained on images annotated with them.

The problem: Danbooru has hundreds of thousands of valid tags, and a misspelled or invented tag gives the model no useful signal, so the generated image matches the description less accurately.

**This MCP server** lets an LLM:
- **Search** the full tag database for tag discovery
- **Validate** a proposed prompt's tags against the real Danbooru database
- **Suggest** corrections for typos or near-miss tags

The database contains **292,500 tags**, each with at least 10 posts on Danbooru. This threshold filters out one-off and misspelled entries.

---

## Quick Start

### 1. Add to your MCP client (Claude Desktop example)

**Using Docker (recommended):**
```json
{
  "mcpServers": {
    "danbooru-tags": {
      "command": "docker",
      "args": ["run", "--rm", "-i", "danbooru-mcp:latest"]
    }
  }
}
```

**Using Python directly:**
```json
{
  "mcpServers": {
    "danbooru-tags": {
      "command": "/path/to/danbooru-mcp/.venv/bin/python",
      "args": ["/path/to/danbooru-mcp/src/server.py"]
    }
  }
}
```
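
Both entries follow the same shape, so the file can also be generated from a script. The following is a minimal Python sketch that assumes the `mcpServers` layout shown above. `server_entry` is a hypothetical helper, and where your client stores its config file is not covered here:

```python
import json


def server_entry(command: str, args: list[str], name: str = "danbooru-tags") -> dict:
    """Build one mcpServers entry in the layout used by the examples above."""
    return {"mcpServers": {name: {"command": command, "args": args}}}


# Docker variant, using the same values as the JSON example
docker_cfg = server_entry("docker", ["run", "--rm", "-i", "danbooru-mcp:latest"])
print(json.dumps(docker_cfg, indent=2))
```

For the Python variant, pass the interpreter path as `command` and the path to `src/server.py` as the single argument.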

### 2. Instruct the LLM

Add a system prompt telling the LLM to use the server:

```
You have access to the danbooru-tags MCP server for validating Stable Diffusion prompts.
Before generating any final prompt:
1. Use validate_tags to check all proposed tags are real Danbooru tags.
2. Use suggest_tags to fix any invalid tags.
3. Only output the validated, corrected tag list.
```

---

## Tool Reference

### `search_tags`

Find tags by name using full-text / prefix search.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | `string` | *required* | Search string. Trailing `*` added automatically for prefix match. Supports FTS5 syntax. |
| `limit` | `integer` | `20` | Max results (1–200) |
| `category` | `string` | `null` | Optional filter: `"general"`, `"artist"`, `"copyright"`, `"character"`, `"meta"` |

**Returns:** List of tag objects:
```json
[
  {
    "name": "blue_hair",
    "post_count": 1079925,
    "category": "general",
    "is_deprecated": false
  }
]
```
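
Because every result carries `post_count` and `is_deprecated`, a client can filter and rank results before passing them to the LLM. This is a minimal sketch, assuming the result list has exactly the shape shown above. `usable_tags` is a hypothetical helper and not part of the server:

```python
from typing import TypedDict


class Tag(TypedDict):
    name: str
    post_count: int
    category: str
    is_deprecated: bool


def usable_tags(results: list[Tag], min_posts: int = 0) -> list[str]:
    """Drop deprecated or rarely used tags, then list names by popularity."""
    kept = [t for t in results if not t["is_deprecated"] and t["post_count"] >= min_posts]
    kept.sort(key=lambda t: t["post_count"], reverse=True)
    return [t["name"] for t in kept]


# Illustrative data only: apart from blue_hair, names and counts are made up.
sample: list[Tag] = [
    {"name": "blue_hairband", "post_count": 5000, "category": "general", "is_deprecated": False},
    {"name": "blue_hair", "post_count": 1079925, "category": "general", "is_deprecated": False},
    {"name": "old_tag", "post_count": 900, "category": "general", "is_deprecated": True},
]
print(usable_tags(sample, min_posts=1000))
```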

**Examples:**

```
Search for hair colour tags:
  search_tags("blue_hair")
  → blue_hair, blue_hairband, blue_hair-chan_(ramchi), …

Search only character tags for a Vocaloid:
  search_tags("hatsune", category="character")
  → hatsune_miku, hatsune_mikuo, hatsune_miku_(append), …

Boolean search:
  search_tags("hair AND blue")
  → tags matching both "hair" and "blue"
```

**FTS5 query syntax:**

| Syntax | Meaning |
|--------|---------|
| `blue_ha*` | prefix match (added automatically) |
| `"blue hair"` | phrase match |
| `hair AND blue` | both terms present |
| `hair NOT red` | exclusion |
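
To see how these forms behave, the sketch below runs them against a throwaway in-memory FTS5 table using Python's built-in `sqlite3` module. The table layout and sample rows are assumptions made for illustration, not the server's real schema, and the example needs an SQLite build with FTS5 enabled:

```python
import sqlite3

# Throwaway table; the real tags.db schema may differ (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE tags USING fts5(name)")
conn.executemany(
    "INSERT INTO tags(name) VALUES (?)",
    [("blue_hair",), ("blue_hairband",), ("red_hair",), ("blue_eyes",), ("long_hair",)],
)


def fts_search(query: str) -> list[str]:
    """Run an FTS5 MATCH query and return matching names alphabetically."""
    rows = conn.execute(
        "SELECT name FROM tags WHERE tags MATCH ? ORDER BY name", (query,)
    )
    return [row[0] for row in rows]


for query in ("blue_ha*", '"blue hair"', "hair AND blue", "hair NOT red"):
    print(f"{query!r:16} -> {fts_search(query)}")
```

FTS5's default tokenizer treats the underscore as a separator, which is why `hair AND blue` can match `blue_hair`.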

---

### `validate_tags`

Check a list of tags against the full Danbooru database. Returns three groups: valid, deprecated, and invalid.

**Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `tags` | `list[string]` | Tags to validate, e.g. `["1girl", "blue_hair", "sword"]` |

**Returns:**
```json
{
  "valid":      ["1girl", "blue_hair", "sword"],
  "deprecated": [],
  "invalid":    ["blue_hairs", "not_a_real_tag"]
}
```

| Key | Meaning |
|-----|---------|
| `valid` | Exists in Danbooru and is not deprecated — safe to use |
| `deprecated` | Exists but has been deprecated (an updated canonical tag exists) |
| `invalid` | Not found — likely misspelled, hallucinated, or too niche (<10 posts) |

**Important:** Always run `validate_tags` before finalising a prompt. Invalid tags are silently ignored by the model but waste token budget and reduce prompt clarity.
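
A client can turn the response into a simple go/no-go decision. The sketch below assumes the three-key response shown above. `check_validation` is a hypothetical helper, and treating deprecated tags as needing a fix is a design choice rather than server behaviour:

```python
def check_validation(result: dict[str, list[str]]) -> tuple[bool, list[str]]:
    """Return (ready, tags_to_fix) for a validate_tags response."""
    to_fix = list(result.get("invalid", [])) + list(result.get("deprecated", []))
    return (not to_fix, to_fix)


response = {
    "valid": ["1girl", "blue_hair", "sword"],
    "deprecated": [],
    "invalid": ["blue_hairs", "not_a_real_tag"],
}
ready, to_fix = check_validation(response)
print(ready, to_fix)
```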

---

### `suggest_tags`

Autocomplete-style suggestions for a partial or approximate tag. Results are sorted by post count (most commonly used first). Deprecated tags are **excluded**.

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `partial` | `string` | *required* | Partial tag or rough approximation |
| `limit` | `integer` | `10` | Max suggestions (1–50) |
| `category` | `string` | `null` | Optional category filter |

**Returns:** Same format as `search_tags`, sorted by `post_count` descending.

**Examples:**

```
Fix a typo:
  suggest_tags("looking_at_vewer")
  → ["looking_at_viewer", …]

Find the most popular sword-related tags:
  suggest_tags("sword", limit=5, category="general")
  → sword (337,737), sword_behind_back (7,203), …

Find character tags for a partial name:
  suggest_tags("miku", category="character")
  → hatsune_miku (129,806), yuki_miku (4,754), …
```
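
`search_tags` and `suggest_tags` name their text parameter differently and accept different `limit` ranges, so it can help to build the arguments in one place. This sketch uses only the parameter names and ranges from the tables in this guide. `tool_args` is a hypothetical client-side helper, and how the server itself reacts to an out-of-range `limit` is not documented here:

```python
from typing import Optional

# Ranges and parameter names taken from the tables in this guide.
LIMIT_RANGES = {"search_tags": (1, 200), "suggest_tags": (1, 50)}
TEXT_PARAM = {"search_tags": "query", "suggest_tags": "partial"}


def tool_args(
    tool: str,
    text: str,
    limit: Optional[int] = None,
    category: Optional[str] = None,
) -> dict:
    """Build the argument dict for a tool call, clamping limit client-side."""
    low, high = LIMIT_RANGES[tool]
    args: dict = {TEXT_PARAM[tool]: text}
    if limit is not None:
        args["limit"] = max(low, min(high, limit))
    if category is not None:
        args["category"] = category
    return args


print(tool_args("suggest_tags", "sword", limit=500, category="general"))
```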

---

## Prompt Engineering Workflow

This is the recommended workflow for an LLM building Illustrious prompts:

### Step 1 — Draft

The LLM drafts an initial list of candidate tags from the user's description:

```
User: "A girl with long silver hair wearing a kimono in a Japanese garden"

Draft tags:
  1girl, silver_hair, long_hair, kimono, japanese_garden, cherry_blossoms,
  sitting, looking_at_viewer, outdoors, traditional_clothes
```

### Step 2 — Validate

```
validate_tags([
  "1girl", "silver_hair", "long_hair", "kimono", "japanese_garden",
  "cherry_blossoms", "sitting", "looking_at_viewer", "outdoors",
  "traditional_clothes"
])
```

Response:
```json
{
  "valid": ["1girl", "long_hair", "kimono", "cherry_blossoms", "sitting",
            "looking_at_viewer", "outdoors", "traditional_clothes"],
  "deprecated": [],
  "invalid": ["silver_hair", "japanese_garden"]
}
```

### Step 3 — Fix invalid tags

```
suggest_tags("silver_hair", limit=3)
→ [{"name": "white_hair", "post_count": 800000}, ...]

suggest_tags("japanese_garden", limit=3)
→ [{"name": "garden", "post_count": 45000},
   {"name": "japanese_clothes", "post_count": 12000}, ...]
```

### Step 4 — Finalise

```
Final prompt:
  1girl, white_hair, long_hair, kimono, garden, cherry_blossoms,
  sitting, looking_at_viewer, outdoors, traditional_clothes
```

Every tag is now either validated or taken from `suggest_tags`, which only returns tags that exist in the database, so the prompt is ready to send to ComfyUI.
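
The four steps can be sketched as a single function. This is an illustration under stated assumptions, not the server's code: `fix_prompt` is hypothetical, the two tools are passed in as plain callables, and the stand-ins below simply replay the responses from Steps 2 and 3:

```python
from typing import Callable


def fix_prompt(
    draft: list[str],
    validate: Callable[[list[str]], dict],
    suggest: Callable[[str], list[dict]],
) -> list[str]:
    """Validate a draft, then swap each rejected tag for its top suggestion."""
    result = validate(draft)
    rejected = set(result["invalid"]) | set(result["deprecated"])
    final: list[str] = []
    for tag in draft:
        if tag in rejected:
            suggestions = suggest(tag)
            if not suggestions:
                continue  # nothing suitable found: drop the tag
            tag = suggestions[0]["name"]
        if tag not in final:  # avoid duplicates after replacement
            final.append(tag)
    return final


# Stand-ins that replay the example responses (not real server calls).
def fake_validate(tags: list[str]) -> dict:
    invalid = [t for t in tags if t in {"silver_hair", "japanese_garden"}]
    return {"valid": [t for t in tags if t not in invalid], "deprecated": [], "invalid": invalid}


def fake_suggest(tag: str) -> list[dict]:
    canned = {
        "silver_hair": [{"name": "white_hair", "post_count": 800000}],
        "japanese_garden": [{"name": "garden", "post_count": 45000}],
    }
    return canned.get(tag, [])


draft = ["1girl", "silver_hair", "long_hair", "kimono", "japanese_garden",
         "cherry_blossoms", "sitting", "looking_at_viewer", "outdoors",
         "traditional_clothes"]
print(", ".join(fix_prompt(draft, fake_validate, fake_suggest)))
```

Taking the top suggestion blindly is a simplification. In practice the LLM should still check that the replacement matches the user's intent.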

---

## Category Reference

Danbooru organises tags into five categories. Understanding them helps scope searches:

| Category | Value | Description | Examples |
|----------|-------|-------------|---------|
| **general** | `0` | Descriptive tags for image content | `1girl`, `blue_hair`, `sword`, `outdoors` |
| **artist** | `1` | Artist/creator names | `wlop`, `natsuki_subaru` |
| **copyright** | `3` | Source material / franchise | `fate/stay_night`, `touhou`, `genshin_impact` |
| **character** | `4` | Specific character names | `hatsune_miku`, `hakurei_reimu` |
| **meta** | `5` | Image quality / format tags | `highres`, `absurdres`, `commentary` |

**Tips:**
- For generating images, focus on **general** tags (colours, poses, clothing, expressions)
- Add **character** and **copyright** tags when depicting a specific character
- **meta** tags like `highres` and `best_quality` can improve output quality
- Avoid **artist** tags unless intentionally mimicking a specific art style
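
If you work with the numeric values directly, a small lookup keeps names and numbers in sync. The mapping below is copied from the table above, and `category_name` is a hypothetical helper:

```python
from typing import Optional

# Values from the table above; note that 2 does not appear in it.
CATEGORY_IDS = {"general": 0, "artist": 1, "copyright": 3, "character": 4, "meta": 5}
CATEGORY_NAMES = {value: name for name, value in CATEGORY_IDS.items()}


def category_name(value: int) -> Optional[str]:
    """Translate a numeric category into its name, or None if unknown."""
    return CATEGORY_NAMES.get(value)


print(category_name(4))
print(category_name(2))
```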

---

## Best Practices

### ✅ Always validate before generating

```python
# Always run this before finalising
result = validate_tags(your_proposed_tags)
# Fix everything in result["invalid"] before sending to ComfyUI
```

### ✅ Use suggest_tags for discoverability

Even for tags you think you know, run `suggest_tags` to find the canonical form:
- `standing` vs `standing_on_one_leg` vs `standing_split`
- `smile` vs `small_smile` vs `evil_smile`

The tag with the highest `post_count` is almost always the right one for your intent.

### ✅ Prefer high-post-count tags

Higher post count = more training data = more consistent model response.

```python
# Get the 5 most-used general tags matching "hair"
suggest_tags("hair", limit=5, category="general")
```

### ✅ Layer specificity

Good prompts move from general to specific:
```
# General → Specific
1girl,                        # subject count
solo,                         # composition
long_hair, blue_hair,         # hair
white_dress, off_shoulder,    # clothing
smile, looking_at_viewer,     # expression/pose
outdoors, garden, daytime,    # setting
masterpiece, best_quality     # quality
```
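
The same layering can be kept in code so the order stays stable as tags are added. A minimal sketch, in which `build_prompt` and the layer names are hypothetical and the tags are the ones from the example above:

```python
def build_prompt(layers: dict[str, list[str]]) -> str:
    """Join tag groups in insertion order, skipping repeated tags."""
    seen: set[str] = set()
    ordered: list[str] = []
    for tags in layers.values():
        for tag in tags:
            if tag not in seen:
                seen.add(tag)
                ordered.append(tag)
    return ", ".join(ordered)


layers = {
    "subject": ["1girl", "solo"],
    "hair": ["long_hair", "blue_hair"],
    "clothing": ["white_dress", "off_shoulder"],
    "expression": ["smile", "looking_at_viewer"],
    "setting": ["outdoors", "garden", "daytime"],
    "quality": ["masterpiece", "best_quality"],
}
print(build_prompt(layers))
```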

### ❌ Avoid deprecated tags

If `validate_tags` reports a tag as `deprecated`, use `suggest_tags` to find the current replacement:

```python
# If "nude" is deprecated, find the current tag:
suggest_tags("nude", category="general")
```

### ❌ Don't invent tags

The model doesn't understand arbitrary natural language in prompts — only tags it was trained on. `beautiful_landscape` is not a Danbooru tag; `scenery` and `landscape` are.

---

## Common Scenarios

### Scenario: Character in a specific pose

```
# 1. Search for pose tags
search_tags("sitting", category="general", limit=10)
→ sitting, sitting_on_ground, kneeling, seiza, wariza, …

# 2. Validate the full tag set
validate_tags(["1girl", "hatsune_miku", "sitting", "looking_at_viewer", "smile"])
```

### Scenario: Character from a specific franchise

```
# Find copyright tags for a franchise
search_tags("genshin", category="copyright", limit=5)
→ genshin_impact, …

# Find character from that franchise
search_tags("hu_tao", category="character", limit=3)
→ hu_tao_(genshin_impact), …
```

### Scenario: Quality boosting tags

```
# Find commonly used meta/quality tags
search_tags("quality", category="meta", limit=5)
→ best_quality, high_quality, …

search_tags("res", category="meta", limit=5)
→ highres, absurdres, ultra-high_res, …
```

### Scenario: Unknown misspelling

```
# You typed "haor" instead of "hair"
suggest_tags("haor", limit=5)
→ [] (no prefix match)

# Try a broader search
search_tags("long hair")
→ long_hair, long_hair_between_eyes, wavy_hair, …
```
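
When the broader search returns many candidates, they can be ranked by how closely they resemble what was typed. This sketch uses `difflib` from Python's standard library on the client side. It is not something the server does, and `closest_tags` and the typo `long_haor` are made up for illustration:

```python
import difflib


def closest_tags(typed: str, candidates: list[str], cutoff: float = 0.6) -> list[str]:
    """Return up to three candidates similar to the typed text, best first."""
    return difflib.get_close_matches(typed, candidates, n=3, cutoff=cutoff)


# Candidates as returned by the broader search above.
candidates = ["long_hair", "long_hair_between_eyes", "wavy_hair"]
print(closest_tags("long_haor", candidates))
```

Lowering `cutoff` admits looser matches, at the cost of more false positives.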

---

## Troubleshooting

### "invalid" tags that should be valid

The database contains only tags with **≥10 posts**. Tags with fewer posts are intentionally excluded as they are likely misspellings, very niche, or one-off annotations.

If a tag you expect to be valid shows as invalid:
1. Try `suggest_tags` to find a close variant
2. Use `search_tags` to explore the tag space
3. The tag may genuinely have <10 posts — use a broader synonym instead

### Server not responding

Check that the `db/tags.db` file exists and that the server starts when launched manually:

```bash
# Local
python src/server.py

# Docker
docker run --rm -i danbooru-mcp:latest
```

If the database lives elsewhere, point the server at it with the `DANBOORU_TAGS_DB` environment variable:
```bash
DANBOORU_TAGS_DB=/custom/path/tags.db python src/server.py
```
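
To check which database path applies before starting the server, a lookup like the following can help. It is a sketch that assumes the override takes precedence over the default `db/tags.db`. `resolve_db_path` and `db_status` are hypothetical helpers, not functions from `src/server.py`:

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_DB = Path("db/tags.db")  # default location mentioned in this guide


def resolve_db_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Prefer DANBOORU_TAGS_DB when set, otherwise fall back to the default."""
    env = os.environ if env is None else env
    override = env.get("DANBOORU_TAGS_DB")
    return Path(override) if override else DEFAULT_DB


def db_status(path: Path) -> str:
    """Report whether the database file is present."""
    return "ok" if path.is_file() else f"missing: {path}"


path = resolve_db_path()
print(path, "->", db_status(path))
```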

### Database needs rebuilding / updating

Re-run the scraper (it's resumable):

```bash
# Refresh all tags
python scripts/scrape_tags.py --no-resume

# Update changed tags only (re-scrapes from scratch, stops at ≥10 posts boundary)
python scripts/scrape_tags.py
```

Then rebuild the Docker image:
```bash
docker build -f Dockerfile.prebuilt -t danbooru-mcp:latest .
```