# danbooru-mcp

An MCP (Model Context Protocol) server that lets an LLM search, validate, and autocomplete **Danbooru tags**, the prompt vocabulary used by Illustrious and other Danbooru-trained Stable Diffusion models.

📖 **[Full User Guide](docs/user-guide.md)** — workflow walkthrough, tool reference, best practices, and common scenarios.

Tags are scraped directly from the **Danbooru public API** and stored in a local SQLite database with an **FTS5 full-text search index** for fast prefix/substring queries. Each tag includes its post count, category, and deprecation status so the LLM can prioritise well-used, canonical tags.
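
As a rough illustration of the storage layout described above, the sketch below builds a tiny in-memory database with an FTS5 index and runs a prefix query against it. The table names (`tags`, `tags_fts`), the column layout, and the `prefix_search` helper are assumptions made for illustration, not the project's actual schema, and the `sword` post count is made-up sample data.

```python
import sqlite3

# Hypothetical schema: the real tags.db layout may differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tags (
        name          TEXT PRIMARY KEY,
        post_count    INTEGER NOT NULL,
        category      TEXT NOT NULL,
        is_deprecated INTEGER NOT NULL DEFAULT 0
    );
    CREATE VIRTUAL TABLE tags_fts USING fts5(name);
    """
)

sample = [
    ("blue_hair", 1079908, "general", 0),
    ("blue_hairband", 26905, "general", 0),
    ("sword", 500000, "general", 0),  # made-up count for the example
]
conn.executemany("INSERT INTO tags VALUES (?, ?, ?, ?)", sample)
conn.executemany("INSERT INTO tags_fts (name) VALUES (?)", [(r[0],) for r in sample])


def prefix_search(conn, prefix, limit=20):
    """Return tag names matching an FTS5 prefix query, most-used first."""
    # Quote the input as an FTS5 string, then append * to request a prefix match.
    match = '"' + prefix.replace('"', '""') + '"*'
    rows = conn.execute(
        """
        SELECT name FROM tags
        WHERE name IN (SELECT name FROM tags_fts WHERE tags_fts MATCH ?)
        ORDER BY post_count DESC
        LIMIT ?
        """,
        (match, limit),
    ).fetchall()
    return [r[0] for r in rows]


print(prefix_search(conn, "blue_h"))
```

Note that FTS5's default tokenizer treats `_` as a separator, so `blue_hair` is indexed as the tokens `blue` and `hair`; a project that needs different matching behaviour would have to configure the tokenizer accordingly.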

---

## Tools

| Tool | Description |
|------|-------------|
| `search_tags(query, limit=20, category=None)` | Prefix/full-text search — returns rich tag objects ordered by relevance |
| `validate_tags(tags)` | Exact-match validation — splits into `valid`, `deprecated`, `invalid` |
| `suggest_tags(partial, limit=10, category=None)` | Autocomplete for partial tag strings, sorted by post count |
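
To make the `validate_tags` contract concrete, here is a minimal sketch of exact-match validation against SQLite. The two-column `tags` table and the sample rows are assumptions for illustration (`old_tag_example` is a hypothetical deprecated tag), and the server's real query may differ.

```python
import sqlite3


def validate_tags(conn, tags):
    """Split tag names into valid, deprecated, and invalid by exact match."""
    result = {"valid": [], "deprecated": [], "invalid": []}
    for tag in tags:
        row = conn.execute(
            "SELECT is_deprecated FROM tags WHERE name = ?", (tag,)
        ).fetchone()
        if row is None:
            result["invalid"].append(tag)      # no such tag in the database
        elif row[0]:
            result["deprecated"].append(tag)   # exists but is flagged deprecated
        else:
            result["valid"].append(tag)
    return result


# Tiny in-memory stand-in for db/tags.db (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (name TEXT PRIMARY KEY, is_deprecated INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?)",
    [("1girl", 0), ("sword", 0), ("old_tag_example", 1)],
)
print(validate_tags(conn, ["1girl", "blue_hairs", "sword", "old_tag_example"]))
```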

### Return object shape

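`search_tags` and `suggest_tags` return tag objects with the fields below; in the example workflow, `validate_tags` returns tag names grouped into `valid`, `deprecated`, and `invalid`.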

```json
{
  "name":          "blue_hair",
  "post_count":    1079908,
  "category":      "general",
  "is_deprecated": false
}
```
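
For client code that consumes these objects, a small typed wrapper can be handy. The `Tag` dataclass below is a hypothetical helper, not part of the server; it only mirrors the four fields shown above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    name: str
    post_count: int
    category: str
    is_deprecated: bool

    @classmethod
    def from_json(cls, text):
        """Build a Tag from a JSON object carrying the four documented fields."""
        data = json.loads(text)
        return cls(
            name=data["name"],
            post_count=int(data["post_count"]),
            category=data["category"],
            is_deprecated=bool(data["is_deprecated"]),
        )


tag = Tag.from_json(
    '{"name": "blue_hair", "post_count": 1079908, '
    '"category": "general", "is_deprecated": false}'
)
print(tag)
```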

### Category filter values

`"general"` · `"artist"` · `"copyright"` · `"character"` · `"meta"`
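
The Danbooru API itself reports tag categories as small integers (0 general, 1 artist, 3 copyright, 4 character, 5 meta). The sketch below maps those IDs to the filter names above and validates a user-supplied filter. `CATEGORY_NAMES` and `normalize_category` are illustrative names, and whether the server stores names or numeric IDs internally is an assumption.

```python
# Danbooru numeric category IDs mapped to the filter names used by the tools.
CATEGORY_NAMES = {
    0: "general",
    1: "artist",
    3: "copyright",
    4: "character",
    5: "meta",
}


def normalize_category(value):
    """Accept a category name or numeric ID and return the canonical name.

    None passes through unchanged, meaning "no category filter".
    """
    if value is None:
        return None
    if isinstance(value, int):
        if value not in CATEGORY_NAMES:
            raise ValueError(f"unknown category id: {value}")
        return CATEGORY_NAMES[value]
    name = str(value).strip().lower()
    if name not in CATEGORY_NAMES.values():
        raise ValueError(f"unknown category: {value!r}")
    return name


print(normalize_category(4), normalize_category("Artist"))
```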

---

## Setup

### 1. Install dependencies

```bash
pip install -e .
```

### 2. Build the SQLite database (scrapes the Danbooru API)

```bash
python scripts/scrape_tags.py
```

This scrapes ~1–2 million tags from the Danbooru public API (no account required)
and stores them in `db/tags.db` with an FTS5 index.
Estimated time: **5–15 minutes**, depending on network speed.

```
Options:
  --db PATH         Output database path (default: db/tags.db)
  --workers N       Parallel HTTP workers (default: 4)
  --max-page N      Safety cap on pages (default: 2500)
  --no-resume       Re-scrape all pages from scratch
  --no-fts          Skip FTS5 rebuild (for incremental runs)
```
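
For reference, the options listed above could be declared with `argparse` roughly as follows. This is a reconstruction from the help text, not the script's actual source; the `build_parser` name and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Declare the scraper options as documented in the README."""
    parser = argparse.ArgumentParser(description="Scrape Danbooru tags into SQLite")
    parser.add_argument("--db", metavar="PATH", default="db/tags.db",
                        help="Output database path")
    parser.add_argument("--workers", metavar="N", type=int, default=4,
                        help="Parallel HTTP workers")
    parser.add_argument("--max-page", metavar="N", type=int, default=2500,
                        help="Safety cap on pages")
    parser.add_argument("--no-resume", action="store_true",
                        help="Re-scrape all pages from scratch")
    parser.add_argument("--no-fts", action="store_true",
                        help="Skip FTS5 rebuild (for incremental runs)")
    return parser


# Equivalent to: python scripts/scrape_tags.py --workers 8 --no-fts
args = build_parser().parse_args(["--workers", "8", "--no-fts"])
print(args)
```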

The scraper is **resumable** — if interrupted, re-run it and it will
continue from where it left off.
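
One common way to get this kind of resumability is to record each finished page in the database and skip recorded pages on the next run. The sketch below shows that idea with a stubbed `fetch_page`. The `scraped_pages` table, the function names, and the stub data are all hypothetical, and the real scraper may track progress differently.

```python
import sqlite3


def init_db(conn):
    """Create the tag table plus a bookkeeping table of completed pages."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS tags (
            name TEXT PRIMARY KEY,
            post_count INTEGER NOT NULL
        );
        CREATE TABLE IF NOT EXISTS scraped_pages (page INTEGER PRIMARY KEY);
        """
    )


def scrape(conn, fetch_page, max_page):
    """Fetch pages 1..max_page, skipping pages finished in an earlier run.

    Returns the list of page numbers fetched during this run.
    """
    done = {row[0] for row in conn.execute("SELECT page FROM scraped_pages")}
    fetched = []
    for page in range(1, max_page + 1):
        if page in done:
            continue
        tags = fetch_page(page)
        if not tags:
            break  # an empty page means we are past the last page
        with conn:  # commit the tags and the page marker in one transaction
            conn.executemany("INSERT OR REPLACE INTO tags VALUES (?, ?)", tags)
            conn.execute("INSERT INTO scraped_pages VALUES (?)", (page,))
        fetched.append(page)
    return fetched


# Stub standing in for the HTTP call; counts are made-up sample data.
FAKE_API = {1: [("1girl", 100)], 2: [("sword", 50)], 3: [("blue_hair", 75)]}

conn = sqlite3.connect(":memory:")
init_db(conn)
first = scrape(conn, lambda p: FAKE_API.get(p, []), max_page=2)   # run cut short
second = scrape(conn, lambda p: FAKE_API.get(p, []), max_page=5)  # resumed run
print(first, second)
```

Writing the tags and the page marker in the same transaction means an interrupted run never records a page as finished without also having stored its tags.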

### 3. (Optional) Test API access first

```bash
python scripts/test_danbooru_api.py
```
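
A connectivity check of this kind can be as small as one request to the public `tags.json` endpoint. The sketch below is not the contents of `test_danbooru_api.py`; the helper names and the User-Agent string are hypothetical, and the explicit User-Agent is set on the assumption that the default Python one may be rejected.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://danbooru.donmai.us/tags.json"


def build_tags_url(page=1, limit=5):
    """Build a tags.json URL that asks for the most-used tags first."""
    query = urllib.parse.urlencode(
        {"limit": limit, "page": page, "search[order]": "count"}
    )
    return f"{BASE_URL}?{query}"


def check_api(timeout=10):
    """Fetch one small page and return the tag names (performs a network call)."""
    request = urllib.request.Request(
        build_tags_url(),
        headers={"User-Agent": "danbooru-mcp-connectivity-check"},  # hypothetical UA
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return [tag["name"] for tag in json.load(response)]


print(build_tags_url())
```

Calling `check_api()` returns a short list of tag names when the API is reachable and raises a `urllib.error.URLError` (or a subclass) otherwise.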

### 4. Run the MCP server

```bash
python src/server.py
```

---

## Docker

### Quick start (pre-built DB — recommended)

Use this when you've already run `python scripts/scrape_tags.py` and have `db/tags.db`:

```bash
# Build image with the pre-built DB baked in (~30 seconds)
docker build -f Dockerfile.prebuilt -t danbooru-mcp .

# Verify
docker run --rm --entrypoint python danbooru-mcp \
  -c "import sqlite3,sys; c=sqlite3.connect('/app/db/tags.db'); sys.stderr.write(str(c.execute('SELECT COUNT(*) FROM tags').fetchone()[0]) + ' tags\n')"
```

### Build from scratch (runs the scraper during Docker build)

```bash
# Scrapes the Danbooru API during build — takes ~15 minutes
docker build \
  --build-arg DANBOORU_USER=your_username \
  --build-arg DANBOORU_API_KEY=your_api_key \
  -t danbooru-mcp .
```

### MCP client config (Docker)

```json
{
  "mcpServers": {
    "danbooru-tags": {
      "command": "docker",
      "args": ["run", "--rm", "-i", "danbooru-mcp:latest"]
    }
  }
}
```

---

## MCP Client Configuration

### Claude Desktop (`claude_desktop_config.json`)

```json
{
  "mcpServers": {
    "danbooru-tags": {
      "command": "python",
      "args": ["/absolute/path/to/danbooru-mcp/src/server.py"]
    }
  }
}
```

### Custom DB path via environment variable

```json
{
  "mcpServers": {
    "danbooru-tags": {
      "command": "python",
      "args": ["/path/to/src/server.py"],
      "env": {
        "DANBOORU_TAGS_DB": "/custom/path/to/tags.db"
      }
    }
  }
}
```
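
On the server side, honouring this variable typically amounts to an environment lookup with a fallback. The sketch below assumes the default is `db/tags.db` (the scraper's default output path); the `resolve_db_path` helper is a hypothetical name, not necessarily what `src/server.py` uses.

```python
import os
from pathlib import Path

# Assumed default, matching the scraper's default output path.
DEFAULT_DB = Path("db") / "tags.db"


def resolve_db_path(environ=None):
    """Return the DB path from DANBOORU_TAGS_DB, or the default if unset/empty."""
    environ = os.environ if environ is None else environ
    override = environ.get("DANBOORU_TAGS_DB")
    return Path(override) if override else DEFAULT_DB


print(resolve_db_path({"DANBOORU_TAGS_DB": "/custom/path/to/tags.db"}))
```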

---

## Example LLM Prompt Workflow

```
User: Generate a prompt for a girl with blue hair and a sword.

LLM calls validate_tags(["1girl", "blue_hairs", "sword", "looking_at_vewer"])
→ {
    "valid":      ["1girl", "sword"],
    "deprecated": [],
    "invalid":    ["blue_hairs", "looking_at_vewer"]
  }

LLM calls suggest_tags("blue_hair", limit=3)
→ [
    {"name": "blue_hair",     "post_count": 1079908, "category": "general"},
    {"name": "blue_hairband", "post_count":   26905, "category": "general"},
    ...
  ]

LLM calls suggest_tags("looking_at_viewer", limit=1)
→ [{"name": "looking_at_viewer", "post_count": 4567890, "category": "general"}]

Final validated prompt: 1girl, blue_hair, sword, looking_at_viewer
```
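
The same validate-then-suggest loop can be automated on the client side. The sketch below uses in-memory stand-ins for the two tools and a simple heuristic that is not part of the server: for each invalid tag, trim characters from the end until a suggestion appears. `VOCAB`, `repair_prompt`, and the `1girl` and `sword` counts are hypothetical; in the workflow above, the LLM chooses the partial string itself.

```python
# In-memory stand-in for the tag database: name -> post_count.
VOCAB = {
    "1girl": 1000,              # made-up count
    "sword": 500,               # made-up count
    "blue_hair": 1079908,
    "blue_hairband": 26905,
    "looking_at_viewer": 4567890,
}


def validate_tags(tags):
    """Stub of the validate_tags tool (no deprecated tags in this sample)."""
    return {
        "valid": [t for t in tags if t in VOCAB],
        "deprecated": [],
        "invalid": [t for t in tags if t not in VOCAB],
    }


def suggest_tags(partial, limit=10):
    """Stub of the suggest_tags tool: prefix match, sorted by post count."""
    hits = [name for name in VOCAB if name.startswith(partial)]
    hits.sort(key=lambda name: VOCAB[name], reverse=True)
    return hits[:limit]


def repair_prompt(tags, min_prefix=3):
    """Replace each invalid tag with the top suggestion for its longest prefix."""
    invalid = set(validate_tags(tags)["invalid"])
    repaired = []
    for tag in tags:
        if tag not in invalid:
            repaired.append(tag)
            continue
        # Shorten the tag one character at a time until something matches.
        for end in range(len(tag) - 1, min_prefix - 1, -1):
            hits = suggest_tags(tag[:end], limit=1)
            if hits:
                repaired.append(hits[0])
                break
        # Tags with no suggestion at all are dropped.
    return repaired


print(", ".join(repair_prompt(["1girl", "blue_hairs", "sword", "looking_at_vewer"])))
```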

---

## Project Structure

```
danbooru-mcp/
├── data/
│   └── all_tags.csv              # original CSV export (legacy, replaced by API scrape)
├── db/
│   └── tags.db                   # SQLite DB (generated, gitignored)
├── plans/
│   └── danbooru-mcp-plan.md      # Architecture plan
├── scripts/
│   ├── scrape_tags.py            # API scraper → SQLite (primary)
│   ├── import_tags.py            # Legacy CSV importer
│   └── test_danbooru_api.py      # API connectivity tests
├── src/
│   └── server.py                 # MCP server
├── pyproject.toml
├── .gitignore
└── README.md
```

---

## Requirements

- Python 3.10+
- `mcp[cli]` — official Python MCP SDK
- `requests` — HTTP client for API scraping
- `sqlite3` — Python stdlib (no install needed)