# Danbooru MCP Tag Validator — Plan ## Overview A Python MCP server backed by a SQLite database that allows an LLM to search, validate, and get suggestions for valid Danbooru tags. This enables an Illustrious/Stable Diffusion ComfyUI workflow to ensure all generated prompts contain only real Danbooru tags. --- ## Architecture ```mermaid graph TD A[LLM / ComfyUI Workflow] -->|MCP protocol| B[danbooru-mcp Python server] B -->|SQL queries| C[SQLite DB: tags.db] C -->|FTS5 index| D[fast text search] E[all_tags.csv 900k rows] -->|import script one-time| C ``` --- ## Project Structure ``` danbooru-mcp/ ├── data/ │ └── all_tags.csv # source data (existing, ~900k entries) ├── db/ │ └── tags.db # generated SQLite DB (gitignored) ├── scripts/ │ └── import_tags.py # one-time CSV → SQLite importer ├── src/ │ └── server.py # MCP server entry point ├── pyproject.toml # deps: mcp[cli] only (sqlite3 is stdlib) ├── .gitignore └── README.md ``` --- ## Database Design ### Table: `tags` ```sql CREATE TABLE tags ( id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL ); ``` ### FTS5 Virtual Table: `tags_fts` ```sql CREATE VIRTUAL TABLE tags_fts USING fts5( name, content='tags', content_rowid='id' ); ``` - Allows fast prefix queries: `tags_fts MATCH 'blue_*'` - Exact-match validation uses B-tree index on `tags.name UNIQUE` --- ## MCP Tools | Tool | Input | Output | Purpose | |------|-------|--------|---------| | `search_tags` | `query: str`, `limit: int = 20` | `list[str]` | Prefix/substring search for tag discovery | | `validate_tags` | `tags: list[str]` | `{valid: list, invalid: list}` | Bulk exact-match validation of a prompt | | `suggest_tags` | `partial: str`, `limit: int = 10` | `list[str]` | Autocomplete-style correction of near-miss tags | --- ## Key Design Decisions - **SQLite over PostgreSQL** — zero operational overhead, file-portable, sufficient for read-only tag lookup - **FTS5 over LIKE** — `LIKE '%query%'` on 900k rows is O(n); FTS5 prefix matching is O(log n) via inverted index - **Exact-match bulk validation** — `WHERE name IN (...)` leverages SQLite's B-tree index on the UNIQUE constraint - **No vector/embedding search** — Danbooru tags are well-defined strings; exact + prefix matching is appropriate and zero-dependency --- ## Dependencies - Python 3.10+ - `mcp[cli]` — official Python MCP SDK - `sqlite3` — Python stdlib, no install required --- ## MCP Client Config Example ```json { "mcpServers": { "danbooru-tags": { "command": "python", "args": ["/path/to/danbooru-mcp/src/server.py"] } } } ``` --- ## Todo List - [ ] Write `pyproject.toml` with `mcp[cli]` dependency - [ ] Write `scripts/import_tags.py` — CSV to SQLite with FTS5 index - [ ] Write `src/server.py` — MCP server with 3 tools - [ ] `search_tags` tool - [ ] `validate_tags` tool - [ ] `suggest_tags` tool - [ ] Write `.gitignore` (exclude `db/tags.db`) - [ ] Write `README.md` with setup and usage instructions