Mog-Squire/scripts/README.md
2025-07-08 23:04:43 +01:00

Recipe ETL Scripts

This directory contains helper scripts for extracting crafting recipe data from the raw text files under datasets/ (e.g. datasets/Woodworking.txt) and loading it into the project PostgreSQL database. A separate loader handles datasets/inventory.csv.

File overview

| File | Purpose |
|------|---------|
| woodworking_to_csv.py | Legacy first-pass parser → datasets/Woodworking.csv. |
| woodworking_to_csv_v2.py | Improved parser that matches the spec (category, level, sub-crafts, ingredients, HQ yields, etc.) → datasets/Woodworking_v2.csv. |
| recipes_to_csv_v2.py | Generic parser. `python recipes_to_csv_v2.py <Craft>` processes one craft. `python recipes_to_csv_v2.py --all`, or the same command with no argument, parses every .txt file under datasets/ and writes `datasets/<Craft>_v2.csv` for each. |
| load_woodworking_to_db.py | Loader for the legacy CSV (kept for reference). |
| load_woodworking_v2_to_db.py | Drops and recreates the recipes_woodworking table, then bulk-loads Woodworking_v2.csv. |
| load_recipes_v2_to_db.py | Generic loader. It loads `datasets/<Craft>_v2.csv` for one craft, or for every craft when no argument is given, and drops and recreates each craft's table. |
| load_inventory_to_db.py | Truncates the inventory table and loads datasets/inventory.csv into it. |
| requirements.txt | Minimal Python dependencies for the scripts. |
| venv/ | Local virtual environment created by the setup steps below. |
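
The command-line forms listed for recipes_to_csv_v2.py could be resolved roughly as in the sketch below. This is not the script's actual code. The functions resolve_crafts, crafts_from_filenames and output_csv_path are hypothetical names. The sketch also assumes a craft's name is simply its .txt file name without the extension (Woodworking.txt → Woodworking).

```python
from pathlib import Path
from typing import Iterable, List


def crafts_from_filenames(names: Iterable[str]) -> List[str]:
    # "Woodworking.txt" -> "Woodworking"; non-.txt files (e.g. inventory.csv) are skipped
    return sorted(Path(n).stem for n in names if n.endswith(".txt"))


def resolve_crafts(args: List[str], datasets_dir: Path) -> List[str]:
    """Pick the crafts to process, mirroring the documented CLI forms (hypothetical helper)."""
    if not args or args[0] == "--all":
        # No argument or --all: every .txt file under datasets/
        return crafts_from_filenames(p.name for p in datasets_dir.iterdir())
    # Otherwise a single craft name, e.g. "Smithing"
    return [args[0]]


def output_csv_path(datasets_dir: Path, craft: str) -> Path:
    # Naming convention from the file table: datasets/<Craft>_v2.csv
    return datasets_dir / f"{craft}_v2.csv"
```

For example, resolve_crafts([], Path("datasets")) returns every craft that has a .txt file under datasets/. By contrast, resolve_crafts(["Smithing"], Path("datasets")) returns ["Smithing"] without checking that the file exists.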

Prerequisites

  • Python ≥ 3.9

  • A PostgreSQL instance reachable with the credentials in db.conf at the project root:

    PSQL_HOST=
    PSQL_PORT=
    PSQL_USER=
    PSQL_PASSWORD=
    PSQL_DBNAME=
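
Below is a minimal sketch of reading db.conf. It assumes the file holds plain KEY=VALUE lines as shown above. It also assumes the scripts connect with psycopg2; requirements.txt is not reproduced here, so the driver is an assumption. parse_db_conf and connection_kwargs are hypothetical helpers, not functions taken from the scripts.

```python
from typing import Dict


def parse_db_conf(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines; skipping blanks and '#' comments is an assumption."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # partition splits on the first "=", so passwords may contain "="
        key, sep, value = line.partition("=")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def connection_kwargs(conf: Dict[str, str]) -> Dict[str, object]:
    # Map the PSQL_* keys onto libpq-style keyword names (host, port, user, password, dbname)
    return {
        "host": conf["PSQL_HOST"],
        "port": int(conf["PSQL_PORT"]),
        "user": conf["PSQL_USER"],
        "password": conf["PSQL_PASSWORD"],
        "dbname": conf["PSQL_DBNAME"],
    }
```

With psycopg2 installed and the script run from scripts/, psycopg2.connect(**connection_kwargs(parse_db_conf(Path("../db.conf").read_text()))) would open a connection. Path here comes from pathlib.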
    

Quick start (all crafts)

# 1. From project root
cd scripts

# 2. Create the virtualenv (first time only) and activate it (in every new shell)
python3 -m venv venv
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Generate CSVs for **all** crafts
python recipes_to_csv_v2.py --all  # or simply `python recipes_to_csv_v2.py`

# 5. Load all crafts into the DB (drops/recreates each table)
python load_recipes_v2_to_db.py

To work with a single craft, specify its name instead:

python recipes_to_csv_v2.py Smithing       # generate Smithing_v2.csv
python load_recipes_v2_to_db.py Smithing   # load only Smithing recipes

For Woodworking, the parser and the loader print output such as:

Wrote 480 recipes -> datasets/Woodworking_v2.csv
Loaded recipes into new recipes_woodworking table.
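
The drop-and-recreate step performed by the loaders can be pictured as three SQL statements per craft. This is a hypothetical sketch, and several details in it are guesses rather than facts from the loaders:

- The recipes_<craft> naming is inferred from the recipes_woodworking example.
- The column types, the id column and the presence of a CSV header row are assumptions.

Check all of these against load_recipes_v2_to_db.py.

```python
import re
from typing import List

# Column order follows the v2 CSV schema; the SQL types are assumptions.
COLUMNS = [
    ("category", "TEXT"),
    ("level", "INTEGER"),
    ("subcrafts", "JSONB"),
    ("name", "TEXT"),
    ("crystal", "TEXT"),
    ("key_item", "TEXT"),
    ("ingredients", "JSONB"),
    ("hq_yields", "JSONB"),
]


def table_name(craft: str) -> str:
    # "Woodworking" -> "recipes_woodworking"; reject anything that is not a plain word
    if not re.fullmatch(r"[A-Za-z]+", craft):
        raise ValueError(f"unexpected craft name: {craft!r}")
    return f"recipes_{craft.lower()}"


def reload_statements(craft: str) -> List[str]:
    """SQL to drop, recreate and bulk-load one craft table from its CSV (hypothetical)."""
    table = table_name(craft)
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in COLUMNS)
    names = ", ".join(name for name, _ in COLUMNS)
    return [
        f"DROP TABLE IF EXISTS {table}",
        f"CREATE TABLE {table} (id SERIAL PRIMARY KEY, {cols})",
        # Assumes the CSV starts with a header row
        f"COPY {table} ({names}) FROM STDIN WITH (FORMAT csv, HEADER true)",
    ]
```

With psycopg2, the first two statements could be run with cursor.execute(). The COPY statement could be run with cursor.copy_expert(sql, file) on the opened CSV. COPY is shown as one common bulk-load technique, not necessarily the one the loaders use.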

CSV schema (v2)

| Column | Notes |
|--------|-------|
| category | Craft rank without the level range (e.g. "Amateur") |
| level | Recipe level (integer) |
| subcrafts | JSON list of [craft, level] pairs, e.g. `[["Smithing",2],["Alchemy",7]]` |
| name | NQ product name |
| crystal | Crystal element used (Wind, Earth, etc.) |
| key_item | Required key item (blank if none) |
| ingredients | JSON list of [item, quantity] pairs, e.g. `[["Arrowwood Log",1]]` |
| hq_yields | JSON list of the HQ1 to HQ3 [item, quantity] results, e.g. `[["Arrowwood Lumber",6],["Arrowwood Lumber",9],["Arrowwood Lumber",12]]` |
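
Three of the columns hold JSON text inside the CSV, so a consumer has to decode them after reading each row. The sketch below assumes the v2 CSVs start with a header row naming the columns listed above. read_recipes is a hypothetical helper, not part of the scripts.

```python
import csv
import json
from typing import Dict, Iterable, Iterator

# Columns that hold JSON lists of [name, number] pairs
JSON_COLUMNS = ("subcrafts", "ingredients", "hq_yields")


def read_recipes(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per CSV row, with JSON columns decoded and level as int."""
    for row in csv.DictReader(lines):
        recipe: Dict[str, object] = dict(row)
        recipe["level"] = int(row["level"])
        for col in JSON_COLUMNS:
            # An empty cell is treated as an empty list (assumption)
            recipe[col] = json.loads(row[col]) if row[col] else []
        yield recipe
```

Typical use is with open("datasets/Woodworking_v2.csv", newline="", encoding="utf-8") as f: recipes = list(read_recipes(f)); the UTF-8 encoding is an assumption. csv.DictReader accepts any iterable of lines, so an open file works directly.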

Parsing rules

  • Item quantities are detected only when the suffix uses an “x” (e.g. Lumber x6).
  • Strings such as Bronze Leggings +1 are treated as the full item name; the +1/+2/+3 suffix is preserved.
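
The two parsing rules above can be expressed as a single anchored regular expression. This is an illustrative sketch, not the parser's actual code. split_quantity is a hypothetical name, and treating a missing suffix as quantity 1 is an assumption.

```python
import re
from typing import Tuple

# Only a trailing whitespace-separated "x<N>" counts as a quantity; "+1" etc. stay in the name.
QTY_SUFFIX = re.compile(r"^(?P<name>.+?)\s+x(?P<qty>\d+)$")


def split_quantity(item: str) -> Tuple[str, int]:
    """Split 'Arrowwood Lumber x6' into ('Arrowwood Lumber', 6).

    Text without an 'x<N>' suffix is returned whole with quantity 1
    (the default of 1 is an assumption for this sketch).
    """
    text = item.strip()
    match = QTY_SUFFIX.match(text)
    if match:
        return match.group("name"), int(match.group("qty"))
    return text, 1
```

The pattern matches only an x<N> at the very end of the string. As a result, a name such as Bronze Leggings +1 keeps its suffix, and Bronze Leggings +1 x2 still yields a quantity of 2.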

Developing / debugging

  • Edit the parsers as needed, then rerun them to regenerate the CSVs.
  • Feel free to add new scripts here; remember to update requirements.txt & this README.