Recipe ETL Scripts
This directory contains helper scripts for extracting crafting recipe data from the raw `datasets/<Craft>.txt` files (for example `datasets/Woodworking.txt`) and loading it into the project's PostgreSQL database.
File overview
| File | Purpose |
|---|---|
| `woodworking_to_csv.py` | Legacy first-pass parser → `datasets/Woodworking.csv`. |
| `woodworking_to_csv_v2.py` | Improved parser that matches the spec (category, level, sub-crafts, ingredients, HQ yields, etc.) → `datasets/Woodworking_v2.csv`. |
| `recipes_to_csv_v2.py` | Generic parser. `python recipes_to_csv_v2.py <Craft>` processes one craft; `python recipes_to_csv_v2.py --all` (or no argument at all) parses every `.txt` file under `datasets/`, writing `datasets/<Craft>_v2.csv` for each. |
| `load_woodworking_to_db.py` | Loader for the legacy CSV (kept for reference). |
| `load_woodworking_v2_to_db.py` | Drops and recreates the `recipes_woodworking` table, then bulk-loads `Woodworking_v2.csv`. |
| `load_recipes_v2_to_db.py` | Generic loader. `python load_recipes_v2_to_db.py <Craft>` loads one craft; with no argument it loads every generated CSV into its `recipes_<craft>` table. |
| `requirements.txt` | Minimal Python dependencies for the scripts. |
| `venv/` | Local virtual environment created by the setup steps below. |
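The argument handling described for the generic parser (one craft name, `--all`, or no argument) can be sketched as below. This is a minimal illustration, not the code in `recipes_to_csv_v2.py`; the helper names `resolve_crafts` and `output_csv` are hypothetical, and it assumes the craft name is simply the `.txt` file's stem.

```python
from pathlib import Path


def resolve_crafts(args: list[str], datasets_dir: Path) -> list[str]:
    """Return the craft names to process (hypothetical helper).

    No argument, or ``--all``, selects every ``<Craft>.txt`` file in
    ``datasets_dir``; otherwise the first argument is taken as the craft name.
    """
    if not args or args[0] == "--all":
        # Path.stem drops the ".txt" suffix: "Woodworking.txt" -> "Woodworking".
        return sorted(p.stem for p in datasets_dir.glob("*.txt"))
    return [args[0]]


def output_csv(craft: str, datasets_dir: Path) -> Path:
    """Output path following the documented naming: datasets/<Craft>_v2.csv."""
    return datasets_dir / f"{craft}_v2.csv"
```

A script's entry point would then loop over `resolve_crafts(sys.argv[1:], Path("datasets"))` and write each result to `output_csv(craft, Path("datasets"))`.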
Prerequisites
- Python ≥ 3.9
- A reachable PostgreSQL instance, with credentials in `db.conf` at the project root:

```
PSQL_HOST=…
PSQL_PORT=…
PSQL_USER=…
PSQL_PASSWORD=…
PSQL_DBNAME=…
```
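One way to read `db.conf` into connection parameters is sketched below. It assumes plain `KEY=VALUE` lines (blank lines and `#` comments skipped), and it assumes the scripts run from `scripts/` so the project root is one directory up; the helper names are hypothetical. The output keys (`host`, `port`, `user`, `password`, `dbname`) are standard libpq parameter names.

```python
from pathlib import Path

# Keys documented for db.conf, mapped to libpq connection parameter names.
_KEY_MAP = {
    "PSQL_HOST": "host",
    "PSQL_PORT": "port",
    "PSQL_USER": "user",
    "PSQL_PASSWORD": "password",
    "PSQL_DBNAME": "dbname",
}


def parse_db_conf(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines (hypothetical helper); skips blanks and '#' comments."""
    params: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # partition splits on the first "=", so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line in db.conf: {raw!r}")
        key = key.strip()
        if key in _KEY_MAP:
            params[_KEY_MAP[key]] = value.strip()
    missing = set(_KEY_MAP.values()) - set(params)
    if missing:
        raise ValueError(f"db.conf is missing: {sorted(missing)}")
    return params


def load_db_conf(path: Path = Path("..") / "db.conf") -> dict[str, str]:
    # Assumes the working directory is scripts/, one level below the project root.
    return parse_db_conf(path.read_text(encoding="utf-8"))
```

A driver such as psycopg2 accepts these as keyword arguments, e.g. `psycopg2.connect(**load_db_conf())`; whether the loaders here use psycopg2 is not stated in this README.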
Quick start
```bash
# 1. From the project root
cd scripts

# 2. Create and activate a virtualenv (only once)
python3 -m venv venv
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Generate CSVs for all crafts
python recipes_to_csv_v2.py --all   # or simply: python recipes_to_csv_v2.py

# 5. Load all crafts into the DB (drops and recreates each table)
python load_recipes_v2_to_db.py
```
To work with a single craft, specify its name instead:
```bash
python recipes_to_csv_v2.py Smithing       # generate Smithing_v2.csv
python load_recipes_v2_to_db.py Smithing   # load only Smithing recipes
```
The parser and the loader print progress such as (Woodworking shown):

```
Wrote 480 recipes -> datasets/Woodworking_v2.csv
Loaded recipes into new recipes_woodworking table.
```
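The loaders drop and recreate each `recipes_<craft>` table before bulk-loading its CSV. A sketch of the SQL such a step might issue is below; it is an illustration, not the loaders' actual statements. The column types (`TEXT`, `INTEGER`, `JSONB`) and the helper names `table_name` and `reload_statements` are assumptions; the column order follows the v2 CSV schema.

```python
# Columns in CSV-schema (v2) order; the SQL types are assumptions.
COLUMNS = [
    ("category", "TEXT"),
    ("level", "INTEGER"),
    ("subcrafts", "JSONB"),
    ("name", "TEXT"),
    ("crystal", "TEXT"),
    ("key_item", "TEXT"),
    ("ingredients", "JSONB"),
    ("hq_yields", "JSONB"),
]


def table_name(craft: str) -> str:
    """recipes_<craft>, e.g. "Woodworking" -> "recipes_woodworking"."""
    name = f"recipes_{craft.lower()}"
    # The name is interpolated into SQL, so allow only letters, digits and "_".
    if not name.replace("_", "").isalnum():
        raise ValueError(f"unsafe table name: {name!r}")
    return name


def reload_statements(craft: str) -> list[str]:
    """SQL to drop, recreate and bulk-load one craft's table from CSV on stdin."""
    table = table_name(craft)
    col_defs = ",\n    ".join(f"{col} {typ}" for col, typ in COLUMNS)
    col_list = ", ".join(col for col, _ in COLUMNS)
    return [
        f"DROP TABLE IF EXISTS {table};",
        f"CREATE TABLE {table} (\n    {col_defs}\n);",
        # HEADER true skips the CSV's first line; empty unquoted fields load as NULL.
        f"COPY {table} ({col_list}) FROM STDIN WITH (FORMAT csv, HEADER true);",
    ]
```

With psycopg2, for instance, the first two statements could run through `cursor.execute` and the `COPY` through `cursor.copy_expert(sql, csv_file)`; `COPY ... FROM STDIN` is used here so the CSV can stay on the client machine.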
CSV schema (v2)
| Column | Notes |
|---|---|
| `category` | Craft rank without the level range (e.g. "Amateur") |
| `level` | Recipe level (integer) |
| `subcrafts` | JSON list, e.g. `[["Smithing",2],["Alchemy",7]]` |
| `name` | NQ product name |
| `crystal` | Element used (Wind, Earth, etc.) |
| `key_item` | Required key item (blank if none) |
| `ingredients` | JSON list, e.g. `[["Arrowwood Log",1]]` |
| `hq_yields` | JSON list of the HQ1-HQ3 yields, e.g. `[["Arrowwood Lumber",6],["Arrowwood Lumber",9],["Arrowwood Lumber",12]]` |
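Because three columns hold JSON inside CSV cells, consumers need a decoding step. A minimal sketch using the standard `csv` and `json` modules follows; the helper names are hypothetical, and mapping a blank JSON cell to `[]` and a blank `key_item` to `None` are assumptions rather than documented behaviour.

```python
import csv
import json
from typing import Any, TextIO

JSON_COLUMNS = ("subcrafts", "ingredients", "hq_yields")


def decode_row(row: dict[str, str]) -> dict[str, Any]:
    """Turn one CSV row (all strings) into typed Python values."""
    out: dict[str, Any] = dict(row)
    out["level"] = int(row["level"])
    for col in JSON_COLUMNS:
        # '[["Arrowwood Log",1]]' -> [["Arrowwood Log", 1]]; blank cell -> []
        out[col] = json.loads(row[col]) if row[col] else []
    # A blank key_item means the recipe needs no key item.
    out["key_item"] = row["key_item"] or None
    return out


def read_recipes(f: TextIO) -> list[dict[str, Any]]:
    """Read every recipe from an open CSV file (open it with newline="")."""
    return [decode_row(row) for row in csv.DictReader(f)]
```

For example, `with open("datasets/Woodworking_v2.csv", newline="", encoding="utf-8") as f: recipes = read_recipes(f)`.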
Parsing rules
- Item quantities are detected only when the suffix uses an “x” (e.g. `Lumber x6`).
- Strings such as `Bronze Leggings +1` are treated as the full item name; the `+1`/`+2`/`+3` suffix is preserved.
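The two rules above amount to: split off a trailing ` x<digits>` if present, otherwise keep the whole string as the name. A regex sketch is below; it is not the parsers' actual code, `split_quantity` is a hypothetical name, and defaulting the quantity to 1 is an assumption (consistent with the `[["Arrowwood Log",1]]` example in the schema).

```python
import re

# Quantity only when the text ends with whitespace + "x" + digits, e.g. "Lumber x6".
_QTY_SUFFIX = re.compile(r"^(?P<name>.+?)\s+x(?P<qty>\d+)$")


def split_quantity(text: str) -> tuple[str, int]:
    """Split 'Name xN' into (name, N); anything else is a whole name, quantity 1."""
    s = text.strip()
    m = _QTY_SUFFIX.match(s)
    if m:
        return m.group("name"), int(m.group("qty"))
    # No "x" suffix: keep the full string, including any +1/+2/+3 marker.
    return s, 1
```

Anchoring on `x` followed by digits at the end of the string is what keeps `+1`-style suffixes from being mistaken for quantities.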
Developing / debugging
- Edit the parsers as needed, then rerun them to regenerate the CSVs.
- Feel free to add new scripts here; remember to update `requirements.txt` and this README.