# Recipe ETL Scripts

This directory contains helper scripts for extracting Woodworking recipe data from the raw **datasets/Woodworking.txt** file and loading it into the project PostgreSQL database.

## File overview

| File | Purpose |
|------|---------|
| **woodworking_to_csv.py** | Legacy first-pass parser → `datasets/Woodworking.csv`. |
| **woodworking_to_csv_v2.py** | Improved parser that matches the spec (category, level, sub-crafts, ingredients, HQ yields, etc.) → `datasets/Woodworking_v2.csv`. |
| **recipes_to_csv_v2.py** | Generic parser. Pass a craft name as the argument to process one craft; use `python recipes_to_csv_v2.py --all` **or simply omit the argument** to parse every `.txt` file under `datasets/`, producing one `_v2.csv` file per craft in `datasets/` (e.g. `datasets/Woodworking_v2.csv`). |
| **load_woodworking_to_db.py** | Loader for the legacy CSV (kept for reference). |
| **load_woodworking_v2_to_db.py** | Drops & recreates the **recipes_woodworking** table and bulk-loads `Woodworking_v2.csv`. |
| **load_recipes_v2_to_db.py** | Generic loader. Pass a craft name as the argument to load one craft; omit the argument to load **all** generated CSVs into their respective `recipes_` tables (e.g. `recipes_woodworking`). |
| **load_inventory_to_db.py** | Truncates & loads `datasets/inventory.csv` into the `inventory` table. |
| **requirements.txt** | Minimal Python dependencies for the scripts. |
| **venv/** | Local virtual environment created by the setup steps below. |

## Prerequisites

* Python ≥ 3.9
* PostgreSQL instance reachable with credentials in `db.conf` at project root:

  ```ini
  PSQL_HOST=…
  PSQL_PORT=…
  PSQL_USER=…
  PSQL_PASSWORD=…
  PSQL_DBNAME=…
  ```

## Quick start (Woodworking example)

```bash
# 1. From project root
cd scripts

# 2. Create & activate virtualenv (only once)
python3 -m venv venv
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Generate CSVs for **all** crafts
python recipes_to_csv_v2.py --all   # or simply `python recipes_to_csv_v2.py`

# 5. Load all crafts into the DB (drops/recreates each table)
python load_recipes_v2_to_db.py
```

To work with a **single craft**, specify its name instead:

```bash
python recipes_to_csv_v2.py Smithing      # generate Smithing_v2.csv
python load_recipes_v2_to_db.py Smithing  # load only Smithing recipes
```

The scripts print output such as:

```
Wrote 480 recipes -> datasets/Woodworking_v2.csv
Loaded recipes into new recipes_woodworking table.
```

## CSV schema (v2)

Column | Notes
------ | -----
`category` | Craft rank without level range (e.g. "Amateur")
`level` | Recipe level (integer)
`subcrafts` | JSON list, e.g. `[["Smithing",2],["Alchemy",7]]`
`name` | NQ product name
`crystal` | Element used (Wind, Earth, etc.)
`key_item` | Required key item (blank if none)
`ingredients` | JSON list, e.g. `[["Arrowwood Log",1]]`
`hq_yields` | JSON list for HQ1-HQ3, e.g. `[["Arrowwood Lumber",6],["Arrowwood Lumber",9],["Arrowwood Lumber",12]]`

## Parsing rules

* Item quantities are detected only when the suffix uses an “x” (e.g. `Lumber x6`).
* Strings such as `Bronze Leggings +1` are treated as the **full item name**; the `+1/+2/+3` suffix is preserved.

## Developing / debugging

* Edit the parsers as needed, then rerun them to regenerate the CSVs.
* Feel free to add new scripts here; remember to update **requirements.txt** & this README.
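
When debugging the parsers, the two parsing rules above can be checked in isolation. The following is a minimal sketch, not the code in `recipes_to_csv_v2.py`: the helper name `split_item`, the regular expression, and the default quantity of 1 for items without an “x” suffix are all assumptions made for illustration.

```python
import re

# Assumed pattern: a trailing " x<digits>" marks a quantity.
# Anything else, including "+1/+2/+3", stays part of the item name.
_QTY_SUFFIX = re.compile(r"^(?P<name>.+?)\s+x(?P<qty>\d+)$")


def split_item(text: str) -> tuple[str, int]:
    """Return (item_name, quantity) for one item string.

    Hypothetical helper: the quantity defaults to 1 when no "x" suffix
    is present, which is an assumption rather than documented behavior.
    """
    text = text.strip()
    match = _QTY_SUFFIX.match(text)
    if match:
        return match.group("name"), int(match.group("qty"))
    return text, 1


if __name__ == "__main__":
    for raw in ("Lumber x6", "Bronze Leggings +1", "Bronze Leggings +1 x2"):
        print(raw, "->", split_item(raw))
```

Anchoring the pattern at the end of the string keeps a `+1` suffix from being mistaken for a quantity, since only an `x` followed by digits is treated as a count.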
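
To inspect a generated v2 CSV, the JSON-encoded columns from the schema table need to be decoded after the CSV layer has been read. The sketch below assumes the column names listed in the schema; `decode_row` is a hypothetical helper, and the sample row is assembled from the example values in the schema table with a made-up `level`, not taken from a real dataset.

```python
import csv
import io
import json

FIELDS = ["category", "level", "subcrafts", "name", "crystal",
          "key_item", "ingredients", "hq_yields"]
JSON_COLUMNS = ("subcrafts", "ingredients", "hq_yields")


def decode_row(row: dict) -> dict:
    """Turn the JSON-list columns into Python lists and level into an int."""
    decoded = dict(row)
    for column in JSON_COLUMNS:
        # An empty cell is treated as an empty list (assumption).
        decoded[column] = json.loads(row[column]) if row[column] else []
    decoded["level"] = int(row["level"])
    return decoded


# Build an in-memory CSV so the example runs without the datasets/ folder.
# Illustrative row only: the level value is a placeholder.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow({
    "category": "Amateur",
    "level": "1",
    "subcrafts": json.dumps([["Smithing", 2], ["Alchemy", 7]]),
    "name": "Arrowwood Lumber",
    "crystal": "Wind",
    "key_item": "",
    "ingredients": json.dumps([["Arrowwood Log", 1]]),
    "hq_yields": json.dumps([["Arrowwood Lumber", 6],
                             ["Arrowwood Lumber", 9],
                             ["Arrowwood Lumber", 12]]),
})
buffer.seek(0)

rows = [decode_row(r) for r in csv.DictReader(buffer)]
for name, qty in rows[0]["ingredients"]:
    print(f"{rows[0]['name']} needs {qty} x {name}")
```

For a real file, replace the in-memory buffer with `open("datasets/Woodworking_v2.csv", newline="")`.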
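
A new script added here will also need the credentials from `db.conf`. How the existing loaders read that file is not documented in this README, so the following is only a sketch under stated assumptions: `parse_db_conf` and `to_connection_kwargs` are hypothetical helpers, lines starting with `#` are assumed to be comments, and the sample values are placeholders.

```python
# Mapping from the db.conf keys listed under Prerequisites to the
# standard libpq connection keywords (host, port, user, password, dbname).
_KEY_MAP = {
    "PSQL_HOST": "host",
    "PSQL_PORT": "port",
    "PSQL_USER": "user",
    "PSQL_PASSWORD": "password",
    "PSQL_DBNAME": "dbname",
}


def parse_db_conf(text: str) -> dict:
    """Parse KEY=VALUE lines; blank lines and '#' comments are skipped (assumption)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def to_connection_kwargs(settings: dict) -> dict:
    """Rename the PSQL_* keys to connection keywords, ignoring unknown keys."""
    return {_KEY_MAP[k]: v for k, v in settings.items() if k in _KEY_MAP}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    sample = "PSQL_HOST=localhost\nPSQL_PORT=5432\nPSQL_DBNAME=example\n"
    print(to_connection_kwargs(parse_db_conf(sample)))
```

Splitting on the first `=` only (via `str.partition`) keeps passwords that themselves contain `=` intact.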