# Recipe ETL Scripts

This directory contains helper scripts for extracting crafting recipe data from
the raw text files under `datasets/` (e.g. **datasets/Woodworking.txt**) and
loading it into the project PostgreSQL database.

## File overview

| File | Purpose |
|------|---------|
| **woodworking_to_csv.py** | Legacy first-pass parser → `datasets/Woodworking.csv`. |
| **woodworking_to_csv_v2.py** | Improved parser that matches the spec (category, level, sub-crafts, ingredients, HQ yields, etc.) → `datasets/Woodworking_v2.csv`. |
| **recipes_to_csv_v2.py** | Generic parser. `python recipes_to_csv_v2.py <Craft>` processes one craft; use `python recipes_to_csv_v2.py --all` **or simply omit the argument** to parse every `.txt` file under `datasets/`, producing `datasets/<Craft>_v2.csv` for each. |
| **load_woodworking_to_db.py** | Loader for the legacy CSV (kept for reference). |
| **load_woodworking_v2_to_db.py** | Drops & recreates the **recipes_woodworking** table and bulk-loads `Woodworking_v2.csv`. |
| **load_recipes_v2_to_db.py** | Generic loader. `python load_recipes_v2_to_db.py <Craft>` loads one craft; omit the argument to load **all** generated CSVs into their respective `recipes_<craft>` tables. |
| **requirements.txt** | Minimal Python dependencies for the scripts. |
| **venv/** | Local virtual environment created by the setup steps below. |
## Prerequisites

* Python ≥ 3.9
* PostgreSQL instance reachable with credentials in `db.conf` at project root:

```ini
PSQL_HOST=…
PSQL_PORT=…
PSQL_USER=…
PSQL_PASSWORD=…
PSQL_DBNAME=…
```
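
How the scripts consume these settings is defined in the scripts themselves; as an illustration only, a minimal reader for this key=value format might look like the sketch below (`psycopg2` and the `../db.conf` path are assumptions, not something this README guarantees):

```python
# Hypothetical helper: parse db.conf (KEY=VALUE lines) and open a connection.
# psycopg2 and the relative path are assumptions; check the scripts for the
# real behaviour.
from pathlib import Path

import psycopg2


def read_db_conf(path: str = "../db.conf") -> dict:
    """Return the PSQL_* settings from db.conf as a plain dict."""
    conf = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            conf[key.strip()] = value.strip()
    return conf


def connect(conf: dict):
    """Open a PostgreSQL connection from the PSQL_* keys."""
    return psycopg2.connect(
        host=conf["PSQL_HOST"],
        port=conf["PSQL_PORT"],
        user=conf["PSQL_USER"],
        password=conf["PSQL_PASSWORD"],
        dbname=conf["PSQL_DBNAME"],
    )
```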
## Quick start (Woodworking example)

```bash
# 1. From project root
cd scripts

# 2. Create & activate virtualenv (only once)
python3 -m venv venv
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Generate CSVs for **all** crafts
python recipes_to_csv_v2.py --all    # or simply `python recipes_to_csv_v2.py`

# 5. Load all crafts into the DB (drops/recreates each table)
python load_recipes_v2_to_db.py
```
To work with a **single craft**, specify its name instead:

```bash
python recipes_to_csv_v2.py Smithing        # generate Smithing_v2.csv
python load_recipes_v2_to_db.py Smithing    # load only Smithing recipes
```
The parser and loader report their progress, e.g.:

```
Wrote 480 recipes -> datasets/Woodworking_v2.csv
Loaded recipes into new recipes_woodworking table.
```
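
As an aid to understanding (not a copy of the script), the drop-and-recreate step of `load_recipes_v2_to_db.py` can be pictured roughly like the sketch below. It assumes `psycopg2`, the hypothetical `read_db_conf`/`connect` helpers sketched under Prerequisites, and column types chosen to fit the CSV schema in the next section; the real loader may differ on all of these points.

```python
# Rough sketch of a v2 loader: drop the per-craft table, recreate it, and
# bulk-load the CSV with COPY. Table layout and types are assumptions based
# on the v2 CSV schema documented below.
def load_craft(conn, craft: str = "woodworking") -> None:
    table = f"recipes_{craft}"
    csv_path = f"../datasets/{craft.capitalize()}_v2.csv"
    with conn, conn.cursor() as cur:
        cur.execute(f"DROP TABLE IF EXISTS {table}")
        cur.execute(f"""
            CREATE TABLE {table} (
                category    text,
                level       integer,
                subcrafts   jsonb,
                name        text,
                crystal     text,
                key_item    text,
                ingredients jsonb,
                hq_yields   jsonb
            )""")
        with open(csv_path, newline="") as fh:
            cur.copy_expert(
                f"COPY {table} FROM STDIN WITH (FORMAT csv, HEADER true)", fh
            )
    print(f"Loaded recipes into new {table} table.")
```

With the helpers above, `load_craft(connect(read_db_conf()))` would produce the second log line for Woodworking.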
## CSV schema (v2)

Column | Notes
------ | -----
`category` | Craft rank without the level range (e.g. "Amateur")
`level` | Recipe level integer
`subcrafts` | JSON list of `[subcraft, level]` pairs, e.g. `[["Smithing",2],["Alchemy",7]]`
`name` | NQ product name
`crystal` | Element used (Wind, Earth, etc.)
`key_item` | Required key item (blank if none)
`ingredients` | JSON list of `[ingredient, quantity]` pairs, e.g. `[["Arrowwood Log",1]]`
`hq_yields` | JSON list of HQ1-HQ3 yields, e.g. `[["Arrowwood Lumber",6],["Arrowwood Lumber",9],["Arrowwood Lumber",12]]`
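
Because the list-valued columns are stored as JSON strings inside the CSV, consumers need to decode them after reading each row. A minimal example (paths and column names follow the schema above; nothing beyond that is implied about the project code):

```python
# Read a generated v2 CSV and decode its JSON-encoded list columns.
import csv
import json

with open("../datasets/Woodworking_v2.csv", newline="", encoding="utf-8") as fh:
    for row in csv.DictReader(fh):
        subcrafts = json.loads(row["subcrafts"]) if row["subcrafts"] else []
        ingredients = json.loads(row["ingredients"]) if row["ingredients"] else []
        hq_yields = json.loads(row["hq_yields"]) if row["hq_yields"] else []
        print(row["name"], row["level"], row["crystal"], ingredients)
```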
## Parsing rules

* Item quantities are detected only when the suffix uses an “x” (e.g. `Lumber x6`).
* Strings such as `Bronze Leggings +1` are treated as the **full item name**; the `+1/+2/+3` suffix is preserved. Both rules are illustrated in the sketch below.
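
For illustration, a regex in the spirit of these rules (a sketch, not the parsers' actual code) splits the quantity off only when an explicit `x<number>` suffix is present, and otherwise keeps the whole string, `+1/+2/+3` included, as the item name:

```python
# Sketch of the quantity rule: "Arrowwood Lumber x6" -> ("Arrowwood Lumber", 6),
# while "Bronze Leggings +1" stays intact with an implied quantity of 1.
import re

_QTY = re.compile(r"^(?P<name>.+?)\s+x(?P<qty>\d+)$")


def split_quantity(raw: str) -> tuple[str, int]:
    match = _QTY.match(raw.strip())
    if match:
        return match.group("name"), int(match.group("qty"))
    return raw.strip(), 1


assert split_quantity("Arrowwood Lumber x6") == ("Arrowwood Lumber", 6)
assert split_quantity("Bronze Leggings +1") == ("Bronze Leggings +1", 1)
```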
## Developing / debugging

* Edit the parsers as needed, then rerun them to regenerate the CSVs.
* Feel free to add new scripts here; remember to update **requirements.txt** & this README.