A parser for SuperU receipts
Superslurp
SuperSlurp is a Utility for Parsing & Extracting Receipts via Savage Layers of Unreadable Regex & Processing
It parses Super U PDF receipts. It can generate JSON from a single receipt, a JSON aggregate from multiple receipts for consumption by other tools, or an HTML report directly.
The parser understands the intricacies of French cuisine: for example, it can tell a Reblochon fermier from a Reblochon laitier, as per the French government's official AOP specification.
Useful when you want to display cheese consumption intensity in €/day inside Grafana, or detect sneaky shrinkflation via fat-content drift on your favorite fromage blanc.
Quick start
Install
pip install superslurp
Run
Generate a report from a directory of PDF receipts:
superu-report receipts/*.pdf -o report.html
Then open report.html in a browser.
Synonyms
Receipts are full of abbreviations (REBL.SAV. for REBLOCHON SAVOIE). About 200 built-in
synonyms cover the common ones. To add your own, create a JSON file mapping
abbreviations to full names:
{
"FAR.FROM": "FARINE DE FROMENT",
"FROM": "FROMAGE"
}
superu-report receipts/*.pdf --synonyms extra.json -o report.html
Order matters: put multi-word abbreviations before their single-word parts. Here
FAR.FROM comes before FROM, so a receipt line FAR.FROM correctly becomes
FARINE DE FROMENT. If FROM came first, it would be replaced by FROMAGE and you'd
end up with flour cheese instead of wheat flour.
Use --no-default-synonyms to disable the built-in set entirely.
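To see why order matters, here is a minimal sketch of ordered, boundary-aware substitution. It assumes synonyms are applied one after another in file order, and that an abbreviation only matches when its alphanumeric edges are not glued to another letter or digit. apply_synonyms is a hypothetical helper, not superslurp's actual implementation.

```python
import json
import re


def apply_synonyms(text: str, synonyms: dict[str, str]) -> str:
    """Replace abbreviations in insertion order (hypothetical helper).

    An alphanumeric edge of an abbreviation must not touch another uppercase
    letter or digit, so FROM matches in "FAR.FROM" (preceded by ".") but not
    inside "FROMENT". Edges that are already punctuation need no guard.
    """
    for abbreviation, full_name in synonyms.items():
        if not abbreviation:
            continue
        prefix = r"(?<![A-Z0-9])" if abbreviation[0].isalnum() else ""
        suffix = r"(?![A-Z0-9])" if abbreviation[-1].isalnum() else ""
        pattern = prefix + re.escape(abbreviation) + suffix
        # A function replacement avoids interpreting backslashes in full_name.
        text = re.sub(pattern, lambda _m, repl=full_name: repl, text)
    return text


# json.loads keeps the key order of the file (Python dicts preserve insertion order).
good = json.loads('{"FAR.FROM": "FARINE DE FROMENT", "FROM": "FROMAGE"}')
bad = json.loads('{"FROM": "FROMAGE", "FAR.FROM": "FARINE DE FROMENT"}')

print(apply_synonyms("FAR.FROM", good))  # FARINE DE FROMENT
print(apply_synonyms("FAR.FROM", bad))   # FAR.FROMAGE
```

With the bad order, FROM is rewritten first, and the longer FAR.FROM key can no longer match because FROMAGE now follows it.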
Parse example
The receipt line REBL.SAV.AOP.FRM.LC BIO BQT.X12 450G 32%MG 8,61 € 11 is parsed as follows (raw is only included with debug=True):
{
"raw": "REBL.SAV.AOP.FRM.LC BIO BQT.X12 450G 32%MG 8,61 € 11", // debug=True
"name": "REBLOCHON",
"price": 8.61,
"bought": 1,
"units": 12,
"grams": 450.0,
"fat_pct": 32.0,
"properties": {
"bio": true,
"milk_treatment": "cru",
"production": "fermier",
"label": "AOP",
"packaging": "BARQUETTE",
"origin": "SAVOIE",
},
"...": "...",
}
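For a feel of the regex layer, here is a minimal sketch that pulls the numeric fields and a few property tokens out of the example line. The patterns and the TOKEN_PROPERTIES table are hypothetical illustrations, with the token meanings (LC for lait cru, FRM for fermier, and so on) guessed from the output above. superslurp's real rules are more extensive, and this sketch does not interpret the trailing 11 or compute bought.

```python
import re

# Hypothetical token -> (property, value) table, inferred from the example output.
TOKEN_PROPERTIES = {
    "BIO": ("bio", True),
    "LC": ("milk_treatment", "cru"),
    "FRM": ("production", "fermier"),
    "AOP": ("label", "AOP"),
    "BQT": ("packaging", "BARQUETTE"),
    "SAV": ("origin", "SAVOIE"),
}


def _number(text: str) -> float:
    """Convert a French-formatted number such as "8,61" to a float."""
    return float(text.replace(",", "."))


def parse_line(raw: str) -> dict:
    """Extract price, pack size, weight, fat content and properties from one line."""
    result: dict = {"raw": raw, "properties": {}}
    if m := re.search(r"(\d+,\d{2})\s*€", raw):
        result["price"] = _number(m.group(1))
    if m := re.search(r"\bX(\d+)\b", raw):
        result["units"] = int(m.group(1))
    if m := re.search(r"(\d+(?:[.,]\d+)?)\s*G\b", raw):
        result["grams"] = _number(m.group(1))
    if m := re.search(r"(\d+(?:[.,]\d+)?)\s*%\s*MG\b", raw):
        result["fat_pct"] = _number(m.group(1))
    # Split on dots and whitespace, then look each token up in the property table.
    for token in re.split(r"[.\s]+", raw):
        if token in TOKEN_PROPERTIES:
            key, value = TOKEN_PROPERTIES[token]
            result["properties"][key] = value
    return result


print(parse_line("REBL.SAV.AOP.FRM.LC BIO BQT.X12 450G 32%MG 8,61 € 11"))
```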
Aggregate output
Products are grouped under a canonical name using fuzzy matching (difflib, threshold 0.90):
{
"stores": [{ "id": "123_456", "store_name": "...", "location": "..." }],
"sessions": [{ "id": 1, "date": "2025-01-15 10:00:00", "store_id": "123_456" }],
"session_totals": [{ "session_id": 1, "date": "2025-01-15", "total": 42.5 }],
"session_category_totals": ["..."],
"category_rolling_averages": ["..."],
"products": [
{
"canonical_name": "OEUFS",
"observations": [
{
"original_name": "OEUFS PLEIN AIR MOYEN",
"session_id": 1,
"price": 3.15,
"unit_count": 12,
"price_per_unit": 0.2625,
"bio": true,
"...": "...",
},
],
},
],
}
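The grouping step can be pictured with difflib.SequenceMatcher: a name joins the first existing group whose canonical name scores at least 0.90 by ratio(), otherwise it starts a new group. This is a minimal sketch assuming greedy first-match grouping; it does not reproduce how superslurp derives canonical names such as OEUFS from OEUFS PLEIN AIR MOYEN.

```python
from difflib import SequenceMatcher


def group_names(names: list[str], threshold: float = 0.90) -> dict[str, list[str]]:
    """Greedily group product names by difflib similarity (illustrative sketch)."""
    groups: dict[str, list[str]] = {}
    for name in names:
        for canonical, members in groups.items():
            if SequenceMatcher(None, name, canonical).ratio() >= threshold:
                members.append(name)
                break
        else:
            # No existing group is similar enough: this name becomes a new canonical name.
            groups[name] = [name]
    return groups


print(group_names(["REBLOCHON", "REBLOCHONS", "OEUFS", "OEUF"]))
# {'REBLOCHON': ['REBLOCHON', 'REBLOCHONS'], 'OEUFS': ['OEUFS'], 'OEUF': ['OEUF']}
```

At 0.90 the cutoff is fairly strict: REBLOCHONS joins REBLOCHON (ratio 18/19, about 0.95), while OEUF stays apart from OEUFS (ratio 8/9, about 0.89).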
Developer guide
Pipeline steps
superu-report runs parse → aggregate → HTML in one shot. During development you can
run each step individually to avoid re-doing everything when iterating on a single
stage:
# 1. Parse a single receipt PDF → JSON
superu-receipt-parser receipt.pdf -o receipt.json
# 2. Aggregate multiple parsed JSONs
superu-aggregate-parsed-receipt receipts/ -o aggregate.json
# 3. Generate report from an existing aggregate
superu-report-from-aggregate aggregate.json -o report.html
# Or pipe step 2 → 3
superu-aggregate-parsed-receipt receipts/ | superu-report-from-aggregate - -o report.html
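The - argument in the last command stands for standard input, a common Unix convention. As a hypothetical sketch (load_aggregate is not part of superslurp), a script of your own that consumes the aggregate can accept either a path or - like this:

```python
import json
import sys


def load_aggregate(path: str) -> dict:
    """Load an aggregate JSON from a file path, or from stdin when path is "-"."""
    if path == "-":
        # Read the piped output of superu-aggregate-parsed-receipt.
        return json.load(sys.stdin)
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Call load_aggregate("aggregate.json") for a file written in step 2, or load_aggregate("-") when the aggregate is piped in.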
Python API
| Function | Description |
|---|---|
| parse_superu_receipt(filename, *, synonyms, debug) | Parse a PDF receipt → Receipt dict |
| compare_receipt_files(paths, *, threshold) | Aggregate parsed JSONs → CompareResult dict |
| generate_report(filenames, *, synonyms) | Parse PDFs + aggregate + render HTML string |
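As an example of consuming the aggregate from another tool, the sketch below sums observed spend per canonical product. It assumes a dict with the shape shown under Aggregate output, loaded from aggregate.json or (presumably) returned by compare_receipt_files, and it treats each observation's price as the amount paid, which is an assumption. spend_per_product is a hypothetical helper that only reads canonical_name and price.

```python
from collections import defaultdict


def spend_per_product(aggregate: dict) -> dict[str, float]:
    """Sum the observed prices of each canonical product in an aggregate dict."""
    totals: defaultdict[str, float] = defaultdict(float)
    for product in aggregate.get("products", []):
        for observation in product.get("observations", []):
            totals[product["canonical_name"]] += observation["price"]
    return dict(totals)


# Made-up sample data in the documented shape (extra fields omitted).
sample = {
    "products": [
        {"canonical_name": "OEUFS", "observations": [{"price": 3.5}, {"price": 2.25}]},
        {"canonical_name": "REBLOCHON", "observations": [{"price": 8.61}]},
    ]
}
print(spend_per_product(sample))  # {'OEUFS': 5.75, 'REBLOCHON': 8.61}
```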
Download files
File details
Details for the file superslurp-0.0.5.tar.gz.
File metadata
- Download URL: superslurp-0.0.5.tar.gz
- Upload date:
- Size: 63.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 72b3a584c30a0dc9d7480796caae76b0bb61832403755882b866bd6b09aeb0d3 |
| MD5 | 86f9995000dbb2c54b4c0d35e53f77fe |
| BLAKE2b-256 | c82ab82f2f88a14c200d9daf0e5ba00980ff77e836c44e30280ded931941478c |
File details
Details for the file superslurp-0.0.5-py3-none-any.whl.
File metadata
- Download URL: superslurp-0.0.5-py3-none-any.whl
- Upload date:
- Size: 55.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7b8d47a17b8e94a06ea05568d245dcd77bf9a115184d17a2d4a811248c2d6da2 |
| MD5 | 9bf7f5d2cc9a4ca3ad48ac6a26a47a08 |
| BLAKE2b-256 | 4f0f3839f68f8d26b0fcfac8a04bb445efa07202bcf70664c3dd849f782c77d1 |
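To check a downloaded file against the digests above, Python's hashlib is enough; BLAKE2b-256 corresponds to blake2b with a 32-byte digest. A minimal sketch, where file_digests is a hypothetical helper:

```python
import hashlib
from pathlib import Path


def file_digests(path: Path) -> dict[str, str]:
    """Return the SHA256, MD5 and BLAKE2b-256 hex digests of a file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


expected = "7b8d47a17b8e94a06ea05568d245dcd77bf9a115184d17a2d4a811248c2d6da2"
wheel = Path("superslurp-0.0.5-py3-none-any.whl")
if wheel.exists():
    # True when the local wheel matches the SHA256 listed above.
    print(file_digests(wheel)["SHA256"] == expected)
```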