A parser for SuperU receipts
Project description
superslurp: Super, Sublime, Light, and Unprecedented Receipt Parser.
A parser for SuperU receipts: it takes the PDF receipt sent by email as input and returns JSON.
Useful when you want to display the instantaneous cheese consumption intensity of your home, in €, in Grafana.
1. Parse a receipt
```python
from superslurp import parse_superu_receipt

result = parse_superu_receipt("Ticket de caisse_01032022-165652.pdf")
```
The receipt line `QUENELLE NATURE U X6 240G / 3 x 0,85 € 2,55 € 11` is parsed as:
```json
{
  "name": "QUENELLE NATURE U",
  "price": 0.85,
  "quantity": 3,
  "units": 6,
  "grams": 240.0,
  "tr": false,
  "way_of_paying": "11",
  "discount": null
}
```
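These fields are enough to recompute a line's total: quantity × unit price, minus any discount. A minimal sketch; the item dict below is hand-copied from the example above, not produced by the library:

```python
# One item in the shape shown above (hand-built for illustration).
items = [
    {"name": "QUENELLE NATURE U", "price": 0.85, "quantity": 3, "discount": None},
]

# Line total = quantity * unit price, minus any discount.
total = sum(i["price"] * i["quantity"] - (i["discount"] or 0) for i in items)
print(f"{total:.2f} €")  # 2.55 €
```

This matches the `2,55 €` total printed on the receipt line.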
Pass `debug=True` to include the original receipt line (`"raw"` field):

```python
from superslurp import parse_superu_receipt

result = parse_superu_receipt("receipt.pdf", debug=True)
```
Pass a `synonyms` dict to expand receipt abbreviations in item names:

```python
from superslurp import parse_superu_receipt

synonyms = {"TABS": "TABLETTES", "VAISS": "VAISSELLE"}
result = parse_superu_receipt("receipt.pdf", synonyms=synonyms)
# "TABS LAVE VAISS.STANDARD U" → "TABLETTES LAVE VAISSELLE STANDARD U"
```
CLI:

```shell
superu-receipt-parser receipt.pdf --synonyms synonyms.json
```
2. Aggregate receipts
Compare items across multiple parsed receipts. Products are grouped under a canonical name using fuzzy matching (via difflib).
```python
from pathlib import Path

from superslurp.compare.aggregate import compare_receipt_files

synonyms = {"TABS": "TABLETTES", "VAISS": "VAISSELLE"}
result = compare_receipt_files(
    paths=[Path("receipt1.json"), Path("receipt2.json")],
    threshold=0.90,  # difflib threshold (default: 0.90)
    synonyms=synonyms,  # optional, same format as parse
)
```
The result contains stores, sessions, per-session totals, a rolling weekly average, and products with their observations:
```json
{
  "stores": [{ "id": "123_456", "store_name": "...", "location": "..." }],
  "sessions": [{ "id": 1, "date": "2025-01-15 10:00:00", "store_id": "123_456" }],
  "session_totals": [{ "session_id": 1, "date": "2025-01-15", "total": 42.5 }],
  "rolling_average": [{ "date": "2025-01-13", "value": 85.3 }, "..."],
  "products": [
    {
      "canonical_name": "OEUFS",
      "observations": [
        {
          "original_name": "OEUFS PLEIN AIR MOYEN",
          "session_id": 1,
          "price": 3.15,
          "quantity": 1,
          "grams": null,
          "discount": null,
          "price_per_kg": null,
          "unit_count": 12,
          "price_per_unit": 0.2625
        }
      ]
    }
  ]
}
```
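Grouping observations under a canonical name makes per-product totals easy to derive downstream. A hedged sketch over a hand-built dict in the shape shown above (not the library's actual output):

```python
# Aggregate dict hand-built in the shape shown above (for illustration).
aggregate = {
    "products": [
        {
            "canonical_name": "OEUFS",
            "observations": [
                {"original_name": "OEUFS PLEIN AIR MOYEN", "session_id": 1,
                 "price": 3.15, "quantity": 1, "discount": None},
            ],
        },
    ],
}

# Total spend per canonical product, summed across all observations.
spend = {
    p["canonical_name"]: sum(
        o["price"] * o["quantity"] - (o["discount"] or 0)
        for o in p["observations"]
    )
    for p in aggregate["products"]
}
print(spend)  # {'OEUFS': 3.15}
```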
CLI:

```shell
superu-aggregate-parsed-receipt receipts/ --synonyms synonyms.json --output aggregate.json
```
3. Generate an HTML report
From PDFs directly
Parse receipt PDFs and generate a self-contained HTML dashboard in one step:
```python
from pathlib import Path

from superslurp import generate_report

synonyms = {"TABS": "TABLETTES", "VAISS": "VAISSELLE"}
html = generate_report(
    ["receipt1.pdf", "receipt2.pdf", "receipt3.pdf"],
    synonyms=synonyms,  # optional
    threshold=0.90,  # fuzzy matching threshold (default: 0.90)
)
Path("report.html").write_text(html)
```
CLI:

```shell
superu-report receipts/*.pdf --synonyms synonyms.json --output report.html
```
From an existing aggregate JSON
If you already have an aggregate JSON (from step 2):
```python
import json
from pathlib import Path

from superslurp.compare.html_report import generate_html

aggregate_result = json.loads(Path("aggregate.json").read_text())
html = generate_html(aggregate_result)
Path("report.html").write_text(html)
```

CLI:

```shell
superu-report-from-aggregate aggregate.json --output report.html
```
Or pipe directly from the aggregate command:

```shell
superu-aggregate-parsed-receipt receipts/ --synonyms synonyms.json \
  | superu-report-from-aggregate - --output report.html
```
Synonyms
The synonyms argument is an ordered `dict[str, str]`. Entries are applied sequentially with
word-boundary matching, so insertion order matters: earlier entries are replaced
first, and later entries won't match words already consumed.
Dots in both item names and keys are normalized to spaces before matching, so the key
`"FROM.BLC"` matches FROM.BLC on the receipt.
```python
synonyms = {
    "FROM.BLC": "FROMAGE BLANC",        # applied 1st: consumes FROM and BLC
    "CHOCO PATIS": "CHOCOLAT PATISSIER",  # applied 2nd: consumes CHOCO and PATIS
    "CHOCO": "CHOCOLAT",                # applied 3rd: only if CHOCO still present
    "FROM": "FROMAGE",                  # applied 4th: only if FROM still present
    "PATIS": "PATISSERIE",              # applied 5th: only if PATIS still present
}
# "FROM.BLC NAT" → "FROMAGE BLANC NAT" (FROM.BLC consumed by 1st)
# "FROM.RAPE" → "FROMAGE RAPE" (FROM consumed by 4th)
# "CHOCO.PATIS.NOIR 52%" → "CHOCOLAT PATISSIER NOIR 52%" (CHOCO PATIS consumed by 2nd)
# "CHOCO.NOIR" → "CHOCOLAT NOIR" (CHOCO consumed by 3rd)
```
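The replacement rules described above can be sketched as follows. This is an illustrative reimplementation, not the library's actual code: dots are normalized to spaces, then each entry is applied in insertion order with regex word boundaries.

```python
import re

def apply_synonyms(name: str, synonyms: dict[str, str]) -> str:
    """Expand abbreviations in an item name (illustrative sketch)."""
    # Dots in both the item name and the keys are normalized to spaces.
    result = name.replace(".", " ")
    for key, replacement in synonyms.items():
        # Word-boundary match so e.g. "FROM" does not fire inside "FROMAGE".
        pattern = r"\b" + re.escape(key.replace(".", " ")) + r"\b"
        result = re.sub(pattern, replacement, result)
    return result

synonyms = {
    "FROM.BLC": "FROMAGE BLANC",
    "CHOCO PATIS": "CHOCOLAT PATISSIER",
    "CHOCO": "CHOCOLAT",
    "FROM": "FROMAGE",
    "PATIS": "PATISSERIE",
}
print(apply_synonyms("FROM.BLC NAT", synonyms))  # FROMAGE BLANC NAT
print(apply_synonyms("FROM.RAPE", synonyms))     # FROMAGE RAPE
```

Note how the word boundaries also prevent later entries from matching inside earlier replacements: once `FROM.BLC` has become `FROMAGE BLANC`, the later `FROM` entry no longer matches.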
The JSON file is a standard object (key order is preserved since Python 3.7):
```json
{
  "FROM.BLC": "FROMAGE BLANC",
  "CHOCO PATIS": "CHOCOLAT PATISSIER",
  "CHOCO": "CHOCOLAT",
  "FROM": "FROMAGE",
  "PATIS": "PATISSERIE"
}
```
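Loading such a file with the standard `json` module keeps that key order, because `json.loads` builds a regular dict and dicts preserve insertion order. A quick illustrative check (inline string instead of a file):

```python
import json

# Inline JSON standing in for the synonyms file.
text = """{
  "FROM.BLC": "FROMAGE BLANC",
  "CHOCO PATIS": "CHOCOLAT PATISSIER",
  "CHOCO": "CHOCOLAT"
}"""
synonyms = json.loads(text)
print(list(synonyms))  # ['FROM.BLC', 'CHOCO PATIS', 'CHOCO']
```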
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file superslurp-0.0.2.tar.gz.
File metadata
- Download URL: superslurp-0.0.2.tar.gz
- Upload date:
- Size: 38.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `4f587c6d6a432ff19a6054e750529ea188914154ed5eacfa3599ec544714704f` |
| MD5 | `d69d50723d2a05bdb89a84814a475caf` |
| BLAKE2b-256 | `b014a0b90101228f5a31f3549a5f83a227bc923d51e219c38462871cb5c6cf8c` |
File details
Details for the file superslurp-0.0.2-py3-none-any.whl.
File metadata
- Download URL: superslurp-0.0.2-py3-none-any.whl
- Upload date:
- Size: 36.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `46fc60dc6e1dcd538f40407daa08e7dcc182bc1332d53d44ed247be2fe213370` |
| MD5 | `aeb0d504f2d1b99eed6257e497e4a446` |
| BLAKE2b-256 | `dbbaabf0c9824e4efe870d9b04647fc18ed1123ce65436f00caa9507d0a23691` |