
superslurp : Super, Sublime, Light, and Unprecedented Receipt Parser

A parser for SuperU receipts: takes the PDF receipt sent by email as input and returns JSON.

Useful when you want to display the instantaneous cheese consumption intensity of your home, in €, inside Grafana.

1. Parse a receipt

from superslurp import parse_superu_receipt

result = parse_superu_receipt("Ticket de caisse_01032022-165652.pdf")

The receipt line QUENELLE NATURE U X6 240G / 3 x 0,85 € 2,55 € 11 is parsed as:

{
  "name": "QUENELLE NATURE U",
  "price": 0.85,
  "bought": 3,
  "units": 6,
  "grams": 240.0,
  "volume_ml": null,
  "tr": false,
  "way_of_paying": "11",
  "discount": null,
  "properties": {}
}
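As an illustration of what these fields encode, the arithmetic of the receipt line can be checked by hand. This is a standalone sketch using the example output above, not part of superslurp's API:

```python
# Sketch: deriving totals from the parsed fields shown above.
# The receipt line "3 x 0,85 € 2,55 €" means 3 packs at 0.85 € each.
item = {
    "name": "QUENELLE NATURE U",
    "price": 0.85,   # price per pack in €
    "bought": 3,     # number of packs bought
    "units": 6,      # pieces per pack (the "X6")
    "grams": 240.0,  # pack weight
}

total = round(item["bought"] * item["price"], 2)           # line total in €
price_per_piece = round(item["price"] / item["units"], 4)  # € per quenelle
price_per_kg = round(item["price"] / item["grams"] * 1000, 2)

print(total, price_per_piece, price_per_kg)
```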

Pass debug=True to include the original receipt line ("raw" field):

from superslurp import parse_superu_receipt

result = parse_superu_receipt("receipt.pdf", debug=True)

Pass a synonyms dict to expand receipt abbreviations in item names:

from superslurp import parse_superu_receipt

synonyms = {"TABS": "TABLETTES", "VAISS": "VAISSELLE"}
result = parse_superu_receipt("receipt.pdf", synonyms=synonyms)
# "TABS LAVE VAISS.STANDARD U" → "TABLETTES LAVE VAISSELLE STANDARD U"

CLI:

superu-receipt-parser receipt.pdf --synonyms synonyms.json

2. Aggregate receipts

Compare items across multiple parsed receipts. Products are grouped under a canonical name using fuzzy matching (via difflib).

from pathlib import Path

from superslurp.compare.aggregate import compare_receipt_files

result = compare_receipt_files(
    paths=[Path("receipt1.json"), Path("receipt2.json")],
    threshold=0.90,       # difflib threshold (default: 0.90)
)

Synonyms are applied at parse time (step 1), so the JSON files fed to compare_receipt_files already contain expanded names.
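To illustrate the kind of grouping involved, here is a minimal standalone sketch of canonical-name grouping with difflib.SequenceMatcher. superslurp's actual matching logic may differ; the first name seen is assumed to become the canonical name:

```python
from difflib import SequenceMatcher

def group_names(names: list[str], threshold: float = 0.90) -> dict[str, list[str]]:
    """Group similar product names under the first-seen canonical name."""
    groups: dict[str, list[str]] = {}
    for name in names:
        for canonical in groups:
            # Similarity ratio in [0, 1]; 1.0 means identical strings.
            if SequenceMatcher(None, canonical, name).ratio() >= threshold:
                groups[canonical].append(name)
                break
        else:
            groups[name] = [name]  # no close match: start a new group
    return groups

names = ["OEUFS PLEIN AIR MOYEN", "OEUFS PLEIN AIR MOYENS", "LAIT DEMI ECREME"]
print(group_names(names))
```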

The result contains stores, sessions, per-session totals, a rolling weekly average, and products with their observations:

{
  "stores": [{ "id": "123_456", "store_name": "...", "location": "..." }],
  "sessions": [{ "id": 1, "date": "2025-01-15 10:00:00", "store_id": "123_456" }],
  "session_totals": [{ "session_id": 1, "date": "2025-01-15", "total": 42.5 }],
  "rolling_average": [{ "date": "2025-01-13", "value": 85.3 }, "..."],
  "products": [
    {
      "canonical_name": "OEUFS",
      "observations": [
        {
          "original_name": "OEUFS PLEIN AIR MOYEN",
          "session_id": 1,
          "price": 3.15,
          "quantity": 1,
          "grams": null,
          "discount": null,
          "price_per_kg": null,
          "volume_ml": null,
          "price_per_liter": null,
          "unit_count": 12,
          "price_per_unit": 0.2625,
          "bio": true
        }
      ]
    }
  ]
}
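Since the aggregate is plain JSON, downstream analysis is straightforward. For example, a standalone sketch (using the structure shown above, with made-up numbers) that sums spend per canonical product:

```python
# Sketch: total spend per canonical product, from the aggregate structure
# documented above. The observation values here are invented for illustration.
aggregate = {
    "products": [
        {
            "canonical_name": "OEUFS",
            "observations": [
                {"session_id": 1, "price": 3.15, "quantity": 1},
                {"session_id": 2, "price": 3.15, "quantity": 2},
            ],
        }
    ]
}

spend = {
    p["canonical_name"]: sum(o["price"] * o["quantity"] for o in p["observations"])
    for p in aggregate["products"]
}
print({name: round(total, 2) for name, total in spend.items()})
```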

CLI:

superu-aggregate-parsed-receipt receipts/ --output aggregate.json

3. Generate an HTML report

From PDFs directly

Parse receipt PDFs and generate a self-contained HTML dashboard in one step:

from superslurp import generate_report

html = generate_report(
    ["receipt1.pdf", "receipt2.pdf", "receipt3.pdf"],
    synonyms=synonyms,    # optional
    threshold=0.90,       # fuzzy matching threshold (default: 0.90)
)
Path("report.html").write_text(html)

CLI:

superu-report receipts/*.pdf --synonyms synonyms.json --output report.html

From an existing aggregate JSON

If you already have an aggregate JSON (from step 2):

from superslurp.compare.html_report import generate_html

html = generate_html(aggregate_result)
Path("report.html").write_text(html)
CLI:

superu-report-from-aggregate aggregate.json --output report.html

Or pipe directly from aggregate:

superu-aggregate-parsed-receipt receipts/ \
  | superu-report-from-aggregate - --output report.html

Synonyms

Synonyms are applied during parsing (step 1); the aggregate step (step 2) only does fuzzy matching on already-expanded names.

The synonyms argument is an ordered dict[str, str]. Entries are applied sequentially with word-boundary matching, so insertion order matters: earlier entries are applied first, and later entries won't match words already consumed by an earlier replacement.

Dots in both names and keys are normalized to spaces before matching, so "FROM.BLC" matches FROM.BLC on the receipt.

synonyms = {
    "FROM.BLC": "FROMAGE BLANC",          # applied 1st: consumes FROM and BLC
    "CHOCO PATIS": "CHOCOLAT PATISSIER",  # applied 2nd: consumes CHOCO and PATIS
    "CHOCO": "CHOCOLAT",                  # applied 3rd: only if CHOCO still present
    "FROM": "FROMAGE",                    # applied 4th: only if FROM still present
    "PATIS": "PATISSERIE",                # applied 5th: only if PATIS still present
}
# "FROM.BLC NAT"         → "FROMAGE BLANC NAT"            (FROM.BLC consumed by 1st)
# "FROM.RAPE"            → "FROMAGE RAPE"                 (FROM consumed by 4th)
# "CHOCO.PATIS.NOIR 52%" → "CHOCOLAT PATISSIER NOIR 52%"  (CHOCO PATIS consumed by 2nd)
# "CHOCO.NOIR"           → "CHOCOLAT NOIR"                (CHOCO consumed by 3rd)
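The replacement order described above can be sketched as a standalone re-implementation. This is assumed behavior inferred from the examples, not superslurp's actual code:

```python
import re

def expand(name: str, synonyms: dict[str, str]) -> str:
    """Sketch: sequential synonym expansion with word-boundary matching.
    Dots in both the name and the keys are normalized to spaces first."""
    result = name.replace(".", " ")
    for short, full in synonyms.items():  # insertion order matters
        pattern = r"\b" + re.escape(short.replace(".", " ")) + r"\b"
        result = re.sub(pattern, full, result)
    return result

synonyms = {
    "FROM.BLC": "FROMAGE BLANC",
    "CHOCO PATIS": "CHOCOLAT PATISSIER",
    "CHOCO": "CHOCOLAT",
    "FROM": "FROMAGE",
    "PATIS": "PATISSERIE",
}
print(expand("FROM.BLC NAT", synonyms))          # FROMAGE BLANC NAT
print(expand("CHOCO.PATIS.NOIR 52%", synonyms))  # CHOCOLAT PATISSIER NOIR 52%
```

Note that once "FROM.BLC" is expanded to "FROMAGE BLANC", the later "FROM" entry no longer matches: \bFROM\b does not match inside "FROMAGE".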

The JSON file is a standard object (key order is preserved since Python 3.7):

{
  "FROM.BLC": "FROMAGE BLANC",
  "CHOCO PATIS": "CHOCOLAT PATISSIER",
  "CHOCO": "CHOCOLAT",
  "FROM": "FROMAGE",
  "PATIS": "PATISSERIE"
}
