
A parser for SuperU receipts

Project description

Superslurp

SuperSlurp is a Utility for Parsing & Extracting Receipts via Savage Layers of Unreadable Regex & Processing

It parses Super U PDF receipts. It can generate a JSON file from a single receipt, a JSON aggregate from multiple receipts for consumption by other tools, or an HTML report directly.

The parser understands the intricacies of French cuisine: for example, it can tell a Reblochon fermier from a laitier, as per the French government's official AOP specification.

Useful when you want to display cheese consumption intensity in €/day inside Grafana, or detect sneaky shrinkflation via fat-content drift on your favorite fromage blanc.

Quick start

Install

pip install superslurp

Run

Generate a report from a directory of PDF receipts:

superu-report receipts/*.pdf -o report.html

Then open report.html in your browser.

Synonyms

Receipts are full of abbreviations (REBL.SAV. for REBLOCHON SAVOIE). ~200 built-in synonyms handle the common ones. To add your own, create a JSON file mapping abbreviations to full names:

{
  "FAR.FROM": "FARINE DE FROMENT",
  "FROM": "FROMAGE"
}
superu-report receipts/*.pdf --synonyms extra.json -o report.html

Order matters — put multi-word abbreviations before their single-word parts. Here FAR.FROM comes before FROM, so a receipt line FAR.FROM correctly becomes FARINE DE FROMENT. If FROM came first, it would be replaced by FROMAGE and you'd end up with flour cheese instead of wheat flour.

Use --no-default-synonyms to disable the built-in set entirely.
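The order-sensitivity above can be modelled as first-match-wins replacement (a minimal illustrative sketch, not the package's actual substitution logic):

```python
def expand(name: str, synonyms: dict[str, str]) -> str:
    """Expand the first matching abbreviation; dict insertion order decides precedence."""
    for abbrev, full in synonyms.items():  # Python dicts preserve insertion order
        if abbrev in name:
            return name.replace(abbrev, full)
    return name

good = {"FAR.FROM": "FARINE DE FROMENT", "FROM": "FROMAGE"}
bad = {"FROM": "FROMAGE", "FAR.FROM": "FARINE DE FROMENT"}

print(expand("FAR.FROM", good))  # FARINE DE FROMENT
print(expand("FAR.FROM", bad))   # FAR.FROMAGE
```

With the multi-word entry first, the abbreviation expands correctly; with it second, the shorter FROM fires first and mangles the name.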

Parse example

REBL.SAV.AOP.FRM.LC BIO BQT.X12 450G 32%MG 8,61 € 11 is parsed as:

{
  "raw": "REBL.SAV.AOP.FRM.LC BIO BQT.X12 450G 32%MG  8,61 €  11", // debug=True
  "name": "REBLOCHON",
  "price": 8.61,
  "bought": 1,
  "units": 12,
  "grams": 450.0,
  "fat_pct": 32.0,
  "properties": {
    "bio": true,
    "milk_treatment": "cru",
    "production": "fermier",
    "label": "AOP",
    "packaging": "BARQUETTE",
    "origin": "SAVOIE"
  },
  "...": "..."
}
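The numeric fields above come out of patterns along these lines (illustrative regexes, not the package's own savage layers):

```python
import re

line = "REBL.SAV.AOP.FRM.LC BIO BQT.X12 450G 32%MG  8,61 €  11"

def fr(num: str) -> float:
    """Parse a French decimal number ('8,61' -> 8.61)."""
    return float(num.replace(",", "."))

units = re.search(r"X(\d+)", line)          # pack count: BQT.X12
grams = re.search(r"(\d+)G\b", line)        # net weight: 450G
fat = re.search(r"(\d+)%MG", line)          # fat content: 32%MG (matière grasse)
price = re.search(r"(\d+,\d{2}) €", line)   # price with French decimal comma

print(int(units.group(1)), fr(grams.group(1)), fr(fat.group(1)), fr(price.group(1)))
# 12 450.0 32.0 8.61
```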

Aggregate output

Products are grouped under a canonical name using fuzzy matching (difflib, threshold 0.90):

{
  "stores": [{ "id": "123_456", "store_name": "...", "location": "..." }],
  "sessions": [{ "id": 1, "date": "2025-01-15 10:00:00", "store_id": "123_456" }],
  "session_totals": [{ "session_id": 1, "date": "2025-01-15", "total": 42.5 }],
  "session_category_totals": ["..."],
  "category_rolling_averages": ["..."],
  "products": [
    {
      "canonical_name": "OEUFS",
      "observations": [
        {
          "original_name": "OEUFS PLEIN AIR MOYEN",
          "session_id": 1,
          "price": 3.15,
          "unit_count": 12,
          "price_per_unit": 0.2625,
          "bio": true,
          "...": "..."
        }
      ]
    }
  ]
}
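A rough model of that grouping, assuming each new name is compared against the first-seen name of every existing group (the package's canonicalisation may be more involved):

```python
from difflib import SequenceMatcher

def group_products(names: list[str], threshold: float = 0.90) -> dict[str, list[str]]:
    """Bucket observed names under the first sufficiently similar canonical name."""
    groups: dict[str, list[str]] = {}
    for name in names:
        for canonical in groups:
            if SequenceMatcher(None, canonical, name).ratio() >= threshold:
                groups[canonical].append(name)
                break
        else:  # no group matched: this name starts a new one
            groups[name] = [name]
    return groups

names = ["OEUFS PLEIN AIR MOYEN", "OEUFS PLEIN AIR MOYENS", "FROMAGE BLANC"]
print(group_products(names))
```

At a 0.90 threshold, the two egg variants land in one group while FROMAGE BLANC starts its own.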

Developer guide

Pipeline steps

superu-report runs parse → aggregate → HTML in one shot. During development you can run each step individually to avoid re-doing everything when iterating on a single stage:

# 1. Parse a single receipt PDF → JSON
superu-receipt-parser receipt.pdf -o receipt.json

# 2. Aggregate multiple parsed JSONs
superu-aggregate-parsed-receipt receipts/ -o aggregate.json

# 3. Generate report from an existing aggregate
superu-report-from-aggregate aggregate.json -o report.html

# Or pipe step 2 → 3
superu-aggregate-parsed-receipt receipts/ | superu-report-from-aggregate - -o report.html

Python API

Function                                            Description
parse_superu_receipt(filename, *, synonyms, debug)  Parse a PDF receipt → Receipt dict
compare_receipt_files(paths, *, threshold)          Aggregate parsed JSONs → CompareResult dict
generate_report(filenames, *, synonyms)             Parse PDFs + aggregate + render HTML string
