
A parser for SuperU receipts

Project description

Superslurp

SuperSlurp is a Utility for Parsing & Extracting Receipts via Savage Layers of Unreadable Regex & Processing

It parses Super U PDF receipts. It can generate a JSON file from a single receipt, a JSON aggregate from multiple receipts for consumption by other tools, or an HTML report directly.

The parser understands the intricacies of French cuisine: for example, it can tell a Reblochon fermier from a laitier, as per the French government's official AOP specification.

Useful when you want to display cheese consumption intensity in €/day inside Grafana, or detect sneaky shrinkflation via fat-content drift on your favorite fromage blanc.

Quick start

Install

pip install superslurp

Run

Generate a report from a directory of PDF receipts:

superu-report receipts/*.pdf -o report.html

Then open report.html in your browser.

Synonyms

Receipts are full of abbreviations (REBL.SAV. for REBLOCHON SAVOIE). ~200 built-in synonyms handle the common ones. To add your own, create a JSON file mapping abbreviations to full names:

{
  "FAR.FROM": "FARINE DE FROMENT",
  "FROM": "FROMAGE"
}
superu-report receipts/*.pdf --synonyms extra.json -o report.html

Order matters — put multi-word abbreviations before their single-word parts. Here FAR.FROM comes before FROM, so a receipt line FAR.FROM correctly becomes FARINE DE FROMENT. If FROM came first, it would be replaced by FROMAGE and you'd end up with flour cheese instead of wheat flour.

Use --no-default-synonyms to disable the built-in set entirely.
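The order-sensitivity above can be modelled as first-match-wins replacement (a minimal illustrative sketch, not the package's actual substitution logic):

```python
def expand(name: str, synonyms: dict[str, str]) -> str:
    """Expand the first matching abbreviation; dict insertion order decides precedence."""
    for abbrev, full in synonyms.items():  # Python dicts preserve insertion order
        if abbrev in name:
            return name.replace(abbrev, full)
    return name

good = {"FAR.FROM": "FARINE DE FROMENT", "FROM": "FROMAGE"}
bad = {"FROM": "FROMAGE", "FAR.FROM": "FARINE DE FROMENT"}

print(expand("FAR.FROM", good))  # FARINE DE FROMENT
print(expand("FAR.FROM", bad))   # FAR.FROMAGE
```

With the multi-word entry first, the abbreviation expands correctly; with it second, the shorter FROM fires first and mangles the name.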

Parse example

REBL.SAV.AOP.FRM.LC BIO BQT.X12 450G 32%MG 8,61 € 11 is parsed as:

{
  "raw": "REBL.SAV.AOP.FRM.LC BIO BQT.X12 450G 32%MG  8,61 €  11", // debug=True
  "name": "REBLOCHON",
  "price": 8.61,
  "bought": 1,
  "units": 12,
  "grams": 450.0,
  "fat_pct": 32.0,
  "properties": {
    "bio": true,
    "milk_treatment": "cru",
    "production": "fermier",
    "label": "AOP",
    "packaging": "BARQUETTE",
    "origin": "SAVOIE"
  },
  "...": "..."
}
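The numeric fields above come out of patterns along these lines (illustrative regexes, not the package's own savage layers):

```python
import re

line = "REBL.SAV.AOP.FRM.LC BIO BQT.X12 450G 32%MG  8,61 €  11"

def fr(num: str) -> float:
    """Parse a French decimal number ('8,61' -> 8.61)."""
    return float(num.replace(",", "."))

units = re.search(r"X(\d+)", line)          # pack count: BQT.X12
grams = re.search(r"(\d+)G\b", line)        # net weight: 450G
fat = re.search(r"(\d+)%MG", line)          # fat content: 32%MG (matière grasse)
price = re.search(r"(\d+,\d{2}) €", line)   # price with French decimal comma

print(int(units.group(1)), fr(grams.group(1)), fr(fat.group(1)), fr(price.group(1)))
# 12 450.0 32.0 8.61
```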

Aggregate output

Products are grouped under a canonical name using fuzzy matching (difflib, threshold 0.90):

{
  "stores": [{ "id": "123_456", "store_name": "...", "location": "..." }],
  "sessions": [{ "id": 1, "date": "2025-01-15 10:00:00", "store_id": "123_456" }],
  "session_totals": [{ "session_id": 1, "date": "2025-01-15", "total": 42.5 }],
  "session_category_totals": ["..."],
  "category_rolling_averages": ["..."],
  "products": [
    {
      "canonical_name": "OEUFS",
      "observations": [
        {
          "original_name": "OEUFS PLEIN AIR MOYEN",
          "session_id": 1,
          "price": 3.15,
          "unit_count": 12,
          "price_per_unit": 0.2625,
          "bio": true,
          "...": "..."
        }
      ]
    }
  ]
}
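A rough model of that grouping, assuming each new name is compared against the first-seen name of every existing group (the package's canonicalisation may be more involved):

```python
from difflib import SequenceMatcher

def group_products(names: list[str], threshold: float = 0.90) -> dict[str, list[str]]:
    """Bucket observed names under the first sufficiently similar canonical name."""
    groups: dict[str, list[str]] = {}
    for name in names:
        for canonical in groups:
            if SequenceMatcher(None, canonical, name).ratio() >= threshold:
                groups[canonical].append(name)
                break
        else:  # no group matched: this name starts a new one
            groups[name] = [name]
    return groups

names = ["OEUFS PLEIN AIR MOYEN", "OEUFS PLEIN AIR MOYENS", "FROMAGE BLANC"]
print(group_products(names))
```

At a 0.90 threshold, the two egg variants land in one group while FROMAGE BLANC starts its own.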

Developer guide

Pipeline steps

superu-report runs parse → aggregate → HTML in one shot. During development you can run each step individually to avoid re-doing everything when iterating on a single stage:

# 1. Parse a single receipt PDF → JSON
superu-receipt-parser receipt.pdf -o receipt.json

# 2. Aggregate multiple parsed JSONs
superu-aggregate-parsed-receipt receipts/ -o aggregate.json

# 3. Generate report from an existing aggregate
superu-report-from-aggregate aggregate.json -o report.html

# Or pipe step 2 → 3
superu-aggregate-parsed-receipt receipts/ | superu-report-from-aggregate - -o report.html

Python API

Function                                            Description
parse_superu_receipt(filename, *, synonyms, debug)  Parse a PDF receipt → Receipt dict
compare_receipt_files(paths, *, threshold)          Aggregate parsed JSONs → CompareResult dict
generate_report(filenames, *, synonyms)             Parse PDFs + aggregate + render HTML string
