A parser for SuperU receipts


superslurp : Super, Sublime, Light, and Unprecedented Receipt Parser

Parser for SuperU receipts. It takes the PDF receipt sent by email as input and returns JSON.

Useful when you want to display the instantaneous cheese consumption intensity of your home, in €, in Grafana.

1. Parse a receipt

from superslurp import parse_superu_receipt

result = parse_superu_receipt("Ticket de caisse_01032022-165652.pdf")

The parser understands the intricacies of French cheese: AOP designation, fermier vs. laitier production, and milk treatment, as defined by the official AOP specifications.

The receipt line REBL.SAVE.AOP.FRM.LC BIO BQT.X12 450G 32%MG 8,61 € 11 is parsed as:

{
  "name": "REBLOCHON",
  "price": 8.61,
  "bought": 1,
  "units": 12,
  "grams": 450.0,
  "volume_ml": null,
  "fat_pct": 32.0,
  "tr": false,
  "way_of_paying": "11",
  "discount": null,
  "properties": {
    "bio": true,
    "milk_treatment": "cru",
    "production": "fermier",
    "label": "AOP",
    "packaging": "BARQUETTE",
    "origin": "SAVOIE"
  }
}
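To show how the numeric fields and properties above can be read off that line, here is a minimal regex sketch. It is not superslurp's implementation: `parse_line`, `PROPERTY_TOKENS` and `PRICE_RE` are hypothetical names, the token table is inferred only from this one example, and name expansion (REBL → REBLOCHON) and the `bought` count are left out.

```python
import re

# Hypothetical token table, inferred only from the REBLOCHON example above.
PROPERTY_TOKENS = {
    "BIO": ("bio", True),
    "LC": ("milk_treatment", "cru"),
    "FRM": ("production", "fermier"),
    "AOP": ("label", "AOP"),
    "BQT": ("packaging", "BARQUETTE"),
    "SAVE": ("origin", "SAVOIE"),
}

# Trailing "<price> € <payment code>", where the price uses a decimal comma.
PRICE_RE = re.compile(r"(\d+,\d{2})\s*€\s*(\d+)\s*$")


def parse_line(line: str) -> dict:
    """Extract price, pack size, weight, fat and property flags from one line."""
    result: dict = {"units": None, "grams": None, "fat_pct": None, "properties": {}}
    price_match = PRICE_RE.search(line)
    if price_match:
        result["price"] = float(price_match.group(1).replace(",", "."))
        result["way_of_paying"] = price_match.group(2)
        line = line[: price_match.start()]
    # Dots separate abbreviations on the receipt, so treat them as spaces.
    for token in line.replace(".", " ").split():
        if m := re.fullmatch(r"X(\d+)", token):  # X12 -> 12 units
            result["units"] = int(m.group(1))
        elif m := re.fullmatch(r"(\d+(?:,\d+)?)G", token):  # 450G -> 450.0 g
            result["grams"] = float(m.group(1).replace(",", "."))
        elif m := re.fullmatch(r"(\d+(?:,\d+)?)%MG", token):  # 32%MG -> 32.0 %
            result["fat_pct"] = float(m.group(1).replace(",", "."))
        elif token in PROPERTY_TOKENS:
            key, value = PROPERTY_TOKENS[token]
            result["properties"][key] = value
    return result
```

Parsing the tail first and then cutting it off keeps the price and payment code from being mistaken for other numeric tokens.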

Pass debug=True to include the original receipt line ("raw" field):

from superslurp import parse_superu_receipt

result = parse_superu_receipt("receipt.pdf", debug=True)

The parser ships with built-in synonyms that expand common receipt abbreviations (e.g. TABS → TABLETTES, VAISS → VAISSELLE). You can provide extra synonyms that are merged on top of the defaults:

from superslurp import parse_superu_receipt

extra = {"CUSTOM_ABBREV": "CUSTOM EXPANSION"}
result = parse_superu_receipt("receipt.pdf", synonyms=extra)

CLI:

# Uses built-in synonyms (default)
superu-receipt-parser receipt.pdf

# Merge extra synonyms on top of built-in defaults
superu-receipt-parser receipt.pdf --synonyms extra.json

# Disable built-in synonyms entirely — only use your own file
superu-receipt-parser receipt.pdf --no-default-synonyms --synonyms my_synonyms.json
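The interaction between the built-in table, `--synonyms` and `--no-default-synonyms` can be modelled with a plain dict merge. This is a sketch under assumptions: `resolve_synonyms` is a hypothetical helper, not part of superslurp, and it assumes the merge behaves like `dict.update` (user entries win on conflict), which matches the behaviour described above but may differ in how the real code orders merged keys.

```python
from __future__ import annotations

import json


def resolve_synonyms(
    defaults: dict[str, str],
    extra_json: str | None = None,
    use_defaults: bool = True,
) -> dict[str, str]:
    """Hypothetical model of how --synonyms and --no-default-synonyms combine."""
    # Start from a copy of the built-in table, or from nothing.
    merged: dict[str, str] = dict(defaults) if use_defaults else {}
    if extra_json is not None:
        # json.loads returns a dict that keeps the file's key order.
        extra = json.loads(extra_json)
        # dict.update: user values win on conflict. New keys are appended
        # after the defaults; overridden keys keep their original position.
        merged.update(extra)
    return merged
```

The caller would pass the contents of `extra.json` as `extra_json`. Because a merged user entry lands after the defaults (or in a default's old slot), a user who needs a specific expansion order has to disable the defaults.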

2. Aggregate receipts

Compare items across multiple parsed receipts. Products are grouped under a canonical name using fuzzy matching (via difflib).

from pathlib import Path

from superslurp.compare.aggregate import compare_receipt_files

result = compare_receipt_files(
    paths=[Path("receipt1.json"), Path("receipt2.json")],
    threshold=0.90,       # difflib threshold (default: 0.90)
)

Synonyms are applied at parse time (step 1), so the JSON files fed to compare_receipt_files already contain expanded names.
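A greedy grouping with `difflib.SequenceMatcher` illustrates what a similarity threshold of 0.90 means. This is only a sketch: `group_names` is hypothetical, and since the real output below uses the shorter "OEUFS" as canonical name, superslurp clearly derives canonical names in a way this sketch does not.

```python
from difflib import SequenceMatcher


def group_names(names: list[str], threshold: float = 0.90) -> dict[str, list[str]]:
    """Greedy grouping: each name joins the first canonical name whose
    SequenceMatcher ratio reaches the threshold, else starts a new group."""
    groups: dict[str, list[str]] = {}
    for name in names:
        for canonical in groups:
            if SequenceMatcher(None, name, canonical).ratio() >= threshold:
                groups[canonical].append(name)
                break
        else:
            # No existing group was similar enough: this name becomes canonical.
            groups[name] = [name]
    return groups
```

For example, "OEUFS PLEIN AIR MOYEN" and "OEUFS PLEIN AIR MOYENS" share 21 characters out of 21 + 22, a ratio of 42/43 ≈ 0.977, so they fall in the same group, while "FROMAGE RAPE" starts its own.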

The result contains stores, sessions, per-session totals, a rolling weekly average, and products with their observations:

{
  "stores": [{ "id": "123_456", "store_name": "...", "location": "..." }],
  "sessions": [{ "id": 1, "date": "2025-01-15 10:00:00", "store_id": "123_456" }],
  "session_totals": [{ "session_id": 1, "date": "2025-01-15", "total": 42.5 }],
  "rolling_average": [{ "date": "2025-01-13", "value": 85.3 }, "..."],
  "products": [
    {
      "canonical_name": "OEUFS",
      "observations": [
        {
          "original_name": "OEUFS PLEIN AIR MOYEN",
          "session_id": 1,
          "price": 3.15,
          "quantity": 1,
          "grams": null,
          "discount": null,
          "price_per_kg": null,
          "volume_ml": null,
          "price_per_liter": null,
          "unit_count": 12,
          "price_per_unit": 0.2625,
          "bio": true
        }
      ]
    }
  ]
}
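The normalized prices in each observation are simple ratios. In the example, `price_per_unit` is 3.15 / 12 = 0.2625. The sketch below computes all three; `derived_prices` is a hypothetical helper, and it assumes `price` is the price of a single bought item with discounts ignored, which the source does not specify.

```python
from __future__ import annotations


def derived_prices(
    price: float,
    grams: float | None = None,
    volume_ml: float | None = None,
    unit_count: int | None = None,
) -> dict[str, float | None]:
    """Normalize a line's price by weight, volume and unit count when known."""
    return {
        # grams -> kg and ml -> L; a missing (None) or zero value yields None.
        "price_per_kg": price / (grams / 1000) if grams else None,
        "price_per_liter": price / (volume_ml / 1000) if volume_ml else None,
        "price_per_unit": price / unit_count if unit_count else None,
    }
```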

CLI:

superu-aggregate-parsed-receipt receipts/ --output aggregate.json

3. Generate an HTML report

From PDFs directly

Parse receipt PDFs and generate a self-contained HTML dashboard in one step:

from pathlib import Path

from superslurp import generate_report

synonyms = {"TABS": "TABLETTES", "VAISS": "VAISSELLE"}
html = generate_report(
    ["receipt1.pdf", "receipt2.pdf", "receipt3.pdf"],
    synonyms=synonyms,    # optional
    threshold=0.90,       # fuzzy matching threshold (default: 0.90)
)
Path("report.html").write_text(html, encoding="utf-8")

CLI:

superu-report receipts/*.pdf --output report.html
superu-report receipts/*.pdf --synonyms extra.json --output report.html

From an existing aggregate JSON

If you already have an aggregate JSON (from step 2):

from pathlib import Path

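from superslurp.compare.html_report import generate_html

# aggregate_result: the aggregate from step 2 (the value returned by compare_receipt_files)
html = generate_html(aggregate_result)
Path("report.html").write_text(html, encoding="utf-8")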

CLI:

superu-report-from-aggregate aggregate.json --output report.html

Or pipe the aggregate output straight into the report command:

superu-aggregate-parsed-receipt receipts/ \
  | superu-report-from-aggregate - --output report.html
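The `-` argument in the pipe above follows the usual Unix convention of "read from standard input". A minimal sketch of that convention, assuming it is what `superu-report-from-aggregate` does; `load_aggregate` is a hypothetical helper, not part of superslurp.

```python
import json
import sys
from pathlib import Path


def load_aggregate(source: str) -> dict:
    """Load aggregate JSON from a file path, or from stdin when source is "-"."""
    if source == "-":
        # Conventional meaning of "-": read the stream piped into the command.
        return json.load(sys.stdin)
    return json.loads(Path(source).read_text(encoding="utf-8"))
```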

Synonyms

Synonyms are applied during parsing (step 1) — the aggregate step (step 2) only does fuzzy matching on already-expanded names.

The package ships with built-in synonyms for ~200 common Super U receipt abbreviations. Extra synonyms passed via --synonyms are merged on top (user entries take precedence on conflict). Use --no-default-synonyms to disable the built-in set entirely — this is useful when you need full control over expansion order, since insertion order matters (see below).

The synonyms mapping is an ordered dict[str, str]. Entries are applied sequentially with word-boundary matching, so insertion order matters: earlier entries are replaced first, and later entries won't match words already consumed.

Dots in both names and keys are normalized to spaces before matching, so "FROM.BLC" matches FROM.BLC on the receipt.

synonyms = {
    "FROM.BLC": "FROMAGE BLANC",          # applied 1st: consumes FROM and BLC
    "CHOCO PATIS": "CHOCOLAT PATISSIER",  # applied 2nd: consumes CHOCO and PATIS
    "CHOCO": "CHOCOLAT",                  # applied 3rd: only if CHOCO still present
    "FROM": "FROMAGE",                    # applied 4th: only if FROM still present
    "PATIS": "PATISSERIE",                # applied 5th: only if PATIS still present
}
# "FROM.BLC NAT"         → "FROMAGE BLANC NAT"            (FROM.BLC consumed by 1st)
# "FROM.RAPE"            → "FROMAGE RAPE"                 (FROM consumed by 4th)
# "CHOCO.PATIS.NOIR 52%" → "CHOCOLAT PATISSIER NOIR 52%"  (CHOCO PATIS consumed by 2nd)
# "CHOCO.NOIR"           → "CHOCOLAT NOIR"                (CHOCO consumed by 3rd)
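The ordering rules above can be reproduced with `re.sub` and `\b` word boundaries. This is a sketch of the described behaviour, not superslurp's code: `expand` is a hypothetical name, and it assumes keys start and end with word characters (letters or digits), since `\b` behaves differently around symbols such as `%`.

```python
import re


def expand(name: str, synonyms: dict[str, str]) -> str:
    """Apply synonyms in insertion order with word-boundary matching.

    Dots become spaces in both the name and the keys before matching.
    """
    text = name.replace(".", " ")
    for key, expansion in synonyms.items():
        pattern = r"\b" + re.escape(key.replace(".", " ")) + r"\b"
        # A function replacement inserts the expansion literally (no escapes).
        text = re.sub(pattern, lambda _m: expansion, text)
    return text
```

With the synonyms dict above, this yields the four results listed in the comments. Swapping the order shows why it matters: with `{"FROM": "FROMAGE", "FROM.BLC": "FROMAGE BLANC"}`, "FROM.BLC NAT" becomes "FROMAGE BLC NAT", because FROM is expanded before the two-word key gets a chance to match.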

The --synonyms JSON file is a plain object. Its key order survives loading because Python dicts keep insertion order (guaranteed since Python 3.7):

{
  "FROM.BLC": "FROMAGE BLANC",
  "CHOCO PATIS": "CHOCOLAT PATISSIER",
  "CHOCO": "CHOCOLAT",
  "FROM": "FROMAGE",
  "PATIS": "PATISSERIE"
}


Release history

0.0.5: Feb 28, 2026
0.0.4 (this version): Feb 27, 2026
0.0.3: Feb 27, 2026
0.0.2: Feb 23, 2026
0.0.1: Feb 23, 2025
0.0.0: Feb 23, 2025

Download files


Source Distribution

superslurp-0.0.4.tar.gz (61.3 kB)

Uploaded Feb 27, 2026 Source

Built Distribution


superslurp-0.0.4-py3-none-any.whl (53.6 kB)

Uploaded Feb 27, 2026 Python 3

File details

Details for the file superslurp-0.0.4.tar.gz.

File metadata

Download URL: superslurp-0.0.4.tar.gz
Upload date: Feb 27, 2026
Size: 61.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for superslurp-0.0.4.tar.gz
Algorithm	Hash digest
SHA256	`f5974b909be844349d07e5d4b0c059193a8441621b92b6bc84eab0a3b06ae64f`
MD5	`0a115f42b72919f2a7b003de86ee0e24`
BLAKE2b-256	`e23050da7ed05a41a9ff1a3c278ba6afe108bfaa553dbe35864849ad84a21be0`


File details

Details for the file superslurp-0.0.4-py3-none-any.whl.

File metadata

Download URL: superslurp-0.0.4-py3-none-any.whl
Upload date: Feb 27, 2026
Size: 53.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for superslurp-0.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`553d1e66405a314d0d3d7902afea342a974eedb5e2f99479f3d6a1c1bf22efee`
MD5	`7c4164101eba37f1169c6afd66f5bf2d`
BLAKE2b-256	`4ea8cb66277394ce22e0e47d6aa55f2ab03b1451b26d1f87997d62c35656edee`

