Watch a scans folder and auto-rename receipt images/PDFs (and course/document scans) from their OCR'd text.

receipt-renamer

Point it at the folder your scanner app drops files into. It OCRs each receipt image or PDF and renames the file to something searchable:

Receipt 2024-03-14 1042 Safeway SNAP EBT Adobe Scan.jpg

Non-receipt scans (lecture notes, handouts) are handled by a separate rule layer: it reads the page-top title and expands course codes into every form you might later search for, so CHE2A Midterm Review.pdf becomes:

CHE2A Midterm Review CHE 2A Chem 2A Genius Scan.pdf

and OChem Lecture Notes.pdf becomes:

OChem Lecture Notes Organic Chemistry CHE8A CHE 8A Chem 8A.pdf
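
As a rough illustration of the generic DEPT + number expansion, here is a minimal sketch. The `SUBJECT_MAP` contents, the regex, and the `expand_course_codes` helper are assumptions made for this example, not the package's actual code; scanner-app detection is left out.

```python
import re

# Hypothetical subject map: department prefix -> short subject name.
SUBJECT_MAP = {"CHE": "Chem"}

# Generic "DEPT + number" pattern, e.g. CHE2A or CHE 2A (an assumed regex).
COURSE_RE = re.compile(r"\b([A-Z]{2,4})\s?(\d{1,3}[A-Z]?)\b")


def expand_course_codes(title: str) -> list[str]:
    """Return alias spellings for course codes in `title` that it lacks."""
    aliases: list[str] = []
    for dept, number in COURSE_RE.findall(title):
        candidates = [f"{dept}{number}", f"{dept} {number}"]
        if dept in SUBJECT_MAP:
            candidates.append(f"{SUBJECT_MAP[dept]} {number}")
        for alias in candidates:
            # Skip spellings the title already contains, and duplicates.
            if alias not in title and alias not in aliases:
                aliases.append(alias)
    return aliases


title = "CHE2A Midterm Review"
print(" ".join([title, *expand_course_codes(title)]))
# prints: CHE2A Midterm Review CHE 2A Chem 2A
```

Explicit rules such as OChem → Organic Chemistry would need a separate lookup table, since no department code appears in that title.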

Everything — the store list, the SNAP/EBT patterns, the receipt heuristics, the scanner-app markers, the course rules, and the filename templates — lives in one YAML file you can edit.

How the name is built

Receipts use the template:

Receipt {datetime} {store} {SNAP EBT if present} {scanner-app} {your notes}

{datetime} — parsed from the receipt (YYYY-MM-DD HHMM); many date/time formats are understood (US MM/DD/YYYY, ISO YYYY-MM-DD, Jan 5, 2024, 12h/24h times). If no date is found it falls back to Receipt {store} ….
{store} — matched against a known-store list (Safeway, Costco, Trader Joe's, Whole Foods, Target, Walmart, Kroger, CVS, Walgreens, Aldi, Sprouts, and more — all editable).
SNAP EBT — inserted only when a SNAP/EBT line is detected.
{scanner-app} — detected from the OCR text or the original filename (Genius Scan, Adobe Scan, CamScanner, Microsoft Lens, …).
{your notes} — anything you pass with --notes.

Empty fields collapse cleanly — no double spaces, no dangling separators.
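
The collapsing behaviour can be sketched in a few lines. This is an illustrative helper (`build_receipt_name` and its parameters are hypothetical names), assuming the fields have already been extracted from the OCR text:

```python
from datetime import datetime
from typing import Optional


def build_receipt_name(
    when: Optional[datetime],
    store: Optional[str],
    snap: bool,
    scanner_app: Optional[str],
    notes: Optional[str],
    ext: str,
) -> str:
    """Join the non-empty fields with single spaces and append `ext`."""
    fields = [
        "Receipt",
        when.strftime("%Y-%m-%d %H%M") if when else "",
        store or "",
        "SNAP EBT" if snap else "",
        scanner_app or "",
        notes or "",
    ]
    # Drop empty fields, then split/join to squeeze repeated whitespace
    # that may sit inside a field (for example in the notes).
    stem = " ".join(" ".join(f for f in fields if f).split())
    return f"{stem}{ext}"


print(build_receipt_name(
    datetime(2024, 3, 14, 10, 42), "Safeway", True, "Adobe Scan", None, ".jpg"
))
# prints: Receipt 2024-03-14 1042 Safeway SNAP EBT Adobe Scan.jpg
```

Filtering before joining is what keeps a missing date or store from leaving a double space behind.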

Documents (anything not detected as a receipt) use:

{title} {course-code aliases} {scanner-app} {your notes}

Install

Requires the Tesseract OCR engine on your PATH, plus poppler if you want to OCR PDFs.

# macOS
brew install tesseract poppler
# Debian/Ubuntu
sudo apt-get install tesseract-ocr poppler-utils

# From PyPI
pip install receipt-renamer
# Or, from a source checkout (editable install)
pip install -e .

Usage

# Dry run over a folder (prints planned renames, changes nothing):
receipt-renamer batch ~/Scans

# Apply the renames:
receipt-renamer batch ~/Scans --commit

# Move renamed files into a tidy archive instead of renaming in place:
receipt-renamer batch ~/Scans --commit --dest ~/Receipts

# One file, with a note:
receipt-renamer one ~/Scans/IMG_0001.jpg --commit --notes "reimburse work"

# Watch the folder and rename new scans as they arrive:
receipt-renamer watch ~/Scans --commit

# Print the default config so you can copy and tweak it:
receipt-renamer dump-config > my-rules.yaml
receipt-renamer batch ~/Scans --config my-rules.yaml --commit

Dry run is the default. Nothing is renamed until you pass --commit.
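
To make the dry-run contract concrete, here is a minimal sketch of a plan-then-commit rename with a collision-safe target. The function names and the " (2)" suffix style are assumptions for illustration; the package may number collisions differently.

```python
from pathlib import Path


def collision_safe(target: Path) -> Path:
    """Return `target`, or a numbered variant if that name is taken."""
    if not target.exists():
        return target
    n = 2
    while True:
        candidate = target.with_name(f"{target.stem} ({n}){target.suffix}")
        if not candidate.exists():
            return candidate
        n += 1


def apply_rename(source: Path, new_name: str, commit: bool = False) -> Path:
    """Plan a rename; touch the filesystem only when `commit` is True."""
    target = collision_safe(source.with_name(new_name))
    print(f"{'RENAME' if commit else 'DRY RUN'}: {source.name} -> {target.name}")
    if commit:
        source.rename(target)
    return target
```

Note that the existence check and the rename are separate steps, so this sketch is not safe against another process creating the same name in between.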

Configuration

Run receipt-renamer dump-config to see the full annotated default. Highlights:

stores — canonical name + aliases/spellings. Whole-word matching, so Target won't match "targeting". Longest matching alias wins.
snap_patterns — regexes that flag a SNAP/EBT receipt.
receipt_signals / receipt_min_signals — a page is a receipt if a known store is found, or if it hits at least this many signals (TOTAL, TAX, $x.xx, card brands, …). This catches receipts from stores not in your list.
courses — both a generic DEPT + number pattern (with a subject_map so CHE → Chem) and explicit rules (OChem → Organic Chemistry + the CHE8A/CHE 8A/Chem 8A family).
templates / datetime_format — the output filename shapes.
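
The store and receipt-signal rules above could be sketched roughly as follows. The rule table here is illustrative only (the "Costco Gas" entry in particular is invented to show longest-alias-wins), and `match_store` / `is_receipt` are hypothetical names rather than the package's API.

```python
import re
from typing import Optional

# Illustrative rule table; the real defaults come from dump-config.
STORES = {
    "Target": ["Target"],
    "Costco": ["Costco", "Costco Wholesale"],
    "Costco Gas": ["Costco Gas"],
}
RECEIPT_SIGNALS = [r"\bTOTAL\b", r"\bTAX\b", r"\$\d+\.\d{2}", r"\b(VISA|MASTERCARD)\b"]
RECEIPT_MIN_SIGNALS = 2


def match_store(text: str) -> Optional[str]:
    """Return the canonical store whose longest alias appears as a whole word."""
    best = None  # (alias length, canonical name)
    for canonical, aliases in STORES.items():
        for alias in aliases:
            # Lookarounds enforce whole-word matching even for aliases
            # that contain spaces or apostrophes.
            pattern = r"(?<!\w)" + re.escape(alias) + r"(?!\w)"
            if re.search(pattern, text, re.IGNORECASE):
                if best is None or len(alias) > best[0]:
                    best = (len(alias), canonical)
    return best[1] if best else None


def is_receipt(text: str) -> bool:
    """A known store, or enough generic signals, marks the page as a receipt."""
    if match_store(text):
        return True
    hits = sum(1 for sig in RECEIPT_SIGNALS if re.search(sig, text, re.IGNORECASE))
    return hits >= RECEIPT_MIN_SIGNALS


print(match_store("We are targeting Q3"))  # prints: None
print(match_store("COSTCO GAS #12"))       # prints: Costco Gas
```

Counting distinct signals rather than total matches keeps a page that repeats one keyword many times from being misclassified.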

Architecture

The core operates entirely on text, never on pixels, so it's fast to test and the OCR backend is swappable:

module	responsibility
`config`	load + validate the YAML rule table
`ocr`	pluggable OCR (`OcrFn`); default backend = Tesseract via pytesseract / pdf2image
`stores`	whole-word store recognition
`receipts`	date/time parsing + SNAP/EBT detection
`courses`	course-code expansion
`classify`	receipt vs document decision
`rename`	filename assembly, sanitization, collision-safe targets (pure)
`processor`	OCR → plan → (optional) rename; never raises on a bad file
`watcher`	watchdog folder watcher with a write-settle delay
`cli`	`batch` / `one` / `watch` / `dump-config`
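
Because the core only sees text, a test or an alternative backend can supply its own OCR callable. The sketch below assumes `OcrFn` is a "path in, text out" callable; that signature, along with `fake_ocr` and `plan_stem`, is a guess made for illustration and not taken from the package.

```python
from pathlib import Path
from typing import Callable

# Assumed shape of the OCR hook: a file path in, recognized text out.
OcrFn = Callable[[Path], str]


def fake_ocr(path: Path) -> str:
    """Stand-in backend that returns canned text instead of running Tesseract."""
    return "SAFEWAY\n03/14/2024 10:42 AM\nTOTAL $23.10"


def plan_stem(path: Path, ocr: OcrFn) -> str:
    """Toy text-only core: build a stem from whatever text `ocr` returns."""
    text = ocr(path)
    store = "Safeway" if "safeway" in text.lower() else ""
    return " ".join(part for part in ("Receipt", store) if part)


print(plan_stem(Path("IMG_0001.jpg"), fake_ocr))  # prints: Receipt Safeway
```

Passing the backend as an argument is what lets most of a test suite run without Tesseract installed.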

Tests

pip install -e ".[test]"
pytest

The bulk of the suite runs against saved OCR-text fixtures (in tests/fixtures/) and a mocked OCR function, so it needs no Tesseract. tests/test_ocr_end_to_end.py renders a synthetic receipt with Pillow and runs the real Tesseract pipeline; it skips automatically if the binary is absent.

Notes & limitations

OCR quality is bounded by Tesseract and the scan. Faded thermal receipts and skewed photos parse less reliably; the heuristics are deliberately forgiving.
Date parsing favours the first plausible date on the page. Very unusual layouts may pick the wrong one — review a dry run before --commit.
For PDFs, only the first couple of pages are OCR'd (configurable via --pdf-pages).
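
The "first plausible date" behaviour, and why it can pick the wrong one, is easier to see in code. This is a simplified sketch covering an assumed subset of formats; `parse_stamp` and the pattern tables are hypothetical, and `%b` / `%p` parsing assumes an English locale.

```python
import re
from datetime import datetime
from typing import Optional

# (regex, strptime format) pairs: an assumed subset of the supported formats.
DATE_FORMATS = [
    (r"\b\d{4}-\d{2}-\d{2}\b", "%Y-%m-%d"),
    (r"\b\d{1,2}/\d{1,2}/\d{4}\b", "%m/%d/%Y"),
    (r"\b[A-Z][a-z]{2} \d{1,2}, \d{4}\b", "%b %d, %Y"),
]
# The 12h pattern is listed first so it wins a tie at the same position.
TIME_FORMATS = [
    (r"\b\d{1,2}:\d{2} [AP]M\b", "%I:%M %p"),
    (r"\b\d{1,2}:\d{2}\b", "%H:%M"),
]


def _first_match(text: str, formats) -> Optional[datetime]:
    """Return the parsed value of the earliest-positioned valid match."""
    candidates = []
    for pattern, fmt in formats:
        for m in re.finditer(pattern, text):
            try:
                candidates.append((m.start(), datetime.strptime(m.group(), fmt)))
            except ValueError:
                continue  # looks like a date/time but is not a valid one
    return min(candidates, key=lambda c: c[0])[1] if candidates else None


def parse_stamp(text: str) -> Optional[str]:
    """Return 'YYYY-MM-DD HHMM' (or just the date), or None if no date is found."""
    found_date = _first_match(text, DATE_FORMATS)
    if found_date is None:
        return None
    stamp = found_date.strftime("%Y-%m-%d")
    found_time = _first_match(text, TIME_FORMATS)
    if found_time is not None:
        stamp += found_time.strftime(" %H%M")
    return stamp


print(parse_stamp("SAFEWAY\n03/14/2024 10:42 AM\nReturn by 04/13/2024"))
# prints: 2024-03-14 1042
```

A receipt that prints a "member since" or "return by" date above the transaction date would fool this approach, which is the failure mode a dry run helps catch.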

License

MIT — see LICENSE.


This version

0.1.0

Jul 4, 2026

Download files

Source Distribution

receipt_renamer-0.1.0.tar.gz (24.6 kB)

Uploaded Jul 4, 2026 Source

Built Distribution

receipt_renamer-0.1.0-py3-none-any.whl (21.7 kB)

Uploaded Jul 4, 2026 Python 3

File details

Details for the file receipt_renamer-0.1.0.tar.gz.

File metadata

Download URL: receipt_renamer-0.1.0.tar.gz
Upload date: Jul 4, 2026
Size: 24.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for receipt_renamer-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`1dfd727710f85abccae74e1170c4ef2a45bd58dba3bc41ef3139d8e07d12ef4b`
MD5	`71fd3458e8cfa64b75d45c7f3932adda`
BLAKE2b-256	`840623ba9cbff9486a61bbc25c79c12f4617a37e45890eb39342690b71ed0a83`

File details

Details for the file receipt_renamer-0.1.0-py3-none-any.whl.

File metadata

Download URL: receipt_renamer-0.1.0-py3-none-any.whl
Upload date: Jul 4, 2026
Size: 21.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for receipt_renamer-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8d82708b17485adc5cb3232db5f3cf908118f24bc2c256297bbbd93c3be8a266`
MD5	`6d6340ad939322ef00a527a21e89029c`
BLAKE2b-256	`2fc03bf96af7144438d807c74c1590b4bab82fd21b9e605dfefb8478a8026b64`
