
Process and analyze receipt data

Project description

Receipt Processor

Receipt Processor is a Python package for extracting, structuring, and analyzing receipt data. It combines OCR, database storage, budget tracking, and a Streamlit interface so you can convert receipt images into searchable spending records.

What it does

  • Extracts text from receipt images using EasyOCR and passes that text to an LLM for parsing
  • Stores receipts and item details in SQLite databases
  • Provides spending analysis by total, month, category, and vendor
  • Supports budget tracking for monthly and category budgets
  • Includes a Streamlit app for upload, review, and manual correction
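Budget tracking comes down to comparing actual spending against a limit for each month or category. The sketch below shows that comparison in plain Python. `BudgetStatus`, `check_budgets`, the "Dining" category, and the sample amounts are all hypothetical. They are not part of the `receipt_processor` API.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    """Spending versus budget for one category (hypothetical helper type)."""
    category: str
    budget: float
    spent: float

    @property
    def remaining(self) -> float:
        # Negative when the category is over budget.
        return round(self.budget - self.spent, 2)

    @property
    def over_budget(self) -> bool:
        return self.spent > self.budget


def check_budgets(budgets: dict[str, float], spending: dict[str, float]) -> list[BudgetStatus]:
    """Pair each budgeted category with its spending (0.0 if nothing was spent)."""
    return [
        BudgetStatus(category, limit, spending.get(category, 0.0))
        for category, limit in sorted(budgets.items())
    ]


# Illustrative numbers only.
budgets = {"Grocery": 400.0, "Dining": 150.0}
spending = {"Grocery": 312.45, "Dining": 171.20}
for status in check_budgets(budgets, spending):
    print(status.category, status.remaining, status.over_budget)
```

A category with a budget but no recorded spending is reported with `spent == 0.0`, so it still appears in the output.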

Key Features

  • OCR extraction for receipt images and PDFs
  • LLM-guided item categorization with a preset category vocabulary
  • Database ingestion using receipt_processor.db_ingest
  • Reporting and analytics using receipt_processor.db_queries
  • Interactive app in app.py for visual review and budget monitoring
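LLM-guided categorization only works with a preset vocabulary if free-form model replies are mapped back onto the allowed labels. The sketch below shows one way to do that. The `CATEGORIES` tuple is a hypothetical stand-in because the package's actual category list is not given here. The `Other` fallback is also an assumption.

```python
import re

# Hypothetical vocabulary: the package's actual preset categories may differ.
CATEGORIES = ("Grocery", "Household", "Dining", "Personal Care", "Other")
FALLBACK = "Other"


def normalize_category(llm_reply: str, categories: tuple[str, ...] = CATEGORIES) -> str:
    """Map a free-form LLM reply onto one of the allowed category labels."""
    # Lowercase and drop punctuation/digits so " GROCERY. " still matches.
    cleaned = re.sub(r"[^a-z ]", "", llm_reply.lower()).strip()
    lookup = {c.lower(): c for c in categories}
    if cleaned in lookup:
        return lookup[cleaned]
    # Otherwise accept a reply that names exactly one known label as a whole word,
    # e.g. "Category: Dining"; zero or several matches fall back to "Other".
    matches = [
        label for key, label in lookup.items()
        if re.search(rf"\b{re.escape(key)}\b", cleaned)
    ]
    return matches[0] if len(matches) == 1 else FALLBACK
```

Returning a fallback for ambiguous or unknown replies keeps the database limited to the fixed vocabulary. Those rows can then be corrected in the app.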

Installation

Install from PyPI

Install the released package directly from PyPI:

pip install receipt-processor

Install from source

For development or the latest repository version, use uv to create an environment and install dependencies:

pip install uv
cd /path/to/Receipt-Processor
uv sync

After uv sync completes, activate the created virtual environment:

# Windows
.venv\Scripts\activate

# macOS/Linux
source .venv/bin/activate

Quick Start

Run the Streamlit interface:

streamlit run app.py

Then enter the database paths in the app, for example:

  • data/receipts.db
  • data/budget.db

The app can create these databases automatically.

Basic Usage

Extract OCR text

from receipt_processor.ocr_utils import extract_text_from_image
text = extract_text_from_image("receipts/my_receipt.png")
print(text)
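EasyOCR's `Reader.readtext` returns a list of `(bounding_box, text, confidence)` detections, one per text box. As a result, a receipt row such as "Milk 3.99" often arrives as two separate boxes. How `extract_text_from_image` arranges these detections is not documented here. The sketch below shows one way to regroup them into lines before LLM parsing. `detections_to_text`, the confidence threshold, and the line tolerance are illustrative assumptions, not the package's implementation.

```python
from typing import Sequence

# One EasyOCR-style detection: (four [x, y] corner points, text, confidence).
Detection = tuple[Sequence[Sequence[float]], str, float]


def detections_to_text(detections: list[Detection], min_conf: float = 0.3,
                       line_tol: float = 10.0) -> str:
    """Rebuild receipt lines by grouping boxes whose vertical centers are close."""
    kept = []
    for box, text, conf in detections:
        if conf < min_conf:
            continue  # drop low-confidence fragments
        ys = [pt[1] for pt in box]
        xs = [pt[0] for pt in box]
        kept.append(((min(ys) + max(ys)) / 2, min(xs), text))

    lines: list[list[tuple[float, float, str]]] = []
    for y, x, text in sorted(kept):  # top to bottom
        # Join the current line if this box is vertically close to its first box.
        if lines and abs(lines[-1][0][0] - y) <= line_tol:
            lines[-1].append((y, x, text))
        else:
            lines.append([(y, x, text)])
    # Within each line, order boxes left to right.
    return "\n".join(
        " ".join(t for _, _, t in sorted(line, key=lambda d: d[1])) for line in lines
    )
```

A fixed pixel tolerance is a simplification. Skewed or curled receipt photos may need a tolerance scaled to the box height.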

Initialize databases

from receipt_processor.db_ingest import initialize_database, initialize_budget_database
initialize_database(db_path="data/receipts.db")
initialize_budget_database(db_path="data/budget.db")
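To show what initialization typically involves, the sketch below creates a hypothetical receipts/items schema using only Python's standard `sqlite3` module. The schema is modeled on the fields `insert_receipt` accepts. `init_schema` and every table and column name are assumptions, not the package's actual schema. `CREATE TABLE IF NOT EXISTS` makes the call safe to repeat, and `sqlite3.connect` creates the database file if it is missing.

```python
import sqlite3
from pathlib import Path

# Hypothetical schema modeled on insert_receipt's arguments;
# the package's real tables and column names may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS receipts (
    receipt_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    vendor       TEXT NOT NULL,
    date         TEXT NOT NULL,  -- ISO 8601, e.g. 2024-01-15
    time         TEXT,
    total_amount REAL NOT NULL,
    tax_amount   REAL
);
CREATE TABLE IF NOT EXISTS items (
    item_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    receipt_id INTEGER NOT NULL REFERENCES receipts(receipt_id) ON DELETE CASCADE,
    item_name  TEXT NOT NULL,
    price      REAL NOT NULL,
    category   TEXT
);
"""


def init_schema(db_path: str) -> None:
    """Create the tables if they are missing; safe to call repeatedly."""
    # sqlite3.connect does not create missing directories such as data/.
    Path(db_path).parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)  # creates the file if it does not exist yet
    try:
        conn.executescript(SCHEMA)
    finally:
        conn.close()
```

Storing dates as ISO 8601 text keeps them sortable and lets SQLite's date functions group them by month.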

Insert receipt data

from receipt_processor.db_ingest import insert_receipt
receipt_id = insert_receipt(
    vendor="Walmart",
    date="2024-01-15",
    time="14:30:00",
    total_amount=45.99,
    tax_amount=3.50,
    items=[
        {"item_name": "Milk", "price": 3.99, "category": "Grocery"},
        {"item_name": "Bread", "price": 2.49, "category": "Grocery"},
    ],
    db_path="data/receipts.db"
)
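`insert_receipt` takes plain Python values, so it can help to validate a payload before inserting it, especially one produced by OCR and an LLM. The hypothetical `validate_receipt` below is not part of the package. It checks that the date and time match the formats used in the example above, that amounts are non-negative, and that each item has the expected keys.

```python
from datetime import datetime


def validate_receipt(vendor: str, date: str, time: str, total_amount: float,
                     tax_amount: float, items: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the payload looks insertable."""
    problems = []
    if not vendor.strip():
        problems.append("vendor is empty")
    try:
        datetime.strptime(date, "%Y-%m-%d")
    except ValueError:
        problems.append(f"date {date!r} is not YYYY-MM-DD")
    try:
        datetime.strptime(time, "%H:%M:%S")
    except ValueError:
        problems.append(f"time {time!r} is not HH:MM:SS")
    if total_amount < 0 or tax_amount < 0:
        problems.append("total and tax must be non-negative")
    for i, item in enumerate(items):
        missing = {"item_name", "price", "category"}.difference(item)
        if missing:
            problems.append(f"item {i} is missing {sorted(missing)}")
        elif item["price"] < 0:
            problems.append(f"item {i} has a negative price (refund?)")
    # The item list may be partial, but its prices should not exceed the total.
    item_sum = sum(item.get("price", 0) for item in items)
    if item_sum > total_amount + 0.01:
        problems.append(f"item prices ({item_sum:.2f}) exceed the total ({total_amount:.2f})")
    return problems
```

The item-sum check is deliberately one-sided. The example above lists only two items against a $45.99 total, so an exact match is not required.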

Query spending

from receipt_processor.db_queries import get_total_spending, get_category_breakdown
print(get_total_spending(db_path="data/receipts.db"))
print(get_category_breakdown(db_path="data/receipts.db"))
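The query functions hide their SQL. The sketch below shows the kind of aggregation that a category breakdown and a monthly summary typically run. It uses a hypothetical two-table schema (`receipts`, `items`) built in memory with Python's standard `sqlite3` module. The table layout, the helper names, and the sample rows are assumptions for illustration only.

```python
import sqlite3

# Hypothetical minimal schema and sample data; the package's real tables may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE receipts (receipt_id INTEGER PRIMARY KEY, vendor TEXT, date TEXT, total_amount REAL);
CREATE TABLE items (receipt_id INTEGER, item_name TEXT, price REAL, category TEXT);
INSERT INTO receipts VALUES (1, 'Walmart', '2024-01-15', 6.48), (2, 'Target', '2024-02-03', 12.00);
INSERT INTO items VALUES (1, 'Milk', 3.99, 'Grocery'), (1, 'Bread', 2.49, 'Grocery'),
                         (2, 'Soap', 12.00, 'Household');
""")


def category_breakdown(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Total item spending per category, largest first."""
    return conn.execute(
        "SELECT category, ROUND(SUM(price), 2) FROM items "
        "GROUP BY category ORDER BY SUM(price) DESC"
    ).fetchall()


def monthly_totals(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Receipt totals per calendar month (dates stored as YYYY-MM-DD text)."""
    return conn.execute(
        "SELECT strftime('%Y-%m', date) AS month, ROUND(SUM(total_amount), 2) "
        "FROM receipts GROUP BY month ORDER BY month"
    ).fetchall()


print(category_breakdown(conn))
print(monthly_totals(conn))
```

Category spending is summed from item prices, while monthly spending is summed from receipt totals. The two can therefore differ when tax is included or an item list is incomplete.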

Notes

  • Automatic categorization uses an LLM prompt with a fixed set of categories.
  • Batch receipt processing is designed for simple single-page receipts without refunds.
  • Complex receipts are safer to process manually using the insert workflow.
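Batch processing assumes simple single-page receipts without refunds. A batch script could therefore screen each receipt first and route the rest to the manual insert workflow. `needs_manual_review` is a hypothetical pre-check, not part of the package. Its rules (page count, negative amounts, empty item lists) are assumptions drawn from the notes above.

```python
def needs_manual_review(page_count: int, items: list[dict], total_amount: float) -> list[str]:
    """Return reasons a receipt falls outside the simple batch-processing case."""
    reasons = []
    if page_count > 1:
        reasons.append("multi-page receipt")
    if total_amount < 0 or any(item.get("price", 0) < 0 for item in items):
        reasons.append("negative amount (possible refund or void)")
    if not items:
        reasons.append("no items were parsed")
    return reasons


# Hypothetical batch of already-parsed receipts.
batch = [
    {"file": "receipt_a.png", "pages": 1, "items": [{"price": 3.99}], "total": 3.99},
    {"file": "receipt_b.pdf", "pages": 2, "items": [{"price": 5.00}, {"price": -5.00}], "total": 0.0},
]
for receipt in batch:
    reasons = needs_manual_review(receipt["pages"], receipt["items"], receipt["total"])
    print(receipt["file"], "-> manual" if reasons else "-> batch", reasons)
```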

Documentation

A more detailed tutorial is available in tutorial.qmd and via the generated GitHub Pages site. The repository also includes API documentation in api.qmd.


Repository Layout

  • app.py — Streamlit front-end for receipt upload and review
  • cat_try.py — category classification and LLM prompt logic
  • ocr_png_to_text.py — OCR extraction helpers
  • src/receipt_processor/ — core package implementation
  • scripts/ — example scripts for extraction, ingestion, and validation
  • _quarto.yml — website publishing configuration
  • pyproject.toml — project dependencies and metadata

Download files


Source Distribution

receipt_processor-0.1.1.tar.gz (16.5 kB)

Built Distribution


receipt_processor-0.1.1-py3-none-any.whl (17.0 kB)

File details

Details for the file receipt_processor-0.1.1.tar.gz.

File metadata

  • Download URL: receipt_processor-0.1.1.tar.gz
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.9.22 (publish, macOS)

File hashes

Hashes for receipt_processor-0.1.1.tar.gz
Algorithm Hash digest
SHA256 9835d36b9335e0e6cdfe39dcc64aa0291f571d16171afa10908a5051e98c9e55
MD5 8475f28456f7e5cc5ce2aa2aa3766107
BLAKE2b-256 2b7999607e39818732996a754eb761143e0abd237facd6088a1fa1b7236a2bca


File details

Details for the file receipt_processor-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: receipt_processor-0.1.1-py3-none-any.whl
  • Size: 17.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.9.22 (publish, macOS)

File hashes

Hashes for receipt_processor-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 282a9f410af3a8a5a31bc3b2bdc4e29f76b3879bd3e40c279ebd296051a68803
MD5 d70212d215280b147292e31ff546c0ef
BLAKE2b-256 97bddae93adc8db57d4665b8caa2a6868f6b1e946de893b37a9404f3366a7205

