Process and analyze receipt data
Receipt Processor
Receipt Processor is a Python package for extracting, structuring, and analyzing receipt data. It combines OCR, database storage, budget tracking, and a Streamlit interface so you can convert receipt images into searchable spending records.
What it does
- Extracts text from receipt images using EasyOCR and passes that text to an LLM for parsing
- Stores receipts and item details in SQLite databases
- Provides spending analysis by total, month, category, and vendor
- Supports budget tracking for monthly and category budgets
- Includes a Streamlit app for upload, review, and manual correction
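As a rough illustration of what category budget tracking involves, the sketch below compares spending totals with budget limits. The function name `budget_status`, the dictionary shapes, and the figures are assumptions made for this example; they are not taken from the package API.

```python
from decimal import Decimal


def budget_status(spent, budgets):
    """Compare spending per category against budget limits.

    Both arguments map a category name to a Decimal amount.
    Returns {category: (remaining, over_budget)} for each budgeted category.
    """
    status = {}
    for category, limit in budgets.items():
        used = spent.get(category, Decimal("0"))
        remaining = limit - used
        status[category] = (remaining, remaining < 0)
    return status


# Hypothetical figures for one month.
spent = {"Grocery": Decimal("120.50"), "Dining": Decimal("80.00")}
budgets = {"Grocery": Decimal("100.00"), "Dining": Decimal("150.00")}

report = budget_status(spent, budgets)
for category, (remaining, over) in report.items():
    print(category, remaining, "OVER" if over else "ok")
```

Using `Decimal` rather than floats keeps currency arithmetic exact, which matters once many small amounts are summed.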
Key Features
- OCR extraction for receipt images and PDFs
- LLM-guided item categorization with a preset category vocabulary
- Database ingestion using receipt_processor.db_ingest
- Reporting and analytics using receipt_processor.db_queries
- Interactive app in app.py for visual review and budget monitoring
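A preset category vocabulary usually means that whatever the LLM answers has to be mapped back onto a fixed list. The sketch below shows one way that could be done. The category names (other than "Grocery", which appears in the usage example) and the helper `normalize_category` are hypothetical and may not match the package.

```python
# Hypothetical vocabulary; the package's real category list may differ.
CATEGORIES = ("Grocery", "Dining", "Household", "Transport", "Other")


def normalize_category(raw, allowed=CATEGORIES, fallback="Other"):
    """Map a free-form LLM answer onto the fixed vocabulary."""
    # Drop surrounding whitespace, quotes, and trailing periods, then
    # compare case-insensitively.
    cleaned = raw.strip().strip(".\"'").casefold()
    for category in allowed:
        if cleaned == category.casefold():
            return category
    # Anything outside the vocabulary falls back to a catch-all label.
    return fallback


print(normalize_category(" grocery. "))
print(normalize_category("Electronics"))
```

Falling back to a catch-all label keeps unexpected answers out of the database while leaving them easy to find for manual correction.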
Installation
Install from PyPI
Receipt Processor is published on PyPI, so you can install it directly:
pip install receipt-processor
Install from source
For development or the latest repository version, use uv to create an environment and install dependencies:
pip install uv
cd /path/to/Receipt-Processor
uv sync
After uv sync completes, activate the created virtual environment:
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate
Quick Start
Run the Streamlit interface:
streamlit run app.py
Then provide database paths such as:
- data/receipts.db
- data/budget.db
The app can create these databases automatically.
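Automatic creation is a standard property of SQLite: connecting to a path that does not exist creates the file. The sketch below shows the idea with the standard library only. The helper `open_database` is hypothetical and is not how the app is known to do it.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_database(db_path):
    """Open a SQLite database, creating the file and its folder if missing."""
    path = Path(db_path)
    # SQLite does not create missing parent folders, so do that first.
    path.parent.mkdir(parents=True, exist_ok=True)
    # sqlite3.connect creates the database file when none exists.
    return sqlite3.connect(path)


# Demonstrate in a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / "data" / "receipts.db"
    conn = open_database(db_file)
    conn.close()
    created = db_file.exists()

print(created)
```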
Basic Usage
Extract OCR text
from receipt_processor.ocr_utils import extract_text_from_image
text = extract_text_from_image("receipts/my_receipt.png")
print(text)
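OCR output is raw text, so a quick sanity check on it can be useful before any further parsing. The sketch below pulls the total from text with a regular expression. It assumes the OCR text is a single newline-separated string, which the package does not confirm, and `find_total` and the sample text are made up for this example.

```python
import re
from decimal import Decimal

# Matches "TOTAL 45.99" or "Total: $45.99" on one line, but not "SUBTOTAL".
TOTAL_PATTERN = re.compile(
    r"(?<![A-Za-z])total\b[^\d\n]*(\d+[.,]\d{2})", re.IGNORECASE
)


def find_total(ocr_text):
    """Return the last amount on a line labelled 'total', or None."""
    matches = TOTAL_PATTERN.findall(ocr_text)
    if not matches:
        return None
    # Accept a decimal comma as well as a decimal point.
    return Decimal(matches[-1].replace(",", "."))


sample = "WALMART\nSUBTOTAL 42.49\nTAX 3.50\nTOTAL 45.99"
print(find_total(sample))
```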
Initialize databases
from receipt_processor.db_ingest import initialize_database, initialize_budget_database
initialize_database(db_path="data/receipts.db")
initialize_budget_database(db_path="data/budget.db")
Insert receipt data
from receipt_processor.db_ingest import insert_receipt
receipt_id = insert_receipt(
    vendor="Walmart",
    date="2024-01-15",
    time="14:30:00",
    total_amount=45.99,
    tax_amount=3.50,
    items=[
        {"item_name": "Milk", "price": 3.99, "category": "Grocery"},
        {"item_name": "Bread", "price": 2.49, "category": "Grocery"},
    ],
    db_path="data/receipts.db",
)
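Before inserting a receipt by hand, it can help to check how much of the total the listed items explain. The sketch below is one possible check. It assumes that total_amount includes tax, and `unaccounted_amount` is a hypothetical helper, not a package function.

```python
from decimal import Decimal


def unaccounted_amount(total_amount, tax_amount, items):
    """Return total minus tax minus the sum of item prices.

    Assumes total_amount includes tax. A result of zero means the
    items fully explain the total.
    """
    # Convert via str so float inputs such as 3.99 keep their printed value.
    total = Decimal(str(total_amount))
    tax = Decimal(str(tax_amount))
    item_sum = sum(
        (Decimal(str(item["price"])) for item in items), Decimal("0")
    )
    return total - tax - item_sum


items = [
    {"item_name": "Milk", "price": 3.99, "category": "Grocery"},
    {"item_name": "Bread", "price": 2.49, "category": "Grocery"},
]
gap = unaccounted_amount(45.99, 3.50, items)
print(gap)
```

With the values from the insert example, the two listed items leave part of the total unexplained, which is what a partial item list looks like under this check.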
Query spending
from receipt_processor.db_queries import get_total_spending, get_category_breakdown
print(get_total_spending(db_path="data/receipts.db"))
print(get_category_breakdown(db_path="data/receipts.db"))
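Breakdowns by category or month come down to SQL aggregation. The sketch below runs such queries against an in-memory database. The table and column names, and the extra "Soap" row, are assumptions for illustration; the package's real schema and query functions may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration; the package's real tables may differ.
conn.executescript(
    """
    CREATE TABLE receipts (
        id INTEGER PRIMARY KEY, vendor TEXT, date TEXT, total_amount REAL
    );
    CREATE TABLE items (
        receipt_id INTEGER, item_name TEXT, price REAL, category TEXT
    );
    """
)
conn.execute("INSERT INTO receipts VALUES (1, 'Walmart', '2024-01-15', 45.99)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (1, "Milk", 3.99, "Grocery"),
        (1, "Bread", 2.49, "Grocery"),
        (1, "Soap", 1.50, "Household"),
    ],
)

# Spending per category, largest first.
by_category = conn.execute(
    "SELECT category, ROUND(SUM(price), 2) FROM items "
    "GROUP BY category ORDER BY SUM(price) DESC"
).fetchall()

# Spending per month, using the YYYY-MM prefix of the ISO date.
by_month = conn.execute(
    "SELECT substr(date, 1, 7), ROUND(SUM(total_amount), 2) "
    "FROM receipts GROUP BY 1"
).fetchall()

print(by_category)
print(by_month)
conn.close()
```

Storing dates as ISO strings (YYYY-MM-DD), as in the insert example, is what makes the month grouping a simple prefix operation.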
Notes
- Automatic categorization uses an LLM prompt with a fixed set of categories.
- Batch receipt processing is designed for simple single-page receipts without refunds.
- Complex receipts are safer to process manually using the insert workflow.
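The split between batch and manual processing can be expressed as a small pre-check. The sketch below is one possible rule. It treats a negative item price as a refund, which is an assumption, and `needs_manual_review` is a hypothetical helper rather than part of the package.

```python
def needs_manual_review(page_count, items):
    """Flag receipts that fall outside the simple batch case.

    A receipt is flagged when it spans more than one page or when any
    item has a negative price, which this sketch treats as a refund.
    """
    if page_count > 1:
        return True
    return any(item["price"] < 0 for item in items)


simple = [{"item_name": "Milk", "price": 3.99}]
refund = [{"item_name": "Milk", "price": 3.99}, {"item_name": "Return", "price": -3.99}]

print(needs_manual_review(1, simple))
print(needs_manual_review(1, refund))
print(needs_manual_review(2, simple))
```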
Documentation
A more detailed tutorial is available in tutorial.qmd and via the generated GitHub Pages site. The repository also includes API documentation in api.qmd.
Links
- Documentation source: api.qmd
- Tutorial source: tutorial.qmd
- Report source: technical-report.qmd
- Streamlit app: run streamlit run app.py
- GitHub repository: https://github.com/Xapamma/Receipt-Processor
Repository Layout
- app.py: Streamlit front-end for receipt upload and review
- cat_try.py: category classification and LLM prompt logic
- ocr_png_to_text.py: OCR extraction helpers
- src/receipt_processor/: core package implementation
- scripts/: example scripts for extraction, ingestion, and validation
- _quarto.yml: website publishing configuration
- pyproject.toml: project dependencies and metadata