Receipt and bill parser using OCR
receiptparser
Summary
A receipt and bill parser written in Python. It can be used as a Python module or as a CLI tool.
It was originally based on receipt-parser, but has effectively been rewritten from scratch.
So far, only German receipts are supported, but other countries can be added using a simple YAML configuration file.
Installation
pip3 install receiptparser
CLI Usage
A simple example to read all images (.jpg) from a directory and print the recognized data to stdout:
receiptparser tests/data/germany/img/
You can customize the output as follows:
receiptparser -v0 --format "{date:%Y-%m-%d} - {market} - {postal} - {sum}.jpg" tests/data/germany/img/
In this case, -v0 suppresses all output except what you specify with the --format FORMAT parameter. FORMAT is a Python format string, using the syntax accepted by str.format().
The following values can be used in the format string:
- market: The recognized name of the business
- postal: The recognized postal code of the business
- date: The recognized date of the bill or receipt
- sum: The dollar (or Euro, or other currency) amount of the bill or receipt
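To illustrate how such a format string is expanded, here is a minimal sketch using plain str.format(). The field values are made up for the example, and the types (a datetime for date, a string for sum) are assumptions rather than something the parser is documented to return:

```python
from datetime import datetime

# Hypothetical values standing in for what the parser might recognize.
# The types are assumptions: a datetime for "date", strings for the rest.
fields = {
    "market": "Example Market",
    "postal": "12345",
    "date": datetime(2020, 3, 14),
    "sum": "12.34",
}

# Same template as in the CLI example above.
template = "{date:%Y-%m-%d} - {market} - {postal} - {sum}.jpg"

# The part after the colon in {date:...} is passed to the datetime object,
# which interprets it as strftime format codes.
formatted = template.format(**fields)
print(formatted)  # 2020-03-14 - Example Market - 12345 - 12.34.jpg
```

A template ending in .jpg, as here, yields a string that could serve as a descriptive file name for the scanned image.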
Syntax
usage: receiptparser [-h] [-c CONFIG] [--config-file CONFIG_FILE] [-s] [-t TESSERACT] [-f FORMAT] [-v {0,1,2}] input

positional arguments:
  input                 file or directory from which images will be read

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        built-in config to use
  --config-file CONFIG_FILE
                        like -c, but point to a file instead
  -t TESSERACT, --tesseract TESSERACT
                        output directory for OCR recognized text (default is to discard)
  -f FORMAT, --format FORMAT
                        format of the recognized output. default is pretty-printing
  -v {0,1,2}, --verbosity {0,1,2}
                        increase output verbosity
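For readers who want to see how these options fit together, the following is a rough argparse sketch that mirrors the usage text above. It is not the project's actual source: the function name build_parser is hypothetical, no defaults are assumed beyond argparse's own, and the -s flag is modeled as a plain switch only because the usage line lists it without an argument.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the CLI described by the usage text."""
    parser = argparse.ArgumentParser(prog="receiptparser")
    parser.add_argument("-c", "--config", help="built-in config to use")
    parser.add_argument("--config-file", help="like -c, but point to a file instead")
    # Listed in the usage line without a description; modeled as a switch.
    parser.add_argument("-s", action="store_true")
    parser.add_argument("-t", "--tesseract",
                        help="output directory for OCR recognized text")
    parser.add_argument("-f", "--format", help="format of the recognized output")
    parser.add_argument("-v", "--verbosity", type=int, choices=[0, 1, 2],
                        help="output verbosity")
    parser.add_argument("input",
                        help="file or directory from which images will be read")
    return parser


# Parse the arguments of the customized example from the CLI Usage section.
args = build_parser().parse_args(
    ["-v0", "--format", "{date:%Y-%m-%d} - {market}.jpg", "tests/data/germany/img/"]
)
print(args.verbosity, args.format, args.input)
```

Note that -v0 is the short option -v with its value attached directly, which argparse treats the same as -v 0.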
Python usage
from receiptparser.config import read_config
from receiptparser.parser import process_receipt

# Load the parser configuration from a YAML file
config = read_config('my_config.yml')
# Run OCR on a single image and extract the receipt fields
receipt = process_receipt(config, "my_receipt.jpg", out_dir=None, verbosity=0)

print("Filename:   ", receipt.filename)
print("Market:     ", receipt.market)
print("Postal code:", receipt.postal)
print("Date:       ", receipt.date)
print("Amount:     ", receipt.sum)
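The same call can be applied to a whole directory. The sketch below is one possible way to do that, not part of receiptparser itself: summarize_receipts and fake_parse are hypothetical helpers, and the parsing function is passed in as an argument so the example runs without OCR installed. It only assumes that the returned object exposes the market, postal, date and sum attributes shown above.

```python
from pathlib import Path
from types import SimpleNamespace


def summarize_receipts(directory, parse):
    """Call parse() on every .jpg in directory and collect the recognized fields."""
    rows = []
    for path in sorted(Path(directory).glob("*.jpg")):
        receipt = parse(str(path))
        rows.append({
            "filename": path.name,
            "market": receipt.market,
            "postal": receipt.postal,
            "date": receipt.date,
            "sum": receipt.sum,
        })
    return rows


def fake_parse(filename):
    """Stand-in for process_receipt that returns fixed, made-up values."""
    return SimpleNamespace(filename=filename, market="Example Market",
                           postal="12345", date=None, sum=None)

# With receiptparser installed, the real parser could be plugged in like this:
#   parse = lambda f: process_receipt(config, f, out_dir=None, verbosity=0)
#   rows = summarize_receipts("tests/data/germany/img/", parse)
```

Passing the parser in as a callable keeps the directory walk separate from the OCR step, which also makes it easy to test with a stub.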
Built Distribution
Hashes for receiptparser-1.0.4-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a0500e4692f95c83b84ea286ef35e288548fcb94b43e3afc36dfa31df70e4b8a
MD5 | 4aaef0f3e322e29ae43ec6892774bfc5
BLAKE2b-256 | 7aedaf893f13a5dc3e9bbe012741fca6a05c40d586c8e7fc14154ed170ecb8ac