Receipt and bill parser using OCR

Project description

receiptparser

Summary

A receipt and bill parser written in Python. It can be used as a Python module or as a CLI tool.

It was originally based on receipt-parser, but has since been almost completely rewritten.

So far, only German receipts are supported, but other countries can be added using a simple YAML configuration file.

Installation

pip3 install receiptparser

CLI Usage

A simple example that reads all images (.jpg) from a directory and prints the recognized data to stdout:

receiptparser tests/data/germany/img/

You can customize the output as follows:

receiptparser -v0 --format "{date:%Y-%m-%d} - {market} - {postal} - {sum}.jpg" tests/data/germany/img/

In this case, -v0 suppresses all output except what the --format FORMAT parameter specifies. FORMAT is a Python format string, as accepted by str.format(). The following values can be used in the format string:

  • market: The recognized name of the business
  • postal: The recognized postal code of the business
  • date: The recognized date of the bill or receipt
  • sum: The total amount of the bill or receipt (in euros, dollars, or another currency)

Syntax

usage: receiptparser [-h] [-c CONFIG] [--config-file CONFIG_FILE] [-s] [-t TESSERACT] [-f FORMAT] [-v {0,1,2}] input

positional arguments:
  input                 file or directory from which images will be read

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        built-in config to use
  --config-file CONFIG_FILE
                        like -c, but point to a file instead
  -s, --sharpen         whether to sharpen the image before OCR
  -t TESSERACT, --tesseract TESSERACT
                        output directory for OCR recognized text (default is to discard)
  -f FORMAT, --format FORMAT
                        format of the recognized output. default is pretty-printing
  -v {0,1,2}, --verbosity {0,1,2}
                        increase output verbosity
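If you want to drive the CLI from a script, the options above can be assembled into an argument list. This is only a sketch: build_command and run are hypothetical helpers, not part of receiptparser, and they use only the flags listed in the usage text.

```python
import subprocess


def build_command(input_path, config=None, sharpen=False,
                  ocr_dir=None, fmt=None, verbosity=None):
    """Build an argv list for the receiptparser CLI (hypothetical helper)."""
    cmd = ["receiptparser"]
    if config is not None:
        cmd += ["-c", config]          # built-in config to use
    if sharpen:
        cmd.append("-s")               # sharpen the image before OCR
    if ocr_dir is not None:
        cmd += ["-t", ocr_dir]         # directory for OCR recognized text
    if fmt is not None:
        cmd += ["-f", fmt]             # output format string
    if verbosity is not None:
        cmd.append(f"-v{verbosity}")   # 0, 1 or 2
    cmd.append(input_path)             # positional: file or directory
    return cmd


def run(cmd):
    """Run the command and return its stdout (requires receiptparser on PATH)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


cmd = build_command("tests/data/germany/img/", fmt="{market} - {sum}", verbosity=0)
print(cmd)
```

Passing the arguments as a list avoids shell quoting problems with the braces and spaces in the format string.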

Python usage

from receiptparser.config import read_config
from receiptparser.parser import process_receipt

config = read_config('my_config.yml')
receipt = process_receipt(config, "my_receipt.jpg", sharpen=False, out_dir=None, verbosity=0)

print("Filename:   ", receipt.filename)
print("Market:     ", receipt.market)
print("Postal code:", receipt.postal)
print("Date:       ", receipt.date)
print("Amount:     ", receipt.sum)
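To process a whole directory from Python, the single-file call above can be wrapped in a loop. The sketch below is an assumption-laden illustration: parse_directory and summarize are hypothetical helpers, not part of the library, and the parsing function is passed in as a parameter so the loop itself does not depend on receiptparser being installed.

```python
from pathlib import Path


def parse_directory(directory, parse_one, pattern="*.jpg"):
    """Call parse_one(path) for every matching file, in name order.

    parse_one is any callable taking a file path and returning a
    receipt-like object (hypothetical helper, not a library function).
    """
    return [parse_one(str(path)) for path in sorted(Path(directory).glob(pattern))]


def summarize(receipt):
    """Collect the attributes shown in the example above into a dict."""
    return {
        "filename": receipt.filename,
        "market": receipt.market,
        "postal": receipt.postal,
        "date": receipt.date,
        "sum": receipt.sum,
    }


# Usage with the library (needs receiptparser and Tesseract installed):
#
#   from receiptparser.config import read_config
#   from receiptparser.parser import process_receipt
#
#   config = read_config("my_config.yml")
#   receipts = parse_directory(
#       "tests/data/germany/img/",
#       lambda p: process_receipt(config, p, sharpen=False, out_dir=None, verbosity=0),
#   )
#   for receipt in receipts:
#       print(summarize(receipt))
```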


Source Distributions

No source distribution files available for this release.

Built Distribution

receiptparser-1.0.3-py2.py3-none-any.whl (10.5 kB), for Python 2 and Python 3
