A supermarket receipt parser written in Python using tesseract OCR

A fuzzy receipt parser written in Python

This is a fuzzy receipt parser written in Python. It extracts information such as the shop, the date, and the total from scanned receipts. It can work as a standalone script or as part of our iOS and Android application.

Dependencies

The receipt-parser-core library depends on ImageMagick. Please install ImageMagick with your favorite package manager.

Usage

To convert all images from the data/img/ folder to text using Tesseract and parse the resulting text files, run

make run
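To illustrate what "fuzzy" parsing of OCR output can look like, here is a minimal sketch using only the Python standard library. It is not the receipt-parser-core API: the function names, the shop list, the keywords, and the 0.7 similarity cutoff are all assumptions made for this example. The idea is that OCR often misreads single characters, so words are compared by similarity rather than exact equality.

```python
import re
from difflib import get_close_matches

# Hypothetical shop names and total keywords, chosen for this example only.
KNOWN_SHOPS = ["rewe", "aldi", "lidl", "penny"]
TOTAL_KEYWORDS = ["summe", "total", "gesamt"]

DATE_PATTERN = re.compile(r"\b\d{1,2}[./-]\d{1,2}[./-]\d{2,4}\b")
AMOUNT_PATTERN = re.compile(r"\d+[.,]\d{2}")


def find_shop(lines, cutoff=0.7):
    """Return the first known shop that is similar enough to a word in the text."""
    for line in lines:
        for word in line.lower().split():
            match = get_close_matches(word, KNOWN_SHOPS, n=1, cutoff=cutoff)
            if match:
                return match[0]
    return None


def find_date(lines):
    """Return the first substring that looks like a day/month/year date."""
    for line in lines:
        found = DATE_PATTERN.search(line)
        if found:
            return found.group(0)
    return None


def find_total(lines, cutoff=0.7):
    """Return the amount on the first line containing a word close to a total keyword."""
    for line in lines:
        words = line.lower().split()
        if any(get_close_matches(w, TOTAL_KEYWORDS, n=1, cutoff=cutoff) for w in words):
            found = AMOUNT_PATTERN.search(line)
            if found:
                # Accept both "12,34" and "12.34" as decimal notation.
                return float(found.group(0).replace(",", "."))
    return None


def parse_receipt(text):
    """Collect shop, date, and total from raw OCR text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return {
        "shop": find_shop(lines),
        "date": find_date(lines),
        "total": find_total(lines),
    }


# Made-up OCR output with two misread words ("RCWE", "SUMNE").
sample = "RCWE Markt GmbH\n12.03.2017 14:22\nMilch 0,99\nSUMNE EUR 12,34\n"
print(parse_receipt(sample))
```

With this sample, the misread "RCWE" and "SUMNE" are still close enough to "rewe" and "summe" to be recognized. A lower cutoff tolerates more OCR noise but raises the chance of false matches.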

Docker

A Dockerfile is available with all dependencies needed to run the program.
To build the image, run

make docker-build

To run it on the sample files, try

make docker-run

By default, running the image executes the make run command. To use it with your own images, mount your image folder into the container:

docker run -v <path_to_input_images>:/usr/src/app/data/img mre0/receipt_parser
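If you prefer to start the container from a script, the command above can be assembled in Python. This is a small sketch, not part of the project: the helper name build_docker_command is hypothetical. It resolves the input folder to an absolute path first, because Docker interprets a -v source that is not an absolute path as a named volume rather than a host directory.

```python
from pathlib import Path


def build_docker_command(image_dir, image="mre0/receipt_parser"):
    """Build the docker run argument list for a folder of receipt images.

    Hypothetical helper for illustration; the container path and image
    name are taken from the command shown above.
    """
    # Bind mounts need an absolute host path.
    host_dir = Path(image_dir).expanduser().resolve()
    return ["docker", "run", "-v", f"{host_dir}:/usr/src/app/data/img", image]


command = build_docker_command("data/img")
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) on a machine where Docker is installed.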

History

This project started as a hackathon idea. Read more about it on the trivago techblog, and see the comments on Hacker News. There is also a talk about the project. The library is now available on PyPI.
