BML OCR

A package to extract details from BML transfer receipts.

Installation

pip install bml-ocr

Usage

from bml_ocr.extract import extract_receipt_data
from bml_ocr.receipt_model import ReceiptModel

with open('datasets/receipt_1.jpg', 'rb') as f:
    receipt: ReceiptModel = extract_receipt_data(f.read())
    print(receipt)

How it works

The extract_receipt_data function is where OCR and data extraction take place. It returns a ReceiptModel.

1. Text recognition using EasyOCR

  • The readtext method takes image data as bytes and returns a result set: a list of tuples, one per recognized text box. For example:
(
    # Border positions of the text boxes
    [
      [47, 689],
      [184, 689],
      [184, 731],
      [47, 731]
    ],
    # Text recognized
    'Message',
    # Confidence level
    0.9999883302470524
)
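
The tuple shape above can be unpacked as in the following minimal sketch. It does not call EasyOCR; it only works on a tuple shaped like the example. The names OcrResult and vertical_span are hypothetical helpers introduced here for illustration and are not part of bml_ocr.

```python
from typing import List, Tuple

# Shape of one entry in the result set, as in the example above.
Box = List[List[int]]
OcrResult = Tuple[Box, str, float]


def vertical_span(result: OcrResult) -> Tuple[int, int]:
    """Return the (top, bottom) y values of a text box (hypothetical helper)."""
    box, _text, _confidence = result
    ys = [point[1] for point in box]
    return min(ys), max(ys)


sample: OcrResult = (
    [[47, 689], [184, 689], [184, 731], [47, 731]],
    'Message',
    0.9999883302470524,
)

top, bottom = vertical_span(sample)
print(sample[1], top, bottom)  # Message 689 731
```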

2. Finding the 'Messages' keyword from the OCR results

  • To find it, I calculate the Levenshtein distance between the 'Messages' keyword and each text in the result set from Step 1, and take the closest match as the most probable result.
  • The matched result gives the y-axis value of the text border, which is used in Step 3.
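
The keyword lookup can be sketched roughly as follows. This is an illustration in plain Python, not the package's actual code: the function names levenshtein and find_keyword, the case-insensitive comparison, and the sample results (apart from the 'Message' box from Step 1) are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def find_keyword(results, keyword='Messages'):
    """Return the OCR result whose text is closest to the keyword.

    Comparing in lower case is an assumption made for this sketch.
    """
    return min(results, key=lambda r: levenshtein(r[1].lower(), keyword.lower()))


# Illustrative result set; only the 'Message' entry comes from the example in Step 1.
results = [
    ([[40, 100], [200, 100], [200, 140], [40, 140]], 'Transfer', 0.99),
    ([[47, 689], [184, 689], [184, 731], [47, 731]], 'Message', 0.99),
    ([[47, 800], [210, 800], [210, 840], [47, 840]], 'Reference', 0.98),
]

match = find_keyword(results)
start_y = match[0][0][1]  # y value of the top-left corner of the matched box
print(match[1], start_y)  # Message 689
```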

3. Detecting all the gray lines

  • The gray lines are the separators between text sections.
  • Finding them involves looping through the pixels along the y axis and checking whether the background is white or not.
  • The loop starts from the y value found in Step 2, which helps in retrieving the gray line above the 'Reference' section.
  • If the background is not white, we check horizontally whether the color is constant. Rows that are constant are returned as gray lines.
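
A rough sketch of this scan is shown below. It is not the package's implementation: it assumes the image has already been converted to rows of grayscale values, and the parameters probe_x, white_threshold and tolerance are hypothetical choices made for the example. Consecutive matching rows are reported once, so a line that is several pixels thick counts as a single separator.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white (assumed format)


def find_gray_lines(image: Image, start_y: int, probe_x: int = 0,
                    white_threshold: int = 250, tolerance: int = 2) -> List[int]:
    """Return the y value where each horizontal gray line begins."""
    lines = []
    in_line = False
    for y in range(start_y, len(image)):
        row = image[y]
        value = row[probe_x]
        # A row is a line if the probed pixel is not white and the whole
        # row has (almost) the same value as that pixel.
        is_line = (
            value < white_threshold
            and all(abs(pixel - value) <= tolerance for pixel in row)
        )
        if is_line and not in_line:
            lines.append(y)
        in_line = is_line
    return lines


WHITE, GRAY = 255, 200
image = [[WHITE] * 6 for _ in range(10)]
image[3] = [GRAY] * 6                                  # gray line, two pixels thick
image[4] = [GRAY] * 6
image[7] = [100, WHITE, WHITE, WHITE, WHITE, WHITE]    # dark pixel, not a full line
image[8] = [GRAY] * 6                                  # second gray line

print(find_gray_lines(image, start_y=0))  # [3, 8]
```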

4. Extracting relevant data for sections

  • Now that we have the gray line positions and the OCR results, we use them to sort each recognized text into its section, based on the gray lines directly above and below it.
  • The categorized data is then mapped to a ReceiptModel as follows:
ReceiptModel(
    reference_number='BLAZ876699558640',
    transaction_date='25/05/2024 15.20',
    from_user='NAFFAH ARASHEED',
    to_user='Haisham',
    to_account='7730000203614',
    amount='MVR 1.00',
    remarks='Lorem ipsum dolor sit amet amegakure hokage shinobi'
)
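
The grouping step can be sketched as follows, under stated assumptions: group_by_section and box_at are hypothetical helpers, using the vertical center of each box is a choice made for this example, and the coordinates are made up for illustration. The mapping from grouped text to ReceiptModel fields is left out because the source does not describe it.

```python
from bisect import bisect_right
from collections import defaultdict


def group_by_section(results, gray_lines):
    """Group OCR texts by the pair of gray lines they fall between.

    Section 0 lies between the first and second gray line, section 1
    between the second and third, and so on. Text above the first line
    or below the last line is ignored.
    """
    lines = sorted(gray_lines)
    sections = defaultdict(list)
    for box, text, _confidence in results:
        center_y = sum(point[1] for point in box) / len(box)
        index = bisect_right(lines, center_y)
        if 0 < index < len(lines):
            sections[index - 1].append(text)
    return dict(sections)


def box_at(top, bottom, left=47, right=300):
    """Build a four-corner box like the ones in the OCR results."""
    return [[left, top], [right, top], [right, bottom], [left, bottom]]


# Illustrative positions only.
results = [
    (box_at(650, 680), 'Message', 0.99),           # above the first gray line
    (box_at(710, 730), 'Reference', 0.99),
    (box_at(740, 760), 'BLAZ876699558640', 0.97),
    (box_at(810, 830), 'From', 0.99),
    (box_at(840, 860), 'NAFFAH ARASHEED', 0.95),
]
gray_lines = [700, 800, 900]

print(group_by_section(results, gray_lines))
# {0: ['Reference', 'BLAZ876699558640'], 1: ['From', 'NAFFAH ARASHEED']}
```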

Credits

  • @nishaalnaseer: for his original implementation of gray line detection in BML-OCR
