BML OCR
A package to extract details from BML transfer receipts.
Installation
pip install bml-ocr
Usage
from bml_ocr.extract import extract_receipt_data
from bml_ocr.receipt_model import ReceiptModel

with open('datasets/receipt_1.jpg', 'rb') as f:
    receipt: ReceiptModel = extract_receipt_data(f.read())

print(receipt)
How it works
The extract_receipt_data function is where OCR and data extraction take place. It returns a ReceiptModel.
1. Text recognition using EasyOCR
- The readtext method takes image data as bytes and returns a list of tuples, one per recognized text box. E.g.:
(
    # Border positions of the text box
    [
        [47, 689],
        [184, 689],
        [184, 731],
        [47, 731]
    ],
    # Text recognized
    'Message',
    # Confidence level
    0.9999883302470524
)
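As a minimal sketch, one such tuple can be unpacked to get the vertical extent of its text box. The helper name `box_vertical_span` is hypothetical and not part of bml-ocr; it only assumes the (corner points, text, confidence) layout shown above.

```python
from typing import List, Tuple

# One OCR entry in the layout shown above: (corner points, text, confidence).
OcrEntry = Tuple[List[List[int]], str, float]


def box_vertical_span(entry: OcrEntry) -> Tuple[int, int]:
    """Return (top_y, bottom_y) of the text box in an OCR entry."""
    corners, _text, _confidence = entry
    ys = [y for _x, y in corners]
    return min(ys), max(ys)


entry: OcrEntry = (
    [[47, 689], [184, 689], [184, 731], [47, 731]],
    'Message',
    0.9999883302470524,
)
top_y, bottom_y = box_vertical_span(entry)
print(top_y, bottom_y)  # 689 731
```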
2. Finding the 'Messages' keyword from the OCR results
- To find it, I calculate the Levenshtein distance between each recognized text in the result set from Step 1 and the 'Messages' keyword, and take the closest match as the most probable result.
- The text box of this result gives the y axis value used in Step 3.
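The keyword search could look roughly like the sketch below. It is not the package's actual code: the function names `levenshtein` and `closest_to_keyword` are hypothetical, the distance is a plain dynamic-programming implementation, and the case-insensitive comparison is an assumption. The sample coordinates for 'Reference' and 'Amount' are invented.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def closest_to_keyword(ocr_results: list, keyword: str = 'Messages'):
    """Return the OCR entry whose text is nearest to keyword (assumed case-insensitive)."""
    return min(
        ocr_results,
        key=lambda entry: levenshtein(entry[1].lower(), keyword.lower()),
    )


# Entries follow the (corner points, text, confidence) layout from Step 1.
sample_results = [
    ([[47, 100], [200, 100], [200, 140], [47, 140]], 'Reference', 0.99),
    ([[47, 689], [184, 689], [184, 731], [47, 731]], 'Message', 0.99),
    ([[47, 500], [160, 500], [160, 540], [47, 540]], 'Amount', 0.98),
]
match = closest_to_keyword(sample_results)
keyword_y = match[0][0][1]  # y of the top-left corner of the matched box
print(match[1], keyword_y)  # Message 689
```

Taking the minimum distance rather than requiring an exact match means a slightly misread keyword, such as 'Message', is still found.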
3. Detecting all the gray lines
- These are the separators between text sections.
- Finding them involves looping through the pixels along the y axis and checking whether the background is white.
- The loop starts from the y value found in Step 2, which helps in retrieving the gray line above the 'Reference' section.
- If the background is not white, we check horizontally whether the color is constant. Rows that are constant are returned as gray lines.
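The scan described above might be sketched as follows. This is an illustration only, not the package's implementation: `find_gray_lines` is a hypothetical name, the image is modeled as a row-major list of grayscale values, white is assumed to be 255, and the downward scan direction is an assumption. A thick line would be reported as several consecutive rows, since the sketch does not merge them.

```python
from typing import List

WHITE = 255  # assumed background value in a grayscale image


def find_gray_lines(pixels: List[List[int]], start_y: int, x: int = 0) -> List[int]:
    """Return y positions of rows that are one constant non-white shade.

    pixels is a row-major grayscale image (pixels[y][x]); start_y is where
    the scan begins, e.g. the y value found for the keyword.
    """
    lines = []
    for y in range(start_y, len(pixels)):
        row = pixels[y]
        if row[x] == WHITE:
            continue  # background, nothing to check
        # Non-white pixel: it is a separator only if the whole row matches it.
        if all(value == row[x] for value in row):
            lines.append(y)
    return lines


image = [
    [255, 255, 255, 255],  # y=0: background
    [200, 200, 200, 200],  # y=1: gray line
    [255, 255, 255, 255],  # y=2: background
    [0, 255, 255, 255],    # y=3: text pixel, not constant across the row
    [200, 200, 200, 200],  # y=4: gray line
    [255, 255, 255, 255],  # y=5: background
]
print(find_gray_lines(image, start_y=0))  # [1, 4]
print(find_gray_lines(image, start_y=2))  # [4]
```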
4. Extracting relevant data for sections.
- Now that we have the gray line positions and the OCR results, we use them to categorize the data into sections, based on the gray lines above and below each section.
- The categorized data is then mapped to a ReceiptModel as follows:
ReceiptModel(
    reference_number='BLAZ876699558640',
    transaction_date='25/05/2024 15.20',
    from_user='NAFFAH ARASHEED',
    to_user='Haisham',
    to_account='7730000203614',
    amount='MVR 1.00',
    remarks='Lorem ipsum dolor sit amet amegakure hokage shinobi'
)
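One way to do the categorization in Step 4 is to bucket each text box by the gray lines it falls between. The sketch below is an assumption about how this could work, not the package's code: `group_by_section` is a hypothetical helper, and the coordinates and section labels in the sample data are invented for illustration.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List


def group_by_section(ocr_results: list, gray_lines: List[int]) -> Dict[int, List[str]]:
    """Group recognized texts by the gap between gray lines they fall in.

    Section index i holds text whose box top lies below gray_lines[i - 1]
    and above gray_lines[i]; index 0 is everything above the first line.
    """
    boundaries = sorted(gray_lines)
    sections: Dict[int, List[str]] = defaultdict(list)
    for corners, text, _confidence in ocr_results:
        top_y = min(y for _x, y in corners)
        # Number of gray lines at or above the box top gives the section index.
        sections[bisect_right(boundaries, top_y)].append(text)
    return dict(sections)


def box(top: int, bottom: int) -> List[List[int]]:
    """Build corner points for a sample text box (x values are arbitrary)."""
    return [[47, top], [184, top], [184, bottom], [47, bottom]]


sample_results = [
    (box(40, 60), 'Reference', 0.99),
    (box(70, 90), 'BLAZ876699558640', 0.97),
    (box(120, 140), 'Amount', 0.99),
    (box(150, 170), 'MVR 1.00', 0.98),
    (box(220, 240), 'Remarks', 0.99),
]
grouped = group_by_section(sample_results, gray_lines=[100, 200])
print(grouped)
```

Each group can then be read as a label followed by its value or values, which is the shape needed to fill the fields of a ReceiptModel.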
Credits
- @nishaalnaseer: for his original implementation of finding gray lines (BML-OCR)