This repository contains a Python program designed to extract Optical Character Recognition (OCR) data from Saudi bank statements, detect income, and classify expenses.

Project description

Saudi Bank Statement Extraction

This repository contains a Python program designed to extract data from Saudi-based bank statements.

Table of Contents

  1. Introduction
  2. Prerequisites
  3. Usage
  4. Modules Description

Introduction

The Python program imports several packages necessary for extracting data from bank statements. It accepts a PDF as bytes and checks it for tampering. If no tampering is detected, it converts the first page to an image and runs a custom-trained YOLOv5 model to detect the bank. The resulting bank label selects the extraction code for that bank's statement format. The output of the extraction is a JSON file containing the revenues, expenses, and cash flows by month.
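The last two steps of that flow (dispatching on the bank label, then summarizing by month) can be sketched as follows. This is only an illustration under stated assumptions: `EXTRACTORS`, `extract_example_bank`, `process_statement`, `summarize_by_month`, and the JSON layout are hypothetical names and choices, not the package's actual API, and the tamper check and YOLOv5 detection are assumed to have already produced `bank_label`.

```python
import json
from collections import defaultdict


def summarize_by_month(transactions):
    """Aggregate (ISO date, signed amount) pairs into monthly totals."""
    months = defaultdict(lambda: {"revenue": 0.0, "expenses": 0.0, "cash_flow": 0.0})
    for date, amount in transactions:
        month = date[:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        if amount >= 0:
            months[month]["revenue"] += amount
        else:
            months[month]["expenses"] += -amount
        months[month]["cash_flow"] += amount
    return dict(sorted(months.items()))


def extract_example_bank(pdf_bytes):
    """Hypothetical extractor; a real one would parse the statement itself."""
    return [("2023-01-05", 1000.0), ("2023-01-20", -250.0), ("2023-02-03", -100.0)]


# Hypothetical mapping from a detected bank label to its extractor.
EXTRACTORS = {"example_bank": extract_example_bank}


def process_statement(pdf_bytes, bank_label):
    """Pick the extractor for the label and return the monthly summary as JSON."""
    try:
        extractor = EXTRACTORS[bank_label]
    except KeyError:
        raise ValueError(f"unsupported bank label: {bank_label!r}") from None
    return json.dumps(summarize_by_month(extractor(pdf_bytes)), indent=2)
```

Calling `process_statement(b"%PDF-", "example_bank")` returns a JSON string keyed by month. A lookup table keeps each bank's parsing logic separate, so supporting another bank would mean adding one entry.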

Prerequisites

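Ensure the following packages are installed: setuptools, numpy, scipy, pandas, Unidecode, DateTime, Pillow, PyPDF2, Python-IO, pdf2image, torch, scikit-learn, pdfplumber, and google-cloud-translate. You can install these packages using pip, for example:

pip install setuptools numpy scipy pandas Unidecode DateTime Pillow PyPDF2 Python-IO pdf2image torch scikit-learn pdfplumber google-cloud-translate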
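One way to confirm that the packages listed above are present is to query the installed distribution metadata. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The `REQUIRED` list and the `missing_packages` helper are illustrative names, not part of this package.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in the prerequisites.
REQUIRED = [
    "setuptools", "numpy", "scipy", "pandas", "Unidecode", "DateTime",
    "Pillow", "PyPDF2", "Python-IO", "pdf2image", "torch",
    "scikit-learn", "pdfplumber", "google-cloud-translate",
]


def missing_packages(names):
    """Return the distribution names that are not installed."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(absent) if absent else "none")
```

Checking distribution names rather than import names avoids mismatches such as Pillow, which is imported as `PIL`.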

Usage

To use this program, clone the repository, place your images in the same directory, and modify the IMAGES list accordingly. Then run the program from your terminal or command prompt: python ocr_and_facial_recognition.py

Please note that this program does not include any user interface and does not handle any errors or exceptions beyond what is included in the code.

Modules Description

Importing Necessary Packages: The program begins by importing all the necessary packages used in Bank Statement Extraction.

Data Introduction:

This section defines a list of image file names that will be used as input for the OCR and facial recognition steps of the program.

Load easyocr and Anti-Spoofing Model:

Two functions load the easyOCR package with English language support and the anti-spoofing model, respectively.

Data Preprocessing:

Several functions are defined here to open and read an image file, convert it to grayscale, perform a radon transform, find the busiest rotation, and rotate the image accordingly.
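The idea behind the "busiest rotation" step can be illustrated with a simplified, pure-Python stand-in for the radon transform. This is not the program's own implementation, which this README does not show. The sketch assumes the image has already been reduced to a list of foreground pixel coordinates, and `busiest_angle` is a hypothetical name. For each candidate angle it projects the pixels onto an axis and scores how concentrated the resulting profile is; text lines aligned with the projection produce the sharpest peaks.

```python
import math
from collections import Counter


def busiest_angle(points, angles=range(180)):
    """Return the projection angle (degrees) with the most concentrated profile.

    points: iterable of (x, y) foreground pixel coordinates.
    """
    points = list(points)
    best_angle, best_score = None, -1
    for angle in angles:
        theta = math.radians(angle)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        # Project each pixel onto the axis at this angle and bin by offset.
        profile = Counter(round(x * cos_t + y * sin_t) for x, y in points)
        # Sum of squared bin counts: large when a few bins hold most pixels.
        score = sum(count * count for count in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# A synthetic horizontal "text line": 50 foreground pixels at y = 5.
row = [(x, 5) for x in range(50)]
print(busiest_angle(row))
```

With this convention, a perfectly horizontal line is busiest at 90 degrees, so the deviation of the result from 90 gives an estimate of the skew to correct. The sign of that correction depends on the image coordinate convention in use.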

Facial recognition:

This section is dedicated to detecting faces in an image using a HOG (Histogram of Oriented Gradients) face detector, extracting features, and computing the similarity between two sets of features using the cosine similarity metric.
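As a small illustration of the similarity metric named above, the sketch below computes cosine similarity between two feature vectors using only the standard library. The function name and the plain-list inputs are assumptions for the example; the feature extraction step itself is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score close to 1 regardless of their magnitude, and orthogonal vectors score 0.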

Information Extraction:

Finally, the program uses OCR to extract information from an image, computes the similarity between faces in different images, and outputs this information in a JSON file.

Please refer to the source code comments for more detailed explanations.

This is a basic explanation of the project and its usage. This project was last updated on 24 May 2023.

Download files

Source Distribution

bankstatementextractor_sau-2.4.0.0.tar.gz (13.3 MB)

Built Distribution

File details

Details for the file bankstatementextractor_sau-2.4.0.0.tar.gz.

File hashes

Hashes for bankstatementextractor_sau-2.4.0.0.tar.gz
Algorithm Hash digest
SHA256 9d7e337b520ca4bea92ccdf33831b9fb5dd9ef427440b1d21edf32ba6e3499aa
MD5 350df4b6563b7ed1b5079a4b9b0264a2
BLAKE2b-256 90d1385a4a80d72af32a36ff4f069f6417c1b0ed65166902378787f4d7a69cf8

File details

Details for the file bankstatementextractor_sau-2.4.0.0-py3-none-any.whl.

File hashes

Hashes for bankstatementextractor_sau-2.4.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 663f883250efa1ffa6e13510a577e79c0f4163c97e11bc02e8f7319ffdbff57f
MD5 2eb1fc9cf097acf3e912bf83b656697b
BLAKE2b-256 53aa9cbcde5585ace58124fb46c97f835d36818a16ab1ef7f8c2550d695e1c43
