
Bangla-English Mixed-Language Scene / Printed OCR Toolkit

Project description

apsis-ocr

Apsis-OCR is a mixed-language OCR system for printed documents, developed at Apsis Solutions Limited.

The full system is built from three components:

  • Text detection : DBNet
  • Text recognition:
    • Bangla Text : ApsisNet
      • ApsisNet is a model developed at Apsis Solutions Limited.
      • It is used by bbOCR as the recognition model
      • In the linked paper, ApsisNet is reported to outperform other available recognition models such as Tesseract and EasyOCR
    • English Text : SVTR-LCNet
  • Text classification : DenseNet121

Installation

As a module / PyPI package

cpu installation

pip install apsisocr
pip install onnxruntime
pip install fastdeploy-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html

gpu installation

Using a conda environment is recommended, especially for GPU installations.

  • installing cudatoolkit and cudnn:
conda install cudatoolkit
conda install cudnn
  • installing packages
pip install apsisocr
pip install onnxruntime-gpu
python -m pip install -U fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
  • exporting environment variables
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
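After the GPU installation steps above, it can be useful to confirm that ONNX Runtime actually sees the CUDA provider. The following is a minimal sketch and not part of apsisocr: the helper name `cuda_provider_available` is hypothetical, and the only library call it relies on is `onnxruntime.get_available_providers()`.

```python
def cuda_provider_available() -> bool:
    """Return True if onnxruntime reports the CUDA execution provider."""
    try:
        import onnxruntime as ort
    except ImportError:
        # neither onnxruntime nor onnxruntime-gpu is installed here
        return False
    return "CUDAExecutionProvider" in ort.get_available_providers()


if __name__ == "__main__":
    print("CUDA provider available:", cuda_provider_available())
```

If this prints False on a GPU machine, the LD_LIBRARY_PATH export above is a reasonable first thing to re-check.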

Building from source : Linux/Ubuntu

Using a conda environment is recommended.

  • clone the repository :
git clone https://github.com/mnansary/apsisOCR.git
cd apsisOCR
  • create a conda environment:
conda create -n apsisocr python=3.9
  • activate conda environment:
conda activate apsisocr
  • cpu installation :
bash install.sh cpu
  • gpu installation :
bash install.sh gpu

Usage

ApsisNet : Bangla Recognizer

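  • usage
from apsisocr import ApsisNet
# load the Bangla recognition model
bnocr = ApsisNet()
# crops: list of image crops (np.ndarray), e.g. from PaddleDBNet.get_crops
texts = bnocr.infer(crops)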
  • docstring for ApsisNet.infer
"""
Perform inference on image crops.

Args:
    crops (list[np.ndarray]): List of image crops.
    batch_size (int): Batch size for inference (default: 32).
    normalize_unicode (bool): Flag to normalize unicode (default: True).

Returns:
    list[str]: List of inferred texts.
"""

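The batch_size argument in the docstring above controls how many crops are processed per inference call. As an illustration only (this is not the library's implementation, and `iter_batches` is a hypothetical helper), batching a list of crops can be sketched as:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 32) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# stand-in values; real crops would be np.ndarray images
crops = [f"crop_{i}" for i in range(70)]
print([len(batch) for batch in iter_batches(crops, batch_size=32)])  # [32, 32, 6]
```

A smaller batch_size lowers peak memory use at the cost of more inference calls.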
SVTR-LCNet : English Recognizer

from apsisocr import SVTRLCNet
# load the English recognition model
enocr = SVTRLCNet()
texts = enocr.infer(crops)
  • docstring for SVTRLCNet.infer
"""
Perform inference on image crops.

Args:
    crops (list[np.ndarray]): List of image crops.
    batch_size (int): Batch size for inference.

Returns:
    list[str]: List of recognized texts.
"""

DenseNet121BnEnClassifier : Language classifier

from apsisocr import DenseNet121BnEnClassifier
# load the Bangla/English language classifier
lang = DenseNet121BnEnClassifier()
langs = lang.infer(crops)
  • docstring for DenseNet121BnEnClassifier.infer
"""
Perform inference on image crops.

Args:
    crops (list[np.ndarray]): List of image crops.
    batch_size (int): Batch size for inference (default: 32).

Returns:
    list[str]: List of inferred languages.
"""

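The classifier output can be used to send each crop to the matching recognizer. The sketch below shows one way to do that while preserving input order. It is an assumption-laden illustration: `recognize_by_language` is a hypothetical helper, and the label "bn" is assumed, so check the classifier's real output values before relying on it.

```python
from typing import Callable, List, Sequence


def recognize_by_language(
    crops: Sequence[object],
    langs: Sequence[str],
    bn_infer: Callable[[list], List[str]],
    en_infer: Callable[[list], List[str]],
    bn_label: str = "bn",  # assumed label, not confirmed by the documentation
) -> List[str]:
    """Run each crop through the recognizer for its language, keeping order."""
    if len(crops) != len(langs):
        raise ValueError("crops and langs must have the same length")
    bn_idx = [i for i, lang in enumerate(langs) if lang == bn_label]
    en_idx = [i for i, lang in enumerate(langs) if lang != bn_label]
    texts: List[str] = [""] * len(crops)
    for indices, infer in ((bn_idx, bn_infer), (en_idx, en_infer)):
        if not indices:
            continue  # skip the model call when no crop has this language
        outputs = infer([crops[j] for j in indices])
        for j, text in zip(indices, outputs):
            texts[j] = text
    return texts


# stub recognizers so the sketch runs without loading any model
fake_bn = lambda batch: [f"bn:{c}" for c in batch]
fake_en = lambda batch: [f"en:{c}" for c in batch]
print(recognize_by_language(["a", "b", "c"], ["bn", "en", "bn"], fake_bn, fake_en))
# ['bn:a', 'en:b', 'bn:c']
```

With the real models, `bnocr.infer` and `enocr.infer` would take the place of the two stubs.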
PaddleDBNet : Text Detector

  • see the official PaddleOCR documentation for details on the model
# initialization
from apsisocr import PaddleDBNet
detector = PaddleDBNet()
# get word-level boxes
word_boxes = detector.get_word_boxes(img)
# get line-level boxes
line_boxes = detector.get_line_boxes(img)
# crop the image using either set of boxes
crops = detector.get_crops(img, word_boxes)
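For readers who want to inspect a detected region by hand, a four-point box can be reduced to an axis-aligned rectangle. This is a minimal sketch under the assumption that each box is a sequence of four (x, y) points; the exact box format returned by PaddleDBNet is not documented here, and `poly_to_bounds` is a hypothetical helper rather than part of apsisocr.

```python
from typing import Sequence, Tuple


def poly_to_bounds(poly: Sequence[Sequence[float]]) -> Tuple[int, int, int, int]:
    """Return (x_min, y_min, x_max, y_max) enclosing a polygon of (x, y) points."""
    xs = [int(round(p[0])) for p in poly]
    ys = [int(round(p[1])) for p in poly]
    return min(xs), min(ys), max(xs), max(ys)


# made-up box coordinates for illustration
box = [(10, 20), (110, 22), (108, 52), (8, 50)]
x0, y0, x1, y1 = poly_to_bounds(box)
print(x0, y0, x1, y1)  # 8 20 110 52
# with a NumPy image array, the region would then be img[y0:y1, x0:x1]
```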

ApsisOCR : Overall System

from apsisocr import ApsisOCR
# build the full detection + classification + recognition pipeline
ocr = ApsisOCR()
results = ocr(img_path)
  • docstring for ApsisOCR.__call__
"""
Perform OCR on an image.

Args:
    img_path (str): Path to the image file.

Returns:
    dict: OCR results containing recognized text and associated information. The dictionary has the following structure:
            {
            "text" : multiline text with newline separators
            "result" : list of dictionaries, each with the following structure:
                        {
                        "line_no" : the line number of the word
                        "word_no" : the word number in the line 
                        "poly"    : the four point polygonal bounding box of the word in the image
                        "text"    : the recognized text 
                        "lang"    : the classified language code
                        }
            }
"""  

See useage/useage.ipynb for examples.
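The per-word entries in "result" can be regrouped into lines using the line_no and word_no keys described in the docstring. The following is a minimal sketch: `words_by_line` is a hypothetical helper, and the sample data is made up for illustration rather than real OCR output.

```python
from collections import defaultdict
from typing import Dict, List


def words_by_line(result: List[dict]) -> Dict[int, List[str]]:
    """Group recognized words by line_no, ordered by word_no within each line."""
    lines: Dict[int, List[dict]] = defaultdict(list)
    for word in result:
        lines[word["line_no"]].append(word)
    return {
        line_no: [w["text"] for w in sorted(words, key=lambda w: w["word_no"])]
        for line_no, words in sorted(lines.items())
    }


# made-up entries carrying only the keys this helper needs
sample_result = [
    {"line_no": 1, "word_no": 2, "text": "world"},
    {"line_no": 2, "word_no": 1, "text": "OCR"},
    {"line_no": 1, "word_no": 1, "text": "Hello"},
]
print(words_by_line(sample_result))  # {1: ['Hello', 'world'], 2: ['OCR']}
```

With a real call, `words_by_line(results["result"])` would be the equivalent usage.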

Deployment

  • change directory to the deployment folder: cd deployment
  • edit the settings in config.py as required:
# Port used by api_ocr.py
OCR_API_PORT=3032
# API address displayed after deploying api_ocr.py; app.py uses this address
OCR_API="http://172.20.4.53:3032/ocr"
  • running the api and app:
python api_ocr.py # deploys at port 3032 by default
streamlit run app.py --server.port 3033 # deploys the Streamlit frontend at port 3033
  • api_ocr.py allows the API to be used without any UI (for example, from Postman)

  • app.py runs a Streamlit UI
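Because OCR_API repeats the port from OCR_API_PORT, the two values can drift apart when one is edited. One way to keep them in sync is to derive the address from the port. This is a sketch of a possible config.py layout, not the project's actual file: `OCR_API_HOST` and `build_ocr_api_url` are hypothetical names.

```python
# Host taken from the example address; replace with your own server address.
OCR_API_HOST = "172.20.4.53"
OCR_API_PORT = 3032


def build_ocr_api_url(host: str, port: int, route: str = "/ocr") -> str:
    """Compose the OCR endpoint URL from host, port and route."""
    return f"http://{host}:{port}{route}"


OCR_API = build_ocr_api_url(OCR_API_HOST, OCR_API_PORT)
print(OCR_API)  # http://172.20.4.53:3032/ocr
```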

TESTED GPU INFERENCE SERVER CONFIG

OS          : Ubuntu 20.04.6 LTS      
Memory      : 62.4 GiB 
Processor   : Intel® Xeon(R) Silver 4214R CPU @ 2.40GHz × 24    
Graphics    : NVIDIA RTX A6000/PCIe/SSE2
Gnome       : 3.36.8

License

Contents of this repository are restricted to non-commercial research purposes only under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).


Change Log

0.0.1 (24/09/2023)

  • added usage and examples

0.0.2 (24/09/2023)

  • added licensing

0.0.3 (05/11/2023)

  • added base application without line and word number

0.0.4 (01/05/2024)

  • apsisbnocr

0.0.5 (22/11/2024)

  • added Null handling to apsisbnocr

0.0.6 (22/11/2024)

  • added Null handling to apsisbnocr - output typo update

0.0.7 (22/11/2024)

  • BNbaseOCR -> location and text only
