akaOCR Package Tools

Project description

akaOCR

License Python 3.7 ONNX Compatible Colab

Description

This package provides the akaOCR pipeline components (text detection, text recognition, and text rotation classification) using ONNX-format models, which can make CPU and GPU inference about 2x faster. The code is adapted from this awesome repo.

Features

Feature 1: Text Detection.

from akaocr import BoxEngine
import cv2

img_path = "path/to/image.jpg"
image = cv2.imread(img_path)

# side_len: minimum inference image size
# constructor arguments shown with their default values
box_engine = BoxEngine(model_path=None,      # Any | None
                       side_len=None,        # int | None
                       conf_thres=0.5,       # float
                       mask_thes=0.4,        # float
                       unclip_ratio=2.0,     # float
                       max_candidates=1000,  # int
                       device='cpu')         # str: 'cpu' or 'gpu'

# inference for one image
results = box_engine(image) # [np.array([4 points], dtype=np.float32),...]
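
Detected boxes are not necessarily returned in reading order. The following is a minimal sketch, not part of akaocr, that sorts 4-point boxes top to bottom and then left to right. The function name `sort_boxes_reading_order` and the `row_tol` parameter are hypothetical, and the sketch assumes roughly horizontal text lines.

```python
def sort_boxes_reading_order(boxes, row_tol=10.0):
    """Sort 4-point boxes top-to-bottom, then left-to-right.

    row_tol (hypothetical, in pixels): a box whose top edge lies within
    row_tol of the first box in the current row stays in that row.
    """
    def top_left(box):
        xs = [float(p[0]) for p in box]
        ys = [float(p[1]) for p in box]
        return min(ys), min(xs)

    # First pass: order all boxes by their top edge.
    ordered = sorted(boxes, key=top_left)

    # Group consecutive boxes into rows of similar vertical position.
    rows, current = [], []
    for box in ordered:
        if current and top_left(box)[0] - top_left(current[0])[0] > row_tol:
            rows.append(current)
            current = []
        current.append(box)
    if current:
        rows.append(current)

    # Second pass: inside each row, order boxes by their left edge.
    return [box for row in rows
            for box in sorted(row, key=lambda b: top_left(b)[1])]


# Example with plain lists; np.float32 arrays from BoxEngine iterate the same way.
sample_boxes = [
    [[200, 12], [260, 12], [260, 30], [200, 30]],  # first line, right
    [[10, 100], [80, 100], [80, 120], [10, 120]],  # second line
    [[10, 10], [90, 10], [90, 30], [10, 30]],      # first line, left
]
sorted_boxes = sort_boxes_reading_order(sample_boxes)
```

A fixed pixel tolerance is the simplest choice; scaling it by the median box height may work better on images with mixed font sizes.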

Feature 2: Text Recognition.

from akaocr import TextEngine
import cv2

img_path = "path/to/cropped_image.jpg"
cropped_image = cv2.imread(img_path)

# constructor arguments shown with their default values
text_engine = TextEngine(model_path=None,            # Any | None
                         vocab_path=None,            # Any | None
                         use_space_char=True,        # bool
                         batch_sizes=32,             # int
                         model_shape=[3, 48, 320],   # list
                         max_wh_ratio=None,          # int | None
                         device='cpu')               # str: 'cpu' or 'gpu'

# inference for one or more images
results = text_engine(cropped_image) # [(text, conf),...]
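
Since each result is a `(text, conf)` pair, low-confidence readings can be dropped before further processing. This is a small sketch, not part of akaocr. The helper name `filter_by_confidence`, the `min_conf` threshold, and the sample values are made up for illustration.

```python
def filter_by_confidence(results, min_conf=0.5):
    """Keep only the (text, conf) pairs with conf >= min_conf."""
    return [(text, conf) for text, conf in results if conf >= min_conf]


# Hypothetical recognition output in the [(text, conf), ...] format.
sample_results = [("INVOICE", 0.98), ("T0tal", 0.41), ("2024-01-31", 0.87)]
kept = filter_by_confidence(sample_results, min_conf=0.5)
```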

Feature 3: Text Rotation.

from akaocr import ClsEngine
import cv2

img_path = "path/to/cropped_image.jpg"
cropped_image = cv2.imread(img_path)

# constructor arguments shown with their default values
rotate_engine = ClsEngine(model_path=None,   # Any | None
                          conf_thres=0.75,   # float
                          device='cpu')      # str: 'cpu' or 'gpu'

# classify input image as 0 or 180 degrees
# inference for one or more images
results = rotate_engine(cropped_image) # [(label (0|180), conf),...]
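
The classifier output can be used to turn an upside-down crop upright before recognition. The sketch below is not part of akaocr and rests on assumptions: the helper name `correct_rotation` is hypothetical, and because the label type (int or str) is not stated here, the label is compared as text.

```python
import numpy as np


def correct_rotation(image, label, conf, conf_thres=0.75):
    """Return the image rotated by 180 degrees if the classifier says so.

    The image is returned unchanged when the label is 0 or when the
    confidence is below conf_thres.
    """
    # Assumption: label may be 180 or "180", so compare its string form.
    if str(label) == "180" and conf >= conf_thres:
        return np.rot90(image, k=2)  # two 90-degree turns = 180 degrees
    return image


# Tiny 2x3 "image" to show the effect.
tiny = np.arange(6).reshape(2, 3)
flipped = correct_rotation(tiny, "180", 0.9)
```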

Usage

  1. Perspective Transform Image.
import cv2
import numpy as np

def transform_image(image, box):
    # Get perspective transform image

    assert len(box) == 4, "Shape of points must be 4x2"
    img_crop_width = int(
        max(
            np.linalg.norm(box[0] - box[1]),
            np.linalg.norm(box[2] - box[3])))
    img_crop_height = int(
        max(
            np.linalg.norm(box[0] - box[3]),
            np.linalg.norm(box[1] - box[2])))
    pts_std = np.float32([[0, 0], 
                        [img_crop_width, 0],
                        [img_crop_width, img_crop_height],
                        [0, img_crop_height]])
    M = cv2.getPerspectiveTransform(box, pts_std)
    dst_img = cv2.warpPerspective(
        image,
        M, (img_crop_width, img_crop_height),
        borderMode=cv2.BORDER_REPLICATE,
        flags=cv2.INTER_CUBIC)
    
    img_height, img_width = dst_img.shape[0:2]
    # Rotate tall crops (height at least 1.5x width) by 90 degrees clockwise
    if img_height/img_width >= 1.5:
        dst_img = np.rot90(dst_img, k=3)

    return dst_img
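
`transform_image` maps `box[0]..box[3]` to the top-left, top-right, bottom-right, and bottom-left corners of the crop, so the points must arrive in that order. If boxes come from another source with unknown ordering, a common heuristic can be sketched as follows. The helper name `order_points` is hypothetical, and the heuristic can fail for boxes rotated close to 45 degrees.

```python
import numpy as np


def order_points(pts):
    """Order 4 points as top-left, top-right, bottom-right, bottom-left."""
    pts = np.asarray(pts, dtype=np.float32).reshape(4, 2)
    s = pts.sum(axis=1)        # x + y: smallest at top-left, largest at bottom-right
    d = pts[:, 1] - pts[:, 0]  # y - x: smallest at top-right, largest at bottom-left
    return np.array([pts[np.argmin(s)],
                     pts[np.argmin(d)],
                     pts[np.argmax(s)],
                     pts[np.argmax(d)]], dtype=np.float32)


shuffled = [[100, 50], [0, 0], [0, 50], [100, 0]]
ordered = order_points(shuffled)
```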
  2. OCR Pipeline.
from akaocr import TextEngine
from akaocr import BoxEngine

import cv2

box_engine = BoxEngine()
text_engine = TextEngine()

def main():
    img_path = "sample.png"
    org_image = cv2.imread(img_path)
    images = []

    boxes = box_engine(org_image)
    for box in boxes:
        # crop & transform image (transform_image is defined in step 1)
        image = transform_image(org_image, box)
        images.append(image)

    texts = text_engine(images)  # [(text, conf),...]
    for text, conf in texts:
        print(text, conf)

if __name__ == '__main__':
    main()
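
To pass the pipeline output on to other tools, the boxes and recognized texts can be combined into JSON-serializable records. This is a sketch under one assumption: the recognition results come back in the same order as the input crops. The helper name `build_records` and the record keys are hypothetical.

```python
import json


def build_records(boxes, texts):
    """Pair each 4-point box with its (text, conf) result."""
    records = []
    for box, (text, conf) in zip(boxes, texts):
        records.append({
            # Convert NumPy scalars to plain floats so json can encode them.
            "box": [[float(x), float(y)] for x, y in box],
            "text": text,
            "conf": float(conf),
        })
    return records


# Hypothetical values in the formats documented above.
demo_boxes = [[[0, 0], [10, 0], [10, 5], [0, 5]]]
demo_texts = [("hello", 0.9)]
demo_records = build_records(demo_boxes, demo_texts)
demo_json = json.dumps(demo_records)
```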

Note: akaOCR (Transform documents into useful data with AI-based IDP, Intelligent Document Processing) helps make inefficient manual entry a thing of the past and reliable data insights a thing of the present. Details at: https://app.akaocr.io


Download files

Source Distribution

akaocr-2.1.4.tar.gz (10.9 MB view details)

Uploaded Source

Built Distribution

akaocr-2.1.4-py3-none-any.whl (10.9 MB view details)

Uploaded Python 3

File details

Details for the file akaocr-2.1.4.tar.gz.

File metadata

  • Download URL: akaocr-2.1.4.tar.gz
  • Upload date:
  • Size: 10.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.11

File hashes

Hashes for akaocr-2.1.4.tar.gz
Algorithm Hash digest
SHA256 8d801f13364d15753cebeb89d7912941157c06e8df6ec8d230fa267c06318a91
MD5 a38ae64bdb1eacc97e3dc1f950f9daba
BLAKE2b-256 998fec00a411544a084f3bb8f24a21d7804f71a39a8090b5a4be5d8d43ef3610
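
A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: the helper name `sha256_of_file` is hypothetical, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 listed above for akaocr-2.1.4.tar.gz
EXPECTED_SHA256 = "8d801f13364d15753cebeb89d7912941157c06e8df6ec8d230fa267c06318a91"

# Usage (assumes the file is in the current directory):
#   assert sha256_of_file("akaocr-2.1.4.tar.gz") == EXPECTED_SHA256
```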

File details

Details for the file akaocr-2.1.4-py3-none-any.whl.

File metadata

  • Download URL: akaocr-2.1.4-py3-none-any.whl
  • Upload date:
  • Size: 10.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.11

File hashes

Hashes for akaocr-2.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 ca6cdb2870fcfe98c34e4bd93367a9f961a8b19b88d717917302a3781c217cf0
MD5 acfa61e8096b4211b0e9335b774b5ae1
BLAKE2b-256 f4807d7df02bc98d9d0f18a0e3fa5a91958b345bd84c85a46e9a768eb3c7c509
