akaOCR Package Tools

Project description

akaOCR

Description

This package is compatible with akaOCR and provides an OCR pipeline (text detection and text recognition) that runs ONNX-format models, which can make CPU inference up to 2x faster. The code is based on this awesome repo.

Installation

You can install the package using pip:

pip install akaocr
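
To confirm the install worked, the installed version can be read back with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical and is not part of akaocr.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("akaocr")
    print("akaocr version:", version if version else "not installed")
```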

Usage

Step 1: Text Detection.

from akaocr import BoxEngine
import cv2

img_path = "path/to/image.jpg"
image = cv2.imread(img_path)

# side_len: minimum inference image size
# device: 'cpu' or 'gpu'
# every argument is optional; the values below mirror the defaults
box_engine = BoxEngine(model_path=None,
                       side_len=None,
                       conf_thres=0.5,
                       mask_thes=0.4,
                       unclip_ratio=2.0,
                       max_candidates=1000,
                       device='cpu',
                       num_workers=4)

# run inference on one image
results = box_engine(image)  # [np.array([4 points]), ...]
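
The order in which boxes come back is not stated above, so downstream code may want to sort them into reading order before recognition. The sketch below assumes each box is an array of four (x, y) points; the helper name `sort_boxes_reading_order` and the `row_tol` parameter are hypothetical and not part of akaocr.

```python
import numpy as np


def sort_boxes_reading_order(boxes, row_tol=10):
    """Sort quadrilateral boxes top-to-bottom, then left-to-right.

    boxes: iterable of arrays shaped (4, 2) holding (x, y) corner points.
    row_tol: a box joins the current row when its top edge is at most this
             many pixels below the top edge of the first box in that row.
    """
    boxes = [np.asarray(b, dtype=np.float32) for b in boxes]
    # Sort by the top-most y coordinate first.
    boxes.sort(key=lambda b: b[:, 1].min())
    rows, current = [], []
    for box in boxes:
        top = box[:, 1].min()
        if current and top - current[0][:, 1].min() > row_tol:
            rows.append(current)
            current = []
        current.append(box)
    if current:
        rows.append(current)
    # Within each row, order boxes by their left-most x coordinate.
    return [b for row in rows for b in sorted(row, key=lambda b: b[:, 0].min())]
```

A fixed pixel tolerance is a simple heuristic; for images with very different text sizes, a tolerance derived from the box heights would likely work better.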

Step 2: Text Recognition.

from akaocr import TextEngine
import cv2

img_path = "path/to/cropped_image.jpg"
cropped_image = cv2.imread(img_path)

# device: 'cpu' or 'gpu'
# every argument is optional; the values below mirror the defaults
text_engine = TextEngine(model_path=None,
                         vocab_path=None,
                         use_space_char=True,
                         model_shape=[3, 48, 320],
                         max_wh_ratio=None,
                         device='cpu',
                         num_workers=4)

# run inference on one or more images
results = text_engine(cropped_image)  # [(text, conf), ...]
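
Since each result is a `(text, conf)` pair, low-confidence readings can be dropped before further processing. A minimal sketch follows; `filter_by_confidence` is a hypothetical helper, and the sample data is made up for illustration rather than real akaocr output.

```python
def filter_by_confidence(results, min_conf=0.5):
    """Keep only (text, conf) pairs whose confidence is at least min_conf."""
    return [(text, conf) for text, conf in results if conf >= min_conf]


# Hypothetical recognition output, for illustration only.
sample_results = [("INVOICE", 0.98), ("T0tal", 0.41), ("2024-08-03", 0.87)]
kept = filter_by_confidence(sample_results, min_conf=0.6)
print(kept)
```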

Step 3: Perspective Transform Image.

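import cv2
import numpy as np

def transform_image(image, box):
    # Crop the quadrilateral `box` out of `image` and warp it to an upright rectangle.
    # `box` holds 4 (x, y) points ordered top-left, top-right, bottom-right, bottom-left.
    box = np.asarray(box, dtype=np.float32)  # cv2.getPerspectiveTransform requires float32
    assert box.shape == (4, 2), "Shape of points must be 4x2"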
    img_crop_width = int(
        max(
            np.linalg.norm(box[0] - box[1]),
            np.linalg.norm(box[2] - box[3])))
    img_crop_height = int(
        max(
            np.linalg.norm(box[0] - box[3]),
            np.linalg.norm(box[1] - box[2])))
    pts_std = np.float32([[0, 0], 
                        [img_crop_width, 0],
                        [img_crop_width, img_crop_height],
                        [0, img_crop_height]])
    M = cv2.getPerspectiveTransform(box, pts_std)
    dst_img = cv2.warpPerspective(
        image,
        M, (img_crop_width, img_crop_height),
        borderMode=cv2.BORDER_REPLICATE,
        flags=cv2.INTER_CUBIC)
    
    # Rotate tall crops (likely vertical text) by 90 degrees so they become wide.
    img_height, img_width = dst_img.shape[0:2]
    if img_height / img_width >= 1.25:
        dst_img = np.rot90(dst_img, k=3)

    return dst_img
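
The target rectangle in `transform_image` implies the four points are ordered top-left, top-right, bottom-right, bottom-left. Whether the detector always returns that order is not stated here, so the following is a sketch of one common way to normalize the order first. `order_points` is a hypothetical helper, and the sum/difference heuristic assumes a roughly upright quadrilateral (it can mislabel corners for boxes rotated close to 45 degrees).

```python
import numpy as np


def order_points(pts):
    """Return the 4 points ordered top-left, top-right, bottom-right, bottom-left."""
    pts = np.asarray(pts, dtype=np.float32).reshape(4, 2)
    s = pts.sum(axis=1)          # x + y for each point
    d = pts[:, 1] - pts[:, 0]    # y - x for each point
    return np.stack([
        pts[np.argmin(s)],  # top-left: smallest x + y
        pts[np.argmin(d)],  # top-right: smallest y - x
        pts[np.argmax(s)],  # bottom-right: largest x + y
        pts[np.argmax(d)],  # bottom-left: largest y - x
    ])
```

The result is a float32 array of shape (4, 2), so it could be passed as `box` to `transform_image`.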

Step 4: Text Image Rotation.

from akaocr import RotateEngine
import cv2

img_path = "path/to/cropped_image.jpg"
cropped_image = cv2.imread(img_path)

# device: 'cpu' or 'gpu'
# every argument is optional; the values below mirror the defaults
rotate_engine = RotateEngine(model_path=None,
                             conf_thres=0.75,
                             device='cpu',
                             num_workers=4)

# classify each input image as rotated by 0 or 180 degrees
# run inference on one or more images
results = rotate_engine(cropped_image)  # [(label (0|180), conf), ...]
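
The classifier only reports a label, so the flip itself is left to the caller. The sketch below shows one way to apply it, assuming results arrive in the same order as the input images and that the label is either the number 180 or the string "180" (both are assumptions). `apply_rotation` is a hypothetical helper, not part of akaocr.

```python
import numpy as np


def apply_rotation(images, rotations, min_conf=0.75):
    """Turn images classified as upside down (180 degrees) the right way up."""
    fixed = []
    for image, (label, conf) in zip(images, rotations):
        # str() covers both integer and string labels (an assumption).
        if str(label) == "180" and conf >= min_conf:
            image = np.rot90(image, k=2)  # two 90-degree turns = 180 degrees
        fixed.append(image)
    return fixed
```

Images whose prediction falls below `min_conf` are returned unchanged, which avoids flipping crops the classifier is unsure about.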

Step 5: Use in your code: OCR Pipeline.

from akaocr import TextEngine
from akaocr import BoxEngine

import cv2

# transform_image is the helper from Step 3; define or import it before running.

box_engine = BoxEngine()
text_engine = TextEngine()

def main():
    img_path = "sample.png"
    org_image = cv2.imread(img_path)
    images = []

    boxes = box_engine(org_image)
    for box in boxes:
        # crop & transform image
        image = transform_image(org_image, box)
        images.append(image)

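    # recognize text in all cropped images
    texts = text_engine(images)  # [(text, conf), ...]
    for text, conf in texts:
        print(text, conf)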

if __name__ == '__main__':
    main()
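
To keep each recognized string attached to its location, the boxes and texts from the pipeline can be zipped into records. This sketch assumes `text_engine` returns results in the same order as the input images, which is not confirmed above. `collect_results` and the record keys are hypothetical.

```python
def collect_results(boxes, texts, min_conf=0.0):
    """Pair each detected box with its recognized text and confidence."""
    records = []
    for box, (text, conf) in zip(boxes, texts):
        if conf < min_conf:
            continue
        # Convert array-like boxes to plain nested lists so records are JSON-friendly.
        points = box.tolist() if hasattr(box, "tolist") else [list(p) for p in box]
        records.append({"box": points, "text": text, "conf": float(conf)})
    return records
```

Inside `main()`, this could be called as `collect_results(boxes, texts, min_conf=0.5)` after the recognition step.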

Note: akaOCR transforms documents into useful data with AI-based Intelligent Document Processing (IDP), replacing inefficient manual entry with reliable data insights. Details at: https://app.akaocr.io

Project details

Release history

2.0.16 - Aug 6, 2024
2.0.15 - Aug 6, 2024
2.0.14 - Aug 3, 2024 (this version)
2.0.13 - Aug 3, 2024
2.0.12 - Jul 24, 2024
2.0.10 - Jul 24, 2024
2.0.8 - Jul 24, 2024
2.0.6 - Jun 28, 2024
2.0.5 - Jun 25, 2024
2.0.4 - Jun 6, 2024
2.0.3 - Jun 6, 2024
2.0.2 - Jun 5, 2024
2.0.1 - Jun 5, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

akaocr-2.0.14.tar.gz (10.9 MB)

Uploaded Aug 3, 2024 Source

Built Distribution

akaocr-2.0.14-py3-none-any.whl (10.9 MB)

Uploaded Aug 3, 2024 Python 3

Hashes for akaocr-2.0.14.tar.gz
Algorithm	Hash digest
SHA256	`2df44d2b3d06ab8a6cce72d34e07219f8276a8300b8d187782c220c015c069d4`
MD5	`a417595d08cf9d204492f24ae79355f5`
BLAKE2b-256	`4eaeb34511da79721c298e181075dc3a9d08f5c0472e558231d6dd1a38074998`

Hashes for akaocr-2.0.14-py3-none-any.whl
Algorithm	Hash digest
SHA256	`73150c7fe2be6235b28a661ab8ad6a385a0737a911df0505e176aaeb0a063365`
MD5	`b41e33e18cfd8285a9d180416b70e3df`
BLAKE2b-256	`4668a2913b176acbcb615c3a56ca825138401db1ed409136c44be734943bd454`