
Universal Character Recognizer (UCR): Simple, Intuitive, Extensible, Multi-Lingual OCR engine

Universal Character Recognizer (UCR) is an open-source, easy-to-use Python library for building production-ready OCR applications. It offers an intuitive, modular, and extensible API, along with off-the-shelf pretrained models for over 25 languages.

Read UCR Documentation on ucr.docyard.ai

Features

  • Supports SOTA Text Detection and Recognition models
  • Built on top of PyTorch and PyTorch Lightning
  • Supports over 25 languages
  • Model Zoo contains 27 Pretrained Models across 25 languages
  • Modular Design lets you Pick and Choose among different components
  • Easily extensible with Custom Components and attributes
  • Hydra config enables Rapid Prototyping with multiple configurations
  • Support for Packaging, Logging and Deployment tools straight out of the box

Setup

Installation

Requires Python >= 3.6.2. Installing with pip is recommended.

  1. Prerequisites: Install a compatible version of PyTorch and torchvision from the official repository.
  2. Installation: Install the latest stable version of UCR:
pip install -U ucr
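
Before installing, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not part of UCR itself: the helper names `python_ok` and `missing_prerequisites` are hypothetical, and the check only verifies that the packages can be found, not that their versions are compatible with each other.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6, 2)  # minimum version stated in this README


def python_ok(version=None):
    """Return True if the given (or running) Python version meets the minimum."""
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) >= MIN_PYTHON


def missing_prerequisites(packages=("torch", "torchvision")):
    """Return the names of prerequisite packages that cannot be found."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python is older than", ".".join(map(str, MIN_PYTHON)))
    missing = missing_prerequisites()
    if missing:
        print("Install these first:", ", ".join(missing))
    else:
        print("PyTorch and torchvision found; ready for: pip install -U ucr")
```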

[Optional] Test Installation

Run dummy tests!

ucr test
# Optional: add -l/--lang='language_id' to test a particular language
ucr test -l='en_number'

Usage

Workflow

The execution flow of UCR, shown in the workflow diagram, can be broadly divided into four parts.

  1. The input (an image path, folder path, or web address) goes into the Detection model, which outputs the bounding box coordinates of all text boxes.
  2. The detected boxes are then checked for orientation and corrected accordingly.
  3. Next, the Recognition model runs on the corrected text boxes and returns bounding box information along with the OCR output.
  4. Lastly, an optional Post Processing module is executed to improve or modify the results.
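
The four stages above can be sketched as a plain function that chains them together. This is an illustrative outline only and does not reflect UCR's actual internals: `run_pipeline`, `OCRResult`, the `Box` layout, and the toy stand-in functions are all assumptions made for the example, with strings standing in for images so that it runs without any model.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

# Hypothetical box layout for illustration: (x_min, y_min, x_max, y_max).
Box = Tuple[int, int, int, int]


class OCRResult(NamedTuple):
    box: Box
    text: str


def run_pipeline(
    image: object,
    detect: Callable[[object], List[Box]],
    fix_orientation: Callable[[object, Box], object],
    recognize: Callable[[object], str],
    post_process: Optional[Callable[[List[OCRResult]], List[OCRResult]]] = None,
) -> List[OCRResult]:
    # 1. Detection: find the bounding boxes of all text regions.
    boxes = detect(image)
    results = []
    for box in boxes:
        # 2. Orientation: crop the box and correct its orientation if needed.
        crop = fix_orientation(image, box)
        # 3. Recognition: read the text in the corrected crop.
        results.append(OCRResult(box=box, text=recognize(crop)))
    # 4. Optional post-processing of the combined results.
    if post_process is not None:
        results = post_process(results)
    return results


# Toy stand-ins: the "image" is a list of text rows, one row per box.
def toy_detect(image):
    return [(0, y, len(row), y + 1) for y, row in enumerate(image)]


def toy_fix_orientation(image, box):
    x0, y0, x1, _ = box
    return image[y0][x0:x1]  # nothing to rotate in this toy case


def toy_recognize(crop):
    return crop.strip()


def toy_post_process(results):
    return [r for r in results if r.text]  # drop empty detections


page = ["hello", "     ", "world"]
results = run_pipeline(
    page, toy_detect, toy_fix_orientation, toy_recognize, toy_post_process
)
for r in results:
    print(r.box, r.text)
```

Passing each stage as a callable mirrors the pick-and-choose idea described under Features: any stage can be swapped without touching the others, and post-processing stays optional.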

Quick Start

The following code snippet shows how to get started with the UCR library.

from ucr import UCR

# initialization
ocr = UCR(lang="en_number", device="cpu")

# run prediction
result = ocr.predict('input_path', output='output_path')

# for saving annotated image
result = ocr.predict('input_path', output='output_path', save_image=True)

For the complete list of arguments, refer to the Argument List.

Model Zoo

A collection of pretrained models for detection, classification, and recognition is available here.
These models can be used for out-of-the-box inference on over 25 languages.

Acknowledgement

A substantial part of the UCR library is either inspired by or inherited from the PaddleOCR library. Wherever possible, the repository has been ported from PaddlePaddle to the PyTorch framework, including direct translation of model parameters. Also, a big thanks to Clova AI for open-sourcing their testing script and pretrained models (CRAFT).

License

Apache License 2.0
