
Universal Character Recognizer (UCR): Simple, Intuitive, Extensible, Multi-Lingual OCR engine







Universal Character Recognizer (UCR) is an open-source, easy-to-use Python library for building production-ready OCR applications. It offers an intuitive, modular, and extensible API, along with off-the-shelf pretrained models for over 25 languages.

Read UCR Documentation on ucr.docyard.ai



Demo

For details, click here!

Features

  • Supports SOTA Text Detection and Recognition models
  • Built on top of PyTorch and PyTorch Lightning
  • Supports over 25 languages
  • Model Zoo contains 27 Pretrained Models across 25 languages
  • Modular design allows picking and choosing different components
  • Easily extensible with Custom Components and attributes
  • Hydra config enables Rapid Prototyping with multiple configurations
  • Support for Packaging, Logging and Deployment tools straight out of the box

Note: Some features are still in active development and might not be available.

Setup

Installation

Requires Python >= 3.6.2. Installing with pip is recommended.

  1. Prerequisites: Install a compatible version of PyTorch and torchvision from the official repository.
  2. Installation: Install the latest stable version of UCR:
pip install -U ucr

[Optional] Test Installation

Run dummy tests!

ucr test
# Optional: Add -l/--lang='language_id' to test on a particular language!
ucr test -l='en_number'
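
The same check can be scripted, for example in a CI step. The following is a minimal sketch that only assumes the `ucr test` command and its `--lang` option shown above. The helper names `build_test_command` and `run_ucr_test` are hypothetical and not part of UCR.

```python
import shutil
import subprocess


def build_test_command(lang=None):
    """Build the argument list for `ucr test` (hypothetical helper, not part of UCR)."""
    cmd = ["ucr", "test"]
    if lang is not None:
        # Long form of the -l option shown above.
        cmd.append(f"--lang={lang}")
    return cmd


def run_ucr_test(lang=None):
    """Run the UCR self-test; return its exit code, or None if `ucr` is not on PATH."""
    if shutil.which("ucr") is None:
        return None
    completed = subprocess.run(build_test_command(lang), check=False)
    return completed.returncode


if __name__ == "__main__":
    code = run_ucr_test("en_number")
    print("ucr not installed" if code is None else f"ucr test exited with {code}")
```

Returning `None` when the executable is missing keeps "not installed" distinct from a failing test run.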

Usage

Workflow

The execution flow of UCR is displayed above. Broadly, it can be divided into four stages:

  1. Input (image/folder path or web address) is fed into the Detection model which outputs bounding box coordinates of all the text boxes.
  2. The detected boxes are then checked for Orientation and corrected accordingly.
  3. Next, Recognition model runs on the corrected text boxes. It returns bounding box information and OCR output.
  4. Lastly, an optional Post Processing module is executed to improve/modify the results.
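
The four stages above can be sketched as a chain of callables. This is an illustrative outline only, not UCR's actual implementation: `TextRegion`, `run_pipeline`, and the toy stage functions are hypothetical names, and the stand-in stages return fixed values instead of running real models.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), an assumed layout


@dataclass
class TextRegion:
    box: Box
    rotated: bool = False
    text: str = ""


def run_pipeline(image, detect, correct_orientation, recognize, post_process=None):
    """Chain the stages: detect -> orientation fix -> recognize -> optional post-processing."""
    # 1. Detection yields one bounding box per text region.
    regions = [TextRegion(box=b) for b in detect(image)]
    # 2. Each region is checked for orientation and corrected.
    regions = [correct_orientation(image, r) for r in regions]
    # 3. Recognition fills in the text for each corrected region.
    for region in regions:
        region.text = recognize(image, region)
    # 4. Post-processing is optional.
    if post_process is not None:
        regions = post_process(regions)
    return regions


# Toy stand-ins so the sketch runs without any model.
def toy_detect(image):
    return [(0, 0, 10, 5), (0, 6, 10, 11)]


def toy_correct(image, region):
    region.rotated = False
    return region


def toy_recognize(image, region):
    return f"text@{region.box[1]}"


def toy_post(regions):
    for region in regions:
        region.text = region.text.upper()
    return regions


results = run_pipeline(None, toy_detect, toy_correct, toy_recognize, toy_post)
for region in results:
    print(region.box, region.text)
```

Passing the stages in as callables mirrors the pick-and-choose idea from the feature list: any stage can be swapped without touching the others.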

Quick Start

The following code snippet shows how to get started with the UCR library.

from ucr import UCR

# initialization
ocr = UCR(lang="en_number", device="cpu")

# run prediction
result = ocr.predict('input_path', output='output_path')

# for saving annotated image
result = ocr.predict('input_path', output='output_path', save_image=True)

For the complete list of arguments, refer to the Argument List.

Model Zoo

A collection of pretrained models for detection, classification, and recognition is available here.
These models can be useful for out-of-the-box inference on over 25 languages.

Acknowledgement

A substantial part of the UCR library is either inspired by or inherited from the PaddleOCR library. Wherever possible, the repository has been ported from PaddlePaddle to the PyTorch framework, including the direct translation of model parameters. Also, a big thanks to Clova AI for open-sourcing their testing script and pretrained models (CRAFT).

License

Apache License 2.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ucr-0.2.15.tar.gz (153.1 kB view details)

Uploaded Source

Built Distribution

ucr-0.2.15-py3-none-any.whl (213.6 kB view details)

Uploaded Python 3

File details

Details for the file ucr-0.2.15.tar.gz.

File metadata

  • Download URL: ucr-0.2.15.tar.gz
  • Upload date:
  • Size: 153.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.5 CPython/3.8.5 Linux/5.4.0-65-generic

File hashes

Hashes for ucr-0.2.15.tar.gz
Algorithm Hash digest
SHA256 c30372404162d50f6a8d9c702e5b1243137c087ba982509a42a59036c6477124
MD5 993065ad7812c895d1aa98ef38dbd63e
BLAKE2b-256 165ba591d1ea68efdf0000a5529a2d7b16bda1048951ff9df3be88bdc7eab172

See more details on using hashes here.
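
One way to check a downloaded file against the SHA256 digest listed above is with Python's standard `hashlib` module. This is a minimal sketch. The helper names `sha256_of_file` and `matches_digest` are hypothetical, and the local file path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the sdist was saved to the current directory:
# matches_digest(
#     "ucr-0.2.15.tar.gz",
#     "c30372404162d50f6a8d9c702e5b1243137c087ba982509a42a59036c6477124",
# )
```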

File details

Details for the file ucr-0.2.15-py3-none-any.whl.

File metadata

  • Download URL: ucr-0.2.15-py3-none-any.whl
  • Upload date:
  • Size: 213.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.5 CPython/3.8.5 Linux/5.4.0-65-generic

File hashes

Hashes for ucr-0.2.15-py3-none-any.whl
Algorithm Hash digest
SHA256 d36df926e05070384f2ded6b3fe01081b98416d0b0df3c7241c4797d6a93de87
MD5 23f64613758345d9a121707a76bb9b90
BLAKE2b-256 bf0568bb497145eb7aaf13d22624a355d4d1e9647a0c1d6d5dd4e26df6fa9ebe

See more details on using hashes here.
