
Universal Character Recognizer (UCR): Simple, Intuitive, Extensible, Multi-Lingual OCR engine




Universal Character Recognizer (UCR) is an open-source, easy-to-use Python library for building production-ready OCR applications. It offers an intuitive, modular, and extensible API and off-the-shelf pretrained models for over 25 languages.

Read the UCR documentation at ucr.docyard.ai.



Features

  • Supports state-of-the-art (SOTA) text detection and recognition models
  • Built on top of PyTorch and PyTorch Lightning
  • Supports over 25 languages
  • Model Zoo contains 27 pretrained models across 25 languages
  • Modular design lets you pick and choose individual components
  • Easily extensible with custom components and attributes
  • Hydra config enables rapid prototyping with multiple configurations
  • Support for packaging, logging, and deployment tools straight out of the box

Note: Some features are still in active development and might not be available.

Setup

Installation

Requires Python >= 3.6.2. Installing with pip is recommended.

  1. Prerequisites: Install a compatible version of PyTorch and torchvision from the official repository.
  2. Installation: Install the latest stable version of UCR:
pip install -U ucr
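
Before installing, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch using only the standard library; `check_prerequisites` is a hypothetical helper written for this README, not part of UCR. It only checks that the packages can be located, not that their versions are compatible with each other.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6, 2)  # minimum Python version stated in this README


def check_prerequisites(required=("torch", "torchvision")):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration only; it is not shipped with UCR.
    """
    problems = []
    if sys.version_info[:3] < MIN_PYTHON:
        problems.append(
            "Python {}.{}.{} or newer is required".format(*MIN_PYTHON)
        )
    for name in required:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append("missing package: " + name)
    return problems
```

Calling `check_prerequisites()` and printing each returned entry shows what still needs to be installed before running `pip install -U ucr`.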

[Optional] Test Installation

Run dummy tests!

ucr test
# Optional: Add -l/--lang='language_id' to test on particular language!
ucr test -l='en_number'

Usage

Workflow

The execution flow of UCR can be broadly divided into four parts:

  1. The input (an image path, folder path, or web address) is fed into the Detection model, which outputs the bounding box coordinates of all text boxes.
  2. The detected boxes are then checked for orientation and corrected accordingly.
  3. Next, the Recognition model runs on the corrected text boxes. It returns bounding box information and the OCR output.
  4. Lastly, an optional Post Processing module is executed to improve or modify the results.
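
The four steps above can be sketched as a small pipeline. This is an illustrative outline only, not UCR's internal implementation: `TextRegion`, `run_pipeline`, the `(x1, y1, x2, y2)` box layout, and the stage callables are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x1, y1, x2, y2) layout


@dataclass
class TextRegion:
    box: Box
    angle: int = 0   # rotation reported by the orientation step, in degrees
    text: str = ""   # filled in by the recognition step


def run_pipeline(
    image,
    detect: Callable[[object], List[Box]],
    classify_angle: Callable[[object, Box], int],
    recognize: Callable[[object, TextRegion], str],
    postprocess: Optional[Callable[[List[TextRegion]], List[TextRegion]]] = None,
) -> List[TextRegion]:
    """Chain detection, orientation, recognition and optional post-processing."""
    # Step 1: detection yields one bounding box per text region.
    regions = [TextRegion(box=box) for box in detect(image)]
    for region in regions:
        # Step 2: record the orientation so recognition can account for it.
        region.angle = classify_angle(image, region.box)
        # Step 3: recognition produces the text for the region.
        region.text = recognize(image, region)
    # Step 4: post-processing is optional and may filter or rewrite results.
    if postprocess is not None:
        regions = postprocess(regions)
    return regions
```

Passing each stage as a callable mirrors the pick-and-choose idea from the feature list: any stage can be swapped without touching the others.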

Quick Start

The following code snippet shows how to get started with the UCR library.

from ucr import UCR

# initialization
ocr = UCR(lang="en_number", device="cpu")

# run prediction
result = ocr.predict('input_path', output='output_path')

# for saving annotated image
result = ocr.predict('input_path', output='output_path', save_image=True)
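
When several separate inputs need processing, a thin wrapper can keep one unreadable file from stopping the whole run. The sketch below relies only on the `predict(input, output=...)` call shown above; `predict_many` is a hypothetical helper, and the shape of the value returned by `predict` is not specified here, so it is stored as-is.

```python
def predict_many(ocr, input_paths, output_path):
    """Run ocr.predict on each input, collecting results and failures.

    Hypothetical helper: `ocr` is any object exposing
    predict(input, output=...) as in the Quick Start snippet.
    """
    results, failures = {}, {}
    for path in input_paths:
        try:
            results[path] = ocr.predict(str(path), output=str(output_path))
        except Exception as exc:
            # Record the error and continue with the remaining inputs.
            failures[path] = exc
    return results, failures
```

For example, `results, failures = predict_many(ocr, ['a.png', 'b.png'], 'output_path')` returns two dictionaries keyed by input path.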

For the complete list of arguments, refer to the Argument List.

Model Zoo

A collection of pretrained models for detection, classification, and recognition is available here.
These models support out-of-the-box inference on over 25 languages.

Acknowledgement

A substantial part of the UCR library is either inspired by or inherited from the PaddleOCR library. Wherever possible, the repository has been ported from PaddlePaddle to the PyTorch framework, including direct translation of model parameters. Also, a big thanks to Clova AI for open-sourcing their testing script and pretrained models (CRAFT).

License

Apache License 2.0
