Multiocr

This package provides a common interface for multiple OCR backends.

Installation

pip install multiocr

Supported OCR Backends

  • Tesseract
  • PaddleOCR
  • AWS Textract
  • EasyOCR
  • Doctr-Ocr

The output for all OCR backends will be similar.

Code Example

Tesseract

from multiocr import OcrEngine

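config = {
    "lang": "eng",        # Tesseract language code
    "config": "--psm 6",  # extra Tesseract options (page segmentation mode 6)
}
image_file = "path/to/image.jpg"
engine = OcrEngine("tesseract", config)
text_dict = engine.text_extraction(image_file)     # run OCR on the image
json = engine.text_extraction_to_json(text_dict)   # result converted to JSON
df = engine.text_extraction_to_df(text_dict)       # result converted to a dataframe
plain_text = engine.extract_plain_text(text_dict)  # recognized text only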

PaddleOCR

from multiocr import OcrEngine

config = {
    "lang": "en"
}
image_file = "path/to/image.jpg"
engine = OcrEngine("paddle_ocr", config)
text_dict = engine.text_extraction(image_file)
json = engine.text_extraction_to_json(text_dict)
df = engine.text_extraction_to_df(text_dict)
plain_text = engine.extract_plain_text(text_dict)

AWS Textract

import os

from multiocr import OcrEngine

config = {
    "region_name": os.getenv("region_name"),
    "aws_access_key_id": os.getenv("aws_access_key_id"),
    "aws_secret_access_key": os.getenv("aws_secret_access_key")
}
image_file = "path/to/image.jpg"

engine = OcrEngine("aws_textract", config)
text_dict = engine.text_extraction(image_file)
json = engine.text_extraction_to_json(text_dict)
df = engine.text_extraction_to_df(text_dict)
plain_text = engine.extract_plain_text(text_dict)
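
os.getenv returns None for an unset variable, so a missing credential would only surface later, inside the backend. A minimal sketch of an early check follows. The helper aws_config_from_env is hypothetical and not part of multiocr; the variable names are the ones used in the example above.

```python
import os

# Hypothetical helper, not part of multiocr. The key names are the
# environment variables read in the Textract example above.
REQUIRED_KEYS = ("region_name", "aws_access_key_id", "aws_secret_access_key")


def aws_config_from_env(environ=None):
    """Build the Textract config dict, failing early if a variable is unset."""
    if environ is None:
        environ = os.environ
    # Treat both absent and empty values as missing.
    missing = [key for key in REQUIRED_KEYS if not environ.get(key)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: environ[key] for key in REQUIRED_KEYS}
```

The returned dictionary can be passed as the config argument, for example OcrEngine("aws_textract", aws_config_from_env()).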

EasyOCR

from multiocr import OcrEngine

config = {
    "lang_list": ["en"]
}
image_file = "path/to/image.jpg"
engine = OcrEngine("easy_ocr", config)
text_dict = engine.text_extraction(image_file)
json = engine.text_extraction_to_json(text_dict)
df = engine.text_extraction_to_df(text_dict)
plain_text = engine.extract_plain_text(text_dict)

Doctr-Ocr

from multiocr import OcrEngine

image_file = "path/to/image.jpg"
engine = OcrEngine("doctr_ocr")
text_dict = engine.text_extraction(image_file)
json = engine.text_extraction_to_json(text_dict)
df = engine.text_extraction_to_df(text_dict)
plain_text = engine.extract_plain_text(text_dict)

To access an individual OCR engine's output in its own raw format, fetch it like this:

raw_ocr_output = engine.engine.raw_ocr

config holds the input parameters for the chosen OCR backend and must be a Python dictionary. If it is not given, the backend library's default parameters are used.

The input parameters differ between OCR backends; see each backend's repository for the full list of allowed parameters.
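
Since every backend is driven through the same methods, the per-backend examples can be folded into one loop that keeps only the config dictionaries separate. The sketch below is an illustration under assumptions: extract_with_all, BACKEND_CONFIGS and the engine_factory parameter are hypothetical names, and the configs are copied from the examples in this README.

```python
# Backend names and configs copied from the examples in this README.
BACKEND_CONFIGS = {
    "tesseract": {"lang": "eng", "config": "--psm 6"},
    "paddle_ocr": {"lang": "en"},
    "easy_ocr": {"lang_list": ["en"]},
}


def _default_factory(name, config):
    # Imported lazily so this module loads even where multiocr is absent.
    from multiocr import OcrEngine
    return OcrEngine(name, config)


def extract_with_all(image_file, configs=BACKEND_CONFIGS,
                     engine_factory=_default_factory):
    """Return {backend name: plain text} for one image (hypothetical helper)."""
    results = {}
    for name, config in configs.items():
        engine = engine_factory(name, config)
        text_dict = engine.text_extraction(image_file)
        results[name] = engine.extract_plain_text(text_dict)
    return results
```

Calling extract_with_all("path/to/image.jpg") would run each configured backend in turn, which requires all of them to be installed. The engine_factory argument exists only so the loop can be exercised with a stand-in engine.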

References & Acknowledgements

WIP - OCR Backends

  • MMOCR
  • Google Vision
  • Azure OCR
  • DocTR
