
Input Adaptor to verify file extension

Project description

Optical character recognition (OCR) for images, PDFs, zip files, and TIFF files.

What you can expect from this repository:

  • Efficient ways to extract text from documents such as images, PDFs, and zip files.

Quick Tour

Get text from documents and save results in JSON.

Installation

Developer mode

pip install python-ocr

For tesseractOcr process

storage_type='local/aws'  # currently only local and aws are supported
storage_path='Desired path on your OS where you want to store the output'  # for local storage
storage_path='S3 bucket'  # for AWS storage (CASE SENSITIVE)

For example, to store the output on AWS:

config={'storage_type':'AWS','storage_path':'your-bucket-name'}
from ocr import TesseractOcrProcessor
process=TesseractOcrProcessor(config)

For EasyOcr process

from ocr import EasyOcrProcessor
process=EasyOcrProcessor(config)

storage_type: the type of storage, either local or aws.

storage_path: the path where the output result is stored.

# Path of file
PATH=''

# reading image files
process.process_image(PATH)

# reading pdf files
process.process_pdf(PATH)

# reading zip files
process.process_zip(PATH)
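
When the input type is not known in advance, the method to call can be chosen from the file extension. The following is a minimal sketch, not part of python-ocr: the helper name process_any and the extension-to-method mapping are assumptions made for illustration, and the package's own list of accepted extensions may differ. It only relies on the three methods shown above.

```python
import os

# Assumed mapping from file extension to processor method name.
# The extensions listed here are illustrative, not taken from python-ocr.
METHOD_BY_EXTENSION = {
    ".png": "process_image",
    ".jpg": "process_image",
    ".jpeg": "process_image",
    ".tif": "process_image",
    ".tiff": "process_image",
    ".pdf": "process_pdf",
    ".zip": "process_zip",
}


def process_any(process, path):
    """Call the processor method that matches the extension of `path`."""
    ext = os.path.splitext(path)[1].lower()
    method_name = METHOD_BY_EXTENSION.get(ext)
    if method_name is None:
        raise ValueError(f"unsupported file extension: {ext!r}")
    # Look the method up on the processor object and call it with the path.
    return getattr(process, method_name)(path)
```

With a processor created as shown earlier, process_any(process, PATH) would then replace the three separate calls.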

Documentation:

The full package documentation is available here.

First, create a dict containing storage_type and storage_path.

  1. storage_type: the type of storage where the output result is stored. It may be local or aws.

  2. storage_path: the path where the output result is stored.

    • To store the result on the local system, give the path of the destination folder as storage_path.

    • To store the result in AWS, give the bucket name as storage_path.

config={'storage_type':'','storage_path':''}
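
A small check before creating the processor can catch a mistyped storage type early. The sketch below is an assumption for illustration only: make_config is a hypothetical helper, not part of python-ocr, and the accepted values are taken from the statement above that storage_type may be local or aws.

```python
# Hypothetical helper, not part of python-ocr.
# Accepted storage types are taken from the description above.
SUPPORTED_STORAGE_TYPES = ("local", "aws")


def make_config(storage_type, storage_path):
    """Build the config dict after basic sanity checks."""
    if storage_type.lower() not in SUPPORTED_STORAGE_TYPES:
        raise ValueError(f"unsupported storage_type: {storage_type!r}")
    if not storage_path:
        raise ValueError("storage_path must not be empty")
    # Values are passed through unchanged; an S3 bucket name is case sensitive.
    return {"storage_type": storage_type, "storage_path": storage_path}


# Example path only; use any folder that exists on your system.
config = make_config("local", "/tmp/ocr-output")
```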

Now create an EasyOcrProcessor object, which takes the config as its parameter.

process = EasyOcrProcessor(config)

Image process:

To read the text from an image, call the process_image method of EasyOcrProcessor and pass the path of the image file as its parameter. process_image stores the output at storage_path.

process.process_image(PATH)

Pdf process:

To read the text from a PDF file, call the process_pdf method of EasyOcrProcessor and pass the path of the PDF file as its parameter. process_pdf converts each page of the PDF into an image, produces a result for each page, and stores the results at storage_path.

process.process_pdf(PATH)

Zip process:

To read the text from a zip file, call the process_zip method of EasyOcrProcessor and pass the path of the zip file as its parameter. The zip should contain only files with valid extensions. process_zip extracts the files one by one and saves each result at storage_path.

process.process_zip(PATH)
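
Because the zip should contain only files with valid extensions, it can help to inspect the archive before processing it. This is a sketch using the standard zipfile module. The set of valid extensions below is an assumption (the text above does not list them), and invalid_zip_members is a hypothetical helper, not part of python-ocr.

```python
import os
import zipfile

# Assumed set of valid extensions; adjust to what the package accepts.
VALID_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".pdf"}


def invalid_zip_members(zip_path, valid_extensions=VALID_EXTENSIONS):
    """Return names of archive members whose extension is not in `valid_extensions`."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            info.filename
            for info in archive.infolist()
            # Directory entries carry no extension and are skipped.
            if not info.is_dir()
            and os.path.splitext(info.filename)[1].lower() not in valid_extensions
        ]
```

An empty list means every member has an accepted extension, so the archive can be passed to process_zip.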

Result output:

[
    {
        "left": 125,
        "top": 141,
        "right": 259,
        "bottom": 161,
        "text": "Folin MGA-5875",
        "confidence": 0.3961432168382489
    },
    {
        "left": 1115,
        "top": 140,
        "right": 1272,
        "bottom": 161,
        "text": "OM8 N0 : 2126-0006",
        "confidence": 0.41482855467690777
    },
    {
        "left": 1281,
        "top": 139,
        "right": 1498,
        "bottom": 165,
        "text": "Epiration Datc 12/31/2024",
        "confidence": 0.40780972855935615
    }
]
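
Each entry in the result pairs the recognized text with a bounding box and a confidence score, so low-confidence entries can be filtered out after loading the JSON. The sketch below uses two entries copied from the sample above. The helper names and the 0.4 threshold are illustrative assumptions, and the box arithmetic assumes the coordinates are pixel offsets, which the source does not state.

```python
import json

# Two entries copied from the sample output above.
sample = '''[
    {"left": 125, "top": 141, "right": 259, "bottom": 161,
     "text": "Folin MGA-5875", "confidence": 0.3961432168382489},
    {"left": 1115, "top": 140, "right": 1272, "bottom": 161,
     "text": "OM8 N0 : 2126-0006", "confidence": 0.41482855467690777}
]'''


def confident_text(entries, min_confidence):
    """Return the text of entries whose confidence is at least `min_confidence`."""
    return [e["text"] for e in entries if e["confidence"] >= min_confidence]


def box_size(entry):
    """Return (width, height) of an entry's bounding box."""
    return entry["right"] - entry["left"], entry["bottom"] - entry["top"]


entries = json.loads(sample)
print(confident_text(entries, 0.4))  # ['OM8 N0 : 2126-0006']
print(box_size(entries[0]))          # (134, 20)
```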
