Input Adaptor to verify file extension

These details have not been verified by PyPI

Project description

Optical Character Recognition (OCR) for images, PDFs, zip files and TIFF files.

What you can expect from this repository:

Efficient ways to extract text from documents such as images, PDFs and zip files.

Quick Tour

Get text from documents and save the results as JSON.

Installation

Developer mode

pip install python-ocr

For the Tesseract OCR process

storage_type='local' or 'aws'  # currently only local and AWS storage are supported
storage_path='desired folder path on your machine'  # for local storage
storage_path='S3 bucket name'  # for AWS storage (case sensitive)

For example, to store the output in AWS:

config={'storage_type':'AWS','storage_path':'your-bucket-name'}

from ocr import TesseractOcrProcessor
process=TesseractOcrProcessor(config)

For the EasyOCR process

from ocr import EasyOcrProcessor
process=EasyOcrProcessor(config)

storage_type: the type of storage, either local or aws.

storage_path: the path where the output result is stored.

# Path of file
PATH=''

# reading image files
process.process_image(PATH)

# reading pdf files
process.process_pdf(PATH)

# reading zip files
process.process_zip(PATH)
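The Quick Tour calls a different method for each file type. A small dispatcher can pick the method from the file extension. This is a minimal sketch under stated assumptions: `process_document` is a hypothetical helper that is not part of python-ocr, and the extension sets are guesses, because the accepted extensions are not listed on this page.

```python
from pathlib import Path

# Assumed extension groups; python-ocr does not document its accepted list here.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}
PDF_EXTENSIONS = {".pdf"}
ZIP_EXTENSIONS = {".zip"}


def process_document(processor, path):
    """Hypothetical helper: call the processor method that matches the file extension.

    `processor` is any object exposing process_image, process_pdf and
    process_zip, e.g. a TesseractOcrProcessor or EasyOcrProcessor instance.
    """
    suffix = Path(path).suffix.lower()  # compare extensions case-insensitively
    if suffix in IMAGE_EXTENSIONS:
        return processor.process_image(path)
    if suffix in PDF_EXTENSIONS:
        return processor.process_pdf(path)
    if suffix in ZIP_EXTENSIONS:
        return processor.process_zip(path)
    raise ValueError(f"Unsupported file extension: {suffix or '(none)'}")
```

Usage: `process_document(process, PATH)` in place of choosing the method by hand.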

Documentation:

The full package documentation is available here.

First, create a dict containing storage_type and storage_path.

storage_type: the type of storage where the output result is stored; either local or aws.
storage_path: the location where the output result is stored.
- For local storage, set storage_path to the folder where the result should be written.
- For AWS storage, set storage_path to the S3 bucket name.

config={'storage_type':'','storage_path':''}
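The config dict can be checked before it is passed to a processor. This is a minimal sketch: `make_config` is a hypothetical helper, not part of python-ocr. It assumes the only valid storage types are local and aws. The page is inconsistent about casing ('aws' vs 'AWS'), so the sketch compares case-insensitively and passes the value through unchanged.

```python
def make_config(storage_type, storage_path):
    """Hypothetical helper that builds the config dict described above.

    storage_type is checked case-insensitively against 'local' and 'aws'
    (an assumption) and returned exactly as given.
    """
    if storage_type.lower() not in {"local", "aws"}:
        raise ValueError("storage_type must be 'local' or 'aws'")
    if not storage_path:
        raise ValueError(
            "storage_path must be a folder path (local) or an S3 bucket name (aws)"
        )
    return {"storage_type": storage_type, "storage_path": storage_path}
```

For example, `config = make_config('local', '/tmp/ocr-output')` produces the same dict shape as the line above.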

Next, create an EasyOcrProcessor object and pass config to it as a parameter.

process = EasyOcrProcessor(config)

Image process:

To read text from an image, call the process_image method of EasyOcrProcessor with the path of the image file. process_image stores the output at storage_path.

process.process_image(PATH)

Pdf process:

To read text from a PDF file, call the process_pdf method of EasyOcrProcessor with the path of the PDF file. process_pdf converts each page of the PDF into an image, produces a result for each page, and stores the results at storage_path.

process.process_pdf(PATH)
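The per-page flow that process_pdf follows can be sketched independently of any OCR engine. This is an illustration only, not python-ocr's own code. `ocr_pages` is hypothetical and assumes the pages are already rendered to images (for example with a PDF rendering library). It also takes a caller-supplied `ocr_page` function that returns the detections for one image.

```python
def ocr_pages(page_images, ocr_page):
    """Sketch of the per-page PDF flow (hypothetical, not the package's implementation).

    page_images: sequence of already-rendered page images, one per PDF page.
    ocr_page: callable returning a list of detection dicts for one image.
    Returns a dict mapping 1-based page numbers to that page's detections.
    """
    results = {}
    for page_number, image in enumerate(page_images, start=1):
        results[page_number] = ocr_page(image)  # one result per page
    return results
```

Keying results by page number keeps each page's detections separate, which mirrors the "result of each page" behaviour described above.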

Zip process:

To read text from a zip file, call the process_zip method of EasyOcrProcessor with the path of the zip file. The zip should contain only files with valid extensions. process_zip extracts the files one by one and saves each result at storage_path.

process.process_zip(PATH)
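The zip must contain only files with valid extensions, so it can help to list any offending members before calling process_zip. This is a minimal sketch using the standard `zipfile` module. `invalid_zip_members` is a hypothetical helper, and `VALID_EXTENSIONS` is an assumed set, because this page does not list the accepted extensions.

```python
import zipfile
from pathlib import PurePosixPath

# Assumed set of extensions accepted inside a zip archive.
VALID_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".pdf"}


def invalid_zip_members(zip_source):
    """Return names of archive members whose extension is not in VALID_EXTENSIONS.

    zip_source may be a path or a binary file-like object.
    Directory entries are skipped; zip member names always use '/'.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix.lower() not in VALID_EXTENSIONS
        ]
```

An empty list means every file in the archive has an extension from the assumed set.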

Result output:

[
    {
        "left": 125,
        "top": 141,
        "right": 259,
        "bottom": 161,
        "text": "Folin MGA-5875",
        "confidence": 0.3961432168382489
    },
    {
        "left": 1115,
        "top": 140,
        "right": 1272,
        "bottom": 161,
        "text": "OM8 N0 : 2126-0006",
        "confidence": 0.41482855467690777
    },
    {
        "left": 1281,
        "top": 139,
        "right": 1498,
        "bottom": 165,
        "text": "Epiration Datc 12/31/2024",
        "confidence": 0.40780972855935615
    }
]
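The result is a JSON list of detections. Each detection has a pixel bounding box (left, top, right, bottom), the recognised text and a confidence score. This sketch assumes exactly that format. It keeps detections at or above a threshold and derives each box's width and height. `confident_words` is a hypothetical helper, and the 0.4 default threshold is an arbitrary example value.

```python
import json


def confident_words(result_json, min_confidence=0.4):
    """Return (text, width, height) for detections with confidence >= min_confidence.

    Assumes the output format shown above: a JSON list of dicts with
    left/top/right/bottom pixel coordinates, text and confidence.
    """
    detections = json.loads(result_json)
    return [
        (d["text"], d["right"] - d["left"], d["bottom"] - d["top"])
        for d in detections
        if d["confidence"] >= min_confidence
    ]
```

Applied to the sample output above, the first detection (confidence about 0.396) is dropped and the other two are kept.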

Project details


Release history

0.1.5 (this version): Aug 12, 2022
0.1.4: Aug 12, 2022
0.1.3: Aug 9, 2022
0.1.2: Aug 9, 2022
0.1.1: Aug 9, 2022
0.1.0: Aug 9, 2022

Download files

Source Distribution

python_ocr-0.1.5.tar.gz (6.8 kB)

Uploaded Aug 12, 2022 Source

Built Distribution


python_ocr-0.1.5-py3-none-any.whl (7.1 kB)

Uploaded Aug 12, 2022 Python 3

File details

Details for the file python_ocr-0.1.5.tar.gz.

File metadata

Download URL: python_ocr-0.1.5.tar.gz
Upload date: Aug 12, 2022
Size: 6.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for python_ocr-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`c59c783ae8bc0137bba73cd3e04eb36994b087615e8b259ca198cb57a8baf6ff`
MD5	`4ba00a53a5e5eb0655ee4b0179153d83`
BLAKE2b-256	`e9ec8b678905965ad8e97e1fdcdc6895ad8355c3d0ab5fa0216a5395322ff33d`


File details

Details for the file python_ocr-0.1.5-py3-none-any.whl.

File metadata

Download URL: python_ocr-0.1.5-py3-none-any.whl
Upload date: Aug 12, 2022
Size: 7.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for python_ocr-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`57e811b09b69951e26693c2c9c00d6cc9630536d439d7a9416a8fd0f2ef84c80`
MD5	`56666476f7874b3e062d61b764e1e9ef`
BLAKE2b-256	`ea110fe8c7493552db408e4bd08dab7691971d0da7e73b59622f57b7babec1cf`
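The SHA256 digests listed above can be used to check that a downloaded file is intact. This is a minimal sketch using the standard `hashlib` module. `sha256_of_file` and `verify_download` are illustrative helpers, and the default expected value is the source distribution's SHA256 from the table above.

```python
import hashlib

# SHA256 of python_ocr-0.1.5.tar.gz, copied from the hash table above.
EXPECTED_SDIST_SHA256 = "c59c783ae8bc0137bba73cd3e04eb36994b087615e8b259ca198cb57a8baf6ff"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SDIST_SHA256):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

For the wheel, pass the wheel's SHA256 from its table as `expected`. pip can also enforce hashes at install time through `--hash=sha256:...` entries in a requirements file.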
