Perform OCR using Google's Drive API v3

Features

  • Perform OCR using Google’s Drive API v3

  • Class GoogleOCRApplication() for use in projects

  • Highly configurable CLI

  • Run OCR on a single image file

  • Run OCR on multiple image files

  • Run OCR on all images in a directory

  • Use multiple workers (multiprocessing)

  • Work on a PDF document directly

Usage

Using in a Project

Create a GoogleOCRApplication instance:

from google_drive_ocr import GoogleOCRApplication

app = GoogleOCRApplication('client_secret.json')

Perform OCR on a single image:

app.perform_ocr('image.png')

Perform OCR on multiple images:

app.perform_batch_ocr(['image_1.png', 'image_2.png', 'image_3.png'])

Perform OCR on multiple images using multiple workers (multiprocessing):

app.perform_batch_ocr(['image_1.png', 'image_3.png', 'image_2.png'], workers=2)
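As a rough sketch of how the calls above might fit together, the following gathers image paths from a directory and hands them to perform_batch_ocr. The helper names collect_images and ocr_directory are hypothetical and not part of google_drive_ocr; only GoogleOCRApplication and perform_batch_ocr with its workers argument come from the examples above.

```python
from pathlib import Path


def collect_images(directory, extension=".png"):
    """Return sorted paths (as strings) of files in `directory` with `extension`."""
    return sorted(
        str(path)
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() == extension.lower()
    )


def ocr_directory(directory, client_secret="client_secret.json", workers=2):
    """Run batch OCR on every matching image in `directory` (hypothetical helper)."""
    # Imported here so collect_images stays usable without the package installed.
    from google_drive_ocr import GoogleOCRApplication

    images = collect_images(directory)
    if not images:
        return []  # nothing to upload
    app = GoogleOCRApplication(client_secret)
    app.perform_batch_ocr(images, workers=workers)
    return images
```

Calling ocr_directory('images/') would then process every .png file found in that directory, assuming client_secret.json is present in the working directory.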

Using Command Line Interface

Typical usage with several options:

google-ocr --client-secret client_secret.json \
--upload-folder-id <google-drive-folder-id>  \
--image-dir images/ --extension .jpg \
--workers 4 --no-keep
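If the CLI needs to be driven from a Python script, the same invocation could be assembled as an argument list and passed to subprocess. This is only a sketch: build_ocr_command and run_ocr are hypothetical helpers, and the flags are limited to those shown in the command above.

```python
import subprocess


def build_ocr_command(client_secret, folder_id, image_dir,
                      extension=".jpg", workers=4, no_keep=True):
    """Build the google-ocr argument list using the flags from the example above."""
    cmd = [
        "google-ocr",
        "--client-secret", client_secret,
        "--upload-folder-id", folder_id,
        "--image-dir", image_dir,
        "--extension", extension,
        "--workers", str(workers),
    ]
    if no_keep:
        cmd.append("--no-keep")
    return cmd


def run_ocr(*args, **kwargs):
    """Run google-ocr and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_ocr_command(*args, **kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.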

Show help message with the full set of options:

google-ocr --help

Configuration

The default configuration file is ~/.gdo.cfg. Once a set of options has been written to this file, those options no longer need to be specified on subsequent runs.

Save configuration and exit:

google-ocr --client-secret client_secret.json --write-config ~/.gdo.cfg

Read configuration from a custom location (if it was written to a custom location):

google-ocr --config ~/.my_config_file ..

Performing OCR

Note: The examples below assume that the client-secret option is already saved in the configuration file.

Single image file:

google-ocr -i image.png

Multiple image files:

google-ocr -b image_1.png image_2.png image_3.png

All image files from a directory with a specific extension:

google-ocr --image-dir images/ --extension .png

Multiple workers (multiprocessing):

google-ocr -b image_1.png image_2.png image_3.png --workers 2

PDF files:

google-ocr --pdf document.pdf --pages 1-3 5 7-10 13
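To make the page specification concrete, here is a minimal sketch of how arguments such as 1-3 5 7-10 13 could expand into individual page numbers. It assumes ranges are inclusive at both ends; expand_pages is a hypothetical helper and is not taken from the google_drive_ocr source, which may parse the option differently.

```python
def expand_pages(specs):
    """Expand page specs such as ["1-3", "5"] into a sorted list of page numbers."""
    pages = set()
    for spec in specs:
        if "-" in spec:
            # Assumption: "a-b" means every page from a to b, inclusive.
            start, end = (int(part) for part in spec.split("-", 1))
            if start > end:
                raise ValueError(f"invalid page range: {spec!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(spec))
    return sorted(pages)


print(expand_pages("1-3 5 7-10 13".split()))
# prints [1, 2, 3, 5, 7, 8, 9, 10, 13]
```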

Note: You must set up a Google application and download the client_secrets.json file before using google_drive_ocr.

Setup Instructions

Create a project on Google Cloud Platform

Wizard: https://console.developers.google.com/start/api?id=drive

Instructions:

History

0.2.0 (2021-06-29)

  • PDF file support

0.1.0 (2021-06-14)

  • First release on PyPI.

