opticr

A Python library that exposes a single interface and API for a few OCR tools (Google Vision, Tesseract).

Install

The following binaries must be available on the $PATH:

poppler-utils (pdf2image)

https://github.com/Belval/pdf2image#how-to-install

tesseract

https://tesseract-ocr.github.io
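
Since both tools are looked up on the $PATH, checking for them up front can save a confusing runtime error. The sketch below is not part of opticr: `missing_binaries` is a hypothetical helper, and it assumes that poppler-utils provides the `pdftoppm` executable (which pdf2image calls) and that the Tesseract executable is named `tesseract`.

```python
import shutil


def missing_binaries(names):
    """Return the names from `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names: pdftoppm (poppler-utils) and tesseract.
    missing = missing_binaries(["pdftoppm", "tesseract"])
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required binaries found.")
```

If anything is reported missing, the two installation links above describe how to install it on each platform.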

Install OpticR

With pip

pip install opticr

With poetry

poetry add opticr

Or, to get the latest 'dangerous' (unreleased) version from the main branch:

poetry add git+https://github.com/lzayep/opticr@main

Usage

from opticr import OpticR

ocr = OpticR("tesseract")
pathtofile = "test/contract.pdf"
pages: list[str] = ocr.get_pages(pathtofile)
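
Because `get_pages` returns one string per page, ordinary list handling is enough to work with the result. A minimal sketch, using a hard-coded list in place of real OCR output; `find_pages` is a hypothetical helper and not part of opticr.

```python
def find_pages(pages: list[str], keyword: str) -> list[int]:
    """Return 1-based page numbers whose text contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [
        number
        for number, text in enumerate(pages, start=1)
        if needle in text.lower()
    ]


# Stand-in for the list returned by ocr.get_pages(pathtofile).
pages = ["Service Agreement", "Payment terms: 30 days", "Signatures"]
print(find_pages(pages, "payment"))  # [2]
```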

With google-vision:

from opticr import OpticR

ocr = OpticR("google-vision", options={"google-vision": {"auth": {"token": ""}}})

# the file can also come from a URL
pathtofile = "https://example.com/contract.pdf"
pages: list[str] = ocr.get_pages(pathtofile)

Results can be cached: if a file has already been OCRed, the previous result is returned immediately. Results are stored temporarily in local storage or in shared storage such as Redis.

from opticr import OpticR

ocr = OpticR("tesseract", options={"cache":
                         {"backend": "redis", "redis": "redis://"}})

# the file can also come from a URL
pathtofile = "https://example.com/contract.pdf"
pages: list[str] = ocr.get_pages(pathtofile, cache=True)
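
To make the caching idea concrete: one common way to recognise a file that has already been processed is to key the cache on a hash of its contents. The sketch below illustrates that idea with an in-memory dict. It is an assumption about how such a cache can work, not opticr's actual implementation, and `PageCache` and `fake_ocr` are hypothetical names.

```python
import hashlib
from typing import Callable


class PageCache:
    """In-memory cache mapping a file's SHA-256 digest to its OCR'd pages."""

    def __init__(self) -> None:
        self._store: dict[str, list[str]] = {}

    def get_pages(self, data: bytes, ocr: Callable[[bytes], list[str]]) -> list[str]:
        key = hashlib.sha256(data).hexdigest()
        if key not in self._store:
            # First time this content is seen: run the (slow) OCR step.
            self._store[key] = ocr(data)
        return self._store[key]


calls = []


def fake_ocr(data: bytes) -> list[str]:
    """Stand-in for a real OCR engine; records each call."""
    calls.append(data)
    return [data.decode()]


cache = PageCache()
first = cache.get_pages(b"page one", fake_ocr)
second = cache.get_pages(b"page one", fake_ocr)  # served from the cache
print(first == second, len(calls))  # True 1
```

Replacing the dict with a shared store such as Redis would let several processes reuse the same results. Expiry of stored entries is not shown here.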
