Expose a single interface and API to a few OCR tools
opticr
Python library that exposes a single interface and API to a few OCR tools (Google Vision, Tesseract).
Install
The following binaries must be available in the $PATH:
poppler-utils (pdf2image)
https://github.com/Belval/pdf2image#how-to-install
tesseract
https://tesseract-ocr.github.io
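Before installing, it can help to confirm that both tools are actually reachable. The following is a minimal sketch, not part of OpticR: it assumes the relevant executables are named `pdftoppm` (shipped with poppler-utils and used by pdf2image) and `tesseract`, and the helper name `missing_binaries` is hypothetical.

```python
import shutil

# Assumed executable names: `pdftoppm` comes with poppler-utils,
# `tesseract` is the Tesseract command-line program.
REQUIRED_BINARIES = ("pdftoppm", "tesseract")


def missing_binaries(names=REQUIRED_BINARIES):
    """Return the names that cannot be found on the current $PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_binaries()
if missing:
    print("Missing from $PATH:", ", ".join(missing))
else:
    print("All required binaries found.")
```

An empty result means both executables were found; anything listed needs to be installed or added to the $PATH first.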
Install OpticR
With pip
pip install opticr
With poetry
poetry add opticr
Or, to get the latest (potentially unstable) version from the main branch:
poetry add git+https://github.com/lzayep/opticr@main
Usage
from opticr import OpticR
ocr = OpticR("tesseract")
pathtofile = "test/contract.pdf"
pages: list[str] = ocr.get_pages(pathtofile)
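Since `get_pages` is annotated as returning `list[str]`, the result can be handled with ordinary list operations. The sketch below assumes one string per page, in document order; `find_pages` and `sample_pages` are hypothetical names used only for illustration, and the sample list stands in for real OCR output.

```python
def find_pages(pages: list[str], keyword: str) -> list[int]:
    """Return 1-based page numbers whose text contains keyword (case-insensitive)."""
    needle = keyword.casefold()
    return [
        number
        for number, text in enumerate(pages, start=1)
        if needle in text.casefold()
    ]


# Stand-in for the list returned by ocr.get_pages(pathtofile).
sample_pages = ["Service Agreement", "Payment terms: 30 days", "Signatures"]
print(find_pages(sample_pages, "payment"))  # [2]
```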
With google-vision:
from opticr import OpticR
ocr = OpticR("google-vision", options={"google-vision": {"auth": {"token": ""}}})
# The file can also come from a URL
pathtofile = "https://example.com/contract.pdf"
pages: list[str] = ocr.get_pages(pathtofile)
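To avoid hardcoding the token, the options dictionary can be built from an environment variable. This is only a sketch: the dictionary shape is copied from the example above, while the variable name `GOOGLE_VISION_TOKEN` and both helper functions are assumptions, not part of OpticR.

```python
import os


def google_vision_options(token: str) -> dict:
    """Build the options dict in the shape used in the example above."""
    return {"google-vision": {"auth": {"token": token}}}


def options_from_env(var: str = "GOOGLE_VISION_TOKEN") -> dict:
    """Read the token from an environment variable (hypothetical name)."""
    token = os.environ.get(var, "")
    if not token:
        raise RuntimeError(f"Set {var} before using the google-vision backend")
    return google_vision_options(token)
```

The result would then be passed along as `OpticR("google-vision", options=options_from_env())`.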
Cache the result: if the file has already been processed by OCR, the previous result is returned immediately. Results are stored temporarily in local storage or in shared storage such as Redis.
from opticr import OpticR
ocr = OpticR("tesseract", options={"cache": {"backend": "redis", "redis": "redis://"}})
# The file can also come from a URL
pathtofile = "https://example.com/contract.pdf"
pages: list[str] = ocr.get_pages(pathtofile, cache=True)
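The caching idea can be illustrated independently of the library. The sketch below is not OpticR's implementation: it assumes a cache keyed by the SHA-256 of the document bytes and kept in memory, and `PageCache` and `fake_ocr` are hypothetical names. A shared backend such as Redis would replace the dictionary.

```python
import hashlib
from typing import Callable


class PageCache:
    """In-memory cache keyed by the SHA-256 digest of the document bytes."""

    def __init__(self, ocr_fn: Callable[[bytes], list[str]]):
        self._ocr_fn = ocr_fn
        self._store: dict[str, list[str]] = {}

    def get_pages(self, data: bytes) -> list[str]:
        key = hashlib.sha256(data).hexdigest()
        if key not in self._store:
            # Cache miss: run the (expensive) OCR function once and keep the result.
            self._store[key] = self._ocr_fn(data)
        return self._store[key]


calls = []


def fake_ocr(data: bytes) -> list[str]:
    """Stand-in for a real OCR call; records each invocation."""
    calls.append(data)
    return [data.decode()]


cache = PageCache(fake_ocr)
cache.get_pages(b"page one")
cache.get_pages(b"page one")  # served from the cache, fake_ocr is not called again
print(len(calls))  # 1
```

Keying on the content rather than the path means the same document fetched from two different locations would still hit the cache.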
File details
Details for the file opticr-0.2.0.tar.gz.
File metadata
- Download URL: opticr-0.2.0.tar.gz
- Upload date:
- Size: 7.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.1 CPython/3.10.1 Linux/5.10.102.1-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 000dc8decef7c447518d22ff6f8d696520abae65c6a2b545d1e66bf6701676bd |
| MD5 | 7f84bf27371bae461826e16a4cbfd613 |
| BLAKE2b-256 | 2bce45d5ba1816c16c1263205823626af71fe3f0f97777037e7c0d5531c70636 |
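A downloaded archive can be checked against the SHA256 digest listed above. This is a minimal sketch using only the standard library; the helper names `sha256_of` and `verify` are hypothetical, and the file is assumed to have been saved locally as `opticr-0.2.0.tar.gz`.

```python
import hashlib

# SHA256 digest listed above for opticr-0.2.0.tar.gz.
EXPECTED_SHA256 = "000dc8decef7c447518d22ff6f8d696520abae65c6a2b545d1e66bf6701676bd"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

Calling `verify("opticr-0.2.0.tar.gz")` should return `True` only for an unmodified copy of the source distribution.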
File details
Details for the file opticr-0.2.0-py3-none-any.whl.
File metadata
- Download URL: opticr-0.2.0-py3-none-any.whl
- Upload date:
- Size: 8.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.1 CPython/3.10.1 Linux/5.10.102.1-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0b1d8b850084064f94aa54f359b60ec67468b1cd1ffa09870b18ae28ab0b5090 |
| MD5 | fc881313056ae446cc33941402e2d245 |
| BLAKE2b-256 | 6ba8a4b503548f1fb6182d2668f03548fc6b336b0b7934776954ac1e27b672e9 |