Python tools for interacting with Tesseract
OCR utils
Features
- Detects tables in PDFs and images and performs OCR on each cell
- Performs OCR on a PDF and generates an SVG image
Quick Start
from ocr_utils import pdf_to_svg

pdf_to_svg(
    input_filename='in.pdf',
    output_filename='out.svg',
    detect_tables=True,
    lang='en',
)
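The call above handles a single file. A minimal sketch of applying it to several PDFs follows. It assumes only the `pdf_to_svg` keyword arguments shown in the Quick Start. The helper name `convert_all` and the choice to write each SVG next to its source PDF are hypothetical and not part of the library.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def convert_all(
    pdf_paths: Iterable[str],
    convert: Callable[..., object],
    detect_tables: bool = True,
    lang: str = 'en',
) -> List[Path]:
    """Call `convert` once per PDF and return the SVG paths that were requested.

    `convert` is expected to accept the same keyword arguments as the
    Quick Start call (input_filename, output_filename, detect_tables, lang).
    This helper is hypothetical; it is not provided by ocr_utils.
    """
    outputs: List[Path] = []
    for pdf in map(Path, pdf_paths):
        # Hypothetical naming rule: same folder and stem, '.svg' extension.
        svg = pdf.with_suffix('.svg')
        convert(
            input_filename=str(pdf),
            output_filename=str(svg),
            detect_tables=detect_tables,
            lang=lang,
        )
        outputs.append(svg)
    return outputs


# Intended usage (requires the package and its system dependencies):
#     from ocr_utils import pdf_to_svg
#     convert_all(['a.pdf', 'b.pdf'], pdf_to_svg)
```

Passing the converter as an argument keeps the helper importable even when the package is not installed, which also makes it easy to exercise with a stand-in function.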
Execution example
[Figure: input PDF]
[Figure: output SVG]
Installation
Stable Release: pip install tesseract_ocr_utils
Development Head: pip install git+https://github.com/envinorma/ocr_utils.git
This library is built upon pytesseract and pdf2image, which have non-pip requirements. Visit these libraries' installation pages to install their dependencies.
For example, on Ubuntu, the following packages need to be installed:
apt-get install libarchive13
apt-get install tesseract-ocr
apt-get install poppler-utils
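Before running the Quick Start, it can help to confirm that the system tools are actually reachable. The sketch below is not part of the library. It assumes that `tesseract-ocr` provides a `tesseract` executable and that `poppler-utils` provides `pdftoppm`, and it only checks that both are on the `PATH`. It does not check for `libarchive13`, which is a shared library rather than a command. The function name `missing_binaries` is hypothetical.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executables assumed to come from the tesseract-ocr and poppler-utils packages.
REQUIRED_BINARIES = ("tesseract", "pdftoppm")


def missing_binaries(
    names: Iterable[str] = REQUIRED_BINARIES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names in `names` that `which` cannot locate on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing system tools: " + ", ".join(missing))
    else:
        print("All required system tools were found.")
```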
Documentation
For full package documentation, please visit envinorma.github.io/ocr_utils.
Development
See CONTRIBUTING.md for information related to developing the code.
MIT license