OCR utils
Python tools for interacting with Tesseract
Features
- Detects tables in PDFs and images and performs OCR on each cell
- Performs OCR on a PDF and generates an SVG image
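To make the second feature more concrete, here is a minimal sketch of how recognized words could be laid out as SVG text. It is not the library's implementation: the `WordBox` type and the `boxes_to_svg` function are hypothetical names introduced only for illustration, and the word boxes are assumed to come from some OCR step that reports pixel coordinates.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class WordBox:
    """One recognized word and its bounding box in pixels (hypothetical type)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def boxes_to_svg(boxes, page_width, page_height):
    """Render recognized words as SVG <text> elements at their box positions."""
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{page_width}" height="{page_height}">'
    ]
    for box in boxes:
        # SVG text is anchored at its baseline, so use the bottom of the box.
        baseline = box.top + box.height
        parts.append(
            f'<text x="{box.left}" y="{baseline}" '
            f'font-size="{box.height}">{escape(box.text)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Example with made-up boxes; the second word checks that markup is escaped.
words = [WordBox("Total", 10, 20, 60, 12), WordBox("<42>", 80, 20, 30, 12)]
print(boxes_to_svg(words, 200, 100))
```

Escaping the text matters because OCR output can contain characters such as `<` or `&` that would otherwise break the SVG markup.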
Quick Start
from ocr_utils import pdf_to_svg

pdf_to_svg(
    input_filename='in.pdf',
    output_filename='out.svg',
    detect_tables=True,
    lang='en',
)
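The call above handles one file. As a hedged sketch, the same call could be wrapped to convert a whole folder. The helper name `convert_folder` and its `convert` parameter are assumptions introduced here, not part of the package. Only the four keyword arguments shown in the Quick Start are used.

```python
from pathlib import Path


def convert_folder(input_dir, output_dir, convert=None, lang="en"):
    """Convert every PDF in input_dir to an SVG of the same name in output_dir."""
    if convert is None:
        # Imported lazily so the helper can be exercised without OCR dependencies.
        from ocr_utils import pdf_to_svg as convert
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        target = output_dir / (pdf.stem + ".svg")
        # Same keyword arguments as in the Quick Start example.
        convert(
            input_filename=str(pdf),
            output_filename=str(target),
            detect_tables=True,
            lang=lang,
        )
        written.append(target)
    return written


# Usage (requires the package and its system dependencies):
# convert_folder("scans", "svgs")
```

Passing the converter as a parameter keeps the loop easy to test with a stand-in function, without running Tesseract.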
Execution example
[Figure: input PDF]
[Figure: output SVG]
Installation
Stable Release: pip install tesseract_ocr_utils
Development Head: pip install git+https://github.com/envinorma/ocr_utils.git
This library is built upon pytesseract and pdf2image, which have non-pip requirements. Visit these libraries' installation pages to install their dependencies.
For example, on Ubuntu, the following packages need to be installed:
apt-get install libarchive13
apt-get install tesseract-ocr
apt-get install poppler-utils
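After installing, a quick check like the following sketch can confirm that the system executables are on the PATH. It assumes the relevant binaries are `tesseract` (from tesseract-ocr) and `pdftoppm` (from poppler-utils, which pdf2image calls). The function name `missing_binaries` is hypothetical and not part of the package.

```python
import shutil

# Executables assumed to be needed: `tesseract` (from tesseract-ocr) and
# `pdftoppm` (from poppler-utils, used by pdf2image).
REQUIRED_BINARIES = ("tesseract", "pdftoppm")


def missing_binaries(names=REQUIRED_BINARIES, which=shutil.which):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing system dependencies:", ", ".join(missing))
    else:
        print("All system dependencies found.")
```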
Documentation
For full package documentation please visit envinorma.github.io/ocr_utils.
Development
See CONTRIBUTING.md for information related to developing the code.
MIT license