Python tools for interacting with Tesseract
OCR utils
Features
- Detects tables in PDF/images and performs OCR on each cell
- Performs OCR on PDF and generates SVG image
Quick Start
from ocr_utils import pdf_to_svg

pdf_to_svg(
    input_filename='in.pdf',
    output_filename='out.svg',
    detect_tables=True,
    lang='eng',
)
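The Quick Start call can be applied to a whole folder of PDFs. The following is a minimal sketch, assuming `pdf_to_svg` accepts the keyword arguments shown above; `svg_path_for` and `convert_folder` are hypothetical helper names and are not part of ocr_utils.

```python
from pathlib import Path


def svg_path_for(pdf_path: Path) -> Path:
    """Return the output path: same folder and stem, with an .svg suffix."""
    return pdf_path.with_suffix('.svg')


def convert_folder(folder: str, lang: str = 'eng') -> list:
    """Convert every *.pdf directly inside `folder`; return the SVG paths written."""
    # Imported lazily so the path helper stays usable without ocr_utils installed.
    from ocr_utils import pdf_to_svg

    outputs = []
    for pdf_path in sorted(Path(folder).glob('*.pdf')):
        svg_path = svg_path_for(pdf_path)
        # Same keyword arguments as in the Quick Start (assumed signature).
        pdf_to_svg(
            input_filename=str(pdf_path),
            output_filename=str(svg_path),
            detect_tables=True,
            lang=lang,
        )
        outputs.append(svg_path)
    return outputs
```

Each SVG is written next to its source PDF, so `convert_folder('scans')` would turn `scans/in.pdf` into `scans/in.svg`.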
Execution example
[Images: input PDF and the resulting output SVG]
Installation
Stable Release: pip install tesseract_ocr_utils
Development Head: pip install git+https://github.com/envinorma/ocr_utils.git
This library is built on pytesseract and pdf2image, both of which have requirements that cannot be installed with pip. See the installation pages of these two libraries for how to install their dependencies.
For example, on Ubuntu, the following packages need to be installed:
apt-get install libarchive13
apt-get install tesseract-ocr
apt-get install poppler-utils
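Before running OCR, it can help to confirm that the system tools are reachable. The following is a minimal sketch using only the standard library. It assumes the packages above expose the `tesseract` and `pdftoppm` executables on PATH (the binaries that pytesseract and pdf2image call by default); `missing_tools` is a hypothetical helper name, not part of ocr_utils.

```python
import shutil


def missing_tools(tools=('tesseract', 'pdftoppm')):
    """Return the names of executables that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == '__main__':
    missing = missing_tools()
    if missing:
        print('Missing system tools:', ', '.join(missing))
    else:
        print('All system tools found.')
```

An empty list means both executables were found; any name in the list points to a package that still needs to be installed.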
Documentation
For full package documentation please visit envinorma.github.io/ocr_utils.
Development
See CONTRIBUTING.md for information related to developing the code.
MIT license