Python tools for interacting with Tesseract

OCR utils


Features

  • Detects tables in PDFs and images and performs OCR on each cell
  • Performs OCR on a PDF and generates an SVG image

Quick Start

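from ocr_utils import pdf_to_svg

# Run OCR on in.pdf and write the result to out.svg
pdf_to_svg(
    input_filename='in.pdf',
    output_filename='out.svg',
    detect_tables=True,  # also detect tables and OCR each cell
    lang='eng',  # Tesseract language code (English)
)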
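
To convert several PDFs with the same settings, the call above can be wrapped in a small loop. This is a minimal sketch, not part of the library: `convert_all` is a hypothetical helper, and it takes the converter as an argument (assumed to accept the same keyword arguments as `pdf_to_svg` in the Quick Start) so it can be tried without Tesseract installed.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def convert_all(
    pdf_paths: Iterable[str],
    convert: Callable[..., object],
    lang: str = "eng",
) -> List[str]:
    """Call `convert` once per PDF and return the SVG paths it was asked to write.

    `convert` is assumed to take the keyword arguments shown in the
    Quick Start (input_filename, output_filename, detect_tables, lang).
    """
    outputs = []
    for pdf in pdf_paths:
        # Write each SVG next to its input: "scan.pdf" -> "scan.svg"
        svg = str(Path(pdf).with_suffix(".svg"))
        convert(
            input_filename=pdf,
            output_filename=svg,
            detect_tables=True,
            lang=lang,
        )
        outputs.append(svg)
    return outputs
```

With the package installed, this would be called as `convert_all(['a.pdf', 'b.pdf'], pdf_to_svg)` after `from ocr_utils import pdf_to_svg`.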

Execution example

[Image: input PDF]

[Image: output SVG]

Installation

Stable Release: pip install tesseract_ocr_utils
Development Head: pip install git+https://github.com/envinorma/ocr_utils.git

This library is built on pytesseract and pdf2image, both of which have non-pip requirements. See each library's installation page to install those dependencies.

For example, on Ubuntu, the following packages need to be installed:

apt-get install libarchive13
apt-get install tesseract-ocr
apt-get install poppler-utils
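
Before running OCR, it can help to confirm that the system tools are actually on the PATH. The sketch below uses only the standard library; `missing_binaries` is a hypothetical helper, and the default names assume that tesseract-ocr provides the `tesseract` executable and poppler-utils provides `pdftoppm`.

```python
import shutil
from typing import Iterable, List


def missing_binaries(names: Iterable[str] = ("tesseract", "pdftoppm")) -> List[str]:
    """Return the executables from `names` that cannot be found on the PATH."""
    # shutil.which returns None when no matching executable is found
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing system dependencies:", ", ".join(missing))
    else:
        print("All system dependencies found.")
```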

Documentation

For full package documentation please visit envinorma.github.io/ocr_utils.

Development

See CONTRIBUTING.md for information related to developing the code.

MIT license

Download files

Source Distribution

tesseract_ocr_utils-0.0.6.tar.gz (560.3 kB)

Uploaded Source

Built Distribution

tesseract_ocr_utils-0.0.6-py2.py3-none-any.whl (12.0 kB)

Uploaded Python 2 Python 3

File details

Details for the file tesseract_ocr_utils-0.0.6.tar.gz.

File metadata

  • Download URL: tesseract_ocr_utils-0.0.6.tar.gz
  • Upload date:
  • Size: 560.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for tesseract_ocr_utils-0.0.6.tar.gz
Algorithm    Hash digest
SHA256       7a27d67141496ca1875368c35639d163b90c35c033547bd464fe33efae9a5035
MD5          c5e7eb8dbe8404a1a9ad963af16e453d
BLAKE2b-256  41947b2ed55fdd63f76bdc07cbf83e7f673fc39a40fca384603a635d77c68418
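
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using the standard library's hashlib; `sha256_of_file` and `matches_sha256` are hypothetical helper names, and the path is assumed to be wherever the archive was saved locally.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and padding."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For an intact download, `matches_sha256('tesseract_ocr_utils-0.0.6.tar.gz', '<SHA256 from the table>')` should return True.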

File details

Details for the file tesseract_ocr_utils-0.0.6-py2.py3-none-any.whl.

File metadata

  • Download URL: tesseract_ocr_utils-0.0.6-py2.py3-none-any.whl
  • Upload date:
  • Size: 12.0 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for tesseract_ocr_utils-0.0.6-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       9d262db1ac307b8f98d4f8f4e0e430d3b38ee435087a45079121d8810143550a
MD5          9e132ca16dca8fa747996377e691395a
BLAKE2b-256  5d7a8c7e3768b5d08d0c44d726f07afb3d99b29102380b6c93f0d1dbf64509a5
