Python package for combining .hocr files and images into searchable PDFs

hOCkeR

Python package for combining hOCR files and images into searchable PDFs

Table of Contents

  1. What is hOCkeR?
  2. How to install
  3. How to use hOCkeR
  4. Credits & links

What is hOCkeR?

hOCkeR is a Python package for combining hOCR files and images into searchable PDFs. The package lays the text from the hOCR file on top of the image, then creates a PDF containing both the text and the image. The code is based on HOCRConverter by jbrinley. That code was designed for Python 2 and does not work with newer versions of Python, so I created this package as an update to the original code.
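
To make the "text on top of the image" idea concrete, here is a minimal sketch of the data an hOCR file carries. It is not hOCkeR's own code: it only assumes the standard hOCR convention that each word element (class `ocrx_word` in Tesseract output) stores its position in a `title` attribute of the form `bbox x0 y0 x1 y1`. The names `WordBoxParser`, `extract_words` and `SAMPLE` are made up for this illustration.

```python
import re
from html.parser import HTMLParser

# hOCR stores pixel coordinates in the title attribute, e.g. "bbox 36 92 96 116; x_wconf 95"
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class WordBoxParser(HTMLParser):
    """Collect (text, bbox) pairs for elements of one hOCR class (illustrative helper)."""

    def __init__(self, element_class="ocrx_word"):
        super().__init__()
        self.element_class = element_class
        self.words = []      # finished (text, (x0, y0, x1, y1)) pairs
        self._bbox = None    # bbox of the word element currently open, if any
        self._depth = 0      # nesting depth inside the open word element
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:
            # Nested markup inside a word (e.g. <strong>): just track the depth.
            self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        match = BBOX_RE.search(attrs.get("title") or "")
        if self.element_class in classes and match:
            self._bbox = tuple(int(v) for v in match.groups())
            self._depth = 1
            self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def extract_words(hocr_text, element_class="ocrx_word"):
    """Return every word and its bounding box found in an hOCR string."""
    parser = WordBoxParser(element_class)
    parser.feed(hocr_text)
    return parser.words


# A tiny hand-written hOCR fragment, used only for this example.
SAMPLE = (
    "<span class='ocr_line' title='bbox 36 92 580 122'>"
    "<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span> "
    "<span class='ocrx_word' title='bbox 109 92 179 116; x_wconf 93'>world</span>"
    "</span>"
)

if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE):
        print(text, bbox)
```

Each word comes with the rectangle it occupies on the page image, which is the information a tool needs in order to place searchable text over the matching pixels.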

How to install

To install the package, run the following command within a Python environment:

pip install hocker

If any errors occur while installing, try installing from the .whl file instead, linked here
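
To confirm that the install worked, one option is to ask Python for the installed version. This is a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is made up for this example, and `hocker` is the distribution name from the `pip install` command above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("hocker")
    if version is None:
        print("hocker is not installed in this environment")
    else:
        print("hocker", version, "is installed")
```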

How to use hOCkeR

Below is an example of how to use hOCkeR to combine a PNG image and a .hocr file into a PDF:

import hocker as hkr

image_path = 'path/to/image.png'
hocr_path = 'path/to/image.hocr'

# Specify the element in the hocr file to use as the text
hocr = hkr.HOCRCombiner('ocrx_word') # For tesseract outputs, it is 'ocrx_word'

# Specify the hocr and image path
hocr.locate_image(image_path)
hocr.locate_hocr(hocr_path)

# Output the PDF
hocr.to_pdf('path/to/output.pdf')
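
For a folder of scans, the same four calls can be wrapped in a loop. The sketch below assumes each image sits next to a .hocr file with the same base name (for example `page1.png` and `page1.hocr`). The helpers `pair_inputs` and `convert_all` are made up for this example and are not part of hOCkeR. Because it is not documented here whether a `HOCRCombiner` can be reused or accepts `Path` objects, the sketch creates a new combiner per page and passes plain strings.

```python
from pathlib import Path


def pair_inputs(directory):
    """Return (image, hocr) path pairs for PNGs that have a same-named .hocr file."""
    pairs = []
    for image in sorted(Path(directory).glob("*.png")):
        hocr = image.with_suffix(".hocr")
        if hocr.exists():
            pairs.append((image, hocr))
    return pairs


def convert_all(directory, element="ocrx_word"):
    """Write one PDF next to each image/hOCR pair and return the output paths."""
    # Imported here so that pair_inputs can be used without hocker installed.
    import hocker as hkr

    outputs = []
    for image, hocr in pair_inputs(directory):
        # Same calls as in the example above, once per page.
        combiner = hkr.HOCRCombiner(element)
        combiner.locate_image(str(image))
        combiner.locate_hocr(str(hocr))
        output = image.with_suffix(".pdf")
        combiner.to_pdf(str(output))
        outputs.append(output)
    return outputs


if __name__ == "__main__":
    for pdf in convert_all("path/to/scans"):
        print("wrote", pdf)
```

Images without a matching .hocr file are skipped rather than treated as errors, so a partly processed folder can be converted as-is.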

Credits & links


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hocker-1.0.4.tar.gz (280.4 kB view details)

Uploaded Source

Built Distribution

hocker-1.0.4-py3-none-any.whl (5.3 kB view details)

Uploaded Python 3

File details

Details for the file hocker-1.0.4.tar.gz.

File metadata

  • Download URL: hocker-1.0.4.tar.gz
  • Upload date:
  • Size: 280.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for hocker-1.0.4.tar.gz
Algorithm Hash digest
SHA256 2886ccba7f5c7eca5c0ea4d98ac323f8c2da2c060b00f21361bbd8263dcf5c57
MD5 a8dd9dfe38df06a32bea43237e002ce2
BLAKE2b-256 ec2eec87c41d23a94cd257cc94a2c92cd08feaf15e5a79915d046c69c5749b38

File details

Details for the file hocker-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: hocker-1.0.4-py3-none-any.whl
  • Upload date:
  • Size: 5.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for hocker-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 4148b95bb2f5428f06d5186d0af2f18c2d29ebc46d34daf2edcc26aaf17e56dc
MD5 d3b1115ad24fd458951c83a6bd23f49d
BLAKE2b-256 24de8e0cbf53aa8a4e8239e54e813114d6a75398b0c06db40617c7dd9adebab9
