Python package to convert PDF to text using OCR

py-ocr-pdf

This project lets you OCR PDF files regardless of whether their pages contain embedded text or scanned images.

Python Support

This project only actively supports current Python versions, Python 3.10 to 3.14.

Installation

Install this package from PyPI with pip:

pip install py-ocr-pdf

OS Dependencies

poppler-utils
tesseract-ocr
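
To illustrate why these two tools are needed, here is a minimal sketch of a rasterise-then-OCR pipeline: pdftoppm renders every page to an image, and tesseract reads the text back from each image, so it does not matter whether the page originally held text or a scan. This is not the package's own API. The function names, the 300 DPI default, and the "eng" language default are all assumptions made for the example.

```python
import subprocess
import tempfile
from pathlib import Path


def pdftoppm_command(pdf_path, output_prefix, dpi=300):
    """Build the pdftoppm call that renders each PDF page to a PNG image."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(output_prefix)]


def tesseract_command(image_path, language="eng"):
    """Build the tesseract call that prints the recognised text to stdout."""
    return ["tesseract", str(image_path), "stdout", "-l", language]


def page_number(image_path):
    """Extract the page number from a name such as page-03.png."""
    return int(Path(image_path).stem.rsplit("-", 1)[1])


def ocr_pdf(pdf_path, dpi=300, language="eng"):
    """Hypothetical helper: rasterise every page, OCR it, return the joined text."""
    with tempfile.TemporaryDirectory() as workdir:
        prefix = Path(workdir) / "page"
        subprocess.run(pdftoppm_command(pdf_path, prefix, dpi), check=True)
        pages = []
        # Sort numerically so that page 10 does not come before page 2.
        for image in sorted(Path(workdir).glob("page-*.png"), key=page_number):
            result = subprocess.run(
                tesseract_command(image, language),
                check=True,
                capture_output=True,
                text=True,
            )
            pages.append(result.stdout)
    return "\n".join(pages)
```

Calling ocr_pdf("scan.pdf") would return the recognised text as a single string, provided both tools are installed and a file named scan.pdf exists.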

Linux PDF OCR Support

Install the dependencies with:

sudo apt-get install poppler-utils tesseract-ocr

Mac OS PDF OCR Support

This project uses pdftoppm and tesseract, so you need to install poppler (which provides pdftoppm) and tesseract.

brew install poppler tesseract

Windows OS PDF OCR Support

On Windows you can install pdftoppm by following these steps:

Go to https://github.com/oschwartz10612/poppler-windows
Navigate to the latest release
Download the zip
Unzip and save the files in a new folder
Make sure pdftoppm can be found in that folder, for example by adding the folder that contains pdftoppm.exe to your PATH
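
Once the dependencies are installed, a quick check like the one below can confirm that they are discoverable. It assumes the tools are looked up on PATH under the executable names pdftoppm and tesseract, which this README does not confirm. The helper name is hypothetical.

```python
import shutil


def find_missing_tools(tools=("pdftoppm", "tesseract")):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("pdftoppm and tesseract are both available")
```

shutil.which also resolves .exe files on Windows, so the same check should work on all three platforms described above.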

Found a Bug?

Issues are tracked via GitHub issues at the project issue page.

Have A Feature Request?

Feature requests can be raised by creating an issue on the project issue page. Please start the issue title with "Feature Request -".

Testing

To run the tests, use:

coverage erase && \
python -W error::DeprecationWarning -W error::PendingDeprecationWarning -m coverage run --parallel -m pytest --ds tests.settings && \
coverage combine && \
coverage report

Compiling Requirements

Run pip install pip-tools, then run python requirements/compile.py to generate the various requirements files. I use two local virtualenvs to build the requirements, one running Python 3.8 and the other running Python 3.11.

Building

This project uses hatchling. Build a source distribution with:

python -m build --sdist

tox

Contributing

Check for open issues at the project issue page, or open a new issue to start a discussion about a feature or bug.
Fork the repository on GitHub to start making changes.
Clone the repository.
Initialise pre-commit by running pre-commit install.
Install requirements from one of the requirements files.
Add a test case to show that the bug is fixed or the feature is implemented correctly.
Test using python -W error::DeprecationWarning -W error::PendingDeprecationWarning -m coverage run --parallel -m pytest --ds tests.settings
Create a pull request that tags the issue, and bug me until I can merge it. Also, don't forget to add yourself to AUTHORS.

Release history

0.0.1 (this version), released Aug 29, 2025

Download files

Download the file for your platform.

Source Distribution

py_ocr_pdf-0.0.1.tar.gz (21.5 kB), uploaded Aug 29, 2025

File details

Details for the file py_ocr_pdf-0.0.1.tar.gz.

File metadata

Download URL: py_ocr_pdf-0.0.1.tar.gz
Upload date: Aug 29, 2025
Size: 21.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for py_ocr_pdf-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`29a6a0928d081dc449d8e7eff56bfe008e6c70f79828dfd8def8e95e4685298a`
MD5	`91c1b5926b0ff9807136e03889534f58`
BLAKE2b-256	`897579228490f98d298d374c35a5c920186aa4df54e8c5510cdd3401c89c625a`
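
The published digests can be used to check a downloaded copy of the archive. The sketch below uses only the standard library and compares a file's SHA256 digest with the value listed above. The function names are hypothetical, and the archive is assumed to have been downloaded to a local path of your choosing.

```python
import hashlib

# SHA256 digest published above for py_ocr_pdf-0.0.1.tar.gz
EXPECTED_SHA256 = "29a6a0928d081dc449d8e7eff56bfe008e6c70f79828dfd8def8e95e4685298a"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the expected one, ignoring letter case."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_published_hash("py_ocr_pdf-0.0.1.tar.gz") should return True for an unmodified download of this release.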
