Turn image-only PDFs into searchable, selectable PDFs with an OCR text layer
readablepdf
readablepdf converts image-only PDFs into searchable and selectable PDFs by adding an OCR text layer while preserving page visuals.
How it works
- Render each page to a PNG image (pdf2image, backed by Poppler).
- OCR each page with Tesseract in PDF mode (image layer + invisible text layer).
- Merge all per-page OCR PDFs into a single output PDF.
All intermediate files are written inside an OS temporary directory and removed automatically.
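The three steps can be sketched in Python roughly as follows. This is a hedged illustration, not readablepdf's own code: it assumes pdf2image for rendering, calls the tesseract CLI through subprocess, and uses pypdf to merge pages (the README does not name a merge library, so pypdf is an assumption). The function names make_searchable and tesseract_args are hypothetical.

```python
# Sketch of the render -> OCR -> merge pipeline (NOT readablepdf's actual code).
# Assumes: pdf2image and pypdf are installed, and tesseract + pdftoppm are on PATH.
import subprocess
import tempfile
from pathlib import Path


def tesseract_args(image: Path, out_base: Path, lang: str, dpi: int) -> list[str]:
    # The trailing "pdf" selects Tesseract's PDF renderer: it writes
    # <out_base>.pdf containing the page image plus an invisible text layer.
    return ["tesseract", str(image), str(out_base), "-l", lang, "--dpi", str(dpi), "pdf"]


def make_searchable(src: Path, dst: Path, lang: str = "eng", dpi: int = 200) -> None:
    # Deferred imports: tesseract_args() stays usable without these packages.
    from pdf2image import convert_from_path
    from pypdf import PdfWriter

    # Everything inside tmp is deleted when the with-block exits, even on error.
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        # 1. Render every page to a PNG file; paths_only=True returns file
        #    paths instead of in-memory PIL images.
        pages = convert_from_path(
            str(src), dpi=dpi, output_folder=str(workdir), fmt="png", paths_only=True
        )
        writer = PdfWriter()
        for number, png in enumerate(pages, start=1):
            out_base = workdir / f"ocr-{number:04d}"
            # 2. OCR one page; check=True raises CalledProcessError on failure.
            subprocess.run(tesseract_args(Path(png), out_base, lang, dpi), check=True)
            # 3. Append the one-page OCR PDF to the merged document.
            writer.append(str(out_base) + ".pdf")
        # Write the merged PDF before the temporary directory is removed.
        with open(dst, "wb") as fh:
            writer.write(fh)
```

Under these assumptions, make_searchable(Path("input.pdf"), Path("output_ocr.pdf")) would write one merged, searchable PDF. Passing --dpi to Tesseract keeps its idea of the page size consistent with the rendering resolution.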
System dependencies (Linux + macOS)
You need these binaries on the machine where you run readablepdf:
- tesseract
- pdftoppm (from Poppler)
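A quick way to confirm that both binaries are reachable on PATH is a small standard-library check. This is a hypothetical helper, not part of readablepdf; it only looks for the two command names listed here.

```python
import shutil

# Binaries required by this README; pdf2image invokes pdftoppm under the hood.
REQUIRED_BINARIES = ("tesseract", "pdftoppm")


def missing_binaries(names=REQUIRED_BINARIES) -> list[str]:
    # shutil.which returns None when a command cannot be found on PATH.
    return [name for name in names if shutil.which(name) is None]
```

An empty result from missing_binaries() means both tools were found; otherwise install the missing ones with the commands for your platform.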
Ubuntu/Debian:
sudo apt update
sudo apt install -y tesseract-ocr poppler-utils
macOS (Homebrew):
brew install tesseract poppler
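If you need languages beyond English, install the matching Tesseract language packs (for example tesseract-ocr-deu for German on Ubuntu/Debian, or brew install tesseract-lang on macOS, which adds all available languages).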
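To see which language packs Tesseract can actually use, you can parse the output of tesseract --list-langs. The sketch below is hypothetical; it assumes readablepdf forwards --lang to Tesseract unchanged, so a code is only usable if it appears in this list.

```python
import subprocess


def parse_list_langs(output: str) -> set[str]:
    # Output is a "List of available languages ..." header, then one code per line.
    langs = set()
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.startswith("List of available languages"):
            continue
        langs.add(line)
    return langs


def installed_langs() -> set[str]:
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True, check=True
    )
    # Fall back to stderr in case this Tesseract build prints the list there.
    return parse_list_langs(result.stdout or result.stderr)
```

For instance, checking "deu" in installed_langs() before running with --lang deu avoids a failure partway through a long document.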
Install and run with pipx
readablepdf is published on PyPI, so you can run it directly with pipx without managing a virtual environment:
pipx run readablepdf input.pdf
To set a custom output path, OCR language, and rendering DPI:
pipx run readablepdf input.pdf -o output_ocr.pdf --lang eng --dpi 200
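To process a whole folder, one option is to call the CLI once per file. This sketch assumes readablepdf is installed as a command on PATH (for example with pipx install readablepdf) and uses only the -o, --lang, and --dpi options shown above; the helper names and the <name>_ocr.pdf output naming are hypothetical choices.

```python
import subprocess
from pathlib import Path


def readablepdf_args(src: Path, dst: Path, lang: str = "eng", dpi: int = 200) -> list[str]:
    # Defaults mirror the README example, not necessarily readablepdf's own defaults.
    return ["readablepdf", str(src), "-o", str(dst), "--lang", lang, "--dpi", str(dpi)]


def ocr_folder(folder: Path, lang: str = "eng", dpi: int = 200) -> list[Path]:
    written = []
    for src in sorted(folder.glob("*.pdf")):
        if src.stem.endswith("_ocr"):
            continue  # skip files produced by an earlier run
        dst = src.with_name(f"{src.stem}_ocr.pdf")
        # check=True stops the batch at the first file that fails.
        subprocess.run(readablepdf_args(src, dst, lang, dpi), check=True)
        written.append(dst)
    return written
```

Running one process per file keeps failures isolated to a single input and reuses the CLI exactly as documented.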
Local development
python3 -m venv .venv
source .venv/bin/activate
pip install -e '.[dev]'
ruff check .
ruff format --check .
pytest -q
python -m build
GitHub Actions
- CI workflow: lint, format check, tests, and build on Ubuntu and macOS.
- Publish to PyPI workflow: runs when a GitHub Release is published (and on manual trigger).
GitHub configuration needed for publish
- Create a repository secret named PYPI_API_TOKEN.
- Set it to a PyPI API token with upload permission for this project.
- (Optional but recommended) Create a protected environment named pypi and require approvals.
Release flow
- Merge to main.
- Create a Git tag/release (for example v0.1.1).
- The Publish to PyPI workflow then uploads the artifacts to PyPI, with the version derived from that tag.
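The README does not say which tool derives the version from the tag (build-backend plugins such as setuptools-scm or hatch-vcs are common choices). The hypothetical helper below only illustrates the mapping from a tag like v0.1.1 to the version 0.1.1, and deliberately rejects anything that is not a plain numeric release tag (pre-releases such as v1.0.0rc1 included).

```python
import re

# Accepts "v0.1.1", "0.1.1", or a full ref such as "refs/tags/v0.2.0".
_RELEASE_TAG = re.compile(r"v?(\d+(?:\.\d+)*)")


def version_from_tag(tag: str) -> str:
    name = tag.removeprefix("refs/tags/")
    match = _RELEASE_TAG.fullmatch(name)
    if match is None:
        raise ValueError(f"not a plain release tag: {tag!r}")
    return match.group(1)
```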
Download files
- Source distribution: readablepdf-0.2.0.tar.gz
- Built distribution: readablepdf-0.2.0-py3-none-any.whl
File details
Details for the file readablepdf-0.2.0.tar.gz.
File metadata
- Download URL: readablepdf-0.2.0.tar.gz
- Upload date:
- Size: 5.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2096fa5f11b9c4933ee1f3fe816c660f6c678fffdee21e08831c445e9b43dd70 |
| MD5 | a7d16b9c78ecb8900540cfa2b4eded41 |
| BLAKE2b-256 | c0b18e20619e395c1b78e4d8434f13e9624ab413aed0281693bfda7f6253a354 |
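To check a downloaded file against the digests in this table, hash it locally and compare. A minimal standard-library sketch; file_digests and matches are hypothetical helper names, and BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path

# SHA256 of readablepdf-0.2.0.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = "2096fa5f11b9c4933ee1f3fe816c660f6c678fffdee21e08831c445e9b43dd70"


def file_digests(path: Path, chunk_size: int = 1 << 16) -> dict[str, str]:
    sha256 = hashlib.sha256()
    md5 = hashlib.md5(usedforsecurity=False)  # MD5 is for comparison only
    blake2b_256 = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as fh:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in (sha256, md5, blake2b_256):
                h.update(chunk)
    return {
        "SHA256": sha256.hexdigest(),
        "MD5": md5.hexdigest(),
        "BLAKE2b-256": blake2b_256.hexdigest(),
    }


def matches(path: Path, expected_sha256: str) -> bool:
    return file_digests(path)["SHA256"] == expected_sha256.lower()
```

For an untampered download, matches(Path("readablepdf-0.2.0.tar.gz"), EXPECTED_SDIST_SHA256) should return True. Prefer SHA256 for integrity checks; MD5 is listed for compatibility only.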
Provenance
The following attestation bundles were made for readablepdf-0.2.0.tar.gz:
Publisher: publish.yml on acoomans/readablepdf
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: readablepdf-0.2.0.tar.gz
- Subject digest: 2096fa5f11b9c4933ee1f3fe816c660f6c678fffdee21e08831c445e9b43dd70
- Sigstore transparency entry: 1065966277
- Sigstore integration time:
- Permalink: acoomans/readablepdf@7040322587dc93c8376ed186e0d8ed6a7ddd89f2
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/acoomans
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7040322587dc93c8376ed186e0d8ed6a7ddd89f2
- Trigger Event: release
File details
Details for the file readablepdf-0.2.0-py3-none-any.whl.
File metadata
- Download URL: readablepdf-0.2.0-py3-none-any.whl
- Upload date:
- Size: 5.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c94fd62276552b2bd51d2da177f94a3f27d24eca4c85604a1455f028a350fb99 |
| MD5 | b071bb56aa9180af6c62641d74097f5d |
| BLAKE2b-256 | 30ac3f964156ec064f4f9698bee5587d72b4d9cedf4b55a8358608e51c5c4c02 |
Provenance
The following attestation bundles were made for readablepdf-0.2.0-py3-none-any.whl:
Publisher: publish.yml on acoomans/readablepdf
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: readablepdf-0.2.0-py3-none-any.whl
- Subject digest: c94fd62276552b2bd51d2da177f94a3f27d24eca4c85604a1455f028a350fb99
- Sigstore transparency entry: 1065966290
- Sigstore integration time:
- Permalink: acoomans/readablepdf@7040322587dc93c8376ed186e0d8ed6a7ddd89f2
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/acoomans
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7040322587dc93c8376ed186e0d8ed6a7ddd89f2
- Trigger Event: release