Extract text that has been highlighted in PDF documents.

`extract-pdf-highlighted-text`

How it works

Locates all highlight annotations on each page using PyPDF2.
Computes the bounding box of each highlight annotation.
Uses pdfminer.six to determine the locations of all visible characters on the page.
For each annotation, selects the characters whose bounding boxes overlap the annotation's bounding box, measuring overlap by intersection over union (IoU).
Groups the matched characters and prints the highlighted text in reading order.
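
The matching step can be illustrated with a small, library-free sketch. This is not the package's actual code: it assumes the annotation coordinates and character boxes have already been extracted (by PyPDF2 and pdfminer.six respectively), and every name in it (`Char`, `quad_points_to_box`, `iou`, `highlighted_text`, `min_iou`) is hypothetical. The threshold of 0.0, meaning any overlap counts, is also an assumption; a single character is much smaller than a highlight, so its IoU with the highlight box is naturally small.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# (x0, y0, x1, y1) in PDF coordinates, with the origin at the bottom-left.
Box = Tuple[float, float, float, float]


@dataclass
class Char:
    """Hypothetical container for one character and its bounding box."""
    text: str
    box: Box


def quad_points_to_box(quad_points: Sequence[float]) -> Box:
    """Bounding box of a flat [x1, y1, x2, y2, ...] coordinate list."""
    xs = quad_points[0::2]
    ys = quad_points[1::2]
    return (min(xs), min(ys), max(xs), max(ys))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def highlighted_text(chars: List[Char], highlight: Box, min_iou: float = 0.0) -> str:
    """Join the characters that overlap the highlight, in reading order."""
    hits = [c for c in chars if iou(c.box, highlight) > min_iou]
    # Simplified reading order: top to bottom (larger y first), then left to right.
    hits.sort(key=lambda c: (-c.box[1], c.box[0]))
    return "".join(c.text for c in hits)


if __name__ == "__main__":
    page_chars = [
        Char("i", (16, 100, 20, 110)),
        Char("H", (10, 100, 16, 110)),
        Char("!", (10, 50, 14, 60)),  # lies outside the highlight
    ]
    box = quad_points_to_box([9, 111, 21, 111, 9, 99, 21, 99])
    print(highlighted_text(page_chars, box))
```

The sort key treats characters as being on the same line only when their bottom edges are equal, which is a simplification; real text usually needs a tolerance when grouping characters into lines.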

Installation

pip install extract-pdf-highlighted-text

After installation, run it as extract_pdf_highlighted_text.

Dependencies:

PyPDF2 (for annotation geometry)
pdfminer.six (for text locations)

Usage

extract_pdf_highlighted_text your_file.pdf

The script will print each extracted highlight in reading order.

Example Output

This is a highlighted passage.

Another highlighted bit here.

Limitations

Does not support image-based PDFs (no OCR).
Extraction accuracy depends on the quality of the PDF and on the software that produced it.

Contributing

Contributions are welcome! Please submit pull requests or open issues on the GitHub repository.

License

This project is licensed under the MIT License.

Project details

This version: 0.1.0a1 (pre-release), released Apr 12, 2026

Download files

Source Distribution

extract_pdf_highlighted_text-0.1.0a1.tar.gz (4.7 kB)

Uploaded Apr 12, 2026 Source

Built Distribution

extract_pdf_highlighted_text-0.1.0a1-py2.py3-none-any.whl (5.4 kB)

Uploaded Apr 12, 2026 Python 2, Python 3

File details

Details for the file extract_pdf_highlighted_text-0.1.0a1.tar.gz.

File metadata

Download URL: extract_pdf_highlighted_text-0.1.0a1.tar.gz
Upload date: Apr 12, 2026
Size: 4.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for extract_pdf_highlighted_text-0.1.0a1.tar.gz
Algorithm	Hash digest
SHA256	`23aa3645f77ff07d155c2a53810c5f044d8734409fc2994e49d4ffe81697bd0c`
MD5	`b91a6eaf2cee513720766f0608bafabd`
BLAKE2b-256	`d890b620d0e3c9c2a7a08718f8c36d876d3d6cd54a924ac2755fcd40dc1a3354`
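
A downloaded copy of the source distribution can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the function names `sha256_of` and `matches_published_hash` are hypothetical, and the file is assumed to have been saved locally under a path you supply.

```python
import hashlib

# SHA256 digest published above for extract_pdf_highlighted_text-0.1.0a1.tar.gz.
EXPECTED_SHA256 = "23aa3645f77ff07d155c2a53810c5f044d8734409fc2994e49d4ffe81697bd0c"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_published_hash` with the path to the downloaded archive returns `True` only when the file is byte-for-byte identical to the one the digest was computed from.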

File details

Details for the file extract_pdf_highlighted_text-0.1.0a1-py2.py3-none-any.whl.

File metadata

Download URL: extract_pdf_highlighted_text-0.1.0a1-py2.py3-none-any.whl
Upload date: Apr 12, 2026
Size: 5.4 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for extract_pdf_highlighted_text-0.1.0a1-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`29735eec2c0b2e36ee013245e38dfe7bcfd583aad794b7c4ad075142adb5ea15`
MD5	`ddbc2de5244b00ac22af252c5a8b6a40`
BLAKE2b-256	`c0d9979723818c263b8be93c5db607fdb8063175c684fbc86247484b0576ae5b`
