Extract and summarize highlights from PDF files.
📘 pdf_highlight_extractor
Extract highlighted text from PDF files using PyMuPDF.
This lightweight utility reads each highlight from a PDF together with its page number and highlight color. It is useful for summarizing annotated documents, research papers, or ebooks.
🔧 Installation
Install from PyPI:
pip install pdf-highlight-extractor
🚀 Usage
from pdf_highlight_extractor.reader import extract_highlights
highlights = extract_highlights("sample.pdf")
for h in highlights:
    print(f"Page {h['page']} | Color: {h['color']} | Text: {h['text']}")
📝 Output Example
Page 2 | Color: (1.0, 1.0, 0.0) | Text: This is a highlighted phrase
Page 5 | Color: (0.0, 1.0, 0.0) | Text: Another important note
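The colors in the output above are RGB tuples of floats between 0 and 1, which are not very readable in a summary. The sketch below shows one way to post-process the result: it maps each tuple to the nearest name in a small palette and groups highlights by page. The `PALETTE` table and the helpers `color_name` and `group_by_page` are hypothetical names introduced for illustration and are not part of the package; the sketch also assumes every highlight has a three-component color, as in the example output.

```python
from collections import defaultdict

# Hypothetical palette for illustration; real highlight colors vary by PDF app.
PALETTE = {
    "yellow": (1.0, 1.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
    "red": (1.0, 0.0, 0.0),
    "magenta": (1.0, 0.0, 1.0),
}


def color_name(rgb):
    """Return the palette name closest to an (r, g, b) tuple of 0-1 floats."""
    def distance(name):
        # Squared Euclidean distance in RGB space.
        return sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name]))
    return min(PALETTE, key=distance)


def group_by_page(highlights):
    """Group highlight dicts by page number, keeping their order within a page."""
    pages = defaultdict(list)
    for h in highlights:
        pages[h["page"]].append((color_name(h["color"]), h["text"]))
    return dict(sorted(pages.items()))


# Sample data mirroring the output example, so the sketch runs without a PDF.
sample = [
    {"page": 2, "color": (1.0, 1.0, 0.0), "text": "This is a highlighted phrase"},
    {"page": 5, "color": (0.0, 1.0, 0.0), "text": "Another important note"},
]

for page, items in group_by_page(sample).items():
    print(f"Page {page}")
    for name, text in items:
        print(f"  [{name}] {text}")
```

To use it on a real file, pass the list returned by `extract_highlights` to `group_by_page` instead of `sample`.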
🧠 Features
- ✅ Extract text from highlights
- ✅ Get page number and highlight color
- ✅ Fallback extraction if highlight text is not directly stored
- ✅ Simple API for automation or personal use
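The fallback feature listed above can be pictured with a short PyMuPDF sketch. This is not the package's actual implementation, only one plausible approach under stated assumptions: that "directly stored" text means the annotation's `content` field, that the fallback reads the page text under the annotation's bounding rectangle, and that page numbers are reported 1-based. The function names `choose_text` and `extract_with_fallback` are hypothetical.

```python
def choose_text(stored, clipped):
    """Prefer text stored on the annotation; otherwise use the page text under it."""
    stored = (stored or "").strip()
    if stored:
        return stored
    # Collapse line breaks and runs of whitespace in the clipped page text.
    return " ".join((clipped or "").split())


def extract_with_fallback(path):
    """Yield one dict per highlight annotation in the PDF at `path`."""
    import fitz  # PyMuPDF; imported here so choose_text is usable without it

    with fitz.open(path) as doc:
        for page in doc:
            for annot in page.annots(types=[fitz.PDF_ANNOT_HIGHLIGHT]):
                # Assumption: stored text lives in the annotation's "content" field.
                stored = annot.info.get("content", "")
                # Fallback: text on the page inside the annotation's bounding box.
                clipped = page.get_text("text", clip=annot.rect)
                yield {
                    "page": page.number + 1,  # PyMuPDF page numbers are 0-based
                    "color": annot.colors.get("stroke"),
                    "text": choose_text(stored, clipped),
                }
```

Clipping to the bounding rectangle is a simplification: for a highlight that spans several lines, the rectangle can also cover neighboring words that were not highlighted. A more precise version would clip to each quad in `annot.vertices` instead.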
🧪 Example PDF
You can test the tool using any PDF with highlights created in:
- Adobe Acrobat Reader
- Preview (macOS)
- Xodo or other PDF apps
📦 Requirements
- Python 3.7+
- PyMuPDF (automatically installed)
For development only, install the package in editable mode from a local checkout:
pip install -e .
File details
Details for the file pdf_highlight_extractor-0.1.2.tar.gz.
File metadata
- Download URL: pdf_highlight_extractor-0.1.2.tar.gz
- Upload date:
- Size: 3.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a00054e752bc1cb39e7f3f545d607eccf25c24fc43d793d3c62d357267a89a2 |
| MD5 | 1eb38964f70d489fe680f19c01fc0671 |
| BLAKE2b-256 | 4d1917b95ced3b64f33722fa0ad8183e1b7d54de634f2ce58835f587ba618cd9 |
File details
Details for the file pdf_highlight_extractor-0.1.2-py3-none-any.whl.
File metadata
- Download URL: pdf_highlight_extractor-0.1.2-py3-none-any.whl
- Upload date:
- Size: 3.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3e184e6e03fb1d86823a0f07b39ea587573e1eff6d9e1e9d11549f5311c56f97 |
| MD5 | e5035d4239680c4639708297feb7813a |
| BLAKE2b-256 | 9d595578cb7bb7312ff983ee45dba033d22080053c70c8340a66df84f4696d3f |