Visual diff for born-digital PDFs — highlights changes directly on the original pages
pdfdelta
pdfdelta is a lightweight visual diff tool for born-digital PDFs.
Given an old and a new version of a PDF, it writes highlights directly onto the original pages so revisions are easy to review: deletions on the old file, additions on the new file.
It is mainly designed for academic papers and technical documents, where small wording changes matter and layout is part of the review process.
Features
- Highlights changes directly on the original PDF pages
- Works well for born-digital PDFs such as papers, reports, and drafts
- Handles multi-column layouts better than plain text diff tools
- Tries to reduce noisy highlights from simple reflow
- Keeps the review workflow visual and page-based
Installation
If you are using the repository directly:
pip install git+https://github.com/mli55/pdfdelta.git
Usage
pdfdelta old.pdf new.pdf
This writes two annotated files:
- old_marked.pdf: original pages with deletions highlighted
- new_marked.pdf: revised pages with additions highlighted
Options
| Flag | Default | Description |
|---|---|---|
| --old-out | old_marked.pdf | Output path for the annotated old PDF |
| --new-out | new_marked.pdf | Output path for the annotated new PDF |
| --opacity | 0.35 | Highlight opacity (0.0–1.0) |
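When pdfdelta is driven from a script, the flags above can be assembled programmatically. The sketch below is only an illustration: `build_pdfdelta_command` is a hypothetical helper, not part of pdfdelta, and it assumes each flag takes its value as a separate argument, which this page does not confirm.

```python
def build_pdfdelta_command(old_pdf, new_pdf, old_out=None, new_out=None, opacity=None):
    """Build an argv list for the pdfdelta CLI (hypothetical helper)."""
    cmd = ["pdfdelta", str(old_pdf), str(new_pdf)]
    if old_out is not None:
        cmd += ["--old-out", str(old_out)]
    if new_out is not None:
        cmd += ["--new-out", str(new_out)]
    if opacity is not None:
        # The options table documents the range 0.0 to 1.0.
        if not 0.0 <= opacity <= 1.0:
            raise ValueError("opacity must be between 0.0 and 1.0")
        cmd += ["--opacity", str(opacity)]
    return cmd


cmd = build_pdfdelta_command("old.pdf", "new.pdf", new_out="review/new.pdf", opacity=0.5)
print(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; omitted options fall back to the defaults listed in the table.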
How It Works
  old.pdf   new.pdf
     │         │
     ▼         ▼
┌──────────────────┐
│  Extract words   │  PyMuPDF: word text + bounding boxes
└────────┬─────────┘
         ▼
┌──────────────────┐
│  Global diff     │  Flatten all pages → SequenceMatcher
└────────┬─────────┘
         ▼
┌──────────────────┐
│  Word-level diff │  Per-word & sub-word precision
└────────┬─────────┘
         ▼
┌──────────────────┐
│  Reflow filter   │  Suppress cross-page / cross-column noise
└────────┬─────────┘
         ▼
┌──────────────────┐
│  Annotate PDFs   │  Highlights on original pages
└────────┬─────────┘
         ▼
   old_marked.pdf
   new_marked.pdf
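The "Global diff" step can be illustrated with a minimal sketch using only the standard library. This is not pdfdelta's actual code: the `Word` record and the `flatten`, `diff_words`, and `make_page` helpers are hypothetical, and the bounding boxes are dummy values. In a real run the words and boxes would come from PyMuPDF, for example via `page.get_text("words")`. The point of flattening is that a word pushed onto the next page stays at the same position in the document-wide sequence, so it is not reported as a change.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Word:
    text: str
    page: int    # zero-based page index
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates


def flatten(pages):
    """Turn per-page word lists into one document-wide sequence."""
    return [word for page in pages for word in page]


def diff_words(old_words, new_words):
    """Return (deleted, added) Word lists from a word-level diff."""
    matcher = SequenceMatcher(
        a=[w.text for w in old_words],
        b=[w.text for w in new_words],
        autojunk=False,  # keep very frequent words as match candidates
    )
    deleted, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(old_words[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new_words[j1:j2])
    return deleted, added


def make_page(page_no, text):
    """Build placeholder words for the demo; the boxes are dummy values."""
    return [Word(t, page_no, (0.0, 0.0, 0.0, 0.0)) for t in text.split()]


# "fast" reflows from page 0 to page 1; only "accurate" -> "robust" is a real edit.
old_pages = [make_page(0, "The method is fast"), make_page(1, "and accurate .")]
new_pages = [make_page(0, "The method is"), make_page(1, "fast and robust .")]

deleted, added = diff_words(flatten(old_pages), flatten(new_pages))
print([(w.text, w.page) for w in deleted])  # [('accurate', 1)]
print([(w.text, w.page) for w in added])    # [('robust', 1)]
```

Because each `Word` keeps its page index and box, the deleted words can be highlighted on the old file and the added words on the new file. Reordered text, such as a paragraph moved between columns, would still need the separate reflow filter shown in the diagram.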
License
MIT