Compare two PDF files and generate a visual diff report with highlighted differences
PDF-Compare
A tool for comparing PDF files. It generates vector-based, side-by-side comparison reports with content-aware highlighting.
Features
- Vector-Based Rendering: Preserves text quality and keeps file sizes small (no image conversion)
- Searchable Output: Generated PDFs maintain searchable, selectable text
- Visual Comparison: Side-by-side view of two PDFs with intelligent page alignment
- Content-Aware Highlighting: Detects text changes based on content, ignoring layout shifts
- Smart Page Alignment: Automatically detects inserted/deleted pages
- Color-Coded Differences:
  - Red: Deleted text (on the original document)
  - Green: Added text (on the modified document)
- Multiple Interfaces: CLI, GUI Desktop App, and Python API
- Cross-Platform: Works on Windows, macOS, and Linux
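The red/green classification can be illustrated with Python's standard difflib. This is a minimal sketch, not PDF-Compare's actual implementation: it assumes each page's text has already been extracted as a plain string and compares whitespace-separated words. The function name word_changes is hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def word_changes(original: str, modified: str) -> tuple[list[str], list[str]]:
    """Return (deleted, added) words between two page texts (hypothetical helper).

    Deleted words correspond to red highlights on the original page and added
    words to green highlights on the modified page. Because word sequences are
    compared rather than positions, text that only moved on the page is not flagged.
    """
    old_words = original.split()
    new_words = modified.split()
    deleted: list[str] = []
    added: list[str] = []
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" opcode is both a deletion from A and an insertion into B.
        if tag in ("delete", "replace"):
            deleted.extend(old_words[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new_words[j1:j2])
    return deleted, added


if __name__ == "__main__":
    red, green = word_changes(
        "The contract ends on 1 March 2024.",
        "The contract ends on 15 April 2024.",
    )
    print("red:", red)
    print("green:", green)
```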
Installation
pip install py-pdf-compare
Or using uv (recommended):
uv pip install py-pdf-compare
Prerequisites
- Python 3.12+ is required
Windows: Download from python.org and check "Add Python to PATH" during installation.
macOS:
brew install python@3.12
Linux (Ubuntu/Debian):
sudo apt install python3.12 python3.12-venv
Note: No additional dependencies (like Poppler) are required. PyMuPDF handles all PDF operations natively.
Quick Start
CLI Usage
# Compare two PDFs
pdf-compare original.pdf modified.pdf -o diff.pdf
# Launch GUI application
pdf-compare-gui
# Show help
pdf-compare --help
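To drive the CLI from a script, a thin subprocess wrapper is enough. This sketch uses only the invocation documented above (two positional PDF paths plus -o for the output file); the helper names build_command and run_compare are hypothetical, and because the tool's exit-code conventions are not documented here, the return code is handed back to the caller rather than interpreted.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_command(original: str | Path, modified: str | Path, output: str | Path) -> list[str]:
    """Argument list for the documented form: pdf-compare ORIGINAL MODIFIED -o OUTPUT."""
    return ["pdf-compare", str(original), str(modified), "-o", str(output)]


def run_compare(original: str | Path, modified: str | Path, output: str | Path) -> int:
    """Run pdf-compare and return its exit code (hypothetical helper).

    Raises FileNotFoundError if the pdf-compare executable is not on PATH.
    Exit-code meanings are an assumption left to the caller.
    """
    completed = subprocess.run(build_command(original, modified, output))
    return completed.returncode
```

For example, run_compare("original.pdf", "modified.pdf", "diff.pdf") is equivalent to the first command above. Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.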
Python API
from pdf_compare import PDFComparator
# Create comparator instance
comparator = PDFComparator('original.pdf', 'modified.pdf')
# Generate comparison report (None if no differences are found)
pdf_bytes = comparator.compare_visuals()
# Save to file only when there is a report to write
if pdf_bytes:
    with open('report.pdf', 'wb') as f:
        f.write(pdf_bytes)
API Reference
PDFComparator(file_a, file_b)
Main class for comparing PDF files.
Parameters:
- file_a (str): Path to the first PDF (original)
- file_b (str): Path to the second PDF (modified)
Methods:
compare_visuals() -> bytes | None
Generate a vector-based visual comparison report.
Returns: PDF report as bytes, or None if no differences found.
Example:
from pdf_compare import PDFComparator

comparator = PDFComparator('a.pdf', 'b.pdf')
result = comparator.compare_visuals()
if result:
    with open('diff.pdf', 'wb') as f:
        f.write(result)
    print("Report generated successfully")
else:
    print("No differences found")
How It Works
- Text Extraction: Extracts text and layout information from each page using PyMuPDF
- Similarity Scoring: Calculates similarity between pages using sequence matching
- Smart Alignment: Detects insertions, deletions, and shifts between documents
- Vector-Based Report: Creates a new PDF that preserves the original vector content
- Visual Highlighting: Adds vector-based highlights over text differences (no rasterization)
- Optimized Output: Maintains searchable text and small file sizes
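The similarity-scoring step can be sketched with difflib.SequenceMatcher. The text above only says "sequence matching", so using SequenceMatcher.ratio() over word tokens is an assumption, not a description of PDF-Compare's code. The sketch also assumes page text is already available as strings (for example from PyMuPDF's Page.get_text()); the helper names are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def page_similarity(text_a: str, text_b: str) -> float:
    """Similarity in [0, 1] between two pages, computed on word sequences.

    ratio() is 2*M/T, where M is the number of matched words and T the total
    word count of both pages; two empty pages score 1.0.
    """
    return SequenceMatcher(None, text_a.split(), text_b.split(), autojunk=False).ratio()


def similarity_matrix(pages_a: list[str], pages_b: list[str]) -> list[list[float]]:
    """scores[i][j] is the similarity of page i of A and page j of B."""
    return [[page_similarity(a, b) for b in pages_b] for a in pages_a]
```

A full matrix costs one comparison per page pair; an alignment step like the one described next can then pick the best-scoring pairing from it.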
Example: Inserted Page
If you insert a page in the middle of a document:
- The inserted page is shown with a blank page on the left, labeled "Added"
- Subsequent pages are correctly aligned and labeled as "Shifted"
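One way to reproduce this behavior is a global (Needleman-Wunsch-style) alignment over pages that maximizes total similarity, with unmatched pages costing nothing. This is a swapped-in technique for illustration only; PDF-Compare's actual alignment may differ. The labels "Added" and "Shifted" come from the description above, while "Unchanged", "Modified", "Deleted", the MIN_SIMILARITY threshold, and all function names are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical threshold: pages less similar than this are never paired.
MIN_SIMILARITY = 0.5


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()


def align_pages(pages_a: list[str], pages_b: list[str]) -> list[tuple[int | None, int | None, str]]:
    """Align original pages (A) with modified pages (B).

    Returns (index_a, index_b, label) rows; None marks a blank slot.
    """
    n, m = len(pages_a), len(pages_b)
    sims = [[similarity(a, b) for b in pages_b] for a in pages_a]
    # best[i][j] = highest total similarity aligning pages_a[i:] with pages_b[j:].
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            options = [best[i + 1][j], best[i][j + 1]]  # leave a page of A or B unpaired
            if sims[i][j] >= MIN_SIMILARITY:
                options.append(best[i + 1][j + 1] + sims[i][j])  # pair the two pages
            best[i][j] = max(options)

    # Trace the optimal path back, preferring to pair pages when scores tie.
    rows: list[tuple[int | None, int | None, str]] = []
    i = j = 0
    while i < n and j < m:
        s = sims[i][j]
        if s >= MIN_SIMILARITY and best[i][j] == best[i + 1][j + 1] + s:
            if i != j:
                label = "Shifted"
            elif s == 1.0:
                label = "Unchanged"
            else:
                label = "Modified"
            rows.append((i, j, label))
            i += 1
            j += 1
        elif best[i][j] == best[i + 1][j]:
            rows.append((i, None, "Deleted"))  # blank slot on the right
            i += 1
        else:
            rows.append((None, j, "Added"))  # blank slot on the left
            j += 1
    rows.extend((k, None, "Deleted") for k in range(i, n))
    rows.extend((None, k, "Added") for k in range(j, m))
    return rows
```

With an inserted page, the optimal path skips the new page in B (labeled "Added") and pairs every later page with an offset of one, which is why those pairs come out as "Shifted".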
Project Structure
pdf-compare-py/
├── pdf_compare/
│ ├── __init__.py # Package initialization
│ ├── comparator.py # Core comparison logic
│ ├── cli.py # Command-line interface
│ ├── gui.py # Desktop GUI application
│ └── config.py # Configuration
├── scripts/
│ ├── build_windows.py # Build Windows executable
│ ├── build_linux.py # Build Linux executable
│ └── build_macos.py # Build macOS application
├── sample-files/ # Test PDFs for development
│ ├── original.pdf
│ ├── modified.pdf
│ ├── modified_extra_page.pdf
│ └── modified_removed_page.pdf
└── pyproject.toml # Python package configuration
Development
From Source
git clone https://github.com/grananda/PDF-Compare-Py.git
cd PDF-Compare-Py
uv pip install -e .
Testing:
# Compare sample files
pdf-compare sample-files/original.pdf sample-files/modified.pdf -o test-output.pdf
# Launch GUI
pdf-compare-gui
Sample files included for testing:
- sample-files/original.pdf: Base document
- sample-files/modified.pdf: Document with text changes
- sample-files/modified_extra_page.pdf: Document with an added page
- sample-files/modified_removed_page.pdf: Document with a removed page
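To run the original against every sample variant in one go, a short loop over the Python API is enough. This is a sketch: it relies only on PDFComparator(file_a, file_b) and compare_visuals() as documented in the API Reference, while the function name compare_samples, the reports/ output folder, and the diff_<name>.pdf naming are hypothetical. The comparator_cls parameter exists only so the loop can be exercised without the real package.

```python
from __future__ import annotations

from pathlib import Path


def compare_samples(
    sample_dir: str | Path = "sample-files",
    out_dir: str | Path = "reports",
    comparator_cls=None,
) -> dict[str, bool]:
    """Compare sample_dir/original.pdf against every modified*.pdf beside it.

    Returns {modified file name: True if a diff report was written}.
    """
    if comparator_cls is None:
        from pdf_compare import PDFComparator  # documented entry point

        comparator_cls = PDFComparator
    sample_dir, out_dir = Path(sample_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    original = sample_dir / "original.pdf"
    results: dict[str, bool] = {}
    for modified in sorted(sample_dir.glob("modified*.pdf")):
        report = comparator_cls(str(original), str(modified)).compare_visuals()
        if report:  # compare_visuals() returns None when no differences are found
            (out_dir / f"diff_{modified.stem}.pdf").write_bytes(report)
        results[modified.name] = bool(report)
    return results
```

Run from the repository root, compare_samples() would write one report per changed variant into reports/.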
GUI Application
# From source
uv run python pdf_compare/gui.py
# Or after installation
pdf-compare-gui
Building Standalone Executables
Windows Executable:
uv run python scripts/build_windows.py
# Result: dist/PDF Compare.exe
Linux Binary:
uv run python scripts/build_linux.py
# Result: dist/pdf-compare
macOS Application:
uv run python scripts/build_macos.py
# Result: dist/PDF Compare.app
Using as Git Submodule
This package can be integrated into other projects as a Git submodule:
git submodule add https://github.com/grananda/PDF-Compare-Py.git
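Then install the submodule into your environment, for example with uv pip install -e PDF-Compare-Py (the directory git creates for the submodule), and import it in your Python code: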
from pdf_compare import PDFComparator
License
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
For issues, questions, or contributions, visit: https://github.com/grananda/PDF-Compare-Py
File details
Details for the file py_pdf_compare-2026.2.3.tar.gz.
File metadata
- Download URL: py_pdf_compare-2026.2.3.tar.gz
- Size: 587.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 49fd5015383311b5399b6a9cf3c94c59040d602a6ca6f7d5fd9b719d626be384 |
| MD5 | 44980d198e7bf9efda88cc28bf022880 |
| BLAKE2b-256 | b40409596a3e2f8a39b6f7e2a42de42ea5634e96dc1214aa27fd5013bfc858bb |
Provenance
The following attestation bundles were made for py_pdf_compare-2026.2.3.tar.gz:
- Publisher: build.yml on grananda/Py-PDF-Compare
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: py_pdf_compare-2026.2.3.tar.gz
- Subject digest: 49fd5015383311b5399b6a9cf3c94c59040d602a6ca6f7d5fd9b719d626be384
- Sigstore transparency entry: 943849560
- Permalink: grananda/Py-PDF-Compare@82e070d807c3d32259f03f162018a76381d65c50
- Branch / Tag: refs/heads/main
- Owner: https://github.com/grananda
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build.yml@82e070d807c3d32259f03f162018a76381d65c50
- Trigger Event: push
File details
Details for the file py_pdf_compare-2026.2.3-py3-none-any.whl.
File metadata
- Download URL: py_pdf_compare-2026.2.3-py3-none-any.whl
- Size: 25.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5809e8c5d8cccb1e7d33f62437698368b97cc7a8d85788e4af28e966c27a239d |
| MD5 | b4bc5584c33a4f0a51ff62e3e5ecdce5 |
| BLAKE2b-256 | b2c45ef6c25ad4da598ed9685d1eb862747ab01cc7edfb271474cc350f5197a9 |
Provenance
The following attestation bundles were made for py_pdf_compare-2026.2.3-py3-none-any.whl:
- Publisher: build.yml on grananda/Py-PDF-Compare
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: py_pdf_compare-2026.2.3-py3-none-any.whl
- Subject digest: 5809e8c5d8cccb1e7d33f62437698368b97cc7a8d85788e4af28e966c27a239d
- Sigstore transparency entry: 943849563
- Permalink: grananda/Py-PDF-Compare@82e070d807c3d32259f03f162018a76381d65c50
- Branch / Tag: refs/heads/main
- Owner: https://github.com/grananda
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build.yml@82e070d807c3d32259f03f162018a76381d65c50
- Trigger Event: push
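To check a downloaded file against the SHA256 digests listed above, hashing it locally is enough. This is a generic sketch using the standard hashlib module; the helper names sha256_of and verify are hypothetical and not part of PDF-Compare.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in fixed-size chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | Path, expected_sha256: str) -> bool:
    """True if the file's digest matches the published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify("py_pdf_compare-2026.2.3.tar.gz", "49fd5015383311b5399b6a9cf3c94c59040d602a6ca6f7d5fd9b719d626be384") should return True for an intact copy of the source distribution.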