Compare two PDF files and generate a visual diff report with highlighted differences

PDF-Compare

A tool for comparing two PDF files. It generates vector-based, side-by-side comparison reports with content-aware highlighting.

Features

  • Vector-Based Rendering: Preserves text quality and keeps file sizes small (no image conversion)
  • Searchable Output: Generated PDFs maintain searchable, selectable text
  • Visual Comparison: Side-by-side view of two PDFs with intelligent page alignment
  • Content-Aware Highlighting: Detects text changes based on content, ignoring layout shifts
  • Smart Page Alignment: Automatically detects inserted/deleted pages
  • Color-Coded Differences:
    • Red: Deleted text (on the original document)
    • Green: Added text (on the modified document)
  • Multiple Interfaces: CLI, GUI Desktop App, and Python API
  • Cross-Platform: Works on Windows, macOS, and Linux
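
As a rough illustration of content-aware highlighting, the sketch below diffs two pages word by word with Python's standard difflib module. It is not PDF-Compare's actual implementation: diff_words is a hypothetical helper, and splitting on whitespace is an assumed way of ignoring layout shifts such as reflowed lines.

```python
from difflib import SequenceMatcher


def diff_words(original: str, modified: str) -> tuple[list[str], list[str]]:
    """Return (deleted, added) words between two page texts.

    Splitting on whitespace discards line breaks and spacing, so text that
    merely reflows onto different lines is not reported as a change.
    """
    old_words = original.split()
    new_words = modified.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    deleted: list[str] = []  # candidates for red highlights on the original
    added: list[str] = []    # candidates for green highlights on the modified copy
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(old_words[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new_words[j1:j2])
    return deleted, added


# The line break moves between the two versions, but only real word changes are reported.
deleted, added = diff_words(
    "The quick brown fox\njumps over the dog",
    "The quick red fox jumps\nover the lazy dog",
)
print("deleted:", deleted)  # deleted: ['brown']
print("added:", added)      # added: ['red', 'lazy']
```

A real report would also need each word's position on the page to draw the highlight; this sketch only covers the text comparison.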

Installation

pip install py-pdf-compare

Or using uv (recommended):

uv pip install py-pdf-compare

Prerequisites

  • Python 3.12+ is required

Windows: Download from python.org and check "Add Python to PATH" during installation.

macOS:

brew install python@3.12

Linux (Ubuntu/Debian):

sudo apt install python3.12 python3.12-venv

Note: No additional dependencies (like Poppler) are required. PyMuPDF handles all PDF operations natively.

Quick Start

CLI Usage

# Compare two PDFs
pdf-compare original.pdf modified.pdf -o diff.pdf

# Launch GUI application
pdf-compare-gui

# Show help
pdf-compare --help
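
To drive the CLI from a script, one option is a thin subprocess wrapper. The sketch below uses only the invocation shown above (two input paths plus -o); build_command and run_compare are hypothetical helper names, and the meaning of the exit code is an assumption, since it is not documented here.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(original: Path, modified: Path, output: Path) -> list[str]:
    """Build the argument list for: pdf-compare ORIGINAL MODIFIED -o OUTPUT."""
    return ["pdf-compare", str(original), str(modified), "-o", str(output)]


def run_compare(original: Path, modified: Path, output: Path) -> int:
    """Run the CLI and return its exit code.

    Assumption: what a non-zero exit code means is not documented here, so the
    caller decides how to interpret it.
    """
    if shutil.which("pdf-compare") is None:
        raise FileNotFoundError("pdf-compare is not on PATH; is the package installed?")
    completed = subprocess.run(build_command(original, modified, output), check=False)
    return completed.returncode


# Show the command that would be executed, without running it.
command = build_command(Path("original.pdf"), Path("modified.pdf"), Path("diff.pdf"))
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.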

Python API

from pdf_compare import PDFComparator

# Create comparator instance
comparator = PDFComparator('original.pdf', 'modified.pdf')

# Generate comparison report (None means no differences were found)
pdf_bytes = comparator.compare_visuals()

# Save to file only if a report was produced
if pdf_bytes is not None:
    with open('report.pdf', 'wb') as f:
        f.write(pdf_bytes)

API Reference

PDFComparator(file_a, file_b)

Main class for comparing PDF files.

Parameters:

  • file_a (str): Path to the first PDF (Original)
  • file_b (str): Path to the second PDF (Modified)

Methods:

compare_visuals() -> bytes | None

Generate a vector-based visual comparison report.

Returns: The PDF report as bytes, or None if no differences are found.

Example:

from pdf_compare import PDFComparator

comparator = PDFComparator('a.pdf', 'b.pdf')
result = comparator.compare_visuals()

if result:
    with open('diff.pdf', 'wb') as f:
        f.write(result)
    print("Report generated successfully")
else:
    print("No differences found")
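
The same pattern extends to batches. The sketch below compares same-named PDFs in two folders and writes a report only where compare_visuals() returns bytes. compare_folders, default_compare, and the -diff.pdf naming are hypothetical choices for illustration; only PDFComparator and compare_visuals() come from the API described above. The comparison function is passed in as a parameter so the loop can be exercised without real PDFs.

```python
from pathlib import Path
from typing import Callable, Optional

# Any callable that takes two file paths and returns report bytes or None.
CompareFn = Callable[[str, str], Optional[bytes]]


def default_compare(file_a: str, file_b: str) -> Optional[bytes]:
    """Compare two files with PDFComparator (imported lazily)."""
    from pdf_compare import PDFComparator

    return PDFComparator(file_a, file_b).compare_visuals()


def compare_folders(dir_a: Path, dir_b: Path, out_dir: Path,
                    compare: CompareFn = default_compare) -> dict[str, bool]:
    """Compare same-named PDFs in two folders.

    Returns {file name: True if a report was written, False if no differences}.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    results: dict[str, bool] = {}
    for pdf_a in sorted(dir_a.glob("*.pdf")):
        pdf_b = dir_b / pdf_a.name
        if not pdf_b.is_file():
            continue  # no counterpart in the second folder
        report = compare(str(pdf_a), str(pdf_b))
        if report is not None:
            (out_dir / f"{pdf_a.stem}-diff.pdf").write_bytes(report)
        results[pdf_a.name] = report is not None
    return results
```

A call such as compare_folders(Path("v1"), Path("v2"), Path("reports")) would then use the real comparator by default.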

How It Works

  1. Text Extraction: Extracts text and layout information from each page using PyMuPDF
  2. Similarity Scoring: Calculates similarity between pages using sequence matching
  3. Smart Alignment: Detects insertions, deletions, and shifts between documents
  4. Vector-Based Report: Creates a new PDF that preserves the original vector content
  5. Visual Highlighting: Adds vector-based highlights over text differences (no rasterization)
  6. Optimized Output: Maintains searchable text and small file sizes
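
Step 2 can be pictured with Python's standard difflib.SequenceMatcher. The sketch below is an assumption about how page similarity could be scored, not the package's actual code: page_similarity and similarity_matrix are hypothetical names, and comparing whitespace-separated words is an illustrative choice.

```python
from difflib import SequenceMatcher


def page_similarity(text_a: str, text_b: str) -> float:
    """Score two pages' text from 0.0 (nothing shared) to 1.0 (same word sequence)."""
    words_a = text_a.split()
    words_b = text_b.split()
    return SequenceMatcher(a=words_a, b=words_b, autojunk=False).ratio()


def similarity_matrix(pages_a: list[str], pages_b: list[str]) -> list[list[float]]:
    """Score every page of document A against every page of document B."""
    return [[page_similarity(a, b) for b in pages_b] for a in pages_a]


original = ["Introduction to the report", "Results and discussion"]
modified = ["Introduction to the report", "A brand new page", "Results and discussion"]

matrix = similarity_matrix(original, modified)
for row in matrix:
    print([round(score, 2) for score in row])
```

In this example the high scores sit at positions (0, 0) and (1, 2), which is the kind of signal an alignment step can use to recognize that a page was inserted in between.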

Example: Inserted Page

If you insert a page in the middle of a document:

  • The inserted page is shown with a blank page on the left, labeled "Added"
  • Subsequent pages are correctly aligned and labeled as "Shifted"
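
The inserted-page case can be reproduced with a small alignment sketch. It pairs pages whose text similarity passes a threshold and keeps the pairs in order, using a longest-common-subsequence table. This illustrates the idea and is not the package's actual algorithm: align_pages, the 0.6 threshold, and the "Matched" and "Removed" labels are assumptions, while "Added" and "Shifted" follow the labels described above.

```python
from difflib import SequenceMatcher
from typing import Optional

# (page index in A or None, page index in B or None, label)
Row = tuple[Optional[int], Optional[int], str]


def similar(a: str, b: str, threshold: float) -> bool:
    """True when two pages share enough of their word sequence."""
    return SequenceMatcher(a=a.split(), b=b.split(), autojunk=False).ratio() >= threshold


def align_pages(pages_a: list[str], pages_b: list[str], threshold: float = 0.6) -> list[Row]:
    n, m = len(pages_a), len(pages_b)
    match = [[similar(pages_a[i], pages_b[j], threshold) for j in range(m)] for i in range(n)]

    # lcs[i][j] = most in-order page pairs obtainable from pages_a[i:] and pages_b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if match[i][j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the start, emitting one report row per step.
    rows: list[Row] = []
    i = j = 0
    while i < n and j < m:
        if match[i][j]:
            rows.append((i, j, "Matched" if i == j else "Shifted"))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            rows.append((i, None, "Removed"))
            i += 1
        else:
            rows.append((None, j, "Added"))
            j += 1
    rows.extend((k, None, "Removed") for k in range(i, n))
    rows.extend((None, k, "Added") for k in range(j, m))
    return rows


p1 = "Introduction and scope of the report"
p2 = "Methods used for data collection"
p3 = "Results and final conclusions"
new = "Appendix with newly inserted tables"

for row in align_pages([p1, p2, p3], [p1, new, p2, p3]):
    print(row)
```

Here the new page has no partner on the original side, and the two pages after it are paired with their originals at a different index, which corresponds to the "Added" and "Shifted" labels in the report.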

Project Structure

pdf-compare-py/
├── pdf_compare/
│   ├── __init__.py         # Package initialization
│   ├── comparator.py       # Core comparison logic
│   ├── cli.py              # Command-line interface
│   ├── gui.py              # Desktop GUI application
│   └── config.py           # Configuration
├── scripts/
│   ├── build_windows.py    # Build Windows executable
│   ├── build_linux.py      # Build Linux executable
│   └── build_macos.py      # Build macOS application
├── sample-files/           # Test PDFs for development
│   ├── original.pdf
│   ├── modified.pdf
│   ├── modified_extra_page.pdf
│   └── modified_removed_page.pdf
└── pyproject.toml          # Python package configuration

Development

From Source

git clone https://github.com/grananda/PDF-Compare-Py.git
cd PDF-Compare-Py
uv pip install -e .

Testing:

# Compare sample files
pdf-compare sample-files/original.pdf sample-files/modified.pdf -o test-output.pdf

# Launch GUI
pdf-compare-gui

Sample files included for testing:

  • sample-files/original.pdf - Base document
  • sample-files/modified.pdf - Document with text changes
  • sample-files/modified_extra_page.pdf - Document with added page
  • sample-files/modified_removed_page.pdf - Document with removed page

GUI Application

# From source
uv run python pdf_compare/gui.py

# Or after installation
pdf-compare-gui

Building Standalone Executables

Windows Executable:

uv run python scripts/build_windows.py
# Result: dist/PDF Compare.exe

Linux Binary:

uv run python scripts/build_linux.py
# Result: dist/pdf-compare

macOS Application:

uv run python scripts/build_macos.py
# Result: dist/PDF Compare.app

Using as Git Submodule

This package can be integrated into other projects as a Git submodule:

git submodule add https://github.com/grananda/PDF-Compare-Py.git

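Then make the submodule importable (for example, by running uv pip install -e PDF-Compare-Py from the parent project) and import it in your Python code: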

from pdf_compare import PDFComparator

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues, questions, or contributions, visit: https://github.com/grananda/PDF-Compare-Py
