Skip to main content

Compare two PDF files and generate a visual diff report with highlighted differences

Project description

PDF-Compare

A powerful tool for comparing PDF files. Generates vector-based side-by-side comparison reports with content-aware highlighting.

Buy Me A Coffee

Features

  • Vector-Based Rendering: Preserves text quality and keeps file sizes small (no image conversion)
  • Searchable Output: Generated PDFs maintain searchable, selectable text
  • Visual Comparison: Side-by-side view of two PDFs with intelligent page alignment
  • Content-Aware Highlighting: Detects text changes based on content, ignoring layout shifts
  • Smart Page Alignment: Automatically detects inserted/deleted pages
  • Color-Coded Differences:
    • Red: Deleted text (on the original document)
    • Green: Added text (on the modified document)
  • Multiple Interfaces: CLI, GUI Desktop App, and Python API
  • Cross-Platform: Works on Windows, macOS, and Linux

Installation

pip install pdf-compare

Or using uv (recommended):

uv pip install pdf-compare

Prerequisites

  • Python 3.12+ is required

Windows: Download from python.org and check "Add Python to PATH" during installation.

macOS:

brew install python@3.12

Linux (Ubuntu/Debian):

sudo apt install python3.12 python3.12-venv

Note: No additional dependencies (like Poppler) are required. PyMuPDF handles all PDF operations natively.

Quick Start

CLI Usage

# Compare two PDFs
pdf-compare original.pdf modified.pdf -o diff.pdf

# Launch GUI application
pdf-compare-gui

# Show help
pdf-compare --help

Python API

from pdf_compare import PDFComparator

# Create comparator instance
comparator = PDFComparator('original.pdf', 'modified.pdf')

# Generate comparison report
pdf_bytes = comparator.compare_visuals()

# Save to file
with open('report.pdf', 'wb') as f:
    f.write(pdf_bytes)

API Reference

PDFComparator(file_a, file_b)

Main class for comparing PDF files.

Parameters:

  • file_a (str): Path to the first PDF (Original)
  • file_b (str): Path to the second PDF (Modified)

Methods:

compare_visuals() -> bytes

Generate a vector-based visual comparison report.

Returns: PDF report as bytes, or None if no differences found.

Example:

from pdf_compare import PDFComparator

comparator = PDFComparator('a.pdf', 'b.pdf')
result = comparator.compare_visuals()

if result:
    with open('diff.pdf', 'wb') as f:
        f.write(result)
    print("Report generated successfully")
else:
    print("No differences found")

How It Works

  1. Text Extraction: Extracts text and layout information from each page using PyMuPDF
  2. Similarity Scoring: Calculates similarity between pages using sequence matching
  3. Smart Alignment: Detects insertions, deletions, and shifts between documents
  4. Vector-Based Report: Creates a new PDF that preserves the original vector content
  5. Visual Highlighting: Adds vector-based highlights over text differences (no rasterization)
  6. Optimized Output: Maintains searchable text and small file sizes

Example: Inserted Page

If you insert a page in the middle of a document:

  • The inserted page is shown with a blank page on the left, labeled "Added"
  • Subsequent pages are correctly aligned and labeled as "Shifted"

Project Structure

pdf-compare-py/
├── pdf_compare/
│   ├── __init__.py         # Package initialization
│   ├── comparator.py       # Core comparison logic
│   ├── cli.py              # Command-line interface
│   ├── gui.py              # Desktop GUI application
│   └── config.py           # Configuration
├── scripts/
│   ├── build_windows.py    # Build Windows executable
│   ├── build_linux.py      # Build Linux executable
│   └── build_macos.py      # Build macOS application
├── sample-files/           # Test PDFs for development
│   ├── original.pdf
│   ├── modified.pdf
│   ├── modified_extra_page.pdf
│   └── modified_removed_page.pdf
└── pyproject.toml          # Python package configuration

Development

From Source

git clone https://github.com/grananda/PDF-Compare-Py.git
cd PDF-Compare-Py
uv pip install -e .

Testing:

# Compare sample files
pdf-compare sample-files/original.pdf sample-files/modified.pdf -o test-output.pdf

# Launch GUI
pdf-compare-gui

Sample files included for testing:

  • sample-files/original.pdf - Base document
  • sample-files/modified.pdf - Document with text changes
  • sample-files/modified_extra_page.pdf - Document with added page
  • sample-files/modified_removed_page.pdf - Document with removed page

GUI Application

# From source
uv run python pdf_compare/gui.py

# Or after installation
pdf-compare-gui

Building Standalone Executables

Windows Executable:

uv run python scripts/build_windows.py
# Result: dist/PDF Compare.exe

Linux Binary:

uv run python scripts/build_linux.py
# Result: dist/pdf-compare

macOS Application:

uv run python scripts/build_macos.py
# Result: dist/PDF Compare.app

Using as Git Submodule

This package can be integrated into other projects as a Git submodule:

git submodule add https://github.com/grananda/PDF-Compare-Py.git

Then import in your Python code:

from pdf_compare import PDFComparator

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues, questions, or contributions, visit: https://github.com/grananda/PDF-Compare-Py

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

py_pdf_compare-2026.2.1.tar.gz (587.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

py_pdf_compare-2026.2.1-py3-none-any.whl (25.6 kB view details)

Uploaded Python 3

File details

Details for the file py_pdf_compare-2026.2.1.tar.gz.

File metadata

  • Download URL: py_pdf_compare-2026.2.1.tar.gz
  • Upload date:
  • Size: 587.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for py_pdf_compare-2026.2.1.tar.gz
Algorithm Hash digest
SHA256 84a2786309847f6ec5e52e11a44e46ad0121445857fd7ac4cb43d065403742a2
MD5 8300c9dfa7a4755d820d3cb13a07e962
BLAKE2b-256 1240569b2fcb68b93ddd812ae8733635828dd4d560ca860f752684d31de60511

See more details on using hashes here.

Provenance

The following attestation bundles were made for py_pdf_compare-2026.2.1.tar.gz:

Publisher: build.yml on grananda/Py-PDF-Compare

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file py_pdf_compare-2026.2.1-py3-none-any.whl.

File metadata

File hashes

Hashes for py_pdf_compare-2026.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 99aca55b78c0fd53d94b1250a8679dd2390c768329ee39b1a5d04ad867fd6b84
MD5 c3b272dc1d74ced86343180cd6fac667
BLAKE2b-256 f02e299d30cc403b1d070f186fd1b26493a83b43aadef92ba33173fae71c2d49

See more details on using hashes here.

Provenance

The following attestation bundles were made for py_pdf_compare-2026.2.1-py3-none-any.whl:

Publisher: build.yml on grananda/Py-PDF-Compare

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page