pdf-diff

Finds differences between two PDF documents:

  1. Compares the text layers of two PDF documents and outputs the bounding boxes of changed text in JSON.
  2. Rasterizes the changed pages in the PDFs to a PNG and draws red outlines around changed text.
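The first step above can be illustrated with a minimal sketch. It assumes each document has already been reduced to a list of words with bounding boxes; the dictionary layout, the function name changed_boxes, and the use of Python's difflib are all assumptions made for illustration, not pdf-diff's actual data model or diff algorithm. The rasterizing step is not covered.

```python
import difflib
import json


def changed_boxes(before_words, after_words):
    """Return the words (with their boxes) that differ between two word lists.

    Each word is a dict such as {"text": "foo", "page": 1, "box": [x, y, w, h]}.
    This layout is hypothetical and chosen only for the example.
    """
    matcher = difflib.SequenceMatcher(
        a=[w["text"] for w in before_words],
        b=[w["text"] for w in after_words],
        autojunk=False,
    )
    changes = {"before": [], "after": []}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged run of words, nothing to report
        # "replace", "delete" and "insert" all contribute the affected words
        changes["before"].extend(before_words[i1:i2])
        changes["after"].extend(after_words[j1:j2])
    return changes


# Made-up sample data standing in for two extracted text layers.
before = [
    {"text": "Hello", "page": 1, "box": [10, 10, 30, 12]},
    {"text": "world", "page": 1, "box": [45, 10, 30, 12]},
]
after = [
    {"text": "Hello", "page": 1, "box": [10, 10, 30, 12]},
    {"text": "there", "page": 1, "box": [45, 10, 28, 12]},
]

print(json.dumps(changed_boxes(before, after), indent=2))
```

Comparing word text rather than coordinates means a word that merely moved on the page is not reported as changed, which may or may not match what the real tool does.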

[Image: example image output]

The script is written in Python 3 and relies on the pdftotext program, which is provided by poppler.

Requirements

libxml2 >= 2.7.0, libxslt >= 1.1.23, poppler

Requirements installation for Ubuntu:

sudo apt-get install python3-lxml poppler-utils

Requirements installation for OS X:

brew install libxml2 libxslt poppler
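After installing the requirements, it can be useful to confirm that pdftotext is actually on the PATH before running the tool. The following is a small sketch of such a check; the function name missing_tools is hypothetical and not part of pdf-diff.

```python
import shutil


def missing_tools(tools=("pdftotext",)):
    """Return the names of the given programs that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling missing_tools() returns an empty list when pdftotext is available, and ["pdftotext"] when it is not, in which case the install commands above need to be run first.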

Installation

From PyPI:

pip install pdf-diff

From source:

sudo python3 setup.py install

Running

Turn two PDFs into one large PNG image showing the differences:

pdf-diff before.pdf after.pdf > comparison_output.png
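The same invocation can be driven from Python when the comparison is part of a larger script. This is a sketch that assumes only what the command above shows: pdf-diff takes two PDF paths and writes the PNG to standard output. The function name write_comparison and its command parameter are hypothetical.

```python
import subprocess


def write_comparison(before_pdf, after_pdf, output_png, command=("pdf-diff",)):
    """Run the diff command on two PDFs and save its standard output as a PNG.

    `command` is a parameter only so that a different executable path can be
    supplied; the default matches the shell example.
    """
    result = subprocess.run(
        [*command, before_pdf, after_pdf],
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the command fails
    )
    # Write the file only after the command has succeeded, so a failed run
    # does not leave an empty or partial PNG behind.
    with open(output_png, "wb") as out:
        out.write(result.stdout)
    return output_png
```

For example, write_comparison("before.pdf", "after.pdf", "comparison_output.png") is intended to be equivalent to the shell redirection shown above.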
