
Reports broken hyperlinks in PDF documents

Project description

pdf-link-checker is a simple tool that parses a PDF document and checks for broken hyperlinks. It does this by sending a simple HTTP request to each link found in the document.
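To illustrate the idea only (this is not the tool's actual source), a single link could be checked with the Python standard library roughly as follows. The function name `check_link`, the injectable `opener` parameter, and the 10-second timeout are assumptions made for this sketch.

```python
import urllib.error
import urllib.request


def check_link(url, timeout=10, opener=urllib.request.urlopen):
    """Return (ok, detail) for one URL.

    Hypothetical helper for illustration; not part of pdf-link-checker's API.
    """
    try:
        # urlopen raises HTTPError for 4xx/5xx responses, so reaching the
        # body of the "with" block means the server answered successfully.
        with opener(url, timeout=timeout) as response:
            return True, str(response.status)
    except urllib.error.HTTPError as exc:
        return False, "HTTP %d" % exc.code
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # DNS failures, refused connections, timeouts, malformed URLs.
        return False, str(exc)
```

Calling `check_link("https://example.com/")` returns a pair such as `(True, "200")` for a reachable page, or `(False, reason)` when the request fails.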

Getting it running

pip install pdf-link-checker
pdf-link-checker my-awesome-slides.pdf

Options

  • --max-threads

    Specifies the maximum number of allowed threads (default: 100).

    To speed up the run, pdf-link-checker launches several threads to check several links in parallel. This option sets an upper limit on the number of threads.
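    As a rough sketch of how such a cap can work (not the tool's own implementation), the standard library's `ThreadPoolExecutor` bounds the number of worker threads. The names `check_all` and `check` are hypothetical; the default of 100 simply mirrors the documented default above.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(urls, check, max_threads=100):
    """Run check(url) for every URL using at most max_threads threads.

    Hypothetical helper; `check` is any callable that takes a URL.
    """
    urls = list(urls)
    if not urls:
        return {}
    # Never start more workers than there are URLs to check.
    workers = max(1, min(max_threads, len(urls)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(check, urls)))
```

    With `max_threads=2`, for example, no more than two links are being checked at any moment, however many links the document contains.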

  • --max-requests-per-host

    Specifies the maximum number of allowed requests per host.

    Several URLs may belong to the same host, and since pdf-link-checker can check many URLs at the same time, you may want to limit the number of requests per host. Otherwise, some hosts may mistake the check for a DoS attack.
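    One common way to enforce a per-host limit is a semaphore per host name. The sketch below assumes the limit applies to requests in flight at the same time, which the description above does not state explicitly; the class `HostLimiter` and its methods are hypothetical and not taken from the tool.

```python
import threading
from urllib.parse import urlsplit


class HostLimiter:
    """Allow at most max_per_host concurrent checks against any one host."""

    def __init__(self, max_per_host):
        self._max = max_per_host
        self._lock = threading.Lock()
        self._semaphores = {}  # host name -> BoundedSemaphore

    def _semaphore_for(self, url):
        host = urlsplit(url).hostname or ""
        # The lock protects the dictionary while semaphores are created.
        with self._lock:
            if host not in self._semaphores:
                self._semaphores[host] = threading.BoundedSemaphore(self._max)
            return self._semaphores[host]

    def run(self, url, check):
        # Blocks while max_per_host checks for this host are already running.
        with self._semaphore_for(url):
            return check(url)
```

    Each worker thread would call `limiter.run(url, check)` instead of `check(url)` directly, so URLs on different hosts never wait for each other.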

Getting help

You can post your questions to our dedicated mailing list:

http://lists.free-electrons.com/mailman/listinfo/pdf-link-checker-updates

TODO

(…because there’s no active project without a TODO list!)

  • Fix: some documents are failing on doc.initialize().
  • Fix: if the URL points to a huge document, we should only check that it is reachable rather than download it entirely.
  • Replace the thread array with a proper thread pool. Each thread from the pool should take a URL from a (protected) queue. We could also have one queue per host and thus handle the max-requests-per-host constraint without a separate parameter.
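
The thread pool described in the last item could look roughly like the sketch below: a fixed set of workers pulling URLs from a thread-safe `queue.Queue`. It shows a single shared queue only, not the per-host variant, and the names `run_pool`, `check`, and `num_workers` are assumptions made for illustration.

```python
import queue
import threading


def run_pool(urls, check, num_workers=4):
    """Check URLs with a fixed pool of workers fed from a thread-safe queue."""
    pending = queue.Queue()
    for url in urls:
        pending.put(url)

    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                # All URLs are queued before the workers start, so an
                # empty queue means there is no work left.
                url = pending.get_nowait()
            except queue.Empty:
                return
            try:
                outcome = check(url)
            except Exception as exc:  # keep the worker alive on a bad URL
                outcome = exc
            with results_lock:
                results[url] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The queue does its own locking, so workers never receive the same URL twice; only the shared results dictionary needs an explicit lock.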

Version History

1.1.1
  • Remove extra print, just a leftover
1.1.0
  • Only allow https and ftp URIs. This prevents failures on mailto: and file:// URIs.
  • Add better exception handling to avoid crashing
  • Add better timeout and request exception handling
  • Fix broken thread management
  • Remove stupid double-requests
  • Several small fixes
1.0.2
  • Updated repo location
  • Moved from distutils to setuptools
1.0.1
  • Version bump
1.0
  • Initial release
