Reports broken hyperlinks in PDF documents

Project description

pdf-link-checker is a simple tool that parses a PDF document and checks for broken hyperlinks. It does this by sending a simple HTTP request to each link found in the document.
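
To make the idea concrete, here is a minimal sketch of such a check using only the Python standard library. It is not the tool's actual implementation: the names `check_link` and `find_broken_links`, the 10-second timeout, and the rule that a status of 400 or above counts as broken are all assumptions made for illustration.

```python
import urllib.request


def check_link(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if the URL answers without an error, False otherwise.

    Hypothetical helper: `opener` is injectable so the function can be
    exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            # Responses for non-HTTP schemes may carry no status code.
            status = getattr(response, "status", None)
            return status is None or status < 400
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses (unreachable host,
        # 4xx/5xx answer, timeout); ValueError covers malformed URLs.
        return False


def find_broken_links(urls, checker=check_link):
    """Return the URLs for which the checker reports a failure."""
    return [url for url in urls if not checker(url)]
```

Calling `find_broken_links(["https://example.com/"])` would return an empty list if the page is reachable.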

Getting it running

pip install pdf-link-checker
pdf-link-checker my-awesome-slides.pdf

Options

  • --max-threads

    Specifies the maximum number of allowed threads (default: 100).

    To speed up the run, pdf-link-checker launches several threads so that several links can be checked in parallel. This option limits the number of threads.

  • --max-requests-per-host

    Specifies the maximum number of allowed requests per host.

    Several URLs may belong to the same host, and since pdf-link-checker can check many URLs at the same time, you may want to limit the number of simultaneous requests per host. Otherwise, some hosts may mistake the check for a DoS attack.
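
The two limits described above can be sketched as follows. This is only an illustration of the idea, not the tool's code: `check_all` and its parameters are hypothetical names, `checker` stands for any function that takes a URL and returns True or False, and the per-host default of 5 is an assumed value (only the thread default of 100 is stated above).

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def check_all(urls, checker, max_threads=100, max_requests_per_host=5):
    """Check URLs in parallel; return a dict mapping each URL to its result."""
    urls = list(urls)
    # One semaphore per host caps simultaneous requests to that host.
    host_slots = defaultdict(
        lambda: threading.BoundedSemaphore(max_requests_per_host)
    )
    slots_lock = threading.Lock()

    def check_one(url):
        host = urlsplit(url).netloc
        with slots_lock:  # guard creation of a new per-host semaphore
            slot = host_slots[host]
        with slot:  # blocks while the host already has too many checks
            return checker(url)

    # The pool size caps the total number of worker threads.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(check_one, urls))
    return dict(zip(urls, results))
```

One trade-off of this design is that a thread waiting on a busy host still occupies a slot in the pool, so a document dominated by one host will not use all available threads.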

Getting help

You can post your questions to our dedicated mailing list:

http://lists.free-electrons.com/mailman/listinfo/pdf-link-checker-updates

TODO

(…because there’s no active project without a TODO list!)

  • Fix: some documents are failing on doc.initialize().

  • Fix: if the URL is a huge document, we should just check and not download it entirely.

  • Replace the thread array with a proper thread pool. Each thread in the pool should take a URL from a (protected) queue. We could also have one queue per host and thus handle the max-requests-per-host constraint without a separate parameter.
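
As a rough sketch of where the last two items could lead, the code below shows a fixed pool of workers pulling URLs from a thread-safe queue, plus a helper that builds a HEAD request so that only headers are fetched. None of this is taken from the project: `run_pool`, `head_request`, and `num_workers` are hypothetical names, a single shared queue is used rather than one queue per host, and some servers are known to reject HEAD, so a real checker would probably need a fallback.

```python
import queue
import threading
import urllib.request


def head_request(url):
    """Build a HEAD request, which asks for headers only, not the body."""
    return urllib.request.Request(url, method="HEAD")


def run_pool(urls, checker, num_workers=4):
    """Check URLs with a fixed pool of workers fed from a shared queue."""
    pending = queue.Queue()  # thread-safe, so no extra locking is needed
    for url in urls:
        pending.put(url)

    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                url = pending.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            try:
                ok = checker(url)
            except Exception:
                ok = False  # a failing check must not kill the worker
            with results_lock:
                results[url] = ok

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

All URLs are queued before the workers start, which is why a worker can safely exit as soon as it finds the queue empty.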

Version History

1.1.1
  • Remove extra print, just a leftover

1.1.0
  • Only allow https and ftp URIs. This prevents failures on mailto: and file:// URIs.

  • Add better exception handling to avoid crashing

  • Add better timeout and request exception handling

  • Fix broken thread management

  • Remove stupid double-requests

  • Several small fixes

1.0.2
  • Updated repo location

  • Moved from distutils to setuptools

1.0.1
  • Version bump

1.0
  • Initial release
