pdf-link-checker·PyPI

Reports broken hyperlinks in PDF documents

Project description

pdf-link-checker is a simple tool that parses a PDF document and checks for broken hyperlinks. This is done by sending a simple HTTP request to each link found in the document.
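
As a rough illustration of what "sending a simple HTTP request" can look like, here is a minimal sketch using only the Python standard library. It is not the tool's actual code: the function name check_link, the injectable opener parameter, and the rule that any status from 200 to 399 counts as working are all assumptions made for this example.

```python
import urllib.error
import urllib.request


def check_link(url, opener=urllib.request.urlopen, timeout=10):
    """Return (ok, detail) for a single HTTP(S) URL.

    `opener` is a hypothetical injection point so the function can be
    exercised without network access; it defaults to urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, and similar.
        return False, f"unreachable: {exc}"
    # Assumption: 2xx and 3xx responses mean the link is alive.
    return 200 <= status < 400, f"HTTP {status}"
```

Calling check_link("https://example.com/") returns a pair such as (True, "HTTP 200") when the server responds normally.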

Getting it running

pip install pdf-link-checker
pdf-link-checker my-awesome-slides.pdf

Options

--max-threads

Specifies the maximum number of allowed threads (default: 100).

To speed up the run, pdf-link-checker launches several threads so that several links are checked in parallel. This option lets you limit the number of threads.
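
The idea of a capped number of worker threads can be sketched with the standard library's ThreadPoolExecutor. This is only an illustration under stated assumptions: check_all and its check callback are hypothetical names, and the tool itself may manage its threads differently.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(urls, check, max_threads=100):
    """Run check(url) for every URL using at most max_threads workers.

    Returns a dict mapping each URL to whatever check(url) returned.
    `check` is any callable taking one URL (hypothetical interface).
    """
    urls = list(urls)
    if not urls:
        return {}
    # Never start more workers than there are URLs to check.
    workers = min(max_threads, len(urls))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(check, urls)))
```

With max_threads=4, no more than four calls to check run at the same time, however many URLs the document contains.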

--max-requests-per-host

Specifies the maximum number of allowed requests per host.

Several URLs may belong to the same host, and since pdf-link-checker can check many URLs at the same time, you may want to limit the number of requests per host. Otherwise, some hosts may mistake the check for a DoS attack.
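
One common way to enforce a per-host cap is one semaphore per host name. The sketch below shows that technique; the HostLimiter class and its run method are hypothetical names, and nothing here is taken from the tool's source.

```python
import threading
from urllib.parse import urlsplit


class HostLimiter:
    """Cap the number of simultaneous requests sent to any single host."""

    def __init__(self, max_requests_per_host):
        self._limit = max_requests_per_host
        self._lock = threading.Lock()
        self._semaphores = {}

    def _semaphore_for(self, url):
        host = urlsplit(url).hostname or ""
        # Guard the dict so two threads never create two semaphores
        # for the same host.
        with self._lock:
            if host not in self._semaphores:
                self._semaphores[host] = threading.BoundedSemaphore(self._limit)
            return self._semaphores[host]

    def run(self, url, check):
        """Call check(url) while holding one of the host's slots."""
        with self._semaphore_for(url):
            return check(url)
```

Worker threads would call limiter.run(url, check) instead of check(url) directly; URLs on different hosts do not block each other.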

Getting help

You can post your questions to our dedicated mailing list:

http://lists.free-electrons.com/mailman/listinfo/pdf-link-checker-updates

TODO

(…because there’s no active project without a TODO list!)

Fix: some documents are failing on doc.initialize().
Fix: if a URL points to a huge document, we should only check that it is reachable, not download it entirely.
Replace the thread array with a proper thread pool. Each thread from the pool should take a URL from a (protected) queue. We could also have one queue per host and thus handle the max-requests-per-host constraint without a separate parameter.
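
For the "check without downloading" item, one possible approach is an HTTP HEAD request, which asks the server for headers only. This is a sketch of that idea, not a planned implementation: build_head_request, probe, and the injectable opener parameter are hypothetical names, and some servers reject HEAD, so a fallback to GET may be needed in practice.

```python
import urllib.request


def build_head_request(url):
    """Build a request that asks for headers only, not the body."""
    return urllib.request.Request(url, method="HEAD")


def probe(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status for url without downloading its body.

    `opener` is injectable for testing; it defaults to urlopen.
    """
    with opener(build_head_request(url), timeout=timeout) as response:
        return response.status
```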

Version History

1.1.1

Remove extra print, just a leftover

1.1.0

Only allow https and ftp URIs. This prevents failures on mailto: and file:// URIs.
Add better exception handling to avoid crashing
Add better timeout and request exception handling
Fix broken thread management
Remove stupid double-requests
Several small fixes

1.0.2

Updated repo location
Moved from distutils to setuptools

1.0.1

Version bump

1.0

Initial release

Project details

Release history

1.1.1 (this version): May 6, 2013
1.1.0: May 6, 2013
1.0.2: May 3, 2013
1.0.1: Apr 12, 2013
1.0: Apr 12, 2013

Download files

Source Distribution

pdf-link-checker-1.1.1.tar.gz (7.1 kB)

Uploaded May 6, 2013 Source

File details

Details for the file pdf-link-checker-1.1.1.tar.gz.

File metadata

Download URL: pdf-link-checker-1.1.1.tar.gz
Upload date: May 6, 2013
Size: 7.1 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for pdf-link-checker-1.1.1.tar.gz
Algorithm	Hash digest
SHA256	`525bcc512a076e4e3961c2a931b614321eb4d6028ff7a5965c2afed215499d24`
MD5	`b0364f365dc9d514d6236aa7301bf68a`
BLAKE2b-256	`d90ae4f65923861a6710828ad3d9b5bab5a7cdfed4714131fa2d84b0fef73bda`