GitHub Action and CLI tool that automatically checks all links in a PDF for dead or broken links.

PDF Link Checker

GitHub Action and CLI tool that automatically checks the availability of every link in a PDF and reports dead or broken ones.

Usage

Install the package using pip:

pip install pdflinkchecker_cli

Then use it as a CLI tool:

pdflinkchecker .                          # to find and check all PDFs in the current directory recursively
pdflinkchecker path/to/pdf                # to check a specific file
pdflinkchecker path/to/pdf1 path/to/pdf2  # to check multiple specific files
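
The three invocations above accept a mix of directories and individual files. As a rough illustration of that behaviour (not the package's actual implementation), path arguments could be expanded into a list of PDFs as sketched below. The function name `collect_pdfs` is hypothetical and is not part of pdflinkchecker.

```python
from pathlib import Path
from typing import Iterable, List


def collect_pdfs(paths: Iterable[str]) -> List[Path]:
    """Expand files and directories into a sorted, de-duplicated list of PDFs.

    Hypothetical helper for illustration only; not part of pdflinkchecker.
    """
    found = set()
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # Directories are searched recursively, as with `pdflinkchecker .`
            found.update(
                p for p in path.rglob("*")
                if p.is_file() and p.suffix.lower() == ".pdf"
            )
        elif path.is_file() and path.suffix.lower() == ".pdf":
            # Explicit file arguments are taken as-is
            found.add(path)
    return sorted(found)


if __name__ == "__main__":
    for pdf in collect_pdfs(["."]):
        print(pdf)
```

Collecting into a set first means a file passed explicitly and also found through a directory argument is only listed once.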

Example output looks like this:

Analyzed /data/dummy1.pdf, found the following types of links/http codes: {200: 13}

Analyzed /data/dummy2.pdf, found the following types of links/http codes: {'mail': 4, 'tel': 4, 200: 49, 'error': 3}
|   Page Number | URL                      | Details                                                             |
|---------------+--------------------------+---------------------------------------------------------------------|
|             1 | https://www.example1.com | <HTTPError 999: 'INKApi Error'>                                     |
|             1 | https://www.example2.com | URLError(timeout('_ssl.c:1112: The handshake operation timed out')) |
|             1 | https://www.example3.com | <HTTPError 403: 'Forbidden'>                                        |
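
The summary line groups links into `mail`, `tel`, HTTP status codes, and `error`. The sketch below shows one way such a tally could be produced with the standard library. It is an assumption about the approach, not the tool's actual code: the names `classify_link` and `summarize`, the `mailto:`/`tel:` prefix checks, the `User-Agent` header, and the 10-second timeout are all illustrative choices.

```python
from collections import Counter
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def classify_link(url, timeout=10.0, opener=urlopen):
    """Return 'mail', 'tel', an HTTP status code, or 'error' for one link.

    Hypothetical helper; `opener` is injectable so the logic can be
    exercised without network access.
    """
    if url.startswith("mailto:"):
        return "mail"
    if url.startswith("tel:"):
        return "tel"
    try:
        # Some servers reject requests that lack a browser-like User-Agent
        request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with opener(request, timeout=timeout) as response:
            return response.status
    except (HTTPError, URLError, ValueError, OSError):
        # HTTP errors (e.g. 403), timeouts, and malformed URLs all count as 'error'
        return "error"


def summarize(urls, opener=urlopen):
    """Count how many links fall into each category."""
    return dict(Counter(classify_link(u, opener=opener) for u in urls))


if __name__ == "__main__":
    print(summarize(["mailto:someone@example.com", "tel:+10000000000"]))
```

Treating every failure as `error` matches the example above, where the 403, the 999, and the handshake timeout together make up the three entries listed in the table.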

To use the GitHub Action, create a pdf_link_checker.yml in .github/workflows:

on: [ push ]

jobs:
  check_pdf_links:
    runs-on: ubuntu-latest
    name: Check PDF Links
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: PDF Link Checker
        uses: a-nau/pdf-link-checker@v0.2.0
        with:
          paths: '.'  # checks all PDFs; otherwise specify the file path(s)

To run within Docker:

docker build -t pdf_link_checker .
docker run -it --rm --mount type=bind,source=${PWD},target=/data/ --name pdf_link_checker pdf_link_checker /data/.

Credits

I started from pdf-link-checker by Matt Briggs. In particular, get_links_from_page is still heavily borrowed from the original.

This work is licensed under the MIT license.
