PDF Link Checker
GitHub Action and CLI tool that automatically checks every link in a PDF for availability, so that dead or broken links can be found.
Usage
Install the package using pip
pip install pdflinkchecker_cli
and use it as a CLI tool
pdflinkchecker . # to search for and check all pdfs in the current directory recursively
pdflinkchecker path/to/pdf # to check a specific file
pdflinkchecker path/to/pdf1 path/to/pdf2 # to check multiple specific files
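The three invocations above mix directories and individual files. As a rough illustration of how such arguments could be expanded into a list of PDFs, here is a minimal sketch. The helper name `collect_pdf_paths` is hypothetical and is not taken from the package, and the actual tool may resolve paths differently.

```python
from pathlib import Path


def collect_pdf_paths(paths):
    """Expand a mix of files and directories into a sorted list of PDF files.

    Hypothetical helper for illustration; not part of pdflinkchecker_cli.
    """
    found = set()
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # Directories are searched recursively, mirroring `pdflinkchecker .`
            found.update(p for p in path.rglob("*.pdf") if p.is_file())
        elif path.is_file() and path.suffix.lower() == ".pdf":
            # Explicitly named files are taken as-is
            found.add(path)
    return sorted(found)


if __name__ == "__main__":
    for pdf in collect_pdf_paths(["."]):
        print(pdf)
```

Using a set means a file that is named explicitly and also found inside a listed directory is only checked once.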
Example output looks like this
Analyzed /data/dummy1.pdf, found the following types of links/http codes: {200: 13}
Analyzed /data/dummy2.pdf, found the following types of links/http codes: {'mail': 4, 'tel': 4, 200: 49, 'error': 3}
| Page Number | URL | Details |
|---------------+--------------------------+---------------------------------------------------------------------|
| 1 | https://www.example1.com | <HTTPError 999: 'INKApi Error'> |
| 1 | https://www.example2.com | URLError(timeout('_ssl.c:1112: The handshake operation timed out')) |
| 1 | https://www.example3.com | <HTTPError 403: 'Forbidden'> |
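The summary dictionaries above group links into `'mail'`, `'tel'`, HTTP status codes, and `'error'`, and the table lists only the failing links. The following sketch shows one way such a tally could be produced. It assumes the standard library's `urllib` is used, which the detail strings in the output suggest but the text does not confirm. The names `check_link` and `summarize`, the injectable `opener` parameter, the `User-Agent` header, and the 10 second timeout are all illustrative assumptions.

```python
from collections import Counter
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def check_link(url, opener=urlopen, timeout=10):
    """Return (category, details) for one link.

    category is 'mail', 'tel', an integer HTTP status, or 'error'.
    Hypothetical helper; not the package's actual implementation.
    """
    scheme = urlparse(url).scheme.lower()
    if scheme == "mailto":
        return "mail", ""
    if scheme == "tel":
        return "tel", ""
    try:
        # Assumption: some servers reject requests without a browser-like User-Agent
        request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with opener(request, timeout=timeout) as response:
            return response.status, ""
    except (OSError, ValueError) as exc:
        # HTTPError and URLError are OSError subclasses; ValueError covers malformed URLs
        return "error", repr(exc)


def summarize(links, opener=urlopen):
    """Tally categories and collect failing links from (page_number, url) pairs."""
    counts = Counter()
    failures = []
    for page_number, url in links:
        category, details = check_link(url, opener=opener)
        counts[category] += 1
        if category == "error":
            failures.append((page_number, url, details))
    return dict(counts), failures
```

Passing the opener as a parameter keeps the sketch testable without network access: a stub that returns a fake response or raises an exception can stand in for `urlopen`.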
To use the GitHub Action, create a pdf_link_checker.yml in .github/workflows:
on: [ push ]
jobs:
  check_pdf_links:
    runs-on: ubuntu-latest
    name: Check PDF Links
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: PDF Link Checker
        uses: a-nau/pdf-link-checker@v0.2.0
        with:
          paths: '.' # checks all PDFs, otherwise specify file path(s)
To run within Docker
docker build -t pdf_link_checker .
docker run -it --rm --mount type=bind,source=${PWD},target=/data/ --name pdf_link_checker pdf_link_checker /data/.
Credits
I started from pdf-link-checker by Matt Briggs. In particular, get_links_from_page is still heavily borrowed from the original.
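For readers unfamiliar with how links are stored in a PDF, the sketch below walks a page's annotations and yields the target of every URI link. It is not the borrowed get_links_from_page code. It operates on plain Python dictionaries that mimic the PDF structure (`/Annots`, `/Subtype`, `/A`, `/URI`), and the function name `iter_link_uris` is hypothetical. A real PDF library would additionally need to resolve indirect objects.

```python
def iter_link_uris(page):
    """Yield the URI of every link annotation on a page dictionary.

    `page` is assumed to be a dict-like object shaped like a PDF page:
    an optional "/Annots" list whose entries are annotation dictionaries.
    """
    for annotation in page.get("/Annots", []):
        # Only link annotations can carry a clickable target
        if annotation.get("/Subtype") != "/Link":
            continue
        # URI links store their target in the action dictionary under "/URI"
        action = annotation.get("/A", {})
        uri = action.get("/URI")
        if uri is not None:
            yield uri


if __name__ == "__main__":
    sample_page = {
        "/Annots": [
            {"/Subtype": "/Link", "/A": {"/S": "/URI", "/URI": "https://www.example1.com"}},
            {"/Subtype": "/Text"},
        ]
    }
    print(list(iter_link_uris(sample_page)))
```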
This work is licensed under the MIT license.
File details
Details for the file pdflinkchecker_cli-0.2.0.tar.gz.
File metadata
- Download URL: pdflinkchecker_cli-0.2.0.tar.gz
- Upload date:
- Size: 5.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | 62954960efe6778b84a777bbc1f26d2f130d0f22bf8da9ec0f57831b39a6a9a4
MD5 | 75077d007653807ad0f5d9ed035cfa0f
BLAKE2b-256 | bb42886fd20954d20f665b45219eb43130ef5504d8c857a40171f47ebee56554
File details
Details for the file pdflinkchecker_cli-0.2.0-py3-none-any.whl.
File metadata
- Download URL: pdflinkchecker_cli-0.2.0-py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6042d3adf2992bd7a76cc5a5bb0ccb763e05f2dc5f9ae8345821a26f66e0e1bc
MD5 | f7976033c768ab500c2536a829ce785a
BLAKE2b-256 | 2ac4f3347b9afe28c9ca79b692f5608d92ecbc4b09658d5b2761c2ac7dd565d3