A Python library

fastlinkcheck

Check for broken external and internal links.

fastlinkcheck checks for broken links in HTML documents. Links are checked in parallel, so large sites can be checked quickly. Both external and internal links are checked; internal links are checked by verifying local files.
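To illustrate why checking in parallel helps, here is a minimal sketch of concurrent link checking using the standard library's thread pool. It is not fastlinkcheck's implementation: `find_broken`, `is_reachable`, `max_workers`, and the stand-in checker `fake_is_reachable` are hypothetical names introduced only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def find_broken(urls, is_reachable, max_workers=8):
    """Check `urls` concurrently and return those where `is_reachable` is false.

    Hypothetical helper for illustration; not part of fastlinkcheck.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        results = list(pool.map(is_reachable, urls))
    return [url for url, ok in zip(urls, results) if not ok]


def fake_is_reachable(url):
    """Stand-in checker so the sketch runs offline.

    A real checker would make an HTTP request instead.
    """
    return "doesntexist" not in url


broken = find_broken(
    ["https://www.google.com", "http://somecdn.com/doesntexist.html"],
    fake_is_reachable,
)
print(broken)
```

Because each request spends most of its time waiting on the network, running the checks in a pool of threads lets many requests wait at once.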

fastlinkcheck.fast.ai

Install

pip install fastlinkcheck

Usage

link_check(path:"Root directory searched recursively for HTML files", host:"Host and path (without protocol) of web server"='', config_file:"Location of file with urls to ignore"=None, exit_on_err:"Exit with a status code 1 if broken links are found."=False)

Check for broken links recursively in path.

The _example/ directory in this repo contains sample HTML files which we can use for demonstration.

The path parameter specifies the directory that will be searched recursively for HTML files that you wish to check.

Specifying the host parameter allows you to detect internal links by identifying links with that host name. External links are verified by making a request to the appropriate website, whereas internal links are verified by inspecting the presence and content of local files.
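The distinction between internal and external links can be sketched roughly as follows. This is an illustration of the idea only, not fastlinkcheck's code: `classify_link` and its parameters are hypothetical, the sketch assumes `host` is a bare host name with no path component, and it only considers http(s)-style and root-relative links.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_link(link, host, root):
    """Return ('internal', local_path) or ('external', url).

    Hypothetical helper for illustration; not part of fastlinkcheck.
    """
    parsed = urlparse(link)
    # A link with no host (e.g. "/test.js") or with a host equal to `host`
    # is treated as internal and mapped to a file under `root`.
    if parsed.netloc in ("", host):
        return "internal", Path(root) / parsed.path.lstrip("/")
    # Anything else would need a network request to verify.
    return "external", link


print(classify_link("https://fastlinkcheck.com/test.js", "fastlinkcheck.com", "_example"))
print(classify_link("http://somecdn.com/doesntexist.html", "fastlinkcheck.com", "_example"))
```

Under these assumptions, an internal link is checked by testing whether the mapped local file exists, while an external link requires an HTTP request.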

from fastlinkcheck import link_check

broken_links = link_check(path='_example', host='fastlinkcheck.com')
print(broken_links)
- 'http://somecdn.com/doesntexist.html' was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
- Path('/Users/hamelsmu/github/fastlinkcheck/_example/test.js') was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
print(f'Number of broken links found {len(broken_links)}')
Number of broken links found 2

Ignore links with a configuration file

You can choose to ignore links by providing a plain-text file containing a list of URLs and paths to ignore. For example, the file linkcheck.rc contains a list of URLs I want to ignore:

with open('_example/linkcheck.rc', 'r') as f: print(f.read())
test.js
https://www.google.com

In this case, _example/test.js will be filtered out from the list:

broken_links = link_check(path='_example', host='fastlinkcheck.com', config_file='_example/linkcheck.rc')
print(broken_links)
- 'http://somecdn.com/doesntexist.html' was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
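The filtering shown above can be pictured with the following sketch. It is not the library's implementation: `parse_ignore_list` and `is_ignored` are hypothetical helpers, and the sketch assumes that entries starting with http:// or https:// are matched as exact URLs while all other entries are file paths relative to the directory containing the configuration file (which is consistent with test.js matching _example/test.js in the example).

```python
from pathlib import Path


def parse_ignore_list(text):
    """Return the non-blank lines of an ignore file, one entry per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_ignored(link, entries, config_dir):
    """True if `link` matches an entry as a URL or as a path under `config_dir`.

    Hypothetical helper for illustration; not part of fastlinkcheck.
    """
    for entry in entries:
        if entry.startswith(("http://", "https://")):
            # Assumption: URL entries must match the link exactly.
            if link == entry:
                return True
        elif Path(link) == Path(config_dir) / entry:
            # Assumption: other entries are paths relative to the config file.
            return True
    return False


entries = parse_ignore_list("test.js\nhttps://www.google.com\n")
print(is_ignored("_example/test.js", entries, "_example"))
print(is_ignored("http://somecdn.com/doesntexist.html", entries, "_example"))
```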

CLI Function

link_check can also be called from the command line. Pass the --help flag to see the available options, which correspond to the parameters of the link_check function described above.

link_check --help

usage: link_check [-h] [--host HOST] [--config_file CONFIG_FILE]
                  [--exit_on_err] [--pdb] [--xtra XTRA]
                  path

Check for broken links recursively in `path`.

positional arguments:
  path                  Root directory searched recursively for HTML files

optional arguments:
  -h, --help            show this help message and exit
  --host HOST           Host and path (without protocol) of web server
                        (default: )
  --config_file CONFIG_FILE
                        Location of file with urls to ignore
  --exit_on_err         Exit with a status code 1 if broken links are found.
                        (default: False)

Documentation

Docs site: fastlinkcheck.fast.ai

Appendix: Using link_check in GitHub Actions

In the example below, we pass the --exit_on_err flag, which marks the workflow as failed if a broken link is found.

name: Check Links
on: [workflow_dispatch, push]

jobs:
  check-links:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v2
    - name: check for broken links
      run: |
        pip install fastlinkcheck
        link_check _example --exit_on_err 

You can open an issue if broken links are found by adding a few extra lines of code:

...
      - name: check for broken links
        run: |
          pip install fastlinkcheck
          errs=$(link_check _example)
          export GITHUB_TOKEN=${{ secrets.GITHUB_TOKEN }}
          gh issue create -t "test issue from gh" -b "$errs" -R ${{ github.repository }}

See the GitHub Actions docs for more information.
