
A Python library

Project description

fastlinkcheck

Check for broken external and internal links.

fastlinkcheck checks for broken links in HTML documents. Links are checked in parallel, which keeps the check fast. Both external and internal links are checked: internal links are verified against local files rather than by making web requests.
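The description doesn't say how the parallelism is implemented. As a minimal sketch of the general idea (not fastlinkcheck's actual code): link checks are I/O-bound, so a thread pool can run many of them at once. The `check_all` helper and its `is_ok` predicate are hypothetical. A real checker would make HTTP requests or look up local files inside `is_ok`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

def check_all(links: Iterable[str], is_ok: Callable[[str], bool],
              max_workers: int = 8) -> List[str]:
    """Run `is_ok` on every link concurrently and return the links that failed."""
    links = list(links)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so they line up with `links`
        results = list(pool.map(is_ok, links))
    return [link for link, ok in zip(links, results) if not ok]

# Stand-in predicate so the example runs offline: pretend only ".html" links work
broken = check_all(["index.html", "style.css", "about.html"],
                   lambda link: link.endswith(".html"))
print(broken)  # ['style.css']
```

Threads fit here because each check spends most of its time waiting on the network or the filesystem rather than on the CPU.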

Install

pip install fastlinkcheck

Usage

link_check(path:"Root directory searched recursively for HTML files", host:"Host and path (without protocol) of web server"='', config_file:"Location of file with urls to ignore"=None, actions_output:"Toggle GitHub Actions output on/off"=False, exit_on_found:"(CLI Only) Exit with status code 1 if broken links are found"=False, print_logs:"Toggle printing logs to stdout."=False)

Check for broken links recursively in path.

The _example/ directory in this repo contains sample HTML files which we can use for demonstration.

The path parameter specifies the directory that will be searched recursively for HTML files that you wish to check.
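A minimal sketch of that recursive search using `pathlib`, assuming HTML files are identified by a `.html` extension (fastlinkcheck's exact matching rule isn't documented here). `find_html_files` is a hypothetical helper, and the demo builds a throwaway directory tree so it runs anywhere.

```python
import tempfile
from pathlib import Path
from typing import List

def find_html_files(root: str) -> List[Path]:
    """Return every .html file under `root`, searching subdirectories recursively."""
    return sorted(Path(root).rglob("*.html"))

# Demo: a small tree with one nested HTML file and one non-HTML file
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "docs").mkdir()
    (Path(tmp) / "index.html").write_text("<a href='docs/page.html'>docs</a>")
    (Path(tmp) / "docs" / "page.html").write_text("")
    (Path(tmp) / "style.css").write_text("")
    found = [p.relative_to(tmp).as_posix() for p in find_html_files(tmp)]

print(found)  # ['docs/page.html', 'index.html']
```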

Specifying the host parameter allows you to detect internal links: links that point to that host name are treated as internal. External links are verified by making a request to the appropriate website, whereas internal links are verified by inspecting the presence and content of local files.
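To illustrate the internal/external split, here is a sketch under stated assumptions; fastlinkcheck's own resolution rules may differ. A hypothetical `local_target` helper maps a link to the local file it should correspond to, or returns `None` for external links. It assumes that relative links resolve against the linking page's directory, that root-relative links and links on `host` resolve against the site root, and that directory links are served by `index.html`.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

def local_target(link: str, host: str, root: Path, page_dir: Path) -> Optional[Path]:
    """Return the local file an internal link should point to, or None if the link is external."""
    parsed = urlparse(link)
    if parsed.scheme not in ("", "http", "https"):
        return None  # mailto:, tel:, etc. are not local files in this sketch
    if parsed.netloc:
        site = host.rstrip("/")
        full = parsed.netloc + parsed.path
        # Absolute URL: internal only if it lives under `host` (which may include a path)
        if not site or not (full == site or full.startswith(site + "/")):
            return None
        base, rel = root, full[len(site):]
    elif parsed.path.startswith("/"):
        base, rel = root, parsed.path      # root-relative link, e.g. "/img/logo.png"
    elif parsed.path:
        base, rel = page_dir, parsed.path  # relative to the page containing the link
    else:
        return None                        # fragment-only link such as "#top"
    target = base / rel.lstrip("/")
    if rel == "" or rel.endswith("/"):
        target = target / "index.html"     # assumption: directories are served by index.html
    return target

root = Path("/site")
page_dir = root / "blog"
internal = local_target("https://fastlinkcheck.com/docs/a.html", "fastlinkcheck.com", root, page_dir)
external = local_target("http://somecdn.com/doesntexist.html", "fastlinkcheck.com", root, page_dir)
relative = local_target("test.js", "fastlinkcheck.com", root, page_dir)
```

Under these assumptions, an internal link would count as broken when `local_target(...)` returns a path that does not exist, while a `None` result would be handed to an HTTP check instead.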

from fastlinkcheck import link_check

broken_links = link_check(path='_example', host='fastlinkcheck.com')
print(broken_links)
- 'http://somecdn.com/doesntexist.html' was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
- Path('/Users/hamelsmu/github/fastlinkcheck/_example/test.js') was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`

Print logs to stdout

You can optionally print logs to stdout with the print_logs parameter. This can be useful for debugging:

broken_links = link_check(path='_example', host='fastlinkcheck.com', print_logs=True)
ERROR: The Following Broken Links or Paths were found:
- 'http://somecdn.com/doesntexist.html' was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
- Path('/Users/hamelsmu/github/fastlinkcheck/_example/test.js') was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
print(f'Number of broken links found {len(broken_links)}')
Number of broken links found 2

Ignore links with a configuration file

You can ignore specific links by listing them in a plain-text configuration file. For example, the file _example/linkcheck.rc contains a list of URLs I want to ignore:

with open('_example/linkcheck.rc', 'r') as f: print(f.read())
test.js
https://www.google.com

In this case, _example/test.js is filtered out of the list:

broken_links = link_check(path='_example', host='fastlinkcheck.com', config_file='_example/linkcheck.rc')
print(broken_links)
- 'http://somecdn.com/doesntexist.html' was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
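The page doesn't specify how an entry in the config file is matched against a link. The following is a minimal sketch, assuming one entry per line and assuming an entry ignores any link that ends with it (which is consistent with `test.js` filtering out the full local path to `_example/test.js` above). `load_ignore_list` and `filter_ignored` are hypothetical helpers, not part of fastlinkcheck.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

def load_ignore_list(config_file: str) -> List[str]:
    """Read one URL or path per line, skipping blank lines."""
    lines = Path(config_file).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]

def filter_ignored(links: Iterable[str], ignore: List[str]) -> List[str]:
    # Assumption for this sketch: an entry ignores any link that ends with it,
    # so "test.js" also matches "/home/me/site/_example/test.js".
    return [link for link in links if not any(link.endswith(entry) for entry in ignore)]

# Demo: write a config file with the same entries as _example/linkcheck.rc
with tempfile.TemporaryDirectory() as tmp:
    rc = Path(tmp) / "linkcheck.rc"
    rc.write_text("test.js\nhttps://www.google.com\n")
    ignore = load_ignore_list(str(rc))

remaining = filter_ignored(
    ["http://somecdn.com/doesntexist.html", "/home/me/site/_example/test.js"],
    ignore,
)
print(remaining)  # ['http://somecdn.com/doesntexist.html']
```

Suffix matching is a deliberately loose choice: it would also ignore a link such as `mytest.js`, so a stricter rule (exact match, or matching on path components) may be preferable.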

CLI Function

link_check can also be called from the command line. Pass the --help flag to see the available options, which correspond to the parameters of the link_check function described above.

link_check --help

usage: link_check [-h] [--host HOST] [--config_file CONFIG_FILE]
                  [--actions_output] [--exit_on_found] [--print_logs] [--pdb]
                  [--xtra XTRA]
                  path

Check for broken links recursively in `path`.

positional arguments:
  path                  Root directory searched recursively for HTML files

optional arguments:
  -h, --help            show this help message and exit
  --host HOST           Host and path (without protocol) of web server
                        (default: )
  --config_file CONFIG_FILE
                        Location of file with urls to ignore
  --actions_output      Toggle GitHub Actions output on/off (default: False)
  --exit_on_found       Exit with status code 1 if broken links are
                        found (default: False)
  --print_logs          Toggle printing logs to stdout. (default: False)
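Because --exit_on_found makes link_check exit with status code 1 when broken links are found, the CLI can be used to fail a CI build. As a sketch, a hypothetical `build_link_check_cmd` helper assembles the command from Python using only the flags shown in the --help output above. Running the command requires fastlinkcheck to be installed, so the example only prints it.

```python
import shlex
from typing import List, Optional

def build_link_check_cmd(path: str, host: str = "", config_file: Optional[str] = None,
                         exit_on_found: bool = False, print_logs: bool = False) -> List[str]:
    """Assemble a link_check command line from the flags listed in --help."""
    cmd = ["link_check", path]
    if host:
        cmd += ["--host", host]
    if config_file:
        cmd += ["--config_file", config_file]
    if exit_on_found:
        cmd.append("--exit_on_found")  # non-zero exit status when broken links are found
    if print_logs:
        cmd.append("--print_logs")
    return cmd

cmd = build_link_check_cmd("_example", host="fastlinkcheck.com", exit_on_found=True)
print(shlex.join(cmd))  # link_check _example --host fastlinkcheck.com --exit_on_found

# To actually run it (requires fastlinkcheck on PATH):
#   import subprocess
#   failed = subprocess.run(cmd).returncode != 0
```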

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fastlinkcheck-0.0.4.tar.gz (10.1 kB)

Uploaded Source

Built Distribution

fastlinkcheck-0.0.4-py3-none-any.whl (9.9 kB)

Uploaded Python 3
