A Python library
fastlinkcheck
Check for broken external and internal links.
fastlinkcheck checks for broken links in HTML documents. Links are checked in parallel, so checking is fast. Both external and internal links are checked; internal links are verified against local files.
Install
pip install fastlinkcheck
Usage
link_check
link_check(path:"Root directory searched recursively for HTML files", host:"Host and path (without protocol) of web server"='', config_file:"Location of file with urls to ignore"=None)
Check for broken links recursively in path.
The _example/ directory in this repo contains sample HTML files which we can use for demonstration.
The path parameter specifies the directory that will be searched recursively for the HTML files you wish to check.
Specifying the host parameter allows you to detect internal links by identifying links with that host name. External links are verified by making a request to the appropriate website, whereas internal links are verified by inspecting the presence and content of local files.
from fastlinkcheck import link_check
broken_links = link_check(path='_example', host='fastlinkcheck.com')
print(broken_links)
- 'http://somecdn.com/doesntexist.html' was found in the following pages:
- `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
- Path('/Users/hamelsmu/github/fastlinkcheck/_example/test.js') was found in the following pages:
- `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
print(f'Number of broken links found {len(broken_links)}')
Number of broken links found 2
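The distinction between internal and external links described above can be illustrated with a minimal sketch. The helper classify_link below is hypothetical and is not part of fastlinkcheck; it assumes a link counts as internal when it is relative or when its host name equals the first component of host. The library's actual rules may differ.

```python
from urllib.parse import urlparse

def classify_link(url, host=''):
    "Hypothetical helper (not fastlinkcheck's API): label `url` as 'internal' or 'external'."
    netloc = urlparse(url).netloc
    # Relative links such as 'test.js' carry no host name, so treat them as internal
    if not netloc:
        return 'internal'
    # `host` may include a path ('example.com/docs'); compare only the host name part
    host_name = host.split('/')[0]
    return 'internal' if host_name and netloc == host_name else 'external'

links = ['test.js', 'https://fastlinkcheck.com/test.html', 'http://somecdn.com/doesntexist.html']
for link in links:
    print(link, '->', classify_link(link, host='fastlinkcheck.com'))
```

Under these assumptions only the somecdn.com link would need a network request; the other two would be resolved against local files.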
Ignore links with a configuration file
You can ignore links by supplying a plain-text file containing a list of urls to ignore. For example, the file linkcheck.rc contains a list of urls I want to ignore:
with open('_example/linkcheck.rc', 'r') as f: print(f.read())
test.js
https://www.google.com
In this case _example/test.js will be filtered out of the list:
broken_links = link_check(path='_example', host='fastlinkcheck.com', config_file='_example/linkcheck.rc')
print(broken_links)
- 'http://somecdn.com/doesntexist.html' was found in the following pages:
- `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
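If you prefer to generate the ignore file from code, a sketch along the following lines may help. The helper write_ignore_file is hypothetical (it is not provided by fastlinkcheck) and simply assumes the one-entry-per-line layout of the linkcheck.rc file shown above.

```python
from pathlib import Path
import tempfile

def write_ignore_file(dest, entries):
    "Hypothetical helper: write one ignore entry per line, mirroring the linkcheck.rc layout."
    dest = Path(dest)
    dest.write_text('\n'.join(entries) + '\n')
    return dest

# Write the two entries from the example into a temporary file and read them back
with tempfile.TemporaryDirectory() as tmp:
    rc = write_ignore_file(Path(tmp) / 'linkcheck.rc', ['test.js', 'https://www.google.com'])
    contents = rc.read_text()
    print(contents)
```

The resulting path could then be passed as the config_file argument of link_check, as in the call above.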
CLI Function
link_check can also be called from the command line. We can see the available options by passing the --help flag. These options correspond to the parameters of the link_check function described above.
link_check --help
usage: link_check [-h] [--host HOST] [--config_file CONFIG_FILE] [--pdb]
                  [--xtra XTRA]
                  path

Check for broken links recursively in `path`.

positional arguments:
  path                  Root directory searched recursively for HTML files

optional arguments:
  -h, --help            show this help message and exit
  --host HOST           Host and path (without protocol) of web server
                        (default: )
  --config_file CONFIG_FILE
                        Location of file with urls to ignore
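To show how the function parameters map onto the flags in the help text above, here is a small sketch that assembles the equivalent command line. The helper build_link_check_cmd is hypothetical; it uses only the path argument and the --host and --config_file flags listed above.

```python
import shlex

def build_link_check_cmd(path, host='', config_file=None):
    "Hypothetical helper: build the argument list for the link_check CLI."
    cmd = ['link_check', str(path)]
    # Optional flags are added only when a value is supplied
    if host:
        cmd += ['--host', host]
    if config_file:
        cmd += ['--config_file', str(config_file)]
    return cmd

cmd = build_link_check_cmd('_example', host='fastlinkcheck.com',
                           config_file='_example/linkcheck.rc')
print(shlex.join(cmd))
```

With fastlinkcheck installed, the list could be handed to subprocess.run(cmd) to run the check from a script.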