pyHtmlProofer - A tool for validating internal & external links in HTML files / Websites
pyHTMLProofer
Check websites and static HTML pages for link rot.
Features
pyHTMLProofer can be used on
- Static HTML pages (typically generated by an SSG). You can specify either files or directories to be checked.
- Webpages. You can specify a URL to be checked.
pyHTMLProofer at the moment does the following:
- Checks for broken internal links in HTML files
- Checks that external links in HTML files or on a website are valid
- Checks scripts / stylesheets referenced in HTML files
- Checks images referenced in HTML files
You can read more details below in the What's tested? section.
Roadmap
The following features are under development:
- Check for images and alt-text in HTML files
- Check Favicons
- Check optimal SEO meta tags
- Caching results
- Config file
Installation
Install pyHTMLProofer with pip:
pip install pyhtmlproofer
What's tested?
You can configure pyHTMLProofer to check:
- a file
- a directory or list of directories
- a URL / Link
Links / Hyperlinks
a, link elements: pyHTMLProofer checks
- If the internal links are valid
- If the internal references (#in-page-links) are valid
- If the external links are valid
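To make the in-page reference check concrete, here is a minimal sketch using only the Python standard library. It is not pyHTMLProofer's implementation; the names AnchorCollector and broken_fragments are hypothetical and exist only for this illustration. The idea is to collect every id in the document and report any href="#..." target that has no matching id.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect anchor targets and in-page link fragments from one HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()      # values of id="..." plus name="..." on <a> elements
        self.fragments = []   # fragments referenced by href="#..." links

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.ids.add(attrs["name"])
        href = attrs.get("href")
        # Only pure in-page links such as "#section" are considered here.
        if tag == "a" and href and href.startswith("#") and len(href) > 1:
            self.fragments.append(href[1:])


def broken_fragments(html):
    """Return the in-page fragments that point at no element in the document."""
    collector = AnchorCollector()
    collector.feed(html)
    return [f for f in collector.fragments if f not in collector.ids]


page = '<h1 id="top">Title</h1><a href="#top">ok</a><a href="#missing">bad</a>'
print(broken_fragments(page))
```

A link such as page.html#section would additionally require loading the target file, which this sketch leaves out.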
Images
img elements: pyHTMLProofer checks
- If the internal image references are valid
- If the external image references are valid
Scripts
script elements: pyHTMLProofer checks
- If the internal script references are valid
- If the external script references are reachable
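The checks above all depend on first splitting references into internal and external ones. The sketch below shows one way that split could be done with the standard library. It is an illustration under assumptions, not pyHTMLProofer's own code: REFERENCE_ATTRS, ReferenceCollector, and collect_references are hypothetical names, and the rule "a reference with a network location is external" is this sketch's choice.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Which attribute carries the reference for each element type covered above.
REFERENCE_ATTRS = {"a": "href", "link": "href", "img": "src", "script": "src"}


class ReferenceCollector(HTMLParser):
    """Sort a, link, img, and script references into internal and external."""

    def __init__(self):
        super().__init__()
        self.internal = []
        self.external = []

    def handle_starttag(self, tag, attrs):
        attr_name = REFERENCE_ATTRS.get(tag)
        if attr_name is None:
            return
        value = dict(attrs).get(attr_name)
        if not value:
            return
        parsed = urlparse(value)
        if parsed.netloc:
            # Has a host, e.g. https://example.com/x or //cdn.example.com/x
            self.external.append((tag, value))
        elif parsed.scheme:
            # mailto:, data:, javascript: and similar are skipped in this sketch.
            return
        else:
            # Relative or root-relative path, or an in-page fragment.
            self.internal.append((tag, value))


def collect_references(html):
    """Return (internal, external) lists of (tag, reference) pairs."""
    collector = ReferenceCollector()
    collector.feed(html)
    return collector.internal, collector.external


page = (
    '<link href="css/site.css">'
    '<script src="https://cdn.example.com/app.js"></script>'
    '<img src="/img/logo.png">'
)
internal, external = collect_references(page)
print(internal)
print(external)
```

Internal references can then be resolved against the file system, while external ones need an HTTP request to confirm they are reachable.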
Usage
a) To check a file:
import pyHtmlProofer
file = "path/to/file1.html"
pyHtmlProofer.file(file).check()
b) To check directories:
import pyHtmlProofer
directory_paths = ["path/to/directory1", "path/to/directory2"]
pyHtmlProofer.directories(directory_paths).check()
c) To validate URL(s):
import pyHtmlProofer
links = ["https://example.com", "https://cloudbytes.dev"]
pyHtmlProofer.links(links).check()
CLI
There is also a CLI that can be used:
$ pyhtmlproofer check -F <file_name>
Available Config Options
PROOFER_DEFAULTS = {
"assume_extension": ".html",
"directory_index_file": "index.html",
"disable_external": False,
"ignore_files": [],
"ignore_urls": [],
"enforce_https": True,
"extensions": [".html"],
"log_level": "ERROR",
"report_to_file": True,
"report_filename": "proofer_report",
}
You can override the default configuration options by passing a dictionary of options.
import pyHtmlProofer
options = {"log_level": "ERROR", "disable_external": True}
directory_paths = ["path/to/directory1", "path/to/directory2"]
pyHtmlProofer.directories(directory_paths, options=options).check()
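How the overrides combine with the defaults is not spelled out here. A plausible reading is a plain dictionary merge in which any key you pass replaces the default and every other key keeps its default value. The sketch below illustrates that reading only; resolve_options is a hypothetical helper, not part of the pyHtmlProofer API, and rejecting unknown keys is this sketch's own choice.

```python
# Copy of the defaults listed above, repeated so the example is self-contained.
PROOFER_DEFAULTS = {
    "assume_extension": ".html",
    "directory_index_file": "index.html",
    "disable_external": False,
    "ignore_files": [],
    "ignore_urls": [],
    "enforce_https": True,
    "extensions": [".html"],
    "log_level": "ERROR",
    "report_to_file": True,
    "report_filename": "proofer_report",
}


def resolve_options(overrides=None):
    """Merge user overrides onto the defaults (hypothetical helper)."""
    overrides = overrides or {}
    # Assumption of this sketch: a misspelled key is an error, not silently ignored.
    unknown = set(overrides) - set(PROOFER_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")
    # Later keys win, so overrides replace defaults key by key.
    return {**PROOFER_DEFAULTS, **overrides}


effective = resolve_options({"log_level": "ERROR", "disable_external": True})
print(effective["disable_external"], effective["enforce_https"])
```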
Credits
pyHTMLProofer was inspired by the Ruby-based HTMLProofer and the lack of Python-based alternatives. It is not a Python rewrite, however; it focuses on solving problems that I encountered while maintaining the CloudBytes/Dev> website.
Hashes for pyHtmlProofer-0.6.1a0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f1de09526a8ca600d7d5e92cdf372042259f78298ae801e597d13bd8904a691b
MD5 | 34243ce7ee34beb4ab5002db0eda2f4d
BLAKE2b-256 | ad434021e57c9598669d8b429ba10a718c25f4f9e39e808386146fdadf9fe72e