pyHtmlProofer - A tool for validating internal & external links in HTML files / Websites

pyHTMLProofer

Check websites and static HTML pages for link rot.

Features

pyHTMLProofer can be used on:

Static HTML pages (typically generated by an SSG). You can specify either files or directories to be checked.
Web pages. You can specify a URL/link to be checked.

pyHTMLProofer currently does the following:

Checks for broken internal links in HTML files
Checks whether external links in HTML files or on a website are valid
Checks for scripts / stylesheets in HTML files
Checks for images in HTML files

You can read more details below in the What's tested? section.

Roadmap

The following features are under development:

Check for images and alt-text in HTML files
Check Favicons
Check optimal SEO meta tags
Caching results
Config file

Installation

Install pyHTMLProofer with pip:

pip install pyhtmlproofer

What's tested?

You can configure pyHTMLProofer to check:

a file
a directory or list of directories
a URL / Link
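
As a rough illustration of how a list of directories might be expanded into the individual files to check, the sketch below walks each directory with pathlib and keeps the files whose suffix is in an allowed list. The helper name `collect_html_files` and its parameters are hypothetical and are not part of pyHTMLProofer's API; the default for `extensions` simply mirrors the `extensions` option listed under Available Config Options.

```python
from pathlib import Path


def collect_html_files(directories, extensions=(".html",)):
    """Return a sorted list of files under the given directories whose suffix matches.

    Hypothetical helper for illustration; not pyHTMLProofer's implementation.
    """
    found = set()
    for directory in directories:
        # rglob("*") walks the directory tree recursively.
        for path in Path(directory).rglob("*"):
            if path.is_file() and path.suffix.lower() in extensions:
                found.add(path)
    return sorted(found)
```

Using a set before sorting means a file is only reported once, even if the directories passed in overlap.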

Links / Hyperlinks

a, link elements: pyHTMLProofer checks:

If the internal links are valid
If the internal references (#in-page-links) are valid
If the external links are valid
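
The three link checks above rely on telling internal, in-page, and external references apart. The following is a minimal sketch of that classification using only the standard library, plus a check for in-page references that have no matching target. The names `LinkCollector`, `classify`, and `broken_anchors` are hypothetical; this is not how pyHTMLProofer is necessarily implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a>/<link> tags and every id/name usable as an anchor target."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("a", "link") and attrs.get("href"):
            self.hrefs.append(attrs["href"])
        if attrs.get("id"):
            self.anchors.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.anchors.add(attrs["name"])


def classify(href):
    """Label a link as 'anchor' (in-page), 'external', 'other' (e.g. mailto:), or 'internal'."""
    if href.startswith("#"):
        return "anchor"
    scheme = urlparse(href).scheme
    if scheme in ("http", "https") or href.startswith("//"):
        return "external"
    if scheme:
        return "other"
    return "internal"


def broken_anchors(html):
    """Return in-page references (#...) that have no matching id or name in the page."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        href
        for href in collector.hrefs
        # A bare "#" points at the top of the page, so it is never reported.
        if classify(href) == "anchor" and href != "#" and href[1:] not in collector.anchors
    ]
```

For example, `broken_anchors('<a href="#intro">x</a><a href="#missing">y</a><h1 id="intro">Intro</h1>')` reports only `#missing`.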

Images

img elements: pyHTMLProofer checks:

If the internal image references are valid
If the external image references are valid
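
To make the internal image check concrete, here is a small sketch that resolves each img src against the file system and reports the ones that do not exist. It assumes root-relative paths (starting with /) resolve against a site root and all other paths resolve against the folder of the HTML file; `ImageCollector`, `missing_internal_images`, and the `site_root` parameter are hypothetical names, not pyHTMLProofer's API.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def missing_internal_images(html_file, site_root):
    """Return internal <img src> values in html_file that do not resolve to a file on disk."""
    html_file, site_root = Path(html_file), Path(site_root)
    collector = ImageCollector()
    collector.feed(html_file.read_text(encoding="utf-8"))
    missing = []
    for src in collector.sources:
        parsed = urlparse(src)
        if parsed.scheme or src.startswith("//"):
            continue  # external URL or data: URI; not checked here
        rel = unquote(parsed.path)
        if rel.startswith("/"):
            target = site_root / rel.lstrip("/")  # root-relative reference
        else:
            target = html_file.parent / rel  # relative to the page's folder
        if not target.is_file():
            missing.append(src)
    return missing
```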

Scripts

script elements: pyHTMLProofer checks:

If the internal script references are valid
If the external script references are reachable
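
One plausible way to decide whether an external reference is "reachable" is to send a HEAD request and treat any status below 400 as success. The sketch below does this with urllib from the standard library. The function name `is_reachable`, the 10-second timeout, and the injectable `opener` parameter are assumptions made for illustration; the source does not say which HTTP client or criteria pyHTMLProofer uses.

```python
import http.client
import urllib.request


def is_reachable(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if a HEAD request to url completes with a status below 400.

    Hypothetical helper; `opener` can be swapped out to avoid network access in tests.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError (4xx/5xx) and timeouts;
        # ValueError is raised for strings that are not valid URLs.
        return False
```

Some servers reject HEAD requests, so a real checker would probably retry with GET before declaring a link broken.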

Usage

a) To check a file:

import pyHtmlProofer
file = "path/to/file1.html"
pyHtmlProofer.file(file).check()

b) To check directories:

import pyHtmlProofer
directory_paths = ["path/to/1/", "path/to/2/"]
pyHtmlProofer.directories(directory_paths).check()

c) To validate URL(s):

import pyHtmlProofer
links = ["https://example.com", "https://cloudbytes.dev"]
pyHtmlProofer.links(links).check()

CLI

There is also a CLI that can be used:

$ pyhtmlproofer check -F <file_name>

Available Config Options

PROOFER_DEFAULTS = {
    "assume_extension": ".html",
    "directory_index_file": "index.html",
    "disable_external": False,
    "ignore_files": [],
    "ignore_urls": [],
    "enforce_https": True,
    "extensions": [".html"],
    "log_level": "ERROR",
    "report_to_file": True,
    "report_filename": "proofer_report",
}

You can override the default configuration options by passing a dictionary of options.

import pyHtmlProofer

options = {"log_level": "ERROR", "disable_external": True}
directory_paths = ["path/to/1/", "path/to/2/"]

pyHtmlProofer.directories(directory_paths, options=options).check()
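
Overriding defaults with a dictionary usually comes down to a dictionary merge. The sketch below shows one way this could work, using the PROOFER_DEFAULTS listed above. The `merge_options` helper is hypothetical, and rejecting unknown keys is an assumption made here to catch typos early; pyHTMLProofer itself may behave differently.

```python
PROOFER_DEFAULTS = {
    "assume_extension": ".html",
    "directory_index_file": "index.html",
    "disable_external": False,
    "ignore_files": [],
    "ignore_urls": [],
    "enforce_https": True,
    "extensions": [".html"],
    "log_level": "ERROR",
    "report_to_file": True,
    "report_filename": "proofer_report",
}


def merge_options(overrides=None, defaults=PROOFER_DEFAULTS):
    """Return a new dict in which user-supplied options take precedence over the defaults."""
    overrides = overrides or {}
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Assumption: unknown keys are treated as errors rather than silently ignored.
        raise KeyError(f"Unknown option(s): {sorted(unknown)}")
    # Later keys win, so overrides replace defaults; defaults itself is not modified.
    return {**defaults, **overrides}


options = merge_options({"log_level": "ERROR", "disable_external": True})
```

Here `options["disable_external"]` is True, while every key that was not overridden keeps its default value.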

Credits

pyHTMLProofer was inspired by the Ruby-based HTMLProofer and the lack of Python-based alternatives. However, pyHTMLProofer is not a Python rewrite of it; instead, it focuses on solving problems that I encountered while maintaining the CloudBytes/Dev> website.

Project details

Release history

0.7.3a0 pre-release, Sep 4, 2022
0.7.1a0 pre-release, Aug 24, 2022
0.6.20a0 pre-release, Aug 22, 2022
0.6.18a0 pre-release, Aug 20, 2022
0.6.17a0 pre-release, Aug 20, 2022
0.6.15a0 pre-release, Aug 20, 2022
0.6.14a0 pre-release, Aug 20, 2022
0.6.13a0 pre-release, Aug 19, 2022
0.6.12a0 pre-release, Aug 19, 2022
0.6.9a0 pre-release, Aug 19, 2022
0.6.8a0 pre-release, Aug 19, 2022
0.6.7a0 pre-release, Aug 19, 2022
0.6.6a0 pre-release, Aug 19, 2022
0.6.4a0 pre-release, Aug 18, 2022
0.6.3a0 pre-release, Aug 18, 2022
0.6.2a0 pre-release, Aug 18, 2022
0.6.1a0 pre-release, Aug 18, 2022 (this version)
0.5.1a0 pre-release, Aug 18, 2022
0.5.0a0 pre-release, Aug 18, 2022
0.4.20a0 pre-release, Aug 16, 2022
0.4.19a0 pre-release, Aug 15, 2022
0.3.14a0 pre-release, Aug 14, 2022
0.3.12a0 pre-release, Aug 14, 2022
0.2.11a0 pre-release, Aug 9, 2022
0.2.10a0 pre-release, Aug 4, 2022
0.2.9a0 pre-release, Jul 30, 2022
0.2.8a0 pre-release, Jul 29, 2022
0.2.7a0 pre-release, Jul 29, 2022
0.2.6a0 pre-release, Jul 29, 2022
0.2.5a0 pre-release, Jul 29, 2022
0.2.4a0 pre-release, Jul 28, 2022
0.2.3a0 pre-release, Jul 28, 2022
0.1.2a0 pre-release, Jul 28, 2022
0.1.1a0 pre-release, Jul 28, 2022

Download files

Source Distribution

pyHtmlProofer-0.6.1a0.tar.gz (22.2 kB)

Uploaded Aug 18, 2022 Source

Built Distribution

pyHtmlProofer-0.6.1a0-py3-none-any.whl (24.6 kB)

Uploaded Aug 18, 2022 Python 3

Hashes for pyHtmlProofer-0.6.1a0.tar.gz

Algorithm	Hash digest
SHA256	`62f3b84ef2811207e1711907aa85bcbc384b94fe84c50d60686b44d199c629b5`
MD5	`d07de87459e84aa7495e3fe799c82aa9`
BLAKE2b-256	`6ec77fff904719d1cc77746975e888b8d182a21b4649a36587930d0dfe475424`

Hashes for pyHtmlProofer-0.6.1a0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`f1de09526a8ca600d7d5e92cdf372042259f78298ae801e597d13bd8904a691b`
MD5	`34243ce7ee34beb4ab5002db0eda2f4d`
BLAKE2b-256	`ad434021e57c9598669d8b429ba10a718c25f4f9e39e808386146fdadf9fe72e`