pyHtmlProofer - A tool for validating internal & external links in HTML files / Websites

pyhtmlproofer

Check websites and static HTML pages for link rot.

Features

pyhtmlproofer can be used on:

  1. Static HTML pages (typically generated by an SSG). You can specify either files or directories to be checked.
  2. Web pages. You can specify a URL to be checked.

At the moment, pyhtmlproofer does the following:

  1. Checks for broken internal links in HTML files
  2. Checks whether external links in HTML files or on a website are valid
  3. Checks script and stylesheet references in HTML files
  4. Checks image references in HTML files

You can read more details in the What's tested? section below.

Roadmap

The following features are under development:

  1. Check for images and alt-text in HTML files
  2. Check Favicons
  3. Check optimal SEO meta tags
  4. Caching results
  5. Config file

Installation

Install pyhtmlproofer with pip:

pip install pyhtmlproofer

What's tested?

You can configure pyhtmlproofer to check:

  • a file
  • a directory or list of directories
  • a URL / Link

Links / Hyperlinks

a, link elements: pyhtmlproofer checks:

  • If the internal links are valid
  • If the internal references (#in-page-links) are valid
  • If the external links are valid
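
The three checks above depend on first sorting each link target into a category. The following is a minimal sketch of that idea using only the standard library; it is not pyhtmlproofer's implementation, and the names `LinkCollector`, `classify` and `broken_fragments` are hypothetical helpers introduced for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect href targets and the ids that in-page links can point to."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("a", "link") and attrs.get("href"):
            self.hrefs.append(attrs["href"])
        if attrs.get("id"):
            self.anchors.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            # Legacy <a name="..."> anchors are also valid fragment targets.
            self.anchors.add(attrs["name"])


def classify(href):
    """Return 'fragment', 'external', 'other' or 'internal' for a link target."""
    if href.startswith("#"):
        return "fragment"
    parsed = urlparse(href)
    if parsed.scheme in ("http", "https") or href.startswith("//"):
        return "external"
    if parsed.scheme:
        return "other"  # e.g. mailto: or tel:, not checked in this sketch
    return "internal"


def broken_fragments(html):
    """Return in-page links whose target id is missing from the document."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        href
        for href in collector.hrefs
        if classify(href) == "fragment" and href[1:] not in collector.anchors
    ]


sample = '<h1 id="top">Title</h1><a href="#top">up</a><a href="#missing">x</a>'
print(broken_fragments(sample))  # ['#missing']
```

Internal targets would then be checked against the file system and external targets with an HTTP request; both steps are left out here to keep the sketch short.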

Images

img elements: pyhtmlproofer checks:

  • If the internal image references are valid
  • If the external image references are valid

Scripts

script elements: pyhtmlproofer checks:

  • If the internal script references are valid
  • If the external script references are reachable
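
For internal script and image references, "valid" can be read as "resolves to a file that exists on disk". The sketch below shows one way to do that resolution with `pathlib`. It is an illustration under assumptions, not the library's actual logic: `resolve_internal`, `missing_references` and the `site_root` parameter are hypothetical, and root-relative references (starting with `/`) are assumed to be relative to the site's output directory.

```python
import tempfile
from pathlib import Path
from urllib.parse import unquote, urlparse


def resolve_internal(ref, source_file, site_root):
    """Map an internal reference to the path it would name on disk."""
    # Keep only the path part, dropping any ?query or #fragment.
    path = unquote(urlparse(ref).path)
    if path.startswith("/"):
        # Root-relative: resolve against the site root (assumption).
        return Path(site_root) / path.lstrip("/")
    # Otherwise resolve relative to the directory of the referencing file.
    return Path(source_file).parent / path


def missing_references(refs, source_file, site_root):
    """Return the references that do not resolve to an existing file."""
    return [
        ref
        for ref in refs
        if not resolve_internal(ref, source_file, site_root).is_file()
    ]


# Build a tiny throwaway site to exercise the helpers.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "js").mkdir()
    (root / "js" / "app.js").write_text("")
    page = root / "index.html"
    page.write_text("")
    refs = ["js/app.js", "/js/app.js?v=2", "img/logo.png"]
    result = missing_references(refs, page, root)

print(result)  # ['img/logo.png']
```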

Usage

a) To check a file:

import pyhtmlproofer as proofer
file = "path/to/file1.html"
proofer.file(file).check()

b) To check directories:

import pyhtmlproofer as proofer
directory_paths = ["path/to/1/file.html", "path/to/2/file.html"]
proofer.directories(directory_paths).check()

c) To validate URL(s):

import pyhtmlproofer as proofer
links = ["https://example.com", "https://cloudbytes.dev"]
proofer.links(links).check()

CLI

There is also a CLI that can be used:

$ pyhtmlproofer check -F <file_name>

Available Config Options

PROOFER_DEFAULTS = {
    "assume_extension": ".html",
    "directory_index_file": "index.html",
    "disable_external": False,
    "ignore_files": [],
    "ignore_urls": [],
    "enforce_https": True,
    "extensions": [".html"],
    "log_level": "ERROR",
    "report_to_file": True,
    "report_filename": "proofer_report",
}

You can override the default configuration options by passing a dictionary of options.

import pyhtmlproofer as proofer

options = {"log_level": "ERROR", "disable_external": True}
directory_paths = ["path/to/1/file.html", "path/to/2/file.html"]

proofer.directories(directory_paths, options=options).check()
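
One plausible reading of "override" is that the passed dictionary is layered on top of the defaults, so any key you leave out keeps its default value. The sketch below illustrates that behavior with a plain dictionary merge. `resolve_options` is a hypothetical helper, and whether pyhtmlproofer merges options exactly this way (or rejects unknown keys) is an assumption.

```python
PROOFER_DEFAULTS = {
    "assume_extension": ".html",
    "directory_index_file": "index.html",
    "disable_external": False,
    "ignore_files": [],
    "ignore_urls": [],
    "enforce_https": True,
    "extensions": [".html"],
    "log_level": "ERROR",
    "report_to_file": True,
    "report_filename": "proofer_report",
}


def resolve_options(options=None):
    """Return the defaults with user-supplied options layered on top."""
    # Later keys win in a dict merge, so user values replace defaults.
    # The defaults dictionary itself is left unmodified.
    return {**PROOFER_DEFAULTS, **(options or {})}


merged = resolve_options({"log_level": "ERROR", "disable_external": True})
print(merged["disable_external"])  # True (overridden)
print(merged["enforce_https"])     # True (default kept)
```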

Credits

pyhtmlproofer was inspired by the Ruby-based HTMLProofer and the lack of Python-based alternatives. It is not a Python rewrite of HTMLProofer, however; it focuses on solving problems that I encountered while maintaining the CloudBytes/Dev> website.

