pyHtmlProofer - A tool for validating internal & external links in HTML files / Websites

pyhtmlproofer

Check websites and static HTML pages for link rot.

Features

pyhtmlproofer can be used on:

  1. Static HTML pages (typically generated by an SSG). You can specify either files or directories to check.
  2. Web pages. You can specify a URL to check.

At the moment, pyhtmlproofer does the following:

  1. Checks for broken internal links in HTML files
  2. Checks whether external links in HTML files or on a website are valid
  3. Checks script and stylesheet references in HTML files
  4. Checks image references in HTML files

You can read more details in the What's tested? section below.

Roadmap

The following features are under development:

  1. Check for images and alt-text in HTML files
  2. Check Favicons
  3. Check optimal SEO meta tags
  4. Caching results
  5. Config file

Installation

Install pyhtmlproofer with pip:

pip install pyhtmlproofer

What's tested?

You can configure pyhtmlproofer to check:

  • a file
  • a directory or list of directories
  • a URL / Link
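As a rough illustration of what checking "a file, a directory or list of directories" involves, the sketch below gathers candidate HTML files from a mix of file and directory paths. It is not pyhtmlproofer's own implementation: the function name `collect_html_files` is hypothetical, and the `extensions` parameter only mirrors the `extensions` config option under the assumption that it filters by file suffix.

```python
from pathlib import Path


def collect_html_files(paths, extensions=(".html",)):
    """Return a sorted list of files under `paths` whose suffix is in `extensions`.

    Hypothetical helper for illustration; not part of pyhtmlproofer's API.
    """
    found = set()
    for raw in paths:
        path = Path(raw)
        if path.is_file():
            # A single file is accepted only if its suffix matches.
            if path.suffix.lower() in extensions:
                found.add(path)
        elif path.is_dir():
            # Directories are walked recursively.
            for candidate in path.rglob("*"):
                if candidate.is_file() and candidate.suffix.lower() in extensions:
                    found.add(candidate)
    return sorted(found)
```

Paths that do not exist are silently skipped in this sketch; a real checker would probably report them.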

Links / Hyperlinks

a, link elements: pyhtmlproofer checks:

  • If the internal links are valid
  • If the internal references (#in-page-links) are valid
  • If the external links are valid
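The three kinds of link listed above can be told apart from the `href` value alone. The following is a minimal sketch using only the standard library; the function name `classify_link` and the category labels are assumptions made for illustration, and pyhtmlproofer may classify links differently.

```python
from urllib.parse import urlsplit


def classify_link(href):
    """Classify an href as 'external', 'in-page', 'internal', or 'other'.

    Hypothetical helper for illustration; not part of pyhtmlproofer's API.
    """
    parts = urlsplit(href)
    # http(s) URLs and protocol-relative URLs (//host/path) point off-site.
    if parts.scheme in ("http", "https") or (not parts.scheme and parts.netloc):
        return "external"
    # Any other scheme (mailto:, tel:, javascript:) is not a page link.
    if parts.scheme:
        return "other"
    # A bare fragment such as "#section" refers to the current page.
    if not parts.path and parts.fragment:
        return "in-page"
    # Everything else is a relative or root-relative path on the same site.
    return "internal"
```

An internal link such as `docs/page.html#intro` carries both a path and a fragment, so a full checker would need to verify the target file and then the anchor inside it.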

Images

img elements: pyhtmlproofer checks:

  • If the internal image references are valid
  • If the external image references are valid

Scripts

script elements: pyhtmlproofer checks:

  • If the internal script references are valid
  • If the external script references are reachable
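To make the element list above concrete, here is a small standard-library sketch that collects the references carried by `a`, `link`, `img`, and `script` elements and reports in-page links whose target anchor is missing. The names `ReferenceCollector` and `broken_in_page_links` are hypothetical, and the approach (Python's `html.parser`) is an assumption chosen for the example, not a description of how pyhtmlproofer parses HTML.

```python
from html.parser import HTMLParser

# Attribute that carries the reference for each element type.
REFERENCE_ATTRS = {"a": "href", "link": "href", "img": "src", "script": "src"}


class ReferenceCollector(HTMLParser):
    """Collect (tag, url) references and the anchor names defined on a page."""

    def __init__(self):
        super().__init__()
        self.references = []  # (tag, url) pairs in document order
        self.anchors = set()  # values of id attributes and <a name="...">

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.anchors.add(attributes["id"])
        if tag == "a" and attributes.get("name"):
            self.anchors.add(attributes["name"])
        attr = REFERENCE_ATTRS.get(tag)
        if attr and attributes.get(attr):
            self.references.append((tag, attributes[attr]))


def broken_in_page_links(html):
    """Return '#fragment' references that match no anchor in the same document."""
    collector = ReferenceCollector()
    collector.feed(html)
    collector.close()
    return [
        url
        for _, url in collector.references
        if url.startswith("#") and len(url) > 1 and url[1:] not in collector.anchors
    ]
```

Checking the remaining references would mean resolving internal paths against the file system and requesting external URLs over the network, which this sketch leaves out.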

Usage

a) To check a file:

import pyhtmlproofer as proofer
file = "path/to/file1.html"
proofer.file(file).check()

b) To check directories:

import pyhtmlproofer as proofer
directory_paths = ["path/to/dir1", "path/to/dir2"]
proofer.directories(directory_paths).check()

c) To validate URL(s):

import pyhtmlproofer as proofer
links = ["https://example.com", "https://cloudbytes.dev"]
proofer.links(links).check()

CLI

A CLI is also available. For example, to check a single file:

$ pyhtmlproofer check -F <file_name>

Available Config Options

PROOFER_DEFAULTS = {
    "assume_extension": ".html",
    "directory_index_file": "index.html",
    "disable_external": False,
    "ignore_files": [],
    "ignore_urls": [],
    "enforce_https": True,
    "extensions": [".html"],
    "log_level": "ERROR",
    "report_to_file": True,
    "report_filename": "proofer_report",
}

You can override the default configuration options by passing a dictionary of options.

import pyhtmlproofer as proofer

options = {"log_level": "ERROR", "disable_external": True}
directory_paths = ["path/to/dir1", "path/to/dir2"]

proofer.directories(directory_paths, options=options).check()
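One plausible reading of "override" is that the options you pass are layered over the defaults, so any key you omit keeps its default value. The sketch below shows that merge with a plain dictionary; `merge_options` is a hypothetical name, and this is an assumption about the behavior rather than pyhtmlproofer's actual code.

```python
# Defaults copied from the README's "Available Config Options" section.
PROOFER_DEFAULTS = {
    "assume_extension": ".html",
    "directory_index_file": "index.html",
    "disable_external": False,
    "ignore_files": [],
    "ignore_urls": [],
    "enforce_https": True,
    "extensions": [".html"],
    "log_level": "ERROR",
    "report_to_file": True,
    "report_filename": "proofer_report",
}


def merge_options(options=None):
    """Return a new dict with user options layered over the defaults.

    Hypothetical helper for illustration; the defaults dict is left unmodified.
    """
    return {**PROOFER_DEFAULTS, **(options or {})}


merged = merge_options({"log_level": "ERROR", "disable_external": True})
print(merged["disable_external"])  # overridden by the caller
print(merged["enforce_https"])     # falls back to the default
```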

Credits

pyhtmlproofer was inspired by the Ruby-based HTMLProofer and by the lack of Python-based alternatives. However, it is not a Python rewrite; instead, it focuses on solving problems that I encountered while maintaining the CloudBytes/Dev> website.
