
Fast async broken-link checker for Markdown, HTML, and websites.


linkchecker-py


linkchecker-py is a fast async CLI for finding broken links in Markdown files, HTML files, and small to medium websites. It exists for documentation maintainers who want deterministic local checks, clean CI failures, and reports that can be attached to pull requests.


Project Status

This project is early but actively maintained. The core CLI, parser, checker, crawler, reports, tests, and CI are in place, but the project should still be treated as pre-1.0 while configuration, release automation, and broader compatibility work mature.

The package metadata is prepared for publishing, but this README does not claim PyPI availability until the package is actually published.

GitHub Metadata

Suggested repository description:

Async Python CLI for finding broken links in Markdown, HTML, and small websites.

Recommended GitHub topics:

link-checker, markdown, html, cli, python, documentation, ci, httpx, rich

Highlights

  • Checks links in Markdown and HTML files (the tool itself ships as a src-layout Python package).
  • Crawls same-origin websites with a configurable depth limit.
  • Validates HTTP status codes and URL fragments such as #install.
  • Checks local file links and generated Markdown heading anchors.
  • Excludes noisy links with glob patterns.
  • Controls concurrency and rate limiting for polite checks.
  • Respects robots.txt by default.
  • Caches remote results between runs.
  • Prints Rich terminal tables and writes JSON or Markdown reports.
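Fragment validation works like a lookup. Parse the target page, collect every element `id` (plus legacy `<a name>` targets), and check whether the URL's `#fragment` is in that set. The sketch below does this with the standard library's `html.parser`. It is an illustration under that assumption, not linkchecker-py's actual code, and `AnchorCollector` and `fragment_exists` are hypothetical names.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag

# Hypothetical helpers: a sketch of fragment checking, not linkchecker-py's implementation.


class AnchorCollector(HTMLParser):
    """Collect values a #fragment can point at: any id, plus <a name="...">."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors: set[str] = set()

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        for key, value in attrs:
            if value is None:
                continue  # attribute present without a value, e.g. <div id>
            if key == "id" or (tag == "a" and key == "name"):
                self.anchors.add(value)


def fragment_exists(url: str, html: str) -> bool:
    """Return True when url has no fragment or its fragment names an anchor in html."""
    _, fragment = urldefrag(url)
    if not fragment:
        return True
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return unquote(fragment) in collector.anchors


page = '<h2 id="install">Install</h2><p><a name="usage"></a></p>'
print(fragment_exists("https://example.com/docs#install", page))  # True
print(fragment_exists("https://example.com/docs#missing", page))  # False
```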

Install

From source

git clone https://github.com/jannis793/linkchecker-py.git
cd linkchecker-py
python -m venv .venv
. .venv/bin/activate
python -m pip install -e .

Development install

python -m pip install -e ".[dev]"

Once the package is published, the intended CLI install path will be:

pipx install linkchecker-py

Quickstart

Check the README and docs in this repository:

linkchecker-py files README.md docs/

Write a Markdown report:

linkchecker-py files README.md docs/ --report link-report.md

Write a JSON report for CI artifacts:

linkchecker-py files README.md docs/ --report link-report.json

Crawl a website up to depth 2:

linkchecker-py site https://example.com --depth 2
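A depth-limited, same-origin crawl is essentially a breadth-first search. It stops expanding pages once they are `depth` hops from the start URL, and it never follows links to a different scheme or host. The sketch below assumes that meaning of `--depth`. The functions `crawl` and `same_origin` are hypothetical, and `get_links` stands in for fetching a page and extracting its links, so the example runs without network access.

```python
from __future__ import annotations

from collections import deque
from collections.abc import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlsplit


def same_origin(a: str, b: str) -> bool:
    """Treat URLs as same-origin when scheme and host[:port] both match."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.netloc) == (pb.scheme, pb.netloc)


def crawl(start: str, get_links: Callable[[str], Iterable[str]], depth: int) -> list[str]:
    """Hypothetical breadth-first crawl that records pages up to `depth` hops from start."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        url, level = queue.popleft()
        if level >= depth:
            continue  # pages at the depth limit are recorded but not expanded
        for href in get_links(url):
            # Resolve relative links and drop the #fragment before comparing.
            target, _ = urldefrag(urljoin(url, href))
            if same_origin(start, target) and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, level + 1))
    return order


# A fake three-page site so the example runs offline.
site = {
    "https://example.com/": ["/a", "https://other.org/x"],
    "https://example.com/a": ["b#top"],
    "https://example.com/b": ["/c"],
}
print(crawl("https://example.com/", lambda u: site.get(u, []), depth=2))
# ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

Here `https://other.org/x` is never queued for crawling. Whether the real tool still checks the status of such external links is outside what this sketch shows.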

Try It Locally

The examples directory contains small Markdown and HTML fixtures. This command is expected to fail with exit code 1 because the fixture includes one intentionally missing local file:

linkchecker-py files examples/site --report examples/link-report.md

Run a passing example by excluding that intentional broken link:

linkchecker-py files examples/site \
  --exclude "missing.md" \
  --report examples/link-report.md

The generated report is local output and is not committed.

Common Options

Skip links that are rate-limited, private, or intentionally local:

linkchecker-py files docs/ --exclude "https://localhost/*" --exclude "*/private/*"
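Glob-style exclusion can be approximated with the standard library's `fnmatch`, where `*` matches any run of characters, including `/`. This README does not say whether linkchecker-py matches patterns against the raw link text or the resolved URL. The `is_excluded` helper below is therefore a hypothetical sketch that matches against whatever string it is given.

```python
from fnmatch import fnmatchcase


def is_excluded(link: str, patterns: list[str]) -> bool:
    """Hypothetical helper: True if link matches any glob pattern.

    fnmatchcase is case-sensitive on every platform. In these globs "*" also
    matches "/", so "*/private/*" matches a private/ segment at any depth.
    """
    return any(fnmatchcase(link, pattern) for pattern in patterns)


patterns = ["https://localhost/*", "*/private/*"]
print(is_excluded("https://localhost/api/health", patterns))  # True
print(is_excluded("docs/private/notes.md", patterns))          # True
print(is_excluded("https://example.com/docs/", patterns))      # False
```

Scheme and port are part of the matched string, so `https://localhost/*` does not match `http://localhost:8000/health` under these rules.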

Lower concurrency and add request pacing for remote checks:

linkchecker-py site https://example.com --depth 1 --concurrency 4 --rate-limit 1
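Concurrency and pacing are separate controls. A semaphore caps how many checks are in flight at once, and a pacer spaces out when new requests may start. The sketch below assumes `--rate-limit N` means roughly N request starts per second. That is an interpretation, not documented behavior. `Pacer` and `check_all` are hypothetical names, and `fake_check` stands in for an HTTP request.

```python
import asyncio
import time


class Pacer:
    """Let at most `rate` requests start per second by handing out time slots."""

    def __init__(self, rate: float) -> None:
        self.interval = 1.0 / rate
        self.next_slot = 0.0
        self.lock = asyncio.Lock()

    async def wait(self) -> None:
        async with self.lock:  # reserve the next free slot atomically
            now = time.monotonic()
            start = max(now, self.next_slot)
            self.next_slot = start + self.interval
        await asyncio.sleep(start - now)  # zero when the slot is already open


async def check_all(urls, check, concurrency: int, rate: float):
    semaphore = asyncio.Semaphore(concurrency)  # caps in-flight checks
    pacer = Pacer(rate)

    async def guarded(url):
        async with semaphore:
            await pacer.wait()
            return await check(url)

    # gather returns results in the same order as urls
    return await asyncio.gather(*(guarded(url) for url in urls))


async def fake_check(url):
    await asyncio.sleep(0.01)  # pretend network latency
    return url, 200


urls = [f"https://example.com/page/{i}" for i in range(5)]
results = asyncio.run(check_all(urls, fake_check, concurrency=2, rate=50))
print(results[0])  # ('https://example.com/page/0', 200)
```

Pacing inside the semaphore means a waiting task holds its slot. Pacing before acquiring the semaphore is an equally reasonable design.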

Use cached remote results:

linkchecker-py files docs/ --cache
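Caching with expiry amounts to storing each URL's outcome with a timestamp and ignoring entries older than the TTL. The Limitations section gives one hour as the default. The sketch below uses a JSON file at a caller-supplied path. The file layout and the `load_cached` and `store_result` names are hypothetical, and the real cache lives in the current user's cache directory.

```python
from __future__ import annotations

import json
import tempfile
import time
from pathlib import Path

TTL_SECONDS = 3600  # one hour, the default expiry described under Limitations


def _read(cache_file: Path) -> dict:
    return json.loads(cache_file.read_text()) if cache_file.exists() else {}


def store_result(cache_file: Path, url: str, status: int, now: float | None = None) -> None:
    """Hypothetical: record a URL's status code with the time it was checked."""
    entries = _read(cache_file)
    entries[url] = {"status": status, "checked_at": time.time() if now is None else now}
    cache_file.write_text(json.dumps(entries))


def load_cached(cache_file: Path, url: str, now: float | None = None) -> int | None:
    """Hypothetical: return the cached status, or None when missing or expired."""
    entry = _read(cache_file).get(url)
    now = time.time() if now is None else now
    if entry is None or now - entry["checked_at"] > TTL_SECONDS:
        return None
    return entry["status"]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "link-cache.json"  # hypothetical file name
    store_result(cache, "https://example.com/", 200, now=1_000.0)
    fresh = load_cached(cache, "https://example.com/", now=1_000.0 + 60)
    stale = load_cached(cache, "https://example.com/", now=1_000.0 + 7_200)

print(fresh, stale)  # 200 None
```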

Skip robots.txt checks for private staging sites you own:

linkchecker-py site https://staging.example.com --no-robots
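The robots.txt check that --no-robots turns off can be illustrated with the standard library's `urllib.robotparser`. The sketch parses an inline robots.txt so it needs no network access. linkchecker-py may use a different parser, and the user-agent string here is an assumption.

```python
from urllib.robotparser import RobotFileParser

# An inline robots.txt so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

USER_AGENT = "linkchecker-py"  # assumed user-agent string, for illustration only
allowed = parser.can_fetch(USER_AGENT, "https://staging.example.com/docs/")
blocked = parser.can_fetch(USER_AGENT, "https://staging.example.com/private/report.html")
print(allowed, blocked)  # True False
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` fetches the real file instead of parsing an inline string.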

There is no project-level config file yet. Keep options explicit in scripts or CI commands:

linkchecker-py files README.md docs/ \
  --exclude "https://localhost/*" \
  --exclude "*/private/*" \
  --concurrency 8 \
  --rate-limit 2 \
  --cache \
  --report link-report.md

Output and Exit Codes

Terminal output is a Rich table with status, URL, status code, source, and message columns. JSON reports contain a summary plus one row per checked link; the links array is left empty in this abbreviated example:

{
  "summary": {
    "broken": 1,
    "ok": 1,
    "skipped": 0,
    "total": 2,
    "unknown": 0
  },
  "links": []
}

Exit codes are designed for CI:

  • 0: all checked links are OK, skipped, or unknown.
  • 1: at least one checked link is broken.
  • 2: the command could not run as requested, such as when the files subcommand finds no supported Markdown or HTML files.
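Because the JSON report carries the summary counts, a CI step can post-process it, for example to print a one-line summary. The sketch below reads the abbreviated example report and derives the same 0-or-1 outcome that the exit-code rules describe. Exit code 2 is not represented, since it concerns whether the command could run at all. `summarize` is a hypothetical helper.

```python
import json

# The abbreviated report from the JSON example.
REPORT = """
{"summary": {"broken": 1, "ok": 1, "skipped": 0, "total": 2, "unknown": 0}, "links": []}
"""


def summarize(report_text: str) -> tuple[str, int]:
    """Hypothetical: return a one-line summary and the implied 0/1 outcome."""
    summary = json.loads(report_text)["summary"]
    keys = ("total", "ok", "broken", "skipped", "unknown")
    line = ", ".join(f"{key}={summary[key]}" for key in keys)
    return line, 1 if summary["broken"] > 0 else 0


line, code = summarize(REPORT)
print(line)  # total=2, ok=1, broken=1, skipped=0, unknown=0
print(code)  # 1
```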

CI Usage

Source checkout workflow:

- uses: actions/checkout@v4
- uses: actions/setup-python@v5
  with:
    python-version: "3.12"
- run: python -m pip install -e .
- run: linkchecker-py files README.md docs/ --report link-report.md

Upload the report even when broken links fail the job:

- name: Check documentation links
  run: linkchecker-py files README.md docs/ --report link-report.md
- uses: actions/upload-artifact@v4
  if: always()
  with:
    name: link-report
    path: link-report.md

The repository's own CI runs ruff check . and pytest on Python 3.10, 3.11, 3.12, and 3.13.

A complete workflow another repository can adapt is available at examples/github-actions-link-check.yml.

Development

python -m pip install -e ".[dev]"
ruff check .
pytest
python -m build

Release steps are documented in docs/RELEASE.md. The current tag is v0.1.3; the next patch release would normally be v0.1.4 if the changes are documentation or bug fixes.

Troubleshooting

  • If a URL is reported as blocked by robots.txt, either accept the skip or, for sites you control, re-run with --no-robots.
  • If a site rate-limits requests, lower --concurrency and set --rate-limit.
  • If local file links are skipped as outside the root, run the command from the documentation root or pass all relevant files/directories together.
  • If generated documentation uses custom heading IDs, prefer explicit HTML anchors or link to those IDs directly.
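The "outside the root" case can be pictured as resolving each relative link against the directory of the file that contains it, then checking whether the result still lies under the root. The sketch below assumes that rule. `resolve_local_link` is hypothetical and not linkchecker-py's implementation.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_local_link(source_file: Path, href: str, root: Path) -> Path | None:
    """Hypothetical helper: resolve a relative link from the file that contains it.

    Returns None when the target falls outside root.
    """
    path_part = href.split("#", 1)[0]  # ignore any #fragment
    if not path_part:
        return source_file.resolve()  # "#heading" points into the same file
    target = (source_file.parent / path_part).resolve()
    if not target.is_relative_to(root.resolve()):  # Path.is_relative_to needs Python 3.9+
        return None
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "docs"
    (root / "guide").mkdir(parents=True)
    source = root / "guide" / "intro.md"
    source.write_text("[Home](../index.md#top) [Readme](../../README.md)\n")

    inside = resolve_local_link(source, "../index.md#top", root)
    outside = resolve_local_link(source, "../../README.md", root)
    expected_inside = (root / "index.md").resolve()

print(inside == expected_inside, outside)  # True None
```

Under this rule, a link to `../../README.md` only counts as inside when the root passed in is high enough to contain it. That is why running from the documentation root, or passing all relevant files together, can help.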

Limitations

  • Website crawling is intended for small to medium sites, not exhaustive internet-scale crawls.
  • JavaScript-rendered links are not executed in a browser.
  • Markdown heading anchors follow common GitHub-style slug behavior; documentation systems with custom slug rules can differ.
  • Cache entries are stored in the current user's cache directory and expire after one hour by default.
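GitHub-style heading anchors roughly follow this rule: lowercase the text, drop punctuation other than hyphens and underscores, turn each space into a hyphen, and suffix repeats with -1, -2, and so on. The sketch below approximates that rule. It does not cover every edge case, such as a heading whose own slug already ends in -1, and `github_style_slug` and `unique_slugs` are hypothetical names.

```python
import re
from collections import Counter


def github_style_slug(heading: str) -> str:
    """Approximate a GitHub heading anchor: lowercase, drop punctuation, spaces to hyphens."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # keep letters, digits, underscores, hyphens, spaces
    return slug.replace(" ", "-")


def unique_slugs(headings: list[str]) -> list[str]:
    """Suffix repeated slugs with -1, -2, ... in document order."""
    counts: Counter = Counter()
    result = []
    for heading in headings:
        base = github_style_slug(heading)
        n = counts[base]
        result.append(base if n == 0 else f"{base}-{n}")
        counts[base] += 1
    return result


print(unique_slugs(["Output and Exit Codes", "What's new?", "File details", "File details"]))
# ['output-and-exit-codes', 'whats-new', 'file-details', 'file-details-1']
```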

Roadmap

See ROADMAP.md for scoped near-term improvements and suggested starter issues.

Contributing

Bug reports, focused feature requests, and pull requests are welcome. See CONTRIBUTING.md for setup, testing, and review expectations. Please report security issues through SECURITY.md.

Changelog

Release notes are tracked in CHANGELOG.md.

License

MIT. See LICENSE.


Download files


Source Distribution

linkchecker_py-0.1.3.tar.gz (24.5 kB)

Uploaded Source

Built Distribution


linkchecker_py-0.1.3-py3-none-any.whl (15.0 kB)

Uploaded Python 3

File details

Details for the file linkchecker_py-0.1.3.tar.gz.

File metadata

  • Download URL: linkchecker_py-0.1.3.tar.gz
  • Upload date:
  • Size: 24.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for linkchecker_py-0.1.3.tar.gz
  • SHA256: 0d7d4faf48147ef4042f2ed901114531bd5dd51b0884d153ef6f6f419cc1edbf
  • MD5: 8eface17e09a23765792a136ed9806cc
  • BLAKE2b-256: dad21994c18d8e2915e8747de18781ad264a0a004ec3587fc06bfdfabb4ffea4


Provenance

The following attestation bundles were made for linkchecker_py-0.1.3.tar.gz:

Publisher: publish.yml on jannis793/linkchecker-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file linkchecker_py-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: linkchecker_py-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 15.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for linkchecker_py-0.1.3-py3-none-any.whl
  • SHA256: 456e289b8e855c2663c26a78f08e49d75f2ea26c3cf183a9f92b72d1d69f4ad1
  • MD5: 470044ee7aa237e6bc9e852c875eafa8
  • BLAKE2b-256: f156c65a8ad587c4689c8ed7554a87601d65479618c6866f20b4611f1db4f15f


Provenance

The following attestation bundles were made for linkchecker_py-0.1.3-py3-none-any.whl:

Publisher: publish.yml on jannis793/linkchecker-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
