Fast async broken-link checker for Markdown, HTML, and websites.
linkchecker-py
linkchecker-py is a fast async CLI for finding broken links in Markdown files, HTML files, and small to medium websites. It exists for documentation maintainers who want deterministic local checks, clean CI failures, and reports that can be attached to pull requests.
Project Status
This project is early but actively maintained. The core CLI, parser, checker, crawler, reports, tests, and CI are in place, but the project should still be treated as pre-1.0 while configuration, release automation, and broader compatibility work mature.
The package metadata is prepared for publishing, but this README does not claim PyPI availability until the package is actually published.
GitHub Metadata
Suggested repository description:
Async Python CLI for finding broken links in Markdown, HTML, and small websites.
Recommended GitHub topics:
link-checker, markdown, html, cli, python, documentation, ci, httpx, rich
Highlights
- Checks Markdown and HTML files from a src layout Python package.
- Crawls same-origin websites with a configurable depth limit.
- Validates HTTP status codes and URL fragments such as #install.
- Checks local file links and generated Markdown heading anchors.
- Excludes noisy links with glob patterns.
- Controls concurrency and rate limiting for polite checks.
- Respects robots.txt by default.
- Caches remote results between runs.
- Prints Rich terminal tables and writes JSON or Markdown reports.
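To make the fragment check in the list above concrete, here is a minimal sketch of how a fragment such as #install could be validated against an HTML page using only the standard library. The names AnchorCollector and fragment_exists are hypothetical and are not taken from this project; the real checker may work differently.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect id attribute values and <a name="..."> targets (illustrative helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Any element id is a valid target; <a name="..."> is the legacy form.
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.anchors.add(value)


def fragment_exists(html: str, fragment: str) -> bool:
    """Return True if `fragment` (without the leading '#') names an anchor in `html`."""
    if not fragment:
        return True  # an empty fragment refers to the top of the page
    collector = AnchorCollector()
    collector.feed(html)
    return fragment in collector.anchors


page = '<h2 id="install">Install</h2><a name="usage"></a>'
print(fragment_exists(page, "install"))
print(fragment_exists(page, "missing"))
```

A link to page.html#install would pass this check, while page.html#missing would be reported even though the page itself loads.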
Install
From source
git clone https://github.com/jannis793/linkchecker-py.git
cd linkchecker-py
python -m venv .venv
. .venv/bin/activate
python -m pip install -e .
Development install
python -m pip install -e ".[dev]"
Once the package is published, the intended CLI install path will be:
pipx install linkchecker-py
Quickstart
Check the README and docs in this repository:
linkchecker-py files README.md docs/
Write a Markdown report:
linkchecker-py files README.md docs/ --report link-report.md
Write a JSON report for CI artifacts:
linkchecker-py files README.md docs/ --report link-report.json
Crawl a website up to depth 2:
linkchecker-py site https://example.com --depth 2
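As an illustration of what a depth-limited, same-origin crawl means, the sketch below walks a small in-memory link graph breadth-first. The PAGES mapping, the crawl function, and the choice to count the start page as depth 0 are assumptions made for this example; they are not the project's implementation.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlsplit

# Hypothetical in-memory "site": page URL -> links found on that page.
PAGES = {
    "https://example.com/": ["/docs/", "https://other.example.org/"],
    "https://example.com/docs/": ["/docs/install", "#top"],
    "https://example.com/docs/install": ["/docs/deep"],
    "https://example.com/docs/deep": [],
}


def same_origin(a: str, b: str) -> bool:
    """Two URLs share an origin when scheme and host:port match."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.netloc) == (pb.scheme, pb.netloc)


def crawl(start: str, max_depth: int) -> list:
    """Breadth-first crawl that never follows links past max_depth."""
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # visit this page but do not follow its links
        for href in PAGES.get(url, []):
            # Resolve relative links and drop the fragment before deduplicating.
            target, _fragment = urldefrag(urljoin(url, href))
            if same_origin(start, target) and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


print(crawl("https://example.com/", 2))
```

In this toy graph, /docs/deep sits at depth 3 and the other.example.org link is cross-origin, so neither is visited.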
Try It Locally
The examples directory contains small Markdown and HTML fixtures. This command is expected to fail with exit code 1 because the fixture includes one intentionally missing local file:
linkchecker-py files examples/site --report examples/link-report.md
Run a passing example by excluding that intentional broken link:
linkchecker-py files examples/site \
  --exclude "missing.md" \
  --report examples/link-report.md
The generated report is local output and is not committed.
Common Options
Skip links that are rate-limited, private, or intentionally local:
linkchecker-py files docs/ --exclude "https://localhost/*" --exclude "*/private/*"
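To get a feel for how glob patterns like these behave, the sketch below uses Python's standard fnmatch module. Whether linkchecker-py matches patterns in exactly this way is an assumption; is_excluded is a hypothetical helper, not part of the CLI.

```python
from fnmatch import fnmatchcase


def is_excluded(url: str, patterns: list) -> bool:
    """Return True if the URL matches any glob pattern (case-sensitive)."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


patterns = ["https://localhost/*", "*/private/*"]

for url in (
    "https://localhost/admin",
    "https://example.com/private/notes.html",
    "https://example.com/docs/",
):
    print(url, is_excluded(url, patterns))
```

Note that in fnmatch-style globbing, `*` also matches `/`, so `*/private/*` matches a private path segment at any depth.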
Lower concurrency and add request pacing for remote checks:
linkchecker-py site https://example.com --depth 1 --concurrency 4 --rate-limit 1
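The two flags address different things: concurrency caps how many checks are in flight, while rate limiting spaces out when they start. The sketch below shows one common way to combine them with asyncio. It assumes the rate is expressed in requests per second, which this README does not state, and Pacer and check_all are hypothetical names; the HTTP request is replaced by a short sleep.

```python
import asyncio
import time


class Pacer:
    """Space out request starts to at most `per_second` per second (illustrative)."""

    def __init__(self, per_second: float) -> None:
        self._interval = 1.0 / per_second
        self._next_start = 0.0
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        # Reserve the next free start slot under the lock, then sleep outside it.
        async with self._lock:
            now = time.monotonic()
            start_at = max(now, self._next_start)
            self._next_start = start_at + self._interval
        delay = start_at - now
        if delay > 0:
            await asyncio.sleep(delay)


async def check_all(urls, concurrency: int, per_second: float):
    limiter = asyncio.Semaphore(concurrency)  # caps in-flight checks
    pacer = Pacer(per_second)
    active = 0
    peak = 0

    async def check(url):
        nonlocal active, peak
        async with limiter:
            await pacer.wait()
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the real HTTP request
            active -= 1
            return url, "ok"

    results = await asyncio.gather(*(check(u) for u in urls))
    return results, peak


urls = [f"https://example.com/page{i}" for i in range(6)]
started = time.monotonic()
results, peak = asyncio.run(check_all(urls, concurrency=2, per_second=100))
elapsed = time.monotonic() - started
print(len(results), "checked; peak concurrency", peak)
```

Lowering concurrency reduces load on the remote server at any instant; lowering the rate reduces load over time, which is usually what triggers rate-limit responses.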
Use cached remote results:
linkchecker-py files docs/ --cache
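The Limitations section notes that cache entries expire after one hour by default. As a rough mental model, a time-to-live cache for remote results could look like the sketch below. The ResultCache class, its JSON file layout, and the field names are hypothetical and do not describe the project's actual cache format.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional


class ResultCache:
    """Tiny JSON-file cache keyed by URL, with a time-to-live in seconds."""

    def __init__(self, path: Path, ttl: float = 3600.0) -> None:
        self.path = path
        self.ttl = ttl
        self.entries = json.loads(path.read_text()) if path.exists() else {}

    def get(self, url: str, now: Optional[float] = None) -> Optional[int]:
        now = time.time() if now is None else now
        entry = self.entries.get(url)
        if entry is None or now - entry["stored_at"] > self.ttl:
            return None  # missing or expired: the caller should re-check the URL
        return entry["status"]

    def put(self, url: str, status: int, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.entries[url] = {"status": status, "stored_at": now}
        self.path.write_text(json.dumps(self.entries))


with tempfile.TemporaryDirectory() as tmp:
    cache = ResultCache(Path(tmp) / "cache.json")
    cache.put("https://example.com/", 200, now=0.0)
    fresh = cache.get("https://example.com/", now=1800.0)    # within one hour
    expired = cache.get("https://example.com/", now=3601.0)  # past one hour
    reloaded = ResultCache(Path(tmp) / "cache.json").get("https://example.com/", now=10.0)
    print(fresh, expired, reloaded)
```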
Skip robots.txt checks for private staging sites you own:
linkchecker-py site https://staging.example.com --no-robots
There is no project-level config file yet. Keep options explicit in scripts or CI commands:
linkchecker-py files README.md docs/ \
  --exclude "https://localhost/*" \
  --exclude "*/private/*" \
  --concurrency 8 \
  --rate-limit 2 \
  --cache \
  --report link-report.md
Output and Exit Codes
Terminal output is a Rich table with status, URL, status code, source, and message. JSON reports contain a summary plus a row per checked link:
{
  "summary": {
    "broken": 1,
    "ok": 1,
    "skipped": 0,
    "total": 2,
    "unknown": 0
  },
  "links": []
}
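A JSON report like the one above is easy to post-process, for example to print a one-line summary in a CI log. The sketch below relies only on the summary keys shown in the sample; the structure of individual rows in links is not documented here, so the code does not touch it. The summarize function is a hypothetical helper.

```python
import json

# Sample report text copied from the structure shown above.
SAMPLE = """
{
  "summary": {"broken": 1, "ok": 1, "skipped": 0, "total": 2, "unknown": 0},
  "links": []
}
"""


def summarize(report_text: str) -> str:
    """Format the summary counts of a JSON report as a single line."""
    summary = json.loads(report_text)["summary"]
    keys = ("total", "ok", "broken", "skipped", "unknown")
    return ", ".join(f"{key}={summary[key]}" for key in keys)


print(summarize(SAMPLE))
```

In practice the text would come from the file passed to --report, for example Path("link-report.json").read_text().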
Exit codes are designed for CI:
- 0: all checked links are OK, skipped, or unknown.
- 1: at least one checked link is broken.
- 2: the command could not run as requested, such as when the files command finds no supported Markdown or HTML files.
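Scripts that wrap the CLI can branch on these exit codes. The following is a minimal sketch using subprocess; describe_exit_code and run_link_check are hypothetical helper names, and the messages simply restate the meanings listed above.

```python
import subprocess
import sys

# Meanings as documented in this README.
EXIT_MEANINGS = {
    0: "all checked links are OK, skipped, or unknown",
    1: "at least one checked link is broken",
    2: "the command could not run as requested",
}


def describe_exit_code(code: int) -> str:
    """Translate an exit code into a human-readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_link_check(args: list) -> int:
    """Run linkchecker-py with the given arguments and report the outcome."""
    completed = subprocess.run(["linkchecker-py", *args], check=False)
    print(describe_exit_code(completed.returncode), file=sys.stderr)
    return completed.returncode


print(describe_exit_code(1))
```

A wrapper might call run_link_check(["files", "README.md", "docs/"]) and treat 1 as a documentation failure but 2 as a misconfigured job.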
CI Usage
Source checkout workflow:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
  with:
    python-version: "3.12"
- run: python -m pip install -e .
- run: linkchecker-py files README.md docs/ --report link-report.md
Upload the report even when broken links fail the job:
- name: Check documentation links
  run: linkchecker-py files README.md docs/ --report link-report.md
- uses: actions/upload-artifact@v4
  if: always()
  with:
    name: link-report
    path: link-report.md
The repository's own CI runs ruff check . and pytest on Python 3.10, 3.11, 3.12, and 3.13.
A complete workflow another repository can adapt is available at examples/github-actions-link-check.yml.
Development
python -m pip install -e ".[dev]"
ruff check .
pytest
python -m build
Release steps are documented in docs/RELEASE.md. The current tag is v0.1.3; the next patch release would normally be v0.1.4 if the changes are documentation or bug fixes.
Troubleshooting
- If a URL is reported as blocked by robots.txt, keep the skip or re-run with --no-robots for sites you control.
- If a site rate-limits requests, lower --concurrency and set --rate-limit.
- If local file links are skipped as outside the root, run the command from the documentation root or pass all relevant files/directories together.
- If generated documentation uses custom heading IDs, prefer explicit HTML anchors or link to those IDs directly.
Limitations
- Website crawling is intended for small to medium sites, not exhaustive internet-scale crawls.
- JavaScript-rendered links are not executed in a browser.
- Markdown heading anchors follow common GitHub-style slug behavior; documentation systems with custom slug rules can differ.
- Cache entries are local to the current user cache directory and expire after one hour by default.
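To illustrate the heading-anchor limitation above, here is an approximation of common GitHub-style slug rules: lowercase the heading, drop punctuation, turn spaces into hyphens, and number duplicates. This is a simplified sketch, not the project's slugger and not GitHub's exact algorithm; github_slug is a hypothetical name.

```python
import re


def github_slug(heading: str, seen: dict) -> str:
    """Approximate a GitHub-style anchor for a Markdown heading."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # keep word characters, hyphens, spaces
    slug = slug.replace(" ", "-")
    count = seen.get(slug, 0)
    seen[slug] = count + 1
    # Repeated headings get a numeric suffix: install, install-1, install-2, ...
    return slug if count == 0 else f"{slug}-{count}"


seen = {}
for heading in ("Install", "Output and Exit Codes", "What's new?", "Install"):
    print(heading, "->", "#" + github_slug(heading, seen))
```

A documentation system that, for instance, keeps apostrophes or uses underscores instead of hyphens would produce different anchors for the same headings, which is why links that work on one renderer can be reported as broken on another.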
Roadmap
See ROADMAP.md for scoped near-term improvements and suggested starter issues.
Contributing
Bug reports, focused feature requests, and pull requests are welcome. See CONTRIBUTING.md for setup, testing, and review expectations. Please report security issues through SECURITY.md.
Changelog
Release notes are tracked in CHANGELOG.md.
License
MIT. See LICENSE.