Detect broken links in Markdown, reStructuredText, and HTML documentation

linksanity

Detect broken links and redirects in Markdown, reStructuredText, and HTML documentation.

$ linksanity scan ./docs/
docs/api/guide.md
  BROKEN    line   12  ./missing.md — file not found
  REDIRECT  line   45  https://old.example.com → https://new.example.com

ok=38   broken=1   redirect=1   skipped=0

Features

  • Static scan — parse .md, .rst, and .html source files without a browser
  • Live crawl — follow links on a deployed site using a headless browser (Playwright)
  • Exit codes: 0 = clean, 1 = broken links found (ideal for CI)
  • Multiple formats — console (Rich), JSON, CSV; optional Markdown summary report
  • Anchor validation — opt-in --check-anchors flag
  • GitHub Issues — create or update an issue summarising broken links
  • Ignore domains — skip domains you don't control
  • JS-rendered pages — route specific domains through Playwright in scan mode
  • Retry logic — exponential back-off on 429/503; HEAD→GET fallback on 405
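
The retry behaviour named in the last bullet can be sketched roughly as follows. This only illustrates exponential back-off with a HEAD to GET fallback; it is not linksanity's actual implementation. The function name `check_with_retry`, the `send` transport hook, and the 0.5-second base delay are assumptions made for the example.

```python
import time
from typing import Callable

RETRYABLE = {429, 503}  # statuses the feature list says are retried


def check_with_retry(
    send: Callable[[str, str], int],
    url: str,
    retries: int = 2,
    base_delay: float = 0.5,  # assumed value, not taken from linksanity
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the final HTTP status for url.

    send(method, url) is a hypothetical hook that performs one request
    and returns its status code.
    """
    method = "HEAD"
    attempt = 0
    while True:
        status = send(method, url)
        if status == 405 and method == "HEAD":
            method = "GET"  # server rejects HEAD: switch to GET
            continue
        if status in RETRYABLE and attempt < retries:
            sleep(base_delay * 2**attempt)  # delay doubles on each retry
            attempt += 1
            continue
        return status
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a caller would normally leave it at the default.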

Install

pip install linksanity

For JS-rendered pages (Playwright headless browser):

pip install "linksanity[browser]"
playwright install chromium

Requires Python 3.11+.

From source:

git clone https://github.com/ya8282/linksanity
cd linksanity
pip install -e ".[dev,browser]"
playwright install chromium

Quick start

Scan local source files

# Scan a directory (finds all .md / .rst / .html files recursively)
linksanity scan ./docs/

# Scan specific files or globs
linksanity scan README.md docs/**/*.md

# Validate anchor fragments too
linksanity scan ./docs/ --check-anchors

# Write JSON output; exit 1 if broken links found
linksanity scan ./docs/ --format json --output results.json

# Create a Markdown summary report
linksanity scan ./docs/ --report report.md

# Skip domains you don't control
echo "internal.corp.example.com" > ignore.txt
linksanity scan ./docs/ --ignore-domains ignore.txt

Crawl a live site

# Crawl up to 500 pages (default)
linksanity crawl https://docs.example.com

# Limit the crawl to 50 pages
linksanity crawl https://docs.example.com --max-pages 50

# Skip the domains listed in ignore.txt
linksanity crawl https://docs.example.com --ignore-domains ignore.txt

CI integration

Add a link-check job that runs on every push to main, on every pull request targeting main, and on a weekly schedule.

# .github/workflows/linkcheck.yml
name: Link check

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: "0 8 * * 1"   # every Monday at 08:00 UTC

permissions:
  contents: read

jobs:
  linkcheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: pip

      - name: Install linksanity
        run: pip install linksanity

      - name: Check links
        run: |
          linksanity scan ./docs/ \
            --skip-urls .linksanity-skip \
            --format json \
            --output linkcheck.json

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: linkcheck-results
          path: linkcheck.json

File-based skip list — commit a .linksanity-skip file at your repo root to exclude auth-gated or staging URLs. Supports * wildcards:

# .linksanity-skip
https://app.example.com/login
https://staging.example.com/*
https://internal.corp.example.com/*
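
To reason about which URLs a skip list would exclude, the `*` wildcard can be approximated in a few lines of Python. This is a sketch under assumptions: linksanity's exact matching rules are not documented here, so the example treats `*` as "any run of characters", requires the whole URL to match, and ignores lines starting with `#`. The helper names `load_patterns` and `is_skipped` are hypothetical.

```python
import re


def load_patterns(text: str) -> list[str]:
    """Keep non-empty lines; treating '#' lines as comments is an assumption."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_skipped(url: str, patterns: list[str]) -> bool:
    """True if url matches any pattern, where '*' matches any run of characters."""
    for pattern in patterns:
        # Escape everything except '*', which becomes the regex '.*'
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.fullmatch(regex, url):
            return True
    return False


SKIP_FILE = """\
# .linksanity-skip
https://app.example.com/login
https://staging.example.com/*
https://internal.corp.example.com/*
"""

patterns = load_patterns(SKIP_FILE)
print(is_skipped("https://staging.example.com/v2/api", patterns))  # True
print(is_skipped("https://app.example.com/logout", patterns))      # False
```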

Report broken links to a GitHub Issue — useful for scheduled runs that find regressions after merge:

      - name: Report broken links
        if: failure()
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          linksanity scan ./docs/ \
            --github-issue \
            --repo ${{ github.repository }}

GITHUB_TOKEN is always read from the environment — never pass it as a CLI flag or store it in a file.

Crawl a live docs site — swap scan for crawl to test a deployed site:

      - name: Crawl live docs
        run: |
          pip install "linksanity[browser]"
          playwright install --with-deps chromium
          linksanity crawl https://docs.example.com \
            --max-pages 200 \
            --block-analytics \
            --format json \
            --output crawl-results.json

GitHub Issue reporting

Use --github-issue when you want broken links surfaced as a trackable GitHub Issue rather than just a failed CI run. It creates or updates a single [linksanity] issue listing every broken URL, so the team has a persistent record to triage — not just a red check mark that disappears on the next push.

When to use it:

  • Scheduled runs — a weekly cron job catches link rot that crept in after your last merge. The issue stays open until you fix the links and the check goes green.
  • Repos without branch protection — if broken links won't block a PR merge, an issue is the only signal that survives past the CI run.
  • Large docs sites — when dozens of links break at once (e.g. a domain migration), a single issue is easier to triage than scrolling through CI logs.

When you don't need it:

  • PRs where branch protection already blocks the merge on failure — a failed job is sufficient.
  • Local runs and one-off checks.

Setup:

export GITHUB_TOKEN=ghp_...
linksanity scan ./docs/ --github-issue --repo owner/repo

GITHUB_TOKEN is read from the environment only — never pass it as a CLI flag or store it in a file. In GitHub Actions, use the built-in token:

env:
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

The workflow job also needs issues: write permission:

permissions:
  contents: read
  issues: write

Use with AI agents

linksanity is designed to be a clean tool call for AI agents. Use --format json so an agent can parse structured output without screen-scraping console text.

Exit codes are the primary signal:

Code  Meaning
0     All links OK (or only redirects/skipped)
1     One or more broken links
2     Invocation error (bad arguments, missing file)

JSON output schema

linksanity scan ./docs/ --format json --output results.json

Each item in the output array has:

[
  {
    "url": "https://example.com/old",
    "source_file": "docs/guide.md",
    "line": 42,
    "status": "broken",
    "status_code": 404,
    "redirect_url": null,
    "error": null
  }
]

status is one of "ok", "broken", "redirect", "skipped", or "error".
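
As an illustration of consuming this schema, the sketch below types one record and tallies records by status, in the style of the console summary line. The `LinkRecord` and `summarise` names are hypothetical, the nullable field types are inferred from the single example above, and the second sample record (including its 301 status code) is invented for the demo.

```python
import json
from collections import Counter
from typing import Optional, TypedDict


class LinkRecord(TypedDict):
    url: str
    source_file: str
    line: int
    status: str  # "ok", "broken", "redirect", "skipped", or "error"
    status_code: Optional[int]  # assumed nullable
    redirect_url: Optional[str]
    error: Optional[str]


STATUSES = ("ok", "broken", "redirect", "skipped", "error")


def summarise(records: list[LinkRecord]) -> str:
    """Count records per status and format them as 'ok=N   broken=N   ...'."""
    counts = Counter(record["status"] for record in records)
    return "   ".join(f"{status}={counts[status]}" for status in STATUSES)


sample: list[LinkRecord] = json.loads("""
[
  {"url": "https://example.com/old", "source_file": "docs/guide.md", "line": 42,
   "status": "broken", "status_code": 404, "redirect_url": null, "error": null},
  {"url": "https://old.example.com", "source_file": "docs/guide.md", "line": 45,
   "status": "redirect", "status_code": 301,
   "redirect_url": "https://new.example.com", "error": null}
]
""")

print(summarise(sample))  # ok=0   broken=1   redirect=1   skipped=0   error=0
```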

Python subprocess usage

Use this when you want to drive linksanity from a Python script or agent — for example, to file tickets, send alerts, or trigger auto-repair after a scan. linksanity doesn't expose a public Python API, so subprocess.run is the correct integration point.

result.returncode is the fast path: check it before touching the file. If it's 2, something went wrong with invocation — read result.stderr for the error message rather than trying to parse the output file.

import json
import subprocess

result = subprocess.run(
    ["linksanity", "scan", "./docs/", "--format", "json", "--output", "results.json"],
    capture_output=True,  # results go to results.json via --output; stderr carries error messages
    text=True,
)

if result.returncode == 2:
    raise RuntimeError(f"linksanity invocation error: {result.stderr.strip()}")

with open("results.json") as f:
    links = json.load(f)

# result.returncode == 1 means broken links exist; iterate to act on them
broken = [r for r in links if r["status"] == "broken"]
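
To act on the broken entries, for example when filing one ticket per source file, the records can be grouped and formatted. This is a minimal sketch: `broken_by_file` and `format_ticket` are hypothetical helpers, and falling back to the `error` field when `status_code` is null is an assumption about how the two fields relate.

```python
from collections import defaultdict


def broken_by_file(links: list[dict]) -> dict[str, list[dict]]:
    """Group broken records by source_file, sorted by line number."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for record in links:
        if record["status"] == "broken":
            grouped[record["source_file"]].append(record)
    for records in grouped.values():
        records.sort(key=lambda r: r["line"])
    return dict(grouped)


def format_ticket(source_file: str, records: list[dict]) -> str:
    """Render a plain-text ticket body for one source file."""
    lines = [f"Broken links in {source_file}:"]
    for r in records:
        # Assumption: when there is no HTTP status, the error text explains why
        detail = r["status_code"] if r["status_code"] is not None else r["error"]
        lines.append(f"  line {r['line']}: {r['url']} ({detail})")
    return "\n".join(lines)


links = [
    {"url": "https://example.com/old", "source_file": "docs/guide.md",
     "line": 42, "status": "broken", "status_code": 404, "error": None},
    {"url": "https://example.com", "source_file": "docs/guide.md",
     "line": 7, "status": "ok", "status_code": 200, "error": None},
]

for source_file, records in broken_by_file(links).items():
    print(format_ticket(source_file, records))
```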

MCP tool definition

Register linksanity as a tool so an AI agent can call it on demand:

{
  "name": "check_links",
  "description": "Scan documentation files for broken links. Returns structured JSON. Exit code 1 means broken links were found.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "paths": {
        "type": "array",
        "items": { "type": "string" },
        "description": "Files or directories to scan"
      },
      "skip_urls_file": {
        "type": "string",
        "description": "Path to a file listing URLs to skip (optional)"
      }
    },
    "required": ["paths"]
  }
}

Invoke it in your MCP server by shelling out to linksanity scan <paths> --format json --output /tmp/results.json and returning the parsed JSON.
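
The handler behind that tool might look roughly like the sketch below. It deliberately avoids any particular MCP SDK and shows only the shell-out step; `build_command` and `handle_check_links` are hypothetical names, the return shape is an assumption, and only flags documented in this README are used. A temporary directory stands in for /tmp/results.json so that concurrent calls do not overwrite each other.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def build_command(arguments: dict, output_path: str) -> list[str]:
    """Translate the tool's input arguments into a linksanity command line."""
    cmd = ["linksanity", "scan", *arguments["paths"],
           "--format", "json", "--output", output_path]
    skip_file = arguments.get("skip_urls_file")
    if skip_file:
        cmd += ["--skip-urls", skip_file]
    return cmd


def handle_check_links(arguments: dict) -> dict:
    """Run the scan and return parsed results, or the error text on failure."""
    with tempfile.TemporaryDirectory() as tmp:
        output_path = str(Path(tmp) / "results.json")
        proc = subprocess.run(
            build_command(arguments, output_path),
            capture_output=True,
            text=True,
        )
        # 0 = clean, 1 = broken links found; anything else is treated as an error
        if proc.returncode not in (0, 1):
            return {"ok": False, "error": proc.stderr.strip()}
        results = json.loads(Path(output_path).read_text())
    return {"ok": proc.returncode == 0, "results": results}
```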

Claude Code / claude-code tool call

If you use Claude Code, you can invoke linksanity directly from the Claude CLI:

! linksanity scan ./docs/ --format json --output results.json

Then ask Claude to interpret the output:

Read results.json and summarise which links are broken and why they might have rotted.

Options

linksanity scan <paths...>

Flag                   Default  Description
--workers N            5        Max concurrent HTTP checks
--timeout N            10       Per-request timeout (seconds)
--retry N              2        Retries on 429/503
--check-anchors        off      Validate #fragment links
--ignore-domains FILE  -        One domain per line to skip
--js-domains FILE      -        Domains to check via Playwright
--skip-urls FILE       -        URLs/patterns to skip (one per line, * wildcards ok)
--format               console  console, json, or csv
--output FILE          stdout   Write results to file
--report FILE          -        Write Markdown summary to file
--github-issue         off      Open/update a GitHub Issue
--repo OWNER/REPO      -        Required with --github-issue
--config FILE          auto     Path to linksanity.toml

linksanity crawl <url>

Same flags as scan, minus --check-anchors and --js-domains, plus:

Flag                    Default  Description
--max-pages N           500      Stop after N pages crawled
--playwright-workers N  2        Max concurrent browser sessions
--skip-urls FILE        -        URLs/patterns to skip (one per line, * wildcards ok)
--block-analytics       off      Block analytics/tracking domains in the browser

Configuration file

Place a linksanity.toml in your project root (auto-discovered):

workers = 10
timeout = 15
retry = 3
check_anchors = false
max_pages = 200
block_analytics = true

ignore_domains = ["status.example.com", "internal.example.com"]
js_domains = ["spa.example.com"]
skip_urls = [
  "https://app.example.com/login",
  "https://staging.example.com/*",
]

Exit codes

Code  Meaning
0     All links OK (or only redirects/skipped)
1     One or more broken links
2     Invocation error (bad arguments, missing file)

Development

git clone https://github.com/linksanity/linksanity
cd linksanity
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,browser]"
playwright install chromium

# Run tests
pytest

# Lint + type check
ruff check linksanity/ tests/
mypy linksanity/

License

MIT
