Website crawler, link checker, and content analyzer

rms-link-checker

rms-link-checker is a Python command-line application that crawls a website starting from a given root URL, checks all discovered links for validity, detects misplaced asset files, and produces a plain-text report summarizing the results.

Full documentation is available at rms-link-checker.readthedocs.io.

Features

  • Crawls an entire website starting from a single root URL
  • Checks all discovered links (internal and external) for validity
  • Detects broken links (4xx/5xx responses) and broken anchor fragments
  • Follows and reports redirect chains
  • Detects misplaced asset files (images, documents, scripts, etc.)
  • Configurable depth limit, request limit, and thread count
  • YAML configuration file support with CLI override precedence
  • Non-HTTP scheme links (mailto:, tel:, etc.) recorded and reported
  • SSL certificate errors reported per domain
  • Plain-text report with 11 sections
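
The broken-link and broken-anchor checks listed above can be illustrated with a minimal sketch. This is not the package's own code: the names `is_broken_status`, `AnchorCollector`, and `fragment_is_broken` are hypothetical, and the sketch assumes the page HTML has already been fetched. It uses only the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


def is_broken_status(status_code):
    """Treat any 4xx or 5xx HTTP response as a broken link."""
    return 400 <= status_code < 600


class AnchorCollector(HTMLParser):
    """Collect the names a URL fragment can point at: id attributes and <a name=...>."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.anchors.add(value)


def fragment_is_broken(url, page_html):
    """Return True if the URL has a fragment that matches no anchor in the page."""
    _, fragment = urldefrag(url)
    if not fragment:
        return False  # No fragment, so there is nothing to verify.
    collector = AnchorCollector()
    collector.feed(page_html)
    return fragment not in collector.anchors


# Hypothetical sample page used only for this illustration.
sample_html = '<h1 id="intro">Intro</h1><a name="legacy"></a>'
print(is_broken_status(404))
print(fragment_is_broken("https://example.com/page#intro", sample_html))
print(fragment_is_broken("https://example.com/page#missing", sample_html))
```

The sketch compares fragments literally; a fuller check would probably also need to handle percent-encoded fragments.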

Installation

End-user (recommended)

pipx install rms-link-checker

Developer

git clone https://github.com/SETI/rms-link-checker.git
cd rms-link-checker
python3 -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

Quick Start

link_check https://example.com

With options:

link_check https://example.com --max-depth 3 --max-threads 20 -o report.txt
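
To run the same command from a script (for example, in a scheduled job), one option is to build the argument list in Python and pass it to `subprocess`. The sketch below uses only the flags shown above; the helper names `build_command` and `run_link_check` are hypothetical, and `-o` is assumed to name the report output file. The meaning of the tool's exit code is not stated here, so the sketch does not interpret it.

```python
import subprocess


def build_command(root_url, max_depth=None, max_threads=None, report_path=None):
    """Build the link_check argument list from the options shown in Quick Start."""
    command = ["link_check", root_url]
    if max_depth is not None:
        command += ["--max-depth", str(max_depth)]
    if max_threads is not None:
        command += ["--max-threads", str(max_threads)]
    if report_path is not None:
        # Assumption: -o sets the path of the plain-text report.
        command += ["-o", report_path]
    return command


def run_link_check(root_url, **options):
    """Run link_check as a child process; requires the tool to be installed."""
    return subprocess.run(build_command(root_url, **options), check=False)


command = build_command(
    "https://example.com", max_depth=3, max_threads=20, report_path="report.txt"
)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems when the URL contains characters such as `&` or `?`.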

With a configuration file:

link_check --config-file config.yaml
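
When a configuration file and command-line options are combined, the command-line values take precedence. The following is a minimal sketch of that precedence rule, not the tool's actual implementation. The setting names `root_url`, `max_depth`, and `max_threads` are assumed from the CLI flags above, the `DEFAULTS` values are invented for illustration, and a plain dictionary stands in for the parsed YAML file.

```python
import argparse

# Hypothetical fallback values; the real tool's defaults are not stated here.
DEFAULTS = {"max_depth": 5, "max_threads": 10}


def build_parser():
    """Parser whose options default to None so that 'not given' is detectable."""
    parser = argparse.ArgumentParser(prog="link_check_sketch")
    parser.add_argument("root_url", nargs="?", default=None)
    parser.add_argument("--max-depth", type=int, default=None)
    parser.add_argument("--max-threads", type=int, default=None)
    return parser


def merge_settings(file_settings, cli_args):
    """Layer settings: defaults first, then the config file, then CLI options."""
    merged = dict(DEFAULTS)
    merged.update(file_settings)
    for key, value in vars(cli_args).items():
        if value is not None:  # Only options given on the command line override.
            merged[key] = value
    return merged


# Stand-in for the contents of a parsed config.yaml (assumed keys).
file_settings = {"root_url": "https://example.com", "max_depth": 3}
args = build_parser().parse_args(["--max-depth", "7"])
settings = merge_settings(file_settings, args)
print(settings)
```

Here `max_depth` comes from the command line, `root_url` from the file, and `max_threads` from the fallback defaults.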

Contributing

Information on contributing to this package can be found in the Contributing Guide.

Licensing

This code is licensed under the Apache License v2.0.
