CLI tool to fetch URLs from sitemap.xml, check their existence, and generate performance reports

Siteprobe

Siteprobe is a Rust-based CLI tool that fetches all URLs listed in a given sitemap.xml, checks that each one exists, and generates a performance report. It supports authentication, concurrency control, rate limiting, cache bypassing, and more.

Features

  • Fetch and parse sitemap.xml to extract URLs, including nested Sitemap Index files recursively.
  • Check the existence and response times of each URL.
  • Generate a detailed performance report as CSV or JSON.
  • Support for Basic Authentication.
  • Adjustable concurrency limits for request handling.
  • Configurable request timeout settings.
  • Support for configuring rate limits, such as 300 requests per 5-minute interval.
  • Redirect handling with security precautions.
  • Filtering and reporting slow URLs based on a threshold.
  • Custom User-Agent header support.
  • Option to append random timestamps to URLs to bypass caching mechanisms.
  • Save downloaded documents for further inspection or use as a static site mirror.
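
To illustrate the first feature, here is a minimal Python sketch of how nested Sitemap Index files can be resolved recursively. It is not Siteprobe's actual (Rust) implementation. The `collect_urls` function, the `fetch` callback, and the in-memory `DOCS` mapping are hypothetical names used only for this example; the XML namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def collect_urls(
    sitemap_url: str,
    fetch: Callable[[str], str],
    seen: Optional[Set[str]] = None,
) -> List[str]:
    """Return all page URLs reachable from sitemap_url, following sitemap indexes."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against index files that reference each other
        return []
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    if root.tag.endswith("sitemapindex"):
        # A Sitemap Index lists further sitemaps; resolve each one recursively.
        urls: List[str] = []
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            urls.extend(collect_urls((loc.text or "").strip(), fetch, seen))
        return urls
    # A regular sitemap (<urlset>) lists page URLs directly.
    return [(loc.text or "").strip() for loc in root.findall("sm:url/sm:loc", NS)]


# Hypothetical in-memory documents standing in for HTTP responses.
DOCS = {
    "https://example.com/sitemap.xml": (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/sitemap-pages.xml": (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/about</loc></url>"
        "</urlset>"
    ),
}

urls = collect_urls("https://example.com/sitemap.xml", DOCS.__getitem__)
print(urls)
```

Passing the fetcher as a callback keeps the parsing logic independent of any HTTP client, so the sketch can be run without network access.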

Installation

You can install Siteprobe using Cargo:

cargo install siteprobe

Alternatively, build from source:

git clone https://github.com/bartTC/siteprobe.git
cd siteprobe
cargo build --release

Usage

siteprobe <sitemap_url> [OPTIONS]

Arguments

  • <sitemap_url> - The URL of the sitemap to be fetched and processed.

Options

Usage: siteprobe [OPTIONS] <SITEMAP_URL>

Arguments:
  <SITEMAP_URL>  The URL of the sitemap to be fetched and processed.

Options:
      --basic-auth <BASIC_AUTH>
          Basic authentication credentials in the format `username:password`
  -c, --concurrency-limit <CONCURRENCY_LIMIT>
          Maximum number of concurrent requests allowed [default: 4]
  -l, --rate-limit <RATE_LIMIT>
          The rate limit for all requests in the format 'requests/time[unit]',
          where unit can be seconds (`s`), minutes (`m`), or hours (`h`). E.g.
          '-l 300/5m' for 300 requests per 5 minutes, or '-l 100/1h' for 100
          requests per hour.
  -o, --output-dir <OUTPUT_DIR>
          Directory where all downloaded documents will be saved
  -a, --append-timestamp
          Append a random timestamp to each URL to bypass caching mechanisms
  -r, --report-path <REPORT_PATH>
          File path for storing the generated `report.csv`
  -j, --report-path-json <REPORT_PATH_JSON>
          File path for storing the generated `report.json`
  -t, --request-timeout <REQUEST_TIMEOUT>
          Default timeout (in seconds) for each request [default: 10]
      --user-agent <USER_AGENT>
          Custom User-Agent header to be used in requests [default: "Mozilla/5.0
          (compatible; Siteprobe/0.5.0)"]
      --slow-num <SLOW_NUM>
          Limit the number of slow documents displayed in the report. [default:
          100]
  -s, --slow-threshold <SLOW_THRESHOLD>
          Show slow responses. The value is the threshold (in seconds) for
          considering a document as 'slow'. E.g. '-s 3' for 3 seconds or '-s
          0.05' for 50ms.
  -f, --follow-redirects
          Controls automatic redirects. When enabled, the client will follow
          HTTP redirects (up to 10 by default). Note that for security, Basic
          Authentication credentials are intentionally not forwarded during
          redirects to prevent unintended credential exposure.
  -h, --help
          Print help
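
The `--rate-limit` format described above (`requests/time[unit]`) can be read as a request count and a time window. The following Python sketch shows one way to interpret such a value. It is an illustration only, not Siteprobe's own parser: `parse_rate_limit` and `average_interval` are hypothetical helpers, and the sketch assumes the number before the unit is always present, as in the documented examples `300/5m` and `100/1h`.

```python
import re
from typing import Tuple

# Seconds per unit, for the three units named in the help text.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
RATE_RE = re.compile(r"^(\d+)/(\d+)([smh])$")


def parse_rate_limit(value: str) -> Tuple[int, int]:
    """Parse 'requests/time[unit]' into (requests, window_in_seconds)."""
    match = RATE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"invalid rate limit: {value!r}")
    requests, amount, unit = match.groups()
    window = int(amount) * UNIT_SECONDS[unit]
    if int(requests) == 0 or window == 0:
        raise ValueError(f"rate limit must be positive: {value!r}")
    return int(requests), window


def average_interval(value: str) -> float:
    """Average number of seconds between requests implied by the limit."""
    requests, window = parse_rate_limit(value)
    return window / requests


print(parse_rate_limit("300/5m"))   # (300, 300)
print(average_interval("100/1h"))   # 36.0
```

Under this reading, `-l 300/5m` averages one request per second and `-l 100/1h` averages one request every 36 seconds. Whether Siteprobe spaces requests evenly or allows bursts within the window is not stated here.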

Example Usage

# Fetch and analyze a sitemap with default settings
siteprobe https://example.com/sitemap.xml

# Save the report to a specific file and store downloaded documents in a directory
siteprobe https://example.com/sitemap.xml --report-path ./results/report.csv --output-dir ./example.com

# Set concurrency limit to 10 and timeout to 5 seconds
siteprobe https://example.com/sitemap.xml --concurrency-limit 10 --request-timeout 5
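
The `--append-timestamp` option bypasses caches by making every requested URL unique. The Python sketch below shows the general cache-busting technique under stated assumptions: the query parameter name `ts` and the millisecond timestamp with random jitter are hypothetical choices for this example, and the exact parameter Siteprobe appends may differ.

```python
import random
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_timestamp(url: str, param: str = "ts") -> str:
    """Return url with an extra, effectively unique query parameter."""
    parts = urlsplit(url)
    # Keep any existing query parameters and their order.
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Millisecond timestamp plus random jitter, so URLs built in the
    # same millisecond are still unlikely to collide.
    stamp = int(time.time() * 1000) + random.randint(0, 999)
    query.append((param, str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


busted = append_timestamp("https://example.com/page?lang=en#top")
print(busted)
```

Because caches typically key on the full URL including the query string, a changing parameter usually forces the request through to the origin server, which is why response times measured this way reflect uncached performance.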
