Fetch a given sitemap and retrieve all URLs in it.
fetch-sitemap
Retrieves all URLs listed in a given sitemap.xml and fetches each page. Useful for (load) testing an entire site for error responses.
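To illustrate the first step, here is a minimal sketch of pulling the page URLs out of a sitemap document. It is not taken from the fetch-sitemap source: the function name `extract_urls` and the sample document are assumptions made for illustration, and only the standard library is used.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return the text of every <loc> element in a sitemap document.

    Hypothetical helper for illustration; not part of fetch-sitemap's API.
    """
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


# Made-up sample sitemap so the sketch runs without network access.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(extract_urls(SAMPLE))
```

Each extracted URL would then be requested in turn, and its response status checked for errors.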
Note: The default concurrency limit is 5, so up to five URLs are fetched at once.
Depending on your server's worker count, this may already be enough to cause a denial of service (DoS).
Try --concurrency-limit=2 and increase it once you are comfortable with the load.
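A concurrency limit of this kind is commonly implemented with a semaphore. The sketch below shows the general idea with `asyncio.Semaphore`; it is an assumption about how such a limit can work, not a copy of fetch-sitemap's code. The names `fetch_all` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for a real HTTP request so the example runs offline.

```python
import asyncio


async def fetch_all(urls, fetch, concurrency_limit=5):
    """Run fetch(url) for every URL with at most concurrency_limit in flight."""
    semaphore = asyncio.Semaphore(concurrency_limit)

    async def bounded(url):
        # Only concurrency_limit coroutines can hold the semaphore at once.
        async with semaphore:
            return await fetch(url)

    return await asyncio.gather(*(bounded(u) for u in urls))


# Counters used to observe how many fake requests overlap.
in_flight = 0
peak = 0


async def fake_fetch(url):
    """Stand-in for an HTTP request (assumption: no real network call)."""
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulate network latency
    in_flight -= 1
    return url, 200


urls = [f"https://example.com/page-{i}" for i in range(20)]
results = asyncio.run(fetch_all(urls, fake_fetch, concurrency_limit=2))
print(len(results), "fetched, peak concurrency:", peak)
```

With a limit of 2, no more than two fake requests overlap, which is why a lower limit puts less pressure on a server with few workers.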
Usage: fetch-sitemap [-h] [--basic-auth BASIC_AUTH] [-l LIMIT] [-c CONCURRENCY_LIMIT]
                     [-t REQUEST_TIMEOUT] [--random] [--report-path REPORT_PATH]
                     [-o OUTPUT] [-v]
                     sitemap_url

Fetch a given sitemap and retrieve all URLs in it.

Positional Arguments:
  sitemap_url           URL of the sitemap to fetch

Options:
  -h, --help            show this help message and exit
  --basic-auth BASIC_AUTH
                        Basic auth information. Use: 'username:password' (default: None)
  -l, --limit LIMIT     Maximum number of URLs to fetch from the given sitemap.xml
                        (default: None)
  -c, --concurrency-limit CONCURRENCY_LIMIT
                        Max number of concurrent requests (default: 5)
  -t, --request-timeout REQUEST_TIMEOUT
                        Timeout for fetching a URL in seconds (default: 30)
  --random              Append a random string like ?12334232343 to each URL to bypass
                        frontend cache (default: False)
  --report-path REPORT_PATH
                        Store results in a CSV file (example: ./report.csv) (default: None)
  -o, --output-dir OUTPUT
                        Store all fetched sitemap documents in this folder (default: None)
  -v, --version         Show program's version number and exit
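
The `--random` option is described as appending a random string such as `?12334232343` to each URL. A rough sketch of that idea follows. The function name `add_cache_buster` is hypothetical, and the handling of URLs that already carry a query string is an assumption; the tool itself may behave differently.

```python
import random
from urllib.parse import urlsplit, urlunsplit


def add_cache_buster(url: str) -> str:
    """Append a random numeric query token so caches see a new URL.

    Hypothetical helper for illustration; not part of fetch-sitemap's API.
    """
    token = str(random.randint(10**10, 10**11 - 1))  # 11-digit random number
    parts = urlsplit(url)
    # Assumption: keep an existing query string and append the token to it.
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))


print(add_cache_buster("https://example.com/about"))
print(add_cache_buster("https://example.com/search?q=test"))
```

Because the query string differs on every request, a frontend cache keyed on the full URL treats each fetch as a miss and forwards it to the backend.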
🤺 Local Development
poetry install
poetry run fetch-sitemap -h
poetry run ./tests.sh
File details
Details for the file fetch_sitemap-15.tar.gz.
File metadata
- Download URL: fetch_sitemap-15.tar.gz
- Upload date:
- Size: 6.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.2 CPython/3.12.3 Darwin/23.4.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a7489380561ca49967f8cd4d2ba640de3ce1392909e55552348fd6e390806ff |
| MD5 | 92b4d2e1f79a0b88b3ccc52ea435beb1 |
| BLAKE2b-256 | a12330fd08f9dde8146dd4969a5088ca4fec86b994ad24fcc9588952339fbbda |
File details
Details for the file fetch_sitemap-15-py3-none-any.whl.
File metadata
- Download URL: fetch_sitemap-15-py3-none-any.whl
- Upload date:
- Size: 6.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.2 CPython/3.12.3 Darwin/23.4.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 44bae0733ccf67206705018ec671b14ff363c796e1d449d07fac5d018ec535d8 |
| MD5 | 7b57c4c616dbccbbc640df1b72847552 |
| BLAKE2b-256 | 0fa9867fa95c85ae073f7829de047c2feacacb754543f8cca84cbff67e681696 |