ultimate-sitemap-parser

A performant library for parsing and crawling sitemaps

Maintainers

freddyheppell

Project description

Ultimate Sitemap Parser (USP) is a performant and robust Python library for parsing and crawling sitemaps.

Features

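Supports all sitemap formats: XML (including Google News), plain text, RSS 2.0 / Atom 0.3 / Atom 1.0, and sitemaps linked from robots.txt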
Field-tested with ~1 million URLs as part of the Media Cloud project
Tolerant of the more common sitemap bugs
Tries to find sitemaps not listed in robots.txt
Uses fast, memory-efficient Expat XML parsing
Doesn’t consume much memory even with massive sitemap hierarchies
Provides the generated sitemap tree as an easy-to-use object tree
Supports using a custom web client
Uses a small number of actively maintained third-party modules
Reasonably tested
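The feature list credits Expat for fast, memory-efficient parsing. As a rough, stdlib-only illustration of why an event-driven parser suits large sitemaps (this is not USP's actual parser), the sketch below feeds a sitemap to Python's xml.parsers.expat in small chunks and yields each <loc> URL as soon as it has been parsed, so a real caller could stream a response without building a full DOM. The function name iter_locs and the 16-byte chunking are illustrative assumptions.

```python
import xml.parsers.expat
from typing import Iterable, Iterator, List


def iter_locs(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield the text of each <loc> element while feeding Expat one chunk at a time.

    Illustrative sketch only; this is not USP's parser.
    """
    # With a namespace separator, Expat reports names as "namespace-URI localname".
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    pending: List[str] = []  # <loc> values completed during the latest Parse() call
    text: List[str] = []     # character data of the <loc> currently open
    in_loc = False

    def local_name(name: str) -> str:
        return name.rsplit(" ", 1)[-1]

    def on_start(name: str, attrs: dict) -> None:
        nonlocal in_loc
        if local_name(name) == "loc":
            in_loc = True
            text.clear()

    def on_end(name: str) -> None:
        nonlocal in_loc
        if local_name(name) == "loc":
            in_loc = False
            pending.append("".join(text).strip())

    def on_chars(data: str) -> None:
        # Expat may split one text node across several callbacks, so accumulate.
        if in_loc:
            text.append(data)

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_chars

    for chunk in chunks:
        parser.Parse(chunk, False)  # False: more data will follow
        yield from pending
        pending.clear()
    parser.Parse(b"", True)  # True: end of document; raises on truncated XML
    yield from pending


sitemap = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.org/a</loc></url>
  <url><loc>https://www.example.org/b</loc></url>
</urlset>"""

# Simulate a network stream by slicing the document into 16-byte chunks.
chunks = [sitemap[i:i + 16] for i in range(0, len(sitemap), 16)]
print(list(iter_locs(chunks)))
# ['https://www.example.org/a', 'https://www.example.org/b']
```

Because URLs are yielded per chunk, memory use is bounded by the chunk size and the longest single <loc> value rather than by the size of the whole sitemap.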

Installation

pip install ultimate-sitemap-parser

or using Anaconda:

conda install -c conda-forge ultimate-sitemap-parser

Usage

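from usp.tree import sitemap_tree_for_homepage

# Find the site's sitemaps (via robots.txt and common sitemap locations) and parse them into a tree
tree = sitemap_tree_for_homepage('https://www.example.org/')

# all_pages() lazily yields every page from every sitemap in the tree
for page in tree.all_pages():
    print(page.url)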
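sitemap_tree_for_homepage() only needs a homepage URL because the library locates sitemaps on its own: it reads robots.txt and, per the feature list, also tries to find sitemaps that robots.txt does not list. The sketch below shows that general idea in plain Python, working on robots.txt text already in hand so it runs offline. candidate_sitemap_urls and FALLBACK_PATHS are hypothetical; the locations USP actually probes may differ.

```python
from typing import List
from urllib.parse import urljoin

# Hypothetical fallback locations; USP's real list of known paths may differ.
FALLBACK_PATHS = ["/sitemap.xml", "/sitemap_index.xml"]


def candidate_sitemap_urls(homepage_url: str, robots_txt: str) -> List[str]:
    """Sitemaps declared in robots.txt first, then guessed paths, without duplicates."""
    candidates: List[str] = []
    for raw_line in robots_txt.splitlines():
        # Drop comments; the "Sitemap" directive name is matched case-insensitively.
        line = raw_line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")  # first colon only, so the URL stays intact
        if sep and key.strip().lower() == "sitemap" and value.strip():
            candidates.append(value.strip())
    for path in FALLBACK_PATHS:
        candidates.append(urljoin(homepage_url, path))
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(candidates))


robots_txt = """User-agent: *
Disallow: /private
Sitemap: https://www.example.org/news_sitemap.xml
"""
print(candidate_sitemap_urls("https://www.example.org/", robots_txt))
# ['https://www.example.org/news_sitemap.xml', 'https://www.example.org/sitemap.xml', 'https://www.example.org/sitemap_index.xml']
```

A real crawler would then fetch each candidate and skip those that return errors; that network step is left out here.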

sitemap_tree_for_homepage() returns a tree of AbstractSitemap subclass objects representing the sitemap hierarchy found on the website; the documentation includes a reference of the AbstractSitemap subclasses. AbstractSitemap.all_pages() returns a generator, so pages can be iterated over efficiently without loading the entire tree into memory.
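To make the generator point concrete, here is a toy model of a sitemap hierarchy in which index sitemaps point at child sitemaps and leaf sitemaps hold page URLs. ToyIndexSitemap and ToyPagesSitemap are hypothetical stand-ins, not USP classes. Their all_pages() methods chain child generators with yield from, so pages are produced one at a time on demand instead of being collected into one big list first. Unlike USP, this toy keeps the entire tree in memory; it only illustrates the lazy iteration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class ToyPagesSitemap:
    """Hypothetical leaf sitemap: holds page URLs directly."""
    url: str
    pages: List[str] = field(default_factory=list)

    def all_pages(self) -> Iterator[str]:
        yield from self.pages


@dataclass
class ToyIndexSitemap:
    """Hypothetical sitemap index: points at child sitemaps."""
    url: str
    sub_sitemaps: List[Union["ToyIndexSitemap", ToyPagesSitemap]] = field(default_factory=list)

    def all_pages(self) -> Iterator[str]:
        # Depth-first and lazy: a child's pages are produced only when the caller asks.
        for sub in self.sub_sitemaps:
            yield from sub.all_pages()


tree = ToyIndexSitemap("https://www.example.org/robots.txt", [
    ToyIndexSitemap("https://www.example.org/sitemap_index.xml", [
        ToyPagesSitemap("https://www.example.org/sitemap1.xml",
                        ["https://www.example.org/a", "https://www.example.org/b"]),
    ]),
    ToyPagesSitemap("https://www.example.org/news.xml", ["https://www.example.org/c"]),
])

for page in tree.all_pages():
    print(page)
# https://www.example.org/a
# https://www.example.org/b
# https://www.example.org/c
```

Because all_pages() is lazy, a caller can stop early (for example with itertools.islice) without visiting the remaining sitemaps.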

For more examples and details, see the documentation.

Release history

1.8.0 (this version): Jan 25, 2026
1.7.0.post2: Jan 20, 2026
1.7.0.post1: Jan 17, 2026
1.7.0: Jan 11, 2026
1.6.0: Sep 10, 2025
1.5.0: Aug 11, 2025
1.4.0: Apr 23, 2025
1.3.1: Mar 31, 2025
1.3.0: Mar 17, 2025
1.2.0: Feb 18, 2025
1.1.1: Jan 29, 2025
1.1.0: Jan 20, 2025
1.0.0: Jan 13, 2025
1.0.0rc1 (pre-release): Dec 18, 2024
0.5: Jul 31, 2019
0.4: Jul 18, 2019
0.3: Jul 17, 2019
0.2: Jul 16, 2019
0.1: Nov 29, 2018

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ultimate_sitemap_parser-1.8.0.tar.gz (39.4 kB)

Uploaded Jan 25, 2026 Source

Built Distribution

ultimate_sitemap_parser-1.8.0-py3-none-any.whl (44.2 kB)

Uploaded Jan 25, 2026 Python 3

File details

Details for the file ultimate_sitemap_parser-1.8.0.tar.gz.

File metadata

Download URL: ultimate_sitemap_parser-1.8.0.tar.gz
Upload date: Jan 25, 2026
Size: 39.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ultimate_sitemap_parser-1.8.0.tar.gz
Algorithm	Hash digest
SHA256	`b89e173a7a30ae8d3fbf5c51e8b72985f2f1240e338064a315f35652c89442bc`
MD5	`1f4a209176912c392f94930f2bf288e6`
BLAKE2b-256	`8e87b4767b5181b0a6b7aafa5021b6b37b3eea104ba8ad80d3b527e8f423c85c`
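The SHA256 digest listed for the sdist can be used to check a downloaded copy before installing it. A minimal sketch using only hashlib, assuming the file has already been downloaded to a local path; sha256_of and verify_download are illustrative helpers, not part of USP or pip.

```python
import hashlib
from pathlib import Path

# SHA256 published on this page for ultimate_sitemap_parser-1.8.0.tar.gz
EXPECTED_SDIST_SHA256 = "b89e173a7a30ae8d3fbf5c51e8b72985f2f1240e338064a315f35652c89442bc"


def sha256_of(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA256 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps calling fh.read() until it returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, verify_download(Path("ultimate_sitemap_parser-1.8.0.tar.gz")) should return True for an unmodified copy of the 1.8.0 sdist. pip can enforce the same check in its hash-checking mode (--hash=sha256:... entries in a requirements file, installed with --require-hashes); in that mode every file pip might select, here both the sdist and the wheel, needs its digest listed.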

Provenance

The following attestation bundles were made for ultimate_sitemap_parser-1.8.0.tar.gz:

Publisher: publish.yml on GateNLP/ultimate-sitemap-parser

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ultimate_sitemap_parser-1.8.0.tar.gz
- Subject digest: b89e173a7a30ae8d3fbf5c51e8b72985f2f1240e338064a315f35652c89442bc
- Sigstore transparency entry: 853745038
- Sigstore integration time: Jan 25, 2026
Source repository:
- Permalink: GateNLP/ultimate-sitemap-parser@182f4642f145230b68e7518e627883edd09168ca
- Branch / Tag: refs/tags/1.8.0
- Owner: https://github.com/GateNLP
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@182f4642f145230b68e7518e627883edd09168ca
- Trigger Event: push

File details

Details for the file ultimate_sitemap_parser-1.8.0-py3-none-any.whl.

File metadata

Download URL: ultimate_sitemap_parser-1.8.0-py3-none-any.whl
Upload date: Jan 25, 2026
Size: 44.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ultimate_sitemap_parser-1.8.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`438dcfe8aa8efc4e587b567a4abc8b1d2486c52b5ca52d5c81520f08e0434449`
MD5	`f640936fdaa288676300527aea33c1f6`
BLAKE2b-256	`43966d3eee0013dfebd45b2e3650d4db96c27471f30115dbfcb9e0a002406c1a`

Provenance

The following attestation bundles were made for ultimate_sitemap_parser-1.8.0-py3-none-any.whl:

Publisher: publish.yml on GateNLP/ultimate-sitemap-parser

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ultimate_sitemap_parser-1.8.0-py3-none-any.whl
- Subject digest: 438dcfe8aa8efc4e587b567a4abc8b1d2486c52b5ca52d5c81520f08e0434449
- Sigstore transparency entry: 853745041
- Sigstore integration time: Jan 25, 2026
Source repository:
- Permalink: GateNLP/ultimate-sitemap-parser@182f4642f145230b68e7518e627883edd09168ca
- Branch / Tag: refs/tags/1.8.0
- Owner: https://github.com/GateNLP
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@182f4642f145230b68e7518e627883edd09168ca
- Trigger Event: push
