Ultimate Sitemap Parser

Website sitemap parser for Python 3.5+.

Features

  • Supports multiple sitemap formats

  • Field-tested with ~1 million URLs as part of the Media Cloud project

  • Tolerates the more common sitemap bugs

  • Uses fast and memory-efficient Expat XML parsing

  • Provides the generated sitemap tree as an easy-to-use object tree

  • Supports using a custom web client

  • Uses a small number of actively maintained third-party modules

  • Reasonably tested
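
To give a feel for the Expat-based parsing mentioned in the list above, here is a minimal sketch that pulls page URLs out of a sitemap with Python's standard xml.parsers.expat module. It is not the library's own code: the function name extract_sitemap_urls and the SAMPLE document are made up for illustration, and real sitemaps need far more error handling than shown here.

```python
import xml.parsers.expat


def extract_sitemap_urls(xml_text):
    """Return the text of every <loc> element in a sitemap XML string.

    Hypothetical helper for illustration; not part of ultimate_sitemap_parser.
    """
    urls = []
    state = {"in_loc": False, "chunks": []}

    def start_element(name, attrs):
        # With a namespace separator set, names arrive as
        # "<namespace-uri> <local-name>", so keep only the last part.
        if name.split(" ")[-1] == "loc":
            state["in_loc"] = True
            state["chunks"] = []

    def char_data(data):
        # Expat may deliver one text node in several pieces, so buffer them.
        if state["in_loc"]:
            state["chunks"].append(data)

    def end_element(name):
        if name.split(" ")[-1] == "loc":
            urls.append("".join(state["chunks"]).strip())
            state["in_loc"] = False

    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.EndElementHandler = end_element
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return urls


# Made-up sample sitemap used only for this sketch.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(extract_sitemap_urls(SAMPLE))
```

Because Expat reports elements through callbacks as it reads, the document never has to be held in memory as a full DOM, which is what makes this style of parsing suitable for very large sitemaps.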

Installation

pip install ultimate_sitemap_parser

Usage

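from usp.tree import sitemap_tree_for_homepage

# Discover and parse the sitemaps of the website
tree = sitemap_tree_for_homepage('https://www.nytimes.com/')

# Iterate over every page found in the sitemap tree
for page in tree.all_pages():
    print(page.url)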

Check out the API reference in the documentation for more details.
