Ultimate Sitemap Parser
Project description
Website sitemap parser for Python 3.5+.
Features
Supports multiple sitemap formats
Field-tested with ~1 million URLs as part of the Media Cloud project
Tolerates the more common sitemap bugs
Uses fast, memory-efficient Expat XML parsing
Provides the generated sitemap tree as an easy-to-use object tree
Supports using a custom web client
Uses a small number of actively maintained third-party modules
Reasonably tested
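To illustrate why Expat parsing is memory efficient, here is a minimal sketch that uses Python's standard `xml.parsers.expat` module to pull `<loc>` URLs out of a sitemap as a stream of callbacks, without building a full document tree. It is not the library's actual implementation; the function name `extract_sitemap_urls` and the sample document are assumptions made for this example.

```python
import xml.parsers.expat


def extract_sitemap_urls(xml_text):
    """Collect the text of every <loc> element via streaming Expat callbacks.

    Hypothetical helper for illustration only, not part of usp's API.
    """
    urls = []
    buffer = []  # character chunks of the <loc> element currently being read
    inside_loc = False

    def local_name(name):
        # With namespace_separator=" ", Expat reports "namespace-uri localname".
        return name.rsplit(" ", 1)[-1]

    def start_element(name, attrs):
        nonlocal inside_loc
        if local_name(name) == "loc":
            inside_loc = True
            buffer.clear()

    def char_data(data):
        # Expat may deliver a single text node in several chunks.
        if inside_loc:
            buffer.append(data)

    def end_element(name):
        nonlocal inside_loc
        if local_name(name) == "loc":
            inside_loc = False
            url = "".join(buffer).strip()
            if url:
                urls.append(url)

    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return urls


# Made-up sample sitemap used only for this sketch.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/about </loc></url>
</urlset>"""

print(extract_sitemap_urls(sample))
```

Because only the current `<loc>` text is buffered at any time, memory use stays roughly constant regardless of how many URLs the sitemap contains.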
Installation
pip install ultimate_sitemap_parser
Usage
from usp.tree import sitemap_tree_for_homepage
tree = sitemap_tree_for_homepage('https://www.nytimes.com/')
print(tree.all_pages())
Check out the API reference in the documentation for more details.
Project details
Download files
Hashes for ultimate_sitemap_parser-0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 43241b5538cd48297dc34e01aa168d10d99386e3771a5959d80238a8b56f7086
MD5 | 4a89ff1aa74e4ec8437d17815bd16af9
BLAKE2b-256 | dcc983a28a62dbbf8e1ef86d4d5db76204fc9d4bd062b2af3750596ea44fc537
Hashes for ultimate_sitemap_parser-0.1-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 52a9da9de8cbd9c04e3b1e001dfd21b0620954062070f8d3ea2992f3228f06e6
MD5 | 692a04b92c908b92ede07de796606f49
BLAKE2b-256 | f1fba7c2d10935fdba98fa2b940fae9840db95c53744a8881bd1a33b58942e02
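The SHA256 digests listed above can be compared against a downloaded file before installing it. The following is a minimal sketch using only the standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_digest` are made up for this example and are not part of the package.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest.

    Hypothetical helper: surrounding whitespace and letter case in the
    published value are ignored.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For instance, assuming the source distribution was saved locally as `ultimate_sitemap_parser-0.1.tar.gz`, calling `matches_published_digest` with that path and the SHA256 value from the first table should return `True` for an intact download.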