Retrieve all URLs from a sitemap.

getsitemap
==========

.. image:: https://readthedocs.org/projects/getsitemap/badge/?version=latest
   :target: https://getsitemap.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://badge.fury.io/py/getsitemap.svg
   :target: https://badge.fury.io/py/getsitemap

.. image:: https://img.shields.io/pypi/dm/getsitemap
   :target: https://pypistats.org/packages/getsitemap

.. image:: https://img.shields.io/pypi/l/getsitemap
   :target: https://github.com/capjamesg/getsitemap/blob/main/LICENSE

.. image:: https://img.shields.io/pypi/pyversions/getsitemap
   :target: https://badge.fury.io/py/getsitemap

getsitemap is a Python library that retrieves every URL listed in the sitemaps on a website.

This project may be useful if you are building a search engine crawler or a tool that validates the status codes of the URLs in a sitemap.

You can read the documentation for this project on `Read the Docs <https://getsitemap.readthedocs.io/en/latest/>`_.
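
To illustrate the status-code use case, here is a minimal sketch of a validator that reports URLs which do not return a 2xx response. It uses only the standard library and is not part of getsitemap: ``http_status`` and ``find_broken`` are hypothetical helpers, and the sketch assumes you already have a plain list of URL strings.

.. code-block:: python

    import urllib.error
    import urllib.request
    from typing import Callable, Dict, Iterable


    def http_status(url: str, timeout: float = 10.0) -> int:
        """Return the HTTP status code for ``url`` using a HEAD request.

        ``urlopen`` follows redirects, so the code is that of the final
        response. Network failures such as DNS errors or timeouts are not
        caught here and propagate to the caller.
        """
        request = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.status
        except urllib.error.HTTPError as error:
            # 4xx and 5xx responses are raised as HTTPError, which carries the code.
            return error.code


    def find_broken(
        urls: Iterable[str], status: Callable[[str], int] = http_status
    ) -> Dict[str, int]:
        """Map each URL whose status code is not 2xx to that code."""
        broken: Dict[str, int] = {}
        for url in urls:
            code = status(url)
            if not 200 <= code < 300:
                broken[url] = code
        return broken

Passing a stub as ``status`` (for example, ``lambda url: 200``) lets you test the filtering logic without making network requests.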

Installation 💻
------------------

To get started, install getsitemap with pip:

::

    pip install getsitemap

Quickstart ⚡
--------------

Get all URLs recursively in all sitemaps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    import getsitemap

    urls = getsitemap.get_individual_sitemap("https://jamesg.blog/sitemap.xml")
    print(urls)

Get all URLs in a single sitemap
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    import getsitemap

    all_urls = getsitemap.retrieve_sitemap_urls("https://sitemap")
    print(all_urls)
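
The quickstart functions do the sitemap walking for you. As a rough illustration of what retrieving URLs "recursively" involves, the sketch below follows a sitemap index (a sitemap that lists other sitemaps, as defined by the sitemaps.org protocol) down to its page URLs using only the standard library. It is not getsitemap's implementation: ``collect_urls`` is a hypothetical helper, and ``fetch`` is an assumed callable that returns a URL's XML, so the example runs without network access.

.. code-block:: python

    import xml.etree.ElementTree as ET
    from typing import Callable, List, Optional, Set

    # XML namespace defined by the sitemaps.org protocol.
    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


    def collect_urls(
        sitemap_url: str,
        fetch: Callable[[str], str],
        seen: Optional[Set[str]] = None,
    ) -> List[str]:
        """Return every page URL reachable from ``sitemap_url``.

        ``fetch`` is a hypothetical callable that returns the XML text of a
        URL; a real crawler would wrap an HTTP GET request here.
        """
        seen = set() if seen is None else seen
        if sitemap_url in seen:
            # Skip sitemaps already visited so a cyclic index cannot recurse forever.
            return []
        seen.add(sitemap_url)

        root = ET.fromstring(fetch(sitemap_url))
        urls: List[str] = []
        if root.tag.endswith("sitemapindex"):
            # A <sitemapindex> points at child sitemaps: descend into each one.
            for loc in root.findall("sm:sitemap/sm:loc", NS):
                if loc.text:
                    urls.extend(collect_urls(loc.text.strip(), fetch, seen))
        else:
            # A <urlset> lists page URLs directly.
            for loc in root.findall("sm:url/sm:loc", NS):
                if loc.text:
                    urls.append(loc.text.strip())
        return urls


    # Usage with in-memory XML standing in for network responses.
    pages = {
        "https://example.com/sitemap.xml": (
            '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            "<sitemap><loc>https://example.com/posts.xml</loc></sitemap>"
            "</sitemapindex>"
        ),
        "https://example.com/posts.xml": (
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            "<url><loc>https://example.com/a</loc></url>"
            "<url><loc>https://example.com/b</loc></url>"
            "</urlset>"
        ),
    }
    print(collect_urls("https://example.com/sitemap.xml", pages.__getitem__))
    # ['https://example.com/a', 'https://example.com/b']

Passing ``seen`` through the recursion is a design choice of this sketch: it keeps an index that references itself, directly or indirectly, from looping forever.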

Code Quality
------------

This library uses tox, pytest, and flake8 to ensure code quality.

To run the code quality checks, run the following command:

.. code-block:: bash

    tox

License 👩‍⚖️
----------------

This project is licensed under the `MIT License <LICENSE>`_.

Contributing 🛠️
------------------

We would love to have your help in improving getsitemap. Have an idea for a new feature or a bug to fix? Open a GitHub issue to start a discussion!
If you have

Contributors 💻
------------------

- capjamesg