Retrieve all URLs from a sitemap.
getsitemap
.. image:: https://readthedocs.org/projects/getsitemap/badge/?version=latest
   :target: https://getsitemap.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://badge.fury.io/py/getsitemap.svg
   :target: https://badge.fury.io/py/getsitemap

.. image:: https://img.shields.io/pypi/dm/getsitemap
   :target: https://pypistats.org/packages/getsitemap

.. image:: https://img.shields.io/pypi/l/getsitemap
   :target: https://github.com/capjamesg/getsitemap/blob/main/LICENSE

.. image:: https://img.shields.io/pypi/pyversions/getsitemap
   :target: https://badge.fury.io/py/getsitemap

|

getsitemap is a Python library that retrieves all of the URLs that are found in all of the sitemaps on a website.
This project may be useful if you are building a search crawler or a validator that checks the status codes of the URLs in a sitemap.
You can read the documentation for this project on `Read the Docs <https://getsitemap.readthedocs.io/en/latest/>`_.
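
For readers new to sitemaps, the idea of collecting URLs across nested sitemaps can be sketched with the standard library alone. This is not getsitemap's implementation; ``collect_urls`` and its ``fetch`` parameter are hypothetical names, and the XML is held in memory so the example runs without network access.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def collect_urls(sitemap_url: str, fetch: Callable[[str], str]) -> List[str]:
    """Return every page URL reachable from sitemap_url.

    fetch is a hypothetical callable that returns the XML text for a URL.
    """
    root = ET.fromstring(fetch(sitemap_url))
    # Each child (<sitemap> or <url>) carries its address in a <loc> element.
    locs = [el.text.strip() for el in root.findall("./*/sm:loc", NS) if el.text]

    # A <sitemapindex> lists other sitemaps, so follow each one in turn.
    if root.tag.endswith("sitemapindex"):
        urls: List[str] = []
        for child in locs:
            urls.extend(collect_urls(child, fetch))
        return urls

    # A <urlset> lists page URLs directly.
    return locs


# In-memory stand-ins for two sitemap files (illustrative data only).
XMLNS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
PAGES = {
    "https://example.com/sitemap.xml": (
        f"<sitemapindex {XMLNS}>"
        "<sitemap><loc>https://example.com/posts.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/posts.xml": (
        f"<urlset {XMLNS}>"
        "<url><loc>https://example.com/a</loc></url>"
        "<url><loc>https://example.com/b</loc></url>"
        "</urlset>"
    ),
}

print(collect_urls("https://example.com/sitemap.xml", PAGES.__getitem__))
# prints ['https://example.com/a', 'https://example.com/b']
```

Passing the fetcher in as an argument keeps the parsing logic testable offline; a real crawler would swap in an HTTP client and add error handling.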
Installation 💻
To get started, ``pip install getsitemap``:

::

    pip install getsitemap

Quickstart ⚡
Get all URLs in a single sitemap

.. code-block:: python

    import getsitemap

    urls = getsitemap.get_individual_sitemap("https://jamesg.blog/sitemap.xml")

    print(urls)

Get all URLs recursively in all sitemaps

.. code-block:: python

    import getsitemap

    all_urls = getsitemap.retrieve_sitemap_urls("https://sitemap")

    print(all_urls)

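
Once you have the retrieved URLs, a status code validator might look roughly like the sketch below. It assumes the URLs are available as a flat list of strings, which you should confirm against the documentation. ``head_status`` and ``find_broken`` are hypothetical helper names, not part of getsitemap, and only the standard library is used.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the code is still useful here.
        return error.code


def find_broken(
    urls: Iterable[str],
    get_status: Callable[[str], int] = head_status,
) -> Dict[str, int]:
    """Map each URL whose status is 400 or above to that status."""
    broken = {}
    for url in urls:
        status = get_status(url)
        if status >= 400:
            broken[url] = status
    return broken


# Offline demonstration with a fake status lookup (illustrative data only).
fake_statuses = {"https://example.com/a": 200, "https://example.com/gone": 404}
print(find_broken(fake_statuses, fake_statuses.__getitem__))
# prints {'https://example.com/gone': 404}
```

With the default ``head_status``, redirects are followed by ``urllib`` and connection failures raise ``urllib.error.URLError``, so a production validator would likely want to catch that case as well.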
Code Quality
This library uses ``tox``, ``pytest``, and ``flake8`` to ensure code quality.
To run code quality checks, run the following command:
.. code-block:: bash

    tox

License 👩‍⚖️
This project is licensed under an `MIT License <LICENSE>`_.
Contributing 🛠️
We would love to have your help in improving ``getsitemap``. Have an idea for a new feature or a bug to fix? Leave information in a GitHub Issue to start a discussion!
If you have
Contributors 💻
- capjamesg
File details
Details for the file ``getsitemap-0.1.5.tar.gz``.
File metadata
- Download URL: getsitemap-0.1.5.tar.gz
- Upload date:
- Size: 5.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | ecfdd3fe514c71412bb6493815c4342794cfd524e73ece566be87bb506b2f2a2
MD5 | 007d0c541fcc44e106efc8f946f5b47c
BLAKE2b-256 | e58d8ee0d916b3533a5fc79591b1f4948529eb62b2eeb08ee64fde0e5ae92b14
File details
Details for the file ``getsitemap-0.1.5-py3-none-any.whl``.
File metadata
- Download URL: getsitemap-0.1.5-py3-none-any.whl
- Upload date:
- Size: 5.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | e94abfba4f9c0fa6be0c0592e3af8c9ca4c24806b5ea0aca105c298339e31db9
MD5 | 10c5441bf8e2ccc85c7f85fcd4efd26f
BLAKE2b-256 | d5a9f02f609f964bd97a70461e3e38753b92e110f0d87512341c41c49af99c1c