Retrieve all URLs from a sitemap.

getsitemap

.. image:: https://readthedocs.org/projects/getsitemap/badge/?version=latest
   :target: https://getsitemap.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://badge.fury.io/py/getsitemap.svg
   :target: https://badge.fury.io/py/getsitemap

.. image:: https://img.shields.io/pypi/dm/getsitemap
   :target: https://pypistats.org/packages/getsitemap

.. image:: https://img.shields.io/pypi/l/getsitemap
   :target: https://github.com/capjamesg/getsitemap/blob/main/LICENSE

.. image:: https://img.shields.io/pypi/pyversions/getsitemap
   :target: https://badge.fury.io/py/getsitemap

getsitemap is a Python library that retrieves every URL listed in the sitemaps of a website.

This project may be useful if you are building a search crawler or a tool that validates the status codes of the URLs in a sitemap.
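As a rough illustration of the status code validator use case, the sketch below checks a list of URLs and reports those that return an error status. It uses only the standard library; ``fetch_status`` and ``find_broken_urls`` are hypothetical helper names and are not part of getsitemap. In practice, the URL list would come from one of the getsitemap functions shown in the Quickstart.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a HEAD request (hypothetical helper)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is kept on the exception.
        return error.code


def find_broken_urls(
    urls: Iterable[str],
    get_status: Callable[[str], int] = fetch_status,
) -> Dict[str, int]:
    """Map each URL whose status code is 400 or higher to that status code."""
    broken = {}
    for url in urls:
        status = get_status(url)
        if status >= 400:
            broken[url] = status
    return broken
```

The status lookup is passed in as a parameter so that the check can be exercised without network access. Connection failures (``urllib.error.URLError``) are not handled here and would need their own policy.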

You can read the documentation for this project on `Read the Docs <https://getsitemap.readthedocs.io/en/latest/>`_.

Installation 💻

To get started, install getsitemap with pip:

::

   pip install getsitemap

Quickstart ⚡

Get all URLs recursively in all sitemaps

.. code-block:: python

   import getsitemap

   urls = getsitemap.get_individual_sitemap("https://jamesg.blog/sitemap.xml")

   print(urls)
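To make "recursively" concrete: a sitemap index file lists other sitemaps, which in turn list page URLs. The following is a minimal standard-library sketch of that traversal, not getsitemap's actual implementation. ``collect_urls`` is a hypothetical name, and ``fetch`` is an assumed callable that returns the XML text for a given sitemap URL.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Set

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def collect_urls(
    sitemap_url: str,
    fetch: Callable[[str], str],
    seen: Optional[Set[str]] = None,
) -> Dict[str, List[str]]:
    """Map each leaf sitemap reachable from sitemap_url to the URLs it lists."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:
        # Guard against sitemap indexes that reference each other.
        return {}
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    locs = [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]

    if root.tag == SITEMAP_NS + "sitemapindex":
        # Every <loc> in an index points at another sitemap: recurse into it.
        results: Dict[str, List[str]] = {}
        for child_url in locs:
            results.update(collect_urls(child_url, fetch, seen))
        return results

    # A plain <urlset>: every <loc> is a page URL.
    return {sitemap_url: locs}
```

Grouping the results by sitemap is a design choice of this sketch; it keeps track of which file each URL came from.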

Get all URLs in a single sitemap

.. code-block:: python

   import getsitemap

   all_urls = getsitemap.retrieve_sitemap_urls("https://sitemap")

   print(all_urls)
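For reference, a single sitemap is an XML ``<urlset>`` in which each ``<url>`` entry carries its address in a ``<loc>`` element. The sketch below shows how such a document can be read with the standard library alone. It is an illustration of the file format, not the library's own code, and ``parse_urlset`` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List

# Prefix-to-namespace mapping for the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_urlset(xml_text: str) -> List[str]:
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    ]


example = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
"""

print(parse_urlset(example))
```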

Code Quality

This library uses tox, pytest, and flake8 to ensure code quality.

To run the code quality checks, run the following command:

.. code-block:: bash

   tox

License 👩‍⚖️

This project is licensed under an `MIT License <LICENSE>`_.

Contributing 🛠️

We would love to have your help in improving getsitemap. Have an idea for a new feature or a bug to fix? Leave information in a GitHub Issue to start a discussion!

If you have

Contributors 💻

  • capjamesg

