A set of data tools in Python

Project description

PRs welcome | License: MIT | PyPI: Find-Sitemap | Code style: black

Find-Sitemap

Find-Sitemap is a simple SEO tool that helps you find a website's sitemap by checking a list of candidate sitemap URLs on the domain.

>>> from Find_Sitemap import FindSitemap
>>> main = FindSitemap('google.com')
>>> main.crawl()
...
...
check 13801/13804: https://google.com/xmap.php
check 13802/13804: https://google.com/xmap.jsp
check 13803/13804: https://google.com/xmap.asp
check 13804/13804: https://google.com/xmap.html
--------------------
Find sitemap urls len: 1
Find sitemap urls list: ['https://www.google.com/sitemap.xml']
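
The README does not describe how each candidate URL is judged, so the following is only a minimal sketch of one plausible check, not Find-Sitemap's actual logic. It assumes the response body has already been downloaded and tests whether it parses as XML with a `urlset` or `sitemapindex` root element, the two root elements defined by the sitemaps.org protocol. The function name `looks_like_sitemap` is hypothetical, and the sketch ignores plain-text and gzipped sitemaps.

```python
import xml.etree.ElementTree as ET

# Root elements defined by the sitemaps.org protocol.
SITEMAP_ROOTS = {"urlset", "sitemapindex"}


def looks_like_sitemap(body: str) -> bool:
    """Return True if body parses as XML with a sitemap root element.

    Hypothetical helper for illustration; not part of Find-Sitemap.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML (for example an HTML error page or plain text).
        return False
    # ElementTree reports namespaced tags as "{namespace}name"; keep the name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in SITEMAP_ROOTS


sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "</urlset>"
)
print(looks_like_sitemap(sample))            # True
print(looks_like_sitemap("404 Not Found"))   # False
```

A check like this would reject soft-404 pages that return HTTP 200 with an HTML body, which a status-code test alone would accept.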

Getting Started

Install Find-Sitemap from PyPI:

$ pip install Find-Sitemap

Prerequisites

Usage

  1. Show the subdomains, slugs_L1, slugs_L2, and filetypes parameters.

    >>> from Find_Sitemap import FindSitemap
    >>> main = FindSitemap('google.com')
    >>> main.subdomains
    {'www.'}
    
    >>> main.slugs_L1
    {'/default', '/sitemap', '/feeds', '/api', '/contents' ...}
    
    >>> main.slugs_L2
    {'/sitemap', '/stock', '/sitemap1', '/sitemap0', ...}
    
    >>> main.filetypes
    {'txt', 'xml', 'xml.gz', 'jsp', 'html', ...}
    
  2. Add values to the subdomains, slugs_L1, slugs_L2, and filetypes parameters.

    >>> from Find_Sitemap import FindSitemap
    >>> main = FindSitemap('google.com')
    >>> main.subdomains.add("shop.")
    >>> main.slugs_L1.add("/node")
    >>> main.slugs_L2.add("/site")
    >>> main.filetypes.add("xml")
    
  3. Remove values from the subdomains, slugs_L1, slugs_L2, and filetypes parameters.

    >>> # Continue with the `main` instance from step 2. The parameters are
    >>> # sets, so remove() raises KeyError if the value is not present.
    >>> main.subdomains.remove("shop.")
    >>> main.slugs_L1.remove("/node")
    >>> main.slugs_L2.remove("/site")
    >>> main.filetypes.remove("xml")
    
  4. Run the crawler.

    >>> from Find_Sitemap import FindSitemap
    >>> main = FindSitemap('google.com')
    >>> main.crawl()
    ...
    ...
    check 13801/13804: https://google.com/xmap.php
    check 13802/13804: https://google.com/xmap.jsp
    check 13803/13804: https://google.com/xmap.asp
    check 13804/13804: https://google.com/xmap.html
    --------------------
    Find sitemap urls len: 1
    Find sitemap urls list: ['https://www.google.com/sitemap.xml']
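
The large check count in the output above suggests that the crawler combines the four parameter sets into candidate URLs. The exact combination rule is not documented here, so the following is a sketch under assumptions: it tries the bare domain plus each subdomain prefix, makes the first-level slug optional, and appends each file type as an extension. The function `candidate_urls` and its arguments are hypothetical and not part of Find-Sitemap.

```python
from itertools import product


def candidate_urls(domain, subdomains, slugs_l1, slugs_l2, filetypes, scheme="https"):
    """Yield candidate sitemap URLs built from the four parameter sets.

    Hypothetical helper for illustration; the real combination rule used
    by Find-Sitemap may differ.
    """
    hosts = [""] + sorted(subdomains)   # bare domain, then each prefix such as "www."
    first = [""] + sorted(slugs_l1)     # assumption: the first-level slug is optional
    second = sorted(slugs_l2)
    extensions = sorted(filetypes)
    seen = set()
    for sub, l1, l2, ext in product(hosts, first, second, extensions):
        url = f"{scheme}://{sub}{domain}{l1}{l2}.{ext}"
        if url not in seen:             # skip duplicates produced by overlapping slugs
            seen.add(url)
            yield url


urls = list(
    candidate_urls("example.com", {"www."}, {"/api"}, {"/sitemap"}, {"xml", "txt"})
)
print(len(urls))  # 8
print("https://www.example.com/sitemap.xml" in urls)  # True
```

Because the candidate count is the product of the set sizes, adding one value to any set can add many requests, so trimming unneeded file types or slugs before calling crawl() shortens the run.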
    

Contributing

Authors

Download files

Source Distribution

Find_Sitemap-0.1.4.tar.gz (8.8 kB)

Uploaded Source

Built Distribution

Find_Sitemap-0.1.4-py3-none-any.whl (10.2 kB)

Uploaded Python 3
