
Script/Library to read and parse sitemap.xml data


A script and library that reads the URLs in a sitemap, converts them to objects, and can export them as CSV or JSON.

Handles sitemaps according to the protocol described at https://www.sitemaps.org/protocol.html
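For orientation, the protocol defines two kinds of document: a sitemap index (root element `sitemapindex`, listing other sitemaps) and a URL set (root element `urlset`, listing pages). The sketch below tells them apart using only the standard library. It is an illustration of the protocol, not this package's implementation, and the function name `parse_sitemap` is made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return ("index", [sitemap locations]) or ("urlset", [url dicts]).

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    tag = root.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
    if tag == "sitemapindex":
        locs = [
            el.findtext("sm:loc", namespaces=NS)
            for el in root.findall("sm:sitemap", NS)
        ]
        return "index", locs
    if tag == "urlset":
        urls = []
        for el in root.findall("sm:url", NS):
            urls.append({
                "loc": el.findtext("sm:loc", namespaces=NS),
                # lastmod is optional in the protocol, so this may be None
                "lastmod": el.findtext("sm:lastmod", namespaces=NS),
            })
        return "urlset", urls
    raise ValueError(f"unexpected root element: {tag}")
```

An index document points to further sitemaps that have to be fetched in turn, while a URL set carries the page entries themselves.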

Installation

pip install site-map-parser

Usage

Script usage

smapper $url > /tmp/data.csv

Logs are written to ~/sitemap_run.log

Arguments

| Argument | Options | Default | Information |
|---|---|---|---|
| -h | N/A | N/A | Outputs argument data |
| url | e.g. http://www.example.com or http://www.example.com/other_sitemap.xml | N/A | Required - sitemap data to retrieve |
| -l, --log | CRITICAL or ERROR or WARNING or INFO or DEBUG | INFO | Logs to sitemapper_run.log in install folder |
| -e, --exporter | csv or json | csv | Export format of the data |
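The argument list above can be mirrored with `argparse` to see how the defaults and choices interact. This is a sketch under assumptions: the package's real command-line code may be structured differently, and `build_parser` is a hypothetical name used only here.

```python
import argparse

LOG_LEVELS = ["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"]


def build_parser():
    """Hypothetical parser mirroring the documented arguments."""
    parser = argparse.ArgumentParser(
        prog="smapper",
        description="Read and parse sitemap.xml data",
    )
    # Positional and required: the site or sitemap URL to retrieve.
    parser.add_argument("url", help="sitemap data to retrieve")
    parser.add_argument("-l", "--log", choices=LOG_LEVELS, default="INFO",
                        help="log level")
    parser.add_argument("-e", "--exporter", choices=["csv", "json"],
                        default="csv", help="export format of the data")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.url, args.log, args.exporter)
```

With only a URL given, the log level falls back to INFO and the exporter to csv, matching the Default column.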

Library Usage

from sitemapparser import SiteMapParser

sm = SiteMapParser('http://www.example.com')    # reads /sitemap.xml
if sm.has_sitemaps():
    sitemaps = sm.getSitemaps() # returns generator of sitemapper.Sitemap instances
else:
    urls = sm.getUrls()         # returns generator of sitemapper.Url instances
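Because both methods return generators, the results can be iterated only once. A small wrapper that copies them into a list makes them reusable. The sketch below uses the method names exactly as they appear in the example above; `collect_entries` and `FakeParser` are hypothetical names, and `FakeParser` exists only so the snippet runs without network access.

```python
def collect_entries(sm):
    """Return ("sitemaps", [...]) or ("urls", [...]) for a parser-like object."""
    if sm.has_sitemaps():
        return "sitemaps", list(sm.getSitemaps())
    return "urls", list(sm.getUrls())


class FakeParser:
    """Hypothetical stand-in for SiteMapParser so the sketch runs offline."""

    def __init__(self, sitemaps=(), urls=()):
        self._sitemaps = list(sitemaps)
        self._urls = list(urls)

    def has_sitemaps(self):
        return bool(self._sitemaps)

    def getSitemaps(self):
        # Generator, like the documented return type
        return (s for s in self._sitemaps)

    def getUrls(self):
        return (u for u in self._urls)


kind, entries = collect_entries(FakeParser(urls=["http://www.example.com/a"]))
print(kind, len(entries))
```

In real use, `sm` would be the `SiteMapParser` instance created as shown above.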

Exporting

Two exporters are available: CSV and JSON.

CSV Exporter

from sitemapparser.exporters import CSVExporter

# sm set as per earlier library usage example

csv_exporter = CSVExporter(sm)
if sm.has_sitemaps():
    print(csv_exporter.export_sitemaps())
elif sm.has_urls():
    print(csv_exporter.export_urls())

JSON Exporter

from sitemapparser.exporters import JSONExporter

# sm set as per earlier library usage example

json_exporter = JSONExporter(sm)
if sm.has_sitemaps():
    print(json_exporter.export_sitemaps())
elif sm.has_urls():
    print(json_exporter.export_urls())
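To give a rough idea of what the two formats look like, here is a standalone sketch that serialises sitemap entries with the standard `csv` and `json` modules. It is not the package's exporter code: the functions `to_csv` and `to_json` are hypothetical, and the field list is taken from the sitemaps.org protocol (`loc`, `lastmod`, `changefreq`, `priority`), so the library's actual columns may differ.

```python
import csv
import io
import json

# Entry fields defined by the sitemaps.org protocol; assumed for this sketch.
FIELDS = ["loc", "lastmod", "changefreq", "priority"]


def to_csv(entries):
    """Serialise a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        # Missing optional fields become empty cells
        writer.writerow({name: entry.get(name) or "" for name in FIELDS})
    return buf.getvalue()


def to_json(entries):
    """Serialise a list of dicts to a JSON array; missing fields become null."""
    return json.dumps([{name: entry.get(name) for name in FIELDS}
                       for entry in entries])


entries = [{"loc": "http://www.example.com/", "lastmod": "2020-01-01"}]
print(to_csv(entries))
print(to_json(entries))
```

CSV suits spreadsheets and shell redirection, as in the script usage, while JSON keeps missing values explicit as null.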

Download files


Source Distribution: site-map-parser-0.1.11.tar.gz (7.0 kB)

Built Distribution: site_map_parser-0.1.11-py3-none-any.whl (10.4 kB, Python 3)
