Skip to main content

Script/Library to read and parse sitemap.xml data

Project description

Site Map Parser

Script and library which reads urls and converts to objects, allows exporting as CSV or JSON.

Handle sitemaps according to: https://www.sitemaps.org/protocol.html

Installation

pip install site-map-parser

Usage

Script usage

smapper $url > /tmp/data.csv

Logs written to ~/sitemap_run.log

Arguments

Argument Options Default Information
-h N/A N/A Outputs argument data
url e.g. http://www.example.com - http://www.example.com/other_sitemap.xml N/A Required - sitemap data to retrieve
-l, --log CRITICAL or ERROR or WARNING or INFO or DEBUG INFO logs to sitemapper_run.log in install folder
-e, --exporter csv or json csv Export format of the data

Library Usage

from sitemapparser import SiteMapParser

sm = SiteMapParser('http://www.example.com')    # reads /sitemap.xml
if sm.has_sitemaps():
    sitemaps = sm.get_sitemaps() # returns iterator of sitemapper.Sitemap instances
else:
    urls = sm.get_urls()         # returns iterator of sitemapper.Url instances

Exporting

Two exporters are available: csv and json

CSV Exporter
from sitemapparser.exporters import CSVExporter

# sm set as per earlier library usage example

csv_exporter = CSVExporter(sm)
if sm.has_sitemaps():
    print(csv_exporter.export_sitemaps())
elif sm.has_urls():
    print(csv_exporter.export_urls())
JSON Exporter
from sitemapparser.exporters import JSONExporter

# sm set as per earlier library usage example

json_exporter = JSONExporter(sm)
if sm.has_sitemaps():
    print(json_exporter.export_sitemaps())
elif sm.has_urls():
    print(json_exporter.export_urls())

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for site-map-parser, version 0.3.3
Filename, size File type Python version Upload date Hashes
Filename, size site_map_parser-0.3.3-py3-none-any.whl (11.2 kB) File type Wheel Python version py3 Upload date Hashes View hashes
Filename, size site-map-parser-0.3.3.tar.gz (7.2 kB) File type Source Python version None Upload date Hashes View hashes

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page