Script/Library to read and parse sitemap.xml data

Project description

Site Map Parser

A script and library that reads the URLs in a sitemap, converts them to objects, and exports them as CSV or JSON.

Sitemaps are handled according to the protocol described at https://www.sitemaps.org/protocol.html
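To illustrate what the protocol distinguishes, here is a minimal sketch using only the standard library. It is not this package's implementation; the function name `parse_sitemap` and the `EXAMPLE` document are assumptions made for illustration. A sitemap file has either a `sitemapindex` root (a list of further sitemaps) or a `urlset` root (a list of URLs), and each entry carries simple text fields such as `loc` and `lastmod`.

```python
import xml.etree.ElementTree as ET


def parse_sitemap(xml_text):
    """Return (kind, entries) where kind is "sitemapindex" or "urlset".

    Hypothetical helper for illustration; not part of site-map-parser.
    """
    root = ET.fromstring(xml_text)
    # Tags look like "{namespace}urlset"; keep only the local name.
    kind = root.tag.rsplit("}", 1)[-1]
    if kind not in ("sitemapindex", "urlset"):
        raise ValueError(f"not a sitemap document: <{kind}>")
    entries = []
    for entry in root:
        # Each <sitemap> or <url> child holds text elements like <loc>.
        fields = {
            child.tag.rsplit("}", 1)[-1]: (child.text or "").strip()
            for child in entry
        }
        entries.append(fields)
    return kind, entries


# Example document modelled on the protocol page (assumed data).
EXAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>"""

kind, entries = parse_sitemap(EXAMPLE)
print(kind, entries[0]["loc"])
```

The same split is what the library exposes through `has_sitemaps()` and `get_urls()` in the usage section.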

Installation

pip install site-map-parser

Usage

Script usage

smapper $url > /tmp/data.csv

Logs are written to ~/sitemap_run.log

Arguments

| Argument | Options | Default | Information |
|---|---|---|---|
| -h | N/A | N/A | Outputs argument data |
| url | e.g. http://www.example.com or http://www.example.com/other_sitemap.xml | N/A | Required. Sitemap data to retrieve |
| -l, --log | CRITICAL, ERROR, WARNING, INFO or DEBUG | INFO | Logs to sitemapper_run.log in install folder |
| -e, --exporter | csv or json | csv | Export format of the data |
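The arguments listed above could be declared with `argparse` roughly as follows. This is a sketch that mirrors the documented options and defaults, not the package's actual source; the function name `build_arg_parser` and the help strings are assumptions.

```python
import argparse


def build_arg_parser():
    """Hypothetical parser mirroring the documented smapper arguments."""
    parser = argparse.ArgumentParser(
        prog="smapper",
        description="Read and parse sitemap.xml data",
    )
    # Required positional argument: site root or a specific sitemap URL.
    parser.add_argument("url", help="sitemap data to retrieve")
    parser.add_argument(
        "-l", "--log",
        choices=["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"],
        default="INFO",
        help="log level",
    )
    parser.add_argument(
        "-e", "--exporter",
        choices=["csv", "json"],
        default="csv",
        help="export format of the data",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for the demo.
args = build_arg_parser().parse_args(["http://www.example.com", "-e", "json"])
print(args.url, args.log, args.exporter)
```

Because `-l` is omitted in the demo call, the log level falls back to the documented default of INFO.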

Library Usage

from sitemapparser import SiteMapParser

sm = SiteMapParser('http://www.example.com')    # reads /sitemap.xml
if sm.has_sitemaps():
    sitemaps = sm.get_sitemaps() # returns iterator of sitemapper.Sitemap instances
else:
    urls = sm.get_urls()         # returns iterator of sitemapper.Url instances

Exporting

Two exporters are available: csv and json

CSV Exporter

from sitemapparser.exporters import CSVExporter

# sm set as per earlier library usage example

csv_exporter = CSVExporter(sm)
if sm.has_sitemaps():
    print(csv_exporter.export_sitemaps())
elif sm.has_urls():
    print(csv_exporter.export_urls())

JSON Exporter

from sitemapparser.exporters import JSONExporter

# sm set as per earlier library usage example

json_exporter = JSONExporter(sm)
if sm.has_sitemaps():
    print(json_exporter.export_sitemaps())
elif sm.has_urls():
    print(json_exporter.export_urls())
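Since both exporters are used the same way, the branching can be wrapped once. The helper below is a sketch, and `export_with` is a hypothetical name that is not part of the library. It relies only on the methods shown in the examples above: `has_sitemaps()`, `has_urls()`, `export_sitemaps()` and `export_urls()`.

```python
def export_with(sm, exporter_cls):
    """Export whatever the parser found using the given exporter class.

    sm is a SiteMapParser-like object; exporter_cls is a class such as
    CSVExporter or JSONExporter that accepts the parser in its constructor.
    Returns an empty string when the parser found nothing (an assumption
    made for this sketch).
    """
    exporter = exporter_cls(sm)
    if sm.has_sitemaps():
        return exporter.export_sitemaps()
    if sm.has_urls():
        return exporter.export_urls()
    return ""
```

With `sm` set as in the earlier library usage example, `print(export_with(sm, CSVExporter))` or `print(export_with(sm, JSONExporter))` would replace the two near-identical blocks.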

Download files

Source Distribution

site-map-parser-0.3.7.tar.gz (7.6 kB)

Uploaded Source

Built Distribution

site_map_parser-0.3.7-py3-none-any.whl (11.2 kB)

Uploaded Python 3

File details

Details for the file site-map-parser-0.3.7.tar.gz.

File metadata

  • Download URL: site-map-parser-0.3.7.tar.gz
  • Size: 7.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.9

File hashes

Hashes for site-map-parser-0.3.7.tar.gz
Algorithm Hash digest
SHA256 f38ef4174e5ca7477d7d7d323ee8816698714db7bda7483a626f1dcb3722a5f8
MD5 8e959c0ffa5b6c4a75dcf919e4c0e892
BLAKE2b-256 87e2bead8c39cd1d99ad9bfe875ee6c3dcdc0b367708c34cd8466f6f55a5f09d

File details

Details for the file site_map_parser-0.3.7-py3-none-any.whl.

File metadata

  • Download URL: site_map_parser-0.3.7-py3-none-any.whl
  • Size: 11.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.9

File hashes

Hashes for site_map_parser-0.3.7-py3-none-any.whl
Algorithm Hash digest
SHA256 a884ec82425318d1dcb42b0bb00f53ff18a6d45f588ebb66b17fc92c1bb1300e
MD5 626356b7ea16b503d93f53e25247b63b
BLAKE2b-256 4dff02d1308f56e87255ebb05e274c37ba5ecae561257f34cfea62ce3362762c
