Script/Library to read and parse sitemap.xml data
Project description
A script and library that reads the URLs in a sitemap and converts them to objects, which can be exported as CSV or JSON.
Sitemaps are handled according to the protocol at https://www.sitemaps.org/protocol.html
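The protocol defines two document types: a `urlset` listing page URLs and a `sitemapindex` listing other sitemaps. The following is a minimal sketch of telling them apart with the Python standard library only; it is not this package's implementation. The sample XML and the name `classify_sitemap` are illustrative assumptions, while the namespace and element names come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Illustrative sample documents (not real data).
URLSET_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc><lastmod>2005-01-01</lastmod></url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>"""

INDEX_XML = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.example.com/other_sitemap.xml</loc></sitemap>
</sitemapindex>"""


def classify_sitemap(xml_text):
    """Return (kind, locations), where kind is 'index' or 'urlset'."""
    root = ET.fromstring(xml_text)
    if root.tag == SITEMAP_NS + "sitemapindex":
        kind, child = "index", "sitemap"
    elif root.tag == SITEMAP_NS + "urlset":
        kind, child = "urlset", "url"
    else:
        raise ValueError(f"not a sitemap document: {root.tag}")
    # Every <url> or <sitemap> entry carries its address in a <loc> child.
    locations = [
        entry.findtext(SITEMAP_NS + "loc", default="").strip()
        for entry in root.findall(SITEMAP_NS + child)
    ]
    return kind, locations


urlset_result = classify_sitemap(URLSET_XML)
index_result = classify_sitemap(INDEX_XML)
```

An index points at further sitemaps that have to be fetched in turn, which is why the library usage further down checks for sitemaps before asking for URLs.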
Installation
pip install site-map-parser
Usage
Script usage
smapper $url > /tmp/data.csv
Arguments
Argument | Options | Default | Information |
---|---|---|---|
-h | N/A | N/A | Outputs argument data |
url | e.g. http://www.example.com or http://www.example.com/other_sitemap.xml | N/A | Required - sitemap data to retrieve |
-l, --log | CRITICAL or ERROR or WARNING or INFO or DEBUG | INFO | Logs to sitemapper_run.log in install folder |
-e, --exporter | csv or json | csv | Export format of the data |
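When the script is driven from another Python program, the arguments in the table can be assembled and checked before the command runs. This is a sketch only: `build_smapper_command` and `run_smapper` are hypothetical helper names, and placing the options before the positional url is an assumption about the command line that the table does not confirm.

```python
import subprocess

# Allowed values copied from the arguments table.
LOG_LEVELS = ("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG")
EXPORTERS = ("csv", "json")


def build_smapper_command(url, exporter="csv", log_level="INFO"):
    """Build the argument list for the smapper script (hypothetical helper)."""
    if exporter not in EXPORTERS:
        raise ValueError(f"exporter must be one of {EXPORTERS}")
    if log_level not in LOG_LEVELS:
        raise ValueError(f"log level must be one of {LOG_LEVELS}")
    # Assumption: options may be given before the positional url.
    return ["smapper", "-e", exporter, "-l", log_level, url]


def run_smapper(url, output_path, **options):
    """Run smapper and redirect its standard output to output_path."""
    command = build_smapper_command(url, **options)
    with open(output_path, "w", encoding="utf-8") as out:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, stdout=out, check=True)


json_command = build_smapper_command("http://www.example.com", exporter="json")
```

Calling `run_smapper("http://www.example.com", "/tmp/data.csv")` would then correspond to the shell redirect shown under Script usage, provided the package is installed and `smapper` is on the PATH.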
Library Usage
from sitemapparser import SiteMapParser
sm = SiteMapParser('http://www.example.com') # reads /sitemap.xml
if sm.has_sitemaps():
    sitemaps = sm.getSitemaps()  # returns generator of sitemapper.Sitemap instances
else:
    urls = sm.getUrls()  # returns generator of sitemapper.Url instances
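Because both methods are described as returning generators, the results can be iterated only once. If the entries are needed more than once, one option is to copy them into a list first. The sketch below uses a stand-in class, `StubParser`, so that it runs without network access; it is not the real `SiteMapParser` and only mirrors the three method names used in the example above. `read_entries` is likewise a hypothetical helper.

```python
class StubParser:
    """Stand-in that mirrors the method names used above; not the real class."""

    def __init__(self, sitemaps=(), urls=()):
        self._sitemaps = list(sitemaps)
        self._urls = list(urls)

    def has_sitemaps(self):
        return bool(self._sitemaps)

    def getSitemaps(self):
        # Generator, as the library comments describe.
        return (item for item in self._sitemaps)

    def getUrls(self):
        return (item for item in self._urls)


def read_entries(parser):
    """Return ('sitemaps' or 'urls', list of entries) for a parser-like object."""
    if parser.has_sitemaps():
        return "sitemaps", list(parser.getSitemaps())
    return "urls", list(parser.getUrls())


kind, entries = read_entries(StubParser(urls=["http://www.example.com/"]))
```

With the real library, `read_entries(sm)` would be called with the `sm` object created above; the list then holds whatever instances the generators yield.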
Exporting
Two exporters are available: csv and json
CSV Exporter
from sitemapparser.exporters import CSVExporter
# sm set as per earlier library usage example
csv_exporter = CSVExporter(sm)
if sm.has_sitemaps():
    print(csv_exporter.export_sitemaps())
elif sm.has_urls():
    print(csv_exporter.export_urls())
JSON Exporter
from sitemapparser.exporters import JSONExporter
# sm set as per earlier library usage example
json_exporter = JSONExporter(sm)
if sm.has_sitemaps():
    print(json_exporter.export_sitemaps())
elif sm.has_urls():
    print(json_exporter.export_urls())
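Both exporters follow the same pattern, so writing the result to a file instead of printing it can be handled by one helper. This is a sketch under two assumptions: the export methods return text (consistent with them being printed above, but not confirmed here), and `export_to_file` is a hypothetical helper name. `FakeParser` and `FakeExporter` are stand-ins so that the example runs without the package or network access; the text they produce is not the real export format.

```python
import tempfile
from pathlib import Path


class FakeParser:
    """Stand-in parser that reports URLs only; not the real SiteMapParser."""

    def has_sitemaps(self):
        return False

    def has_urls(self):
        return True


class FakeExporter:
    """Stand-in exporter; the returned text is illustrative only."""

    def export_sitemaps(self):
        return "loc\nhttp://www.example.com/other_sitemap.xml\n"

    def export_urls(self):
        return "loc\nhttp://www.example.com/\n"


def export_to_file(parser, exporter, path):
    """Write the matching export to path and return the written text."""
    if parser.has_sitemaps():
        text = exporter.export_sitemaps()
    elif parser.has_urls():
        text = exporter.export_urls()
    else:
        text = ""  # nothing to export
    Path(path).write_text(text, encoding="utf-8")
    return text


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data.csv"
    written = export_to_file(FakeParser(), FakeExporter(), target)
    saved = target.read_text(encoding="utf-8")
```

With the real library, the same helper could be called as `export_to_file(sm, csv_exporter, "/tmp/data.csv")` or with `json_exporter`, using the objects from the examples above.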