Script/Library to read and parse sitemap.xml data
A script and library that reads the URLs in a sitemap, converts them to objects, and can export them as CSV or JSON.
Sitemaps are handled according to the protocol described at https://www.sitemaps.org/protocol.html
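As a rough illustration of what the protocol defines, a sitemap document is either a `sitemapindex` (a list of other sitemaps) or a `urlset` (a list of page URLs), and every entry carries a `loc` element. The sketch below uses only the Python standard library and is not this package's implementation; the function name `classify_sitemap` and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def classify_sitemap(xml_text):
    """Return (kind, locs) where kind is "sitemapindex" or "urlset".

    Hypothetical helper for illustration; not part of site-map-parser.
    """
    root = ET.fromstring(xml_text)
    # ElementTree reports the tag as "{namespace}name"; keep the local name.
    kind = root.tag.rsplit("}", 1)[-1]
    if kind == "sitemapindex":
        entries = root.findall("sm:sitemap", NS)
    elif kind == "urlset":
        entries = root.findall("sm:url", NS)
    else:
        raise ValueError(f"not a sitemap document: <{kind}>")
    locs = [
        entry.findtext("sm:loc", default="", namespaces=NS).strip()
        for entry in entries
    ]
    return kind, locs


# Made-up sample document following the protocol's urlset layout.
SAMPLE_URLSET = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc><lastmod>2005-01-01</lastmod></url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>"""

print(classify_sitemap(SAMPLE_URLSET))
```

The same two-way split is what `has_sitemaps()` distinguishes in the library usage shown later.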
Installation
pip install site-map-parser
Usage
Script usage
smapper $url > /tmp/data.csv
Logs are written to ~/sitemap_run.log
Arguments
Argument | Options | Default | Information |
---|---|---|---|
-h | N/A | N/A | Outputs argument data |
url | e.g. http://www.example.com or http://www.example.com/other_sitemap.xml | N/A | Required - sitemap data to retrieve |
-l, --log | CRITICAL or ERROR or WARNING or INFO or DEBUG | INFO | logs to sitemapper_run.log in install folder |
-e, --exporter | csv or json | csv | Export format of the data |
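To make the argument table concrete, here is a minimal sketch of how an equivalent command line could be declared with `argparse`. It only mirrors the options, choices, and defaults listed above; it is an assumption about the shape of the interface, not the actual source of `smapper`, and the help strings are paraphrased.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented arguments (not smapper's code)."""
    parser = argparse.ArgumentParser(prog="smapper")
    # Positional and required: the site or sitemap URL to retrieve.
    parser.add_argument("url", help="sitemap data to retrieve")
    parser.add_argument(
        "-l", "--log",
        choices=["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"],
        default="INFO",
        help="log level",
    )
    parser.add_argument(
        "-e", "--exporter",
        choices=["csv", "json"],
        default="csv",
        help="export format of the data",
    )
    # argparse adds -h automatically.
    return parser


# Parse an explicit argument list instead of the real command line.
args = build_parser().parse_args(["http://www.example.com", "-e", "json"])
print(args.url, args.log, args.exporter)
```

Passing an explicit list to `parse_args` keeps the example independent of how the script itself is invoked.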
Library Usage
from sitemapparser import SiteMapParser
sm = SiteMapParser('http://www.example.com') # reads /sitemap.xml
if sm.has_sitemaps():
    sitemaps = sm.getSitemaps() # returns generator of sitemapper.Sitemap instances
else:
    urls = sm.getUrls() # returns generator of sitemapper.Url instances
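Because a sitemap may be an index pointing at further sitemaps, a common follow-up is to walk the index and collect the URLs from each child. The sketch below shows one way this might be done with the methods used above. It rests on two assumptions that the text here does not confirm: that each Sitemap instance exposes its address as a `loc` attribute, and that the parser can be constructed from a child sitemap's full URL. The helper `iter_all_urls` and its `parser_factory` parameter are hypothetical names for this example.

```python
def iter_all_urls(start_url, parser_factory, _seen=None):
    """Yield every Url object reachable from start_url, following sitemap indexes.

    parser_factory is any callable that builds a parser from a URL;
    in practice this would be SiteMapParser (assumption, see above).
    """
    seen = set() if _seen is None else _seen
    if start_url in seen:
        # Guard against indexes that reference each other.
        return
    seen.add(start_url)

    sm = parser_factory(start_url)
    if sm.has_sitemaps():
        for sitemap in sm.getSitemaps():
            # Assumption: the Sitemap instance carries its URL in `.loc`.
            yield from iter_all_urls(sitemap.loc, parser_factory, seen)
    else:
        yield from sm.getUrls()
```

With the library installed, this could be called as `iter_all_urls('http://www.example.com', SiteMapParser)`. Passing the parser class in as an argument also makes it easy to substitute a stub when testing without network access.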
Exporting
Two exporters are available: csv and json
CSV Exporter
from sitemapparser.exporters import CSVExporter
# sm set as per earlier library usage example
csv_exporter = CSVExporter(sm)
if sm.has_sitemaps():
    print(csv_exporter.export_sitemaps())
elif sm.has_urls():
    print(csv_exporter.export_urls())
JSON Exporter
from sitemapparser.exporters import JSONExporter
# sm set as per earlier library usage example
json_exporter = JSONExporter(sm)
if sm.has_sitemaps():
    print(json_exporter.export_sitemaps())
elif sm.has_urls():
    print(json_exporter.export_urls())
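Both exporters follow the same pattern, so the branching can be wrapped once and reused to write the result to a file instead of printing it. This is a minimal sketch, assuming the `export_sitemaps()` and `export_urls()` methods return text (they are passed to `print` above, but the return type is not stated here). The helper name `export_to_file` is hypothetical.

```python
from pathlib import Path


def export_to_file(sm, exporter, path):
    """Write whichever export applies to `sm` into `path`.

    Returns True if a file was written, False if the parser holds
    neither sitemaps nor URLs. Assumes the export methods return str.
    """
    if sm.has_sitemaps():
        data = exporter.export_sitemaps()
    elif sm.has_urls():
        data = exporter.export_urls()
    else:
        return False
    Path(path).write_text(data, encoding="utf-8")
    return True
```

For example, `export_to_file(sm, JSONExporter(sm), '/tmp/data.json')` would cover the JSON case, and the same call with `CSVExporter(sm)` the CSV case.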