Script/Library to read and parse sitemap.xml data
A script and library that reads the URLs in a sitemap, converts them to objects, and can export them as CSV or JSON.
Handles sitemaps according to the protocol at https://www.sitemaps.org/protocol.html
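The protocol defines two document types: a `<urlset>` that lists pages and a `<sitemapindex>` that lists other sitemaps. The sketch below shows that distinction using only the Python standard library. It is not the package's own implementation, and the helper name `classify_sitemap` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (schema version 0.9).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def classify_sitemap(xml_text):
    """Return (kind, locations) for a sitemap document.

    kind is "sitemapindex" or "urlset"; locations is the list of <loc> values.
    Hypothetical helper for illustration, not part of site-map-parser.
    """
    root = ET.fromstring(xml_text)
    # Tags are namespace-qualified, e.g. "{http://...}urlset"; keep the local name.
    kind = root.tag.rsplit("}", 1)[-1]
    if kind == "sitemapindex":
        entries = root.findall("sm:sitemap", NS)
    elif kind == "urlset":
        entries = root.findall("sm:url", NS)
    else:
        raise ValueError(f"not a sitemap document: <{kind}>")
    locations = [
        entry.findtext("sm:loc", default="", namespaces=NS).strip()
        for entry in entries
    ]
    return kind, locations


EXAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc><lastmod>2005-01-01</lastmod></url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>"""

print(classify_sitemap(EXAMPLE))
```

A `<sitemapindex>` document goes through the same function and yields the locations of the nested sitemaps instead of page URLs.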
Installation
pip install site-map-parser
Usage
Script usage
smapper $url > /tmp/data.csv
Logs written to ~/sitemap_run.log
Arguments
Argument | Options | Default | Information |
---|---|---|---|
-h | N/A | N/A | Outputs argument data |
url | e.g. http://www.example.com or http://www.example.com/other_sitemap.xml | N/A | Required - sitemap data to retrieve |
-l, --log | CRITICAL or ERROR or WARNING or INFO or DEBUG | INFO | logs to sitemapper_run.log in install folder |
-e, --exporter | csv or json | csv | Export format of the data |
Library Usage
from sitemapparser import SiteMapParser
sm = SiteMapParser('http://www.example.com') # reads /sitemap.xml
if sm.has_sitemaps():
    sitemaps = sm.get_sitemaps() # returns iterator of sitemapper.Sitemap instances
else:
    urls = sm.get_urls() # returns iterator of sitemapper.Url instances
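When the top-level document is a sitemap index, each nested sitemap has to be parsed in turn to reach the page URLs. The following is a minimal sketch of that walk, not something the library is documented to provide. The function `collect_urls` is hypothetical, and it assumes each Sitemap instance exposes its address as a `loc` attribute (the field name used by the sitemaps.org protocol), which the text above does not confirm.

```python
def collect_urls(start_url, parser_factory, seen=None):
    """Walk a sitemap index recursively and return every URL object found.

    parser_factory is any callable that takes a URL and returns an object
    offering has_sitemaps(), get_sitemaps() and get_urls(), for example
    the SiteMapParser class.
    ASSUMPTION: each sitemap object has a `loc` attribute holding its URL.
    """
    seen = set() if seen is None else seen
    if start_url in seen:
        # Guard against indexes that reference each other.
        return []
    seen.add(start_url)

    parser = parser_factory(start_url)
    if not parser.has_sitemaps():
        # Leaf sitemap: it lists pages directly.
        return list(parser.get_urls())

    urls = []
    for sitemap in parser.get_sitemaps():
        urls.extend(collect_urls(sitemap.loc, parser_factory, seen))
    return urls
```

With the package installed, `collect_urls('http://www.example.com', SiteMapParser)` would be the intended call. Passing the class as a parameter keeps the helper easy to try against a stub without network access.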
Exporting
Two exporters are available: csv and json
CSV Exporter
from sitemapparser.exporters import CSVExporter
# sm set as per earlier library usage example
csv_exporter = CSVExporter(sm)
if sm.has_sitemaps():
    print(csv_exporter.export_sitemaps())
elif sm.has_urls():
    print(csv_exporter.export_urls())
JSON Exporter
from sitemapparser.exporters import JSONExporter
# sm set as per earlier library usage example
json_exporter = JSONExporter(sm)
if sm.has_sitemaps():
    print(json_exporter.export_sitemaps())
elif sm.has_urls():
    print(json_exporter.export_urls())
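Both exporter examples repeat the same sitemaps-or-URLs check, so it can be folded into one small helper that works with either exporter. This is only a sketch: `write_export` is a hypothetical name, and it assumes `export_sitemaps()` and `export_urls()` return a string, as the `print()` calls above suggest.

```python
import sys


def write_export(parser, exporter, stream=sys.stdout):
    """Write the exporter's output for `parser` to `stream` and return it.

    Hypothetical helper, not part of site-map-parser.
    ASSUMPTION: the export_* methods return text rather than writing it.
    """
    if parser.has_sitemaps():
        data = exporter.export_sitemaps()
    elif parser.has_urls():
        data = exporter.export_urls()
    else:
        # Nothing was parsed, so there is nothing to export.
        data = ""
    stream.write(data)
    return data
```

With `sm` set up as in the library usage example, `write_export(sm, JSONExporter(sm))` would print the JSON, and passing an open file object as `stream` would save it instead.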