Script/Library to read and parse sitemap.xml data
Site Map Parser
A script and library that reads the URLs in a sitemap, converts them to objects, and can export them as CSV or JSON.
Sitemaps are handled according to the protocol described at: https://www.sitemaps.org/protocol.html
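The protocol describes a `<urlset>` root whose `<url>` children carry a required `<loc>` and optional `<lastmod>`, `<changefreq>` and `<priority>` elements. To show roughly what parsing such a document involves, here is a minimal standard-library sketch. It is not this package's implementation, and the `SitemapEntry` and `parse_urlset` names are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


@dataclass
class SitemapEntry:
    """Hypothetical record for one <url> element (not this package's Url class)."""
    loc: str
    lastmod: Optional[str] = None
    changefreq: Optional[str] = None
    priority: Optional[float] = None


def parse_urlset(xml_text: str) -> List[SitemapEntry]:
    """Turn a <urlset> document into a list of SitemapEntry records."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        # <loc> is required by the protocol; the other children are optional.
        priority = url.findtext("sm:priority", namespaces=NS)
        entries.append(SitemapEntry(
            loc=url.findtext("sm:loc", default="", namespaces=NS).strip(),
            lastmod=url.findtext("sm:lastmod", namespaces=NS),
            changefreq=url.findtext("sm:changefreq", namespaces=NS),
            priority=float(priority) if priority is not None else None,
        ))
    return entries


SAMPLE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://www.example.com/</loc><lastmod>2005-01-01</lastmod>"
    "<changefreq>monthly</changefreq><priority>0.8</priority></url>"
    "<url><loc>http://www.example.com/about</loc></url>"
    "</urlset>"
)

for entry in parse_urlset(SAMPLE):
    print(entry)
```

Missing optional elements simply come back as `None`, which keeps the record shape uniform for later export.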
Installation
pip install site-map-parser
Usage
Script usage
smapper $url > /tmp/data.csv
Logs are written to ~/sitemap_run.log.
Arguments
Argument | Options | Default | Information |
---|---|---|---|
-h | N/A | N/A | Shows help for the arguments |
url | e.g. http://www.example.com or http://www.example.com/other_sitemap.xml | N/A | Required: the sitemap data to retrieve |
-l, --log | CRITICAL, ERROR, WARNING, INFO or DEBUG | INFO | Logs to sitemapper_run.log in the install folder |
-e, --exporter | csv or json | csv | Export format of the data |
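To make the table concrete, here is a hypothetical `argparse` definition that accepts the same arguments. It only mirrors the documented interface; it is not the `smapper` source, and the program's real parser may differ in details such as help text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented smapper arguments."""
    parser = argparse.ArgumentParser(prog="smapper")  # -h/--help is added automatically
    parser.add_argument("url", help="sitemap data to retrieve, e.g. http://www.example.com")
    parser.add_argument(
        "-l", "--log",
        choices=["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"],
        default="INFO",
        help="log level",
    )
    parser.add_argument(
        "-e", "--exporter",
        choices=["csv", "json"],
        default="csv",
        help="export format of the data",
    )
    return parser


args = build_parser().parse_args(["http://www.example.com", "-e", "json"])
print(args.url, args.log, args.exporter)  # http://www.example.com INFO json
```

Restricting `--log` and `--exporter` with `choices` means an unsupported value is rejected before any sitemap is fetched.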
Library usage
from sitemapparser import SiteMapParser

sm = SiteMapParser('http://www.example.com')  # reads /sitemap.xml
if sm.has_sitemaps():
    sitemaps = sm.get_sitemaps()  # returns an iterator of Sitemap instances
else:
    urls = sm.get_urls()  # returns an iterator of Url instances
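`has_sitemaps()` separates a sitemap index (a list of further sitemaps) from a plain URL set. The sketch below shows one way to follow an index down to its page URLs with the standard library only. `collect_urls` and the injected `fetch` callable are hypothetical; this is not how the package itself is documented to work.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def collect_urls(sitemap_url: str, fetch: Callable[[str], str]) -> Iterator[str]:
    """Yield page URLs, descending into <sitemapindex> documents.

    `fetch` is a hypothetical callable that returns the XML text at a URL
    (for example, a thin wrapper around urllib.request.urlopen).
    """
    root = ET.fromstring(fetch(sitemap_url))
    if root.tag == SM_NS + "sitemapindex":
        # Index: each <sitemap><loc> points at another sitemap document.
        for loc in root.iter(SM_NS + "loc"):
            yield from collect_urls(loc.text.strip(), fetch)
    else:
        # URL set: each <url><loc> is a page on the site.
        for loc in root.iter(SM_NS + "loc"):
            yield loc.text.strip()


# In-memory stand-in for the network, keyed by URL.
PAGES = {
    "http://www.example.com/sitemap.xml":
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<sitemap><loc>http://www.example.com/other_sitemap.xml</loc></sitemap>"
        "</sitemapindex>",
    "http://www.example.com/other_sitemap.xml":
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://www.example.com/</loc></url>"
        "</urlset>",
}

print(list(collect_urls("http://www.example.com/sitemap.xml", PAGES.__getitem__)))
# ['http://www.example.com/']
```

Injecting `fetch` keeps the traversal testable without network access. The sketch does not guard against an index that references itself; a real crawler would track visited URLs.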
Exporting
Two exporters are available: CSV and JSON.
CSV Exporter
from sitemapparser.exporters import CSVExporter

# sm set as per earlier library usage example
csv_exporter = CSVExporter(sm)
if sm.has_sitemaps():
    print(csv_exporter.export_sitemaps())
elif sm.has_urls():
    print(csv_exporter.export_urls())
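This README does not list the columns `CSVExporter` writes. As a sketch under that uncertainty, URL records carrying the protocol's `loc`, `lastmod`, `changefreq` and `priority` fields could be serialised with the standard `csv` module as below. The column set and the `urls_to_csv` helper are assumptions, not the exporter's actual output format.

```python
import csv
import io
from typing import Dict, Iterable, Optional

FIELDS = ["loc", "lastmod", "changefreq", "priority"]  # assumed column set


def urls_to_csv(records: Iterable[Dict[str, Optional[str]]]) -> str:
    """Serialise URL records (dicts keyed by FIELDS) to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # missing keys are written as empty cells
    return buffer.getvalue()


print(urls_to_csv([{"loc": "http://www.example.com/", "priority": "0.8"}]))
```

`DictWriter` fills absent optional fields with empty cells, so every row keeps the same number of columns.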
JSON Exporter
from sitemapparser.exporters import JSONExporter

# sm set as per earlier library usage example
json_exporter = JSONExporter(sm)
if sm.has_sitemaps():
    print(json_exporter.export_sitemaps())
elif sm.has_urls():
    print(json_exporter.export_urls())
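Likewise, the JSON layout produced by `JSONExporter` is not described here. A comparable hypothetical sketch with the standard `json` module follows; the field set and the `urls_to_json` name are assumptions.

```python
import json
from typing import Dict, Iterable, Optional


def urls_to_json(records: Iterable[Dict[str, Optional[str]]]) -> str:
    """Serialise URL records to a JSON array; absent fields become null."""
    fields = ("loc", "lastmod", "changefreq", "priority")  # assumed field set
    return json.dumps([{field: record.get(field) for field in fields} for record in records])


print(urls_to_json([{"loc": "http://www.example.com/"}]))
```

Emitting every field, even as `null`, gives consumers a fixed schema instead of objects whose keys vary from URL to URL.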
Download files
Source Distribution
site-map-parser-0.3.6.tar.gz (7.2 kB)
Built Distribution
site_map_parser-0.3.6-py3-none-any.whl (11.2 kB)
File details
Details for the file site-map-parser-0.3.6.tar.gz.
File metadata
- Download URL: site-map-parser-0.3.6.tar.gz
- Size: 7.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/20.7.0 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.7
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 68cd05d159f1e81881e9d6c2810b9e25cba59a323b437bf75f58628c3d92148b |
MD5 | 6916167428ccebc27bfba9b6f7533e5e |
BLAKE2b-256 | b0093749aff94653f926e2b4a1087cd5730146fb235fa0d16eb6bbeb6398b34b |
File details
Details for the file site_map_parser-0.3.6-py3-none-any.whl.
File metadata
- Download URL: site_map_parser-0.3.6-py3-none-any.whl
- Size: 11.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/20.7.0 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.7
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | cf92a1e344eefd40a45c85ce112f581c53077e46118c2c1bcfd0b48071c983aa |
MD5 | 81b58860390b367b4a74b693b2bab658 |
BLAKE2b-256 | 18cd19cdf1de27fcf7648a3a88c4aa7cbd754645969dd7d19f79c10bbff9786c |
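To check a downloaded file against the SHA256 digests listed for each distribution, a small standard-library sketch follows. `sha256_of` is a hypothetical helper, and the usage comment assumes the source archive sits in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Chunked reads keep memory use flat regardless of file size.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Published SHA256 digest of the source distribution.
EXPECTED_SDIST = "68cd05d159f1e81881e9d6c2810b9e25cba59a323b437bf75f58628c3d92148b"

# Usage, assuming the archive has been downloaded to the working directory:
# sha256_of(Path("site-map-parser-0.3.6.tar.gz")) == EXPECTED_SDIST
```

Prefer the SHA256 or BLAKE2b-256 digest over MD5, which is no longer considered collision-resistant.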