Web crawler and sitemap generator.
Project description
Sitemap generator library for Python. Forked from https://github.com/Haikson/sitemap-generator.
Installing
pip install nix-sitemap-generator
Usage
1. Import crawler and a parser from pysitemap
from pysitemap import crawler
from pysitemap.parsers.lxml_parser import Parser
2. Call crawler()
crawler(
    'https://site.com',
    out_file='debug/sitemap.xml',
    exclude_urls=[".pdf", ".jpg", ".zip"],
    http_request_options={"ssl": False},
    parser=Parser
)
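The exclude_urls argument lists URL fragments the crawler should skip. The exact matching rule is not stated here, so the following is only a minimal sketch that assumes plain substring matching. The helper is_excluded is hypothetical and not part of pysitemap.

```python
from typing import Iterable


def is_excluded(url: str, exclude_urls: Iterable[str]) -> bool:
    """Return True if any exclusion fragment occurs anywhere in the URL.

    Assumption: substring matching. pysitemap itself may apply a different rule.
    """
    return any(fragment in url for fragment in exclude_urls)


patterns = [".pdf", ".jpg", ".zip"]
candidates = [
    "https://site.com/about",
    "https://site.com/files/report.pdf",
    "https://site.com/img/logo.jpg",
]

# Keep only the URLs that match none of the exclusion fragments.
kept = [url for url in candidates if not is_excluded(url, patterns)]
print(kept)
```

Under this assumption a fragment such as ".pdf" also matches URLs where it appears mid-path, so more specific fragments are safer when that matters.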
Example
import sys
import logging
from pysitemap import crawler
from pysitemap.parsers.lxml_parser import Parser

if __name__ == '__main__':
    if '--iocp' in sys.argv:
        from asyncio import events, windows_events
        sys.argv.remove('--iocp')
        logging.info('using iocp')
        el = windows_events.ProactorEventLoop()
        events.set_event_loop(el)

    # root_url = sys.argv[1]
    root_url = 'https://www.haikson.com'
    crawler(
        root_url,
        out_file='debug/sitemap.xml',
        exclude_urls=[".pdf", ".jpg", ".zip"],
        http_request_options={"ssl": False},
        parser=Parser
    )
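After a crawl it can be useful to check which URLs ended up in the output file. The sketch below uses only the standard library and assumes the generated file follows the sitemaps.org 0.9 schema (a urlset element containing url/loc entries). The helper read_sitemap_urls is hypothetical, and the sample XML is illustrative rather than real crawler output.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol (assumed for the output file).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_sitemap_urls(xml_text: str) -> List[str]:
    """Return the text of every <loc> element found under <url> entries."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Illustrative input; for a real file, read its contents first,
# e.g. open('debug/sitemap.xml', encoding='utf-8').read()
SAMPLE = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://site.com/</loc></url>
  <url><loc>https://site.com/about</loc></url>
</urlset>
"""

urls = read_sitemap_urls(SAMPLE)
print(urls)
```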
Changes
v. 0.10.1
Refactored the code to make it more readable.
Removed print() calls from the code.
Added verbose mode to crawler().
Added type hints to crawler() arguments.
Added ValueError handling around the add_signal_handler() call.
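The last entry refers to asyncio's loop.add_signal_handler(), which can fail depending on the platform and calling context. The library's actual handling is not shown here. As a rough sketch, assuming the goal is to continue without a signal handler when registration fails, a hypothetical helper install_stop_handler could look like this:

```python
import asyncio
import logging
import signal

logger = logging.getLogger(__name__)


def install_stop_handler(loop, callback) -> bool:
    """Try to register callback for SIGINT; return False if that is not possible.

    ValueError: invalid or uncatchable signal number.
    NotImplementedError: the loop has no signal support (e.g. on Windows).
    RuntimeError: the handler could not be set up (e.g. not in the main thread).
    """
    try:
        loop.add_signal_handler(signal.SIGINT, callback)
    except (ValueError, NotImplementedError, RuntimeError) as exc:
        logger.warning("signal handler not installed: %s", exc)
        return False
    return True


loop = asyncio.new_event_loop()
try:
    installed = install_stop_handler(loop, loop.stop)
    print("handler installed:", installed)
finally:
    loop.close()
```

Catching the two extra exception types goes beyond what the changelog states; it is included here only so the sketch behaves the same way on loops without signal support.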
Download files
Source Distribution
Hashes for nix-sitemap-generator-0.10.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 5c448ead092a83ae14c61cfb95b1dfe3f8b627e8e8401a01e69750f91f3fab9b
MD5 | 69e8368411be6ccb83b3bf01473e3164
BLAKE2b-256 | a9509a4861b00146dfb984a6bed78c7e5bbc07b127aa2b554a38db6f11d21889
Built Distribution
Hashes for nix_sitemap_generator-0.10.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 0f28d09d7aafa0e61a23846a1ab5e01b37fe894c442209e99c3f80a56e75ba75
MD5 | d40d552f4a9a8a717adc6f1dbb48f75d
BLAKE2b-256 | 03c7762813012d878fb9a59dd0918578327df55eea004034a7b102e12586f9a1