Web crawler and sitemap generator.
Project description
Sitemap generator library for Python. Forked from https://github.com/Haikson/sitemap-generator.
Installing
pip install nix-sitemap-generator
Usage
1. Import crawler and a parser from pysitemap
from pysitemap import crawler
from pysitemap.parsers.lxml_parser import Parser
2. Call crawler()
crawler(
    'https://site.com',
    out_file='debug/sitemap.xml',
    exclude_urls=[".pdf", ".jpg", ".zip"],
    http_request_options={"ssl": False},
    parser=Parser,
)
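To preview which links an exclusion list like the one above would drop, a small standalone sketch may help. It assumes plain substring matching, which may not be how crawler() actually interprets exclude_urls. The helper name is_excluded and the sample URLs are hypothetical and not part of pysitemap.

```python
from typing import Iterable


def is_excluded(url: str, exclude_urls: Iterable[str]) -> bool:
    """Return True if any exclusion pattern occurs anywhere in the URL.

    Assumption: substring matching. This is an illustration, not pysitemap code.
    """
    return any(pattern in url for pattern in exclude_urls)


patterns = [".pdf", ".jpg", ".zip"]

# Made-up sample links, used only to exercise the helper.
candidates = [
    "https://site.com/about",
    "https://site.com/files/report.pdf",
    "https://site.com/img/logo.jpg",
]

# Keep only the links that match none of the patterns.
kept = [u for u in candidates if not is_excluded(u, patterns)]
print(kept)
```

Note that under substring matching a pattern such as ".pdf" also matches URLs where it appears in the middle of the path, not only at the end.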
Example
import sys
import logging
from pysitemap import crawler
from pysitemap.parsers.lxml_parser import Parser

if __name__ == '__main__':
    # Optional: pass --iocp on Windows to use the proactor (IOCP) event loop.
    if '--iocp' in sys.argv:
        from asyncio import events, windows_events
        sys.argv.remove('--iocp')
        logging.info('using iocp')
        el = windows_events.ProactorEventLoop()
        events.set_event_loop(el)

    # root_url = sys.argv[1]
    root_url = 'https://www.haikson.com'
    crawler(
        root_url,
        out_file='debug/sitemap.xml',
        exclude_urls=[".pdf", ".jpg", ".zip"],
        http_request_options={"ssl": False},
        parser=Parser,
    )
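After a crawl, it can be useful to check what ended up in the output file. The sketch below lists the URLs in a sitemap using only the standard library. It assumes the generated file follows the standard sitemaps.org XML format (a urlset of url/loc elements). The function name read_sitemap_urls and the sample document are illustrative, not part of pysitemap.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_sitemap_urls(xml_text: str) -> List[str]:
    """Return the text of every <loc> found under <url> entries."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Made-up sample standing in for a generated sitemap.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.haikson.com/</loc></url>
  <url><loc>https://www.haikson.com/about/</loc></url>
</urlset>"""

urls = read_sitemap_urls(sample)
print(urls)
```

To inspect a real run, read the text of debug/sitemap.xml and pass it to read_sitemap_urls() in place of the sample.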
Changes
v. 0.10.1
Refactored the code to make it more readable.
Removed print() calls from the code.
Added verbose mode to crawler().
Added type hints to crawler() arguments.
Added ValueError handling when calling add_signal_handler().
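The last entry presumably refers to asyncio's loop.add_signal_handler(). The sketch below shows one way such a guard could look. It is not the library's code, and the helper name try_add_signal_handler is hypothetical.

```python
import asyncio
import signal
from typing import Callable


def try_add_signal_handler(
    loop: asyncio.AbstractEventLoop,
    sig: int,
    callback: Callable[[], None],
) -> bool:
    """Try to register a signal handler; return False instead of raising."""
    try:
        loop.add_signal_handler(sig, callback)
    except ValueError:
        # Raised when the signal number is not valid.
        return False
    except (NotImplementedError, RuntimeError):
        # NotImplementedError: the loop does not support signal handlers
        # (for example on Windows). RuntimeError: the handler could not be set up.
        return False
    return True


loop = asyncio.new_event_loop()
try:
    ok = try_add_signal_handler(loop, signal.SIGINT, loop.stop)
    bad = try_add_signal_handler(loop, 0, loop.stop)  # 0 is not a valid signal number
finally:
    loop.close()

print(ok, bad)
```

Returning a flag rather than re-raising lets the caller carry on without signal handling where it is unavailable.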
File details
Details for the file nix-sitemap-generator-0.10.1.tar.gz.
File metadata
- Download URL: nix-sitemap-generator-0.10.1.tar.gz
- Upload date:
- Size: 13.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5c448ead092a83ae14c61cfb95b1dfe3f8b627e8e8401a01e69750f91f3fab9b
MD5 | 69e8368411be6ccb83b3bf01473e3164
BLAKE2b-256 | a9509a4861b00146dfb984a6bed78c7e5bbc07b127aa2b554a38db6f11d21889
File details
Details for the file nix_sitemap_generator-0.10.1-py3-none-any.whl.
File metadata
- Download URL: nix_sitemap_generator-0.10.1-py3-none-any.whl
- Upload date:
- Size: 16.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0f28d09d7aafa0e61a23846a1ab5e01b37fe894c442209e99c3f80a56e75ba75
MD5 | d40d552f4a9a8a717adc6f1dbb48f75d
BLAKE2b-256 | 03c7762813012d878fb9a59dd0918578327df55eea004034a7b102e12586f9a1