Web crawler and sitemap generator.
Project description
Sitemap generator
Installing
pip install sitemap-generator
Requirements
asyncio, aiofile, aiohttp
Example
    import sys
    import logging

    from pysitemap import crawler
    from pysitemap.parsers.lxml_parser import Parser

    if __name__ == '__main__':
        # Optional (Windows only): pass --iocp to run on the IOCP-based Proactor event loop.
        if '--iocp' in sys.argv:
            from asyncio import events, windows_events
            sys.argv.remove('--iocp')
            logging.info('using iocp')
            el = windows_events.ProactorEventLoop()
            events.set_event_loop(el)

        # Uncomment to take the root URL from the command line instead.
        # root_url = sys.argv[1]
        root_url = 'https://www.haikson.com'
        crawler(
            root_url,
            out_file='debug/sitemap.xml',           # where the sitemap is written
            exclude_urls=[".pdf", ".jpg", ".zip"],  # skip links to these file types
            http_request_options={"ssl": False},
            parser=Parser,                          # lxml-based HTML parser
        )
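The example writes its result to debug/sitemap.xml. For reference, a file following the sitemaps.org protocol is a `urlset` element containing one `url`/`loc` entry per page. The sketch below builds such a file with the standard library only; `build_sitemap` is a hypothetical helper, not part of pysitemap, and treating `exclude_urls` entries as plain substrings is an assumption about how the crawler applies them.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, exclude_urls=()):
    """Return sitemap XML (as a string) listing ``urls``.

    Hypothetical helper: URLs containing any substring from ``exclude_urls``
    (e.g. ".pdf") are skipped, which is an assumed interpretation of the
    crawler's exclude_urls option.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        if any(pattern in url for pattern in exclude_urls):
            continue
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes "&" and "<" for us, as the protocol requires.
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap(
        ["https://www.haikson.com/", "https://www.haikson.com/files/cv.pdf"],
        exclude_urls=[".pdf", ".jpg", ".zip"],
    ))
```

Note that the protocol caps a single sitemap file at 50,000 URLs, so a very large site needs several sitemap files tied together by a sitemap index.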
TODO
Big sites with more than 100K pages will use more than 100 MB of memory.
- Move the queue and done lists into a database.
- Write Queue and Done backend classes based on:
  - lists
  - SQLite database
  - Redis
- Write an API for extending with user-defined backends.
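The TODO above proposes pluggable Queue and Done backends. A minimal sketch of what such an interface could look like, assuming a single-threaded, breadth-first crawl loop: `QueueBackend`, `ListBackend`, `SQLiteBackend` and `crawl` are hypothetical names, not part of pysitemap, and the `links` dict stands in for real HTTP fetching and HTML parsing.

```python
import sqlite3
from abc import ABC, abstractmethod
from collections import deque


class QueueBackend(ABC):
    """Hypothetical interface for the crawl queue and the list of done URLs."""

    @abstractmethod
    def push(self, url):
        """Enqueue ``url`` unless it has been seen before."""

    @abstractmethod
    def pop(self):
        """Return the next URL to fetch, or None when the queue is empty."""

    @abstractmethod
    def mark_done(self, url):
        """Record that ``url`` has been crawled."""

    @abstractmethod
    def done(self):
        """Return the crawled URLs in FIFO (crawl) order."""


class ListBackend(QueueBackend):
    """In-memory backend: simple and fast, but all state lives in RAM."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()  # every URL ever queued, for deduplication
        self._done = []

    def push(self, url):
        if url not in self._seen:
            self._seen.add(url)
            self._queue.append(url)

    def pop(self):
        return self._queue.popleft() if self._queue else None

    def mark_done(self, url):
        self._done.append(url)

    def done(self):
        return list(self._done)


class SQLiteBackend(QueueBackend):
    """SQLite backend: pass a file path to keep queue and done state on disk."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        with self._db:
            self._db.execute(
                "CREATE TABLE IF NOT EXISTS urls ("
                " id INTEGER PRIMARY KEY,"
                " url TEXT UNIQUE NOT NULL,"
                " state TEXT NOT NULL DEFAULT 'queued')"  # queued, active or done
            )

    def push(self, url):
        # The UNIQUE constraint plus INSERT OR IGNORE deduplicates in the database.
        with self._db:
            self._db.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))

    def pop(self):
        row = self._db.execute(
            "SELECT id, url FROM urls WHERE state = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        with self._db:
            self._db.execute("UPDATE urls SET state = 'active' WHERE id = ?", (row[0],))
        return row[1]

    def mark_done(self, url):
        with self._db:
            self._db.execute("UPDATE urls SET state = 'done' WHERE url = ?", (url,))

    def done(self):
        rows = self._db.execute("SELECT url FROM urls WHERE state = 'done' ORDER BY id")
        return [url for (url,) in rows]


def crawl(start, links, backend):
    """Breadth-first crawl; ``links`` (url -> linked urls) stands in for HTTP fetching."""
    backend.push(start)
    while (url := backend.pop()) is not None:
        for link in links.get(url, []):
            backend.push(link)
        backend.mark_done(url)
    return backend.done()
```

Because every backend exposes the same four methods, the crawl loop does not change when the storage does; a Redis version could, for example, keep the queue in a Redis list and the seen URLs in a Redis set.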
Changelog
Download files
Source Distribution
sitemap-generator-0.9.13.tar.gz (13.5 kB)
Built Distribution
sitemap_generator-0.9.13-py3-none-any.whl (15.9 kB)
File details
Details for the file sitemap-generator-0.9.13.tar.gz.
File metadata
- Download URL: sitemap-generator-0.9.13.tar.gz
- Upload date:
- Size: 13.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 62ed54b45e7d3c3380a10bc877f3f213a4b13ac188d168da7d7aae10902c9327
MD5 | c1e13d2fc27e433f344217e84de5f148
BLAKE2b-256 | 7cd9a67678449c608eba9ad2eebe9c4189d18cb529f1a3f889abc246b2666631
File details
Details for the file sitemap_generator-0.9.13-py3-none-any.whl.
File metadata
- Download URL: sitemap_generator-0.9.13-py3-none-any.whl
- Upload date:
- Size: 15.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1eb690631895f5940269747f08e966c3fcc0efebd8a3d934ce2549aaceef0885
MD5 | 07cab60bcb0733c510eec8bf3c2282a9
BLAKE2b-256 | 4448855480617478c341732174421c69e9350fdec7efd7d6df8d203ee89fb14c
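The SHA256 digests listed above can be used to check a downloaded file before installing it. A small sketch using only the standard library; `sha256_of`, `verify` and `EXPECTED_SHA256` are illustrative names, not part of any tool.

```python
import hashlib

# SHA256 digests published for release 0.9.13 (copied from the hash tables above).
EXPECTED_SHA256 = {
    "sitemap-generator-0.9.13.tar.gz":
        "62ed54b45e7d3c3380a10bc877f3f213a4b13ac188d168da7d7aae10902c9327",
    "sitemap_generator-0.9.13-py3-none-any.whl":
        "1eb690631895f5940269747f08e966c3fcc0efebd8a3d934ce2549aaceef0885",
}


def sha256_of(path, chunk_size=65536):
    """Compute a file's SHA256 hex digest, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file at ``path`` matches the expected hex digest."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    import sys

    name = sys.argv[1]  # e.g. sitemap-generator-0.9.13.tar.gz
    print("OK" if verify(name, EXPECTED_SHA256[name]) else "MISMATCH")
```

pip can also enforce this at install time through its hash-checking mode, using `--hash=sha256:...` entries in a requirements file.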