Web crawler and sitemap generator.
Sitemap generator
installing
pip install sitemap-generator
requirements
asyncio aiofile aiohttp
example
import sys
import logging
from pysitemap import crawler

if __name__ == '__main__':
    if '--iocp' in sys.argv:
        from asyncio import events, windows_events
        sys.argv.remove('--iocp')
        logging.info('using iocp')
        el = windows_events.ProactorEventLoop()
        events.set_event_loop(el)

    # root_url = sys.argv[1]
    root_url = 'https://www.haikson.com'
    crawler(root_url, out_file='sitemap.xml')
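To check the result of a crawl, the generated file can be read back with the standard library. This is a minimal sketch that assumes the output follows the sitemaps.org protocol (a `urlset` root with `url/loc` entries in the standard namespace); the helper name `read_sitemap_urls` is hypothetical and is not part of this package.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: the generated file uses the standard sitemaps.org namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_sitemap_urls(path: str) -> List[str]:
    """Return every <loc> value found in a sitemap file (hypothetical helper)."""
    root = ET.parse(path).getroot()
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text  # skip empty <loc/> elements
    ]
```

For example, `len(read_sitemap_urls('sitemap.xml'))` would give the number of pages the crawler recorded, provided the assumption about the file format holds.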
TODO
- Big sites with more than 100K pages will use more than 100 MB of memory. Move the queue and done lists into a database. Write Queue and Done backend classes based on:
- Lists
- SQLite database
- Redis
- Write an API so that users can extend the crawler with their own backends
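The list-based variant of the planned backends could look roughly like the sketch below. The class names `ListTodoQueue` and `ListDoneList` and their methods are hypothetical, chosen only for illustration; they are not the package's actual API.

```python
from collections import deque
from typing import Optional


class ListTodoQueue:
    """Hypothetical in-memory FIFO queue of URLs still to be crawled."""

    def __init__(self) -> None:
        self._queue = deque()
        self._seen = set()  # every URL ever queued, so duplicates are rejected

    def add(self, url: str) -> bool:
        """Queue a URL; return False if it was already queued before."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def pop(self) -> Optional[str]:
        """Return the oldest queued URL, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None

    def __len__(self) -> int:
        return len(self._queue)


class ListDoneList:
    """Hypothetical in-memory record of URLs that were already crawled."""

    def __init__(self) -> None:
        self._done = set()

    def add(self, url: str) -> None:
        self._done.add(url)

    def __contains__(self, url: str) -> bool:
        return url in self._done

    def __len__(self) -> int:
        return len(self._done)
```

Keeping the same small surface (`add`, `pop`, `in`, `len`) for every backend is one way the SQLite and Redis variants could be swapped in without touching the crawler itself.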
changelog
v. 0.9.2
- todo queue and done list backends
- added a very slow SQLite backend for the todo queue and done lists (writing 1000 URLs takes 3 minutes)
- tests for sqlite_todo backend
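A frequent reason for slow SQLite writes is committing after every row. Whether that applies to this backend is not stated here, so the following is only a sketch of a batched alternative: all URLs are inserted inside a single transaction. The class `SQLiteTodoQueue` and its table layout are hypothetical and are not the package's `sqlite_todo` implementation.

```python
import sqlite3
from typing import Iterable, Optional


class SQLiteTodoQueue:
    """Hypothetical SQLite-backed todo queue that writes URLs in batches."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS todo ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "url TEXT UNIQUE NOT NULL)"
        )
        self._conn.commit()

    def add_many(self, urls: Iterable[str]) -> None:
        """Insert all URLs in one transaction; URLs already queued are skipped."""
        with self._conn:  # commits once on success, rolls back on error
            self._conn.executemany(
                "INSERT OR IGNORE INTO todo (url) VALUES (?)",
                ((url,) for url in urls),
            )

    def pop(self) -> Optional[str]:
        """Remove and return the oldest queued URL, or None if empty."""
        with self._conn:
            row = self._conn.execute(
                "SELECT id, url FROM todo ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return None
            self._conn.execute("DELETE FROM todo WHERE id = ?", (row[0],))
            return row[1]

    def __len__(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM todo").fetchone()[0]
```

Note that this queue only rejects URLs that are currently waiting; a URL that was already popped can be queued again, so a separate done list is still needed to avoid crawling a page twice.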
v. 0.9.1
- extended readme
- docstrings and code comments
v. 0.9.0
- since this version, the package supports only Python >= 3.7
- all functions were rewritten, but the API is unchanged. If you use this package, just update it, install the requirements and run your process as before
- all requests now run asynchronously
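The general idea of running requests asynchronously can be sketched with the standard library alone. The example below is an illustration, not the package's code: `fake_fetch` is a hypothetical stand-in for a real HTTP request (the package lists aiohttp as a requirement), and `fetch_all` with its `limit` parameter is likewise an assumption.

```python
import asyncio
from typing import Dict, List


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP GET; sleeps instead of using the network."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def fetch_all(urls: List[str], limit: int = 5) -> Dict[str, str]:
    """Fetch all URLs concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str):
        async with semaphore:  # wait here while `limit` fetches are running
            return url, await fake_fetch(url)

    pairs = await asyncio.gather(*(bounded(url) for url in urls))
    return dict(pairs)


if __name__ == "__main__":
    pages = asyncio.run(fetch_all(["https://www.haikson.com"]))
    print(len(pages))
```

Because the fetches overlap while each one waits, the total time grows with the number of batches of `limit` requests rather than with the number of URLs.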