Web crawler and sitemap generator.
Project description
Sitemap generator
installing
pip install sitemap-generator
requirements
- asyncio
- aiofile
- aiohttp
example
import sys
import logging

from pysitemap import crawler

if __name__ == '__main__':
    if '--iocp' in sys.argv:
        # Windows only: switch to the IOCP-based Proactor event loop
        from asyncio import events, windows_events
        sys.argv.remove('--iocp')
        logging.info('using iocp')
        el = windows_events.ProactorEventLoop()
        events.set_event_loop(el)

    # root_url = sys.argv[1]
    root_url = 'https://www.haikson.com'
    crawler(root_url, out_file='sitemap.xml')
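The example writes its result to sitemap.xml. For orientation, a file in the sitemaps.org format is a urlset element holding one url/loc entry per page. The sketch below builds such a file with the standard library only; build_sitemap and write_sitemap are hypothetical helpers, not part of pysitemap, and the package's real output may contain additional tags.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Hypothetical helper: return sitemap XML (a str) listing each URL once, in order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:
            continue  # skip duplicates
        seen.add(url)
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the text for us
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def write_sitemap(urls, out_file="sitemap.xml"):
    """Hypothetical helper: write the sitemap for urls to out_file as UTF-8."""
    with open(out_file, "w", encoding="utf-8") as f:
        f.write(build_sitemap(urls))
```

Escaping matters here: a URL containing & must appear as &amp; inside loc, which ElementTree handles automatically.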
TODO
- Big sites with more than 100K pages will use more than 100 MB of memory. Move the queue and done lists into a database. Write Queue and Done backend classes based on:
  - Lists
  - SQLite database
  - Redis
- Write an API for extending the crawler with user-defined backends
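One way the backend split described in the TODO could look is sketched below. The class names TodoBackend, DoneBackend, ListTodo and SQLiteDone are hypothetical and do not exist in the package; the point is only that a small abstract interface lets list, SQLite or Redis storage be swapped in without touching the crawler.

```python
import sqlite3
from abc import ABC, abstractmethod
from collections import deque


class TodoBackend(ABC):
    """Hypothetical interface for the queue of URLs still to be crawled."""

    @abstractmethod
    def put(self, url):
        """Add a URL to the end of the queue."""

    @abstractmethod
    def get(self):
        """Remove and return the next URL, or None if the queue is empty."""

    @abstractmethod
    def __len__(self):
        """Return the number of queued URLs."""


class DoneBackend(ABC):
    """Hypothetical interface for the set of URLs already crawled."""

    @abstractmethod
    def add(self, url):
        """Record a URL as crawled."""

    @abstractmethod
    def __contains__(self, url):
        """Return True if the URL has been recorded."""


class ListTodo(TodoBackend):
    """In-memory FIFO queue, roughly the 'Lists' option from the TODO."""

    def __init__(self):
        self._items = deque()

    def put(self, url):
        self._items.append(url)

    def get(self):
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)


class SQLiteDone(DoneBackend):
    """Done set kept in SQLite; pass a file path so it lives on disk, not in RAM."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS done (url TEXT PRIMARY KEY)")

    def add(self, url):
        with self._conn:  # commit the insert as one transaction
            self._conn.execute(
                "INSERT OR IGNORE INTO done (url) VALUES (?)", (url,))

    def __contains__(self, url):
        row = self._conn.execute(
            "SELECT 1 FROM done WHERE url = ?", (url,)).fetchone()
        return row is not None
```

The PRIMARY KEY on url gives both deduplication and an indexed membership check. A Redis variant would implement the same two methods.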
changelog
v. 0.9.2
- todo queue and done list backends
- created a very slow SQLite backend for the todo queue and done lists (writing 1,000 URLs takes about 3 minutes)
- tests for the sqlite_todo backend
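The changelog does not say why the SQLite backend is slow. One common cause of slow SQLite inserts is committing every row in its own transaction, since each commit waits for the data to reach disk. A minimal sketch of the usual remedy, batching inserts with executemany inside one transaction, is shown below; the table name todo and the function save_urls_batched are hypothetical.

```python
import sqlite3


def save_urls_batched(conn, urls, batch_size=500):
    """Hypothetical helper: insert URLs, committing once per batch instead of once per row."""
    conn.execute("CREATE TABLE IF NOT EXISTS todo (url TEXT PRIMARY KEY)")
    sql = "INSERT OR IGNORE INTO todo (url) VALUES (?)"
    batch = []
    for url in urls:
        batch.append((url,))
        if len(batch) >= batch_size:
            with conn:  # one transaction (and one commit) per batch
                conn.executemany(sql, batch)
            batch.clear()
    if batch:  # flush the remainder
        with conn:
            conn.executemany(sql, batch)
```

INSERT OR IGNORE silently skips URLs that are already stored, so the same call also deduplicates.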
v. 0.9.1
- extended readme
- docstrings and code comments
v. 0.9.0
- starting with this version, the package supports only Python >= 3.7
- all functions were rewritten, but the API is unchanged. If you use this package, just update it, install the requirements and run it as before
- all requests are made asynchronously
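A minimal sketch of the asynchronous pattern, not the package's own code: a fixed pool of worker tasks pulls URLs from an asyncio.Queue, and a seen set plays the role of the done list. The function crawl and its fetch_links parameter are hypothetical; a real fetch_links would download the page (for example with aiohttp) and extract its href values.

```python
import asyncio
from urllib.parse import urldefrag, urljoin


async def crawl(root_url, fetch_links, concurrency=10):
    """Hypothetical crawler: visit every page under root_url with concurrent workers.

    fetch_links(url) is an async callable returning the raw links found on a page.
    Returns the set of URLs that were queued (and therefore fetched).
    """
    queue = asyncio.Queue()
    seen = {root_url}  # plays the role of the "done" list
    queue.put_nowait(root_url)

    async def worker():
        while True:
            url = await queue.get()
            try:
                links = await fetch_links(url)
            except Exception:
                links = []  # treat pages that fail to load as having no links
            for link in links:
                # resolve relative links and drop #fragments
                absolute = urldefrag(urljoin(url, link))[0]
                if absolute.startswith(root_url) and absolute not in seen:
                    seen.add(absolute)
                    queue.put_nowait(absolute)
            queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    await queue.join()  # returns once every queued URL has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return seen
```

Because new URLs are queued before the current one is marked done, queue.join() cannot return while work is still pending.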
Download files
Source distribution: sitemap-generator-0.9.3.tar.gz
Built distributions: sitemap_generator-0.9.3-py3.8.egg, sitemap_generator-0.9.3-py3-none-any.whl
File details
Details for the file sitemap-generator-0.9.3.tar.gz
File metadata
- Download URL: sitemap-generator-0.9.3.tar.gz
- Upload date:
- Size: 8.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/51.1.1 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 062c9594d4156bed5a1c568cd7e84f9dd0e19b1aea5f297ba0eb7f7400f23476
MD5 | 6b06a42f08e29a4ee47f35b7ef041969
BLAKE2b-256 | f8b6d612ada7a7a70c7307b4eef4d65351c82a4ced1f07773df895b7315e0e7f
File details
Details for the file sitemap_generator-0.9.3-py3.8.egg
File metadata
- Download URL: sitemap_generator-0.9.3-py3.8.egg
- Upload date:
- Size: 22.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/51.1.1 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | fa3f6933e6451c7e646464b56845d652c1b2284aad3609ebf7713bc3556d12de
MD5 | 33c5247be4ea8d4911db8a37c34ca19a
BLAKE2b-256 | 639d2611838ee4ba9682f6495e02a1dff182dca7a67ccdc476040e2170a3f8e8
File details
Details for the file sitemap_generator-0.9.3-py3-none-any.whl
File metadata
- Download URL: sitemap_generator-0.9.3-py3-none-any.whl
- Upload date:
- Size: 14.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/51.1.1 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 08b9463f599c95da6623571ef839792f58c7db4329df54be90211b6407f39a98
MD5 | 3323883cac27358817ad03530bb620a6
BLAKE2b-256 | 151390c600708e1b42244970fa59b4d5ac68c5d7b6f12fe68c8047858c965bcc
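To check a downloaded file against the SHA256 digests listed above, a short standard-library sketch is enough; sha256_of and verify are illustrative helper names, not part of the package.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha256):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected_sha256.lower()
```

For example, verify("sitemap-generator-0.9.3.tar.gz", "062c9594d4156bed5a1c568cd7e84f9dd0e19b1aea5f297ba0eb7f7400f23476") should return True for an intact download of the source distribution.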