
Web crawler and sitemap generator.

Project description

Sitemap generator

installing

pip install sitemap-generator

requirements

asyncio
aiofile
aiohttp

example

import asyncio
import logging
import sys

from pysitemap import crawler

if __name__ == '__main__':
    # Without this, logging.info() below is dropped (the default level is WARNING).
    logging.basicConfig(level=logging.INFO)

    # Optional, Windows only: pass --iocp to use the proactor (IOCP) event loop.
    if '--iocp' in sys.argv:
        sys.argv.remove('--iocp')
        logging.info('using iocp')
        el = asyncio.ProactorEventLoop()
        asyncio.set_event_loop(el)

    # root_url = sys.argv[1]
    root_url = 'https://www.haikson.com'
    crawler(root_url, out_file='sitemap.xml')

TODO

  • Big sites with more than 100K pages will use more than 100 MB of memory. Move the queue and done lists into a database. Write Queue and Done backend classes based on:

  • Lists

  • SQLite database

  • Redis

  • Write an API so that users can plug in their own backends
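
The backend idea in the list above could look roughly like the sketch below. Nothing here is part of pysitemap: the names `TodoBackend`, `ListBackend`, `SQLiteBackend` and the methods `add`, `pop`, `mark_done`, `done_count` are hypothetical, chosen only to show one interface with an in-memory and an SQLite implementation behind it.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Iterable, Optional


class TodoBackend(ABC):
    """Hypothetical interface: stores URLs to crawl and remembers finished ones."""

    @abstractmethod
    def add(self, urls: Iterable[str]) -> None: ...

    @abstractmethod
    def pop(self) -> Optional[str]: ...

    @abstractmethod
    def mark_done(self, url: str) -> None: ...

    @abstractmethod
    def done_count(self) -> int: ...


class ListBackend(TodoBackend):
    """In-memory variant: simple, but memory grows with the number of pages."""

    def __init__(self) -> None:
        self._todo = []
        self._seen = set()
        self._done = []

    def add(self, urls):
        for url in urls:
            if url not in self._seen:  # never queue the same URL twice
                self._seen.add(url)
                self._todo.append(url)

    def pop(self):
        return self._todo.pop(0) if self._todo else None

    def mark_done(self, url):
        self._done.append(url)

    def done_count(self):
        return len(self._done)


class SQLiteBackend(TodoBackend):
    """SQLite variant: one table, state 0 = todo, 1 = in progress, 2 = done."""

    def __init__(self, path: str = ':memory:') -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            'CREATE TABLE IF NOT EXISTS urls ('
            'id INTEGER PRIMARY KEY AUTOINCREMENT, '
            'url TEXT UNIQUE NOT NULL, '
            'state INTEGER NOT NULL DEFAULT 0)'
        )

    def add(self, urls):
        # One transaction per batch; duplicates are skipped by the UNIQUE column.
        with self._db:
            self._db.executemany(
                'INSERT OR IGNORE INTO urls (url) VALUES (?)',
                ((url,) for url in urls),
            )

    def pop(self):
        row = self._db.execute(
            'SELECT id, url FROM urls WHERE state = 0 ORDER BY id LIMIT 1'
        ).fetchone()
        if row is None:
            return None
        with self._db:
            self._db.execute('UPDATE urls SET state = 1 WHERE id = ?', (row[0],))
        return row[1]

    def mark_done(self, url):
        with self._db:
            self._db.execute('UPDATE urls SET state = 2 WHERE url = ?', (url,))

    def done_count(self):
        return self._db.execute(
            'SELECT COUNT(*) FROM urls WHERE state = 2'
        ).fetchone()[0]
```

A crawler written against `TodoBackend` would not need to know which storage is in use, and a Redis backend could be added the same way. Writing each batch of URLs in a single transaction, as `add` does here, is a common way to keep SQLite inserts fast; whether it would help this package's own SQLite backend is an assumption.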

changelog

v. 0.9.2

  • todo queue and done list backends

  • created a very slow SQLite backend for the todo queue and done lists (writing 1000 URLs takes 3 minutes)

  • tests for sqlite_todo backend

v. 0.9.1

  • extended readme

  • docstrings and code comments

v. 0.9.0

  • starting with this version, the package supports only Python >= 3.7

  • all functions were rewritten, but the API is unchanged. If you use this package, just update it, install the requirements, and run it as before

  • all requests now run asynchronously

