Web crawler and sitemap generator.

Project description

Sitemap generator library for Python. Forked from https://github.com/Haikson/sitemap-generator.

Installing

pip install nix-sitemap-generator

Usage

1. Import crawler and a parser from pysitemap

from pysitemap import crawler
from pysitemap.parsers.lxml_parser import Parser

2. Call crawler()

crawler(
    'https://site.com', out_file='debug/sitemap.xml', exclude_urls=[".pdf", ".jpg", ".zip"],
    http_request_options={"ssl": False}, parser=Parser
)
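The exclude_urls argument above lists fragments of URLs that should be left out of the sitemap. The sketch below illustrates one plausible reading of that option, plain substring matching. It is an assumption about the behaviour, not the library's confirmed implementation, and is_excluded is a hypothetical helper that does not exist in pysitemap.

```python
from typing import Iterable, List


def is_excluded(url: str, exclude_urls: Iterable[str]) -> bool:
    """Return True if any exclusion fragment occurs anywhere in the URL.

    Hypothetical helper: assumes substring matching, which may differ
    from what crawler() actually does.
    """
    return any(fragment in url for fragment in exclude_urls)


def filter_urls(urls: Iterable[str], exclude_urls: Iterable[str]) -> List[str]:
    """Keep only the URLs that match none of the exclusion fragments."""
    patterns = list(exclude_urls)
    return [url for url in urls if not is_excluded(url, patterns)]


candidates = [
    "https://site.com/",
    "https://site.com/report.pdf",
    "https://site.com/images/logo.jpg",
    "https://site.com/about",
]
kept = filter_urls(candidates, [".pdf", ".jpg", ".zip"])
```

Under this assumption a fragment such as ".pdf" also matches a URL like https://site.com/file.pdf?page=2, so fragments should be chosen with that in mind.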

Example

import sys
import logging
from pysitemap import crawler
from pysitemap.parsers.lxml_parser import Parser

if __name__ == '__main__':
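    # Optional: pass --iocp to use the Windows proactor (IOCP) event loop.
    # asyncio.windows_events can only be imported on Windows.
    if '--iocp' in sys.argv:
        from asyncio import events, windows_events
        sys.argv.remove('--iocp')
        logging.info('using iocp')
        el = windows_events.ProactorEventLoop()
        events.set_event_loop(el)

    # To take the root URL from the command line instead:
    # root_url = sys.argv[1]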
    root_url = 'https://www.haikson.com'
    crawler(
        root_url, out_file='debug/sitemap.xml', exclude_urls=[".pdf", ".jpg", ".zip"],
        http_request_options={"ssl": False}, parser=Parser
    )
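The --iocp branch in the example imports asyncio.windows_events, which is only available on Windows. A minimal sketch of a platform-guarded variant follows. make_event_loop is a hypothetical helper, not part of pysitemap, and it uses only the standard asyncio API.

```python
import asyncio
import sys


def make_event_loop(use_iocp: bool) -> asyncio.AbstractEventLoop:
    """Create an event loop, using the proactor loop only on Windows.

    Hypothetical helper: on other platforms the --iocp request is
    ignored and the platform's default loop type is returned.
    """
    if use_iocp and sys.platform == "win32":
        # ProactorEventLoop is only defined on Windows.
        return asyncio.ProactorEventLoop()
    return asyncio.new_event_loop()


loop = make_event_loop(use_iocp="--iocp" in sys.argv)
asyncio.set_event_loop(loop)
```

With this guard, passing --iocp on Linux or macOS falls back to the default loop instead of raising ImportError.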

Changes

v. 0.10.1

  • Refactored the code to make it more readable.

  • Removed print() calls from code.

  • Added verbose mode to crawler().

  • Added type hints to crawler() arguments.

  • Added ValueError handling around the add_signal_handler() call.
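The last change concerns asyncio's loop.add_signal_handler(), which can fail depending on the platform and the calling context. The sketch below shows the general pattern of guarding that call. It is not the library's actual code, and install_stop_handler is a hypothetical name. Catching NotImplementedError (raised on Windows event loops) and RuntimeError in addition to ValueError is an assumption made here for robustness.

```python
import asyncio
import signal
from typing import Callable


def install_stop_handler(
    loop: asyncio.AbstractEventLoop,
    callback: Callable[[], None],
    sig: int = signal.SIGINT,
) -> bool:
    """Try to register a signal handler; return False if that is not possible.

    Hypothetical helper illustrating the guard, not pysitemap's implementation.
    """
    try:
        loop.add_signal_handler(sig, callback)
    except (ValueError, NotImplementedError, RuntimeError):
        # ValueError: invalid signal number.
        # NotImplementedError: platform without signal handler support.
        # RuntimeError: handler cannot be installed in this context.
        return False
    return True


demo_loop = asyncio.new_event_loop()
# 10**6 is not a valid signal number, so registration is refused.
registered = install_stop_handler(demo_loop, demo_loop.stop, sig=10**6)
demo_loop.close()
```

Returning a flag rather than raising lets the caller continue without signal handling when registration is refused.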

Download files

Source Distribution

nix-sitemap-generator-0.10.1.tar.gz (13.1 kB)

Uploaded Source

Built Distribution

nix_sitemap_generator-0.10.1-py3-none-any.whl (16.1 kB)

Uploaded Python 3

File details

Details for the file nix-sitemap-generator-0.10.1.tar.gz.

File hashes

Hashes for nix-sitemap-generator-0.10.1.tar.gz

Algorithm    Hash digest
SHA256       5c448ead092a83ae14c61cfb95b1dfe3f8b627e8e8401a01e69750f91f3fab9b
MD5          69e8368411be6ccb83b3bf01473e3164
BLAKE2b-256  a9509a4861b00146dfb984a6bed78c7e5bbc07b127aa2b554a38db6f11d21889

File details

Details for the file nix_sitemap_generator-0.10.1-py3-none-any.whl.

File hashes

Hashes for nix_sitemap_generator-0.10.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       0f28d09d7aafa0e61a23846a1ab5e01b37fe894c442209e99c3f80a56e75ba75
MD5          d40d552f4a9a8a717adc6f1dbb48f75d
BLAKE2b-256  03c7762813012d878fb9a59dd0918578327df55eea004034a7b102e12586f9a1
