Web crawler and sitemap generator.

Project description

Sitemap generator library for Python. Forked from https://github.com/Haikson/sitemap-generator.
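The library's end product is a sitemap XML file. The sketch below illustrates that output format (the sitemaps.org protocol) by turning a list of URLs into a minimal <urlset> document with Python's standard xml.etree.ElementTree. build_sitemap is a hypothetical helper, not part of pysitemap, and the library's actual files may contain additional per-URL elements such as <lastmod> or <priority>.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Hypothetical helper: return a sitemaps.org <urlset> document as a str."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in dict.fromkeys(urls):  # de-duplicate while keeping first-seen order
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url  # ElementTree escapes "&" etc.
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap(["https://site.com/", "https://site.com/about", "https://site.com/"]))
```

Each crawled page becomes one <url><loc>...</loc></url> entry. Duplicates are dropped because a sitemap should list each URL once.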

Installing

pip install nix-sitemap-generator

Usage

1. Import crawler from pysitemap (the package installs as nix-sitemap-generator but is imported as pysitemap)

from pysitemap import crawler

2. Call crawler()

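from pysitemap.parsers.lxml_parser import Parser
crawler(
    'https://site.com',                     # root URL to start crawling from
    out_file='debug/sitemap.xml',           # where the generated sitemap is written
    exclude_urls=[".pdf", ".jpg", ".zip"],  # skip URLs matching these patterns
    http_request_options={"ssl": False},    # extra options for each HTTP request
    parser=Parser,                          # HTML parser class used to extract links
)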
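exclude_urls keeps the crawler away from URLs such as binary downloads. The exact matching rule is not documented on this page, so the sketch below simply assumes substring matching; is_excluded is a hypothetical helper for illustration, not pysitemap's API.

```python
def is_excluded(url, exclude_urls):
    """Hypothetical helper: True if any exclude pattern occurs in the URL.

    Assumes plain substring matching; pysitemap's real rule may differ.
    """
    return any(pattern in url for pattern in exclude_urls)


exclude = [".pdf", ".jpg", ".zip"]
candidates = [
    "https://site.com/docs/manual.pdf",
    "https://site.com/about",
    "https://site.com/img/logo.jpg",
]
to_crawl = [u for u in candidates if not is_excluded(u, exclude)]
print(to_crawl)  # ['https://site.com/about']
```

Substring matching is forgiving (it also catches "manual.pdf?download=1"), but it can over-match, for example ".zip" inside "/file.zipper"; a suffix check on the URL path would be stricter.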

Example

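import logging
import sys

from pysitemap import crawler
from pysitemap.parsers.lxml_parser import Parser

if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)  # make logging.info() messages visible

    # Optional: pass --iocp on Windows to use the IOCP-based Proactor event loop
    # (already the default event loop on Windows since Python 3.8).
    if '--iocp' in sys.argv:
        from asyncio import events, windows_events  # windows_events exists only on Windows
        sys.argv.remove('--iocp')
        logging.info('using iocp')
        el = windows_events.ProactorEventLoop()
        events.set_event_loop(el)

    # root_url = sys.argv[1]  # alternatively, read the root URL from the command line
    root_url = 'https://www.haikson.com'
    crawler(
        root_url, out_file='debug/sitemap.xml', exclude_urls=[".pdf", ".jpg", ".zip"],
        http_request_options={"ssl": False}, parser=Parser
    )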
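crawler() hides the crawl loop itself. As a generic illustration of asynchronous crawling (a breadth-first crawl driven by an asyncio.Queue and a few worker tasks), here is a sketch that is not pysitemap's implementation. To keep it runnable offline, page fetching is replaced by a hypothetical in-memory FAKE_SITE link map, and restricting the crawl to the root URL's host is an assumption of the sketch.

```python
import asyncio
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "site": page URL -> links found on that page.
# A real crawler would fetch each page over HTTP and parse its HTML instead.
FAKE_SITE = {
    "https://site.com/": ["/about", "/blog", "https://other.com/"],
    "https://site.com/about": ["/"],
    "https://site.com/blog": ["/blog/post-1", "/about"],
    "https://site.com/blog/post-1": [],
}


async def fetch_links(url):
    """Stand-in for an HTTP fetch plus HTML parse; returns absolute link URLs."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [urljoin(url, href) for href in FAKE_SITE.get(url, [])]


async def crawl(root_url, workers=3):
    """Breadth-first crawl of root_url's host; returns the sorted URLs found."""
    host = urlparse(root_url).netloc
    seen = {root_url}
    queue = asyncio.Queue()
    queue.put_nowait(root_url)

    async def worker():
        while True:
            url = await queue.get()
            try:
                for link in await fetch_links(url):
                    # Only follow same-host links that have not been queued yet.
                    if urlparse(link).netloc == host and link not in seen:
                        seen.add(link)
                        queue.put_nowait(link)
            finally:
                queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await queue.join()  # wait until every queued URL has been processed
    for task in tasks:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*tasks, return_exceptions=True)
    return sorted(seen)


if __name__ == "__main__":
    print(asyncio.run(crawl("https://site.com/")))
```

Marking a URL as seen before enqueueing it means each page is fetched at most once even with several concurrent workers. The resulting URL set is the kind of list a sitemap generator then writes out.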

Changes

v. 0.10.1

  • Refactored the code for readability.

  • Removed print() calls from the code.

  • Added a verbose mode to crawler().

  • Added type hints to crawler() arguments.

  • Added ValueError handling around add_signal_handler() calls.


Download files

Source Distribution

nix-sitemap-generator-0.10.1.tar.gz (13.1 kB)

Built Distribution

nix_sitemap_generator-0.10.1-py3-none-any.whl (16.1 kB, Python 3)
