
Simple, powerful web crawler

Project description


Web Crawler

A performant, extensible and lean web crawler that uses all available CPUs by default.

It runs an event loop for I/O and uses separate worker processes to analyze the pages.
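The split between an event loop for I/O and processes for analysis can be illustrated with the standard library alone. The sketch below is not the package's code. It assumes a hypothetical in-memory `FAKE_PAGES` site in place of real httpx downloads, plus hypothetical `download`, `analyze` and `crawl` functions. Downloads are awaited concurrently on the asyncio loop, while link extraction is handed to a `ProcessPoolExecutor`, which by default starts one worker per available CPU.

```python
import asyncio
import re
from concurrent.futures import ProcessPoolExecutor

# Hypothetical in-memory "site" (URL -> HTML) standing in for real HTTP downloads.
FAKE_PAGES = {
    "https://example.com/": '<a href="https://example.com/a">A</a> <a href="https://example.com/b">B</a>',
    "https://example.com/a": '<a href="https://example.com/b">B</a>',
    "https://example.com/b": "no links here",
}

LINK_RE = re.compile(r'href="([^"]+)"')


async def download(url: str) -> str:
    """I/O-bound step: runs on the event loop (simulated with a short sleep)."""
    await asyncio.sleep(0.01)
    return FAKE_PAGES.get(url, "")


def analyze(html: str) -> list[str]:
    """CPU-bound step: runs in a worker process, so it must be a top-level function."""
    return LINK_RE.findall(html)


async def crawl(start_url: str) -> set[str]:
    loop = asyncio.get_running_loop()
    seen = {start_url}
    frontier = [start_url]
    # With no max_workers argument, the pool uses one worker per available CPU.
    with ProcessPoolExecutor() as pool:
        while frontier:
            # Download the whole frontier concurrently on the event loop.
            pages = await asyncio.gather(*(download(url) for url in frontier))
            # Hand each downloaded page to the process pool for analysis.
            results = await asyncio.gather(
                *(loop.run_in_executor(pool, analyze, html) for html in pages)
            )
            # Queue only URLs that have not been seen before.
            frontier = []
            for links in results:
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return seen


if __name__ == "__main__":
    print(sorted(asyncio.run(crawl("https://example.com/"))))
```

Because `analyze` runs in another process, it has to be a picklable top-level function, and the `__main__` guard is needed on platforms that start workers with spawn or forkserver.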

Batteries included

  • Basic httpx page downloader
  • S3 page storage
  • Local filesystem page storage
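The storage backends are not documented here, so the following is only a sketch of what a local filesystem page storage could look like. The `LocalPageStorage` class and its `save` and `load` methods are hypothetical names, not the package's API. Pages are stored one file per URL, named by the SHA-256 hash of the URL so that any URL maps to a safe, fixed-length file name.

```python
import hashlib
from pathlib import Path
from typing import Optional


class LocalPageStorage:
    """Hypothetical filesystem storage: one file per page, named by the URL's SHA-256."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        # Create the storage directory if it does not exist yet.
        self.root.mkdir(parents=True, exist_ok=True)

    def _path_for(self, url: str) -> Path:
        # Hashing the URL avoids slashes, query strings and length limits in file names.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.root / f"{name}.html"

    def save(self, url: str, content: bytes) -> Path:
        """Write the page content to disk and return the file path."""
        path = self._path_for(url)
        path.write_bytes(content)
        return path

    def load(self, url: str) -> Optional[bytes]:
        """Return the stored content, or None if the page was never saved."""
        path = self._path_for(url)
        return path.read_bytes() if path.exists() else None
```

An S3-backed variant would keep the same two methods and swap the file operations for object uploads and downloads.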

Usage

  • Have a look at tests/integration/test_crawl.py for a usage example
  • Implement your own PageAnalyzer and PageDownloader classes
  • Optionally, customize structlog logging (see configuration)
  • Have fun!
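The real base classes and method signatures for PageAnalyzer and PageDownloader are defined in the package (tests/integration/test_crawl.py shows actual usage), so the sketch below uses hypothetical stand-ins with the same names and assumed signatures. It shows the general shape: an async downloader that returns page content and a synchronous analyzer that returns discovered URLs, each replaceable on its own. `StaticDownloader`, `HrefAnalyzer` and `fetch_and_analyze` are illustrative names.

```python
import asyncio
import re
from abc import ABC, abstractmethod


class PageDownloader(ABC):
    """Hypothetical stand-in; the package's real interface may differ."""

    @abstractmethod
    async def download(self, url: str) -> str: ...


class PageAnalyzer(ABC):
    """Hypothetical stand-in; the package's real interface may differ."""

    @abstractmethod
    def analyze(self, url: str, content: str) -> list[str]:
        """Return the URLs discovered on the page."""


class StaticDownloader(PageDownloader):
    """Serves pages from a dict, which is handy in tests instead of real HTTP."""

    def __init__(self, pages: dict[str, str]) -> None:
        self.pages = pages

    async def download(self, url: str) -> str:
        return self.pages.get(url, "")


class HrefAnalyzer(PageAnalyzer):
    """Extracts absolute http(s) links from href attributes."""

    _href = re.compile(r'href="(https?://[^"]+)"')

    def analyze(self, url: str, content: str) -> list[str]:
        return self._href.findall(content)


async def fetch_and_analyze(
    url: str, downloader: PageDownloader, analyzer: PageAnalyzer
) -> list[str]:
    # Download on the event loop, then analyze the returned content.
    content = await downloader.download(url)
    return analyzer.analyze(url, content)


if __name__ == "__main__":
    pages = {"https://example.com/": '<a href="https://example.com/about">About</a> <a href="/rel">x</a>'}
    print(asyncio.run(fetch_and_analyze("https://example.com/", StaticDownloader(pages), HrefAnalyzer())))
    # ['https://example.com/about']
```

Relative links such as `/rel` are skipped by this simple analyzer; a fuller version would resolve them against the page URL.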

Customization

All classes in the modules folder can be replaced with your own custom implementations.

Download files

Source Distribution

datek_web_crawler-0.1.0.tar.gz (26.6 kB)

Uploaded Source

Built Distribution

datek_web_crawler-0.1.0-py3-none-any.whl (13.9 kB)

Uploaded Python 3

File details

Details for the file datek_web_crawler-0.1.0.tar.gz.

File metadata

  • Download URL: datek_web_crawler-0.1.0.tar.gz
  • Upload date:
  • Size: 26.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.3

File hashes

Hashes for datek_web_crawler-0.1.0.tar.gz
  • SHA256: aefea624b4b28a319ff2d182390178dc545e6c62cbe7836e3936c278dc803a93
  • MD5: ec48a5445c3577ae4f90a098b20faba5
  • BLAKE2b-256: 1a21756a1d000e8d73ac0cef3b675cafe6701a034aae49819d21f1c8d327d1db
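To check a downloaded archive against the published digest, compute its SHA-256 locally and compare the two. This is a minimal sketch using Python's `hashlib`; `sha256_of` and `verify` are illustrative helper names. pip can also enforce this kind of check automatically in its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path

# Published SHA256 digest of datek_web_crawler-0.1.0.tar.gz (from the list above).
EXPECTED_SHA256 = "aefea624b4b28a319ff2d182390178dc545e6c62cbe7836e3936c278dc803a93"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large archives are never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA-256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("datek_web_crawler-0.1.0.tar.gz"))` returns True only if the downloaded file is byte-for-byte identical to the published source distribution.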

File details

Details for the file datek_web_crawler-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for datek_web_crawler-0.1.0-py3-none-any.whl
  • SHA256: d9e55fb35a5c9cd5c31e731951a3dac06f44473386a74252b83cd37bec9ef644
  • MD5: 46c4d42751485fd4037c20cd16b77f3b
  • BLAKE2b-256: 8861c9c867dfd218462f48a46cf686b3e6198dd23eafa33ce69c17c0611ac382
