Simple, powerful web crawler
Web Crawler
A performant, extensible, and lean web crawler that uses all available CPUs by default.
It uses an event loop for I/O and separate processes for analyzing the pages.
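The split between an event loop for I/O and worker processes for analysis can be sketched with the standard library alone. This is a minimal illustration of the general pattern, not the package's actual implementation: `download`, `analyze`, and `crawl` are hypothetical names, and the download is simulated so the sketch runs without network access.

```python
import asyncio
import os
from concurrent.futures import Executor, ProcessPoolExecutor


def analyze(html: str) -> int:
    """CPU-bound stand-in for page analysis: count opening anchor tags."""
    return html.count("<a ")


async def download(url: str) -> str:
    """I/O-bound stand-in for a page download (simulated, no network)."""
    await asyncio.sleep(0.01)
    return f'<html><a href="{url}/next">next</a></html>'


async def crawl(urls: list[str], executor: Executor) -> dict[str, int]:
    """Download all pages on the event loop, then analyze them in the executor."""
    loop = asyncio.get_running_loop()
    # Downloads are awaited concurrently on the event loop.
    pages = await asyncio.gather(*(download(url) for url in urls))
    # Analysis is handed to the executor so it does not block the loop.
    counts = await asyncio.gather(
        *(loop.run_in_executor(executor, analyze, page) for page in pages)
    )
    return dict(zip(urls, counts))


if __name__ == "__main__":
    # One worker process per CPU, mirroring "all available CPUs by default".
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        print(asyncio.run(crawl(["https://example.com/a"], pool)))
```

Passing the executor in as a parameter keeps the sketch easy to try with a thread pool first and a process pool later.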
Batteries included
- Basic `httpx` page downloader
- S3 page storage
- Local filesystem page storage
Usage
- Have a look at `tests/integration/test_crawl.py`
- Implement your own `PageAnalyzer` and `PageDownloader` classes
- Optionally customize `structlog` logging, see configuration
- Have fun!
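As a rough idea of what custom downloader and analyzer classes might look like, here is a self-contained sketch. The base classes below are hypothetical stand-ins defined locally for illustration: the package's real `PageAnalyzer` and `PageDownloader` may declare different method names, signatures, or async methods, so check `tests/integration/test_crawl.py` for the actual interface.

```python
from abc import ABC, abstractmethod
from html.parser import HTMLParser
from urllib.parse import urljoin


# Hypothetical stand-ins; the package's real base classes may differ.
class PageDownloader(ABC):
    @abstractmethod
    def download(self, url: str) -> str: ...


class PageAnalyzer(ABC):
    @abstractmethod
    def analyze(self, url: str, html: str) -> list[str]: ...


class InMemoryDownloader(PageDownloader):
    """Serves pages from a dict so the sketch runs without network access."""

    def __init__(self, pages: dict[str, str]) -> None:
        self._pages = pages

    def download(self, url: str) -> str:
        return self._pages[url]


class _LinkCollector(HTMLParser):
    """Collects the href value of every anchor tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


class LinkAnalyzer(PageAnalyzer):
    """Returns the absolute URLs of all links found on a page."""

    def analyze(self, url: str, html: str) -> list[str]:
        collector = _LinkCollector()
        collector.feed(html)
        # Resolve relative links against the URL of the page itself.
        return [urljoin(url, href) for href in collector.hrefs]
```

For example, `LinkAnalyzer().analyze(url, InMemoryDownloader(pages).download(url))` yields the list of URLs a crawler would visit next.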
Customization
All classes in the `modules` folder can be replaced with your own custom implementations.
Source Distribution
datek_web_crawler-0.1.0.tar.gz (26.6 kB)
Built Distribution
File details
Details for the file datek_web_crawler-0.1.0.tar.gz.
File metadata
- Download URL: datek_web_crawler-0.1.0.tar.gz
- Upload date:
- Size: 26.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aefea624b4b28a319ff2d182390178dc545e6c62cbe7836e3936c278dc803a93 |
| MD5 | ec48a5445c3577ae4f90a098b20faba5 |
| BLAKE2b-256 | 1a21756a1d000e8d73ac0cef3b675cafe6701a034aae49819d21f1c8d327d1db |
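A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the function names and the local file path are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digest of datek_web_crawler-0.1.0.tar.gz as listed in the table above.
EXPECTED_SHA256 = "aefea624b4b28a319ff2d182390178dc545e6c62cbe7836e3936c278dc803a93"


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected value, ignoring case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location: the archive sits in the current working directory.
    archive = Path("datek_web_crawler-0.1.0.tar.gz")
    if archive.exists():
        print("digest matches" if verify(archive) else "digest MISMATCH")
    else:
        print(f"{archive} not found")
```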
File details
Details for the file datek_web_crawler-0.1.0-py3-none-any.whl.
File metadata
- Download URL: datek_web_crawler-0.1.0-py3-none-any.whl
- Upload date:
- Size: 13.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d9e55fb35a5c9cd5c31e731951a3dac06f44473386a74252b83cd37bec9ef644 |
| MD5 | 46c4d42751485fd4037c20cd16b77f3b |
| BLAKE2b-256 | 8861c9c867dfd218462f48a46cf686b3e6198dd23eafa33ce69c17c0611ac382 |