Fast async web crawler and scraping framework with deduplication and extensible middleware.
Project description
qCrawl is a fast async web crawling and scraping framework for Python that extracts structured data from web pages.
It is cross-platform and easy to install via pip or conda.
Follow the documentation.
qCrawl features
- Async architecture - High-performance concurrent crawling based on asyncio
- Performance optimized - Redis queue backend with direct delivery, MessagePack serialization, connection pooling, DNS caching
- Powerful parsing - CSS/XPath selectors with lxml
- Middleware system - Customizable request/response processing
- Flexible export - Multiple output formats including JSON, CSV, XML
- Flexible queue backends - Memory or Redis-based (+disk) schedulers for different scale requirements
- Item pipelines - Data transformation, validation, and processing pipeline
- Pluggable downloaders - HTTP (aiohttp) or Camoufox (stealth browser) for JavaScript rendering and anti-bot evasion
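To make the async architecture, deduplication, and middleware ideas above concrete, here is a minimal standard-library sketch. It does not use qCrawl's API: `fingerprint`, `fake_fetch`, `add_header_middleware`, `crawl`, and the `FAKE_SITE` link graph are all hypothetical names invented for illustration, and the "network" is an in-memory dictionary so the example runs offline.

```python
import asyncio
import hashlib
from urllib.parse import urlsplit, urlunsplit

# Hypothetical in-memory link graph standing in for real HTTP responses.
FAKE_SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://EXAMPLE.com/#top"],
    "https://example.com/b": [],
}


def fingerprint(url: str) -> str:
    """Hash a normalized URL so trivially different spellings deduplicate."""
    parts = urlsplit(url)
    normalized = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )
    return hashlib.sha256(normalized.encode()).hexdigest()


async def fake_fetch(url: str) -> list[str]:
    """Pretend to download a page and return the links found on it."""
    await asyncio.sleep(0)  # yield to the event loop like real I/O would
    return FAKE_SITE.get(url, [])


def add_header_middleware(request: dict) -> dict:
    """Example request middleware: attach a User-Agent header."""
    request.setdefault("headers", {})["User-Agent"] = "sketch-bot"
    return request


async def crawl(start_url: str, concurrency: int = 3) -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    seen = {fingerprint(start_url)}  # fingerprints of every URL ever queued
    visited: list[str] = []
    await queue.put(start_url)

    async def worker() -> None:
        while True:
            url = await queue.get()
            try:
                request = add_header_middleware({"url": url})
                links = await fake_fetch(request["url"])
                visited.append(url)
                for link in links:
                    fp = fingerprint(link)
                    if fp not in seen:  # skip URLs that were already queued
                        seen.add(fp)
                        queue.put_nowait(link)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    await queue.join()  # wait until every queued URL has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return visited


if __name__ == "__main__":
    print(sorted(asyncio.run(crawl("https://example.com/"))))
```

In this sketch the `seen` set is updated at enqueue time rather than at fetch time, so two workers cannot queue the same page concurrently. A real framework would typically replace the set with a shared store (for example Redis) when several processes crawl together.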
Download files
Source Distribution
qcrawl-0.3.4.tar.gz (225.7 kB)
Built Distribution
qcrawl-0.3.4-py3-none-any.whl (118.2 kB)
File details
Details for the file qcrawl-0.3.4.tar.gz.
File metadata
- Download URL: qcrawl-0.3.4.tar.gz
- Upload date:
- Size: 225.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | babf19b505e7f8cd54c7346861e76336a5625999663f3c24da12e2713965bbdb |
| MD5 | 76bdcc68725532348297ab590af53961 |
| BLAKE2b-256 | c2c510939e90adc5b9ac90925c9d59831405011c6a565d56efcdb8cecf9c98ef |
File details
Details for the file qcrawl-0.3.4-py3-none-any.whl.
File metadata
- Download URL: qcrawl-0.3.4-py3-none-any.whl
- Upload date:
- Size: 118.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | baeac0bf013253ccc57cf0c605c1f85d8ed041b72def9a14547da68be352d512 |
| MD5 | b36906bb2ada8c6878a2234d605f0ad8 |
| BLAKE2b-256 | 48eb62782c0f4d21075bf7e165908d5045001476555578309b316f7280473b37 |
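The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard library; the helper names `sha256_of_file` and `verify` are hypothetical, and the example assumes the source distribution was saved as `qcrawl-0.3.4.tar.gz` in the current directory.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest published above for qcrawl-0.3.4.tar.gz.
EXPECTED_SHA256 = "babf19b505e7f8cd54c7346861e76336a5625999663f3c24da12e2713965bbdb"


def sha256_of_file(path, chunk_size: int = 65536) -> str:
    """Stream the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex: str) -> bool:
    """Return True when the file's SHA256 digest matches the expected value."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the file was saved.
    archive = Path("qcrawl-0.3.4.tar.gz")
    if archive.exists():
        print("digest matches" if verify(archive, EXPECTED_SHA256) else "digest MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

pip can perform the same check itself when a requirements file pins hashes and is installed with `--require-hashes`.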