Fast async web crawling and scraping framework with deduplication and extensible middleware.
Project description
qCrawl is a fast async web crawling and scraping framework for Python that extracts structured data from web pages.
It is cross-platform and easy to install via pip or conda.
Follow the documentation.
qCrawl features
- Async architecture - High-performance concurrent crawling based on asyncio
- Performance optimized - Redis queue backend with direct delivery, MessagePack serialization, connection pooling, DNS caching
- Powerful parsing - CSS/XPath selectors with lxml
- Middleware system - Customizable request/response processing
- Flexible export - Multiple output formats including JSON, CSV, XML
- Flexible queue backends - Memory or Redis-based (+disk) schedulers for different scale requirements
- Item pipelines - Data transformation, validation, and processing pipeline
- Pluggable downloaders - HTTP (aiohttp), Camoufox (stealth browser) for JavaScript rendering and anti-bot evasion
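The async architecture and deduplication features listed above can be illustrated with a minimal, standard-library-only sketch. This is not qCrawl's API: `FAKE_SITE`, `fetch_links`, and `crawl` are hypothetical names, and the in-memory site stands in for real HTTP fetching so the example runs offline. It only shows the general pattern of a shared asyncio queue, a pool of concurrent workers, and a `seen` set that ensures each URL is scheduled once.

```python
import asyncio

# Hypothetical in-memory "site": URL -> links found on that page.
# It replaces real HTTP requests so the sketch runs without a network.
FAKE_SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/", "/c"],
    "/c": [],
}


async def fetch_links(url):
    """Pretend to download a page and return its outgoing links."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return FAKE_SITE.get(url, [])


async def crawl(start, concurrency=3):
    """Crawl from `start` with `concurrency` workers; return all URLs seen."""
    queue = asyncio.Queue()
    seen = {start}  # deduplication: each URL is enqueued at most once
    queue.put_nowait(start)

    async def worker():
        while True:
            url = await queue.get()
            try:
                for link in await fetch_links(url):
                    if link not in seen:
                        seen.add(link)
                        queue.put_nowait(link)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    await queue.join()  # returns once every enqueued URL has been processed
    for task in workers:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return seen


visited = asyncio.run(crawl("/"))
print(sorted(visited))  # ['/', '/a', '/b', '/c']
```

In a real crawler the `seen` set is typically replaced by request fingerprints and the queue by a persistent backend such as Redis, which is the role the queue backends above play.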
Download files
Source Distribution
qcrawl-0.3.5.tar.gz (230.6 kB)
Built Distribution
qcrawl-0.3.5-py3-none-any.whl (121.6 kB)
File details
Details for the file qcrawl-0.3.5.tar.gz.
File metadata
- Download URL: qcrawl-0.3.5.tar.gz
- Upload date:
- Size: 230.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be79bd1701265114d829bf61739b68813f5ef1d4b8ab3febe51f3219b0f53e56 |
| MD5 | fef474895e6e97cb13827520cdf6d002 |
| BLAKE2b-256 | 38a238aa6ad237f037a02598a43652b79ae42f4c79757f25e7508375901cddcb |
File details
Details for the file qcrawl-0.3.5-py3-none-any.whl.
File metadata
- Download URL: qcrawl-0.3.5-py3-none-any.whl
- Upload date:
- Size: 121.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea86a7518a3ebd754ed95ad9305ad27243566fe622f8025b2959f162b6303e4d |
| MD5 | 150456303d7460e5423aecca50279efe |
| BLAKE2b-256 | aa671b5b9cbf94bbf2d1662edcb172386a7ed06d22cd6395e45b5af61aa59d23 |
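The SHA256 digests listed for each file can be used to check a downloaded archive before installing it. The following is a minimal sketch using only Python's standard `hashlib`; the helper names `sha256_of` and `verify_sha256` are hypothetical, and the self-check at the end runs against a throwaway file rather than a real download.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Compare a file's SHA256 digest with the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Self-check on a temporary file so the sketch runs without a download.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    ok = verify_sha256(sample, hashlib.sha256(b"hello").hexdigest())
    bad = verify_sha256(sample, "0" * 64)

print(ok, bad)  # True False
```

For a real download, one would call `verify_sha256("qcrawl-0.3.5.tar.gz", "be79bd1701265114d829bf61739b68813f5ef1d4b8ab3febe51f3219b0f53e56")`, assuming the archive sits in the current directory.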