Fast async web crawling and scraping framework with deduplication and extensible middleware.
Project description
qCrawl is a fast, asynchronous web crawling and scraping framework for Python that extracts structured data from web pages.
It is cross-platform and easy to install via pip, conda, or OS packages.
See the documentation for details.
Library comparison
| Attribute | qCrawl ⭐ | Scrapy | Playwright | Colly |
|---|---|---|---|---|
| Language | Python | Python | JavaScript/TypeScript (Node.js), Python, Java, .NET | Go |
| Concurrency model | Asyncio native with threads for I/O work | Evented (Twisted) with non‑blocking I/O | Isolated contexts within browser instance + multiple browser instances | Goroutines (lightweight threads) |
| Queue | Priority queue with FIFO tiebreak, memory, [disk,] redis backends | Priority queue with FIFO/LIFO tiebreak, memory and disk backends | No built-in crawl queue (user-managed) | FIFO with memory and file backends |
| Middleware & hooks | Downloader + Spider middlewares; signal-driven lifecycle hooks | Downloader + Spider middlewares; signal-driven lifecycle hooks | Hooks and interception API; not pipeline-centric | Middleware-style callbacks |
| Crawl throttling | Per-domain concurrency with configurable delay | Per-domain concurrency with configurable delay | Controlled via browser sessions | Per-host concurrency |
| Strengths | Lightweight, high-throughput, easy to extend | Very mature ecosystem and community, easy to extend | Real browser rendering, JS support, robust for SPA sites | Extremely high throughput, low memory use |
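The "priority queue with FIFO tiebreak" entry in the Queue row can be illustrated with a small standard-library sketch. This is not qCrawl's actual queue implementation; the class name `PriorityFifoQueue`, its methods, and the convention that a lower number means higher priority are all assumptions made for illustration. The idea is to pair each priority with a monotonically increasing sequence number so that requests with equal priority come out in insertion order.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class _Entry:
    # Entries compare on (priority, seq); the URL itself never takes part.
    priority: int
    seq: int
    url: str = field(compare=False)


class PriorityFifoQueue:
    """Hypothetical in-memory queue: priority order, FIFO among equal priorities."""

    def __init__(self) -> None:
        self._heap: list[_Entry] = []
        self._counter = itertools.count()  # insertion order, used as tiebreak

    def push(self, url: str, priority: int = 0) -> None:
        # Assumption of this sketch: a lower number is served first.
        heapq.heappush(self._heap, _Entry(priority, next(self._counter), url))

    def pop(self) -> str:
        return heapq.heappop(self._heap).url

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityFifoQueue()
queue.push("https://example.com/a", priority=1)
queue.push("https://example.com/b", priority=0)
queue.push("https://example.com/c", priority=1)

# "b" has the best priority; "a" and "c" tie and keep their insertion order.
order = [queue.pop() for _ in range(len(queue))]
```

Without the sequence number, two entries with the same priority would fall back to comparing URLs, so the order among ties would depend on the URL text rather than on arrival time.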
Download files
Source Distribution
qcrawl-0.3.2.tar.gz (147.8 kB)
Built Distribution
qcrawl-0.3.2-py3-none-any.whl (104.9 kB)
File details
Details for the file qcrawl-0.3.2.tar.gz.
File metadata
- Download URL: qcrawl-0.3.2.tar.gz
- Upload date:
- Size: 147.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 13e4fd1efc06c392085924323cc18f999cafc2cdf0eac41ea56f06e80a48e7c0 |
| MD5 | 70ff0d890731ce2bdbe13c07e4035c24 |
| BLAKE2b-256 | 7789c41e577acbad84d66aeb92767d110337dd43f97cf882cbe8d898f2053192 |
File details
Details for the file qcrawl-0.3.2-py3-none-any.whl.
File metadata
- Download URL: qcrawl-0.3.2-py3-none-any.whl
- Upload date:
- Size: 104.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 19225bb6ea3ebc7d7d66ac2ebaa76183206963b03910f354baf58f26339482fa |
| MD5 | 8bb1d810a25f3c00467538bfbf38f7f7 |
| BLAKE2b-256 | 0d4651a04f10dfac1b4660992d4ed6ec0a87703ce451dc1eeb9d3b7838ff6808 |
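The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib`; the helper names `sha256_of` and `matches_published` are hypothetical, and the file is assumed to be in the current directory under the name shown above.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for qcrawl-0.3.2.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "13e4fd1efc06c392085924323cc18f999cafc2cdf0eac41ea56f06e80a48e7c0"
)


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    sdist = Path("qcrawl-0.3.2.tar.gz")  # assumed download location
    if sdist.exists():
        print("digest matches:", matches_published(sdist, EXPECTED_SDIST_SHA256))
    else:
        print("file not found:", sdist)
```

The same helper works for the wheel by passing its file name and the SHA256 value from the second table.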