Fast async web crawler and scraping framework with deduplication and extensible middleware.
Project description
qcrawl is a fast async web crawling and scraping framework for Python that extracts structured data from web pages.
It is cross-platform and easy to install via pip, conda, or OS packages.
See the documentation for usage details.
Libraries comparison
| Attribute | qCrawl ⭐ | Scrapy | Playwright | Colly |
|---|---|---|---|---|
| Language | Python | Python | Node.js, Python, Java | Go |
| Concurrency model | Asyncio native with threads for I/O work | Evented (Twisted) with non‑blocking I/O | Isolated contexts within browser instance + multiple browser instances | Goroutines (lightweight threads) |
| Queue | Priority queue with FIFO tiebreak, memory, [disk,] redis backends | Priority queue with FIFO/LIFO tiebreak, memory and disk backends | No built-in crawl queue (user-managed) | FIFO with memory and file backends |
| Middleware & hooks | Downloader + Spider middlewares; signal-driven lifecycle hooks | Downloader + Spider middlewares; signal-driven lifecycle hooks | Hooks and interception API; not pipeline-centric | Middleware-style callbacks |
| Crawl throttling | Per-domain concurrency with configurable delay | Per-domain concurrency with configurable delay | Controlled via browser sessions | Per-host concurrency |
| Strengths | Lightweight, high-throughput, easy to extend | Very mature ecosystem and community, easy to extend | Real browser rendering, JS support, robust for SPA sites | Extremely high throughput, low memory use |
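
The "priority queue with FIFO tiebreak" entry in the Queue row can be illustrated with a small in-memory sketch. This is not qcrawl's implementation or API: the class name `PriorityFifoQueue`, its methods, and the convention that a lower number means higher priority are all assumptions made for illustration. It only shows the general technique of pairing each priority with an insertion counter so that equal-priority requests come out in arrival order.

```python
import heapq
import itertools


class PriorityFifoQueue:
    """Hypothetical in-memory queue: priority order, FIFO among equal priorities."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as the tiebreak between equal priorities.
        self._counter = itertools.count()

    def push(self, url, priority=0):
        # Assumption: a lower number means higher priority.
        # The counter guarantees that earlier pushes sort first on a tie.
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def pop(self):
        # Returns the URL with the smallest (priority, insertion order) pair.
        _priority, _order, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)


queue = PriorityFifoQueue()
queue.push("https://example.com/a", priority=1)
queue.push("https://example.com/b", priority=0)
queue.push("https://example.com/c", priority=1)
queue.push("https://example.com/d", priority=0)

# Drain the queue: priority 0 entries first, each group in insertion order.
order = [queue.pop() for _ in range(len(queue))]
```

Here `order` ends up as b, d, a, c: the two priority-0 URLs come out first in the order they were pushed, followed by the two priority-1 URLs in their push order.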
Project details
Download files
Source Distribution
qcrawl-0.3.3.tar.gz (148.3 kB)
Built Distribution
qcrawl-0.3.3-py3-none-any.whl (105.0 kB)
File details
Details for the file qcrawl-0.3.3.tar.gz.
File metadata
- Download URL: qcrawl-0.3.3.tar.gz
- Upload date:
- Size: 148.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eeaac877cbe3d751b379622ada3bc5c0ff8328c292cb2670fd5e9f91ce3fe603 |
| MD5 | 5c0520972895f51f623b9830a54f49a0 |
| BLAKE2b-256 | 7ddba1d41fe23ab3862063e46ac69c0aa1312ebf620c6d4a147f15bd97827a1e |
File details
Details for the file qcrawl-0.3.3-py3-none-any.whl.
File metadata
- Download URL: qcrawl-0.3.3-py3-none-any.whl
- Upload date:
- Size: 105.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7114b3afda951b9fdb85021f48d0af8f9257df87e7d37ba446f14860f5f86ac6 |
| MD5 | 835e4f3be5522f59bbffbcb13a57fbca |
| BLAKE2b-256 | c35893e1cc630ad9a990d757ea86e84483fef43bbce0b184b3aa390d53548358 |
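
The digests listed above can be compared against a locally downloaded file. The following is a minimal sketch using Python's standard `hashlib`; the helper names `hex_digest` and `matches_published` are hypothetical, and the file is assumed to be already downloaded to a local path.

```python
import hashlib
from pathlib import Path


def hex_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return hex_digest(path, algorithm) == expected_hex.strip().lower()


# Example usage (assumes the wheel was saved in the current directory):
# matches_published(
#     "qcrawl-0.3.3-py3-none-any.whl",
#     "7114b3afda951b9fdb85021f48d0af8f9257df87e7d37ba446f14860f5f86ac6",
# )
```

Passing `algorithm="md5"` checks the MD5 row in the same way. SHA256 is generally the better choice for integrity checks, since MD5 is not collision-resistant.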