Fast async web crawler & scraping framework with deduplication and extensible middleware.
Project description
qcrawl is a fast async web crawling & scraping framework for Python that extracts structured data from web pages.
It is cross-platform and easy to install via pip, conda, or OS packages.
For usage details, see the documentation.
Libraries comparison
| Attribute | qCrawl ⭐ | Scrapy | Playwright | Colly |
|---|---|---|---|---|
| Language | Python | Python | Node.js, Python, Java | Go |
| Concurrency model | Asyncio native with threads for I/O work | Evented (Twisted) with non‑blocking I/O | Isolated contexts within browser instance + multiple browser instances | Goroutines (lightweight threads) |
| Queue | Priority queue with FIFO tiebreak, memory, [disk,] redis backends | Priority queue with FIFO/LIFO tiebreak, memory and disk backends | No built-in crawl queue (user-managed) | FIFO with memory and file backends |
| Middleware & hooks | Downloader + Spider middlewares; signal-driven lifecycle hooks | Downloader + Spider middlewares; signal-driven lifecycle hooks | Hooks and interception API; not pipeline-centric | Middleware-style callbacks |
| Crawl throttling | Per-domain concurrency with configurable delay | Per-domain concurrency with configurable delay | Controlled via browser sessions | Per-host concurrency |
| Strengths | Lightweight, high-throughput, easy to extend | Very mature ecosystem and community, easy to extend | Real browser rendering, JS support, robust for SPA sites | Extremely high throughput, low memory use |
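The "priority queue with FIFO tiebreak" behaviour in the Queue row can be illustrated with a small standalone sketch. This is not qcrawl's actual queue or API: the class `PriorityFifoQueue`, its `push`/`pop` methods, the example URLs, and the convention that a lower number means higher priority are all assumptions made for illustration. It uses only the standard library (`heapq` plus a monotonically increasing counter) to show how requests with equal priority can come out in insertion order.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class _Entry:
    # Entries are ordered by (priority, seq); the URL is excluded from comparison.
    priority: int
    seq: int
    url: str = field(compare=False)


class PriorityFifoQueue:
    """Hypothetical in-memory queue: lowest priority number first, FIFO among equals."""

    def __init__(self) -> None:
        self._heap: list[_Entry] = []
        self._counter = itertools.count()  # insertion sequence, used as the tiebreak

    def push(self, url: str, priority: int = 0) -> None:
        heapq.heappush(self._heap, _Entry(priority, next(self._counter), url))

    def pop(self) -> str:
        return heapq.heappop(self._heap).url

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityFifoQueue()
queue.push("https://example.com/a", priority=1)
queue.push("https://example.com/b", priority=0)
queue.push("https://example.com/c", priority=1)
queue.push("https://example.com/d", priority=0)

# Priority 0 entries come out first; within each priority, insertion order is kept.
order = [queue.pop() for _ in range(len(queue))]
print(order)
```

The sequence counter is what makes the tiebreak deterministic: without it, `heapq` would give no ordering guarantee between two entries of equal priority.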
Download files
Source Distribution
qcrawl-0.3.1.tar.gz (111.3 kB)
Built Distribution
qcrawl-0.3.1-py3-none-any.whl (104.6 kB)
File details
Details for the file qcrawl-0.3.1.tar.gz.
File metadata
- Download URL: qcrawl-0.3.1.tar.gz
- Upload date:
- Size: 111.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf931ba6239920615dc1b00b9e9b004d06c29a2a5899594f33273f6476ae9f0e |
| MD5 | 596874b44c584afe3b6bc9eab75e0f1e |
| BLAKE2b-256 | 241d8d991c85ac7e1d027edf62dcadd1801c815f55f4f9a8a50515afc23cef1a |
File details
Details for the file qcrawl-0.3.1-py3-none-any.whl.
File metadata
- Download URL: qcrawl-0.3.1-py3-none-any.whl
- Upload date:
- Size: 104.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06246aab6a6368411c0b803518ed2736d54f0284da4651fff9b2eeacc561ffb1 |
| MD5 | 7aed64ff6d668e28f3f72e1d5879f474 |
| BLAKE2b-256 | cf26f67bcc49858e36e9792eee2e2796229c1843ee63d065298bea36845317c4 |
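The published SHA256 digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib`; the helper names `sha256_of` and `verify` are hypothetical, and it assumes the files were saved locally under their original names.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed in the file details for release 0.3.1.
EXPECTED_SHA256 = {
    "qcrawl-0.3.1.tar.gz": "bf931ba6239920615dc1b00b9e9b004d06c29a2a5899594f33273f6476ae9f0e",
    "qcrawl-0.3.1-py3-none-any.whl": "06246aab6a6368411c0b803518ed2736d54f0284da4651fff9b2eeacc561ffb1",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its digest matches the listed one."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, `verify(Path("qcrawl-0.3.1.tar.gz"))` would return `True` only for a file whose contents hash to the listed digest. Note that `pip` can also enforce hashes itself through its hash-checking mode in requirements files.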