Incubator for Scrapy download handlers
Overview
This is a collection of semi-official download handlers for Scrapy. See the Scrapy download handler documentation for more information.
They should work, and some of them may later be promoted to official status, but here they are provided as-is, with no promises of support or stability. The documentation, including the lists of limitations and unsupported features, is also provided as-is and may be incomplete.
As this code intentionally uses private Scrapy APIs, it specifies a tight dependency on Scrapy. This version of the package only supports Scrapy 2.15.x.
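Because the supported Scrapy range is this narrow, a project may want to fail fast when the installed Scrapy falls outside it. The following is a minimal sketch and not part of the package: the names `SUPPORTED_SERIES`, `is_supported_scrapy` and `installed_scrapy_is_supported` are hypothetical, and the check assumes that "2.15.x" means any release whose major and minor components are 2 and 15.

```python
from importlib import metadata

# Assumption: "2.15.x" means any release whose major.minor is 2.15.
SUPPORTED_SERIES = (2, 15)


def is_supported_scrapy(version: str) -> bool:
    """Return True if ``version`` belongs to the supported 2.15.x series."""
    parts = version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        # Unparseable or too-short version strings are treated as unsupported.
        return False
    return (major, minor) == SUPPORTED_SERIES


def installed_scrapy_is_supported() -> bool:
    """Check the installed Scrapy distribution, if there is one."""
    try:
        return is_supported_scrapy(metadata.version("scrapy"))
    except metadata.PackageNotFoundError:
        return False
```

In practice pip already enforces the pinned dependency at install time, so a check like this is mostly useful for clearer error messages in environments assembled by other means.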
Features overview
The baseline for these handlers is the default Scrapy handler, HTTP11DownloadHandler, which uses Twisted and supports HTTP/1.1. Feature parity with it is an explicit goal, but it is not always achievable, and not every feasible feature is implemented in every handler yet (this may change in the future). Certain popular features that HTTP11DownloadHandler lacks, such as HTTP/2 support, and features unique to individual handlers may or may not be implemented. See the sections for individual handlers for details.
The following table summarizes the most important differences:
| Handler | HTTP/2 | HTTP/3 | Proxies | TLS logging | Impersonation | TLS version limits |
|---|---|---|---|---|---|---|
| (HTTP11DownloadHandler) | Not possible | Not possible | Yes | Yes | Not possible | No |
| AiohttpDownloadHandler | Not possible | Not possible | Yes | Partial | Not possible | No |
| CurlCffiDownloadHandler | Yes | Yes (not tested) | Yes | Not possible | No | Not possible |
| HttpxDownloadHandler | Yes | Not possible | Yes | Yes | Not possible | No |
| NiquestsDownloadHandler | Yes | No | Yes | Yes | Not possible | Not possible |
| PyreqwestDownloadHandler | Yes | Not possible | Not possible | Not possible | Not possible | No |
The following basic features are supported by all handlers unless their documentation says otherwise:
Native asyncio integration without requiring a Twisted reactor
HTTP/1.1 for http and https schemes
Unified download handler exceptions
Proxies, including HTTP and HTTPS proxies for HTTP and HTTPS destinations
Proxy authentication via HttpProxyMiddleware
IPv6 destinations
DOWNLOAD_MAXSIZE, DOWNLOAD_WARNSIZE and the respective request meta keys
DOWNLOAD_TIMEOUT and the respective request meta key
DOWNLOAD_FAIL_ON_DATALOSS and the "dataloss" flag
Setting the download_latency request meta
DOWNLOAD_BIND_ADDRESS
DOWNLOAD_VERIFY_CERTIFICATES
headers_received and bytes_received signals
Not reading the proxy configuration from the environment variables
Not handling cookies, redirects, compression and other things handled by Scrapy itself
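The size and timeout limits listed above can also be overridden per request through Scrapy's request meta keys `download_timeout`, `download_maxsize` and `download_warnsize`. As an illustrative sketch (the helper name `limits_meta` is hypothetical and not part of this package), such a meta dictionary could be built as follows and passed as the `meta` argument of a `scrapy.Request`:

```python
from __future__ import annotations


def limits_meta(
    timeout: float | None = None,
    maxsize: int | None = None,
    warnsize: int | None = None,
) -> dict[str, float | int]:
    """Build a request meta dict that overrides the download limits.

    Only the limits that are given are included, so the project-wide
    DOWNLOAD_TIMEOUT, DOWNLOAD_MAXSIZE and DOWNLOAD_WARNSIZE settings
    keep applying to everything that is left out.
    """
    meta: dict[str, float | int] = {}
    if timeout is not None:
        meta["download_timeout"] = timeout
    if maxsize is not None:
        meta["download_maxsize"] = maxsize
    if warnsize is not None:
        meta["download_warnsize"] = warnsize
    return meta


# Example: allow 30 seconds and at most 5 MiB for one large download.
meta = limits_meta(timeout=30, maxsize=5 * 1024 * 1024)
```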
Handlers
AiohttpDownloadHandler
This handler supports HTTP/1.1 and uses the aiohttp library.
Install it with:
pip install scrapy-download-handlers-incubator[aiohttp]
Enable it with:
DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.AiohttpDownloadHandler",
    "https": "scrapy_download_handlers_incubator.AiohttpDownloadHandler",
}
Features and limitations
| Feature | Support |
|---|---|
| Proxies | Yes (HTTPS proxies for HTTPS destinations are not supported on Python < 3.11) |
| HTTP/2 | No (not supported by the library) |
| TLS verbose logging | Partial (skipped for small responses) |
| response.ip_address | Partial (skipped for small responses) |
| response.certificate | Partial (DER bytes; skipped for small responses) |
| Per-request bindaddress | No (not supported by the library) |
| Proxy certificate verification | Follows DOWNLOAD_VERIFY_CERTIFICATES |
Notable features supported by the library but not implemented:
DNS resolving settings
Custom DNS resolvers
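Since `response.certificate` holds DER bytes with this handler and may be missing for small responses, spider code should treat it as optional. A minimal sketch, assuming the attribute is either `None` or a `bytes` object as the feature list for this handler indicates; the helper name `certificate_fingerprint` is hypothetical:

```python
from __future__ import annotations

import hashlib


def certificate_fingerprint(der: bytes | None) -> str | None:
    """Return the SHA-256 fingerprint of a DER-encoded certificate.

    Returns None when no certificate was captured, which this handler
    documents as possible for small responses.
    """
    if der is None:
        return None
    return hashlib.sha256(der).hexdigest()
```

In a spider callback this could be called as `certificate_fingerprint(response.certificate)`.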
CurlCffiDownloadHandler
This handler supports HTTP/1.1 and HTTP/2 and uses the curl_cffi library.
Install it with:
pip install scrapy-download-handlers-incubator[curl-cffi]
Enable it with:
DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.CurlCffiDownloadHandler",
    "https": "scrapy_download_handlers_incubator.CurlCffiDownloadHandler",
}
Features and limitations
| Feature | Support |
|---|---|
| Proxies | Yes |
| HTTP/2 | Yes |
| HTTP/3 | Yes (but not tested) |
| TLS verbose logging | No (not supported by the library) |
| response.ip_address | Yes |
| response.certificate | No (not supported by the library) |
| Per-request bindaddress | No (not supported by the library) |
| Proxy certificate verification | Follows DOWNLOAD_VERIFY_CERTIFICATES |
Notable features supported by the library but not implemented:
Impersonation
Advanced libcurl tunables
Settings
CURL_CFFI_HTTP_VERSION (str, default: "v1", corresponding to “Enforce HTTP/1.1”): The HTTP version to use. The value is passed directly to the library so the possible values are set by curl_cffi.requests.utils.normalize_http_version() and the meanings of the underlying constants can be seen in libcurl docs (CURLOPT_HTTP_VERSION). Set this to "v2tls" or "v2" to enable HTTP/2 for HTTPS requests or for all requests respectively. Set this to "v3" to enable HTTP/3.
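For example, to enable HTTP/2 for HTTPS requests only, a project's settings.py might contain the following. This is a sketch based solely on the setting values named above; `_HANDLER` is just a local shorthand, and Scrapy ignores it because only uppercase names in a settings module are loaded as settings.

```python
# settings.py (fragment)
_HANDLER = "scrapy_download_handlers_incubator.CurlCffiDownloadHandler"

DOWNLOAD_HANDLERS = {
    "http": _HANDLER,
    "https": _HANDLER,
}

# "v2tls" enables HTTP/2 for HTTPS requests only;
# "v2" would enable it for all requests, and "v3" would enable HTTP/3.
CURL_CFFI_HTTP_VERSION = "v2tls"
```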
HttpxDownloadHandler
This is an updated copy of the official scrapy.core.downloader.handlers._httpx.HttpxDownloadHandler handler. It supports HTTP/1.1 and HTTP/2 and uses the httpx library.
Install it with:
pip install scrapy-download-handlers-incubator[httpx]
Enable it with:
DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.HttpxDownloadHandler",
    "https": "scrapy_download_handlers_incubator.HttpxDownloadHandler",
}
Features and limitations
| Feature | Support |
|---|---|
| Proxies | Yes (separate connection pool per proxy) |
| HTTP/2 | Yes |
| HTTP/3 | No (not supported by the library) |
| TLS verbose logging | Yes |
| response.ip_address | Yes |
| response.certificate | Yes (DER bytes) |
| Per-request bindaddress | No (not supported by the library) |
| Proxy certificate verification | Follows DOWNLOAD_VERIFY_CERTIFICATES |
Notable features supported by the library but not implemented:
SOCKS5 proxies
Alternative transports
Limiting the number of per-proxy connection pools to save resources
Settings
HTTPX_HTTP2_ENABLED (bool, default: False): Whether to enable HTTP/2.
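HTTP/2 can also be enabled for a single spider rather than for the whole project. The sketch below only builds the dictionary; it assumes the dictionary is then assigned to the spider's `custom_settings` class attribute, which is general Scrapy behavior rather than something this package documents. The name `HTTPX_HTTP2_SETTINGS` is hypothetical.

```python
_HANDLER = "scrapy_download_handlers_incubator.HttpxDownloadHandler"

# Intended for use as ``custom_settings = HTTPX_HTTP2_SETTINGS``
# in the body of a spider class.
HTTPX_HTTP2_SETTINGS = {
    "DOWNLOAD_HANDLERS": {"http": _HANDLER, "https": _HANDLER},
    "HTTPX_HTTP2_ENABLED": True,
}
```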
NiquestsDownloadHandler
This handler supports HTTP/1.1 and HTTP/2 and uses the niquests library.
Install it with:
pip install scrapy-download-handlers-incubator[niquests]
Enable it with:
DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.NiquestsDownloadHandler",
    "https": "scrapy_download_handlers_incubator.NiquestsDownloadHandler",
}
Features and limitations
| Feature | Support |
|---|---|
| Proxies | Yes |
| HTTP/2 | Yes |
| HTTP/3 | No (not implemented) |
| TLS verbose logging | Yes |
| response.ip_address | Yes |
| response.certificate | Yes (DER bytes) |
| Per-request bindaddress | No (not supported by the library) |
| Proxy certificate verification | Follows DOWNLOAD_VERIFY_CERTIFICATES |
Notable features supported by the library but not implemented:
Custom DNS resolvers
SOCKS5 proxies
HTTP/2 tunables
Settings
NIQUESTS_HTTP2_ENABLED (bool, default: False): Whether to enable HTTP/2.
PyreqwestDownloadHandler
This handler supports HTTP/1.1 and HTTP/2 and uses the pyreqwest library.
Install it with:
pip install scrapy-download-handlers-incubator[pyreqwest]
Enable it with:
DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.PyreqwestDownloadHandler",
    "https": "scrapy_download_handlers_incubator.PyreqwestDownloadHandler",
}
Features and limitations
| Feature | Support |
|---|---|
| Proxies | No (not supported by the library) |
| HTTP/2 | Yes |
| HTTP/3 | No (not supported by the library) |
| TLS verbose logging | No (not supported by the library) |
| response.ip_address | No (not supported by the library) |
| response.certificate | No (not supported by the library) |
| Per-request bindaddress | No (not supported by the library) |
Notable features supported by the library but not implemented:
HTTP/2 tunables
Settings
PYREQWEST_HTTP2_ENABLED (bool, default: False): Whether to enable HTTP/2.