Incubator for Scrapy download handlers

Overview

This is a collection of semi-official download handlers for Scrapy. See the Scrapy download handler documentation for more information.

The handlers are expected to work, and some of them may later be promoted to official status, but here they are provided as-is, with no support or stability promises. The documentation, including the lists of limitations and unsupported features, is also provided as-is and may be incomplete.

As this code intentionally uses private Scrapy APIs, it specifies a tight dependency on Scrapy. This version of the package only supports Scrapy 2.15.x.
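Because the dependency is pinned to a single Scrapy minor series, it can help to fail early when a project runs a different one. The following is a minimal sketch, not part of this package: the helper name `is_supported_scrapy` is hypothetical, and it assumes that "2.15.x" means any release whose major and minor components are 2 and 15.

```python
def is_supported_scrapy(version: str, supported: tuple[int, int] = (2, 15)) -> bool:
    """Return True if ``version`` belongs to the supported major.minor series.

    Hypothetical helper for illustration only; it is not part of this package.
    """
    parts = version.split(".")
    try:
        major_minor = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        # Versions that cannot be parsed are treated as unsupported.
        return False
    return major_minor == supported
```

In a project, the installed version could be passed in, for example `is_supported_scrapy(scrapy.__version__)`.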

Features overview

The baseline for these handlers is the default Scrapy handler, HTTP11DownloadHandler, which uses Twisted and supports HTTP/1.1. Feature parity with it is an explicit goal, but parity is not always possible, and not every feature that could be implemented is implemented in every handler (this may change in the future). Popular features that HTTP11DownloadHandler does not support, such as HTTP/2, and features unique to some handlers may or may not be implemented. See the sections for individual handlers for details.

The following table summarizes the most important differences:

Handler                   HTTP/2        HTTP/3            Proxies       TLS logging   Impersonation  TLS version limits
(HTTP11DownloadHandler)   Not possible  Not possible      Yes           Yes           Not possible   No
AiohttpDownloadHandler    Not possible  Not possible      Yes           Partial       Not possible   No
CurlCffiDownloadHandler   Yes           Yes (not tested)  Yes           Not possible  No             Not possible
HttpxDownloadHandler      Yes           Not possible      Yes           Yes           Not possible   No
NiquestsDownloadHandler   Yes           No                Yes           Yes           Not possible   Not possible
PyreqwestDownloadHandler  Yes           Not possible      Not possible  Not possible  Not possible   No

The following basic features are supported by all handlers unless the section for a handler says otherwise:

  • Native asyncio integration without requiring a Twisted reactor

  • HTTP/1.1 for http and https schemes

  • Unified download handler exceptions

  • Proxies, including HTTP and HTTPS proxies for HTTP and HTTPS destinations

  • Proxy authentication via HttpProxyMiddleware

  • IPv6 destinations

  • DOWNLOAD_MAXSIZE, DOWNLOAD_WARNSIZE and the respective request meta keys

  • DOWNLOAD_TIMEOUT and the respective request meta key

  • DOWNLOAD_FAIL_ON_DATALOSS and the "dataloss" flag

  • Setting the download_latency request meta

  • DOWNLOAD_BIND_ADDRESS

  • DOWNLOAD_VERIFY_CERTIFICATES

  • headers_received and bytes_received signals

  • Not reading the proxy configuration from the environment variables

  • Not handling cookies, redirects, compression and other things handled by Scrapy itself
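The download-related settings and request meta keys listed above are the standard Scrapy ones, so a project configures them the same way regardless of the handler. A minimal sketch follows; all values and the proxy URL are illustrative placeholders, and `request_meta` is just a plain dict standing in for the `meta` argument of a request.

```python
# settings.py (sketch): project-wide defaults; the values are illustrative only.
DOWNLOAD_TIMEOUT = 30                 # seconds
DOWNLOAD_MAXSIZE = 10 * 1024 * 1024   # upper size limit for a response (10 MiB)
DOWNLOAD_WARNSIZE = 1 * 1024 * 1024   # log a warning above 1 MiB
DOWNLOAD_FAIL_ON_DATALOSS = True

# Per-request overrides go into the request meta; these are standard Scrapy keys.
request_meta = {
    "download_timeout": 60,
    "download_maxsize": 50 * 1024 * 1024,
    "download_warnsize": 5 * 1024 * 1024,
    "download_fail_on_dataloss": False,
    # The handlers take the proxy from the request meta; they do not read
    # environment variables themselves. Placeholder URL.
    "proxy": "http://proxy.example:8080",
}
```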

Handlers

AiohttpDownloadHandler

This handler supports HTTP/1.1 and uses the aiohttp library.

Install it with:

pip install scrapy-download-handlers-incubator[aiohttp]

Enable it with:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.AiohttpDownloadHandler",
    "https": "scrapy_download_handlers_incubator.AiohttpDownloadHandler",
}

Features and limitations

  • Proxies: Yes (HTTPS proxies for HTTPS destinations are not supported on Python < 3.11)

  • HTTP/2: No (not supported by the library)

  • TLS verbose logging: Partial (skipped for small responses)

  • response.ip_address: Partial (skipped for small responses)

  • response.certificate: Partial (DER bytes; skipped for small responses)

  • Per-request bindaddress: No (not supported by the library)

  • Proxy certificate verification: Follows DOWNLOAD_VERIFY_CERTIFICATES

Notable features supported by the library but not implemented:

  • DNS resolving settings

  • Custom DNS resolvers
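The Python version caveat for AiohttpDownloadHandler proxies can be checked up front in a project. The sketch below is an assumption-laden illustration: the helper name is hypothetical and not part of this package, and it only encodes the "Python < 3.11" limit stated above.

```python
import sys


def aiohttp_https_proxy_to_https_ok(version_info=sys.version_info) -> bool:
    """Hypothetical helper, not part of this package.

    Returns True when the interpreter is new enough for HTTPS proxies to be
    used with HTTPS destinations, according to the limit documented above.
    """
    return tuple(version_info[:2]) >= (3, 11)
```

A project that relies on HTTPS proxies could call this at startup and log a warning or pick another handler when it returns False.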

CurlCffiDownloadHandler

This handler supports HTTP/1.1 and HTTP/2 and uses the curl_cffi library.

Install it with:

pip install scrapy-download-handlers-incubator[curl-cffi]

Enable it with:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.CurlCffiDownloadHandler",
    "https": "scrapy_download_handlers_incubator.CurlCffiDownloadHandler",
}

Features and limitations

  • Proxies: Yes

  • HTTP/2: Yes

  • HTTP/3: Yes (but not tested)

  • TLS verbose logging: No (not supported by the library)

  • response.ip_address: Yes

  • response.certificate: No (not supported by the library)

  • Per-request bindaddress: No (not supported by the library)

  • Proxy certificate verification: Follows DOWNLOAD_VERIFY_CERTIFICATES

Notable features supported by the library but not implemented:

  • Impersonation

  • Advanced libcurl tunables

Settings

  • CURL_CFFI_HTTP_VERSION (str, default: "v1", which corresponds to “Enforce HTTP/1.1”): The HTTP version to use. The value is passed directly to the library, so the accepted values are those defined by curl_cffi.requests.utils.normalize_http_version(); the meanings of the underlying constants are described in the libcurl docs (CURLOPT_HTTP_VERSION). Set this to "v2tls" to enable HTTP/2 for HTTPS requests only, or to "v2" to enable it for all requests. Set this to "v3" to enable HTTP/3.
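Putting the handler and its setting together, a project settings module might look like the sketch below. It only uses the handler path and the setting values named in this section; choosing "v2tls" is an illustrative assumption, not a recommendation.

```python
# settings.py (sketch)
DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.CurlCffiDownloadHandler",
    "https": "scrapy_download_handlers_incubator.CurlCffiDownloadHandler",
}

# "v2tls" enables HTTP/2 for HTTPS requests only (see the setting description).
# The default "v1" enforces HTTP/1.1.
CURL_CFFI_HTTP_VERSION = "v2tls"
```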

HttpxDownloadHandler

This is an updated copy of the official scrapy.core.downloader.handlers._httpx.HttpxDownloadHandler handler. It supports HTTP/1.1 and HTTP/2 and uses the httpx library.

Install it with:

pip install scrapy-download-handlers-incubator[httpx]

Enable it with:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.HttpxDownloadHandler",
    "https": "scrapy_download_handlers_incubator.HttpxDownloadHandler",
}

Features and limitations

  • Proxies: Yes (separate connection pool per proxy)

  • HTTP/2: Yes

  • HTTP/3: No (not supported by the library)

  • TLS verbose logging: Yes

  • response.ip_address: Yes

  • response.certificate: Yes (DER bytes)

  • Per-request bindaddress: No (not supported by the library)

  • Proxy certificate verification: Follows DOWNLOAD_VERIFY_CERTIFICATES

Notable features supported by the library but not implemented:

  • SOCKS5 proxies

  • Alternative transports

  • Limiting the number of per-proxy connection pools to save resources

Settings

  • HTTPX_HTTP2_ENABLED (bool, default: False): Whether to enable HTTP/2.
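Since HTTP/2 is off by default, it has to be switched on explicitly. One option is to enable the handler and the setting for a single spider through Scrapy's `custom_settings` class attribute instead of the whole project. The sketch below shows only the dictionary that would be assigned to that attribute; the name `HTTPX_SPIDER_SETTINGS` is hypothetical.

```python
# Hypothetical name; in a real spider this dict would be assigned to the
# `custom_settings` class attribute.
HTTPX_SPIDER_SETTINGS = {
    "DOWNLOAD_HANDLERS": {
        "http": "scrapy_download_handlers_incubator.HttpxDownloadHandler",
        "https": "scrapy_download_handlers_incubator.HttpxDownloadHandler",
    },
    # Defaults to False, so HTTP/2 must be enabled explicitly.
    "HTTPX_HTTP2_ENABLED": True,
}
```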

NiquestsDownloadHandler

This handler supports HTTP/1.1 and HTTP/2 and uses the niquests library.

Install it with:

pip install scrapy-download-handlers-incubator[niquests]

Enable it with:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.NiquestsDownloadHandler",
    "https": "scrapy_download_handlers_incubator.NiquestsDownloadHandler",
}

Features and limitations

  • Proxies: Yes

  • HTTP/2: Yes

  • HTTP/3: No (not implemented)

  • TLS verbose logging: Yes

  • response.ip_address: Yes

  • response.certificate: Yes (DER bytes)

  • Per-request bindaddress: No (not supported by the library)

  • Proxy certificate verification: Follows DOWNLOAD_VERIFY_CERTIFICATES

Notable features supported by the library but not implemented:

  • Custom DNS resolvers

  • SOCKS5 proxies

  • HTTP/2 tunables

Settings

  • NIQUESTS_HTTP2_ENABLED (bool, default: False): Whether to enable HTTP/2.

PyreqwestDownloadHandler

This handler supports HTTP/1.1 and HTTP/2 and uses the pyreqwest library.

Install it with:

pip install scrapy-download-handlers-incubator[pyreqwest]

Enable it with:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_download_handlers_incubator.PyreqwestDownloadHandler",
    "https": "scrapy_download_handlers_incubator.PyreqwestDownloadHandler",
}

Features and limitations

  • Proxies: No (not supported by the library)

  • HTTP/2: Yes

  • HTTP/3: No (not supported by the library)

  • TLS verbose logging: No (not supported by the library)

  • response.ip_address: No (not supported by the library)

  • response.certificate: No (not supported by the library)

  • Per-request bindaddress: No (not supported by the library)

Notable features supported by the library but not implemented:

  • HTTP/2 tunables

Settings

  • PYREQWEST_HTTP2_ENABLED (bool, default: False): Whether to enable HTTP/2.
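All handlers in this document are enabled the same way: the same import path is registered for both the http and https schemes. That pattern can be captured in a small helper, sketched below. The helper and constant names are hypothetical and not part of this package; the handler names are the ones documented above.

```python
PACKAGE = "scrapy_download_handlers_incubator"

# Handler class names documented in this README.
KNOWN_HANDLERS = {
    "AiohttpDownloadHandler",
    "CurlCffiDownloadHandler",
    "HttpxDownloadHandler",
    "NiquestsDownloadHandler",
    "PyreqwestDownloadHandler",
}


def build_download_handlers(handler_name: str) -> dict[str, str]:
    """Return a DOWNLOAD_HANDLERS mapping that uses one handler for both schemes.

    Hypothetical helper for illustration only.
    """
    if handler_name not in KNOWN_HANDLERS:
        raise ValueError(f"Unknown handler: {handler_name!r}")
    path = f"{PACKAGE}.{handler_name}"
    return {"http": path, "https": path}
```

In a settings module this could be used as `DOWNLOAD_HANDLERS = build_download_handlers("NiquestsDownloadHandler")`, which makes switching handlers during evaluation a one-word change.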
