
scrapy-common-downloadhandler

A composite Scrapy download handler that integrates cloudscraper, curl_cffi (via scrapy-impersonate), and Twisted HTTP/1.1 into a single handler with per-request routing via request.meta.

Inheritance Chain

HTTP11DownloadHandler             <- Twisted HTTP/1.1 (fallback)
  └── ImpersonateDownloadHandler  <- curl_cffi (when meta["impersonate"] is set)
        └── CommonDownloadHandler <- cloudscraper (when meta["use_cloudscraper"] is True)
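The routing implied by this chain can be sketched as a simple precedence check over request.meta. This is a simplified illustration of the dispatch order, not the handler's actual code:

```python
def select_handler(meta):
    # Most-derived handler wins: cloudscraper first, then curl_cffi,
    # falling through to plain Twisted HTTP/1.1 when neither flag is set.
    if meta.get("use_cloudscraper"):
        return "cloudscraper"
    if meta.get("impersonate"):
        return "impersonate"
    return "twisted-http11"

select_handler({"use_cloudscraper": True})  # -> "cloudscraper"
select_handler({"impersonate": "chrome"})   # -> "impersonate"
select_handler({})                          # -> "twisted-http11"
```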

Installation

pip install scrapy-common-downloadhandler

Quick Start

1. Configure the download handler

In your project's settings.py or spider's custom_settings:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_common_downloadhandler.CommonDownloadHandler",
    "https": "scrapy_common_downloadhandler.CommonDownloadHandler",
}
USER_AGENT = ""

USER_AGENT must be set to an empty string. This prevents Scrapy's UserAgentMiddleware from injecting a default User-Agent header (e.g. Scrapy/x.x.x), which would conflict with the browser User-Agent that curl_cffi automatically provides during impersonation — resulting in a TLS fingerprint / User-Agent mismatch detectable by anti-bot systems.

No other settings or flags are needed. All three download modes are available once the handler is configured.

2. Use in your spider

import scrapy

class MySpider(scrapy.Spider):
    name = "example"

    def start_requests(self):
        # cloudscraper
        yield scrapy.Request(url, meta={"use_cloudscraper": True}, callback=self.parse)

        # curl_cffi impersonate
        yield scrapy.Request(url, meta={"impersonate": "chrome"}, callback=self.parse)

        # default Twisted HTTP/1.1
        yield scrapy.Request(url, callback=self.parse)

Usage

cloudscraper Requests

# Basic
yield scrapy.Request(url, meta={"use_cloudscraper": True}, callback=self.parse)

# With create_scraper() parameter passthrough
yield scrapy.Request(url, meta={
    "use_cloudscraper": True,
    "cloudscraper_args": {
        "browser": {"browser": "chrome", "mobile": False, "platform": "windows"},
        "delay": 10,
        "interpreter": "nodejs",
    },
}, callback=self.parse)

All keys in cloudscraper_args are passed directly to cloudscraper.create_scraper(**args).
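The passthrough is presumably a direct splat of the meta dict into create_scraper. A minimal sketch of that behavior, where extract_scraper_kwargs is a hypothetical helper and not part of the package API:

```python
def extract_scraper_kwargs(meta):
    # Hypothetical helper: pull cloudscraper_args out of request.meta
    # unchanged, ready to be passed as cloudscraper.create_scraper(**kwargs).
    return dict(meta.get("cloudscraper_args") or {})

meta = {
    "use_cloudscraper": True,
    "cloudscraper_args": {"delay": 10, "interpreter": "nodejs"},
}
kwargs = extract_scraper_kwargs(meta)
# create_scraper(**kwargs) would then receive delay=10, interpreter="nodejs"
```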

curl_cffi impersonate Requests

# Basic
yield scrapy.Request(url, meta={"impersonate": "chrome"}, callback=self.parse)

# With parameter passthrough
yield scrapy.Request(url, meta={
    "impersonate": "chrome",
    "impersonate_args": {"timeout": 30},
}, callback=self.parse)

See scrapy-impersonate for full details on impersonate_args.

Default Twisted HTTP/1.1 Requests

# No special meta needed
yield scrapy.Request(url, callback=self.parse)

Parameter Passthrough Reference

Mode          meta flag                passthrough key        passthrough target
cloudscraper  use_cloudscraper: True   cloudscraper_args: {}  cloudscraper.create_scraper(**args)
curl_cffi     impersonate: "chrome"    impersonate_args: {}   curl_cffi request method
Twisted       (none)                   (none)                 Scrapy default settings

Proxy Support

Proxy middlewares that set request.meta["proxy"] work seamlessly:

  • cloudscraper: converts to proxies={"http": proxy, "https": proxy}
  • curl_cffi: read by ImpersonateDownloadHandler's RequestParser
  • Twisted: handled by Scrapy's built-in HttpProxyMiddleware
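The cloudscraper conversion described above can be sketched as follows. to_cloudscraper_proxies is a hypothetical helper shown for illustration, not the handler's actual internals:

```python
def to_cloudscraper_proxies(meta):
    # Hypothetical helper mirroring the conversion described above:
    # request.meta["proxy"] -> requests-style proxies dict for cloudscraper.
    proxy = meta.get("proxy")
    if not proxy:
        return {}
    return {"http": proxy, "https": proxy}

to_cloudscraper_proxies({"proxy": "http://127.0.0.1:8888"})
# -> {"http": "http://127.0.0.1:8888", "https": "http://127.0.0.1:8888"}
```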

scrapy-redis Compatibility

Fully compatible. scrapy-redis only handles scheduling and deduplication, which is independent of the download handler layer.

Response Flags

Responses carry a flag indicating which download mode was used:

  • "cloudscraper" in response.flags — downloaded via cloudscraper
  • "impersonate" in response.flags — downloaded via curl_cffi
  • Neither — downloaded via Twisted HTTP/1.1
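In a callback, the flag check above could look like this (download_mode is a hypothetical helper operating on response.flags, which Scrapy exposes as a plain list):

```python
def download_mode(flags):
    # Hypothetical helper: classify a response by the flags listed above.
    if "cloudscraper" in flags:
        return "cloudscraper"
    if "impersonate" in flags:
        return "impersonate"
    return "twisted"

download_mode(["cloudscraper"])  # -> "cloudscraper"
download_mode([])                # -> "twisted"
```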

Notes

  • USER_AGENT = "" is required. Without it, Scrapy's UserAgentMiddleware will set the User-Agent header before the request reaches the download handler, overriding the browser-matched User-Agent that curl_cffi provides during impersonation.
  • cloudscraper is a synchronous library (based on requests). The handler uses deferToThread to run it in a thread pool, avoiding reactor blocking.
  • Internal redirects are disabled (allow_redirects=False) in cloudscraper mode. Redirects are handled by Scrapy's RedirectMiddleware.
  • The Content-Encoding header is stripped from cloudscraper responses. Decompression is handled by Scrapy's HttpCompressionMiddleware.
  • Recent Scrapy versions default to AsyncioSelectorReactor, so no additional TWISTED_REACTOR configuration is needed.
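The deferToThread pattern mentioned in the notes (offloading a blocking, requests-based call so the reactor thread stays free) can be illustrated with stdlib tools. This is the same idea expressed with concurrent.futures, not the handler's actual code; blocking_download stands in for a synchronous scraper.get(url) call:

```python
from concurrent.futures import ThreadPoolExecutor

def blocking_download(url):
    # Stands in for a synchronous cloudscraper scraper.get(url) call;
    # run inline, blocking I/O like this would stall the Twisted reactor.
    return f"<html>body of {url}</html>"

# deferToThread does the equivalent of this inside Twisted's thread pool:
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(blocking_download, "https://example.com")
    body = future.result()  # the event loop stays free while the worker blocks
```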

License

MIT
