
scrapy-curl-cffi

Scrapy integration with curl_cffi (curl-impersonate).

Installation

pip install scrapy-curl-cffi

Optionally, install the compression extra to enable Scrapy's support for modern HTTP compression protocols:

pip install scrapy-curl-cffi[compression]

Configuration

Update your Scrapy project settings as follows:

"DOWNLOAD_HANDLERS": {
    "http": "scrapy_curl_cffi.handler.CurlCffiDownloadHandler",
    "https": "scrapy_curl_cffi.handler.CurlCffiDownloadHandler",
}

"DOWNLOADER_MIDDLEWARES": {
    "scrapy_curl_cffi.middlewares.CurlCffiMiddleware": 200,
    "scrapy_curl_cffi.middlewares.DefaultHeadersMiddleware": 400,
    "scrapy_curl_cffi.middlewares.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware": None,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}

"TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

Usage

To download a scrapy.Request with curl_cffi, add the curl_cffi_options special key to the Request.meta attribute. The value should be a dict with any of the following options:

  • impersonate (str) - which browser version to impersonate
  • default_headers (bool) - whether to set default browser headers when impersonating (default: True)
  • ja3 (str) - ja3 string to impersonate
  • akamai (str) - akamai string to impersonate
  • extra_fp (str) - extra fingerprints options, in complement to ja3 and akamai strings
  • verify (bool) - whether to verify HTTPS certificates (default: False)

See the curl_cffi documentation for more info on these options.

Alternatively, you can use the curl_cffi_options spider attribute or the CURL_CFFI_OPTIONS setting to automatically assign the curl_cffi_options meta key for all requests.

Example spider

class FingerprintsSpider(scrapy.Spider):
    name = "fingerprints"
    start_urls = ["https://tls.browserleaks.com/json"]
    curl_cffi_options = {"impersonate": "chrome"}

    def parse(self, response):
        yield response.json()
