
scrapy-curl-cffi

Scrapy integration with curl_cffi (curl-impersonate).

Installation

pip install scrapy-curl-cffi

Alternatively, install the compression extra to also enable Scrapy's support for modern HTTP compression protocols:

pip install scrapy-curl-cffi[compression]

Configuration

Update your Scrapy project settings as follows:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_curl_cffi.handler.CurlCffiDownloadHandler",
    "https": "scrapy_curl_cffi.handler.CurlCffiDownloadHandler",
}

DOWNLOADER_MIDDLEWARES = {
    "scrapy_curl_cffi.middlewares.CurlCffiMiddleware": 200,
    "scrapy_curl_cffi.middlewares.DefaultHeadersMiddleware": 400,
    "scrapy_curl_cffi.middlewares.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware": None,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

Usage

To download a scrapy.Request with curl_cffi, set the curl_cffi_options special key in the Request.meta attribute to a dict with any of the following options:

  • impersonate (str) - which browser version to impersonate
  • default_headers (bool) - whether to set default browser headers when impersonating (default: True)
  • ja3 (str) - JA3 string to impersonate
  • akamai (str) - Akamai string to impersonate
  • extra_fp (str) - extra fingerprinting options, complementing the ja3 and akamai strings
  • verify (bool) - whether to verify HTTPS certificates (default: False)

See the curl_cffi documentation for more info on these options.
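For example, a per-request configuration might look like the following sketch. The option values here are illustrative, not a recommendation; check the curl_cffi documentation for the impersonation targets your installed version supports.

```python
# Sketch of a per-request curl_cffi_options payload. The values below are
# illustrative examples, not defaults of the library.
curl_cffi_options = {
    "impersonate": "chrome",   # browser profile to impersonate
    "default_headers": True,   # send matching default browser headers
    "verify": True,            # verify HTTPS certificates
}

# Pass it through Request.meta, e.g. inside a spider callback:
#   yield scrapy.Request(url, meta={"curl_cffi_options": curl_cffi_options})
meta = {"curl_cffi_options": curl_cffi_options}
```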

Alternatively, you can use the curl_cffi_options spider attribute or the CURL_CFFI_OPTIONS setting to automatically assign the curl_cffi_options meta for all requests.
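A project-wide default could be set in settings.py, for instance. This is a sketch: per the description above, the setting fills the curl_cffi_options meta for all requests, and the dict takes the same options listed earlier.

```python
# settings.py (sketch): default curl_cffi options applied to every request
# via the curl_cffi_options meta key.
CURL_CFFI_OPTIONS = {
    "impersonate": "chrome",
}
```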

Example spider

import scrapy


class FingerprintsSpider(scrapy.Spider):
    name = "fingerprints"
    start_urls = ["https://tls.browserleaks.com/json"]
    curl_cffi_options = {"impersonate": "chrome"}

    def parse(self, response):
        yield response.json()

curl_cffi interop

scrapy-curl-cffi strives to adhere to established Scrapy conventions, so most Scrapy settings, spider attributes, request/response attributes, and meta keys configure the crawler's behavior as expected.

