Scrapy download handler that can impersonate browser fingerprints

scrapy-impersonate

scrapy-impersonate is a Scrapy download handler. This project integrates curl_cffi to perform HTTP requests, so it can impersonate browsers' TLS signatures or JA3 fingerprints.

Installation

pip install scrapy-impersonate

Activation

To use this package, replace the default http and https Download Handlers by updating the DOWNLOAD_HANDLERS setting:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}

Setting USER_AGENT to an empty value (the examples here use an empty string) lets curl_cffi choose the User-Agent that matches the impersonated browser:

USER_AGENT = ""

Also, be sure to install the asyncio-based Twisted reactor for proper asynchronous execution:

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
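
As a quick sanity check, the three activation settings above can be verified against a plain settings dictionary before a crawl starts. The sketch below is illustrative only: missing_impersonate_settings is a hypothetical helper, not part of scrapy-impersonate, and it assumes the settings are available as an ordinary dict.

```python
HANDLER = "scrapy_impersonate.ImpersonateDownloadHandler"
ASYNCIO_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def missing_impersonate_settings(settings: dict) -> list[str]:
    """Return one message per activation step that `settings` lacks.

    Hypothetical helper for illustration; not part of scrapy-impersonate.
    """
    problems = []
    handlers = settings.get("DOWNLOAD_HANDLERS") or {}
    for scheme in ("http", "https"):
        if handlers.get(scheme) != HANDLER:
            problems.append(f"DOWNLOAD_HANDLERS[{scheme!r}] is not {HANDLER}")
    if settings.get("TWISTED_REACTOR") != ASYNCIO_REACTOR:
        problems.append("TWISTED_REACTOR is not the asyncio reactor")
    # Scrapy ships a non-empty default USER_AGENT, so a missing key
    # counts as "not cleared" here.
    if "USER_AGENT" not in settings or settings["USER_AGENT"]:
        problems.append("USER_AGENT is not cleared")
    return problems
```

An empty list means all three steps are in place; the same dict can then be used as custom_settings on a spider.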

Usage

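Set the impersonate key in Request.meta to download a request using curl_cffi. The spider below enables RandomBrowserMiddleware instead of setting the key by hand, so each request is sent with one of the browsers the middleware rotates through: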

import scrapy


class ImpersonateSpider(scrapy.Spider):
    name = "impersonate_spider"
    custom_settings = {
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "USER_AGENT": "",
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_impersonate.ImpersonateDownloadHandler",
            "https": "scrapy_impersonate.ImpersonateDownloadHandler",
        },
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_impersonate.RandomBrowserMiddleware": 1000,
        },
    }

    def start_requests(self):
        for _ in range(5):
            yield scrapy.Request(
                "https://tls.browserleaks.com/json",
                dont_filter=True,
            )

    def parse(self, response):
        # ja3_hash: 98cc085d47985d3cca9ec1415bbbf0d1 (chrome133a)
        # ja3_hash: 2d692a4485ca2f5f2b10ecb2d2909ad3 (firefox133)
        # ja3_hash: c11ab92a9db8107e2a0b0486f35b80b9 (chrome124)
        # ja3_hash: 773906b0efdefa24a7f2b8eb6985bf37 (safari15_5)
        # ja3_hash: cd08e31494f9531f560d64c695473da9 (chrome99_android)

        yield {"ja3_hash": response.json()["ja3_hash"]}
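
To check which fingerprint a response was fetched with, the JA3 hashes listed in the parse() comments can be turned into a lookup table. This is a minimal sketch: label_ja3 is a hypothetical helper, and the hashes are copied from the comments above, so they may not hold for other curl_cffi versions.

```python
# JA3 hashes copied from the comments in parse(); treat them as
# illustrative, since fingerprints can change between curl_cffi releases.
KNOWN_JA3 = {
    "98cc085d47985d3cca9ec1415bbbf0d1": "chrome133a",
    "2d692a4485ca2f5f2b10ecb2d2909ad3": "firefox133",
    "c11ab92a9db8107e2a0b0486f35b80b9": "chrome124",
    "773906b0efdefa24a7f2b8eb6985bf37": "safari15_5",
    "cd08e31494f9531f560d64c695473da9": "chrome99_android",
}


def label_ja3(payload: dict) -> dict:
    """Attach the matching target name (or "unknown") to a JSON payload.

    `payload` is expected to be the decoded body returned by
    https://tls.browserleaks.com/json, i.e. response.json() in parse().
    """
    ja3 = payload.get("ja3_hash")
    return {"ja3_hash": ja3, "target": KNOWN_JA3.get(ja3, "unknown")}
```

Inside parse(), this could be used as yield label_ja3(response.json()).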

impersonate_args

You can pass any necessary arguments to curl_cffi through impersonate_args. For example:

yield scrapy.Request(
    "https://tls.browserleaks.com/json",
    dont_filter=True,
    meta={
        "impersonate": "chrome124",  # any target name from the table below
        "impersonate_args": {
            "verify": False,
            "timeout": 10,
        },
    },
)
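
When many requests share the same curl_cffi options, the meta dictionary can be built in one place. The sketch below assumes nothing beyond the two meta keys shown above; impersonate_meta is a hypothetical convenience function, not something the package provides.

```python
def impersonate_meta(browser: str, **curl_args) -> dict:
    """Build a Request.meta dict for scrapy-impersonate.

    Hypothetical helper for illustration. Keyword arguments are passed
    through unchanged under the "impersonate_args" key.
    """
    meta = {"impersonate": browser}
    if curl_args:
        meta["impersonate_args"] = dict(curl_args)
    return meta
```

It would be passed as scrapy.Request(url, meta=impersonate_meta("chrome124", verify=False, timeout=10)), which produces the same meta as the request above.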

Supported browsers

The following browsers can be impersonated (curl_cffi >= 0.15.0):

Browser  Version  OS              Name               HTTP/3
Chrome   99       Windows 10      chrome99
Chrome   99       Android 12      chrome99_android
Chrome   100      Windows 10      chrome100
Chrome   101      Windows 10      chrome101
Chrome   104      Windows 10      chrome104
Chrome   107      Windows 10      chrome107
Chrome   110      Windows 10      chrome110
Chrome   116      Windows 10      chrome116
Chrome   119      macOS Sonoma    chrome119
Chrome   120      macOS Sonoma    chrome120
Chrome   123      macOS Sonoma    chrome123
Chrome   124      macOS Sonoma    chrome124
Chrome   131      macOS Sonoma    chrome131
Chrome   131      Android 14      chrome131_android
Chrome   133      macOS Sequoia   chrome133a
Chrome   136      macOS Sequoia   chrome136
Chrome   142      macOS Tahoe     chrome142
Chrome   145      macOS Tahoe     chrome145
Chrome   146      macOS Tahoe     chrome146
Edge     99       Windows 10      edge99
Edge     101      Windows 10      edge101
Safari   15.3     macOS Big Sur   safari153
Safari   15.5     macOS Monterey  safari155
Safari   17.0     macOS Sonoma    safari170
Safari   17.2     iOS 17.2        safari172_ios
Safari   18.0     macOS Sequoia   safari180
Safari   18.0     iOS 18.0        safari180_ios
Safari   18.4     macOS Sequoia   safari184
Safari   18.4     iOS 18.4        safari184_ios
Safari   26.0     macOS Tahoe     safari260
Safari   26.0     iOS 26.0        safari260_ios
Safari   26.0.1   macOS Tahoe     safari2601
Firefox  133.0    macOS Sonoma    firefox133
Firefox  135.0    macOS Sonoma    firefox135
Firefox  144.0    macOS Tahoe     firefox144
Firefox  147.0    macOS Tahoe     firefox147
Tor      14.5     macOS Sonoma    tor145

Notes:

  1. The old Safari target names (safari15_3, safari15_5, safari17_0, safari17_2_ios, safari18_0, safari18_0_ios) are kept as deprecated aliases. Prefer the new names (safari153, safari155, …) for new code.
  2. You can also pass an unversioned ("floating") target such as impersonate="chrome", "safari", "safari_ios" or "firefox" to let curl_cffi pick its most recent fingerprint for that browser instead of pinning a version.
  3. RandomBrowserMiddleware rotates across chrome, firefox, safari, edge and tor by default. Override with IMPERSONATE_BROWSERS to narrow the set (e.g. IMPERSONATE_BROWSERS = ["chrome", "firefox"]).
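
Existing code that still uses the deprecated Safari names from note 1 can be migrated with a small lookup. This is a sketch under one assumption: each old name is paired with the table entry for the same Safari version and OS, which is inferred from the table rather than stated explicitly. canonical_target is a hypothetical helper.

```python
# Old name -> new name. The pairing is inferred from the table of
# supported browsers (same Safari version, same OS).
DEPRECATED_SAFARI_ALIASES = {
    "safari15_3": "safari153",
    "safari15_5": "safari155",
    "safari17_0": "safari170",
    "safari17_2_ios": "safari172_ios",
    "safari18_0": "safari180",
    "safari18_0_ios": "safari180_ios",
}


def canonical_target(name: str) -> str:
    """Return the preferred target name, leaving other names untouched."""
    return DEPRECATED_SAFARI_ALIASES.get(name, name)
```

Names that are not deprecated aliases, including the floating ones from note 2, pass through unchanged.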

Thanks

This project is inspired by the following projects:

  • curl_cffi - Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
  • curl-impersonate - A special build of curl that can impersonate Chrome & Firefox
  • scrapy-playwright - Playwright integration for Scrapy
