Scrapy download handler that can impersonate browser fingerprints

scrapy-impersonate

scrapy-impersonate is a Scrapy download handler. This project integrates curl_cffi to perform HTTP requests, so it can impersonate browsers' TLS signatures or JA3 fingerprints.

Installation

pip install scrapy-impersonate

Activation

To use this package, replace the default http and https Download Handlers by updating the DOWNLOAD_HANDLERS setting:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}

Set USER_AGENT to an empty string so that curl_cffi automatically chooses the appropriate User-Agent based on the impersonated browser:

USER_AGENT = ""

Also, be sure to install the asyncio-based Twisted reactor for proper asynchronous execution:

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

Usage

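Set the impersonate Request.meta key to download a request using curl_cffi. The example spider does not set the key on each request; it enables RandomBrowserMiddleware instead, and the differing hashes noted in parse correspond to different browsers: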

import scrapy


class ImpersonateSpider(scrapy.Spider):
    name = "impersonate_spider"
    custom_settings = {
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "USER_AGENT": "",
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_impersonate.ImpersonateDownloadHandler",
            "https": "scrapy_impersonate.ImpersonateDownloadHandler",
        },
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_impersonate.RandomBrowserMiddleware": 1000,
        },
    }

    def start_requests(self):
        for _ in range(5):
            yield scrapy.Request(
                "https://tls.browserleaks.com/json",
                dont_filter=True,
            )

    def parse(self, response):
        # ja3_hash: 98cc085d47985d3cca9ec1415bbbf0d1 (chrome133a)
        # ja3_hash: 2d692a4485ca2f5f2b10ecb2d2909ad3 (firefox133)
        # ja3_hash: c11ab92a9db8107e2a0b0486f35b80b9 (chrome124)
        # ja3_hash: 773906b0efdefa24a7f2b8eb6985bf37 (safari15_5)
        # ja3_hash: cd08e31494f9531f560d64c695473da9 (chrome99_android)

        yield {"ja3_hash": response.json()["ja3_hash"]}
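To pin one browser for a specific request, the impersonate key can be set directly in Request.meta. The following is a minimal sketch, assuming a hypothetical helper named impersonate_meta (it is not part of scrapy-impersonate) and a hand-copied subset of the names listed under Supported browsers:

```python
# Hypothetical helper, not part of scrapy-impersonate: builds the
# Request.meta dict that selects a browser for a single request.

# Hand-copied subset of names from the "Supported browsers" table.
KNOWN_BROWSERS = {"chrome124", "chrome133a", "firefox133", "safari15_5", "edge101"}


def impersonate_meta(browser: str) -> dict:
    """Return a meta dict with the "impersonate" key set to `browser`."""
    if browser not in KNOWN_BROWSERS:
        # Fail early on typos instead of sending a request with a bad name.
        raise ValueError(f"unknown browser name: {browser!r}")
    return {"impersonate": browser}


if __name__ == "__main__":
    print(impersonate_meta("chrome124"))
```

Inside a spider, the result could be passed as scrapy.Request(url, meta=impersonate_meta("chrome124")).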

impersonate_args

You can pass any necessary arguments to curl_cffi through the impersonate_args Request.meta key. For example:

yield scrapy.Request(
    "https://tls.browserleaks.com/json",
    dont_filter=True,
    meta={
        # Any name from the "Supported browsers" table.
        "impersonate": "chrome124",
        "impersonate_args": {
            "verify": False,  # skip TLS certificate verification
            "timeout": 10,  # seconds
        },
    },
)
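When many requests share the same curl_cffi options, it may help to keep defaults in one place and override them per request. This is a minimal sketch, assuming a hypothetical helper with_impersonate_args and hypothetical default values; only the impersonate and impersonate_args keys come from the example above:

```python
# Hypothetical defaults; the keys mirror the impersonate_args example.
DEFAULT_IMPERSONATE_ARGS = {"verify": True, "timeout": 30}


def with_impersonate_args(browser: str, **overrides) -> dict:
    """Build Request.meta, letting per-request overrides win over defaults."""
    args = {**DEFAULT_IMPERSONATE_ARGS, **overrides}
    return {"impersonate": browser, "impersonate_args": args}


if __name__ == "__main__":
    meta = with_impersonate_args("chrome124", timeout=10)
    print(meta["impersonate_args"])
```

Building a fresh dict on every call keeps the shared defaults from being mutated by one request's overrides.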

Supported browsers

The following browsers can be impersonated:

| Browser | Version | Build | OS | Name |
| --- | --- | --- | --- | --- |
| Chrome | 99 | 99.0.4844.51 | Windows 10 | chrome99 |
| Chrome | 99 | 99.0.4844.73 | Android 12 | chrome99_android |
| Chrome | 100 | 100.0.4896.75 | Windows 10 | chrome100 |
| Chrome | 101 | 101.0.4951.67 | Windows 10 | chrome101 |
| Chrome | 104 | 104.0.5112.81 | Windows 10 | chrome104 |
| Chrome | 107 | 107.0.5304.107 | Windows 10 | chrome107 |
| Chrome | 110 | 110.0.5481.177 | Windows 10 | chrome110 |
| Chrome | 116 | 116.0.5845.180 | Windows 10 | chrome116 |
| Chrome | 119 | 119.0.6045.199 | macOS Sonoma | chrome119 |
| Chrome | 120 | 120.0.6099.109 | macOS Sonoma | chrome120 |
| Chrome | 123 | 123.0.6312.124 | macOS Sonoma | chrome123 |
| Chrome | 124 | 124.0.6367.60 | macOS Sonoma | chrome124 |
| Chrome | 131 | 131.0.6778.86 | macOS Sonoma | chrome131 |
| Chrome | 131 | 131.0.6778.81 | Android 14 | chrome131_android |
| Chrome | 133 | 133.0.6943.55 | macOS Sequoia | chrome133a |
| Edge | 99 | 99.0.1150.30 | Windows 10 | edge99 |
| Edge | 101 | 101.0.1210.47 | Windows 10 | edge101 |
| Safari | 15.3 | 16612.4.9.1.8 | macOS Big Sur | safari15_3 |
| Safari | 15.5 | 17613.2.7.1.8 | macOS Monterey | safari15_5 |
| Safari | 17.0 | unclear | macOS Sonoma | safari17_0 |
| Safari | 17.2 | unclear | iOS 17.2 | safari17_2_ios |
| Safari | 18.0 | unclear | macOS Sequoia | safari18_0 |
| Safari | 18.0 | unclear | iOS 18.0 | safari18_0_ios |
| Firefox | 133.0 | 133.0.3 | macOS Sonoma | firefox133 |
| Firefox | 135.0 | 135.0.1 | macOS Sonoma | firefox135 |
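The values in the Name column are the strings to use for the impersonate key. As an illustration of rotating between them, here is a minimal sketch that picks a name at random, optionally restricted to one browser family. The function pick_browser and the grouping below are hypothetical and are not claimed to match how RandomBrowserMiddleware works internally:

```python
import random

# Hand-copied subset of the Name column, grouped by browser family.
BROWSERS_BY_FAMILY = {
    "chrome": ["chrome120", "chrome124", "chrome131", "chrome133a"],
    "safari": ["safari17_0", "safari18_0"],
    "firefox": ["firefox133", "firefox135"],
}


def pick_browser(family=None, rng=random):
    """Pick a random browser name, optionally limited to one family.

    `rng` can be the random module or a random.Random instance,
    which makes the choice reproducible when seeded.
    """
    if family is None:
        candidates = [n for names in BROWSERS_BY_FAMILY.values() for n in names]
    else:
        candidates = BROWSERS_BY_FAMILY[family]
    return rng.choice(candidates)


if __name__ == "__main__":
    print(pick_browser("firefox"))
    print(pick_browser(rng=random.Random(0)))
```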

Thanks

This project is inspired by the following projects:

  • curl_cffi - Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
  • curl-impersonate - A special build of curl that can impersonate Chrome & Firefox
  • scrapy-playwright - Playwright integration for Scrapy

Download files

Source Distribution

scrapy_impersonate-1.6.3.tar.gz (6.7 kB)

Uploaded Source

Built Distribution

scrapy_impersonate-1.6.3-py3-none-any.whl (7.3 kB)

Uploaded Python 3

File details

Details for the file scrapy_impersonate-1.6.3.tar.gz.

File metadata

  • Download URL: scrapy_impersonate-1.6.3.tar.gz
  • Upload date:
  • Size: 6.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for scrapy_impersonate-1.6.3.tar.gz
Algorithm Hash digest
SHA256 bf20edcb88b73b759d78bf0047b8acc91513bd44a8490cd600c5fa934fd0aa3c
MD5 5ec4bef72ec895719ca967799784080e
BLAKE2b-256 043ffa97b40cec0c601255bcdffc73a1916a70dd01bfc03620444861d779e529
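A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; the helper names sha256_of and matches are hypothetical, and the file is assumed to have been downloaded already:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, with the source distribution saved in the current directory, matches("scrapy_impersonate-1.6.3.tar.gz", "bf20edcb88b73b759d78bf0047b8acc91513bd44a8490cd600c5fa934fd0aa3c") should return True for an intact download.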

File details

Details for the file scrapy_impersonate-1.6.3-py3-none-any.whl.

File hashes

Hashes for scrapy_impersonate-1.6.3-py3-none-any.whl
Algorithm Hash digest
SHA256 8f07cfa3d54332225c02f4c17e4af271d04ad3b1d547ca11db39bebf109239d1
MD5 1b2f0335950ccc7cef6d7630f6e4a126
BLAKE2b-256 31e20f8f02654dca836aa0901c634196e9d1df8978a0261448c34f7fd28f2383
