Scrapy download handler that can impersonate browser fingerprints

scrapy-impersonate

scrapy-impersonate is a Scrapy download handler that performs HTTP requests through curl_cffi, so it can impersonate browsers' TLS signatures (JA3 fingerprints).
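For context, a JA3 fingerprint is the MD5 hash of a string built from fields of the TLS ClientHello: the TLS version, cipher suites, extensions, elliptic curves and point formats. Values are written in decimal, joined by "-" within a field and by "," between fields, and GREASE values are left out. The ja3_hash values reported in the Usage example are hashes of this kind. The sketch below shows the calculation in plain Python; ja3_string and ja3_hash are hypothetical helper names (not part of scrapy-impersonate), and the field values are made up and do not correspond to any real browser.

```python
import hashlib

# GREASE values (0x0A0A, 0x1A1A, ..., 0xFAFA) are random placeholders that
# JA3 ignores, so they are filtered out before hashing.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 text string from ClientHello fields (decimal values)."""

    def join(values):
        # Values inside one field are joined with "-", skipping GREASE.
        return "-".join(str(v) for v in values if v not in GREASE)

    # Fields are joined with ",", always in this fixed order.
    return ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Return the MD5 hex digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Made-up example values (771 is the TLS 1.2 version number, 0x0303).
fields = (771, [0x0A0A, 4865, 4866], [0, 23, 65281], [29, 23], [0])
print(ja3_string(*fields))  # 771,4865-4866,0-23-65281,29-23,0
print(ja3_hash(*fields))
```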

Installation

pip install scrapy-impersonate

Activation

To use this package, replace Scrapy's default http and https download handlers by updating the DOWNLOAD_HANDLERS setting:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}

Set USER_AGENT to an empty string so that curl_cffi automatically chooses the User-Agent matching the impersonated browser:

USER_AGENT = ""
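An empty string and None behave the same here, because Scrapy's built-in UserAgentMiddleware only adds a User-Agent header when the configured value is truthy. The function below is a simplified, standalone re-creation of that check for illustration; apply_user_agent is a hypothetical name, not a Scrapy API. It assumes no other component sets the header, so the request leaves Scrapy without one and curl_cffi can supply the impersonated browser's own.

```python
def apply_user_agent(headers, user_agent):
    """Simplified version of the check in Scrapy's UserAgentMiddleware.

    The header is only set when user_agent is truthy, so both "" and None
    leave it unset. setdefault keeps any User-Agent already on the request.
    """
    if user_agent:
        headers.setdefault("User-Agent", user_agent)
    return headers


print(apply_user_agent({}, ""))        # {}
print(apply_user_agent({}, None))      # {}
print(apply_user_agent({}, "Scrapy"))  # {'User-Agent': 'Scrapy'}
```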

Also, be sure to install the asyncio-based Twisted reactor for proper asynchronous execution:

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
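Put together, the activation settings could live in a project's settings.py as shown here. This only collects the values given in this section in one place; any other settings your project already defines stay as they are.

```python
# settings.py: activation settings for scrapy-impersonate, collected in one place

# Route both schemes through the curl_cffi-based download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}

# Let curl_cffi pick the User-Agent of the impersonated browser.
USER_AGENT = ""

# asyncio-based Twisted reactor, for proper asynchronous execution.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```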

Usage

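Set the impersonate Request.meta key to one of the names listed under Supported browsers to download a request with curl_cffi using that browser's fingerprint. The spider below does not set the key itself; it enables RandomBrowserMiddleware, which selects a browser for each request, so the five identical requests can report different JA3 hashes: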

import scrapy


class ImpersonateSpider(scrapy.Spider):
    name = "impersonate_spider"
    custom_settings = {
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "USER_AGENT": "",
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_impersonate.ImpersonateDownloadHandler",
            "https": "scrapy_impersonate.ImpersonateDownloadHandler",
        },
        "DOWNLOADER_MIDDLEWARES": {
            # Picks a browser to impersonate for each request
            "scrapy_impersonate.RandomBrowserMiddleware": 1000,
        },
    }

    def start_requests(self):
        for _ in range(5):
            yield scrapy.Request(
                "https://tls.browserleaks.com/json",
                dont_filter=True,
            )

    def parse(self, response):
        # Sample ja3_hash values, one per impersonated browser:
        # ja3_hash: 98cc085d47985d3cca9ec1415bbbf0d1 (chrome133a)
        # ja3_hash: 2d692a4485ca2f5f2b10ecb2d2909ad3 (firefox133)
        # ja3_hash: c11ab92a9db8107e2a0b0486f35b80b9 (chrome124)
        # ja3_hash: 773906b0efdefa24a7f2b8eb6985bf37 (safari15_5)
        # ja3_hash: cd08e31494f9531f560d64c695473da9 (chrome99_android)

        yield {"ja3_hash": response.json()["ja3_hash"]}
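If you would rather control which browsers are used, a small downloader middleware of your own can set the impersonate key instead. The class below is a hypothetical sketch, not part of scrapy-impersonate: the name BrowserPoolMiddleware and the browser pool are assumptions (any names from the Supported browsers table work). It uses setdefault, so a browser already set in Request.meta is left untouched. Downloader middlewares are plain classes, so it needs nothing beyond the random module.

```python
import random


class BrowserPoolMiddleware:
    """Hypothetical middleware: assign a browser from a fixed pool to each request."""

    # Names taken from the "Supported browsers" table.
    BROWSERS = ["chrome124", "chrome131", "firefox133", "safari15_5"]

    def __init__(self, seed=None):
        # A dedicated RNG; pass a seed for reproducible choices.
        self.rng = random.Random(seed)

    def process_request(self, request, spider=None):
        # Keep a browser that was already chosen for this request.
        request.meta.setdefault("impersonate", self.rng.choice(self.BROWSERS))
        return None  # continue normal downloading
```

To try it, register it in DOWNLOADER_MIDDLEWARES in place of RandomBrowserMiddleware, for example "myproject.middlewares.BrowserPoolMiddleware": 1000 (the module path is hypothetical).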

impersonate_args

Additional arguments can be passed to curl_cffi through the impersonate_args Request.meta key. For example, to disable TLS certificate verification and set a 10-second timeout:

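# Inside a scrapy.Spider subclass
def start_requests(self):
    for browser in ["chrome124", "firefox133", "safari15_5"]:
        yield scrapy.Request(
            "https://tls.browserleaks.com/json",
            dont_filter=True,
            meta={
                "impersonate": browser,
                "impersonate_args": {
                    "verify": False,  # skip TLS certificate verification
                    "timeout": 10,  # seconds
                },
            },
        )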
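When many requests share the same curl_cffi options, a small helper can build the meta dictionary in one place. impersonate_meta below is a hypothetical convenience function, not part of scrapy-impersonate; it only assembles the impersonate and impersonate_args keys used in this section, and omits impersonate_args when no extra options are given.

```python
def impersonate_meta(browser, **curl_args):
    """Hypothetical helper: build Request.meta for scrapy-impersonate.

    browser   -- a name from the "Supported browsers" table, e.g. "chrome124"
    curl_args -- extra curl_cffi arguments, forwarded via "impersonate_args"
    """
    meta = {"impersonate": browser}
    if curl_args:
        # **curl_args is a new dict on every call, so requests never share it.
        meta["impersonate_args"] = curl_args
    return meta


print(impersonate_meta("chrome124", verify=False, timeout=10))
# {'impersonate': 'chrome124', 'impersonate_args': {'verify': False, 'timeout': 10}}
```

Inside start_requests this becomes, for example, meta=impersonate_meta(browser, verify=False, timeout=10).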

Supported browsers

The following browsers can be impersonated. Use the value in the Name column as the impersonate meta key:

Browser  Version  Build           OS              Name
-------  -------  --------------  --------------  -----------------
Chrome   99       99.0.4844.51    Windows 10      chrome99
Chrome   99       99.0.4844.73    Android 12      chrome99_android
Chrome   100      100.0.4896.75   Windows 10      chrome100
Chrome   101      101.0.4951.67   Windows 10      chrome101
Chrome   104      104.0.5112.81   Windows 10      chrome104
Chrome   107      107.0.5304.107  Windows 10      chrome107
Chrome   110      110.0.5481.177  Windows 10      chrome110
Chrome   116      116.0.5845.180  Windows 10      chrome116
Chrome   119      119.0.6045.199  macOS Sonoma    chrome119
Chrome   120      120.0.6099.109  macOS Sonoma    chrome120
Chrome   123      123.0.6312.124  macOS Sonoma    chrome123
Chrome   124      124.0.6367.60   macOS Sonoma    chrome124
Chrome   131      131.0.6778.86   macOS Sonoma    chrome131
Chrome   131      131.0.6778.81   Android 14      chrome131_android
Chrome   133      133.0.6943.55   macOS Sequoia   chrome133a
Edge     99       99.0.1150.30    Windows 10      edge99
Edge     101      101.0.1210.47   Windows 10      edge101
Safari   15.3     16612.4.9.1.8   macOS Big Sur   safari15_3
Safari   15.5     17613.2.7.1.8   macOS Monterey  safari15_5
Safari   17.0     unknown         macOS Sonoma    safari17_0
Safari   17.2     unknown         iOS 17.2        safari17_2_ios
Safari   18.0     unknown         macOS Sequoia   safari18_0
Safari   18.0     unknown         iOS 18.0        safari18_0_ios
Firefox  133.0    133.0.3         macOS Sonoma    firefox133
Firefox  135.0    135.0.1         macOS Sonoma    firefox135
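Because the impersonate value has to match one of these names exactly, it can help to check names before any request is sent. The sketch below copies the Name column into a set and uses difflib to suggest close matches for typos. check_browser is a hypothetical helper, not part of scrapy-impersonate, and the set has to be kept in sync by hand if a later release changes the table.

```python
import difflib

# Copied from the Name column of the "Supported browsers" table.
SUPPORTED_BROWSERS = {
    "chrome99", "chrome99_android", "chrome100", "chrome101", "chrome104",
    "chrome107", "chrome110", "chrome116", "chrome119", "chrome120",
    "chrome123", "chrome124", "chrome131", "chrome131_android", "chrome133a",
    "edge99", "edge101",
    "safari15_3", "safari15_5", "safari17_0", "safari17_2_ios",
    "safari18_0", "safari18_0_ios",
    "firefox133", "firefox135",
}


def check_browser(name):
    """Return name if it is supported, else raise ValueError with suggestions."""
    if name in SUPPORTED_BROWSERS:
        return name
    suggestions = difflib.get_close_matches(name, sorted(SUPPORTED_BROWSERS), n=3)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unsupported browser {name!r}.{hint}")


print(check_browser("chrome124"))  # chrome124
```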

Thanks

This project is inspired by the following projects:

  • curl_cffi - Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
  • curl-impersonate - A special build of curl that can impersonate Chrome & Firefox
  • scrapy-playwright - Playwright integration for Scrapy

Download files

Source Distribution

scrapy_impersonate-1.6.4.tar.gz (6.7 kB)

Uploaded Source

Built Distribution

scrapy_impersonate-1.6.4-py3-none-any.whl (7.3 kB)

Uploaded Python 3

File details

Details for the file scrapy_impersonate-1.6.4.tar.gz.

File metadata

  • Download URL: scrapy_impersonate-1.6.4.tar.gz
  • Upload date:
  • Size: 6.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for scrapy_impersonate-1.6.4.tar.gz
Algorithm Hash digest
SHA256 b7ed9082fb4c51e083765bd48dc98ee93720f9e099c9782f10975f1d3ce78b5a
MD5 db066bb939e50b77a1408a2547845faa
BLAKE2b-256 e9250181cf949728ddff50c24aee45dde5a21a4ca1f059d2639ec92bfc03ed0c
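To check a downloaded file against the SHA256 digests listed here, hash it locally and compare. A minimal sketch with hashlib follows; sha256_matches is a hypothetical helper name, and the file is read in chunks so it never has to fit in memory at once. pip can also enforce hashes itself through its hash-checking mode (--hash options in a requirements file).

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=65536):
    """Return True if the SHA256 digest of the file at path equals expected_hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # Compare case-insensitively, since digests are sometimes shown in uppercase.
    return digest.hexdigest() == expected_hex.lower()
```

For example, assuming the source distribution was saved to the current directory, sha256_matches("scrapy_impersonate-1.6.4.tar.gz", "b7ed9082fb4c51e083765bd48dc98ee93720f9e099c9782f10975f1d3ce78b5a") should return True for an intact download.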

File details

Details for the file scrapy_impersonate-1.6.4-py3-none-any.whl.

File metadata

File hashes

Hashes for scrapy_impersonate-1.6.4-py3-none-any.whl
Algorithm Hash digest
SHA256 325181cf2ccc99e778d0f1f7301d0ad4c72f11d9612cb727c62aae2845b60156
MD5 9645a46eabd9827fe75a3a4a94f7e6d2
BLAKE2b-256 da27b41e1acbd10a162882e5dddff4363a3815714f5125150110fd23d1cf606a
