Scrapy download handler that can impersonate browser fingerprints

scrapy-impersonate

scrapy-impersonate is a Scrapy download handler. This project integrates curl_cffi to perform HTTP requests, so it can impersonate browsers' TLS signatures or JA3 fingerprints.

Installation

pip install scrapy-impersonate

Activation

Replace the default http and/or https download handlers through the DOWNLOAD_HANDLERS setting:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}

Also, be sure to install the asyncio-based Twisted reactor:

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
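
Since the text above says "http and/or https", the two schemes can be configured independently. The following is a minimal sketch of a settings.py that routes only HTTPS traffic through scrapy-impersonate, on the assumption that TLS fingerprinting matters only for HTTPS requests. Scrapy merges DOWNLOAD_HANDLERS with its built-in defaults, so the http scheme keeps the stock handler here.

```python
# settings.py (sketch): impersonate only over HTTPS.
# Scrapy merges this dict with its built-in defaults, so "http"
# continues to use the stock download handler.
DOWNLOAD_HANDLERS = {
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}

# The asyncio-based reactor is still required by the handler.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```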

Basic usage

Set the impersonate Request.meta key to download a request using curl_cffi:

import scrapy


class ImpersonateSpider(scrapy.Spider):
    name = "impersonate_spider"
    custom_settings = {
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_impersonate.ImpersonateDownloadHandler",
            "https": "scrapy_impersonate.ImpersonateDownloadHandler",
        },
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    }

    def start_requests(self):
        for browser in ["chrome110", "edge99", "safari15_5"]:
            yield scrapy.Request(
                "https://tls.browserleaks.com/json",
                dont_filter=True,
                meta={"impersonate": browser},
            )

    def parse(self, response):
        # Each response reports the JA3 hash the server observed, for example:
        # ja3_hash: 773906b0efdefa24a7f2b8eb6985bf37
        # ja3_hash: cd08e31494f9531f560d64c695473da9
        # ja3_hash: 2fe1311860bc318fc7f9196556a2a6b9
        yield {"ja3_hash": response.json()["ja3_hash"]}
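
Setting the impersonate key on every request by hand can get repetitive. One possible approach, sketched below, is a small downloader middleware that fills in the key when a request does not already carry it. The class name DefaultImpersonateMiddleware and the default target "chrome110" are assumptions made for illustration; they are not part of scrapy-impersonate.

```python
class DefaultImpersonateMiddleware:
    """Hypothetical downloader middleware that sets a default impersonate target."""

    def __init__(self, browser="chrome110"):
        # Any name accepted by the impersonate meta key can be used here.
        self.browser = browser

    def process_request(self, request, spider=None):
        # Keep a per-request choice if one exists; otherwise apply the default.
        request.meta.setdefault("impersonate", self.browser)
        # Returning None tells Scrapy to continue handling the request normally.
        return None
```

To try it, the class would be registered in DOWNLOADER_MIDDLEWARES under whatever module path it lives at in your project, for example "myproject.middlewares.DefaultImpersonateMiddleware" (a hypothetical path).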

Supported browsers

The following browsers can be impersonated. Pass the value from the Name column as the impersonate meta key:

Browser  Version  Build            OS              Name
Chrome   99       99.0.4844.51     Windows 10      chrome99
Chrome   99       99.0.4844.73     Android 12      chrome99_android
Chrome   100      100.0.4896.75    Windows 10      chrome100
Chrome   101      101.0.4951.67    Windows 10      chrome101
Chrome   104      104.0.5112.81    Windows 10      chrome104
Chrome   107      107.0.5304.107   Windows 10      chrome107
Chrome   110      110.0.5481.177   Windows 10      chrome110
Chrome   116      116.0.5845.180   Windows 10      chrome116
Chrome   119      119.0.6045.199   macOS Sonoma    chrome119
Chrome   120      120.0.6099.109   macOS Sonoma    chrome120
Chrome   123      123.0.6312.124   macOS Sonoma    chrome123
Chrome   124      124.0.6367.60    macOS Sonoma    chrome124
Edge     99       99.0.1150.30     Windows 10      edge99
Edge     101      101.0.1210.47    Windows 10      edge101
Safari   15.3     16612.4.9.1.8    macOS Big Sur   safari15_3
Safari   15.5     17613.2.7.1.8    macOS Monterey  safari15_5
Safari   17.0     unclear          macOS Sonoma    safari17_0
Safari   17.2     unclear          iOS 17.2        safari17_2_ios
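
A typo in a target name is easy to make, so it can help to validate names before building requests. The sketch below copies the Name column from the table above into a tuple and rotates through a few targets. The helper impersonate_meta and the constant SUPPORTED_BROWSERS are hypothetical names introduced for this example, and the list only reflects the table as shown here; the installed curl_cffi version may accept a different set.

```python
from itertools import cycle

# Names copied from the "Name" column of the table above.
SUPPORTED_BROWSERS = (
    "chrome99", "chrome99_android", "chrome100", "chrome101", "chrome104",
    "chrome107", "chrome110", "chrome116", "chrome119", "chrome120",
    "chrome123", "chrome124", "edge99", "edge101",
    "safari15_3", "safari15_5", "safari17_0", "safari17_2_ios",
)


def impersonate_meta(browser):
    """Hypothetical helper: build a Request.meta dict for a listed target."""
    if browser not in SUPPORTED_BROWSERS:
        raise ValueError(f"unsupported impersonate target: {browser!r}")
    return {"impersonate": browser}


# Rotate through a few targets, one per request.
rotation = cycle(["chrome124", "edge101", "safari17_0"])
metas = [impersonate_meta(next(rotation)) for _ in range(4)]
```

Each dict in metas can be passed as the meta argument of scrapy.Request, as in the Basic usage spider.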

Thanks

This project is inspired by the following projects:

  • curl_cffi - Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
  • curl-impersonate - A special build of curl that can impersonate Chrome & Firefox
  • scrapy-playwright - Playwright integration for Scrapy
