Scrapy download handler that can impersonate browser fingerprints
scrapy-impersonate
scrapy-impersonate is a Scrapy download handler. This project integrates curl_cffi to perform HTTP requests, so it can impersonate browsers' TLS signatures or JA3 fingerprints.
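For readers unfamiliar with the term, a JA3 fingerprint is commonly described as the MD5 digest of a comma-separated string built from five TLS ClientHello fields (version, cipher suites, extensions, elliptic curves, point formats), with the values inside each field joined by dashes. The sketch below illustrates that construction only. The function name `ja3_hash` and the input values are hypothetical, and this is not how scrapy-impersonate or curl_cffi work internally.

```python
import hashlib


def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    """Build a JA3-style string from ClientHello fields and hash it with MD5.

    Hypothetical helper for illustration; returns (ja3_string, md5_hex_digest).
    """
    fields = [
        str(tls_version),
        "-".join(str(v) for v in ciphers),
        "-".join(str(v) for v in extensions),
        "-".join(str(v) for v in curves),
        "-".join(str(v) for v in point_formats),
    ]
    ja3_string = ",".join(fields)
    digest = hashlib.md5(ja3_string.encode("ascii")).hexdigest()
    return ja3_string, digest


# Made-up values, not a real browser's ClientHello.
ja3_string, digest = ja3_hash(
    771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0]
)
print(ja3_string)
print(digest)
```

Because the digest depends on the order and content of these fields, a client whose TLS stack differs from a browser's produces a different hash, which is what impersonation aims to avoid.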
Installation
pip install scrapy-impersonate
Activation
Replace the default http and/or https Download Handlers through DOWNLOAD_HANDLERS:
DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}
Also, be sure to install the asyncio-based Twisted reactor:
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
Basic usage
Set the impersonate Request.meta key to download a request using curl_cffi:
import scrapy


class ImpersonateSpider(scrapy.Spider):
    name = "impersonate_spider"
    custom_settings = {
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_impersonate.ImpersonateDownloadHandler",
            "https": "scrapy_impersonate.ImpersonateDownloadHandler",
        },
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    }

    def start_requests(self):
        # Request the same URL once per impersonated browser.
        for browser in ["chrome110", "edge99", "safari15_5"]:
            yield scrapy.Request(
                "https://tls.browserleaks.com/json",
                dont_filter=True,
                meta={"impersonate": browser},
            )

    def parse(self, response):
        # ja3_hash: 773906b0efdefa24a7f2b8eb6985bf37
        # ja3_hash: cd08e31494f9531f560d64c695473da9
        # ja3_hash: 2fe1311860bc318fc7f9196556a2a6b9
        yield {"ja3_hash": response.json()["ja3_hash"]}
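When a spider crawls many URLs, one way to vary the fingerprint is to rotate through several browser names. The following is a minimal sketch of that idea in plain Python. The helper name `impersonate_request_kwargs` is hypothetical and not part of scrapy-impersonate; it only builds keyword arguments carrying the impersonate meta key shown above.

```python
from itertools import cycle


def impersonate_request_kwargs(urls, browsers):
    """Yield Request keyword arguments, cycling through `browsers`.

    Hypothetical helper: each dict carries the "impersonate" meta key
    that the download handler reads.
    """
    if not browsers:
        raise ValueError("at least one browser name is required")
    rotation = cycle(browsers)
    for url in urls:
        yield {
            "url": url,
            "dont_filter": True,
            "meta": {"impersonate": next(rotation)},
        }


for kwargs in impersonate_request_kwargs(
    ["https://tls.browserleaks.com/json"] * 3, ["chrome110", "edge99"]
):
    print(kwargs["meta"]["impersonate"])
```

Inside start_requests, each dict could be passed along as `yield scrapy.Request(**kwargs)`, assuming the handler is activated as described earlier.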
Supported browsers
The following browsers can be impersonated:
Browser | Version | Build | OS | Name |
---|---|---|---|---|
Chrome | 99 | 99.0.4844.51 | Windows 10 | chrome99 |
Chrome | 99 | 99.0.4844.73 | Android 12 | chrome99_android |
Chrome | 100 | 100.0.4896.75 | Windows 10 | chrome100 |
Chrome | 101 | 101.0.4951.67 | Windows 10 | chrome101 |
Chrome | 104 | 104.0.5112.81 | Windows 10 | chrome104 |
Chrome | 107 | 107.0.5304.107 | Windows 10 | chrome107 |
Chrome | 110 | 110.0.5481.177 | Windows 10 | chrome110 |
Chrome | 116 | 116.0.5845.180 | Windows 10 | chrome116 |
Chrome | 119 | 119.0.6045.199 | macOS Sonoma | chrome119 |
Chrome | 120 | 120.0.6099.109 | macOS Sonoma | chrome120 |
Chrome | 123 | 123.0.6312.124 | macOS Sonoma | chrome123 |
Chrome | 124 | 124.0.6367.60 | macOS Sonoma | chrome124 |
Edge | 99 | 99.0.1150.30 | Windows 10 | edge99 |
Edge | 101 | 101.0.1210.47 | Windows 10 | edge101 |
Safari | 15.3 | 16612.4.9.1.8 | macOS Big Sur | safari15_3 |
Safari | 15.5 | 17613.2.7.1.8 | macOS Monterey | safari15_5 |
Safari | 17.0 | unclear | macOS Sonoma | safari17_0 |
Safari | 17.2 | unclear | iOS 17.2 | safari17_2_ios |
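A typo in the impersonate value is easy to make, so it can help to check names before building requests. The sketch below copies the Name column of the table into a tuple and validates against it. `SUPPORTED_BROWSERS`, `validate_browser`, and `browsers_in_family` are hypothetical names, not part of scrapy-impersonate, and the names that actually work presumably depend on the installed curl_cffi version.

```python
# Names copied from the "Name" column of the table above (assumed current).
SUPPORTED_BROWSERS = (
    "chrome99", "chrome99_android", "chrome100", "chrome101", "chrome104",
    "chrome107", "chrome110", "chrome116", "chrome119", "chrome120",
    "chrome123", "chrome124",
    "edge99", "edge101",
    "safari15_3", "safari15_5", "safari17_0", "safari17_2_ios",
)


def validate_browser(name):
    """Return `name` unchanged if it is listed, otherwise raise ValueError."""
    if name not in SUPPORTED_BROWSERS:
        raise ValueError(f"unsupported impersonate target: {name!r}")
    return name


def browsers_in_family(family):
    """List the names that start with a family prefix such as "edge"."""
    return [name for name in SUPPORTED_BROWSERS if name.startswith(family)]


print(validate_browser("chrome110"))
print(browsers_in_family("edge"))
```

A validated name can then be used as the value of the impersonate meta key.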
Thanks
This project is inspired by the following projects:
- curl_cffi - Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
- curl-impersonate - A special build of curl that can impersonate Chrome & Firefox
- scrapy-playwright - Playwright integration for Scrapy