Scrapy download handler that can impersonate browser fingerprints
scrapy-impersonate
scrapy-impersonate is a Scrapy download handler. This project integrates curl_cffi to perform HTTP requests, so it can impersonate browsers' TLS signatures or JA3 fingerprints.
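A JA3 fingerprint is the MD5 digest of five comma-separated fields taken from the TLS ClientHello: TLS version, cipher suites, extensions, elliptic curves (supported groups) and EC point formats, with the values inside each field joined by `-`. The sketch below uses hypothetical function names (they are not part of scrapy-impersonate) to show why different browsers yield different `ja3_hash` values: any change in the offered ciphers or extensions changes the digest. Following the JA3 reference implementation, it skips GREASE placeholder values.

```python
import hashlib
from typing import Iterable

# GREASE values (RFC 8701) are 0x0A0A, 0x1A1A, ..., 0xFAFA; JA3 ignores them.
GREASE = {0x0A0A + 0x1010 * n for n in range(16)}


def _field(values: Iterable[int]) -> str:
    """Join one ClientHello field's decimal values with '-', skipping GREASE."""
    return "-".join(str(v) for v in values if v not in GREASE)


def ja3_string(version: int, ciphers: Iterable[int], extensions: Iterable[int],
               curves: Iterable[int], point_formats: Iterable[int]) -> str:
    """Build the comma-separated JA3 input string from ClientHello values."""
    return ",".join([str(version), _field(ciphers), _field(extensions),
                     _field(curves), _field(point_formats)])


def ja3_hash(*args) -> str:
    """MD5 hex digest of the JA3 string (the kind of value reported as ja3_hash)."""
    return hashlib.md5(ja3_string(*args).encode("ascii")).hexdigest()


# 2570 == 0x0A0A is a GREASE value, so it is dropped from the cipher list.
print(ja3_string(771, [2570, 4865, 4866], [0, 23], [29, 23], [0]))
# prints "771,4865-4866,0-23,29-23,0"
```

The input values here are illustrative only; a real ClientHello carries many more ciphers and extensions.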
Installation
```shell
pip install scrapy-impersonate
```
Activation
Replace the default `http` and/or `https` download handlers through `DOWNLOAD_HANDLERS`:
```python
DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}
```
Also, be sure to install the asyncio-based Twisted reactor:
```python
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```
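Because each scheme can be replaced on its own, a project-wide `settings.py` that sends only HTTPS traffic through the impersonating handler might look like the fragment below. This is a sketch: it assumes plain `http` URLs should keep Scrapy's default handler, which may not suit every crawl.

```python
# settings.py (sketch): route only HTTPS through scrapy-impersonate.
# "http" is not overridden, so plain-http requests keep Scrapy's default handler.
DOWNLOAD_HANDLERS = {
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}

# The asyncio-based Twisted reactor required above.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```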
Basic usage
Set the `impersonate` key in `Request.meta` to download a request using `curl_cffi`:
```python
import scrapy


class ImpersonateSpider(scrapy.Spider):
    name = "impersonate_spider"
    # Same handler and reactor settings as in "Activation", scoped to this spider.
    custom_settings = {
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_impersonate.ImpersonateDownloadHandler",
            "https": "scrapy_impersonate.ImpersonateDownloadHandler",
        },
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    }

    def start_requests(self):
        for browser in ["chrome110", "edge99", "safari15_5"]:
            yield scrapy.Request(
                "https://tls.browserleaks.com/json",
                # Same URL for every browser, so bypass the duplicate filter.
                dont_filter=True,
                # The browser to impersonate for this request.
                meta={"impersonate": browser},
            )

    def parse(self, response):
        # One ja3_hash per impersonated browser, e.g.:
        # ja3_hash: 773906b0efdefa24a7f2b8eb6985bf37
        # ja3_hash: cd08e31494f9531f560d64c695473da9
        # ja3_hash: 2fe1311860bc318fc7f9196556a2a6b9
        yield {"ja3_hash": response.json()["ja3_hash"]}
```
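The spider above assigns one browser per request by hand. When crawling many URLs, a small helper can spread them across several targets. The sketch below is a hypothetical helper (not part of scrapy-impersonate) that pairs each URL with a `meta` dict, cycling through the target names in round-robin order.

```python
from itertools import cycle
from typing import Iterable, List, Tuple


def impersonation_plan(urls: Iterable[str],
                       browsers: Iterable[str]) -> List[Tuple[str, dict]]:
    """Pair each URL with a Request.meta dict, rotating browsers round-robin."""
    targets = tuple(browsers)
    if not targets:
        raise ValueError("at least one impersonation target is required")
    rotation = cycle(targets)
    return [(url, {"impersonate": next(rotation)}) for url in urls]


plan = impersonation_plan(
    ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
    ["chrome110", "edge99"],
)
for url, meta in plan:
    print(url, meta["impersonate"])
```

Inside `start_requests`, each pair can then be passed on as `scrapy.Request(url, meta=meta)`.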
Supported browsers
The following browsers can be impersonated:

| Browser | Version | Build | OS | Name |
|---|---|---|---|---|
| Chrome | 99 | 99.0.4844.51 | Windows 10 | `chrome99` |
| Chrome | 99 | 99.0.4844.73 | Android 12 | `chrome99_android` |
| Chrome | 100 | 100.0.4896.75 | Windows 10 | `chrome100` |
| Chrome | 101 | 101.0.4951.67 | Windows 10 | `chrome101` |
| Chrome | 104 | 104.0.5112.81 | Windows 10 | `chrome104` |
| Chrome | 107 | 107.0.5304.107 | Windows 10 | `chrome107` |
| Chrome | 110 | 110.0.5481.177 | Windows 10 | `chrome110` |
| Chrome | 116 | 116.0.5845.180 | Windows 10 | `chrome116` |
| Chrome | 119 | 119.0.6045.199 | macOS Sonoma | `chrome119` |
| Chrome | 120 | 120.0.6099.109 | macOS Sonoma | `chrome120` |
| Chrome | 123 | 123.0.6312.124 | macOS Sonoma | `chrome123` |
| Chrome | 124 | 124.0.6367.60 | macOS Sonoma | `chrome124` |
| Edge | 99 | 99.0.1150.30 | Windows 10 | `edge99` |
| Edge | 101 | 101.0.1210.47 | Windows 10 | `edge101` |
| Safari | 15.3 | 16612.4.9.1.8 | macOS Big Sur | `safari15_3` |
| Safari | 15.5 | 17613.2.7.1.8 | macOS Monterey | `safari15_5` |
| Safari | 17.0 | unclear | macOS Sonoma | `safari17_0` |
| Safari | 17.2 | unclear | iOS 17.2 | `safari17_2_ios` |
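Since the target is just a string in `meta`, a typo such as `chrome11` would not be caught until the request is handled. One way to fail earlier is to check names against the table above before building requests. The helper below is a hypothetical sketch: its target set is copied from this table and may lag behind (or differ from) the names accepted by your installed curl_cffi version.

```python
# Target names copied from the "Supported browsers" table (an assumption:
# other curl_cffi versions may accept more or different names).
SUPPORTED_TARGETS = frozenset({
    "chrome99", "chrome99_android", "chrome100", "chrome101", "chrome104",
    "chrome107", "chrome110", "chrome116", "chrome119", "chrome120",
    "chrome123", "chrome124", "edge99", "edge101",
    "safari15_3", "safari15_5", "safari17_0", "safari17_2_ios",
})


def impersonate_meta(target: str) -> dict:
    """Return a Request.meta dict for `target`, raising on unknown names."""
    if target not in SUPPORTED_TARGETS:
        raise ValueError(f"unsupported impersonation target: {target!r}")
    return {"impersonate": target}


print(impersonate_meta("safari15_5"))  # prints {'impersonate': 'safari15_5'}
```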
Thanks
This project is inspired by the following projects:
- curl_cffi - Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
- curl-impersonate - A special build of curl that can impersonate Chrome & Firefox
- scrapy-playwright - Playwright integration for Scrapy
Hashes for scrapy_impersonate-1.3.1-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | cd2d36ed7b3ac04730c3a895c54065b828a142020228ebeeb3b04496a5e4c3c2 |
| MD5 | 9c239d4da3c4964a061388960892e8be |
| BLAKE2b-256 | 2c3fa075bded270e4a7ea7dd8fca37c8b2de1ae4c463c238165494c35b12c427 |