Scrapy download handler that can impersonate browser fingerprints
scrapy-impersonate
scrapy-impersonate is a Scrapy download handler. This project integrates curl_cffi to perform HTTP requests, so it can impersonate browsers' TLS signatures or JA3 fingerprints.
Installation
pip install scrapy-impersonate
Activation
To use this package, replace the default http and https Download Handlers by updating the DOWNLOAD_HANDLERS setting:
DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}
With USER_AGENT set to an empty string, curl_cffi automatically chooses the appropriate User-Agent for the impersonated browser:
USER_AGENT = ""
Also, be sure to install the asyncio-based Twisted reactor for proper asynchronous execution:
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
Usage
Set the impersonate Request.meta key to download a request using curl_cffi:
import scrapy


class ImpersonateSpider(scrapy.Spider):
    name = "impersonate_spider"
    custom_settings = {
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "USER_AGENT": "",
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_impersonate.ImpersonateDownloadHandler",
            "https": "scrapy_impersonate.ImpersonateDownloadHandler",
        },
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_impersonate.RandomBrowserMiddleware": 1000,
        },
    }

    def start_requests(self):
        for _ in range(5):
            yield scrapy.Request(
                "https://tls.browserleaks.com/json",
                dont_filter=True,
            )

    def parse(self, response):
        # ja3_hash: 98cc085d47985d3cca9ec1415bbbf0d1 (chrome133a)
        # ja3_hash: 2d692a4485ca2f5f2b10ecb2d2909ad3 (firefox133)
        # ja3_hash: c11ab92a9db8107e2a0b0486f35b80b9 (chrome124)
        # ja3_hash: 773906b0efdefa24a7f2b8eb6985bf37 (safari15_5)
        # ja3_hash: cd08e31494f9531f560d64c695473da9 (chrome99_android)
        yield {"ja3_hash": response.json()["ja3_hash"]}
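The spider above enables RandomBrowserMiddleware rather than setting the impersonate key itself. To pin a specific browser per request, the key can be set explicitly in Request.meta. The following is a minimal sketch, not part of the package: impersonate_request_kwargs is a hypothetical helper that only builds the keyword arguments for scrapy.Request, so it runs without Scrapy installed. The browser names are taken from the supported browsers table.

```python
from typing import Any, Dict, Iterable, Iterator

URL = "https://tls.browserleaks.com/json"


def impersonate_request_kwargs(
    url: str, browsers: Iterable[str]
) -> Iterator[Dict[str, Any]]:
    """Yield keyword arguments for scrapy.Request, one dict per browser name.

    Hypothetical helper for illustration; it is not provided by scrapy-impersonate.
    """
    for browser in browsers:
        yield {
            "url": url,
            "dont_filter": True,  # the same URL is requested once per browser
            "meta": {"impersonate": browser},
        }


# Inside a spider, each dict would be unpacked into a request:
#     def start_requests(self):
#         for kwargs in impersonate_request_kwargs(URL, ["chrome124", "firefox133"]):
#             yield scrapy.Request(**kwargs)
requests_kwargs = list(impersonate_request_kwargs(URL, ["chrome124", "firefox133"]))
```

Keeping the browser choice in one place like this makes it easy to compare the resulting ja3_hash values per browser in parse.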
impersonate-args
You can pass any necessary arguments to curl_cffi through impersonate_args. For example:
yield scrapy.Request(
    "https://tls.browserleaks.com/json",
    dont_filter=True,
    meta={
        "impersonate": browser,
        "impersonate_args": {
            "verify": False,
            "timeout": 10,
        },
    },
)
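When many requests share the same curl_cffi arguments, it can help to build the meta dict in one place. The sketch below assumes a hypothetical helper, impersonate_meta, and a hypothetical module-level default, DEFAULT_IMPERSONATE_ARGS. Neither is part of scrapy-impersonate; only the impersonate and impersonate_args keys and the verify and timeout arguments come from the example above.

```python
from typing import Any, Dict, Optional

# Assumed project-wide defaults (hypothetical); "timeout" is one of the
# arguments shown in the example above.
DEFAULT_IMPERSONATE_ARGS: Dict[str, Any] = {"timeout": 10}


def impersonate_meta(
    browser: str, overrides: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Build a Request.meta dict, layering per-request overrides on the defaults."""
    # A new dict is created on every call, so the defaults are never mutated.
    args = {**DEFAULT_IMPERSONATE_ARGS, **(overrides or {})}
    return {"impersonate": browser, "impersonate_args": args}


# Example: disable certificate verification for a single request.
meta = impersonate_meta("chrome124", {"verify": False})
```

The returned dict would be passed as meta= to scrapy.Request. Overrides win over defaults because they are unpacked last.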
Supported browsers
The following browsers can be impersonated:
| Browser | Version | Build | OS | Name |
|---|---|---|---|---|
| Chrome | 99 | 99.0.4844.51 | Windows 10 | chrome99 |
| Chrome | 99 | 99.0.4844.73 | Android 12 | chrome99_android |
| Chrome | 100 | 100.0.4896.75 | Windows 10 | chrome100 |
| Chrome | 101 | 101.0.4951.67 | Windows 10 | chrome101 |
| Chrome | 104 | 104.0.5112.81 | Windows 10 | chrome104 |
| Chrome | 107 | 107.0.5304.107 | Windows 10 | chrome107 |
| Chrome | 110 | 110.0.5481.177 | Windows 10 | chrome110 |
| Chrome | 116 | 116.0.5845.180 | Windows 10 | chrome116 |
| Chrome | 119 | 119.0.6045.199 | macOS Sonoma | chrome119 |
| Chrome | 120 | 120.0.6099.109 | macOS Sonoma | chrome120 |
| Chrome | 123 | 123.0.6312.124 | macOS Sonoma | chrome123 |
| Chrome | 124 | 124.0.6367.60 | macOS Sonoma | chrome124 |
| Chrome | 131 | 131.0.6778.86 | macOS Sonoma | chrome131 |
| Chrome | 131 | 131.0.6778.81 | Android 14 | chrome131_android |
| Chrome | 133 | 133.0.6943.55 | macOS Sequoia | chrome133a |
| Edge | 99 | 99.0.1150.30 | Windows 10 | edge99 |
| Edge | 101 | 101.0.1210.47 | Windows 10 | edge101 |
| Safari | 15.3 | 16612.4.9.1.8 | macOS Big Sur | safari15_3 |
| Safari | 15.5 | 17613.2.7.1.8 | macOS Monterey | safari15_5 |
| Safari | 17.0 | unclear | macOS Sonoma | safari17_0 |
| Safari | 17.2 | unclear | iOS 17.2 | safari17_2_ios |
| Safari | 18.0 | unclear | macOS Sequoia | safari18_0 |
| Safari | 18.0 | unclear | iOS 18.0 | safari18_0_ios |
| Firefox | 133.0 | 133.0.3 | macOS Sonoma | firefox133 |
| Firefox | 135.0 | 135.0.1 | macOS Sonoma | firefox135 |
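A typo in a browser name is easier to catch before the request is scheduled. The sketch below copies the Name column of the table into a set and checks against it. SUPPORTED_BROWSERS and check_browser are hypothetical names for illustration, and the set reflects this table only: the installed curl_cffi version remains the actual authority on which names are accepted.

```python
# Names copied from the "Name" column of the table above (assumption: the
# installed curl_cffi release accepts the same list).
SUPPORTED_BROWSERS = frozenset({
    "chrome99", "chrome99_android", "chrome100", "chrome101", "chrome104",
    "chrome107", "chrome110", "chrome116", "chrome119", "chrome120",
    "chrome123", "chrome124", "chrome131", "chrome131_android", "chrome133a",
    "edge99", "edge101",
    "safari15_3", "safari15_5", "safari17_0", "safari17_2_ios",
    "safari18_0", "safari18_0_ios",
    "firefox133", "firefox135",
})


def check_browser(name: str) -> str:
    """Return name unchanged if it is listed in the table, else raise ValueError."""
    if name not in SUPPORTED_BROWSERS:
        raise ValueError(f"unsupported browser name: {name!r}")
    return name


# Example: validate the value before placing it in Request.meta.
meta = {"impersonate": check_browser("safari17_2_ios")}
```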
Thanks
This project is inspired by the following projects:
- curl_cffi - Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
- curl-impersonate - A special build of curl that can impersonate Chrome & Firefox
- scrapy-playwright - Playwright integration for Scrapy
File details
Details for the file scrapy_impersonate-1.6.3.tar.gz.
File metadata
- Download URL: scrapy_impersonate-1.6.3.tar.gz
- Upload date:
- Size: 6.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf20edcb88b73b759d78bf0047b8acc91513bd44a8490cd600c5fa934fd0aa3c |
| MD5 | 5ec4bef72ec895719ca967799784080e |
| BLAKE2b-256 | 043ffa97b40cec0c601255bcdffc73a1916a70dd01bfc03620444861d779e529 |
File details
Details for the file scrapy_impersonate-1.6.3-py3-none-any.whl.
File metadata
- Download URL: scrapy_impersonate-1.6.3-py3-none-any.whl
- Upload date:
- Size: 7.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8f07cfa3d54332225c02f4c17e4af271d04ad3b1d547ca11db39bebf109239d1 |
| MD5 | 1b2f0335950ccc7cef6d7630f6e4a126 |
| BLAKE2b-256 | 31e20f8f02654dca836aa0901c634196e9d1df8978a0261448c34f7fd28f2383 |