scrapy-curl-cffi
Scrapy integration with curl_cffi (curl-impersonate).
Installation
pip install scrapy-curl-cffi
Alternatively, install the compression extra to enable Scrapy's support for modern HTTP compression formats:
pip install scrapy-curl-cffi[compression]
Configuration
Update your Scrapy project settings as follows:
"DOWNLOAD_HANDLERS": {
    "http": "scrapy_curl_cffi.handler.CurlCffiDownloadHandler",
    "https": "scrapy_curl_cffi.handler.CurlCffiDownloadHandler",
},
"DOWNLOADER_MIDDLEWARES": {
    "scrapy_curl_cffi.middlewares.CurlCffiMiddleware": 200,
    "scrapy_curl_cffi.middlewares.DefaultHeadersMiddleware": 400,
    "scrapy_curl_cffi.middlewares.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware": None,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
},
"TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
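The entries above are written as dictionary items, which suits a spider's custom_settings dict. In a project-wide settings.py the same values would be module-level assignments. A minimal sketch, assuming a standard Scrapy project layout:

```python
# settings.py (sketch): route HTTP and HTTPS downloads through curl_cffi.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_curl_cffi.handler.CurlCffiDownloadHandler",
    "https": "scrapy_curl_cffi.handler.CurlCffiDownloadHandler",
}

# Swap Scrapy's built-in header and user-agent middlewares for the
# scrapy_curl_cffi replacements (None disables a built-in middleware).
DOWNLOADER_MIDDLEWARES = {
    "scrapy_curl_cffi.middlewares.CurlCffiMiddleware": 200,
    "scrapy_curl_cffi.middlewares.DefaultHeadersMiddleware": 400,
    "scrapy_curl_cffi.middlewares.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware": None,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}

# Use the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```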
Usage
To download a scrapy.Request with curl_cffi, add the
curl_cffi_options special key to the Request.meta attribute. The value
should be a dict with any of the following options:
- impersonate (str): which browser version to impersonate
- default_headers (bool): whether to set default browser headers when impersonating (default: True)
- ja3 (str): JA3 string to impersonate
- akamai (str): Akamai string to impersonate
- extra_fp (str): extra fingerprint options, complementing the ja3 and akamai strings
- verify (bool): whether to verify HTTPS certificates (default: False)
See the curl_cffi documentation for more info on these options.
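To make the per-request form concrete, here is a small sketch that builds the meta dict described above. The helper curl_cffi_meta and the KNOWN_OPTIONS set are hypothetical names introduced for illustration; they are not part of scrapy-curl-cffi, and rejecting unlisted keys is an assumption made here to catch typos, not documented behaviour of the package.

```python
from typing import Any

# Option names taken from the list above (illustrative, not exported by the package).
KNOWN_OPTIONS = {"impersonate", "default_headers", "ja3", "akamai", "extra_fp", "verify"}


def curl_cffi_meta(**options: Any) -> dict[str, dict[str, Any]]:
    """Build a Request.meta dict carrying the curl_cffi_options key."""
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown curl_cffi option(s): {sorted(unknown)}")
    return {"curl_cffi_options": dict(options)}


# Impersonate Chrome and opt in to certificate verification for one request.
meta = curl_cffi_meta(impersonate="chrome", verify=True)
print(meta)
```

Inside a spider, the result would be passed along as scrapy.Request(url, meta=meta).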
Alternatively, you can use the curl_cffi_options spider attribute or the
CURL_CFFI_OPTIONS setting to automatically assign the curl_cffi_options meta
for all requests.
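As a sketch of the project-wide form, the setting could be declared in settings.py as shown here. The option values are illustrative. How a per-request meta, a spider attribute, and this setting interact when combined is not stated here, so the sketch sets only one of them.

```python
# settings.py (sketch): apply the same curl_cffi options to every request.
CURL_CFFI_OPTIONS = {
    "impersonate": "chrome",  # browser to impersonate
    "verify": True,           # opt in to certificate checks (documented default is False)
}
```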
Example spider
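import scrapy


class FingerprintsSpider(scrapy.Spider):
    name = "fingerprints"
    start_urls = ["https://tls.browserleaks.com/json"]
    # Applied to every request issued by this spider.
    curl_cffi_options = {"impersonate": "chrome"}

    def parse(self, response):
        # Emit the decoded JSON body as a single item.
        yield response.json()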
Download files
File details
Details for the file scrapy_curl_cffi-0.1.0.tar.gz.
File metadata
- Download URL: scrapy_curl_cffi-0.1.0.tar.gz
- Upload date:
- Size: 5.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c3ea95d26119947b4783f7abffe7c163d6f867920420c602acc8a87493269728 |
| MD5 | 08eba693a0547fa2bdbc9630032a8566 |
| BLAKE2b-256 | 883f6351753814abc97f9b0761ea5607e73d05f57af9dbea620c555557ef6391 |
File details
Details for the file scrapy_curl_cffi-0.1.0-py3-none-any.whl.
File metadata
- Download URL: scrapy_curl_cffi-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5d0d54257bf86de0a7603095859bb405c7cc80c8be21dba25295ead8a02e312c |
| MD5 | 6c5126e77ab21064b487572c3bc6ea74 |
| BLAKE2b-256 | cd67b4dabbea45f0064881037293e401e5fa2926d5963a7040bc8c9807e990bb |
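The digests listed above can be checked against a downloaded file. A minimal sketch using only the Python standard library; the function name sha256_matches is illustrative, and the local file path is whatever you saved the download as.

```python
import hashlib
from pathlib import Path


def sha256_matches(path: Path, expected_hex: str, chunk_size: int = 65536) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex string."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For example, sha256_matches(Path("scrapy_curl_cffi-0.1.0.tar.gz"), "c3ea95d26119947b4783f7abffe7c163d6f867920420c602acc8a87493269728") should return True for an intact copy of the source distribution.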