A TLS client downloader middleware for Scrapy that sends requests through tls_client.
Scrapy Tls Client Downloader Middleware
This package adds tls_client support to Scrapy. Everything works the same as in tls_client, except that the options are specified in settings.py.
Installation
pip3 install scrapy-tls-client
You also need to enable TlsClientDownloaderMiddleware in DOWNLOADER_MIDDLEWARES:
DOWNLOADER_MIDDLEWARES = {
'scrapy_tls_client.downloaderMiddleware.TlsClientDownloaderMiddleware': 543,
}
Attention: you must specify a User-Agent. Otherwise, every request will be blocked by Cloudflare when detection is in place, and compression errors may occur.
For requests that carry headers, specifying the User-Agent in the headers is enough;
for requests that do not need headers, disable the default UserAgentMiddleware:
DOWNLOADER_MIDDLEWARES = {
'scrapy_tls_client.downloaderMiddleware.TlsClientDownloaderMiddleware': 543,
"scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}
Also, if you run into compression errors, you can disable the default HttpCompressionMiddleware:
DOWNLOADER_MIDDLEWARES = {
'scrapy_tls_client.downloaderMiddleware.TlsClientDownloaderMiddleware': 543,
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': None
}
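The three snippets above each assign DOWNLOADER_MIDDLEWARES, and in a single settings.py only the last assignment takes effect. A minimal sketch of a merged definition, assuming you want both built-in middlewares disabled at once (the priority 543 is taken from the snippets above):

```python
# settings.py (sketch): one merged DOWNLOADER_MIDDLEWARES definition.
# Assumption: both built-in middlewares should be disabled together;
# drop either `None` entry if you only need one of the workarounds.
DOWNLOADER_MIDDLEWARES = {
    # Route every request through tls_client.
    "scrapy_tls_client.downloaderMiddleware.TlsClientDownloaderMiddleware": 543,
    # Disable Scrapy's default User-Agent handling.
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    # Disable Scrapy's default response decompression.
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": None,
}
```

Setting a middleware's value to None is the standard Scrapy way to switch off a middleware that is enabled by default.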
Usage
After you add this middleware, all requests are sent by tls_client.
Usage is simple: for the tls_client session, add parameters to settings.py in the Scrapy project; for an individual request, specify parameters in meta.
Note that you do not need to specify all of the parameters shown below, only the ones you require.
Settings for Tls_Client Session
For the preset usage of tls_client:
CLIENT_IDENTIFIER = 'chrome_112'
RANDOM_TLS_EXTENSION_ORDER = True
FORCE_HTTP1 = False #default False
CATCH_PANICS = False #default False
RAW_RESPONSE_TYPE = 'HtmlResponse' #HtmlResponse or TextResponse, default HtmlResponse
or
RANDOM_CHROME_IDENTIFIER = True
RANDOM_TLS_EXTENSION_ORDER = True
FORCE_HTTP1 = False #default False
CATCH_PANICS = False #default False
RAW_RESPONSE_TYPE = 'HtmlResponse' #HtmlResponse or TextResponse, default HtmlResponse
or
RANDOM_APP_IDENTIFIER = True
RANDOM_TLS_EXTENSION_ORDER = True
FORCE_HTTP1 = False #default False
CATCH_PANICS = False #default False
RAW_RESPONSE_TYPE = 'HtmlResponse' #HtmlResponse or TextResponse, default HtmlResponse
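The preset blocks above are presented as alternatives. A minimal sketch of a sanity check you could run against your own settings, assuming (this is not confirmed by the package) that only one of the three identifier options should be active at a time. The helper name `chosen_preset` is hypothetical and not part of scrapy-tls-client:

```python
# Hypothetical helper, not part of scrapy-tls-client.
# Assumption: the three identifier options are mutually exclusive.
PRESET_KEYS = (
    "CLIENT_IDENTIFIER",
    "RANDOM_CHROME_IDENTIFIER",
    "RANDOM_APP_IDENTIFIER",
)


def chosen_preset(settings):
    """Return the name of the active identifier option, or None if unset."""
    active = [key for key in PRESET_KEYS if settings.get(key)]
    if len(active) > 1:
        raise ValueError("conflicting identifier options: " + ", ".join(active))
    return active[0] if active else None


print(chosen_preset({"CLIENT_IDENTIFIER": "chrome_112", "FORCE_HTTP1": False}))
```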
For the custom usage:
JA3_STRING = '771,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513,29-23-24,0'
H2_SETTINGS = {
"HEADER_TABLE_SIZE": 65536,
"MAX_CONCURRENT_STREAMS": 1000,
"INITIAL_WINDOW_SIZE": 6291456,
"MAX_HEADER_LIST_SIZE": 262144
}
H2_SETTINGS_ORDER = [
"HEADER_TABLE_SIZE",
"MAX_CONCURRENT_STREAMS",
"INITIAL_WINDOW_SIZE",
"MAX_HEADER_LIST_SIZE"
]
SUPPORTED_SIGNATURE_ALGORITHMS = [
"ECDSAWithP256AndSHA256",
"PSSWithSHA256",
"PKCS1WithSHA256",
"ECDSAWithP384AndSHA384",
"PSSWithSHA384",
"PKCS1WithSHA384",
"PSSWithSHA512",
"PKCS1WithSHA512",
]
SUPPORTED_DELEGATED_CREDENTIALS_ALGORITHMS = [
"ECDSAWithP256AndSHA256",
"PSSWithSHA256",
"PKCS1WithSHA256",
"ECDSAWithP384AndSHA384",
"PSSWithSHA384",
"PKCS1WithSHA384",
"PSSWithSHA512",
"PKCS1WithSHA512",
]
SUPPORTED_VERSIONS = [
"GREASE",
"1.3",
"1.2"
]
KEY_SHARE_CURVES = [
"GREASE",
"X25519"
]
CERT_COMPRESSION_ALGO = 'brotli'
ADDITIONAL_DECODE = 'gzip'
PSEUDO_HEADER_ORDER = [
":method",
":authority",
":scheme",
":path"
]
CONNECTION_FLOW = 15663105
PRIORITY_FRAMES = [
{
"streamID": 3,
"priorityParam": {
"weight": 201,
"streamDep": 0,
"exclusive": False
}
},
{
"streamID": 5,
"priorityParam": {
"weight": 101,
"streamDep": 0,
"exclusive": False
}
}
]
HEADER_ORDER = [
"accept",
"user-agent",
"accept-encoding",
"accept-language"
]
HEADER_PRIORITY = {
"streamDep": 1,
"exclusive": True,
"weight": 1
}
FORCE_HTTP1 = False #default False
CATCH_PANICS = False #default False
RAW_RESPONSE_TYPE = 'HtmlResponse' #HtmlResponse or TextResponse, default HtmlResponse
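A JA3 string such as the JA3_STRING value above has five comma-separated fields: TLS version, cipher suites, extensions, elliptic curves, and elliptic curve point formats, with the values inside each field joined by hyphens. A minimal sketch for checking a custom string before putting it in settings.py; the names `Ja3Fields` and `parse_ja3` are hypothetical and not part of scrapy-tls-client or tls_client:

```python
from typing import List, NamedTuple


class Ja3Fields(NamedTuple):
    tls_version: int
    ciphers: List[int]
    extensions: List[int]
    curves: List[int]
    point_formats: List[int]


def parse_ja3(ja3: str) -> Ja3Fields:
    """Split a JA3 string into its five numeric fields."""
    parts = ja3.split(",")
    if len(parts) != 5:
        raise ValueError("a JA3 string must have exactly 5 comma-separated fields")

    def ints(field: str) -> List[int]:
        # An empty field is allowed and means "no values".
        return [int(value) for value in field.split("-")] if field else []

    return Ja3Fields(int(parts[0]), *(ints(part) for part in parts[1:]))


fields = parse_ja3(
    "771,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,"
    "0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513,29-23-24,0"
)
print(fields.tls_version, len(fields.ciphers), len(fields.extensions))
```

A malformed string (wrong field count or a non-numeric value) raises ValueError here, which is easier to diagnose than a failed handshake.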
Settings for Request
params = {
'key1': 'value1',
'key2': 'value2',
}
data = {
'key1': 'value1',
'key2': 'value2',
}
# convert the cookie jar to a dict, using single quotes (') instead of double quotes (")
cookies = {
'key1': 'value1',
'key2': 'value2',
}
payload = {
'key1': 'value1',
'key2': 'value2'
}
proxy = 'http://username:password@ip:port' # https also works
or
proxy = [
'http://username:password@ip:port',
'http://username:password@ip:port',
] # if proxy is a list, each request gets a random proxy from the list
meta_data = {
'params': params,
'data': data,
'cookies': cookies,
'json': payload,
'allow_redirects': False,
'insecure_skip_verify': False,
'timeout_seconds': 10,
'proxy': proxy
}
yield scrapy.Request(url=url, headers=headers, meta=meta_data)
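Since only the required parameters need to be specified, the meta dict can be built from whichever options a request actually uses. A minimal sketch, assuming unset options are simply left out of meta; `build_meta` and `pick_proxy` are hypothetical helpers, not part of scrapy-tls-client. `pick_proxy` only illustrates the list behavior described above, which the middleware handles on its own:

```python
import random


def build_meta(**options):
    """Keep only the options that were actually provided (not None)."""
    return {key: value for key, value in options.items() if value is not None}


def pick_proxy(proxy):
    """Illustration only: a list yields a random entry, a string is used as-is."""
    if isinstance(proxy, (list, tuple)):
        return random.choice(proxy)
    return proxy


meta_data = build_meta(
    params={"key1": "value1"},
    json=None,            # not needed for this request, so it is dropped
    timeout_seconds=10,
    proxy=["http://username:password@ip:port"],
)
print(sorted(meta_data))
```

The resulting dict can be passed as `meta=meta_data` to `scrapy.Request`, as in the example above.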