Skip to main content

tls client downloader middleware for scrapy, send request by tls client.

Project description

Scrapy Tls Client Downloader Middleware

This package will make scrapy support tls_client. Everything is same with tls_client, but needed to specify in settings.py.

Installation

pip3 install scrapy-tls-client

you also need to enable TlsClientDownloaderMiddleware in DOWNLOADER_MIDDLEWARES:

DOWNLOADER_MIDDLEWARES = {
    'scrapy_tls_client.downloaderMiddleware.TlsClientDownloaderMiddleware': 543,
}

Be Attention, you must specify User-Agent, Otherwise all request gonna be blocked by Cloudflare if there is detection,

and compression error may occured. For request with headers, just specify headers is ok,

for the one don't need, close default User-Agent middleware.

DOWNLOADER_MIDDLEWARES = {
    'scrapy_tls_client.downloaderMiddleware.TlsClientDownloaderMiddleware': 543,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}

Also, if there is any compression error, you can choose to shut down the default HttpCompressionMiddleware.

DOWNLOADER_MIDDLEWARES = {
    'scrapy_tls_client.downloaderMiddleware.TlsClientDownloaderMiddleware': 543,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': None
}

Usage

After add this middleware, all requests will be sent by tls_client.

The usage is very simple, for tls client session, just add params in settings.py in scrapy project, for request, specify params in meta.

PLEASE NOTE YOU DO NOT NEED TO SPECIFY ALL PARAMS SHOWS BELOW, JUST SPECIFY REQUIRED.

Settings for Tls_Client Session

For the preset usage of tls_client:

CLIENT_IDENTIFIER = 'chrome_112'
RANDOM_TLS_EXTENSION_ORDER = True
FORCE_HTTP1 = False #default False
CATCH_PANICS = False #default False
RAW_RESPONSE_TYPE = 'HtmlResponse' #HtmlResponse or TextResponse, default HtmlResponse

or

RANDOM_CHROME_IDENTIFIER = True
RANDOM_TLS_EXTENSION_ORDER = True
FORCE_HTTP1 = False #default False
CATCH_PANICS = False #default False
RAW_RESPONSE_TYPE = 'HtmlResponse' #HtmlResponse or TextResponse, default HtmlResponse
RANDOM_APP_IDENTIFIER = True
RANDOM_TLS_EXTENSION_ORDER = True
FORCE_HTTP1 = False #default False
CATCH_PANICS = False #default False
RAW_RESPONSE_TYPE = 'HtmlResponse' #HtmlResponse or TextResponse, default HtmlResponse

For the custom usage:

JA3_STRING = '771,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513,29-23-24,0'
H2_SETTINGS = {
    "HEADER_TABLE_SIZE": 65536,
    "MAX_CONCURRENT_STREAMS": 1000,
    "INITIAL_WINDOW_SIZE": 6291456,
    "MAX_HEADER_LIST_SIZE": 262144
}
H2_SETTINGS_ORDER = [
    "HEADER_TABLE_SIZE",
    "MAX_CONCURRENT_STREAMS",
    "INITIAL_WINDOW_SIZE",
    "MAX_HEADER_LIST_SIZE"
]
SUPPORTED_SIGNATURE_ALGORITHMS = [
    "ECDSAWithP256AndSHA256",
    "PSSWithSHA256",
    "PKCS1WithSHA256",
    "ECDSAWithP384AndSHA384",
    "PSSWithSHA384",
    "PKCS1WithSHA384",
    "PSSWithSHA512",
    "PKCS1WithSHA512",
]
SUPPORTED_DELEGATED_CREDENTIALS_ALGORITHMS = [
    "ECDSAWithP256AndSHA256",
    "PSSWithSHA256",
    "PKCS1WithSHA256",
    "ECDSAWithP384AndSHA384",
    "PSSWithSHA384",
    "PKCS1WithSHA384",
    "PSSWithSHA512",
    "PKCS1WithSHA512",
]
SUPPORTED_VERSIONS = [
    "GREASE",
    "1.3",
    "1.2"
]
KEY_SHARE_CURVES = [
    "GREASE",
    "X25519"
]
CERT_COMPRESSION_ALGO = 'brotli'
ADDITIONAL_DECODE = 'gzip'
PSEUDO_HEADER_ORDER = [
    ":method",
    ":authority",
    ":scheme",
    ":path"
]
CONNECTION_FLOW = 15663105
PRIORITY_FRAMES = [
  {
    "streamID": 3,
    "priorityParam": {
      "weight": 201,
      "streamDep": 0,
      "exclusive": False
    }
  },
  {
    "streamID": 5,
    "priorityParam": {
      "weight": 101,
      "streamDep": False,
      "exclusive": 0
    }
  }
]
HEADER_ORDER = [
        "accept",
        "user-agent",
        "accept-encoding",
        "accept-language"
    ]
HEADER_PRIORITY = {
  "streamDep": 1,
  "exclusive": True,
  "weight": 1
}
FORCE_HTTP1 = False #default False
CATCH_PANICS = False #default False
RAW_RESPONSE_TYPE = 'HtmlResponse' #HtmlResponse or TextResponse, default HtmlResponse

Settings for Request

import json

params = {
    'key1': 'value1',
    'key2': 'value2',
}
data = {
    'key1': 'value1',
    'key2': 'value2',
}
# turn cookie jar into dict, and remove the " mark, use ' mark
cookies = {
    'key1': 'value1',
    'key2': 'value2',
}
payload = {
    'key1': 'value1',
    'key2': 'value2'
}
proxy_ = 'http://username:password@ip:port' # https also works
or 
proxy_ = [
    'http://username:password@ip:port',
    'http://username:password@ip:port',
] # if the type of proxy is list, every request will get a random proxy in the list
meta_data = {
    'params': params,
    'data': data,
    'cookies': cookies,
    'json': payload,
    'allow_redirects': False,
    'insecure_skip_verify': False,
    'timeout_seconds': 10,
    'proxy_': proxy_
}
or 
meta_data = {
    'params': json.dumps(params),
    'data': json.dumps(data),
    'cookies': json.dumps(cookies),
    'json': json.dumps(payload),
    'allow_redirects': False,
    'insecure_skip_verify': False,
    'timeout_seconds': 10,
    'proxy_': json.dumps(proxy_)
}
yield scrapy.Request(url=url, headers=headers, meta=meta_data)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrapy-tls-client-0.0.5.tar.gz (10.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

scrapy_tls_client-0.0.5-py3-none-any.whl (9.5 kB view details)

Uploaded Python 3

File details

Details for the file scrapy-tls-client-0.0.5.tar.gz.

File metadata

  • Download URL: scrapy-tls-client-0.0.5.tar.gz
  • Upload date:
  • Size: 10.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for scrapy-tls-client-0.0.5.tar.gz
Algorithm Hash digest
SHA256 4749233852959a6deef176b13d0c07f0f8adee80fee02002dcd5b9f497d39e2d
MD5 35d90333d1051782baa5765caf21a1f7
BLAKE2b-256 c427ee3598d151b7238b6f33fddaf421fb249d235cc68cc2c2a331c5f89b3a0c

See more details on using hashes here.

File details

Details for the file scrapy_tls_client-0.0.5-py3-none-any.whl.

File metadata

File hashes

Hashes for scrapy_tls_client-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 3f6d621adb7c8d068f7d8f2ccae16cd1e05dab49f363ea30cf78bc4cd0e6c0d7
MD5 f83621e2ed1c9b4c532ae4558b27187b
BLAKE2b-256 1f2d1c32f80fc9bc1779c0a9885ec9f3e755aa8402b5dae6af2da0cc4f57c7dd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page