A tls_client downloader middleware for Scrapy: requests are sent through tls_client.

Scrapy Tls Client Downloader Middleware

This package adds tls_client support to Scrapy. The options are the same as in tls_client, but you specify them in settings.py.

Installation

pip3 install scrapy-tls-client

Usage

After you add this middleware, all requests are sent through tls_client.

Usage is simple: to configure the tls_client session, add parameters to settings.py in your Scrapy project; to configure an individual request, pass parameters in its meta.

PLEASE NOTE: YOU DO NOT NEED TO SPECIFY ALL OF THE PARAMETERS SHOWN BELOW. SPECIFY ONLY THE ONES YOU NEED.

Settings for Tls_Client Session

To use a tls_client preset:

CLIENT_IDENTIFIER = 'chrome_112'
RANDOM_TLS_EXTENSION_ORDER = True
FORCE_HTTP1 = False  # default False
CATCH_PANICS = False  # default False
RAW_RESPONSE_TYPE = 'HtmlResponse'  # HtmlResponse or TextResponse, default HtmlResponse

or

RANDOM_CHROME_IDENTIFIER = True
RANDOM_TLS_EXTENSION_ORDER = True
FORCE_HTTP1 = False  # default False
CATCH_PANICS = False  # default False
RAW_RESPONSE_TYPE = 'HtmlResponse'  # HtmlResponse or TextResponse, default HtmlResponse

or

RANDOM_APP_IDENTIFIER = True
RANDOM_TLS_EXTENSION_ORDER = True
FORCE_HTTP1 = False  # default False
CATCH_PANICS = False  # default False
RAW_RESPONSE_TYPE = 'HtmlResponse'  # HtmlResponse or TextResponse, default HtmlResponse
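
The upper-case setting names above appear to mirror the lower-case keyword arguments of tls_client's Session (client_identifier, random_tls_extension_order, force_http1, catch_panics). The sketch below illustrates that correspondence only. The helper session_kwargs_from_settings is hypothetical, it is not the middleware's actual code, and treating RAW_RESPONSE_TYPE and the RANDOM_*_IDENTIFIER flags as middleware-only settings is an assumption.

```python
# Hypothetical helper: shows how upper-case Scrapy settings could map onto
# lower-case tls_client Session keyword arguments. Not the package's own code.

# Assumption: these settings are consumed by the middleware itself and are
# not passed through to the tls_client session.
MIDDLEWARE_ONLY = {
    "RAW_RESPONSE_TYPE",
    "RANDOM_CHROME_IDENTIFIER",
    "RANDOM_APP_IDENTIFIER",
}


def session_kwargs_from_settings(settings: dict) -> dict:
    """Lower-case every upper-case setting name, skipping middleware-only ones."""
    return {
        name.lower(): value
        for name, value in settings.items()
        if name.isupper() and name not in MIDDLEWARE_ONLY
    }


preset = {
    "CLIENT_IDENTIFIER": "chrome_112",
    "RANDOM_TLS_EXTENSION_ORDER": True,
    "FORCE_HTTP1": False,
    "CATCH_PANICS": False,
    "RAW_RESPONSE_TYPE": "HtmlResponse",
}
kwargs = session_kwargs_from_settings(preset)
print(kwargs)
```

If the correspondence holds, the resulting dictionary is what one would pass to a tls_client session when using the library directly.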

For a custom configuration:

JA3_STRING = '771,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513,29-23-24,0'
H2_SETTINGS = {
    "HEADER_TABLE_SIZE": 65536,
    "MAX_CONCURRENT_STREAMS": 1000,
    "INITIAL_WINDOW_SIZE": 6291456,
    "MAX_HEADER_LIST_SIZE": 262144
}
H2_SETTINGS_ORDER = [
    "HEADER_TABLE_SIZE",
    "MAX_CONCURRENT_STREAMS",
    "INITIAL_WINDOW_SIZE",
    "MAX_HEADER_LIST_SIZE"
]
SUPPORTED_SIGNATURE_ALGORITHMS = [
    "ECDSAWithP256AndSHA256",
    "PSSWithSHA256",
    "PKCS1WithSHA256",
    "ECDSAWithP384AndSHA384",
    "PSSWithSHA384",
    "PKCS1WithSHA384",
    "PSSWithSHA512",
    "PKCS1WithSHA512",
]
SUPPORTED_DELEGATED_CREDENTIALS_ALGORITHMS = [
    "ECDSAWithP256AndSHA256",
    "PSSWithSHA256",
    "PKCS1WithSHA256",
    "ECDSAWithP384AndSHA384",
    "PSSWithSHA384",
    "PKCS1WithSHA384",
    "PSSWithSHA512",
    "PKCS1WithSHA512",
]
SUPPORTED_VERSIONS = [
    "GREASE",
    "1.3",
    "1.2"
]
KEY_SHARE_CURVES = [
    "GREASE",
    "X25519"
]
CERT_COMPRESSION_ALGO = 'brotli'
ADDITIONAL_DECODE = 'gzip'
PSEUDO_HEADER_ORDER = [
    ":method",
    ":authority",
    ":scheme",
    ":path"
]
CONNECTION_FLOW = 15663105
PRIORITY_FRAMES = [
    {
        "streamID": 3,
        "priorityParam": {
            "weight": 201,
            "streamDep": 0,
            "exclusive": False
        }
    },
    {
        "streamID": 5,
        "priorityParam": {
            "weight": 101,
            "streamDep": 0,
            "exclusive": False
        }
    }
]
HEADER_ORDER = [
    "accept",
    "user-agent",
    "accept-encoding",
    "accept-language"
]
HEADER_PRIORITY = {
    "streamDep": 1,
    "exclusive": True,
    "weight": 1
}
FORCE_HTTP1 = False  # default False
CATCH_PANICS = False  # default False
RAW_RESPONSE_TYPE = 'HtmlResponse'  # HtmlResponse or TextResponse, default HtmlResponse
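
A JA3 string such as the JA3_STRING value above consists of five comma-separated fields (TLS version, cipher suites, extensions, elliptic curves, and elliptic-curve point formats), with the values inside each field joined by hyphens. The sketch below splits such a string so a custom value can be sanity-checked before it goes into settings.py. The function name parse_ja3 and the field names are illustrative and are not part of this package or of tls_client.

```python
from typing import Dict, List

# Field order defined by the JA3 format; the key names here are illustrative.
JA3_FIELDS = ["tls_version", "ciphers", "extensions", "curves", "point_formats"]


def parse_ja3(ja3: str) -> Dict[str, List[int]]:
    """Split a JA3 string into its five fields, each as a list of integers."""
    parts = ja3.split(",")
    if len(parts) != len(JA3_FIELDS):
        raise ValueError(f"expected 5 comma-separated fields, got {len(parts)}")
    return {
        name: [int(value) for value in part.split("-") if value]
        for name, part in zip(JA3_FIELDS, parts)
    }


ja3 = (
    "771,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,"
    "0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513,29-23-24,0"
)
fields = parse_ja3(ja3)
print(fields["tls_version"], len(fields["ciphers"]), fields["curves"])
```

A malformed string (for example, one with a missing field) raises ValueError here instead of failing later inside the TLS handshake.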

Settings for Request

params = {
    'key1': 'value1',
    'key2': 'value2',
}
data = {
    'key1': 'value1',
    'key2': 'value2',
}
# convert the cookie jar to a dict, replacing double quotes (") with single quotes (')
cookies = {
    'key1': 'value1',
    'key2': 'value2',
}
payload = {
    'key1': 'value1',
    'key2': 'value2'
}
proxy = 'http://username:password@ip:port'  # https also works
# or, as a list:
proxy = [
    'http://username:password@ip:port',
    'http://username:password@ip:port',
]  # if proxy is a list, each request uses a random proxy from the list
meta_data = {
    'params': params,
    'data': data,
    'cookies': cookies,
    'json': payload,
    'allow_redirects': False,
    'insecure_skip_verify': False,
    'timeout_seconds': 10,
    'proxy': proxy
}
yield scrapy.Request(url=url, headers=headers, meta=meta_data)
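
The proxy value in meta may be a single URL or a list from which one entry is picked at random per request. The sketch below shows one way such a selection could work. The helper pick_proxy is hypothetical, and the use of random.choice is an assumption, since the middleware's own selection code is not shown here.

```python
import random
from typing import List, Optional, Union


def pick_proxy(proxy: Union[str, List[str], None]) -> Optional[str]:
    """Return the proxy to use for one request.

    A string is returned unchanged; a non-empty list yields a random entry;
    None or an empty list means no proxy.
    """
    if isinstance(proxy, list):
        return random.choice(proxy) if proxy else None
    return proxy


# Placeholder addresses for illustration only.
pool = [
    "http://user:pass@10.0.0.1:8080",
    "http://user:pass@10.0.0.2:8080",
]
print(pick_proxy(pool))
print(pick_proxy("http://user:pass@10.0.0.3:8080"))
```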

You also need to enable TlsClientDownloaderMiddleware in DOWNLOADER_MIDDLEWARES:

DOWNLOADER_MIDDLEWARES = {
    'scrapy_tls_client.downloaderMiddleware.TlsClientDownloaderMiddleware': 543,
}
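
Putting the pieces together, a minimal settings.py might look like the sketch below. It uses only names and values that appear in this description and picks the chrome_112 preset as an example; any other preset or custom option could be substituted.

```python
# settings.py (minimal sketch; assumes the chrome_112 preset is wanted)

# Route all requests through the tls_client downloader middleware.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_tls_client.downloaderMiddleware.TlsClientDownloaderMiddleware": 543,
}

# tls_client session options; only the ones you need have to be set.
CLIENT_IDENTIFIER = "chrome_112"
RANDOM_TLS_EXTENSION_ORDER = True
RAW_RESPONSE_TYPE = "HtmlResponse"  # or "TextResponse"
```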
