Implements QoS (token bucket) as a Scrapy downloader middleware

Project description

Scrapy-QOS

QOS components for Scrapy

Usage

Activate the QosDownloaderMiddleware in settings.py

DOWNLOADER_MIDDLEWARES = {
    "scrapy_qos.QosDownloaderMiddleware": 543
}

Configure the following options in settings.py

  • QOS_IOPS_ENABLED
    • default: False
    • set to True to enable the IOPS (requests per second) limiter
  • QOS_IOPS_CAPACITY
    • default: 1
    • bucket capacity: the maximum number of requests that can be sent in a burst
  • QOS_IOPS_LIMIT
    • default: 1 / s
    • number of requests sent per second
  • QOS_BPS_ENABLED
    • default: False
    • set to True to enable the BPS (bytes per second) limiter
  • QOS_BPS_CAPACITY
    • default: 1048576 bytes
    • bucket capacity: the maximum number of response bytes that can be received in a burst
  • QOS_BPS_LIMIT
    • default: 1048576 bytes / s
    • number of response bytes received per second
  • QOS_SMALL_RESPONSE_SIZE
    • default: 1048576 bytes
    • when guessing the next response size, responses smaller than this value are filtered out
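
As an illustration of how the options above fit together, here is a possible settings.py fragment. The values are arbitrary examples rather than recommendations, and it assumes each option accepts a plain number (requests, bytes, or their per-second rates), which the defaults listed above suggest but do not confirm.

```python
# settings.py (illustrative values only)

DOWNLOADER_MIDDLEWARES = {
    "scrapy_qos.QosDownloaderMiddleware": 543,
}

# Request-rate (IOPS) limiter: bursts of up to 5 requests, 2 requests/s sustained.
QOS_IOPS_ENABLED = True
QOS_IOPS_CAPACITY = 5
QOS_IOPS_LIMIT = 2

# Bandwidth (BPS) limiter: bursts of up to 2 MiB, 512 KiB/s sustained.
QOS_BPS_ENABLED = True
QOS_BPS_CAPACITY = 2 * 1024 * 1024
QOS_BPS_LIMIT = 512 * 1024

# Threshold used when guessing the next response size (assumed to be in bytes).
QOS_SMALL_RESPONSE_SIZE = 64 * 1024
```

Both limiters are disabled by default, so each one takes effect only when its `*_ENABLED` option is set to True.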

Requirements

  • Python 3.7+
  • Scrapy >= 2.0
  • asyncio

Installation

From pip

pip install scrapy-qos

From Gitee

git clone https://gitee.com/hgdsdq/scrapy_qos.git
cd scrapy_qos
python setup.py install

Implementation

  • QoS is implemented with the token bucket algorithm
  • For Scrapy, QosDownloaderMiddleware guesses the size of the next response body, and the BPS limiter uses that guess
α = 0.8
guess_response_size = (1 - α) * guess_response_size + α * last_response_size
where last_response_size is the body size of the most recently received response
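
The two ideas above can be sketched in a few lines of Python. This is a generic illustration, not the scrapy_qos source: the `TokenBucket` class, its `try_consume` method, the injectable `clock` parameter, and the `update_guess` helper are all hypothetical names. The size estimate is written as an exponential moving average, assuming the second term of the formula is meant to be the size of the most recently observed response.

```python
import time


class TokenBucket:
    """Generic token bucket: holds up to `capacity` tokens, refilled at `rate` tokens/s."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def _refill(self):
        # Add tokens for the elapsed time, never exceeding the capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, amount=1):
        """Take `amount` tokens if available; return True on success, False otherwise."""
        self._refill()
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False


def update_guess(guess, last_size, alpha=0.8):
    """Exponential moving average: blend the old guess with the latest observed size."""
    return (1 - alpha) * guess + alpha * last_size
```

In this sketch an IOPS limiter would call `try_consume(1)` per request, while a BPS limiter would call `try_consume(guess)` with the guessed body size, since the real size is unknown until the response arrives. With α = 0.8 the estimate leans heavily on the most recent response.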
