Implements QOS (token bucket) in a Scrapy downloader middleware

Project description

Scrapy-QOS

QOS components for Scrapy

Usage

Activate the QosDownloaderMiddleware in settings.py:

DOWNLOADER_MIDDLEWARES = {
    "scrapy_qos.QosDownloaderMiddleware": 543
}

Configure the following options in settings.py:

  • QOS_IOPS_ENABLED
    • default: False
    • set to True to enable the IOPS limiter
  • QOS_IOPS_CAPACITY
    • default: 1
    • burst capacity: the maximum number of requests that can be sent in a burst
  • QOS_IOPS_LIMIT
    • default: 1 / s
    • number of requests sent per second
  • QOS_BPS_ENABLED
    • default: False
    • set to True to enable the BPS limiter
  • QOS_BPS_CAPACITY
    • default: 1048576 bytes
    • burst capacity: the maximum number of response bytes that can be received in a burst
  • QOS_BPS_LIMIT
    • default: 1048576 bytes / s
    • number of response bytes received per second
  • QOS_SMALL_RESPONSE_SIZE
    • default: 1048576 bytes
    • used when guessing the next response size: responses smaller than this value are filtered out
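
As an illustration of how the options above fit together, here is a hedged sketch of a settings.py fragment. The numeric values are arbitrary examples rather than recommendations, and it assumes the settings accept plain integers in the units listed above (requests, bytes).

```python
# settings.py (illustrative values only, not tuned recommendations)

DOWNLOADER_MIDDLEWARES = {
    "scrapy_qos.QosDownloaderMiddleware": 543,
}

# Request-rate limiter: bursts of up to 5 requests, 2 requests per second sustained.
QOS_IOPS_ENABLED = True
QOS_IOPS_CAPACITY = 5
QOS_IOPS_LIMIT = 2

# Bandwidth limiter: bursts of up to 2 MiB, 512 KiB per second sustained.
QOS_BPS_ENABLED = True
QOS_BPS_CAPACITY = 2 * 1024 * 1024
QOS_BPS_LIMIT = 512 * 1024

# Size threshold used when guessing the next response size (assumed to be in bytes).
QOS_SMALL_RESPONSE_SIZE = 64 * 1024
```

Both limiters are disabled by default, so each one takes effect only when its *_ENABLED option is set to True.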

Requirements

  • Python 3.7+
  • Scrapy >= 2.0
  • asyncio

Installation

From pip

pip install scrapy-qos

From Gitee

git clone https://gitee.com/hgdsdq/scrapy_qos.git
cd scrapy_qos
python setup.py install

Implementation

  • QOS is implemented with the token bucket algorithm
  • For Scrapy, QosDownloaderMiddleware guesses the size of the next response body, and the BPS limiter uses that guess
α = 0.8
guess_response_size = (1 - α) * guess_response_size + α * response_size

where response_size is taken to be the body size of the most recently received response.
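
The two ideas above can be sketched in a few lines of Python. This is a minimal illustration, not the scrapy_qos source: the class TokenBucket, its methods try_consume and wait_time, and the helper update_guess are hypothetical names chosen for this example. The size update assumes the second term of the formula is the size of the latest observed response, which makes it an exponential moving average.

```python
import time


class TokenBucket:
    """Hypothetical token bucket; not the scrapy_qos implementation."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = float(capacity)  # maximum burst size, in tokens
        self.rate = float(rate)          # tokens added per second
        self._clock = clock              # injectable clock, useful for testing
        self._tokens = float(capacity)   # start with a full bucket
        self._last = clock()

    def _refill(self):
        # Add tokens for the elapsed time, never exceeding the capacity.
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now

    def try_consume(self, amount=1.0):
        """Take `amount` tokens if available; return True on success."""
        self._refill()
        if amount <= self._tokens:
            self._tokens -= amount
            return True
        return False

    def wait_time(self, amount=1.0):
        """Seconds until `amount` tokens have accumulated (0.0 if available now)."""
        self._refill()
        missing = amount - self._tokens
        return max(0.0, missing / self.rate)


def update_guess(guess, observed, alpha=0.8):
    """Exponential moving average: weight the latest observed size by alpha."""
    return (1 - alpha) * guess + alpha * observed
```

In this sketch an IOPS limiter would consume one token per request, while a BPS limiter would consume as many tokens as the guessed response size in bytes. A request larger than the bucket capacity can never succeed in try_consume, so a real limiter has to handle that case separately.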

