
Simple accessory tools for Scrapy.


Scrapy Accessory


Useful accessory utilities for Scrapy.


  • middleware
  • item pipeline
  • feed exporter storage backend


pip install scrapy-accessory




Middleware


RandomUserAgentDownloadMiddleware adds a random user-agent to each request.

In your Scrapy settings, add:

# USER_AGENT_LIST_FILE = 'path-to-files'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_accessory.middlewares.RandomUserAgentDownloadMiddleware': 200,
}

You can use either USER_AGENT_LIST_FILE or USER_AGENT_LIST to configure user-agents. USER_AGENT_LIST_FILE points to a text file containing one user-agent per line. USER_AGENT_LIST is a list or tuple of user-agents.
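As a rough illustration of the two configuration styles, the sketch below loads user-agents from a settings mapping and picks one at random. It is not the package's implementation: the helper names `load_user_agents` and `pick_user_agent` are hypothetical, and merging both settings when both are present is an assumption, since the text above does not say which one takes precedence.

```python
import random
from pathlib import Path


def load_user_agents(settings):
    """Collect user-agents from a settings mapping (hypothetical helper).

    Assumption: when both settings are present, the two sources are merged.
    """
    agents = list(settings.get("USER_AGENT_LIST") or [])
    list_file = settings.get("USER_AGENT_LIST_FILE")
    if list_file:
        # One user-agent per line; blank lines are skipped.
        lines = Path(list_file).read_text(encoding="utf-8").splitlines()
        agents.extend(line.strip() for line in lines if line.strip())
    return agents


def pick_user_agent(agents, rng=random):
    """Return one user-agent at random, or None when none are configured."""
    return rng.choice(agents) if agents else None
```

A downloader middleware would typically call something like `pick_user_agent` once per request and write the result into the request's `User-Agent` header.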


ProxyDownloadMiddleware adds an HTTP or HTTPS proxy to requests.

In your Scrapy settings, add:

PROXY_ENABLED = True  # set to True to use a proxy, default is False
# PROXY_HOST = 'localhost:8080'  # default static proxy, format: <ip>:<port>, default empty
PROXY_CACHE = 'redis://localhost:6379/0'  # proxy cache; use redis://<host>:<port>/<db> for a Redis cache, default is an in-memory dict
PROXY_TTL = 30  # proxy cache TTL in seconds, default 30
CHANGE_PROXY_STATUS = [429]  # status codes that force a proxy change when received, default [429]

By default the middleware uses the static proxy configured in PROXY_HOST. To use a dynamic proxy from an API or another source, extend the ProxyDownloadMiddleware class and implement the generate_proxy method.


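import requests

# Import path assumed from the middleware module used above.
from scrapy_accessory.middlewares import ProxyDownloadMiddleware


class DynamicProxyDownloadMiddleware(ProxyDownloadMiddleware):

    api = 'http://api-to-get-proxy-ip'

    def generate_proxy(self):
        # Ask the proxy API for a fresh proxy address.
        res = requests.get(self.api)
        if res.status_code < 300:
            return res.text  # return format <ip>:<port>
        return None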
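To make the roles of PROXY_TTL and CHANGE_PROXY_STATUS more concrete, here is a minimal in-memory sketch of how a cached proxy might be reused until its TTL expires or a listed status code is seen. The class `CachedProxySource` and its methods are hypothetical and are not part of scrapy-accessory; the real middleware may cache and rotate proxies differently.

```python
import time


class CachedProxySource:
    """Hypothetical helper: reuse one proxy until it expires or is rejected."""

    def __init__(self, generate_proxy, ttl=30, change_status=(429,),
                 clock=time.monotonic):
        self._generate_proxy = generate_proxy    # callable returning '<ip>:<port>'
        self._ttl = ttl                          # plays the role of PROXY_TTL
        self._change_status = set(change_status) # plays the role of CHANGE_PROXY_STATUS
        self._clock = clock
        self._proxy = None
        self._expires_at = 0.0

    def get(self):
        """Return the cached proxy, generating a new one when needed."""
        now = self._clock()
        if self._proxy is None or now >= self._expires_at:
            self._proxy = self._generate_proxy()
            self._expires_at = now + self._ttl
        return self._proxy

    def report_status(self, status_code):
        """Drop the cached proxy when the response status calls for a change."""
        if status_code in self._change_status:
            self._proxy = None
```

Passing the clock in as a parameter keeps the sketch easy to test without sleeping; a Redis-backed cache could replace the two instance attributes with a key that has an expiry.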

Feed exporter storage backend


ObsFeedStorage is a feed exporter storage backend for Huawei Cloud OBS.

Install the OBS SDK first:

pip install esdk-obs-python

Configure in your Scrapy settings:

FEED_STORAGES = {
    'obs': 'scrapy_accessory.feedexporter.ObsFeedStorage',
}
HUAWEI_ACCESS_KEY_ID = '<your access key id>'
HUAWEI_SECRET_ACCESS_KEY = '<your secret access key>'
HUAWEI_OBS_ENDPOINT = '<your obs bucket endpoint> ex:'

Output to OBS using the obs URI scheme: -o obs://<bucket>/<key>


OssFeedStorage is a feed exporter storage backend for Alibaba Cloud OSS.

Install the OSS SDK first:

pip install oss2

Configure in your Scrapy settings:

FEED_STORAGES = {
    'oss': 'scrapy_accessory.feedexporter.OssFeedStorage',
}
ALI_ACCESS_KEY_ID = '<your access key id>'
ALI_SECRET_ACCESS_KEY = '<your secret access key>'
ALI_OSS_ENDPOINT = '<your oss bucket endpoint> ex:'

Output to OSS using the oss URI scheme: -o oss://<bucket>/<key>
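Both storage backends are addressed with a URI of the form <scheme>://<bucket>/<key>. The sketch below shows one way such a URI could be split into its parts with the standard library. The function `split_feed_uri` is hypothetical and only illustrates the URI layout; it is not how the package itself necessarily parses the value.

```python
from urllib.parse import urlparse


def split_feed_uri(uri):
    """Split 'obs://<bucket>/<key>' or 'oss://<bucket>/<key>' into its parts.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in ("obs", "oss"):
        raise ValueError(f"unsupported feed URI scheme: {parsed.scheme!r}")
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")  # object key without the leading slash
    return parsed.scheme, bucket, key
```

For example, `split_feed_uri("oss://my-bucket/exports/items.json")` yields the scheme `oss`, the bucket `my-bucket`, and the key `exports/items.json`; the bucket and key names here are made up.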

Item Pipeline


RedisListPipeline exports items to a Redis list.

Install the redis client first:

pip install redis

Configure in your Scrapy settings:

REDIS_CONNECTION_URL = 'redis://localhost:6379/0'  # required
REDIS_DEFAULT_QUEUE = 'test'  # use spider's queue attribute to override it
REDIS_MAX_RETRY = 5  # default 5

Add scrapy_accessory.pipelines.RedisListPipeline to your ITEM_PIPELINES setting:

ITEM_PIPELINES = {
    'scrapy_accessory.pipelines.RedisListPipeline': 1,
}
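The following sketch suggests how the three Redis settings might interact: an item is serialized, pushed onto a list, and the push is retried a bounded number of times. It is an illustration under assumptions, not the pipeline's actual code. The helpers `push_item` and `queue_for` are hypothetical, JSON serialization and the use of `rpush` (rather than `lpush`) are assumptions, and `client` stands for any object with an `rpush(name, value)` method, such as a redis-py client.

```python
import json


def queue_for(spider, default_queue):
    """Use the spider's `queue` attribute when set, else REDIS_DEFAULT_QUEUE."""
    return getattr(spider, "queue", default_queue)


def push_item(client, queue, item, max_retry=5, retry_on=(Exception,)):
    """Push one item onto a Redis list, retrying up to `max_retry` times.

    Assumptions: items are stored as JSON strings and appended with rpush.
    With redis-py, `retry_on` could be narrowed to
    (redis.exceptions.ConnectionError,).
    """
    payload = json.dumps(dict(item), ensure_ascii=False)
    last_error = None
    for _ in range(max_retry):
        try:
            client.rpush(queue, payload)
            return True
        except retry_on as exc:
            last_error = exc  # remember the failure and try again
    raise RuntimeError(f"giving up after {max_retry} attempts") from last_error
```

A consumer on the other side would then pop entries from the same list and decode them with `json.loads`.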

