
Simple accessory tools for Scrapy.

Project description

Scrapy Accessory

Installation

pip install scrapy-accessory

Usage

RandomUserAgentDownloadMiddleware

Adds a random user-agent to each request.

In settings.py, add:

# USER_AGENT_LIST_FILE = 'path-to-files'
USER_AGENT_LIST = [
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
    'Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
]

DOWNLOADER_MIDDLEWARES = {
    'scrapy_accessory.middlewares.RandomUserAgentDownloadMiddleware': 200,
}

You can use either USER_AGENT_LIST_FILE or USER_AGENT_LIST to configure user-agents. USER_AGENT_LIST_FILE points to a text file containing one user-agent per line. USER_AGENT_LIST is a list or tuple of user-agents.
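For context, a downloader middleware of this kind typically picks one list entry per request in `process_request`. The following is a minimal illustrative sketch of that pattern, not the package's actual implementation (class and variable names here are hypothetical):

```python
import random

class RandomUserAgentSketch:
    """Illustrative sketch: sets a random User-Agent on each request."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # Read USER_AGENT_LIST from settings (file-based loading omitted).
        return cls(crawler.settings.getlist('USER_AGENT_LIST'))

    def process_request(self, request, spider):
        # Choose one user-agent per outgoing request.
        request.headers['User-Agent'] = random.choice(self.user_agents)
```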

ProxyDownloadMiddleware

Adds an HTTP or HTTPS proxy to requests.

In settings.py, add:

PROXY_ENABLED = True  # enable the proxy; default is False
# PROXY_HOST = 'localhost:8080'  # static proxy, format <ip>:<port>; default empty
PROXY_CACHE = 'redis://localhost:6379/0'  # proxy cache; use redis://<host>:<port>/<db> for a Redis cache, default is an in-memory dict
PROXY_TTL = 30  # proxy cache TTL in seconds; default 30
CHANGE_PROXY_STATUS = [429]  # status codes that force a proxy change when received; default [429]
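The in-memory default pairs the cached proxy with an expiry derived from PROXY_TTL, so a stale proxy is re-fetched once the TTL lapses. A minimal sketch of such a TTL cache (names here are illustrative, not the package's actual internals):

```python
import time

class TtlProxyCache:
    """Illustrative in-memory proxy cache with a time-to-live,
    mirroring the PROXY_TTL behaviour described above."""

    def __init__(self, ttl=30):
        self.ttl = ttl
        self._proxy = None
        self._expires_at = 0.0

    def get(self):
        # Return the cached proxy only while it is still fresh.
        if self._proxy is not None and time.monotonic() < self._expires_at:
            return self._proxy
        return None

    def set(self, proxy):
        # Cache a proxy and stamp its expiry time.
        self._proxy = proxy
        self._expires_at = time.monotonic() + self.ttl
```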

By default, a static proxy is configured in settings.py. To fetch proxies dynamically from an API or another source, extend the ProxyDownloadMiddleware class and implement the generate_proxy method.

Example:

import requests

from scrapy_accessory.middlewares import ProxyDownloadMiddleware


class DynamicProxyDownloadMiddleware(ProxyDownloadMiddleware):

    api = 'http://api-to-get-proxy-ip'

    def generate_proxy(self):
        res = requests.get(self.api)
        if res.status_code < 300:
            return res.text  # return format <ip>:<port>
        return None
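Then register the subclass in settings.py like any other downloader middleware. The module path and priority below are assumptions for illustration (they depend on where the subclass lives in your project):

```python
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.DynamicProxyDownloadMiddleware': 350,
}
```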

