
A Scrapy middleware that rotates proxies across requests

Project description

Overview

Scrapy-Rotated-Proxy is a Scrapy downloader middleware that dynamically sets the proxy for each Request. Use it when you have multiple proxy IPs and need to attach a rotating proxy to each Request. It supports multiple storage backends: you can provide the proxy list through Spider Settings, a file, or MongoDB.
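To make the idea of attaching a rotating proxy to each Request concrete, here is a minimal round-robin sketch. It is not the package's implementation: RoundRobinProxyMiddleware and FakeRequest are hypothetical names, and FakeRequest only stands in for scrapy.Request so the example runs without Scrapy installed. The one Scrapy fact it relies on is that the downloader takes a request's proxy from request.meta['proxy'].

```python
from itertools import cycle


class FakeRequest(object):
    """Stand-in for scrapy.Request; only the `meta` dict matters here."""

    def __init__(self, url):
        self.url = url
        self.meta = {}


class RoundRobinProxyMiddleware(object):
    """Hypothetical middleware that cycles through a fixed proxy list."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(proxies)

    def process_request(self, request, spider=None):
        # Assign the next proxy in the cycle to this request.
        request.meta["proxy"] = next(self._pool)
        # Returning None lets Scrapy continue handling the request.
        return None


middleware = RoundRobinProxyMiddleware(
    ["http://proxy0:8888", "http://proxy1:8888"]
)
requests = [FakeRequest("http://example.com/%d" % i) for i in range(3)]
for request in requests:
    middleware.process_request(request)

# Proxy assigned to each request, in order.
assigned = [request.meta["proxy"] for request in requests]
```

With two proxies and three requests, the third request wraps around to the first proxy again. A real backend could replace the fixed list with proxies loaded from a file or a database.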

Requirements

  • Python 2.7 or Python 3.3+
  • Works on Linux, Windows, Mac OSX, BSD

Install

The quick way:

pip install scrapy-rotated-proxy

Or copy this middleware into your Scrapy project.

Documentation

In settings.py, for example:

# -----------------------------------------------------------------------------
# ROTATED PROXY SETTINGS (Spider Settings Backend)
# -----------------------------------------------------------------------------
DOWNLOADER_MIDDLEWARES.update({
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
    'scrapy_rotated_proxy.downloadmiddlwares.proxy.RotatedProxy': 750,
})
ROTATED_PROXY_ENABLED = True
PROXY_STORAGE = 'scrapy_rotated_proxy.extensions.file_storage.FileProxyStorage'
# When PROXY_FILE_PATH is '', scrapy-rotated-proxy falls back to the
# proxies defined in the Spider Settings below.
PROXY_FILE_PATH = ''
HTTP_PROXIES = [
    'http://proxy0:8888',
    'http://user:pass@proxy1:8888',
    'https://user:pass@proxy1:8888',
]
HTTPS_PROXIES = [
    'http://proxy0:8888',
    'http://user:pass@proxy1:8888',
    'https://user:pass@proxy1:8888',
]


# -----------------------------------------------------------------------------
# ROTATED PROXY SETTINGS (Local File Backend)
# -----------------------------------------------------------------------------
DOWNLOADER_MIDDLEWARES.update({
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
    'scrapy_rotated_proxy.downloadmiddlwares.proxy.RotatedProxy': 750,
})
ROTATED_PROXY_ENABLED = True
PROXY_STORAGE = 'scrapy_rotated_proxy.extensions.file_storage.FileProxyStorage'
PROXY_FILE_PATH = 'file_path/proxy.txt'
# Contents of the proxy file. It must be valid JSON; otherwise loading it
# raises a JSON error.
HTTP_PROXIES = [
    'http://proxy0:8888',
    'http://user:pass@proxy1:8888',
    'https://user:pass@proxy1:8888'
]
HTTPS_PROXIES = [
    'http://proxy0:8888',
    'http://user:pass@proxy1:8888',
    'https://user:pass@proxy1:8888'
]


# -----------------------------------------------------------------------------
# ROTATED PROXY SETTINGS (MongoDB Backend)
# -----------------------------------------------------------------------------
# Required fields in each MongoDB document: scheme, ip, port, username, password
# Example document: {'scheme': 'http', 'ip': '10.0.0.1', 'port': 8080,
# 'username': 'user', 'password': 'password'}
DOWNLOADER_MIDDLEWARES.update({
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
    'scrapy_rotated_proxy.downloadmiddlwares.proxy.RotatedProxy': 750,
})
ROTATED_PROXY_ENABLED = True
PROXY_STORAGE = 'scrapy_rotated_proxy.extensions.mongodb_storage.MongoDBProxyStorage'
PROXY_MONGODB_HOST = HOST_OR_IP
PROXY_MONGODB_PORT = 27017
PROXY_MONGODB_USERNAME = USERNAME_OR_NONE
PROXY_MONGODB_PASSWORD = PASSWORD_OR_NONE
PROXY_MONGODB_STORAGE_URI = 'mongodb://{auth}{host}:{port}'.format(
    auth='{}:{}@'.format(PROXY_MONGODB_USERNAME, PROXY_MONGODB_PASSWORD)
    if PROXY_MONGODB_USERNAME else '',
    host=PROXY_MONGODB_HOST,
    port=PROXY_MONGODB_PORT
)
PROXY_MONGODB_STORAGE_DB = 'vps_management'
PROXY_MONGODB_STORAGE_COLL = 'service'
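
The MongoDB backend above expects documents with the fields scheme, ip, port, username and password. As an illustration of how such a document maps onto the proxy URL format used in HTTP_PROXIES, here is a small sketch. The helper proxy_url_from_doc is hypothetical and not part of the package; how the package converts documents internally is not described here, and percent-encoding the credentials is an assumption of this sketch.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2.7


def proxy_url_from_doc(doc):
    """Build 'scheme://[user:pass@]ip:port' from one proxy document.

    Hypothetical helper; field names follow the document example above.
    """
    auth = ""
    if doc.get("username"):
        # Percent-encode credentials so characters such as '@' or ':'
        # cannot break the URL structure.
        auth = "{}:{}@".format(
            quote(doc["username"], safe=""),
            quote(doc.get("password") or "", safe=""),
        )
    return "{scheme}://{auth}{ip}:{port}".format(
        scheme=doc["scheme"], auth=auth, ip=doc["ip"], port=doc["port"]
    )


example_doc = {
    "scheme": "http",
    "ip": "10.0.0.1",
    "port": 8080,
    "username": "user",
    "password": "password",
}
url = proxy_url_from_doc(example_doc)
print(url)  # http://user:password@10.0.0.1:8080
```

A document whose username is empty or None yields a URL without the credentials part, mirroring how the PROXY_MONGODB_STORAGE_URI setting omits auth when no username is configured.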

Project details

Release history: 0.1.0, 0.0.9, 0.0.8, 0.0.7, 0.0.6, 0.0.5, 0.0.4 (this version), 0.0.3, 0.0.2

Download files

scrapy-rotated-proxy-0.0.4.tar.gz (10.0 kB), source distribution, uploaded Oct 13, 2017
