A middleware to rotate proxies for Scrapy
Project description
Overview
Scrapy-Rotated-Proxy is a middleware that dynamically configures the proxy for each Scrapy Request. It can be used when you have multiple proxy IPs and need to attach a rotated proxy to each Request. Scrapy-Rotated-Proxy supports multiple storage backends: you can provide the proxy IP list through Spider settings, a local file, or MongoDB.
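To illustrate the idea, here is a minimal round-robin sketch of attaching a rotated proxy to each request's `meta['proxy']` key (the key Scrapy's proxy handling uses). This is illustrative only; the middleware's actual selection strategy may differ.

```python
from itertools import cycle

# Illustrative proxy pool; round-robin rotation via itertools.cycle.
proxies = cycle([
    "http://proxy0:8888",
    "http://user:pass@proxy1:8888",
])

def assign_proxy(request_meta):
    # Attach the next proxy under the 'proxy' meta key,
    # which Scrapy's downloader uses to route the request.
    request_meta["proxy"] = next(proxies)
    return request_meta

meta = assign_proxy({})
```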
Requirements
Python 2.7 or Python 3.3+
Works on Linux, Windows, Mac OSX, BSD
Install
The quick way:
pip install scrapy-rotated-proxy
Or copy the middleware into your Scrapy project.
Documentation
In settings.py, for example:
# -----------------------------------------------------------------------------
# ROTATED PROXY SETTINGS (Spider Settings Backend)
# -----------------------------------------------------------------------------
DOWNLOADER_MIDDLEWARES.update({
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
'scrapy_rotated_proxy.downloadmiddlewares.proxy.RotatedProxyMiddleware': 750,
})
ROTATED_PROXY_ENABLED = True
PROXY_STORAGE = 'scrapy_rotated_proxy.extensions.file_storage.FileProxyStorage'
# When PROXY_FILE_PATH is set to '', scrapy-rotated-proxy
# falls back to the proxies defined in Spider settings by default.
PROXY_FILE_PATH = ''
HTTP_PROXIES = [
'http://proxy0:8888',
'http://user:pass@proxy1:8888',
'https://user:pass@proxy1:8888',
]
HTTPS_PROXIES = [
'http://proxy0:8888',
'http://user:pass@proxy1:8888',
'https://user:pass@proxy1:8888',
]
# -----------------------------------------------------------------------------
# ROTATED PROXY SETTINGS (Local File Backend)
# -----------------------------------------------------------------------------
DOWNLOADER_MIDDLEWARES.update({
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
'scrapy_rotated_proxy.downloadmiddlewares.proxy.RotatedProxyMiddleware': 750,
})
ROTATED_PROXY_ENABLED = True
PROXY_STORAGE = 'scrapy_rotated_proxy.extensions.file_storage.FileProxyStorage'
PROXY_FILE_PATH = 'file_path/proxy.txt'
# The proxy file content must be valid JSON; otherwise a JSON
# load error will be raised.
HTTP_PROXIES = [
'http://proxy0:8888',
'http://user:pass@proxy1:8888',
'https://user:pass@proxy1:8888'
]
HTTPS_PROXIES = [
'http://proxy0:8888',
'http://user:pass@proxy1:8888',
'https://user:pass@proxy1:8888'
]
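Since the file content must parse as JSON, a quick validity check can catch formatting mistakes (double quotes, no trailing comma) before a crawl starts. The list below is a hypothetical example; the exact layout of proxy.txt is an assumption, so adjust it to match your file.

```python
import json

# Hypothetical proxy list as it might appear in proxy.txt;
# json.loads raises ValueError if the content is malformed.
http_proxies_json = '''
[
    "http://proxy0:8888",
    "http://user:pass@proxy1:8888",
    "https://user:pass@proxy1:8888"
]
'''
http_proxies = json.loads(http_proxies_json)
```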
# -----------------------------------------------------------------------------
# ROTATED PROXY SETTINGS (MongoDB Backend)
# -----------------------------------------------------------------------------
# mongodb document required field: scheme, ip, port, username, password
# document example: {'scheme': 'http', 'ip': '10.0.0.1', 'port': 8080,
# 'username':'user', 'password':'password'}
DOWNLOADER_MIDDLEWARES.update({
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
'scrapy_rotated_proxy.downloadmiddlewares.proxy.RotatedProxyMiddleware': 750,
})
ROTATED_PROXY_ENABLED = True
PROXY_STORAGE = 'scrapy_rotated_proxy.extensions.mongodb_storage.MongoDBProxyStorage'
PROXY_MONGODB_HOST = HOST_OR_IP
PROXY_MONGODB_PORT = 27017
PROXY_MONGODB_USERNAME = USERNAME_OR_NONE
PROXY_MONGODB_PASSWORD = PASSWORD_OR_NONE
PROXY_MONGODB_DB = 'vps_management'
PROXY_MONGODB_COLL = 'service'
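The required document fields above (scheme, ip, port, username, password) contain everything needed to build a proxy URL. A sketch of that assembly, using a hypothetical helper that is not part of scrapy-rotated-proxy's API:

```python
# Build a proxy URL from a MongoDB document with the required fields.
# Documents without credentials (username absent or None) are supported.
def doc_to_proxy_url(doc):
    auth = ""
    if doc.get("username"):
        auth = "{0}:{1}@".format(doc["username"], doc["password"])
    return "{0}://{1}{2}:{3}".format(doc["scheme"], auth, doc["ip"], doc["port"])

doc = {"scheme": "http", "ip": "10.0.0.1", "port": 8080,
       "username": "user", "password": "password"}
```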
# -----------------------------------------------------------------------------
# OTHER SETTINGS (Optional)
# -----------------------------------------------------------------------------
# If you want the spider to wait for a proxy to become valid after all
# proxies are blocked, set PROXY_SPIDER_CLOSE_WHEN_NO_PROXY to False and
# choose a wait interval. Otherwise leave PROXY_SPIDER_CLOSE_WHEN_NO_PROXY
# as True (the default), and the spider closes when no proxy is available.
PROXY_SLEEP_INTERVAL = 60
PROXY_SPIDER_CLOSE_WHEN_NO_PROXY = False
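The interaction of these two settings can be sketched as follows. This is an illustrative model of the documented behavior, not the middleware's actual code; the function name and return values are hypothetical.

```python
import time

# Model of the "all proxies blocked" decision: close the spider, or
# sleep PROXY_SLEEP_INTERVAL seconds and retry. The sleep function is
# injectable so the logic can be exercised without real delays.
def on_all_proxies_blocked(settings, sleep=time.sleep):
    if settings.get("PROXY_SPIDER_CLOSE_WHEN_NO_PROXY", True):
        return "close_spider"
    sleep(settings.get("PROXY_SLEEP_INTERVAL", 60))
    return "retry"
```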
File details
Details for the file scrapy-rotated-proxy-0.0.7.tar.gz.
File metadata
- Download URL: scrapy-rotated-proxy-0.0.7.tar.gz
- Upload date:
- Size: 10.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a80d3dbd071ccc75286f9bfa8568ec0459b2b9df6a7c1599657a3431ac5b8f79 |
| MD5 | 3f18cdd30e70201608a24bd796916766 |
| BLAKE2b-256 | 1835395e301da5314b72fc6352e1f7ff3512cf03fb85a71d9c5e67108a242017 |