A Scrapy middleware that rotates proxies across requests
Overview
Scrapy-Rotated-Proxy is a Scrapy middleware that dynamically configures the proxy of each Request. It can be used when you have multiple proxy IPs and need to attach a rotating proxy to each Request. Scrapy-Rotated-Proxy supports multiple storage backends: you can provide the proxy IP list through Spider Settings, a file, or MongoDB.
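To make the idea of attaching a rotating proxy to each Request concrete, here is a minimal sketch of round-robin rotation. It is not this package's implementation: `RoundRobinProxyMiddleware`, `FakeRequest` and `assign_proxies` are hypothetical names used only for illustration, and `FakeRequest` stands in for `scrapy.Request` so the snippet runs without Scrapy installed. The only Scrapy convention it relies on is that the proxy for a request is read from `request.meta['proxy']`.

```python
from itertools import cycle


class RoundRobinProxyMiddleware:
    """Hypothetical downloader-middleware sketch; not this package's code."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        # cycle() yields the proxies in order and starts over at the end.
        self._proxies = cycle(proxies)

    def process_request(self, request, spider):
        # Scrapy reads the proxy for a request from request.meta['proxy'].
        request.meta["proxy"] = next(self._proxies)
        return None  # None tells Scrapy to keep processing the request


class FakeRequest:
    """Stand-in for scrapy.Request so the sketch runs without Scrapy."""

    def __init__(self, url):
        self.url = url
        self.meta = {}


def assign_proxies(urls, proxies):
    """Return the proxy chosen for each URL, in request order."""
    middleware = RoundRobinProxyMiddleware(proxies)
    assigned = []
    for url in urls:
        request = FakeRequest(url)
        middleware.process_request(request, spider=None)
        assigned.append(request.meta["proxy"])
    return assigned


if __name__ == "__main__":
    for proxy in assign_proxies(
        ["http://example.com/1", "http://example.com/2", "http://example.com/3"],
        ["http://proxy0:8888", "http://proxy1:8888"],
    ):
        print(proxy)
```

With two proxies and three requests, the third request reuses the first proxy. A real rotating middleware would also have to handle blocked proxies, which this sketch leaves out.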
Requirements
Python 2.7 or Python 3.3+
Works on Linux, Windows, Mac OSX, BSD
Install
The quick way:
pip install scrapy-rotated-proxy
Or copy this middleware into your Scrapy project.
Documentation
In settings.py, for example:
    # -----------------------------------------------------------------------------
    # ROTATED PROXY SETTINGS (Spider Settings Backend)
    # -----------------------------------------------------------------------------
    DOWNLOADER_MIDDLEWARES.update({
        'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
        'scrapy_rotated_proxy.downloadmiddlewares.proxy.RotatedProxyMiddleware': 750,
    })
    ROTATED_PROXY_ENABLED = True
    PROXY_STORAGE = 'scrapy_rotated_proxy.extensions.file_storage.FileProxyStorage'
    # When PROXY_FILE_PATH = '', scrapy-rotated-proxy falls back to the proxies
    # defined in the spider settings.
    PROXY_FILE_PATH = ''
    HTTP_PROXIES = [
        'http://proxy0:8888',
        'http://user:pass@proxy1:8888',
        'https://user:pass@proxy1:8888',
    ]
    HTTPS_PROXIES = [
        'http://proxy0:8888',
        'http://user:pass@proxy1:8888',
        'https://user:pass@proxy1:8888',
    ]

    # -----------------------------------------------------------------------------
    # ROTATED PROXY SETTINGS (Local File Backend)
    # -----------------------------------------------------------------------------
    DOWNLOADER_MIDDLEWARES.update({
        'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
        'scrapy_rotated_proxy.downloadmiddlewares.proxy.RotatedProxyMiddleware': 750,
    })
    ROTATED_PROXY_ENABLED = True
    PROXY_STORAGE = 'scrapy_rotated_proxy.extensions.file_storage.FileProxyStorage'
    PROXY_FILE_PATH = 'file_path/proxy.txt'
    # Proxy file content. It must be valid JSON; otherwise loading the file
    # fails with a JSON load error.
    HTTP_PROXIES = [
        'http://proxy0:8888',
        'http://user:pass@proxy1:8888',
        'https://user:pass@proxy1:8888'
    ]
    HTTPS_PROXIES = [
        'http://proxy0:8888',
        'http://user:pass@proxy1:8888',
        'https://user:pass@proxy1:8888'
    ]

    # -----------------------------------------------------------------------------
    # ROTATED PROXY SETTINGS (MongoDB Backend)
    # -----------------------------------------------------------------------------
    # Required fields of each MongoDB document: scheme, ip, port, username, password
    # Document example: {'scheme': 'http', 'ip': '10.0.0.1', 'port': 8080,
    #                    'username': 'user', 'password': 'password'}
    DOWNLOADER_MIDDLEWARES.update({
        'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
        'scrapy_rotated_proxy.downloadmiddlewares.proxy.RotatedProxyMiddleware': 750,
    })
    ROTATED_PROXY_ENABLED = True
    PROXY_STORAGE = 'scrapy_rotated_proxy.extensions.mongodb_storage.MongoDBProxyStorage'
    PROXY_MONGODB_HOST = HOST_OR_IP
    PROXY_MONGODB_PORT = 27017
    PROXY_MONGODB_USERNAME = USERNAME_OR_NONE
    PROXY_MONGODB_PASSWORD = PASSWORD_OR_NONE
    PROXY_MONGODB_DB = 'vps_management'
    PROXY_MONGODB_COLL = 'service'

    # -----------------------------------------------------------------------------
    # OTHER SETTINGS (Optional)
    # -----------------------------------------------------------------------------
    # To wait for proxies to become valid again after all of them are blocked,
    # set `PROXY_SPIDER_CLOSE_WHEN_NO_PROXY` to False and set the wait interval.
    # Otherwise, leave `PROXY_SPIDER_CLOSE_WHEN_NO_PROXY` at True, which is the
    # default.
    PROXY_SLEEP_INTERVAL = 60
    PROXY_SPIDER_CLOSE_WHEN_NO_PROXY = False
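The MongoDB backend stores each proxy as a document with the fields scheme, ip, port, username and password, while the other backends take proxy URLs such as `http://user:pass@proxy1:8888`. The sketch below shows one plausible way the two forms relate. `proxy_url_from_doc` is a hypothetical helper and not part of the package, and treating an empty or `None` username as "no credentials" is an assumption. It uses Python 3 syntax.

```python
from urllib.parse import quote


def proxy_url_from_doc(doc):
    """Build 'scheme://user:pass@ip:port' from a proxy document.

    Hypothetical helper for illustration; the package may build URLs differently.
    """
    auth = ""
    if doc.get("username"):
        # Percent-encode credentials so '@' or ':' inside them stay unambiguous.
        user = quote(str(doc["username"]), safe="")
        password = quote(str(doc.get("password") or ""), safe="")
        auth = "{}:{}@".format(user, password)
    return "{}://{}{}:{}".format(doc["scheme"], auth, doc["ip"], doc["port"])


if __name__ == "__main__":
    # The example document from the settings comment above.
    example = {
        "scheme": "http",
        "ip": "10.0.0.1",
        "port": 8080,
        "username": "user",
        "password": "password",
    }
    print(proxy_url_from_doc(example))  # http://user:password@10.0.0.1:8080
```

Converting a few documents this way before inserting them into the collection is a quick check that every required field is present and well formed.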