A Scrapy middleware for rotating proxies
Overview
Scrapy-Rotated-Proxy is a Scrapy middleware that dynamically configures the proxy for each Request. It is useful when you have multiple proxy IPs and need to attach a rotating proxy to each Request. Scrapy-Rotated-Proxy supports multiple storage backends: you can provide the proxy IP list through Spider Settings, a file, or MongoDB.
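To illustrate what attaching a rotating proxy to each Request means, here is a minimal sketch in plain Python. It is not the package's implementation: `make_rotator` and the `PROXIES` list are hypothetical names, simple round-robin is an assumed rotation strategy, and a plain dict stands in for a Scrapy Request's `meta`. The only Scrapy convention relied on is that the proxy for a request is read from `request.meta['proxy']`.

```python
from itertools import cycle

# Hypothetical proxy list, written in the same format as HTTP_PROXIES.
PROXIES = [
    'http://proxy0:8888',
    'http://user:pass@proxy1:8888',
]


def make_rotator(proxies):
    """Return a function that hands out proxies in round-robin order."""
    pool = cycle(proxies)
    return lambda: next(pool)


next_proxy = make_rotator(PROXIES)

# A plain dict stands in for scrapy.Request.meta in this sketch.
metas = []
for _ in range(3):
    meta = {}
    meta['proxy'] = next_proxy()  # the key Scrapy reads the proxy from
    metas.append(meta)

print([m['proxy'] for m in metas])
```

With two proxies and three requests, the third request wraps around to the first proxy again.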
Requirements
Python 2.7 or Python 3.3+
Works on Linux, Windows, Mac OSX, BSD
Install
The quick way:
pip install scrapy-rotated-proxy
Or copy this middleware into your Scrapy project.
Documentation
In settings.py, for example:
# -----------------------------------------------------------------------------
# ROTATED PROXY SETTINGS (Spider Settings Backend)
# -----------------------------------------------------------------------------
DOWNLOADER_MIDDLEWARES.update({
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
    'scrapy_rotated_proxy.downloadmiddlewares.proxy.RotatedProxyMiddleware': 750,
})
ROTATED_PROXY_ENABLED = True
PROXY_STORAGE = 'scrapy_rotated_proxy.extensions.file_storage.FileProxyStorage'
# When PROXY_FILE_PATH is set to '', scrapy-rotated-proxy
# uses the proxies defined in Spider Settings by default.
PROXY_FILE_PATH = ''
HTTP_PROXIES = [
    'http://proxy0:8888',
    'http://user:pass@proxy1:8888',
    'https://user:pass@proxy1:8888',
]
HTTPS_PROXIES = [
    'http://proxy0:8888',
    'http://user:pass@proxy1:8888',
    'https://user:pass@proxy1:8888',
]

# -----------------------------------------------------------------------------
# ROTATED PROXY SETTINGS (Local File Backend)
# -----------------------------------------------------------------------------
DOWNLOADER_MIDDLEWARES.update({
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
    'scrapy_rotated_proxy.downloadmiddlewares.proxy.RotatedProxyMiddleware': 750,
})
ROTATED_PROXY_ENABLED = True
PROXY_STORAGE = 'scrapy_rotated_proxy.extensions.file_storage.FileProxyStorage'
PROXY_FILE_PATH = 'file_path/proxy.txt'
# Proxy file content. It must conform to JSON format; otherwise loading it
# will cause a JSON load error.
HTTP_PROXIES = [
    'http://proxy0:8888',
    'http://user:pass@proxy1:8888',
    'https://user:pass@proxy1:8888'
]
HTTPS_PROXIES = [
    'http://proxy0:8888',
    'http://user:pass@proxy1:8888',
    'https://user:pass@proxy1:8888'
]

# -----------------------------------------------------------------------------
# ROTATED PROXY SETTINGS (MongoDB Backend)
# -----------------------------------------------------------------------------
# Required fields of a MongoDB document: scheme, ip, port, username, password
# Document example: {'scheme': 'http', 'ip': '10.0.0.1', 'port': 8080,
#                    'username': 'user', 'password': 'password'}
DOWNLOADER_MIDDLEWARES.update({
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
    'scrapy_rotated_proxy.downloadmiddlewares.proxy.RotatedProxyMiddleware': 750,
})
ROTATED_PROXY_ENABLED = True
PROXY_STORAGE = 'scrapy_rotated_proxy.extensions.mongodb_storage.MongoDBProxyStorage'
PROXY_MONGODB_HOST = HOST_OR_IP
PROXY_MONGODB_PORT = 27017
PROXY_MONGODB_USERNAME = USERNAME_OR_NONE
PROXY_MONGODB_PASSWORD = PASSWORD_OR_NONE
PROXY_MONGODB_DB = 'vps_management'
PROXY_MONGODB_COLL = 'service'

# -----------------------------------------------------------------------------
# OTHER SETTINGS (Optional)
# -----------------------------------------------------------------------------
# By default, the spider closes when it runs out of proxies, and the proxy
# sleep interval is 24 hours (a blocked proxy is not used again for 24 hours).
# To wait for proxies to become valid again after all of them are blocked,
# set the flag `PROXY_SPIDER_CLOSE_WHEN_NO_PROXY` to False and set the
# interval to wait.
PROXY_SLEEP_INTERVAL = 60*60*24
PROXY_SPIDER_CLOSE_WHEN_NO_PROXY = False
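The MongoDB backend stores each proxy as separate fields (scheme, ip, port, username, password), while the other backends use a single URL string. The sketch below shows one way such a document could map to a URL of the same form as the `HTTP_PROXIES` entries. It is an assumption for illustration, not the package's own conversion code: `proxy_url` is a hypothetical helper, and percent-encoding the credentials is a choice made here rather than documented behavior. It targets Python 3.

```python
from urllib.parse import quote


def proxy_url(doc):
    """Build a proxy URL from a document with the fields listed above.

    Hypothetical helper: the package may compose the URL differently.
    """
    auth = ''
    if doc.get('username'):
        # Percent-encode credentials so ':' or '@' inside them stay unambiguous.
        auth = quote(doc['username'], safe='')
        if doc.get('password'):
            auth += ':' + quote(doc['password'], safe='')
        auth += '@'
    return '{}://{}{}:{}'.format(doc['scheme'], auth, doc['ip'], doc['port'])


# The document example from the settings comments.
doc = {'scheme': 'http', 'ip': '10.0.0.1', 'port': 8080,
       'username': 'user', 'password': 'password'}
print(proxy_url(doc))  # http://user:password@10.0.0.1:8080
```

A document whose username is None or empty yields a URL without the credentials part.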