A middleware that rotates proxies for Scrapy requests
Project description
Overview
Scrapy-Rotated-Proxy is a middleware that dynamically configures the Request proxy for Scrapy. It can be used when you have multiple proxy IPs and need to attach a rotated proxy to each Request. Scrapy-Rotated-Proxy supports multiple storage backends: you can provide the proxy IP list through Spider Settings, a file, or MongoDB.
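To make the idea of a "rotated proxy" concrete, here is a minimal sketch of round-robin selection over a proxy list. It is not the middleware's actual implementation; `make_rotator` and `next_proxy` are hypothetical names used only for illustration, and the proxy URLs are taken from the settings example in this document. The only Scrapy-specific fact relied on is that Scrapy reads a request's proxy from `request.meta['proxy']`.

```python
from itertools import cycle

# Example proxy list, same shape as HTTP_PROXIES in settings.py.
HTTP_PROXIES = [
    'http://proxy0:8888',
    'http://user:pass@proxy1:8888',
]


def make_rotator(proxies):
    """Return a callable that hands out proxies in round-robin order.

    Hypothetical helper for illustration; not part of scrapy-rotated-proxy.
    """
    pool = cycle(proxies)
    return lambda: next(pool)


next_proxy = make_rotator(HTTP_PROXIES)

# Each dict stands in for request.meta; a real middleware would set
# request.meta['proxy'] inside process_request().
metas = [{'proxy': next_proxy()} for _ in range(3)]
```

With two proxies and three requests, the third request wraps around to the first proxy again.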
Requirements
Python 2.7 or Python 3.3+
Works on Linux, Windows, Mac OSX, BSD
Install
The quick way:
pip install scrapy-rotated-proxy
Alternatively, copy this middleware into your Scrapy project.
Documentation
Configure the middleware in settings.py. For example, one block per storage backend:
# -----------------------------------------------------------------------------
# ROTATED PROXY SETTINGS (Spider Settings Backend)
# -----------------------------------------------------------------------------
DOWNLOADER_MIDDLEWARES.update({
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
    'scrapy_rotated_proxy.downloadmiddlwares.proxy.RotatedProxyMiddleware': 750,
})
ROTATED_PROXY_ENABLED = True
PROXY_STORAGE = 'scrapy_rotated_proxy.extensions.file_storage.FileProxyStorage'
# When PROXY_FILE_PATH is set to '', scrapy-rotated-proxy
# falls back to the proxies defined in Spider Settings.
PROXY_FILE_PATH = ''
HTTP_PROXIES = [
    'http://proxy0:8888',
    'http://user:pass@proxy1:8888',
    'https://user:pass@proxy1:8888',
]
HTTPS_PROXIES = [
    'http://proxy0:8888',
    'http://user:pass@proxy1:8888',
    'https://user:pass@proxy1:8888',
]

# -----------------------------------------------------------------------------
# ROTATED PROXY SETTINGS (Local File Backend)
# -----------------------------------------------------------------------------
DOWNLOADER_MIDDLEWARES.update({
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
    'scrapy_rotated_proxy.downloadmiddlwares.proxy.RotatedProxyMiddleware': 750,
})
ROTATED_PROXY_ENABLED = True
PROXY_STORAGE = 'scrapy_rotated_proxy.extensions.file_storage.FileProxyStorage'
PROXY_FILE_PATH = 'file_path/proxy.txt'
# Proxy file content. It must conform to JSON format; otherwise loading it
# raises a JSON load error.
HTTP_PROXIES = [
    'http://proxy0:8888',
    'http://user:pass@proxy1:8888',
    'https://user:pass@proxy1:8888'
]
HTTPS_PROXIES = [
    'http://proxy0:8888',
    'http://user:pass@proxy1:8888',
    'https://user:pass@proxy1:8888'
]

# -----------------------------------------------------------------------------
# ROTATED PROXY SETTINGS (MongoDB Backend)
# -----------------------------------------------------------------------------
# Required fields in each MongoDB document: scheme, ip, port, username, password
# Document example: {'scheme': 'http', 'ip': '10.0.0.1', 'port': 8080,
#                    'username': 'user', 'password': 'password'}
DOWNLOADER_MIDDLEWARES.update({
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
    'scrapy_rotated_proxy.downloadmiddlwares.proxy.RotatedProxyMiddleware': 750,
})
ROTATED_PROXY_ENABLED = True
PROXY_STORAGE = 'scrapy_rotated_proxy.extensions.mongodb_storage.MongoDBProxyStorage'
PROXY_MONGODB_HOST = HOST_OR_IP
PROXY_MONGODB_PORT = 27017
PROXY_MONGODB_USERNAME = USERNAME_OR_NONE
PROXY_MONGODB_PASSWORD = PASSWORD_OR_NONE
PROXY_MONGODB_STORAGE_URI = 'mongodb://{auth}{host}:{port}'.format(
    auth='{}:{}@'.format(PROXY_MONGODB_USERNAME, PROXY_MONGODB_PASSWORD)
    if PROXY_MONGODB_USERNAME else '',
    host=PROXY_MONGODB_HOST,
    port=PROXY_MONGODB_PORT
)
PROXY_MONGODB_STORAGE_DB = 'vps_management'
PROXY_MONGODB_STORAGE_COLL = 'service'
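The MongoDB backend stores each proxy as a document with the fields scheme, ip, port, username and password. As a rough sketch of how such a document maps onto a proxy URL of the form used in HTTP_PROXIES, consider the helper below. `proxy_url_from_document` is a hypothetical name, not a function exported by scrapy-rotated-proxy, and the percent-encoding of credentials is an assumption added for safety. The sketch targets Python 3 (`urllib.parse`).

```python
from urllib.parse import quote


def proxy_url_from_document(doc):
    """Build 'scheme://user:pass@ip:port' from a proxy document.

    Hypothetical helper; assumes the document carries the fields
    scheme, ip, port, username and password described in the settings.
    """
    auth = ''
    if doc.get('username'):
        # Percent-encode credentials so characters like '@' or ':' do not
        # break the URL (an assumption, not documented library behavior).
        auth = '{}:{}@'.format(
            quote(doc['username'], safe=''),
            quote(doc.get('password') or '', safe=''),
        )
    return '{scheme}://{auth}{ip}:{port}'.format(
        scheme=doc['scheme'], auth=auth, ip=doc['ip'], port=doc['port']
    )


doc = {'scheme': 'http', 'ip': '10.0.0.1', 'port': 8080,
       'username': 'user', 'password': 'password'}
url = proxy_url_from_document(doc)  # 'http://user:password@10.0.0.1:8080'
```

A document whose username is None or empty yields a URL without the credentials part, mirroring how PROXY_MONGODB_STORAGE_URI omits `auth` when no username is set.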
Download files
Source Distribution
File details
Details for the file scrapy-rotated-proxy-0.0.6.tar.gz.
File metadata
- Download URL: scrapy-rotated-proxy-0.0.6.tar.gz
- Upload date:
- Size: 9.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 522712f6dab1ee1bb3334cc633039b5f82b1d512e811e0a97c2370bd68ec8c8d
MD5 | e2614ba474733bdd87dce42b502b1420
BLAKE2b-256 | 976f3fbfcd0d592d9157f2a38b95e169fd34aa5145b4efce0800ea5a95a242a8
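If you download the source distribution manually, you may want to check it against the SHA256 digest listed above. The following is a small sketch using only the standard library; `sha256_of_file` and `matches_expected` are hypothetical helper names, and the file path passed in is whatever location you saved the archive to.

```python
import hashlib

# SHA256 digest published for scrapy-rotated-proxy-0.0.6.tar.gz.
EXPECTED_SHA256 = (
    '522712f6dab1ee1bb3334cc633039b5f82b1d512e811e0a97c2370bd68ec8c8d'
)


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file at `path` has the expected SHA256 digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected('scrapy-rotated-proxy-0.0.6.tar.gz')` would be called with the archive in the current directory; a False result means the file differs from the published one.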