Scrapy proxy pool that allows a custom proxy provider
A Scrapy middleware to manage a custom proxy pool and randomly choose a proxy for each request.
Installation
pip install scrapy-custom-proxy-pool
Usage
In settings.py:
DOWNLOADER_MIDDLEWARES = {
    # ...
    'scrapy_custom_proxy_pool.middlewares.ProxyPoolMiddleware': 610,
    'scrapy_custom_proxy_pool.middlewares.BanDetectionMiddleware': 620,
    # ...
}
PROXY_POOL_ENABLED = True
PROXY_POOL_SIZE = 10
PROXY_POOL_COLLECTOR = 'myproject.collectors.MyCollector'
PROXY_POOL_COLLECTOR_ARGS = {}
PROXY_POOL_REFRESH_INTERVAL = 60
PROXY_POOL_PAGE_RETRY_TIMES = 5
PROXY_POOL_TRY_WITH_HOST = True
Settings
PROXY_POOL_ENABLED - Enable/disable proxy pool (default: False)
PROXY_POOL_SIZE - Max proxies in pool (default: 10)
PROXY_POOL_COLLECTOR - Custom proxy collector path (required)
PROXY_POOL_COLLECTOR_ARGS - Collector initialization arguments (default: {})
PROXY_POOL_REFRESH_INTERVAL - Proxy refresh interval in seconds (default: 60)
PROXY_POOL_PAGE_RETRY_TIMES - Maximum number of retries per page (default: 5)
PROXY_POOL_TRY_WITH_HOST - Try requests directly from the host, without a proxy (default: True)
Customization
Ban Detection Policy
You can customize the ban detection policy:
# myproject/policy.py
from scrapy_custom_proxy_pool.policy import BanDetectionPolicy

class MyPolicy(BanDetectionPolicy):
    # Override response_is_ban and exception_is_ban here.
    pass

# settings.py
PROXY_POOL_BAN_POLICY = 'myproject.policy.MyPolicy'
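As a rough illustration of what such a policy might look like, the sketch below flags captcha pages as bans on top of whatever the base policy already checks. It assumes that response_is_ban takes (request, response) and returns a bool, as in the upstream scrapy-proxy-pool project; check the package source before relying on that signature. The class name CaptchaAwarePolicy, the BAN_MARKER value, and the fallback base class are hypothetical and exist only so the example runs without the package installed.

```python
from types import SimpleNamespace

try:
    from scrapy_custom_proxy_pool.policy import BanDetectionPolicy
except ImportError:
    # Hypothetical stand-in so the sketch runs without the package installed.
    # The method signatures are an assumption, not taken from the package.
    class BanDetectionPolicy:
        def response_is_ban(self, request, response):
            return response.status not in (200, 301, 302)

        def exception_is_ban(self, request, exception):
            return True


class CaptchaAwarePolicy(BanDetectionPolicy):
    """Treat captcha pages as bans, in addition to the base checks."""

    BAN_MARKER = b"captcha"  # hypothetical marker; adjust for the target site

    def response_is_ban(self, request, response):
        # Keep the base behaviour first, then add the site-specific check.
        if super().response_is_ban(request, response):
            return True
        return self.BAN_MARKER in response.body.lower()


# Quick check with fake responses (only .status and .body are used here).
policy = CaptchaAwarePolicy()
ok_page = SimpleNamespace(status=200, body=b"<html>hello</html>")
captcha_page = SimpleNamespace(status=200, body=b"Please solve the CAPTCHA")
print(policy.response_is_ban(None, ok_page))
print(policy.response_is_ban(None, captcha_page))
```

exception_is_ban can be overridden in the same way. To use the policy, point PROXY_POOL_BAN_POLICY at its import path.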
Proxy Collector
You need to implement a custom proxy collector by subclassing ProxyCollector:
from scrapy_custom_proxy_pool.collectors import ProxyCollector

class MyCollector(ProxyCollector):
    def fetch_proxies(self, count):
        # Return a list of `Proxy` objects.
        raise NotImplementedError
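To make the shape of a collector more concrete, here is a minimal sketch that serves proxies from a fixed list of "host:port" strings. It does not import the package: Proxy and ProxyCollector below are hypothetical stand-ins, because this README does not specify the real Proxy fields or the collector constructor. The addresses argument is also an assumption about how PROXY_POOL_COLLECTOR_ARGS might be passed in. In a real project, replace the stand-ins with the package's own classes.

```python
from collections import namedtuple

# Hypothetical stand-ins; the real classes come from scrapy_custom_proxy_pool
# and may have different fields and constructor arguments.
Proxy = namedtuple("Proxy", ["host", "port"])


class ProxyCollector:
    def fetch_proxies(self, count):
        raise NotImplementedError


class StaticListCollector(ProxyCollector):
    """Serve proxies from a fixed list of 'host:port' strings."""

    def __init__(self, addresses=()):
        # Assumption: collector arguments arrive as keyword arguments.
        self.addresses = list(addresses)

    def fetch_proxies(self, count):
        proxies = []
        # Return at most `count` proxies, in list order.
        for address in self.addresses[:count]:
            host, _, port = address.rpartition(":")
            proxies.append(Proxy(host=host, port=int(port)))
        return proxies


collector = StaticListCollector(
    addresses=["10.0.0.1:8080", "10.0.0.2:3128", "10.0.0.3:80"]
)
for proxy in collector.fetch_proxies(2):
    print(proxy.host, proxy.port)
```

A collector backed by a paid provider or an internal API would follow the same pattern, with the list lookup replaced by a call to that service.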
License
MIT