Scrapy proxy pool that allows custom proxy providers
A Scrapy middleware to manage a custom proxy pool and randomly choose a proxy for each request.
Installation
pip install scrapy-custom-proxy-pool
Usage
In settings.py:
DOWNLOADER_MIDDLEWARES = {
    # ...
    'scrapy_custom_proxy_pool.middlewares.ProxyPoolMiddleware': 610,
    'scrapy_custom_proxy_pool.middlewares.BanDetectionMiddleware': 620,
    # ...
}
PROXY_POOL_ENABLED = True
PROXY_POOL_SIZE = 10
PROXY_POOL_COLLECTOR = 'myproject.collectors.MyCollector'
PROXY_POOL_COLLECTOR_ARGS = {}
PROXY_POOL_REFRESH_INTERVAL = 60
PROXY_POOL_PAGE_RETRY_TIMES = 5
PROXY_POOL_TRY_WITH_HOST = True
Settings
PROXY_POOL_ENABLED - Enable/disable proxy pool (default: False)
PROXY_POOL_SIZE - Max proxies in pool (default: 10)
PROXY_POOL_COLLECTOR - Custom proxy collector path (required)
PROXY_POOL_COLLECTOR_ARGS - Collector initialization arguments (default: {})
PROXY_POOL_REFRESH_INTERVAL - Proxy refresh interval in seconds (default: 60)
PROXY_POOL_PAGE_RETRY_TIMES - Max retry times per page (default: 5)
PROXY_POOL_TRY_WITH_HOST - Also try requests directly from the host, without a proxy (default: True)
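The options above can also be overridden for a single spider instead of project-wide. The following is a minimal sketch, assuming the middleware reads its options from the crawler settings (this README does not confirm that). The name `PROXY_POOL_OVERRIDES` and all values are illustrative only.

```python
# Illustrative per-spider overrides for the pool options listed above.
# The values are examples, not recommendations.
PROXY_POOL_OVERRIDES = {
    "PROXY_POOL_ENABLED": True,
    "PROXY_POOL_SIZE": 25,               # keep a larger pool for a big crawl
    "PROXY_POOL_REFRESH_INTERVAL": 300,  # refresh every 5 minutes
    "PROXY_POOL_PAGE_RETRY_TIMES": 3,    # give up on a page sooner
}
```

In a real project this dict would be assigned to the spider's `custom_settings` class attribute, which Scrapy merges over the values in settings.py.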
Customization
Ban Detection Policy
You can customize the ban detection policy:
# myproject/policy.py
from scrapy_custom_proxy_pool.policy import BanDetectionPolicy

class MyPolicy(BanDetectionPolicy):
    # override response_is_ban and exception_is_ban
    pass

# settings.py
PROXY_POOL_BAN_POLICY = 'myproject.policy.MyPolicy'
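A fuller policy might look like the sketch below. It is not the package's own implementation: the method signatures `(request, response)` and `(request, exception)` are assumptions, the base class is a local stand-in so the example runs without the package installed, and the status codes and exception types are illustrative choices.

```python
class BanDetectionPolicy:
    """Stand-in for scrapy_custom_proxy_pool.policy.BanDetectionPolicy,
    defined here only so the sketch runs on its own."""

    def response_is_ban(self, request, response):
        return False

    def exception_is_ban(self, request, exception):
        return False


class MyPolicy(BanDetectionPolicy):
    # Status codes treated as a ban in this sketch (an assumption).
    BAN_STATUSES = {403, 407, 429, 503}

    def response_is_ban(self, request, response):
        # Assumed signature: (request, response) -> bool.
        if response.status in self.BAN_STATUSES:
            return True
        # Treat an empty 200 body as a soft block (illustrative heuristic).
        return response.status == 200 and not response.body

    def exception_is_ban(self, request, exception):
        # Assumed signature: (request, exception) -> bool.
        # Built-in exception types are used here; a real crawl would more
        # likely see Twisted or Scrapy exception classes.
        return isinstance(exception, (TimeoutError, ConnectionError))
```

The policy only needs `response.status` and `response.body`, so it can be unit-tested with simple fake objects before wiring it into a crawl.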
Proxy Collector
You need to implement a custom proxy collector by subclassing ProxyCollector:
# myproject/collectors.py
from scrapy_custom_proxy_pool.collectors import ProxyCollector

class MyCollector(ProxyCollector):
    def fetch_proxies(self, count):
        # return list of `Proxy` objects
        raise NotImplementedError
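As an illustration, a collector backed by a fixed list of addresses could be sketched as follows. The `Proxy` dataclass and the `ProxyCollector` base here are hypothetical stand-ins so the example runs without the package; the real `Proxy` type and its constructor may differ. The `addresses` argument and the choice to return at most `count` proxies are also assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proxy:
    """Hypothetical stand-in for the package's Proxy type."""
    host: str
    port: int


class ProxyCollector:
    """Stand-in for scrapy_custom_proxy_pool.collectors.ProxyCollector."""

    def fetch_proxies(self, count):
        raise NotImplementedError


class StaticListCollector(ProxyCollector):
    def __init__(self, addresses=()):
        # addresses: iterable of "host:port" strings (assumed argument).
        self.addresses = list(addresses)

    def fetch_proxies(self, count):
        # Return at most `count` proxies, in the order they were supplied.
        proxies = []
        for address in self.addresses[:count]:
            host, _, port = address.rpartition(":")
            proxies.append(Proxy(host=host, port=int(port)))
        return proxies
```

If the package passes PROXY_POOL_COLLECTOR_ARGS to the collector's constructor as keyword arguments (an assumption), a setting such as `PROXY_POOL_COLLECTOR_ARGS = {'addresses': ['10.0.0.1:8080']}` would configure this collector.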
License
MIT