Scrapy downloader middleware that rotates proxies and retries on Cloudflare/DataDome/PerimeterX bans.
scrapy-rotating-proxy-middleware
A drop-in Scrapy downloader middleware that rotates proxies and retries on bans — 403, 429, Cloudflare "Just a moment", DataDome, and PerimeterX challenges. Point it at a static proxy list or a single rotating gateway and your spider stops dying on blocks.
pip install scrapy-rotating-proxy-middleware
Why
Scrapy's built-in HttpProxyMiddleware assigns one proxy and never reacts when that exit IP gets blocked. In practice most anti-bot blocks aren't about your spider logic — they're about the IP and its TLS fingerprint being scored before your request reaches the page. This middleware:
- assigns a proxy per request (random from a list, or a rotating gateway),
- detects bans by status code and response-body signature (Cloudflare / DataDome / PerimeterX),
- transparently rotates to a fresh proxy and retries, with a per-request retry budget,
- moves inline user:pass credentials into the Proxy-Authorization header automatically.
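The credential handling in the last bullet can be illustrated with a minimal sketch. This is not the package's own code: `split_proxy_credentials` is a hypothetical helper, and it assumes HTTP Basic authentication, which is the usual scheme for `Proxy-Authorization`.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_proxy_credentials(proxy_url):
    """Return (proxy URL without credentials, Proxy-Authorization value or None).

    Hypothetical helper for illustration; assumes Basic authentication.
    """
    parts = urlsplit(proxy_url)
    if parts.username is None:
        # No inline credentials: nothing to move.
        return proxy_url, None

    # Percent-decode so escaped characters in the credentials survive.
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

    # Rebuild the network location from host and port only.
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal needs its brackets back
        host = f"[{host}]"
    netloc = f"{host}:{parts.port}" if parts.port else host
    clean = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return clean, f"Basic {token}"
```

In a downloader middleware, the first value would typically go into `request.meta["proxy"]` and the second into the request's `Proxy-Authorization` header.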
Setup
Enable it in settings.py and disable Scrapy's default proxy middleware:
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": None,
    "scrapy_rotating_proxy.middleware.RotatingProxyMiddleware": 610,
}
Option A — a rotating residential gateway (recommended)
A residential gateway gives you a new exit IP on every connection from a single URL, so you don't manage a list at all:
# settings.py
ROTATING_PROXY_GATEWAY = "http://USERNAME:PASSWORD@us.jibaoproxy.com:913"
Option B — a static proxy list
ROTATING_PROXY_LIST = [
    "http://USERNAME:PASSWORD@proxy-a.example.com:8000",
    "http://USERNAME:PASSWORD@proxy-b.example.com:8000",
    "socks5://USERNAME:PASSWORD@proxy-c.example.com:1080",
]
That's it — run your spider as usual.
Configuration
| Setting | Default | Description |
|---|---|---|
| ROTATING_PROXY_GATEWAY | – | Single rotating-gateway URL. |
| ROTATING_PROXY_LIST | – | List of proxy URLs (used if no gateway). |
| ROTATING_PROXY_BAN_CODES | 403, 407, 429, 503 | Status codes treated as bans. |
| ROTATING_PROXY_MAX_RETRIES | 5 | Proxy rotations per request before giving up. |
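As an illustration, overriding the two tunable settings might look like the fragment below. The value types are an assumption: the table gives the defaults but not the exact format the middleware expects, so a list of integers and a plain integer are used here.

```python
# settings.py
# Assumed value types (list of ints, int); check the package source if in doubt.
ROTATING_PROXY_BAN_CODES = [403, 429]  # e.g. stop treating 407 and 503 as bans
ROTATING_PROXY_MAX_RETRIES = 3         # fewer rotations than the default of 5
```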
Set a proxy on a single request explicitly and the middleware leaves it alone:
yield scrapy.Request(url, meta={"proxy": "http://USERNAME:PASSWORD@host:port"})
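The precedence described so far (an explicit per-request proxy first, then the gateway, then a random entry from the list) can be sketched as follows. `choose_proxy` is a hypothetical function written for illustration, not the middleware's actual code.

```python
import random


def choose_proxy(request_meta, gateway=None, proxy_list=None, rng=random):
    """Pick a proxy URL for one request, or None if nothing is configured.

    Hypothetical sketch of the selection order described in this README.
    """
    # An explicit per-request proxy is left alone.
    if request_meta.get("proxy"):
        return request_meta["proxy"]
    # The gateway takes precedence; the list is used only if no gateway is set.
    if gateway:
        return gateway
    if proxy_list:
        return rng.choice(proxy_list)
    return None
```

A real middleware would also need to tell a proxy it assigned itself apart from one set by the user, for example with a flag in `request.meta`, because a retry after a ban has to be free to pick a fresh proxy.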
Ban detection
A response counts as a ban when its status is in ROTATING_PROXY_BAN_CODES, or the first 4 KB of the body matches a known anti-bot signature (cf-chl, Just a moment, Attention Required, captcha-delivery/DataDome, px-captcha/PerimeterX). On a ban the request is re-scheduled with a fresh proxy and dont_filter=True, up to the retry budget.
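The ban check and retry budget described above can be approximated with the sketch below. The constants mirror the defaults and signatures listed in this README; the case-sensitive substring matching, the function names, and the `proxy_retry_times` meta key are assumptions made for illustration.

```python
BAN_CODES = {403, 407, 429, 503}
BAN_SIGNATURES = (
    b"cf-chl",              # Cloudflare challenge
    b"Just a moment",       # Cloudflare interstitial
    b"Attention Required",  # Cloudflare block page
    b"captcha-delivery",    # DataDome
    b"px-captcha",          # PerimeterX
)
SCAN_BYTES = 4096  # only the first 4 KB of the body is inspected
MAX_RETRIES = 5


def is_ban(status, body, ban_codes=BAN_CODES):
    """Return True if the status code or the start of the body looks like a ban."""
    if status in ban_codes:
        return True
    head = body[:SCAN_BYTES]
    return any(signature in head for signature in BAN_SIGNATURES)


def next_retry_count(meta, max_retries=MAX_RETRIES):
    """Return the incremented rotation count, or None once the budget is spent.

    "proxy_retry_times" is a hypothetical meta key.
    """
    used = meta.get("proxy_retry_times", 0)
    if used >= max_retries:
        return None
    return used + 1
```

When `next_retry_count` returns a number, a middleware would store it in the meta of a copy of the request, assign a fresh proxy, and return that copy with `dont_filter=True` so the duplicate filter does not drop it.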
If you keep hitting bans after rotation, the exit IPs themselves are the problem — datacenter ranges get scored as bot traffic at the ASN level. Residential exits with clean ASN reputation are what actually pass. We build JiBao Proxy for exactly this: 72M+ residential IPs across 200+ countries, sticky sessions, and SOCKS5/HTTP gateways. The middleware works with any provider, though.
Related
- Scrapy proxy middleware: the complete guide
- Why your JA3/TLS fingerprint gets you blocked
- Bypassing DataDome & PerimeterX in 2026
License
MIT