scrapy-stealth
Stealthy Crawling. Maximum Results.
A pluggable anti-bot and stealth framework for Scrapy.
scrapy-stealth extends Scrapy with browser impersonation, proxy rotation, fingerprint cycling, and intelligent retry strategies —
designed for large-scale, production-grade crawling.
🧠 Why scrapy-stealth?
Scrapy is fast and powerful, but modern websites use advanced anti-bot protections such as:
- TLS fingerprinting
- Browser behavior detection
- Rate limiting and IP blocking
scrapy-stealth helps by adding:
- 🧬 Browser-level impersonation (TLS + HTTP2 fingerprints)
- 🔁 Smarter retry strategies
- 🌐 Proxy and fingerprint rotation
- 🛡️ Anti-bot detection
Result
- Higher success rate
- Lower proxy cost
- More stable crawls
✨ Features
- 🔌 Pluggable engine system (`scrapy`, `stealth`)
- 🧠 Per-request engine selection via `request.meta`
- 🌐 Proxy support and rotation
- 🧬 Browser fingerprint rotation
- 🔁 Smart retry logic (manual integration)
- 🛡️ Advanced anti-bot detection (status- and content-based; Cloudflare, Akamai)
- ⚡ Thread-safe async integration
📦 Installation
```shell
pip install scrapy-stealth
```
Requires Python 3.10+ and Scrapy 2.15+
⚙️ Setup
Option 1 — Global (settings.py)
```python
# 1. Enable the middleware
DOWNLOADER_MIDDLEWARES = {
    "scrapy_stealth.middlewares.stealth.StealthDownloaderMiddleware": 950,
}

# 2. (Optional) Proxy list for automatic rotation
# Used when request.meta["rotate_proxy"] = True
# Supported schemes: http, https, socks4, socks5
# Each entry must include a scheme and port
STEALTH_PROXIES = [
    "http://proxy1:8080",
    "http://proxy2:8080",
    "http://user:pass@proxy3:8080",  # with authentication
    "socks5://proxy4:1080",
]
```
Option 2 — Per-spider (custom_settings)
Configure the middleware and proxies directly on the spider — no changes to settings.py required.
Each spider can have its own independent proxy list.
```python
class MySpider(scrapy.Spider):
    name = "example"
    custom_settings = {
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_stealth.middlewares.stealth.StealthDownloaderMiddleware": 950,
        },
        "STEALTH_PROXIES": [
            "http://proxy1:8080",
            "http://user:pass@proxy2:8080",
            "socks5://proxy3:1080",
        ],
    }
```
Proxies are validated at startup; an invalid format or unsupported scheme raises `ValueError` immediately.
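The kind of check this validation performs can be approximated with a few lines of standard-library code. This is an illustrative sketch of the rules stated above (scheme and port required), not the library's actual implementation:

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https", "socks4", "socks5"}

def validate_proxy(url: str) -> None:
    """Raise ValueError for a proxy URL with an unsupported scheme or no port."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"Unsupported proxy scheme: {url!r}")
    if parsed.port is None:
        raise ValueError(f"Proxy URL must include a port: {url!r}")

validate_proxy("http://user:pass@proxy3:8080")  # passes
validate_proxy("socks5://proxy4:1080")          # passes
```

Failing fast at startup means a typo in one proxy entry surfaces before any requests are sent, rather than as silent mid-crawl errors.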
🚀 Quick Start
```python
yield scrapy.Request(
    url="https://example.com",
    meta={
        "engine": "stealth",
    },
)
```
⚙️ Per-Request Configuration
All options are passed via request.meta:
| Key | Type | Description |
|---|---|---|
| `engine` | str | `"scrapy"` (default) or `"stealth"` |
| `impersonate` | str | Browser profile (e.g. `"chrome_137"`, `"safari_ios_18_1_1"`) |
| `proxy` | str | Explicit proxy URL |
| `stealth_timeout` | int | Per-request timeout in seconds (overrides the 30 s default) |
| `rotate_proxy` | bool | Auto-pick a proxy from `STEALTH_PROXIES` |
| `rotate_profile` | bool | Auto-pick a random browser profile |
🔁 Automatic Rotation
```python
yield scrapy.Request(
    url,
    meta={
        "engine": "stealth",
        "rotate_proxy": True,
        "rotate_profile": True,
    },
)
```
🧩 Strategies
Proxy Rotation
```python
from scrapy_stealth.strategies.proxy import ProxyRotator

proxy_rotator = ProxyRotator([
    "http://proxy1:8080",
    "http://proxy2:8080",
])

yield scrapy.Request(
    url,
    meta={
        "engine": "stealth",
        "proxy": proxy_rotator.get(),
    },
)
```
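If you want the same pattern without the helper, a round-robin rotator is a few lines of standard-library code. This sketch illustrates the likely behaviour of a `.get()`-style rotator; it is not `ProxyRotator`'s actual implementation:

```python
from itertools import cycle

class RoundRobinRotator:
    """Cycle through a fixed proxy list, one entry per call to get()."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy list must not be empty")
        self._cycle = cycle(proxies)

    def get(self) -> str:
        return next(self._cycle)

rotator = RoundRobinRotator(["http://proxy1:8080", "http://proxy2:8080"])
rotator.get()  # "http://proxy1:8080"
rotator.get()  # "http://proxy2:8080"
rotator.get()  # "http://proxy1:8080" again
```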
Fingerprint Rotation
```python
from scrapy_stealth.strategies.fingerprint import ProfileRotator

fp = ProfileRotator()

yield scrapy.Request(
    url,
    meta={
        "engine": "stealth",
        "impersonate": fp.get(),
    },
)
```
Intelligent Retry
```python
from scrapy_stealth.strategies.retry import RetryHandler

retry = RetryHandler()

def parse(self, response):
    if retry.should_retry(response):
        yield retry.build(response.request)
        return
```
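The decision logic behind a handler like this is typically a retryable-status check plus an attempt cap, with rotation enabled on the retried request. The statuses, cap, and meta keys below are illustrative assumptions, not `RetryHandler`'s internals:

```python
# Hypothetical values: tune the status set and cap for your targets.
RETRYABLE_STATUSES = {403, 429, 500, 502, 503}
MAX_RETRIES = 3

def should_retry(status: int, attempt: int) -> bool:
    """Retry blocked or transient responses until the attempt cap is hit."""
    return status in RETRYABLE_STATUSES and attempt < MAX_RETRIES

def build_retry_meta(attempt: int) -> dict:
    """On retry, rotate both proxy and fingerprint to change identity."""
    return {
        "engine": "stealth",
        "rotate_proxy": True,
        "rotate_profile": True,
        "retry_attempt": attempt + 1,
    }
```

Rotating identity on each retry is the point: repeating the failed request with the same proxy and fingerprint usually just gets blocked again.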
🛡️ Anti-Bot Detection
```python
from scrapy_stealth.detectors.antibot import AntiBotDetector

detector = AntiBotDetector()
if detector.is_blocked(response):
    print("Blocked!")
```
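Status + content-based detection of this kind usually combines blocked status codes with well-known challenge phrases in the response body. A minimal self-contained sketch; the marker list and logic are assumptions for illustration, not `AntiBotDetector`'s internals:

```python
BLOCK_STATUSES = {403, 429, 503}
# Hypothetical markers: phrases commonly seen on challenge pages.
BLOCK_MARKERS = ("just a moment", "cf-chl", "access denied", "captcha")

def looks_blocked(status: int, body: str) -> bool:
    """Heuristic: a blocked status code, or a challenge phrase in the body."""
    if status in BLOCK_STATUSES:
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)
```

Checking the body matters because some anti-bot systems return HTTP 200 with an interstitial challenge page, which a status-only check would miss.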
📊 Example
```python
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"

    def start_requests(self):
        yield scrapy.Request(
            "https://example.com",
            meta={
                "engine": "stealth",
                "rotate_proxy": True,
                "rotate_profile": True,
            },
        )

    def parse(self, response):
        yield {
            "title": response.css("title::text").get(),
            "url": response.url,
        }
```
⚡ Performance Insight
Using stealth selectively:
- ⚡ Faster crawling (Scrapy for simple pages)
- 💰 Lower proxy cost
- 🛡️ Better success rate on protected pages
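One way to apply the stealth engine selectively is a small helper that picks the request meta per URL, so only hosts known to block plain Scrapy pay the stealth and proxy overhead. The domain list here is a hypothetical example you would build from your own crawl logs:

```python
from urllib.parse import urlparse

# Hypothetical: hosts observed to block plain Scrapy requests.
PROTECTED_DOMAINS = {"shop.example.com", "tickets.example.com"}

def meta_for(url: str) -> dict:
    """Use the stealth engine only where needed; plain Scrapy elsewhere."""
    host = urlparse(url).hostname or ""
    if host in PROTECTED_DOMAINS:
        return {"engine": "stealth", "rotate_proxy": True, "rotate_profile": True}
    return {}  # default Scrapy engine, no proxy cost
```

In a spider you would then write `yield scrapy.Request(url, meta=meta_for(url))` and let unprotected pages take the fast path.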
📜 Changelog
See CHANGELOG.md for a full history of changes, or browse GitHub Releases.
🤝 Contributing
See CONTRIBUTING.md for guidelines on how to contribute.
📄 License
This project is licensed under the MIT License — free to use, modify, and distribute. See LICENSE for the full text.