# scrapy-stealth

A pluggable anti-bot and stealth framework for Scrapy.

scrapy-stealth extends Scrapy with browser impersonation, proxy rotation, fingerprint cycling, and intelligent retry strategies — built for large-scale, production-grade crawling.
## Features

- Pluggable engine system (`scrapy`, `stealth`, or custom)
- Browser impersonation (Chrome, Firefox, Safari, Edge, Opera — latest versions)
- Per-request engine selection via `request.meta`
- Proxy support and rotation
- Browser fingerprint rotation
- Smart retry logic that auto-escalates to the stealth engine on block
- Anti-bot detection (403/429 status codes + content keyword matching)
- Thread-safe async integration
## Installation

```bash
pip install scrapy-stealth
```

Requires Python 3.10+ and Scrapy 2.15+.
## Quick Start

### 1. Enable the middleware in `settings.py`

```python
DOWNLOADER_MIDDLEWARES = {
    "scrapy_stealth.middlewares.stealth.StealthDownloaderMiddleware": 950,
}
```
### 2. Use it in your spider

By default, requests go through the standard Scrapy engine. To use the stealth engine, set `engine` in `request.meta`:

```python
import scrapy


class MySpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                meta={"engine": "stealth"},
            )

    def parse(self, response):
        self.logger.info(f"Status: {response.status}")
        yield {"title": response.css("title::text").get()}
```
## Per-Request Configuration

All stealth options are set via `request.meta`:

| Key | Type | Description |
|---|---|---|
| `engine` | `str` | Engine to use: `"scrapy"` (default) or `"stealth"` |
| `impersonate` | `str` | Browser to impersonate: `"chrome_137"`, `"firefox_139"`, `"safari_18_5"`, etc. |
| `proxy` | `str` | Proxy URL, e.g. `"http://user:pass@host:port"` |
### Example with all options

```python
yield scrapy.Request(
    url="https://example.com",
    meta={
        "engine": "stealth",
        "impersonate": "firefox_139",
        "proxy": "http://user:pass@proxy-host:8080",
    },
)
```
## Strategies

### Proxy Rotation

Use `ProxyRotator` to randomly rotate proxies across requests:

```python
from scrapy_stealth.strategies.proxy import ProxyRotator

proxy_strategy = ProxyRotator(proxies=[
    "http://proxy1:8080",
    "http://proxy2:8080",
    "http://proxy3:8080",
])

yield scrapy.Request(
    url="https://example.com",
    meta={
        "engine": "stealth",
        "proxy": proxy_strategy.get(),
    },
)
```
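Under the hood, rotation of this kind is just random selection over a pool. A minimal stdlib stand-in with the same `get()` interface, useful for understanding the pattern (the class name and constructor here are illustrative, not the library's actual code):

```python
import random


class RandomProxyPool:
    """Illustrative stand-in for ProxyRotator: one random proxy per call."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._proxies = list(proxies)

    def get(self):
        # random.choice spreads requests across the pool; swap in
        # itertools.cycle for strict round-robin instead.
        return random.choice(self._proxies)
```

Each call to `get()` is independent, so the same proxy may repeat back-to-back; a round-robin variant avoids that at the cost of being more predictable to the target site.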
### Fingerprint Rotation

Use `ProfileRotator` to randomly rotate the browser fingerprint:

```python
from scrapy_stealth.strategies.fingerprint import ProfileRotator

fp = ProfileRotator()

yield scrapy.Request(
    url="https://example.com",
    meta={
        "engine": "stealth",
        "impersonate": fp.get(),  # randomly picks from latest Chrome, Firefox, Safari, Edge, Opera
    },
)
```
### Intelligent Retry

Use `RetryHandler` in your spider or middleware to retry blocked responses with automatic engine escalation:

```python
from scrapy_stealth.strategies.retry import RetryHandler

retry = RetryHandler()

def parse(self, response):
    if retry.should_retry(response):  # triggers on 403, 429, 503
        yield retry.build(response.request)  # retries via stealth engine
        return
    # normal parsing ...
```

`build` automatically:

- Increments `retry_times` in meta
- Switches `engine` to `"stealth"`
- Sets `dont_filter=True` to bypass Scrapy's duplicate filter
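In plain terms, `build` returns a copy of the blocked request with its metadata escalated. A dict-based sketch of just that meta transformation (the function name is illustrative; the real handler operates on `scrapy.Request` objects):

```python
def escalate_meta(meta: dict) -> dict:
    """Sketch of the meta changes listed above, using a plain dict stand-in."""
    escalated = dict(meta)  # never mutate the original request's meta
    escalated["retry_times"] = escalated.get("retry_times", 0) + 1
    escalated["engine"] = "stealth"
    # In real Scrapy, dont_filter is a Request constructor argument rather
    # than a meta key; it appears here only to mirror the list above.
    escalated["dont_filter"] = True
    return escalated
```

Because `retry_times` accumulates across escalations, a spider can cap total retries by checking it before yielding the rebuilt request.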
### Anti-Bot Detection

Use `AntiBotDetector` to classify responses as blocked:

```python
from scrapy_stealth.detectors.antibot import AntiBotDetector

detector = AntiBotDetector()

def parse(self, response):
    if detector.is_blocked(response):
        self.logger.warning("Blocked! Retrying...")
        # handle retry ...
        return
    # normal parsing ...
```

Detects blocks via:

- HTTP status codes: `403`, `429`
- Body keywords: `"captcha"`, `"access denied"`, `"verify you are human"`
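The same rules can be expressed in a few lines of plain Python. A hedged sketch of equivalent detection logic (the constants and function name are illustrative, not the detector's actual internals):

```python
# Mirror of the detection rules listed above: status check first, then
# case-insensitive keyword matching against the response body.
BLOCK_STATUSES = {403, 429}
BLOCK_KEYWORDS = ("captcha", "access denied", "verify you are human")


def looks_blocked(status: int, body: str) -> bool:
    if status in BLOCK_STATUSES:
        return True
    lowered = body.lower()
    return any(keyword in lowered for keyword in BLOCK_KEYWORDS)
```

Note that keyword matching on its own can misfire, for example on a page that merely discusses CAPTCHAs, so status codes are the stronger signal of the two.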
## Full Example Spider

```python
import scrapy

from scrapy_stealth.strategies.proxy import ProxyRotator
from scrapy_stealth.strategies.fingerprint import ProfileRotator
from scrapy_stealth.strategies.retry import RetryHandler
from scrapy_stealth.detectors.antibot import AntiBotDetector

proxy_rotator = ProxyRotator(proxies=[
    "http://proxy1:8080",
    "http://proxy2:8080",
])
fp_rotator = ProfileRotator()
retry_handler = RetryHandler()
detector = AntiBotDetector()


class StealthSpider(scrapy.Spider):
    name = "stealth_example"
    start_urls = ["https://example.com"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                meta={
                    "engine": "stealth",
                    "impersonate": fp_rotator.get(),
                    "proxy": proxy_rotator.get(),
                },
            )

    def parse(self, response):
        if detector.is_blocked(response):
            self.logger.warning("Blocked response detected, retrying...")
            yield retry_handler.build(response.request)
            return
        yield {"title": response.css("title::text").get(), "url": response.url}
```
## Supported Browsers for Impersonation

| Value | Browser |
|---|---|
| `chrome_137` | Chrome 137 (default) |
| `chrome_136` | Chrome 136 |
| `chrome_135` | Chrome 135 |
| `chrome_134` | Chrome 134 |
| `chrome_133` | Chrome 133 |
| `chrome_132` | Chrome 132 |
| `chrome_131` | Chrome 131 |
| `chrome_130` | Chrome 130 |
| `chrome_129` | Chrome 129 |
| `firefox_139` | Firefox 139 |
| `firefox_136` | Firefox 136 |
| `firefox_135` | Firefox 135 |
| `firefox_133` | Firefox 133 |
| `firefox_private_136` | Firefox 136 Private/Incognito |
| `firefox_private_135` | Firefox 135 Private/Incognito |
| `firefox_android_135` | Firefox Android 135 |
| `safari_18_5` | Safari 18.5 |
| `safari_18_3_1` | Safari 18.3.1 |
| `safari_18_3` | Safari 18.3 |
| `safari_18_2` | Safari 18.2 |
| `safari_18` | Safari 18 |
| `safari_ios_18_1_1` | Safari iOS 18.1.1 |
| `safari_ios_17_4_1` | Safari iOS 17.4.1 |
| `safari_ios_17_2` | Safari iOS 17.2 |
| `safari_ipad_18` | Safari iPad 18 |
| `edge_134` | Edge 134 |
| `edge_131` | Edge 131 |
| `edge_127` | Edge 127 |
| `edge_122` | Edge 122 |
| `opera_119` | Opera 119 |
| `opera_118` | Opera 118 |
| `opera_117` | Opera 117 |
| `opera_116` | Opera 116 |
| `okhttp_5` | OkHttp 5 (Android app) |
| `okhttp_4_12` | OkHttp 4.12 (Android app) |
| `okhttp_4_10` | OkHttp 4.10 (Android app) |
| `okhttp_4_9` | OkHttp 4.9 (Android app) |
| `okhttp_3_14` | OkHttp 3.14 (Android app) |
| `okhttp_3_13` | OkHttp 3.13 (Android app) |
| `okhttp_3_11` | OkHttp 3.11 (Android app) |
| `okhttp_3_9` | OkHttp 3.9 (Android app) |
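Any value from this table can be passed directly as `impersonate`. To pin rotation to a subset, say recent desktop Chrome builds, a plain `random.choice` over the desired values works without any library support (whether `ProfileRotator` itself accepts a custom pool is not documented here):

```python
import random

# Values taken from the table above; each is a valid impersonate string.
CHROME_PROFILES = ["chrome_137", "chrome_136", "chrome_135", "chrome_134"]


def pick_chrome_profile() -> str:
    return random.choice(CHROME_PROFILES)
```

The result plugs straight into a request: `meta={"engine": "stealth", "impersonate": pick_chrome_profile()}`.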
## Requirements

- Python 3.10+
- `scrapy >= 2.15.0`
## Contributing

Contributions are welcome! This is an open source project and all help is appreciated.

1. Fork the repository on GitHub
2. Create a feature branch: `git checkout -b feature/my-feature`
3. Make your changes and add tests if applicable
4. Open a pull request describing what you changed and why

Ways to contribute:

- Report bugs via GitHub Issues
- Suggest new engines, strategies, or detectors
- Improve documentation or examples
- Add support for new browser fingerprints
## Changelog

See CHANGELOG.md for a full history of changes, or browse GitHub Releases.

## License

This project is licensed under the MIT License — free to use, modify, and distribute. See LICENSE for the full text.