
scrapy-stealth

A pluggable anti-bot and stealth framework for Scrapy.


scrapy-stealth extends Scrapy with browser impersonation, proxy rotation, fingerprint cycling, and intelligent retry strategies — built for large-scale, production-grade crawling.


Features

  • Pluggable engine system (scrapy, stealth, or custom)
  • Browser impersonation (Chrome, Firefox, Safari, Edge, Opera — latest versions)
  • Per-request engine selection via request.meta
  • Proxy support and rotation
  • Browser fingerprint rotation
  • Smart retry logic that auto-escalates to stealth engine on block
  • Anti-bot detection (403/429 status codes + content keyword matching)
  • Thread-safe async integration

Installation

pip install scrapy-stealth

Requires Python 3.10+ and Scrapy 2.15+


Quick Start

1. Enable the middleware in settings.py

DOWNLOADER_MIDDLEWARES = {
    "scrapy_stealth.middlewares.stealth.StealthDownloaderMiddleware": 950,
}

2. Use it in your spider

By default, requests go through the standard Scrapy engine. To use the stealth engine, set engine in request.meta:

import scrapy

class MySpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                meta={"engine": "stealth"},
            )

    def parse(self, response):
        self.logger.info(f"Status: {response.status}")
        yield {"title": response.css("title::text").get()}

Per-Request Configuration

All stealth options are set via request.meta:

Key          Type  Description
engine       str   Engine to use: "scrapy" (default) or "stealth"
impersonate  str   Browser to impersonate: "chrome_137", "firefox_139", "safari_18_5", etc.
proxy        str   Proxy URL, e.g. "http://user:pass@host:port"

Example with all options

yield scrapy.Request(
    url="https://example.com",
    meta={
        "engine": "stealth",
        "impersonate": "firefox_139",
        "proxy": "http://user:pass@proxy-host:8080",
    },
)

Strategies

Proxy Rotation

Use ProxyRotator to randomly rotate proxies across requests:

from scrapy_stealth.strategies.proxy import ProxyRotator

proxy_strategy = ProxyRotator(proxies=[
    "http://proxy1:8080",
    "http://proxy2:8080",
    "http://proxy3:8080",
])

yield scrapy.Request(
    url="https://example.com",
    meta={
        "engine": "stealth",
        "proxy": proxy_strategy.get(),
    },
)
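Although the internals of ProxyRotator aren't shown here, random rotation of this kind takes only a few lines. The sketch below uses a hypothetical stand-in class (RandomProxyPool) rather than the library's actual implementation:

```python
import random


class RandomProxyPool:
    """Hypothetical stand-in illustrating random proxy rotation
    (not the actual ProxyRotator implementation)."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self.proxies = list(proxies)

    def get(self):
        # Uniform random choice: consecutive calls may return the
        # same proxy, which is acceptable for spreading load.
        return random.choice(self.proxies)


pool = RandomProxyPool(["http://proxy1:8080", "http://proxy2:8080"])
```

Swapping random.choice for itertools.cycle would give strict round-robin rotation instead of random selection.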

Fingerprint Rotation

Use ProfileRotator to randomly rotate the browser fingerprint:

from scrapy_stealth.strategies.fingerprint import ProfileRotator

fp = ProfileRotator()

yield scrapy.Request(
    url="https://example.com",
    meta={
        "engine": "stealth",
        "impersonate": fp.get(),  # randomly picks from latest Chrome, Firefox, Safari, Edge, Opera
    },
)

Intelligent Retry

Use RetryHandler in your spider or middleware to retry blocked responses with automatic engine escalation:

from scrapy_stealth.strategies.retry import RetryHandler

retry = RetryHandler()

# inside your spider class:
def parse(self, response):
    if retry.should_retry(response):  # triggers on 403, 429, 503
        yield retry.build(response.request)  # re-issues the request via the stealth engine
        return
    # normal parsing ...

Calling build automatically:

  • Increments retry_times in meta
  • Switches engine to "stealth"
  • Sets dont_filter=True to bypass Scrapy's duplicate filter
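The escalation those three steps describe can be sketched with plain dicts. This is a simplification: on a real scrapy.Request, dont_filter is a constructor argument rather than a meta key, so the sketch returns it separately.

```python
def escalate(meta):
    """Sketch of the retry escalation: bump the counter and force the
    stealth engine. Returns (new_meta, dont_filter)."""
    new_meta = dict(meta)
    new_meta["retry_times"] = meta.get("retry_times", 0) + 1
    new_meta["engine"] = "stealth"
    # dont_filter=True would be passed to scrapy.Request so the
    # duplicate filter does not drop the re-issued URL.
    return new_meta, True


new_meta, dont_filter = escalate({"engine": "scrapy"})
```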

Anti-Bot Detection

Use AntiBotDetector to classify responses as blocked:

from scrapy_stealth.detectors.antibot import AntiBotDetector

detector = AntiBotDetector()

# inside your spider class:
def parse(self, response):
    if detector.is_blocked(response):
        self.logger.warning("Blocked! Retrying...")
        # handle retry ...
        return
    # normal parsing ...

Detects blocks via:

  • HTTP status codes: 403, 429
  • Body keywords: "captcha", "access denied", "verify you are human"
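These two checks can be approximated in a few lines. The function below is a sketch of the idea, not the actual AntiBotDetector internals:

```python
BLOCK_STATUSES = {403, 429}
BLOCK_KEYWORDS = ("captcha", "access denied", "verify you are human")


def looks_blocked(status, body):
    """Return True when the status code or body text suggests an anti-bot block."""
    if status in BLOCK_STATUSES:
        return True
    text = body.lower()
    return any(keyword in text for keyword in BLOCK_KEYWORDS)
```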

Full Example Spider

import scrapy
from scrapy_stealth.strategies.proxy import ProxyRotator
from scrapy_stealth.strategies.fingerprint import ProfileRotator
from scrapy_stealth.strategies.retry import RetryHandler
from scrapy_stealth.detectors.antibot import AntiBotDetector

proxy_rotator = ProxyRotator(proxies=[
    "http://proxy1:8080",
    "http://proxy2:8080",
])
fp_rotator = ProfileRotator()
retry_handler = RetryHandler()
detector = AntiBotDetector()


class StealthSpider(scrapy.Spider):
    name = "stealth_example"
    start_urls = ["https://example.com"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                meta={
                    "engine": "stealth",
                    "impersonate": fp_rotator.get(),
                    "proxy": proxy_rotator.get(),
                },
            )

    def parse(self, response):
        if detector.is_blocked(response):
            self.logger.warning("Blocked response detected, retrying...")
            yield retry_handler.build(response.request)
            return

        yield {"title": response.css("title::text").get(), "url": response.url}

Supported Browsers for Impersonation

Value                 Browser
chrome_137            Chrome 137 (default)
chrome_136            Chrome 136
chrome_135            Chrome 135
chrome_134            Chrome 134
chrome_133            Chrome 133
chrome_132            Chrome 132
chrome_131            Chrome 131
chrome_130            Chrome 130
chrome_129            Chrome 129
firefox_139           Firefox 139
firefox_136           Firefox 136
firefox_135           Firefox 135
firefox_133           Firefox 133
firefox_private_136   Firefox 136 Private/Incognito
firefox_private_135   Firefox 135 Private/Incognito
firefox_android_135   Firefox Android 135
safari_18_5           Safari 18.5
safari_18_3_1         Safari 18.3.1
safari_18_3           Safari 18.3
safari_18_2           Safari 18.2
safari_18             Safari 18
safari_ios_18_1_1     Safari iOS 18.1.1
safari_ios_17_4_1     Safari iOS 17.4.1
safari_ios_17_2       Safari iOS 17.2
safari_ipad_18        Safari iPad 18
edge_134              Edge 134
edge_131              Edge 131
edge_127              Edge 127
edge_122              Edge 122
opera_119             Opera 119
opera_118             Opera 118
opera_117             Opera 117
opera_116             Opera 116
okhttp_5              OkHttp 5 (Android app)
okhttp_4_12           OkHttp 4.12 (Android app)
okhttp_4_10           OkHttp 4.10 (Android app)
okhttp_4_9            OkHttp 4.9 (Android app)
okhttp_3_14           OkHttp 3.14 (Android app)
okhttp_3_13           OkHttp 3.13 (Android app)
okhttp_3_11           OkHttp 3.11 (Android app)
okhttp_3_9            OkHttp 3.9 (Android app)

Requirements

  • Python 3.10+
  • scrapy >= 2.15.0

Contributing

Contributions are welcome! This is an open source project and all help is appreciated.

  1. Fork the repository on GitHub
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Make your changes and add tests if applicable
  4. Open a pull request describing what you changed and why

Ways to contribute:

  • Report bugs via GitHub Issues
  • Suggest new engines, strategies, or detectors
  • Improve documentation or examples
  • Add support for new browser fingerprints

Changelog

See CHANGELOG.md for a full history of changes, or browse GitHub Releases.


License

This project is licensed under the MIT License — free to use, modify, and distribute. See LICENSE for the full text.
