Stealthy Crawling. Maximum Results. A pluggable anti-bot and stealth framework for Scrapy.

Maintainer: fawadss1

scrapy-stealth


scrapy-stealth extends Scrapy with browser impersonation, proxy rotation, fingerprint cycling, and intelligent retry strategies — designed for large-scale, production-grade crawling.

🧠 Why scrapy-stealth?

Scrapy is fast and powerful, but modern websites use advanced anti-bot protections such as:

TLS fingerprinting
Browser behavior detection
Rate limiting and IP blocking

scrapy-stealth helps by adding:

🧬 Browser-level impersonation (TLS + HTTP/2 fingerprints)
🔁 Smarter retry strategies
🌐 Proxy and fingerprint rotation
🛡️ Anti-bot detection

Result

Higher success rate
Lower proxy cost
More stable crawls

📊 Comparison

Feature	scrapy-stealth	scrapy-impersonate	scrapy-playwright	scrapy-splash	Scrapy (default)
TLS fingerprint spoofing	✅	✅	❌	❌	❌
HTTP/2 support	✅	✅	✅	❌	❌
Browser impersonation	✅	✅	⚠️ partial	❌	❌
Proxy rotation (built-in)	✅	❌	❌	❌	❌
Fingerprint rotation	✅	❌	❌	❌	❌
Anti-bot detection	✅	❌	❌	❌	❌
Smart retry logic	✅	❌	❌	❌	❌
Per-request engine switching	✅	❌	❌	❌	❌
Headless browser required	⚠️ optional (browser driver only)	❌	✅	✅	❌
JavaScript rendering	✅ (browser driver)	❌	✅	✅	❌
Screenshot / snapshot	✅ (browser driver)	❌	✅	✅	❌
Native Scrapy integration	✅	✅	✅	✅	✅
Memory footprint	🟢 Low (🔴 High with browser driver)	🟢 Low	🔴 High	🔴 High	🟢 Low

⚠️ scrapy-playwright passes real browser TLS but does not spoof fingerprint profiles like scrapy-stealth does. scrapy-impersonate provides TLS/HTTP2 impersonation via curl_cffi but lacks built-in rotation, detection, or per-request engine switching. JavaScript rendering is available via the optional browser driver — use it selectively for pages that require a full browser.

✨ Features

🔌 Pluggable engine system (scrapy, stealth)
🧠 Per-request engine selection via request.meta
🌐 Proxy support and rotation
🧬 Browser fingerprint rotation
🔁 Smart retry logic
🛡️ Anti-bot detection (status + content-based, Cloudflare, Akamai)
⚡ Thread-safe async integration
🖥️ Real-browser engine (CDP) for JS-heavy pages
📸 Built-in snapshot decorator (scrapy_stealth.decorators.snapshot)

📦 Installation

pip install scrapy-stealth

Requires Python 3.11+ and Scrapy 2.12 or newer (2.x series)

⚙️ Setup

Option 1 — Global (`settings.py`)

# 1. Enable the middleware
DOWNLOADER_MIDDLEWARES = {
    "scrapy_stealth.StealthDownloaderMiddleware": 950,
}

# 2. (Optional) Route ALL requests through stealth automatically — no meta needed per request
STEALTH_ENABLED = True
STEALTH_DRIVER  = "turbo"   # "basic" (default), "turbo", or "browser"

# 3. (Optional) Proxy list for automatic rotation
#    Used by any stealth request that sets rotate_proxy=True in its meta,
#    whether stealth is enabled per request or globally via STEALTH_ENABLED
#    Supported schemes: http, https, socks4, socks5
STEALTH_PROXIES = [
    "http://proxy1:8080",
    "http://proxy2:8080",
    "http://user:pass@proxy3:8080",  # with authentication
    "socks5://proxy4:1080",
]

Option 2 — Per-spider (`custom_settings`)

Configure the middleware and all stealth settings directly on the spider — no changes to settings.py required.

import scrapy


class MySpider(scrapy.Spider):
    name = "example"

    custom_settings = {
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_stealth.StealthDownloaderMiddleware": 950,
        },
        "STEALTH_ENABLED": True,
        "STEALTH_DRIVER": "turbo",
        "STEALTH_PROXIES": [
            "http://proxy1:8080",
            "http://user:pass@proxy2:8080",
            "socks5://proxy3:1080",
        ],
    }

Proxies are validated at startup — invalid format or unsupported scheme raises ValueError immediately.
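The exact validation rules are not spelled out in this README. As a rough sketch, assuming only the four schemes listed in the STEALTH_PROXIES comment are accepted, a fail-fast startup check could look like the following. `validate_proxy` and `validate_proxies` are hypothetical helpers for illustration, not part of scrapy-stealth.

```python
from urllib.parse import urlsplit

# Schemes listed as supported in the STEALTH_PROXIES comment (assumed to be the full set).
SUPPORTED_SCHEMES = frozenset({"http", "https", "socks4", "socks5"})


def validate_proxy(url: str) -> str:
    """Raise ValueError if url is not a usable proxy URL (hypothetical rules)."""
    parts = urlsplit(url)
    # Error messages avoid echoing the URL, which may contain credentials.
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("proxy URL has no host")
    try:
        _ = parts.port  # accessing .port raises ValueError for a non-numeric or out-of-range port
    except ValueError as exc:
        raise ValueError("proxy URL has an invalid port") from exc
    return url


def validate_proxies(urls: list[str]) -> list[str]:
    # Fail fast: the first bad entry raises ValueError before any request is made.
    return [validate_proxy(u) for u in urls]
```

Running a check like this once, when settings are loaded, surfaces a typo in the proxy list immediately instead of as scattered download errors mid-crawl.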

🚀 Quick Start

Option A — Per-request (stealth only on specific requests):

yield scrapy.Request(
    url="https://example.com",
    meta={"stealth": {}},
)

Option B — Global mode (stealth on every request automatically):

# settings.py or custom_settings
STEALTH_ENABLED = True
STEALTH_DRIVER  = "turbo"

# No meta needed — all requests go through stealth
yield scrapy.Request(url="https://example.com")

# Opt out for a specific request
yield scrapy.Request(url="https://api.internal/health", meta={"stealth": False})

🔧 Global Configuration

Customise package-wide defaults via the shared config instance. Apply every setting at module level (for example, above the spider class): the engine client is created when the middleware is initialised, so changes made inside start_requests or parse have no effect.

# myspider.py
import scrapy
from scrapy_stealth.config import config

config.DEFAULT_ENGINE  = "stealth"      # "scrapy" (native) or "stealth" (browser impersonation)
config.DEFAULT_PROFILE = "chrome_147"   # browser profile when meta["stealth"]["profile"] is not set
config.DEFAULT_TIMEOUT = 30             # stealth request timeout in seconds
config.STEALTH_DRIVER  = "turbo"        # "basic" (default), "turbo", or "browser"
config.HTTP2           = True           # False for servers that only support HTTP/1.1
config.BLOCK_CODES    |= {407}          # extend blocked status codes (|= keeps defaults)
config.BLOCK_KEYWORDS.append("banned")  # extend blocked body-text patterns
config.BROWSER_HEADLESS = True          # browser driver: headless mode (False = visible window, more stealthy)
config.BROWSER_SETTLE_S = 4.0          # browser driver: seconds to wait after navigation for JS to finish


class MySpider(scrapy.Spider):
    name = "example"
    ...

# ❌ wrong — too late, the engine client is already created
class MySpider(scrapy.Spider):
    def start_requests(self):
        config.HTTP2 = False  # has no effect
        ...

You can also read any value programmatically:

config.get("DEFAULT_ENGINE")          # "scrapy" unless overridden
config.get("MISSING_KEY", "default")  # "default"

Attribute	Type	Default	Description
`DEFAULT_ENGINE`	`str`	`"scrapy"`	Engine used when `request.meta["stealth"]` key is absent
`DEFAULT_PROFILE`	`str`	`"chrome_147"`	Browser profile used when none is specified
`DEFAULT_TIMEOUT`	`int`	`30`	Request timeout in seconds
`STEALTH_DRIVER`	`str`	`"basic"`	Default driver: `"basic"`, `"turbo"`, or `"browser"`. Also readable from Scrapy settings as `STEALTH_DRIVER`
`HTTP2`	`bool`	`True`	HTTP/2 mode; overridable per-request via `meta["stealth"]["http2"]`
`BLOCK_CODES`	`frozenset[int]`	`frozenset({403, 429, 503})`	HTTP status codes considered blocked
`BLOCK_KEYWORDS`	`list[str]`	`["captcha", "access denied", …]`	Body-text patterns considered blocked
`BROWSER_HEADLESS`	`bool`	`True`	Browser driver: headless mode (`False` = visible window, more stealthy)
`BROWSER_SETTLE_S`	`float`	`4.0`	Browser driver: seconds to wait after navigation for JS to finish rendering
`BROWSER_NO_SANDBOX`	`bool | None`	`None`	Browser driver: disable the Chrome sandbox (`--no-sandbox`). `None` = auto-detect: no-sandbox is applied when running as root (e.g. in Docker)
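Why a change inside start_requests is ignored can be sketched in plain Python. The `Config` and `EngineClient` classes below are hypothetical stand-ins, not scrapy-stealth's real classes; they only model the documented behaviour that the engine client reads its settings once, when the middleware is initialised.

```python
from dataclasses import dataclass
from typing import Any


class Config:
    """Hypothetical stand-in for scrapy_stealth.config.config."""

    def __init__(self) -> None:
        self.DEFAULT_ENGINE = "scrapy"
        self.DEFAULT_TIMEOUT = 30
        self.HTTP2 = True

    def get(self, key: str, default: Any = None) -> Any:
        # Attribute lookup with a fallback, like config.get(...) above.
        return getattr(self, key, default)


@dataclass(frozen=True)
class EngineClient:
    """Hypothetical engine client: copies its settings once, when created."""

    http2: bool
    timeout: int

    @classmethod
    def from_config(cls, cfg: Config) -> "EngineClient":
        return cls(http2=cfg.HTTP2, timeout=cfg.DEFAULT_TIMEOUT)


config = Config()
config.HTTP2 = False                       # module level: runs before the client exists
client = EngineClient.from_config(config)  # stands in for middleware initialisation
config.HTTP2 = True                        # like a change in start_requests: too late
```

After this runs, `client.http2` is still `False`: the late assignment updates `config`, but the client already copied the earlier value.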

For one-off overrides on a single request, set meta["stealth"]["driver"] or meta["stealth"]["http2"] (see Per-Request Configuration below).

⚙️ Per-Request Configuration

All options are passed via request.meta["stealth"].

The presence of meta["stealth"] (a dict) activates the stealth engine. Omit the key to use the default engine (config.DEFAULT_ENGINE, native Scrapy unless changed). When STEALTH_ENABLED = True, all requests are stealth by default; pass meta={"stealth": False} to opt out for a specific request.

yield scrapy.Request(
    url,
    meta={
        "stealth": {
            "driver": "turbo",
            "profile": "chrome_147",
            "proxy": "http://user:pass@proxy:8080",
            "stealth_timeout": 60,
            "http2": True,
            "rotate_proxy": True,
            "rotate_profile": True,
        }
    },
)

Key	Type	Description
`driver`	`str`	`"basic"`, `"turbo"`, or `"browser"` — overrides `config.STEALTH_DRIVER` per-request
`profile`	`str`	Browser profile (e.g. `"chrome_147"`, `"safari_ios_18_1_1"`)
`proxy`	`str`	Explicit proxy URL
`stealth_timeout`	`int`	Per-request timeout in seconds (overrides `config.DEFAULT_TIMEOUT`, 30 by default)
`http2`	`bool`	`True` = HTTP/2, `False` = HTTP/1.1 (overrides `config.HTTP2` for this request)
`rotate_proxy`	`bool`	Auto-pick a proxy from `STEALTH_PROXIES`
`rotate_profile`	`bool`	Auto-pick a random browser profile
`headless`	`bool`	Browser driver only: `True` = headless, `False` = visible window (more stealthy); overrides `config.BROWSER_HEADLESS`
`settle`	`float`	Browser driver only: seconds to wait for JS after navigation (overrides `config.BROWSER_SETTLE_S`, 4.0 by default)
`snapshot`	`bool`	Browser driver only: capture a PNG snapshot — result available as `response.meta["snapshot_content"]` (`bytes`)

🖥️ Browser Engine

For sites protected by Cloudflare JS challenges or heavy JavaScript rendering, use the browser driver. It runs a real Chrome instance via the DevTools Protocol (no WebDriver), keeping one persistent browser and opening a new tab per request.

Per-request (most common):

yield scrapy.Request(
    url,
    meta={
        "stealth": {
            "driver": "browser",
            "headless": False,   # visible window — harder to detect (default: True)
            "settle": 4.0,       # seconds to wait for JS after page load
        }
    },
)

Heavy Cloudflare sites — increase settle time:

meta={"stealth": {"driver": "browser", "headless": False, "settle": 12}}

Global default (all stealth requests use browser engine):

from scrapy_stealth.config import config

config.STEALTH_DRIVER   = "browser"
config.BROWSER_HEADLESS = False   # more stealthy
config.BROWSER_SETTLE_S = 6.0    # longer wait for JS

Docker (running as root):

Chrome requires --no-sandbox when the process runs as root. scrapy-stealth detects this automatically, but you can also set it explicitly in settings.py:

BROWSER_NO_SANDBOX = True   # force no-sandbox (Docker, any root environment)

Or via config:

config.BROWSER_NO_SANDBOX = True

Performance note: the browser engine is slower than basic/turbo (~5-15s per page vs <2s). Use it selectively — route only JS-protected URLs to "browser" and keep everything else on "turbo".
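One way to follow this advice is to choose the driver per URL when building requests. In this sketch, `JS_PROTECTED_HOSTS` and `stealth_meta_for` are hypothetical; the host set is whatever you have observed serving JS challenges, and the returned dicts use only the documented meta keys.

```python
from urllib.parse import urlsplit

# Hypothetical: hosts you have seen serving JS challenges. Maintain your own list.
JS_PROTECTED_HOSTS = {"shop.example.com", "tickets.example.org"}


def stealth_meta_for(url: str) -> dict:
    """Build request meta that sends only JS-protected hosts to the browser driver."""
    host = urlsplit(url).hostname or ""
    if host in JS_PROTECTED_HOSTS:
        return {"stealth": {"driver": "browser", "settle": 6.0}}
    return {"stealth": {"driver": "turbo"}}
```

In a spider this becomes yield scrapy.Request(url, meta=stealth_meta_for(url)), so only the slow path pays the browser cost.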

📸 Screenshots

Capture a PNG screenshot of any page rendered by the browser driver and save it to disk.

Enable on the request

yield scrapy.Request(
    url,
    meta={
        "stealth": {
            "driver": "browser",
            "snapshot": True,
        }
    },
    callback=self.parse,
)

The raw PNG bytes are available at response.meta["snapshot_content"] inside your callback.

Auto-save with `snapshot` decorator

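import scrapy
from scrapy_stealth.decorators import snapshot


class MySpider(scrapy.Spider):
    name = "example"

    # Three alternative forms; use one per callback.

    @snapshot                                   # default save location
    def parse(self, response): ...

    @snapshot(path="stealth_shots/page.png")    # fixed file path
    def parse_fixed(self, response): ...

    @snapshot(path=lambda r: r.url.rstrip("/").split("/")[-1] + ".png")  # path built from the response URL
    def parse_dynamic(self, response): ...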

Note: the decorator requires "driver": "browser" and "snapshot": True in meta["stealth"]. If the response carries no snapshot data, it logs an error.
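How a decorator like snapshot can work both bare and with a path argument is sketched below. `save_snapshot` is a hypothetical re-implementation for illustration only: its default file name ("snapshot.png") is an assumption, and the real decorator may differ in where it saves and what it logs.

```python
import functools
import logging
from pathlib import Path
from typing import Any, Callable, Union

logger = logging.getLogger(__name__)
PathSpec = Union[str, Callable[[Any], str]]


def save_snapshot(func=None, *, path: PathSpec = "snapshot.png"):
    """Hypothetical decorator: write response.meta["snapshot_content"] to disk, then run the callback."""

    def decorator(callback):
        @functools.wraps(callback)
        def wrapper(self, response, *args, **kwargs):
            shot = response.meta.get("snapshot_content")
            if shot is None:
                logger.error("No snapshot data in response for %s", response.url)
            else:
                # path may be a fixed string or a function of the response.
                target = Path(path(response) if callable(path) else path)
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(shot)
            return callback(self, response, *args, **kwargs)

        return wrapper

    # Bare use (@save_snapshot) receives the callback directly;
    # parametrised use (@save_snapshot(path=...)) receives only keyword arguments.
    return decorator(func) if func is not None else decorator
```

The `func=None` plus keyword-only `path` signature is the usual way to support both decorator forms with one function.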

Custom handling (without the built-in helper)

The screenshot is just bytes in response.meta["snapshot_content"] — do anything you like with it:

def parse(self, response):
    shot: bytes | None = response.meta.get("snapshot_content")
    if shot is None:
        return  # screenshot was not requested or capture failed

    # Save manually
    with open("page.png", "wb") as f:
        f.write(shot)

    # Pass to a pipeline via item
    yield {"url": response.url, "screenshot": shot}

🔁 Automatic Rotation

yield scrapy.Request(
    url,
    meta={
        "stealth": {
            "rotate_proxy": True,
            "rotate_profile": True,
        }
    },
)

🧩 Strategies

Proxy Rotation

from scrapy_stealth.strategies.proxy import ProxyRotator

proxy_rotator = ProxyRotator([
    "http://proxy1:8080",
    "http://proxy2:8080",
])

yield scrapy.Request(
    url,
    meta={
        "stealth": {
            "proxy": proxy_rotator.get(),
        }
    },
)
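This README does not say which selection policy ProxyRotator uses. For illustration, here is a minimal thread-safe round-robin rotator; `RoundRobinRotator` is hypothetical and is not the library's class.

```python
import itertools
import threading


class RoundRobinRotator:
    """Hypothetical thread-safe round-robin rotator (illustrative, not ProxyRotator)."""

    def __init__(self, items: list[str]) -> None:
        if not items:
            raise ValueError("need at least one item to rotate")
        self._cycle = itertools.cycle(items)
        self._lock = threading.Lock()  # guard next() when several threads share one rotator

    def get(self) -> str:
        with self._lock:
            return next(self._cycle)
```

Calling get() while building meta fixes the proxy when the request is created, not when it is downloaded or retried.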

Fingerprint Rotation

from scrapy_stealth.strategies.fingerprint import ProfileRotator

fp = ProfileRotator()

yield scrapy.Request(
    url,
    meta={
        "stealth": {
            "profile": fp.get(),
        }
    },
)

Intelligent Retry

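import scrapy
from scrapy_stealth.strategies.retry import RetryHandler

retry = RetryHandler()


class MySpider(scrapy.Spider):
    name = "example"
    # Let blocked status codes reach the callback; Scrapy's HttpErrorMiddleware
    # drops non-2xx responses before parse() otherwise.
    handle_httpstatus_list = [403, 429, 503]

    def parse(self, response):
        if retry.should_retry(response):
            yield retry.build(response.request)
            return
        # ... normal parsing continues here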

🛡️ Anti-Bot Detection

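from scrapy_stealth.detectors.antibot import AntiBotDetector

detector = AntiBotDetector()


def parse(self, response):
    # Inside a spider callback. Status-based blocks only reach callbacks
    # whose status codes are listed in handle_httpstatus_list.
    if detector.is_blocked(response):
        self.logger.warning("Blocked: %s", response.url)
        return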
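A status-plus-keyword check like the one described under Features can be sketched as follows. `looks_blocked` is hypothetical; it uses the default BLOCK_CODES and only the two BLOCK_KEYWORDS entries listed in the configuration table (the full list is elided there), and it does not reproduce any vendor-specific Cloudflare or Akamai checks.

```python
# Defaults from the Global Configuration table; the keyword list there is
# truncated, so only the two patterns it shows are used here.
BLOCK_CODES = frozenset({403, 429, 503})
BLOCK_KEYWORDS = ("captcha", "access denied")


def looks_blocked(status: int, body: str,
                  codes: frozenset[int] = BLOCK_CODES,
                  keywords: tuple[str, ...] = BLOCK_KEYWORDS) -> bool:
    """Status code first, then a case-insensitive body-text scan (hypothetical sketch)."""
    if status in codes:
        return True
    text = body.lower()
    return any(k.lower() in text for k in keywords)
```

With a Scrapy text response, call it as looks_blocked(response.status, response.text); the keyword scan catches block pages served with a 200 status.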

📊 Example

import scrapy


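class ExampleSpider(scrapy.Spider):
    name = "example"
    custom_settings = {
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_stealth.StealthDownloaderMiddleware": 950,
        },
        # rotate_proxy below picks from this list
        "STEALTH_PROXIES": ["http://proxy1:8080", "http://proxy2:8080"],
    }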

    def start_requests(self):
        yield scrapy.Request(
            "https://example.com",
            meta={
                "stealth": {
                    "rotate_proxy": True,
                    "rotate_profile": True,
                }
            },
        )

    def parse(self, response):
        yield {
            "title": response.css("title::text").get(),
            "url": response.url,
        }

⚡ Performance Insight

Routing only protected pages through stealth, and leaving simple pages on the native Scrapy engine, gives:

⚡ Faster crawling (Scrapy for simple pages)
💰 Lower proxy cost
🛡️ Better success rate on protected pages

📜 Changelog

See CHANGELOG.md for a full history of changes, or browse GitHub Releases.

🤝 Contributing

See CONTRIBUTING.md for guidelines on how to contribute.

📄 License

This project is licensed under the MIT License — free to use, modify, and distribute. See LICENSE for the full text.


Release history

Version	Date	Notes
0.6.9a2	Jun 18, 2026	pre-release
0.6.9a1	Jun 18, 2026	pre-release
0.6.8	Jun 18, 2026	
0.6.8b2	Jun 18, 2026	pre-release
0.6.8b1	Jun 16, 2026	pre-release
0.6.8a1	Jun 12, 2026	pre-release
0.6.7	Jun 10, 2026	
0.6.7a4	Jun 9, 2026	pre-release
0.6.7a3	Jun 9, 2026	pre-release
0.6.7a2	Jun 9, 2026	pre-release
0.6.7a1	Jun 9, 2026	pre-release
0.6.6	Jun 8, 2026	
0.6.6a10	Jun 8, 2026	pre-release
0.6.6a9	Jun 5, 2026	pre-release
0.6.6a8	Jun 5, 2026	pre-release
0.6.6a7	Jun 5, 2026	pre-release
0.6.6a5	Jun 4, 2026	pre-release
0.6.6a4	Jun 4, 2026	pre-release
0.6.6a3	Jun 4, 2026	pre-release
0.6.6a2	Jun 4, 2026	pre-release (this version)
0.6.6a1	Jun 4, 2026	pre-release
0.6.5	Jun 1, 2026	
0.6.4	May 21, 2026	
0.6.3	May 20, 2026	
0.6.2	May 20, 2026	
0.6.2a3	May 19, 2026	pre-release
0.6.2a2	May 19, 2026	pre-release
0.6.2a1	May 19, 2026	pre-release
0.6.1	May 18, 2026	
0.6.0	May 12, 2026	
0.5.0	May 11, 2026	
0.4.0	May 6, 2026	
0.3.0	May 1, 2026	
0.2.1	Apr 27, 2026	
0.2.0	Apr 23, 2026	
0.1.0	Apr 23, 2026	

Download files


Source Distribution

scrapy_stealth-0.6.6a2.tar.gz (45.7 kB)

Uploaded Jun 4, 2026 Source

Built Distribution


scrapy_stealth-0.6.6a2-py3-none-any.whl (38.9 kB)

Uploaded Jun 4, 2026 Python 3

File details

Details for the file scrapy_stealth-0.6.6a2.tar.gz.

File metadata

Download URL: scrapy_stealth-0.6.6a2.tar.gz
Upload date: Jun 4, 2026
Size: 45.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for scrapy_stealth-0.6.6a2.tar.gz
Algorithm	Hash digest
SHA256	`facbacf51003874801694c6846b02d4261cd63d696efefc4339a1a8ad0fa0ebb`
MD5	`616d6deddebecb1e21fa820df90ccd1e`
BLAKE2b-256	`b0a90b241d841d0d6b93c6a0b37cf7822a93cc3d76d1747ad566a9435c5dfca4`


Provenance

The following attestation bundles were made for scrapy_stealth-0.6.6a2.tar.gz:

Publisher: publish.yml on fawadss1/scrapy-stealth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: scrapy_stealth-0.6.6a2.tar.gz
- Subject digest: facbacf51003874801694c6846b02d4261cd63d696efefc4339a1a8ad0fa0ebb
- Sigstore transparency entry: 1719000614
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: fawadss1/scrapy-stealth@389c39ab72f8e9a5ff29639953cf4637110b96a7
- Branch / Tag: refs/tags/v0.6.6a2
- Owner: https://github.com/fawadss1
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@389c39ab72f8e9a5ff29639953cf4637110b96a7
- Trigger Event: release

File details

Details for the file scrapy_stealth-0.6.6a2-py3-none-any.whl.

File metadata

Download URL: scrapy_stealth-0.6.6a2-py3-none-any.whl
Upload date: Jun 4, 2026
Size: 38.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for scrapy_stealth-0.6.6a2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`45cc97eb28a5e54dc9299e517b505debd7928073917474e9d5d2a974b741f437`
MD5	`9090f21411f4930c2377c21f37f92ac9`
BLAKE2b-256	`6e0e2202b6b2cc938b5256128cd7ef06ccd14ce294a23bf12594ac0bfe0ebf1f`


Provenance

The following attestation bundles were made for scrapy_stealth-0.6.6a2-py3-none-any.whl:

Publisher: publish.yml on fawadss1/scrapy-stealth

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: scrapy_stealth-0.6.6a2-py3-none-any.whl
- Subject digest: 45cc97eb28a5e54dc9299e517b505debd7928073917474e9d5d2a974b741f437
- Sigstore transparency entry: 1719000848
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: fawadss1/scrapy-stealth@389c39ab72f8e9a5ff29639953cf4637110b96a7
- Branch / Tag: refs/tags/v0.6.6a2
- Owner: https://github.com/fawadss1
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@389c39ab72f8e9a5ff29639953cf4637110b96a7
- Trigger Event: release
