

Project description

scrapy-stealth logo

scrapy-stealth

Stealthy Crawling. Maximum Results.

A pluggable anti-bot and stealth framework for Scrapy.


scrapy-stealth extends Scrapy with browser impersonation, proxy rotation, fingerprint cycling, and intelligent retry strategies, designed for large-scale, production-grade crawling.


🧠 Why scrapy-stealth?

Scrapy is fast and powerful, but modern websites use advanced anti-bot protections such as:

  • TLS fingerprinting
  • Browser behavior detection
  • Rate limiting and IP blocking

scrapy-stealth helps by adding:

  • 🧬 Browser-level impersonation (TLS + HTTP/2 fingerprints)
  • 🔁 Smarter retry strategies
  • 🌐 Proxy and fingerprint rotation
  • 🛡️ Anti-bot detection

Result

  • Higher success rate
  • Lower proxy cost
  • More stable crawls

📊 Comparison

Feature                      | scrapy-stealth | scrapy-playwright | scrapy-splash | scrapy-selenium | Scrapy (default)
-----------------------------|----------------|-------------------|---------------|-----------------|-----------------
TLS fingerprint spoofing     | ✅             | ❌                | ❌            | ❌              | ❌
HTTP/2 support               | ✅             | ✅                | ❌            | ❌              | ❌
Browser impersonation        | ✅             | ⚠️ partial        | ❌            | ❌              | ❌
Proxy rotation (built-in)    | ✅             | ❌                | ❌            | ❌              | ❌
Fingerprint rotation         | ✅             | ❌                | ❌            | ❌              | ❌
Anti-bot detection           | ✅             | ❌                | ❌            | ❌              | ❌
Smart retry logic            | ✅             | ❌                | ❌            | ❌              | ❌
Per-request engine switching | ✅             | ❌                | ❌            | ❌              | ❌
Headless browser required    | ❌             | ✅                | ✅            | ✅              | ❌
JavaScript rendering         | ❌             | ✅                | ✅            | ✅              | ❌
Native Scrapy integration    | ✅             | ✅                | ✅            | ⚠️ partial      | ✅
Memory footprint             | 🟢 Low         | 🔴 High           | 🔴 High       | 🔴 High         | 🟢 Low

⚠️ scrapy-playwright passes a real browser's TLS handshake but does not spoof fingerprint profiles the way scrapy-stealth does. scrapy-stealth does not render JavaScript; use it for APIs and HTML pages that don't require a full browser.


✨ Features

  • 🔌 Pluggable engine system (scrapy, stealth)
  • 🧠 Per-request engine selection via request.meta
  • 🌐 Proxy support and rotation
  • 🧬 Browser fingerprint rotation
  • 🔁 Smart retry logic
  • 🛡️ Anti-bot detection (status + content-based, Cloudflare, Akamai)
  • ⚡ Thread-safe async integration

📦 Installation

pip install scrapy-stealth

Requires Python 3.11+ and Scrapy 2.15+


⚙️ Setup

Option 1: Global (settings.py)

# 1. Enable the middleware
DOWNLOADER_MIDDLEWARES = {
    "scrapy_stealth.middlewares.stealth.StealthDownloaderMiddleware": 950,
}

# 2. (Optional) Proxy list for automatic rotation
#    Used when request.meta["rotate_proxy"] = True
#    Supported schemes: http, https, socks4, socks5
#    Each entry must include a scheme and port
STEALTH_PROXIES = [
    "http://proxy1:8080",
    "http://proxy2:8080",
    "http://user:pass@proxy3:8080",  # with authentication
    "socks5://proxy4:1080",
]

Option 2: Per-spider (custom_settings)

Configure the middleware and proxies directly on the spider; no changes to settings.py are required. Each spider can have its own independent proxy list.

class MySpider(scrapy.Spider):
    name = "example"

    custom_settings = {
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_stealth.middlewares.stealth.StealthDownloaderMiddleware": 950,
        },
        "STEALTH_PROXIES": [
            "http://proxy1:8080",
            "http://user:pass@proxy2:8080",
            "socks5://proxy3:1080",
        ],
    }

Proxies are validated at startup; an invalid format or unsupported scheme raises ValueError immediately.
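
For intuition, that startup check can be sketched with the standard library. The names `validate_proxy` and `ALLOWED_SCHEMES` below are hypothetical, not part of the scrapy-stealth API; only the rules (supported scheme, explicit port) come from this README:

```python
from urllib.parse import urlparse

# Schemes the README lists as supported for STEALTH_PROXIES.
ALLOWED_SCHEMES = {"http", "https", "socks4", "socks5"}


def validate_proxy(url: str) -> str:
    """Raise ValueError for an unsupported scheme or a missing port."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"Unsupported proxy scheme: {url!r}")
    if parsed.port is None:
        raise ValueError(f"Proxy URL must include a port: {url!r}")
    return url


validate_proxy("socks5://proxy4:1080")       # OK
# validate_proxy("ftp://proxy:21")           # raises ValueError
# validate_proxy("http://proxy-no-port")     # raises ValueError
```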


🚀 Quick Start

yield scrapy.Request(
    url="https://example.com",
    meta={
        "engine": "stealth",
    },
)

🔧 Global Configuration

Customise package-wide defaults via the shared config instance. All settings must be applied at module level, before the spider class: the engine client is created at middleware initialisation, so changes inside start_requests or parse have no effect.

# myspider.py
import scrapy
from scrapy_stealth.config import config

config.DEFAULT_ENGINE  = "stealth"      # "scrapy" (native) or "stealth" (browser impersonation)
config.DEFAULT_PROFILE = "chrome_147"   # browser profile when meta["profile"] is not set
config.DEFAULT_TIMEOUT = 30             # stealth request timeout in seconds
config.STEALTH_DRIVER  = "turbo"        # "basic" (default) or "turbo" (deeper TLS fingerprinting)
config.HTTP2           = True           # False for servers that only support HTTP/1.1
config.BLOCK_CODES    |= {407}          # extend blocked status codes (|= keeps defaults)
config.BLOCK_KEYWORDS.append("banned")  # extend blocked body-text patterns


class MySpider(scrapy.Spider):
    name = "example"
    ...


# ❌ wrong: too late, the engine client is already created
class MySpider(scrapy.Spider):
    def start_requests(self):
        config.HTTP2 = False  # has no effect
        ...

You can also read any value programmatically:

config.get("DEFAULT_ENGINE")          # "scrapy"
config.get("MISSING_KEY", "default")  # "default"

Attribute       | Type           | Default                         | Description
----------------|----------------|---------------------------------|------------
DEFAULT_ENGINE  | str            | "scrapy"                        | Engine used when request.meta["engine"] is absent
DEFAULT_PROFILE | str            | "chrome_147"                    | Browser profile used when none is specified
DEFAULT_TIMEOUT | int            | 30                              | Request timeout in seconds
STEALTH_DRIVER  | str            | "basic"                         | Default driver for the stealth engine: "basic" or "turbo"
HTTP2           | bool           | True                            | HTTP/2 mode; overridable per-request via meta["http2"]
BLOCK_CODES     | frozenset[int] | {403, 429, 503}                 | HTTP status codes considered blocked
BLOCK_KEYWORDS  | list[str]      | ["captcha", "access denied", …] | Body-text patterns considered blocked
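
For illustration, the get accessor behaves like attribute lookup with a fallback. The class below is a minimal stand-in sketching that assumed shape, not the real scrapy_stealth.config implementation:

```python
class StealthConfig:
    """Illustrative stand-in for the shared config instance (assumed shape)."""
    DEFAULT_ENGINE = "scrapy"
    DEFAULT_PROFILE = "chrome_147"
    DEFAULT_TIMEOUT = 30

    def get(self, key, default=None):
        # Fall back to `default` when the attribute does not exist.
        return getattr(self, key, default)


config = StealthConfig()
config.get("DEFAULT_ENGINE")          # "scrapy"
config.get("MISSING_KEY", "default")  # "default"
```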

For one-off overrides on a single request, use request.meta["driver"] or meta["http2"] instead (see Per-Request Configuration below).


⚙️ Per-Request Configuration

All options are passed via request.meta:

Key             | Type | Description
----------------|------|------------
engine          | str  | "scrapy" (default) or "stealth"
driver          | str  | "basic" (default) or "turbo"; overrides config.STEALTH_DRIVER per-request
profile         | str  | Browser profile (e.g. "chrome_147", "safari_ios_18_1_1")
proxy           | str  | Explicit proxy URL
stealth_timeout | int  | Per-request timeout in seconds (overrides the 30 s default)
http2           | bool | True = HTTP/2, False = HTTP/1.1 (overrides config.HTTP2 for this request)
rotate_proxy    | bool | Auto-pick a proxy from STEALTH_PROXIES
rotate_profile  | bool | Auto-pick a random browser profile

🔁 Automatic Rotation

yield scrapy.Request(
    url,
    meta={
        "engine": "stealth",
        "rotate_proxy": True,
        "rotate_profile": True,
    },
)

🧩 Strategies

Proxy Rotation

from scrapy_stealth.strategies.proxy import ProxyRotator

proxy_rotator = ProxyRotator([
    "http://proxy1:8080",
    "http://proxy2:8080",
])

yield scrapy.Request(
    url,
    meta={
        "engine": "stealth",
        "proxy": proxy_rotator.get(),
    },
)
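
If you want rotation logic outside the middleware, the behaviour can be sketched in a few lines. This stand-in is modelled on ProxyRotator's usage above, but it is not the library class, and the round-robin order is an assumption (the README does not state the rotation policy):

```python
from itertools import cycle


class RoundRobinRotator:
    """Illustrative round-robin rotator, shaped like ProxyRotator's get() API."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("At least one proxy is required")
        self._cycle = cycle(proxies)

    def get(self):
        # Each call yields the next proxy, wrapping around at the end.
        return next(self._cycle)


rotator = RoundRobinRotator(["http://proxy1:8080", "http://proxy2:8080"])
rotator.get()  # "http://proxy1:8080"
rotator.get()  # "http://proxy2:8080"
rotator.get()  # wraps around to "http://proxy1:8080"
```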

Fingerprint Rotation

from scrapy_stealth.strategies.fingerprint import ProfileRotator

fp = ProfileRotator()

yield scrapy.Request(
    url,
    meta={
        "engine": "stealth",
        "profile": fp.get(),
    },
)
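
A ProfileRotator-style picker can likewise be sketched with the standard library. The profile names reuse the two examples from this README (the library's full list may differ), and the random-choice strategy is an assumption:

```python
import random

# Profile names taken from examples in this README; not the full library list.
PROFILES = ["chrome_147", "safari_ios_18_1_1"]


class RandomProfileRotator:
    """Illustrative random-profile picker, shaped like ProfileRotator's get() API."""

    def __init__(self, profiles=PROFILES, seed=None):
        self._profiles = list(profiles)
        self._rng = random.Random(seed)  # seedable for reproducible tests

    def get(self):
        return self._rng.choice(self._profiles)


fp = RandomProfileRotator(seed=0)
fp.get()  # one of the names in PROFILES
```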

Intelligent Retry

from scrapy_stealth.strategies.retry import RetryHandler

retry = RetryHandler()


def parse(self, response):
    if retry.should_retry(response):
        yield retry.build(response.request)
        return
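
The decision behind such a handler can be sketched against the default blocked statuses listed in the config table ({403, 429, 503}). The function names and the exponential backoff policy here are illustrative, not RetryHandler's actual internals:

```python
# Default blocked status codes, per the config table in this README.
BLOCK_CODES = {403, 429, 503}


def should_retry(status: int, attempt: int, max_retries: int = 3) -> bool:
    """Retry only blocked statuses, and only while attempts remain."""
    return status in BLOCK_CODES and attempt < max_retries


def backoff_seconds(attempt: int, base: float = 1.0) -> float:
    """Exponential backoff: 1 s, 2 s, 4 s, ..."""
    return base * (2 ** attempt)


should_retry(429, attempt=0)  # True: blocked status, retries remain
should_retry(200, attempt=0)  # False: not a blocked status
backoff_seconds(2)            # 4.0
```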

🛡️ Anti-Bot Detection

from scrapy_stealth.detectors.antibot import AntiBotDetector

detector = AntiBotDetector()

if detector.is_blocked(response):
    print("Blocked!")
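
Conceptually, status- and content-based detection combines the BLOCK_CODES and BLOCK_KEYWORDS checks from the config table. The sketch below is a self-contained stand-in, not the AntiBotDetector implementation, and the keyword list is abbreviated:

```python
# Defaults from the config table; BLOCK_KEYWORDS is abbreviated here.
BLOCK_CODES = {403, 429, 503}
BLOCK_KEYWORDS = ["captcha", "access denied"]


def is_blocked(status: int, body: str) -> bool:
    """Flag a response as blocked by status code or by body-text pattern."""
    if status in BLOCK_CODES:
        return True
    lowered = body.lower()
    return any(keyword in lowered for keyword in BLOCK_KEYWORDS)


is_blocked(403, "")                           # True  (status-based)
is_blocked(200, "Please solve the CAPTCHA")   # True  (content-based)
is_blocked(200, "<html>ok</html>")            # False
```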

📊 Example

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"

    def start_requests(self):
        yield scrapy.Request(
            "https://example.com",
            meta={
                "engine": "stealth",
                "rotate_proxy": True,
                "rotate_profile": True,
            },
        )

    def parse(self, response):
        yield {
            "title": response.css("title::text").get(),
            "url": response.url,
        }

⚡ Performance Insight

Using stealth selectively:

  • ⚡ Faster crawling (Scrapy for simple pages)
  • 💰 Lower proxy cost
  • 🛡️ Better success rate on protected pages
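
One way to apply stealth selectively is to decide the engine per URL before building the request. This helper is hypothetical (the domain set is yours to define); only the engine names come from this README:

```python
from urllib.parse import urlparse

# Hypothetical routing set: domains known to sit behind anti-bot protection.
PROTECTED_DOMAINS = {"shop.example.com", "tickets.example.com"}


def engine_for(url: str) -> str:
    """Route protected domains to "stealth"; everything else stays on "scrapy"."""
    host = urlparse(url).hostname or ""
    return "stealth" if host in PROTECTED_DOMAINS else "scrapy"


engine_for("https://shop.example.com/item/1")  # "stealth"
engine_for("https://blog.example.com/post")    # "scrapy"
```

The returned value can then be placed in request.meta["engine"], so only the pages that need impersonation pay its cost.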

📜 Changelog

See CHANGELOG.md for a full history of changes, or browse GitHub Releases.


🤝 Contributing

See CONTRIBUTING.md for guidelines on how to contribute.


📄 License

This project is licensed under the MIT License: free to use, modify, and distribute. See LICENSE for the full text.
