scrapy-stealth

Stealthy Crawling. Maximum Results.

A pluggable anti-bot and stealth framework for Scrapy.


scrapy-stealth extends Scrapy with browser impersonation, proxy rotation, fingerprint cycling, and intelligent retry strategies — designed for large-scale, production-grade crawling.


🧠 Why scrapy-stealth?

Scrapy is fast and powerful, but modern websites use advanced anti-bot protections such as:

  • TLS fingerprinting
  • Browser behavior detection
  • Rate limiting and IP blocking

scrapy-stealth helps by adding:

  • 🧬 Browser-level impersonation (TLS + HTTP2 fingerprints)
  • 🔁 Smarter retry strategies
  • 🌐 Proxy and fingerprint rotation
  • 🛡️ Anti-bot detection

Result

  • Higher success rate
  • Lower proxy cost
  • More stable crawls

✨ Features

  • 🔌 Pluggable engine system (scrapy, stealth)
  • 🧠 Per-request engine selection via request.meta
  • 🌐 Proxy support and rotation
  • 🧬 Browser fingerprint rotation
  • 🔁 Smart retry logic (manual integration)
  • 🛡️ Anti-bot detection (status + content-based), including Cloudflare and Akamai challenges
  • ⚡ Thread-safe async integration

📦 Installation

pip install scrapy-stealth

Requires Python 3.10+ and Scrapy 2.15+


⚙️ Setup

Option 1 — Global (settings.py)

# 1. Enable the middleware
DOWNLOADER_MIDDLEWARES = {
    "scrapy_stealth.middlewares.stealth.StealthDownloaderMiddleware": 950,
}

# 2. (Optional) Proxy list for automatic rotation
#    Used when request.meta["rotate_proxy"] = True
#    Supported schemes: http, https, socks4, socks5
#    Each entry must include a scheme and port
STEALTH_PROXIES = [
    "http://proxy1:8080",
    "http://proxy2:8080",
    "http://user:pass@proxy3:8080",  # with authentication
    "socks5://proxy4:1080",
]

Option 2 — Per-spider (custom_settings)

Configure the middleware and proxies directly on the spider — no changes to settings.py required. Each spider can have its own independent proxy list.

class MySpider(scrapy.Spider):
    name = "example"

    custom_settings = {
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_stealth.middlewares.stealth.StealthDownloaderMiddleware": 950,
        },
        "STEALTH_PROXIES": [
            "http://proxy1:8080",
            "http://user:pass@proxy2:8080",
            "socks5://proxy3:1080",
        ],
    }

Proxies are validated at startup — invalid format or unsupported scheme raises ValueError immediately.
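The validation routine itself is internal, but its documented behavior (scheme must be one of the four supported, and a port is required) can be sketched with the standard library. The function name `validate_proxy` below is illustrative, not part of the scrapy-stealth API:

```python
from urllib.parse import urlsplit

SUPPORTED_SCHEMES = {"http", "https", "socks4", "socks5"}

def validate_proxy(url: str) -> str:
    """Reject proxy URLs with an unsupported scheme or no explicit port."""
    parts = urlsplit(url)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {url!r}")
    if parts.port is None:
        raise ValueError(f"proxy URL must include a port: {url!r}")
    return url

validate_proxy("socks5://proxy4:1080")        # OK
# validate_proxy("ftp://proxy:21")            # would raise ValueError
```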


🚀 Quick Start

yield scrapy.Request(
    url="https://example.com",
    meta={
        "engine": "stealth",
    },
)

⚙️ Per-Request Configuration

All options are passed via request.meta:

Key             Type  Description
engine          str   "scrapy" (default) or "stealth"
profile         str   Browser profile (e.g. "chrome_147", "safari_ios_18_1_1")
proxy           str   Explicit proxy URL
stealth_timeout int   Per-request timeout in seconds (overrides the 30 s default)
rotate_proxy    bool  Auto-pick a proxy from STEALTH_PROXIES
rotate_profile  bool  Auto-pick a random browser profile
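
Putting the keys together, a fully specified request's meta dict might look like this (the profile name, proxy URL, and timeout value are placeholders; the dict is passed as scrapy.Request(url, meta=meta)):

```python
# All per-request options live in a single meta dict.
meta = {
    "engine": "stealth",                        # use the stealth engine for this request
    "profile": "chrome_147",                    # explicit browser profile
    "proxy": "http://user:pass@proxy1:8080",    # explicit proxy (skips rotation)
    "stealth_timeout": 60,                      # override the 30 s default
}
```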

🔁 Automatic Rotation

yield scrapy.Request(
    url,
    meta={
        "engine": "stealth",
        "rotate_proxy": True,
        "rotate_profile": True,
    },
)

🧩 Strategies

Proxy Rotation

from scrapy_stealth.strategies.proxy import ProxyRotator

proxy_rotator = ProxyRotator([
    "http://proxy1:8080",
    "http://proxy2:8080",
])

yield scrapy.Request(
    url,
    meta={
        "engine": "stealth",
        "proxy": proxy_rotator.get(),
    },
)
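
The rotation order used by ProxyRotator is not documented; a minimal round-robin rotator that is equivalent in spirit (a pure-Python sketch, not the shipped implementation) could look like this:

```python
from itertools import cycle

class RoundRobinRotator:
    """Cycle through a fixed list of proxy URLs, one per call to get()."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy list must not be empty")
        self._cycle = cycle(proxies)

    def get(self):
        return next(self._cycle)

rotator = RoundRobinRotator(["http://proxy1:8080", "http://proxy2:8080"])
rotator.get()  # → "http://proxy1:8080"
rotator.get()  # → "http://proxy2:8080"
rotator.get()  # wraps around to "http://proxy1:8080"
```

Round-robin spreads load evenly across the pool; a production rotator would also want to drop or cool down proxies that start failing.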

Fingerprint Rotation

from scrapy_stealth.strategies.fingerprint import ProfileRotator

fp = ProfileRotator()

yield scrapy.Request(
    url,
    meta={
        "engine": "stealth",
        "profile": fp.get(),
    },
)
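
ProfileRotator draws from the library's built-in browser profiles; its selection strategy is not specified. A random-choice sketch over a hypothetical profile list (the names are illustrative, the real list ships with the library):

```python
import random

PROFILES = ["chrome_147", "safari_ios_18_1_1"]  # illustrative subset

class RandomProfileRotator:
    """Pick a random browser profile on each get() call."""

    def __init__(self, profiles=PROFILES, seed=None):
        self._profiles = list(profiles)
        self._rng = random.Random(seed)  # seedable for reproducible tests

    def get(self):
        return self._rng.choice(self._profiles)
```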

Intelligent Retry

from scrapy_stealth.strategies.retry import RetryHandler

retry = RetryHandler()


# Inside your spider's callback:
def parse(self, response):
    if retry.should_retry(response):
        yield retry.build(response.request)
        return
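
The exact heuristics inside RetryHandler are not documented; a simple status-based sketch of the should_retry idea (the status set and retry cap are assumptions, not the library's values):

```python
RETRYABLE_STATUSES = {403, 429, 503}  # common anti-bot / rate-limit responses

def should_retry(status: int, attempts: int, max_retries: int = 3) -> bool:
    """Retry blocked or rate-limited responses up to max_retries times."""
    return status in RETRYABLE_STATUSES and attempts < max_retries
```

A retried request would typically also rotate its proxy and profile so the retry does not present the same fingerprint that was just blocked.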

🛡️ Anti-Bot Detection

from scrapy_stealth.detectors.antibot import AntiBotDetector

detector = AntiBotDetector()

if detector.is_blocked(response):
    print("Blocked!")
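
AntiBotDetector combines status-code and content checks. A stripped-down, stdlib-only sketch of the content side (the marker strings and status set are illustrative, not the library's actual rules):

```python
BLOCK_MARKERS = (
    "checking your browser",  # Cloudflare interstitial text
    "access denied",
    "captcha",
)

def looks_blocked(status: int, body: str) -> bool:
    """Heuristic: blocked status codes, or challenge-page markers in the body."""
    if status in (403, 429, 503):
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)
```

Content checks matter because many protection layers return a challenge page with HTTP 200, which status-only detection would miss.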

📊 Example

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"

    def start_requests(self):
        yield scrapy.Request(
            "https://example.com",
            meta={
                "engine": "stealth",
                "rotate_proxy": True,
                "rotate_profile": True,
            },
        )

    def parse(self, response):
        yield {
            "title": response.css("title::text").get(),
            "url": response.url,
        }

⚡ Performance Insight

Using stealth selectively:

  • ⚡ Faster crawling (Scrapy for simple pages)
  • 💰 Lower proxy cost
  • 🛡️ Better success rate on protected pages
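
One way to apply stealth selectively is a small routing helper that opts only known-protected hosts into the stealth engine, leaving everything else on plain Scrapy. The helper and domain list below are illustrative, not part of the library:

```python
from urllib.parse import urlsplit

PROTECTED_HOSTS = {"shop.example.com"}  # hosts known to run anti-bot protection

def engine_for(url: str) -> str:
    """Route protected hosts through stealth, everything else through plain Scrapy."""
    host = urlsplit(url).hostname or ""
    return "stealth" if host in PROTECTED_HOSTS else "scrapy"

# Then on each request: meta = {"engine": engine_for(url), ...}
```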

📜 Changelog

See CHANGELOG.md for a full history of changes, or browse GitHub Releases.


🤝 Contributing

See CONTRIBUTING.md for guidelines on how to contribute.


📄 License

This project is licensed under the MIT License — free to use, modify, and distribute. See LICENSE for the full text.

