ShadowCrawler — High-performance modular crawling framework

ShadowCrawler

A modern, domain‑aware, hybrid web crawling framework for Python

ShadowCrawler is a modular, extensible crawling framework designed for developers who want full control over how websites are fetched, parsed, and processed.
It combines speed, modularity, and browser‑level extraction into a single, clean architecture.

❤️ Origin Story

ShadowCrawler began as a small personal project — a quiet gift, a spark of affection — and unexpectedly grew into a full, production‑ready crawling framework.
It was built with care, curiosity, and intention.
It was originally created for my guiding star and built with the help of my AI copilot, a companion in code, clarity, and curiosity.

✨ Features

Automatic domain detection
Hybrid fetcher (HTTP + Playwright)
Persistent authentication
Modular spiders
Media pipeline
Checkpointing
Full CLI toolkit

Requirements

Python 3.10+
Playwright browsers installed:
```
playwright install
```

🚀 Installation

```
pip install shadowcrawler
```

⚡ Quickstart

Run with automatic spider detection:

```
shadowcrawler run --url https://quotes.toscrape.com
```

Run with browser mode:

```
shadowcrawler run --url https://demoqa.com/login --browser
```

List spiders:

```
shadowcrawler spiders list
```

🕷 Creating a Spider

```
from shadowcrawler.core.spider_base import SpiderBase

class QuotesSpider(SpiderBase):
    # Domain this spider handles; used for automatic spider selection.
    domain = "quotes.toscrape.com"

    async def parse(self, response):
        # Yield one item per quote block found on the page.
        for quote in response.css(".quote"):
            yield {
                "text": quote.css(".text::text").get(),
                "author": quote.css(".author::text").get(),
            }
```

🔍 Domain Autodetection

ShadowCrawler automatically selects the correct spider based on the URL:

```
shadowcrawler run --url https://example.com/page
```

If your spider declares:

```
domain = "example.com"
```

…it will be used automatically.
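
As a rough illustration of how this kind of matching can work, the sketch below picks a spider class by comparing the URL's hostname with each class's `domain` attribute. The `SPIDERS` registry, the `find_spider` helper, and the subdomain handling are assumptions made for illustration, not ShadowCrawler's actual internals.

```python
from urllib.parse import urlparse


class QuotesSpider:
    domain = "quotes.toscrape.com"


class ExampleSpider:
    domain = "example.com"


# Hypothetical registry; the real framework may discover spiders differently.
SPIDERS = [QuotesSpider, ExampleSpider]


def find_spider(url, spiders=SPIDERS):
    """Return the first spider whose domain matches the URL's host, else None."""
    host = (urlparse(url).hostname or "").lower()
    for spider in spiders:
        domain = spider.domain.lower()
        # Accept an exact match or a subdomain such as www.example.com.
        if host == domain or host.endswith("." + domain):
            return spider
    return None


match = find_spider("https://example.com/page")
print(match.__name__ if match else "no spider found")
```

Matching on the parsed hostname rather than on the raw URL string avoids false positives such as `notexample.com` or a domain name that only appears in the path.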

🌐 Fetch Modes

HTTP Mode (default)
Fast, lightweight, ideal for most sites.

Browser Mode (Playwright)
Used automatically when:

login is required
the site is dynamic
the spider requests browser mode
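
The three triggers above can be read as a simple decision rule. The following is a minimal sketch of that rule, assuming the crawler has already gathered the relevant signals; `PageHints` and `choose_fetch_mode` are hypothetical names and do not come from the ShadowCrawler API.

```python
from dataclasses import dataclass


@dataclass
class PageHints:
    """Hypothetical signals a crawler might collect before fetching."""
    login_required: bool = False
    is_dynamic: bool = False
    spider_wants_browser: bool = False


def choose_fetch_mode(hints: PageHints) -> str:
    """Return "browser" if any trigger applies, otherwise the default "http"."""
    if hints.login_required or hints.is_dynamic or hints.spider_wants_browser:
        return "browser"
    return "http"


print(choose_fetch_mode(PageHints()))
print(choose_fetch_mode(PageHints(is_dynamic=True)))
```

Defaulting to HTTP and escalating only when a trigger fires keeps the common case fast, since launching a browser is far more expensive than a plain request.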

🔐 Persistent Authentication

Log in once
The session is saved to JSON
BrowserManager loads it automatically
AuthHandler detects the login state
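
The flow above (save once, reload on later runs, check login state) can be sketched with the standard library alone. The file layout, the `sessionid` cookie name, and the helper functions below are assumptions for illustration; they are not the actual `BrowserManager` or `AuthHandler` interfaces. In a Playwright-based setup, the same idea is usually expressed through the browser context's `storage_state`.

```python
import json
import tempfile
from pathlib import Path


def save_session(path, cookies):
    """Write the session cookies to a JSON file."""
    Path(path).write_text(json.dumps({"cookies": cookies}, indent=2), encoding="utf-8")


def load_session(path):
    """Return saved cookies, or an empty list if no session file exists yet."""
    file = Path(path)
    if not file.exists():
        return []
    return json.loads(file.read_text(encoding="utf-8")).get("cookies", [])


def is_logged_in(cookies, cookie_name="sessionid"):
    """Treat the presence of a non-empty session cookie as 'logged in'."""
    return any(c.get("name") == cookie_name and c.get("value") for c in cookies)


with tempfile.TemporaryDirectory() as tmp:
    session_file = Path(tmp) / "session.json"
    # First run: nothing has been saved, so a login would be needed.
    print(is_logged_in(load_session(session_file)))
    save_session(session_file, [{"name": "sessionid", "value": "abc123"}])
    # Later run: the saved session is found and reused.
    print(is_logged_in(load_session(session_file)))
```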

🖼 Media Pipeline

Automatically extracts:

images
videos
GIFs
downloadable files
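
To give a feel for what such extraction involves, here is a minimal sketch that collects media URLs from an HTML snippet using only the standard library. The `MediaCollector` class, the category names, and the list of file extensions are assumptions for illustration and are not taken from ShadowCrawler's media pipeline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed set of extensions that count as downloadable files.
FILE_EXTENSIONS = (".pdf", ".zip", ".csv")


class MediaCollector(HTMLParser):
    """Collect absolute media URLs, grouped by kind."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.media = {"images": [], "videos": [], "gifs": [], "files": []}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            url = urljoin(self.base_url, attrs["src"])
            # GIFs are images too, but are tracked separately here.
            kind = "gifs" if url.lower().endswith(".gif") else "images"
            self.media[kind].append(url)
        elif tag == "video" and attrs.get("src"):
            self.media["videos"].append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            url = urljoin(self.base_url, attrs["href"])
            if url.lower().endswith(FILE_EXTENSIONS):
                self.media["files"].append(url)


html = """
<img src="/img/cat.png"><img src="/img/dance.gif">
<video src="/media/clip.mp4"></video>
<a href="/docs/report.pdf">Report</a><a href="/about">About</a>
"""
collector = MediaCollector("https://example.com")
collector.feed(html)
print(collector.media)
```

Resolving every URL against the page's base URL with `urljoin` means relative paths such as `/img/cat.png` become downloadable absolute links.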

🧰 CLI Commands

run
resume
download
spiders list
spiders create
inspect
stats
version

📁 Project Structure

shadowcrawler/
  core/
  spiders/
  site_extractors/
  auth/
  cli/
  models/
  parsing/
  tools/

🕸 Included Example Spiders

QuotesSpider
WikiSpider
HTTPNewsSpider
GallerySpider
AuthBrowserDemoSpider

🗺 Roadmap

PyPI release
Plugin system
Distributed crawling
Dashboard / Web UI
Cloud runner
Spider templates
Auto‑throttling

📦 itch.io Distribution

ShadowCrawler is also distributed through itch.io, where you can get:

The latest stable release
Optional Pro features
Example spiders
Early access builds
A way to support the project directly

☕ Support the Project

If ShadowCrawler has helped you or you want to support future development, you can leave a tip on Ko‑fi.
Every contribution helps keep the project alive and evolving.

https://ko-fi.com/shadowcrawlerframework

📜 License

ShadowCrawler is licensed under the Business Source License 1.1 (BUSL‑1.1).
It will convert to Apache 2.0 on November 16, 2030.

Release history

4.1.3 (this version): Jul 4, 2026
4.1.1: Jun 25, 2026

Download files

Source Distribution

shadowcrawler-4.1.3.tar.gz (1.5 MB)

Uploaded Jul 4, 2026 Source

Built Distribution

shadowcrawler-4.1.3-py3-none-any.whl (1.6 MB)

Uploaded Jul 4, 2026 Python 3

File details

Details for the file shadowcrawler-4.1.3.tar.gz.

File metadata

Download URL: shadowcrawler-4.1.3.tar.gz
Upload date: Jul 4, 2026
Size: 1.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for shadowcrawler-4.1.3.tar.gz
Algorithm	Hash digest
SHA256	`85701f392845bb8936c93dc57bbd13e212cbc6af080cb45789671594cba51e8d`
MD5	`acf41971c17dc90f03e6b38e01540f97`
BLAKE2b-256	`55c135e072515824257755b9c392130eb39770ba80db07888eeba633e21756b8`

File details

Details for the file shadowcrawler-4.1.3-py3-none-any.whl.

File metadata

Download URL: shadowcrawler-4.1.3-py3-none-any.whl
Upload date: Jul 4, 2026
Size: 1.6 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for shadowcrawler-4.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c1cb05ea0192aa115c938354932aae16c3570e367a41b146abc82983d4f4822b`
MD5	`e449665d65672b1bb0c10b8608f0f122`
BLAKE2b-256	`615288fb05814c06b76c1fd0b7bbdda34c2b1445e7277e6a93127bdc6f8e31e0`
