URL in, LLM-ready markdown out. Stealth fetch with anti-bot bypass.

Maintainers

leba01


Project description

StealthFetch

URL in, LLM-ready markdown out.

from stealthfetch import fetch_markdown

md = fetch_markdown("https://en.wikipedia.org/wiki/Web_scraping")

Fetches a web page, strips navigation, ads, and boilerplate, and returns clean markdown. If the site blocks you, it automatically escalates to a stealth browser. One function, no config.

StealthFetch doesn't reinvent the hard parts: curl_cffi, trafilatura, html-to-markdown, Camoufox, and Patchright do the heavy lifting. StealthFetch is the orchestration layer: wiring them together, detecting blocks, deciding when to escalate, and handling the security concerns most tools skip.

How It Works

URL
 │
 ▼
┌───────────────────────────────────────────┐
│  FETCH          curl_cffi                 │
│                 Chrome TLS fingerprint    │
│                 ↓ blocked?                │
│                 auto-escalate to stealth  │
│                 browser (Camoufox /       │
│                 Patchright)               │
└─────────────────┬─────────────────────────┘
                  │
┌─────────────────▼─────────────────────────┐
│  EXTRACT        trafilatura               │
│                 strips nav, ads,          │
│                 boilerplate               │
└─────────────────┬─────────────────────────┘
                  │
┌─────────────────▼─────────────────────────┐
│  CONVERT        html-to-markdown (Rust)   │
└─────────────────┬─────────────────────────┘
                  │
                  ▼
               markdown

Each layer is one library call. The libraries do the hard work.

What StealthFetch Owns

Block Detection

Most anti-bot systems give themselves away before you ever see a captcha. StealthFetch uses status codes (403, 429, 503) as a fast first pass, then pattern-matches HTML signatures from Cloudflare, DataDome, PerimeterX, and Akamai. The trick is knowing when not to check: vendor-specific signatures (like _cf_chl_opt or perimeterx) are always checked because they never appear in real content. Generic phrases like "just a moment" or "access denied" are only checked on small pages (< 15k chars) since on a real article those strings are just words.
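The detection rules above can be sketched as a single predicate. This is a minimal illustration, not StealthFetch's actual code: the name `looks_blocked` is hypothetical, the signature lists hold only the examples named in this section (the real lists are longer), and it assumes any of the three status codes counts as blocked on its own.

```python
# Illustrative values only; StealthFetch's real signature lists are longer.
BLOCK_STATUS = {403, 429, 503}
VENDOR_SIGNATURES = ("_cf_chl_opt", "perimeterx")   # never appear in real content
GENERIC_PHRASES = ("just a moment", "access denied")  # ambiguous on long pages
SMALL_PAGE_CHARS = 15_000


def looks_blocked(status: int, html: str) -> bool:
    """Hypothetical block check: status codes, then HTML signatures."""
    # Fast first pass: the status code alone.
    if status in BLOCK_STATUS:
        return True
    lowered = html.lower()
    # Vendor-specific markers are checked on every page.
    if any(sig in lowered for sig in VENDOR_SIGNATURES):
        return True
    # Generic phrases only count on small pages; in a long article
    # they are more likely to be ordinary prose.
    if len(html) < SMALL_PAGE_CHARS:
        return any(phrase in lowered for phrase in GENERIC_PHRASES)
    return False
```

The size gate is what keeps an article that merely mentions "access denied" from triggering a needless browser launch.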

Auto-Escalation

Headless browsers are slow, heavy, and detectable in their own right. An HTTP request with a Chrome TLS fingerprint (via curl_cffi) gets through most sites just fine. So StealthFetch always tries HTTP first and only spins up a stealth browser when the response actually looks blocked. The interesting part isn't the browser itself; it's the decision of when to use it.
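A minimal sketch of the HTTP-first decision, assuming two injected fetchers that each return `(status, html)` and a block predicate like the one described under Block Detection. The names `fetch_html`, `http_fetch`, `browser_fetch`, and `is_blocked` are hypothetical; the `method` values mirror the `"auto"`, `"http"`, and `"browser"` options in the API table.

```python
from typing import Callable

# Hypothetical fetcher type: takes a URL, returns (status_code, html).
Fetcher = Callable[[str], tuple[int, str]]
BlockCheck = Callable[[int, str], bool]


def fetch_html(
    url: str,
    http_fetch: Fetcher,
    browser_fetch: Fetcher,
    is_blocked: BlockCheck,
    method: str = "auto",
) -> str:
    """Sketch of escalation: HTTP first, browser only when the response looks blocked."""
    if method not in ("auto", "http", "browser"):
        raise ValueError(f"unknown method: {method!r}")
    if method == "browser":
        return browser_fetch(url)[1]
    # Cheap path first (in StealthFetch: curl_cffi with a Chrome TLS fingerprint).
    status, html = http_fetch(url)
    if method == "http" or not is_blocked(status, html):
        return html
    # Only a response that looks blocked pays for a stealth browser.
    return browser_fetch(url)[1]
```

Injecting the fetchers keeps the escalation decision testable without a network or a browser.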

SSRF Protection

Most scraping tools, including ones with 60–85k GitHub stars, trust whatever URL you hand them. StealthFetch doesn't. A hostname that resolves to 127.0.0.1? Rejected. A redirect chain that bounces through three domains and lands on a private IP? Caught. Bypass attempts through IPv4-mapped IPv6 addresses (such as ::ffff:127.0.0.1) or link-local addresses are caught too: addresses are validated before the request goes out, and again after redirects resolve.
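A stdlib-only sketch of the address check, assuming validation happens on resolved IPs; it is not StealthFetch's implementation. `resolve_public_addresses` and `is_public_address` are hypothetical helpers; a caller would run the check on the initial URL and again on every redirect target. A check-then-connect design still leaves a small DNS-rebinding window unless the connection is pinned to the validated address.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_address(addr: str) -> bool:
    """True only for globally routable, non-multicast addresses."""
    # Drop an IPv6 zone id such as "%eth0" before parsing.
    ip = ipaddress.ip_address(addr.split("%", 1)[0])
    # Judge IPv4-mapped IPv6 (::ffff:a.b.c.d) as the IPv4 address it wraps.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    # is_global is False for loopback, private, link-local and other
    # special-purpose ranges; multicast has to be excluded separately.
    return ip.is_global and not ip.is_multicast


def resolve_public_addresses(url: str) -> list[str]:
    """Resolve the URL's host; raise ValueError if any address is non-public."""
    host = urlparse(url).hostname
    if not host:
        raise ValueError(f"no hostname in {url!r}")
    try:
        # IP literal in the URL: no DNS lookup needed.
        addrs = [str(ipaddress.ip_address(host))]
    except ValueError:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
        addrs = sorted({info[4][0] for info in infos})
    bad = [a for a in addrs if not is_public_address(a)]
    if bad:
        raise ValueError(f"{host} resolves to non-public address(es): {bad}")
    return addrs
```

Rejecting the host if any resolved address is non-public (rather than picking a good one) avoids a resolver quietly handing back the private address at connect time.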

Works On

Most sites return clean markdown in under a second. Sites that fight back (Reddit, Amazon) get auto-escalated to a stealth browser — takes 5–8 seconds but you don't have to think about it.

Site	What You Get
Wikipedia, Reuters, BBC News, TechCrunch	Articles and news — straight through
Hacker News	Threads and comments
Stack Overflow	Q&A with code blocks
Medium	Articles — Cloudflare-protected, but no false-positive escalation (passive JS, not a block page)
Reddit	Blocked by challenge page → auto-escalates to browser
Amazon	Blocked by CAPTCHA → auto-escalates to browser

Install

Try it — no install needed (requires uv):

uvx stealthfetch https://en.wikipedia.org/wiki/Web_scraping

Install as a library:

pip install stealthfetch

Note: trafilatura brings ~20 transitive dependencies (lxml, charset-normalizer, etc.). Total install is ~50 packages.

Add stealth browser support (required for auto-escalation):

pip install "stealthfetch[browser]"
camoufox fetch

CLI

stealthfetch https://en.wikipedia.org/wiki/Web_scraping
stealthfetch https://spa-app.com -m browser
stealthfetch https://example.com --no-links --no-tables
stealthfetch https://example.com --header "Cookie: session=abc"

MCP Server

StealthFetch also runs as an MCP server: any MCP client (Claude Desktop, Claude Code, Cursor, etc.) can call it as a tool to fetch web pages as markdown.

No install needed — add this to your MCP client config:

{
  "mcpServers": {
    "stealthfetch": {
      "command": "uvx",
      "args": ["--from", "stealthfetch[mcp]", "stealthfetch-mcp"]
    }
  }
}

Or if you prefer a persistent install:

pip install "stealthfetch[mcp]"

{
  "mcpServers": {
    "stealthfetch": {
      "command": "stealthfetch-mcp"
    }
  }
}

API

`fetch_markdown(url, **kwargs) -> str`

Also available as afetch_markdown — same signature, async. Extraction and conversion run off the event loop via asyncio.to_thread.

Parameter	Type	Default	Description
`url`	`str`	required	URL to fetch
`method`	`str`	`"auto"`	`"auto"`, `"http"`, or `"browser"`
`browser_backend`	`str`	`"auto"`	`"auto"`, `"camoufox"`, or `"patchright"`
`include_links`	`bool`	`True`	Preserve hyperlinks
`include_images`	`bool`	`False`	Preserve image references
`include_tables`	`bool`	`True`	Preserve tables
`timeout`	`int`	`30`	Timeout in seconds
`proxy`	`dict`	`None`	`{"server": "...", "username": "...", "password": "..."}`
`headers`	`dict`	`None`	Additional HTTP headers
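`afetch_markdown` keeps the event loop responsive by pushing the blocking extraction and conversion into a worker thread with `asyncio.to_thread`. The sketch shows that pattern only; `extract_and_convert` and `fake_fetch_html` are hypothetical stand-ins (a toy regex converter and a fake async fetch), not StealthFetch, trafilatura, or html-to-markdown APIs.

```python
import asyncio
import re


def extract_and_convert(html: str) -> str:
    """Hypothetical stand-in for the blocking extract + convert step.

    Real code would call trafilatura and html-to-markdown here; this toy
    version turns <h1> into a markdown heading and drops other tags.
    """
    html = re.sub(r"<h1>(.*?)</h1>", r"# \1\n", html, flags=re.S)
    return re.sub(r"<[^>]+>", "", html).strip()


async def fake_fetch_html(url: str) -> str:
    """Hypothetical async fetch; the sleep stands in for network I/O."""
    await asyncio.sleep(0.01)
    return f"<html><body><h1>Hello</h1><p>from {url}</p></body></html>"


async def afetch_markdown_sketch(url: str) -> str:
    html = await fake_fetch_html(url)
    # Run the CPU-bound parsing in a worker thread so the loop stays free.
    return await asyncio.to_thread(extract_and_convert, html)


async def main() -> list[str]:
    urls = ["https://a.example", "https://b.example"]
    # Several pages can be processed concurrently.
    return await asyncio.gather(*(afetch_markdown_sketch(u) for u in urls))


if __name__ == "__main__":
    print(asyncio.run(main()))
```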

`fetch_result(url, **kwargs) -> FetchResult`

Same fetch/extract/convert pipeline as fetch_markdown, but returns a structured dataclass: the markdown plus page metadata, which comes as a free side effect of parsing.

from stealthfetch import fetch_result

r = fetch_result("https://en.wikipedia.org/wiki/Web_scraping", method="http")
print(r.title)       # "Web scraping"
print(r.author)      # "Wikipedia contributors" (when available)
print(r.date)        # ISO 8601 date (when available)
print(r.markdown[:200])

FetchResult fields:

Field	Type	Description
`markdown`	`str`	Cleaned markdown content
`title`	`str \| None`	Page title
`author`	`str \| None`	Author name
`date`	`str \| None`	Publication date (ISO 8601 when available)
`description`	`str \| None`	Meta description
`url`	`str \| None`	Canonical URL (may differ from input)
`hostname`	`str \| None`	Hostname
`sitename`	`str \| None`	Publisher name

To get a plain dict: dataclasses.asdict(result).

afetch_result has the same signature, async.

Optional Dependencies

Extra	What it adds
`stealthfetch[camoufox]`	Camoufox stealth Firefox
`stealthfetch[patchright]`	Patchright stealth Chromium
`stealthfetch[browser]`	Both
`stealthfetch[mcp]`	MCP server

Python 3.10+. Tested on 3.10–3.13, Linux and macOS.

Roadmap

Things that would make sense if this gets traction:

Homebrew tap — brew install stealthfetch for people who don't want to think about Python
Docker image — bundle browser backends pre-installed, no camoufox fetch step, plays well with Docker's MCP Catalog

Contributions welcome.

License

MIT


Release history

0.2.1 (Mar 2, 2026)

0.2.0 (Mar 1, 2026)

Download files


Source Distribution

stealthfetch-0.2.0.tar.gz (41.4 kB)

Uploaded Mar 1, 2026 Source

Built Distribution


stealthfetch-0.2.0-py3-none-any.whl (20.1 kB)

Uploaded Mar 1, 2026 Python 3

File details

Details for the file stealthfetch-0.2.0.tar.gz.

File metadata

Download URL: stealthfetch-0.2.0.tar.gz
Upload date: Mar 1, 2026
Size: 41.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for stealthfetch-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`26987e631ec1be2b87fda126cf1e4527a3f82eb7f502b146b6540ba029c508a4`
MD5	`7cde7933bca883c4dd954420c594be1b`
BLAKE2b-256	`70d265beaf7af5cb980f7ee80eb8f0f1586ada1702a3113429434d00800ea48e`


Provenance

The following attestation bundles were made for stealthfetch-0.2.0.tar.gz:

Publisher: publish.yml on leba01/stealthfetch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stealthfetch-0.2.0.tar.gz
- Subject digest: 26987e631ec1be2b87fda126cf1e4527a3f82eb7f502b146b6540ba029c508a4
- Sigstore transparency entry: 1006429026
- Sigstore integration time: Mar 1, 2026
Source repository:
- Permalink: leba01/stealthfetch@2af41f53565343bd2bb6b2097f9e7b433c12d00a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/leba01
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2af41f53565343bd2bb6b2097f9e7b433c12d00a
- Trigger Event: push

File details

Details for the file stealthfetch-0.2.0-py3-none-any.whl.

File metadata

Download URL: stealthfetch-0.2.0-py3-none-any.whl
Upload date: Mar 1, 2026
Size: 20.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for stealthfetch-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a0ca0cf1324d1e52f8c7eec39f9542663a47777c8b5f2a8daebae2ccb40a7589`
MD5	`77fefe54414a7de2543670e6bb3e83fe`
BLAKE2b-256	`f76770c35e8d4eb2a9de76022f2251f9f9692d5b419b86ef5981362e40bcf48e`


Provenance

The following attestation bundles were made for stealthfetch-0.2.0-py3-none-any.whl:

Publisher: publish.yml on leba01/stealthfetch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stealthfetch-0.2.0-py3-none-any.whl
- Subject digest: a0ca0cf1324d1e52f8c7eec39f9542663a47777c8b5f2a8daebae2ccb40a7589
- Sigstore transparency entry: 1006429030
- Sigstore integration time: Mar 1, 2026
Source repository:
- Permalink: leba01/stealthfetch@2af41f53565343bd2bb6b2097f9e7b433c12d00a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/leba01
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2af41f53565343bd2bb6b2097f9e7b433c12d00a
- Trigger Event: push

stealthfetch 0.2.0
