
Local-first deep research agent that reads the whole web — even pages behind Cloudflare, Datadome, Turnstile & reCAPTCHA.


🛡️ DeepCloak

The deep-research agent that reads the pages others can't.

Cloudflare · Datadome · Turnstile · reCAPTCHA — it walks straight through them.

License: MIT · Python 3.11+ · MCP native


▶ Live demo — deepcloak.vercel.app · ▶ Watch on YouTube

[Demo clip: DeepCloak detects a Cloudflare Turnstile, escalates, bypasses it, then writes a cited report]


The problem

You ask a research tool a question. Many of the best sources sit behind a Bot Wall: Cloudflare, Datadome, a Turnstile, a reCAPTCHA. A typical tool gets a 403, silently drops those pages, and hands you a thinner report. You never even learn what it missed.

What DeepCloak does

When a plain fetch hits a Bot Wall, DeepCloak Escalates that one URL to a Stealth Fetch and Bypasses the wall — recovering the content other agents abandon. Then it tells you, at the bottom of every report, exactly how many walls it broke through.

It's a thin, local-first orchestrator over two great projects: local-deep-research (the research loop) and CloakBrowser (the stealth browser). Use it as a CLI, an MCP server, or a Claude skill. MIT.

🌑 Why we built this

The open web is quietly closing. More of the best writing now sits behind a bot check, and AI research agents — the tools we increasingly trust to read the web for us — go blind at exactly those doors, without ever saying so. A report that silently skips every walled source isn't neutral; it's wrong in a way you can't see.

DeepCloak's stance is simple: your agent should be able to read what a person with a browser can read, and it should be honest about how it got there. So it Bypasses the wall when it has to, runs on your own machine (pair it with --provider ollama and no query or page content is sent to a hosted LLM), and prints an Evidence Record of every wall it crossed. Capability and transparency, MIT-licensed, no lock-in.

✨ Why it's different

| | Plain deep research | DeepCloak |
|---|---|---|
| Reads the open web | ✅ | ✅ |
| Reads Cloudflare / Datadome / Turnstile / reCAPTCHA pages | ❌ dropped silently | ✅ Bypassed |
| Tells you which sources were walled | ❌ | ✅ Evidence Record |
| Local-first (no API key required) | | ✅ |
| Fast on open pages | | ✅ plain-first, stealth only when needed |

Verified live — not mocked. The clip above is an unedited screen recording (captured with ffmpeg, no compositing) of a real deepcloak run against a local LLM (Qwen) + SearXNG — no API key. It Escalates on each Bot Wall and Bypasses 8 Cloudflare/Turnstile walls in one pass, then writes a cited report. Full clip: docs/media/demo-real.mp4; a raw asciinema session is also kept at docs/media/demo.cast. Wall counts vary per run (8–20) because the open web does.

🚀 Quickstart

pip install deepcloak
deepcloak setup                       # one-time: downloads the stealth browser
export OPENAI_API_KEY=...             # or ANTHROPIC_API_KEY / GEMINI_API_KEY — or --provider ollama
deepcloak "How does Cloudflare Turnstile detect bots?" --depth detailed --out report.md

You get a cited report.md ending with a 🛡️ Bypassed N bot-walled sources section, plus a report.md.evidence.json sidecar.

🧠 How it works

search (DuckDuckGo, no setup) ─▶ candidate URLs
        │
        ▼  for each page:
   plain fetch ─▶ Bot Wall detected? ──no──▶ use it (fast)
                        │ yes
                        ▼
                  Escalate ─▶ Stealth Fetch (CloakBrowser) ─▶ Bypass
        │
        ▼
research loop (local-deep-research) ─▶ cited report + Evidence Records

Stealth is heavy, so DeepCloak tries a cheap plain fetch first and launches the stealth browser only when it actually detects a Bot Wall (--stealth auto, the default). Bypasses happen on full-page fetches, so use --depth detailed (the default) or --depth report when you need walled pages read.
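The plain-first escalation above can be sketched as a small control loop. This is a minimal sketch under stated assumptions, not DeepCloak's code: `Page`, `looks_walled`, `EvidenceLog`, and `fetch_page` are hypothetical names, the wall check is a crude status-code and text-marker heuristic, and both fetchers are injected callables (stubbed here) so the escalation logic can be read on its own. The `stealth` argument mirrors the --stealth modes (auto, always, off).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Page:
    """Hypothetical fetch result; DeepCloak's real types are not documented here."""
    url: str
    status: int
    body: str
    via: str = "plain"  # which fetcher produced this page: "plain" or "stealth"


# Crude, illustrative heuristic only: statuses and text that often accompany a
# bot check. This is NOT DeepCloak's actual Bot Wall detector.
WALL_STATUSES = {403, 429, 503}
WALL_MARKERS = ("just a moment",)


def looks_walled(page: Page) -> bool:
    text = page.body.lower()
    return page.status in WALL_STATUSES or any(m in text for m in WALL_MARKERS)


@dataclass
class EvidenceLog:
    """Hypothetical record of every URL that needed escalation."""
    walled_urls: list[str] = field(default_factory=list)


Fetcher = Callable[[str], Page]


def fetch_page(url: str, plain_fetch: Fetcher, stealth_fetch: Fetcher,
               log: EvidenceLog, stealth: str = "auto") -> Page:
    if stealth == "always":
        return stealth_fetch(url)        # skip the cheap path entirely
    page = plain_fetch(url)              # cheap plain fetch first
    if stealth == "off" or not looks_walled(page):
        return page                      # open page, or escalation disabled
    log.walled_urls.append(url)          # remember the wall for the report footer
    return stealth_fetch(url)            # escalate just this one URL


# Usage with stub fetchers (no network access):
def fake_plain(url: str) -> Page:
    if "walled" in url:
        return Page(url, 403, "<title>Just a moment...</title>")
    return Page(url, 200, "<p>open content</p>")


def fake_stealth(url: str) -> Page:
    return Page(url, 200, "<p>recovered content</p>", via="stealth")


log = EvidenceLog()
urls = ["https://a.example/open", "https://b.example/walled"]
pages = [fetch_page(u, fake_plain, fake_stealth, log) for u in urls]
print([p.via for p in pages])  # ['plain', 'stealth']
print(f"Escalated {len(log.walled_urls)} bot-walled source(s)")  # Escalated 1 bot-walled source(s)
```

Keeping the stealth browser off the common path is the point of this shape: an open page costs one cheap request, and only URLs that trip the check pay for a browser launch.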

🤖 Connect it to your agent (MCP)

DeepCloak runs as a stdio MCP server exposing deep_research(query, depth), quick_summary(query), and get_evidence(run_id).
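Under the hood, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` request, written to the server's stdin as a single line of JSON. The sketch below only builds that message. It assumes the argument names match the signatures listed above (`query`, `depth`), uses `id` 2 on the assumption that id 1 went to the `initialize` request, and `tools_call` is a hypothetical helper. A real client must complete the MCP initialize handshake first; Claude Code and Codex do that for you.

```python
import json


def tools_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as one line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # stdio transport: one JSON object per line; json.dumps escapes any
    # newlines inside strings, so the only raw newline is the terminator.
    return json.dumps(message) + "\n"


line = tools_call(
    2,
    "deep_research",
    {"query": "How does Cloudflare Turnstile detect bots?", "depth": "detailed"},
)
print(line, end="")
```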

Claude Code — add to your project's .mcp.json (an example ships in this repo):

{ "mcpServers": { "deepcloak": { "command": "deepcloak", "args": ["mcp"] } } }

Codex — add to ~/.codex/config.toml:

[mcp_servers.deepcloak]
command = "deepcloak"
args = ["mcp"]

Then your agent can call deep_research and read bot-walled sources directly. Prefer a slash-style skill? Drop skill/SKILL.md into ~/.claude/skills/deepcloak/.

⚙️ Configuration

| Flag | Default | Notes |
|---|---|---|
| --depth | detailed | quick / detailed / report |
| --engine | duckduckgo | searxng / auto |
| --stealth | auto | always / off |
| --provider / --model | auto-detected | OPENAI / ANTHROPIC / GEMINI, or ollama |
| --respect-robots | off | honor robots.txt |
| --proxy | | SOCKS5 for the Stealth Fetch |

⚠️ Responsible use

DeepCloak Bypasses bot-detection. You are responsible for having the right to access whatever you fetch. robots.txt is ignored by default; pass --respect-robots to honor it (ADR-0002). Don't use it to violate sites' terms or the law.
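Honoring robots.txt comes down to a per-URL allow check before fetching. DeepCloak's own --respect-robots code isn't shown here; this minimal sketch uses the standard library's `urllib.robotparser` on an inline robots.txt, and the `allowed` helper and the `deepcloak` user-agent string are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Inline robots.txt so the example runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, url: str, user_agent: str = "deepcloak") -> bool:
    """Return True if robots_txt permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "https://example.com/articles/1"))      # True
print(allowed(ROBOTS_TXT, "https://example.com/private/report"))  # False
```

In a live run you would fetch each site's /robots.txt once and cache the parser per host rather than re-parsing it per URL.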

🛠️ Built on

local-deep-research (MIT) + CloakBrowser (MIT), via pip — no vendored code. Domain glossary in CONTEXT.md; design decisions in docs/adr/; contributing guide in CONTRIBUTING.md.

📄 License

MIT — see LICENSE and NOTICE.

If DeepCloak read a page your last tool gave up on, drop a ⭐ — it helps others find it.

Built by Mrbaeksang · baeksang.dev · contact@baeksang.dev




Download files


Source Distribution

deepcloak-0.1.0.tar.gz (18.2 kB)

Built Distribution


deepcloak-0.1.0-py3-none-any.whl (22.9 kB)

File details

Details for the file deepcloak-0.1.0.tar.gz.

File metadata

  • Download URL: deepcloak-0.1.0.tar.gz
  • Size: 18.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for deepcloak-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | ec454ef581823eb920ddead36b97c0cb1ebb52440212a4e9a379a3f377fdf92f |
| MD5 | 511778980cfc1e3c988edaa8151dcab7 |
| BLAKE2b-256 | 43956e40356f2548543195702fdeb3dbe60fa5af52e62278ac4d7257deb8552f |


File details

Details for the file deepcloak-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: deepcloak-0.1.0-py3-none-any.whl
  • Size: 22.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for deepcloak-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8ce254dccaee39d6c8c182f3f2feb6fdcdd1e8e8785562c52d4f45289ad874fd |
| MD5 | 4b24cc7f0189dcaa7efe996341dbd7be |
| BLAKE2b-256 | daba59e58b93757d236457e07faf917ca6f9020a1a298978b59680258fd5ede1 |

