Local-first deep research agent that reads the whole web — even pages behind Cloudflare, Datadome, Turnstile & reCAPTCHA.
🛡️ DeepCloak
The deep-research agent that reads the pages others can't.
Cloudflare · Datadome · Turnstile · reCAPTCHA — it walks straight through them.
▶ Live demo — deepcloak.vercel.app · ▶ Watch on YouTube
The problem
You ask a research tool a question. Half the best sources sit behind a Bot Wall — Cloudflare, Datadome, a Turnstile, a reCAPTCHA. Every other tool gets a 403, silently drops those pages, and hands you a thinner report. You never even learn what it missed.
What DeepCloak does
When a plain fetch hits a Bot Wall, DeepCloak Escalates that one URL to a Stealth Fetch and Bypasses the wall — recovering the content other agents abandon. Then it tells you, at the bottom of every report, exactly how many walls it broke through.
It's a thin, local-first orchestrator over two great projects: local-deep-research (the research loop) and CloakBrowser (the stealth browser). Use it as a CLI, an MCP server, or a Claude skill. MIT.
🌑 Why we built this
The open web is quietly closing. More of the best writing now sits behind a bot check, and AI research agents — the tools we increasingly trust to read the web for us — go blind at exactly those doors, without ever saying so. A report that silently skips every walled source isn't neutral; it's wrong in a way you can't see.
DeepCloak's stance is simple: your agent should be able to read what a person with a browser can read — and it should be honest about how it got there. So it Bypasses the wall when it has to, keeps everything local (no query or page leaves your machine), and prints an Evidence Record of every wall it crossed. Capability and transparency, MIT-licensed, no lock-in.
✨ Why it's different
| | Plain deep research | DeepCloak |
|---|---|---|
| Reads the open web | ✅ | ✅ |
| Reads Cloudflare / Datadome / Turnstile / reCAPTCHA pages | ❌ dropped silently | ✅ Bypassed |
| Tells you which sources were walled | ❌ | ✅ Evidence Record |
| Local-first (no API key required) | ✅ | ✅ |
| Fast on open pages | — | ✅ plain-first, stealth only when needed |
Verified live, not mocked. The clip above is an unedited screen recording (captured with ffmpeg, no compositing) of a real deepcloak run against a local LLM (Qwen) + SearXNG, with no API key. It Escalates on each Bot Wall and Bypasses 8 Cloudflare/Turnstile walls in one pass, then writes a cited report. Full clip: docs/media/demo-real.mp4; a raw asciinema session is also kept at docs/media/demo.cast. Wall counts vary per run (8–20) because the open web does.
🚀 Quickstart
pip install deepcloak
deepcloak setup # one-time: downloads the stealth browser
export OPENAI_API_KEY=... # or ANTHROPIC_API_KEY / GEMINI_API_KEY — or --provider ollama
deepcloak "How does Cloudflare Turnstile detect bots?" --depth detailed --out report.md
You get a cited report.md ending with a 🛡️ Bypassed N bot-walled sources section, plus a report.md.evidence.json sidecar.
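If you script around a run, the sidecar's location follows from the naming above: the report path with `.evidence.json` appended. Here is a minimal sketch, assuming only that the sidecar is valid JSON. Its schema isn't documented here, so the helper names (`sidecar_path`, `load_evidence`) are hypothetical and the parsed value is returned without interpreting any fields.

```python
from __future__ import annotations

import json
from pathlib import Path


def sidecar_path(report: str | Path) -> Path:
    """Map a report path to its sidecar: report.md -> report.md.evidence.json."""
    report = Path(report)
    return report.with_name(report.name + ".evidence.json")


def load_evidence(report: str | Path):
    """Return the parsed sidecar JSON, or None if no sidecar exists.

    No schema is assumed; the parsed value is returned untouched.
    """
    path = sidecar_path(report)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

Calling `load_evidence("report.md")` after the Quickstart run would read `report.md.evidence.json` from the same directory.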
🧠 How it works
    search (DuckDuckGo, no setup) ─▶ candidate URLs
                                          │
                                          ▼  for each page:
        plain fetch ─▶ Bot Wall detected? ──no──▶ use it (fast)
                              │ yes
                              ▼
        Escalate ─▶ Stealth Fetch (CloakBrowser) ─▶ Bypass
                              │
                              ▼
        research loop (local-deep-research) ─▶ cited report + Evidence Records
Stealth is heavy, so DeepCloak tries a cheap plain fetch first and only launches the stealth browser when it actually detects a Bot Wall (--stealth auto, the default). Use --depth detailed or --depth report to fetch full pages, which is where Bypasses happen.
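The plain-first pattern can be sketched in a few lines. This is an illustration of the idea, not DeepCloak's actual code: `Page`, `looks_like_bot_wall`, `fetch_with_escalation`, and the marker list are all hypothetical, and the two fetchers are passed in as plain callables so the sketch stays independent of any HTTP or browser library. Real detection would need a broader, maintained set of signals (some challenge pages return 200, for example).

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Page:
    status: int
    body: str


# Illustrative markers only (assumption): strings commonly seen on challenge pages.
CHALLENGE_MARKERS = ("just a moment", "cf-chl", "captcha-delivery", "g-recaptcha")


def looks_like_bot_wall(page: Page) -> bool:
    """Heuristic: a blocking status code plus a challenge marker in the body."""
    if page.status not in (403, 429, 503):
        return False
    body = page.body.lower()
    return any(marker in body for marker in CHALLENGE_MARKERS)


def fetch_with_escalation(
    url: str,
    plain_fetch: Callable[[str], Page],
    stealth_fetch: Callable[[str], Page],
) -> Tuple[Page, bool]:
    """Try the cheap fetch first; call the stealth fetch only on a Bot Wall.

    Returns (page, escalated) so the caller can record which URLs were walled.
    """
    page = plain_fetch(url)
    if looks_like_bot_wall(page):
        return stealth_fetch(url), True
    return page, False
```

Returning the `escalated` flag alongside the page is what lets a caller keep a per-URL record of which sources needed the stealth path.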
🤖 Connect it to your agent (MCP)
DeepCloak runs as a stdio MCP server exposing deep_research(query, depth), quick_summary(query), and get_evidence(run_id).
Claude Code — add to your project's .mcp.json (an example ships in this repo):
{ "mcpServers": { "deepcloak": { "command": "deepcloak", "args": ["mcp"] } } }
Codex — add to ~/.codex/config.toml:
[mcp_servers.deepcloak]
command = "deepcloak"
args = ["mcp"]
Then your agent can call deep_research and read bot-walled sources directly. Prefer a slash-style skill? Drop skill/SKILL.md into ~/.claude/skills/deepcloak/.
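If your project already has an .mcp.json with other servers, pasting the snippet above over it would drop them. A small sketch of a merge instead, using only the standard library; the function name `add_deepcloak_server` is hypothetical, and the entry it writes is the one shown for Claude Code above.

```python
import json
from pathlib import Path

# The server entry from the Claude Code example above.
DEEPCLOAK_ENTRY = {"command": "deepcloak", "args": ["mcp"]}


def add_deepcloak_server(config_path) -> dict:
    """Add (or overwrite) the deepcloak entry, keeping other servers intact."""
    path = Path(config_path)
    # Start from the existing config if there is one, else from an empty one.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("mcpServers", {})["deepcloak"] = dict(DEEPCLOAK_ENTRY)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Run it as `add_deepcloak_server(".mcp.json")` from the project root.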
⚙️ Configuration
| Flag | Default | Notes |
|---|---|---|
| --depth | detailed | quick / detailed / report |
| --engine | duckduckgo | searxng / auto |
| --stealth | auto | always / off |
| --provider / --model | auto-detected | OPENAI → ANTHROPIC → GEMINI, or ollama |
| --respect-robots | off | honor robots.txt |
| --proxy | — | SOCKS5 for the Stealth Fetch |
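The provider order in the table can be read as "first API key found wins". A rough sketch of that reading, with assumptions flagged: the function name `detect_provider` and the lowercase provider strings are hypothetical, and falling back to ollama when no key is set is a guess at the behavior, not something the table confirms. The environment variable names are the ones from the Quickstart.

```python
import os
from typing import Mapping, Optional

# Order from the table: OPENAI -> ANTHROPIC -> GEMINI.
PROVIDER_KEYS = (
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("gemini", "GEMINI_API_KEY"),
)


def detect_provider(
    env: Optional[Mapping[str, str]] = None,
    fallback: str = "ollama",  # assumption: local fallback when no key is set
) -> str:
    """Return the first provider whose API key is set and non-empty."""
    env = os.environ if env is None else env
    for provider, key in PROVIDER_KEYS:
        if env.get(key):
            return provider
    return fallback
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.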
⚠️ Responsible use
DeepCloak Bypasses bot-detection. You are responsible for having the right to access whatever you fetch. robots.txt is ignored by default; pass --respect-robots to honor it (ADR-0002). Don't use it to violate sites' terms or the law.
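To check what honoring robots.txt means for a given URL, Python's standard `urllib.robotparser` is enough. The sketch below is not how DeepCloak implements --respect-robots; the user agent string and the function name `allowed_by_robots` are hypothetical, and the robots.txt text is passed in directly so the check runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent; not DeepCloak's actual one.
USER_AGENT = "example-research-bot"


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(allowed_by_robots(rules, "https://example.com/public/page"))
    print(allowed_by_robots(rules, "https://example.com/private/page"))
```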
🛠️ Built on
local-deep-research (MIT) + CloakBrowser (MIT), via pip — no vendored code. Domain glossary in CONTEXT.md; design decisions in docs/adr/; contributing guide in CONTRIBUTING.md.
📄 License
MIT.
If DeepCloak read a page your last tool gave up on, drop a ⭐ — it helps others find it.
Built by Mrbaeksang · baeksang.dev · contact@baeksang.dev
Download files
File details
Details for the file deepcloak-0.1.0.tar.gz.
File metadata
- Download URL: deepcloak-0.1.0.tar.gz
- Upload date:
- Size: 18.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ec454ef581823eb920ddead36b97c0cb1ebb52440212a4e9a379a3f377fdf92f |
| MD5 | 511778980cfc1e3c988edaa8151dcab7 |
| BLAKE2b-256 | 43956e40356f2548543195702fdeb3dbe60fa5af52e62278ac4d7257deb8552f |
File details
Details for the file deepcloak-0.1.0-py3-none-any.whl.
File metadata
- Download URL: deepcloak-0.1.0-py3-none-any.whl
- Upload date:
- Size: 22.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8ce254dccaee39d6c8c182f3f2feb6fdcdd1e8e8785562c52d4f45289ad874fd |
| MD5 | 4b24cc7f0189dcaa7efe996341dbd7be |
| BLAKE2b-256 | daba59e58b93757d236457e07faf917ca6f9020a1a298978b59680258fd5ede1 |