Cloaked headless browser for AI agents — stealth TLS, smart extraction, SPA fallback
Kloakt
Cloaked headless browser for AI agents.
Lightweight, stealthy, built in Rust. Based on Obscura.
Kloakt is a headless browser built for AI agents. It runs JavaScript via V8, extracts clean markdown from any page (including SPAs), and exposes tools via MCP for Claude Code and other AI systems.
Why Kloakt?
| Metric | Kloakt | Headless Chrome |
|---|---|---|
| Memory | 30 MB | 200+ MB |
| Binary size | 70 MB | 300+ MB |
| Anti-detect | Built-in | None |
| Page load | 85 ms | ~500 ms |
| Startup | Instant | ~2s |
| SPA extract | Yes | Manual |
Install
Build from source
git clone https://github.com/KultMember6Banger/kloakt.git
cd kloakt
cargo build --release
# With stealth mode (anti-detection + tracker blocking)
cargo build --release --features stealth
Requires Rust 1.75+ (install via rustup.rs). The first build takes about 5 minutes because V8 compiles from source; later builds reuse the cached result.
Quick Start
Extract content (AI agent use)
# Clean markdown from any page
kloakt extract https://example.com --main
# Structured JSON with metadata
kloakt extract https://example.com --main --json
# Cap output for agent context windows
kloakt extract https://en.wikipedia.org/wiki/Rust --main --json --max-chars 3000
# Wait for SPA hydration
kloakt extract https://example.com --delay 2000 --json
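For agents that shell out to the CLI, the flags shown above can be assembled programmatically. The following is a minimal sketch, not part of Kloakt: `build_extract_cmd` is a hypothetical helper, it only emits flags that appear in this README, and it assumes a `kloakt` binary is on `PATH` when the command is eventually run.

```python
import shlex
from typing import List, Optional


def build_extract_cmd(
    url: str,
    main: bool = True,
    as_json: bool = True,
    max_chars: Optional[int] = None,
    delay_ms: int = 0,
) -> List[str]:
    """Assemble argv for `kloakt extract` (hypothetical helper)."""
    cmd = ["kloakt", "extract", url]
    if main:
        cmd.append("--main")  # strip nav, header, footer, sidebar
    if as_json:
        cmd.append("--json")  # structured output instead of plain markdown
    if max_chars is not None:
        cmd += ["--max-chars", str(max_chars)]
    if delay_ms > 0:
        cmd += ["--delay", str(delay_ms)]  # extra wait for SPA hydration
    return cmd


# Show the command line that would be executed.
print(shlex.join(build_extract_cmd("https://example.com", max_chars=3000)))
```

The returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`; when `--json` is set, the captured stdout should be parseable with `json.loads`.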
Fetch a page
# Get the page title
kloakt fetch https://example.com --eval "document.title"
# Extract all links
kloakt fetch https://example.com --dump links
# Render JavaScript and dump markdown
kloakt fetch https://news.ycombinator.com --dump markdown
# Wait for dynamic content
kloakt fetch https://example.com --wait-until networkidle0
Start the CDP server
kloakt serve --port 9222
# With stealth mode
kloakt serve --port 9222 --stealth
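A client can only attach once the server is listening. The sketch below polls the TCP port until a connection succeeds. It assumes the server accepts connections as soon as it is ready; this README does not document a health endpoint, so the check probes the socket only. `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening; close it right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: wait briefly and try again.
            time.sleep(interval)
    return False
```

For example, call `wait_for_port("127.0.0.1", 9222)` after launching `kloakt serve --port 9222` and before connecting Puppeteer or Playwright.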
Scrape in parallel
kloakt scrape url1 url2 url3 ... \
--concurrency 25 \
--eval "document.querySelector('h1').textContent" \
--format json
Smart Extraction
The extract command uses a multi-phase pipeline optimized for AI agents:
- Noise removal — strips cookie banners, ads, popups, nav, social widgets
- Content scoring — text-density algorithm (Readability-like) finds the main content block
- Markdown conversion — DOM-to-markdown with absolute URL resolution
- SPA fallback — when JS rendering fails, extracts from meta tags, Open Graph, JSON-LD, and noscript content
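To make the content-scoring phase concrete, here is a simplified illustration of text-density scoring in Python. It is not Kloakt's actual algorithm: the container set, the scoring formula (text length weighted by the share of non-link text), and the class name `DensityScorer` are all assumptions, and it presumes scripts, styles, and other noise were already stripped by the earlier phase.

```python
from html.parser import HTMLParser

# Containers that receive a score; other tags such as <p> or <span>
# pass their text to the nearest scored ancestor.
CONTAINERS = {"article", "main", "section", "div", "nav", "header", "footer", "aside"}


class DensityScorer(HTMLParser):
    """Accumulates text length and link-text length per container."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # every container seen, in document order
        self.stack = []      # currently open containers
        self.link_depth = 0  # > 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        elif tag in CONTAINERS:
            self.blocks.append({"tag": tag, "id": dict(attrs).get("id"), "text": 0, "link": 0})
            self.stack.append(self.blocks[-1])

    def handle_endtag(self, tag):
        if tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        elif tag in CONTAINERS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        n = len(data.strip())
        if n and self.stack:
            block = self.stack[-1]  # innermost open container owns the text
            block["text"] += n
            if self.link_depth:
                block["link"] += n


def best_block(html: str):
    """Return the container with the highest score, or None if there is none."""
    scorer = DensityScorer()
    scorer.feed(html)
    scorer.close()

    def score(block):
        if block["text"] == 0:
            return 0.0
        link_density = block["link"] / block["text"]
        return block["text"] * (1.0 - link_density)  # link-heavy blocks score low

    return max(scorer.blocks, key=score, default=None)
```

A navigation bar made entirely of links scores zero under this formula, while an article body with long paragraphs and few links scores close to its full text length.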
Works on static HTML, server-rendered pages, and pure client-side SPAs (React, Vue, etc.).
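The SPA fallback can be pictured with a small standard-library sketch that reads the page title, meta tags (including Open Graph), and JSON-LD from raw HTML. This is an illustration under assumptions, not Kloakt's implementation: the names `MetaFallback` and `fallback_summary` are hypothetical, the preference for `og:title` over `<title>` is a choice made here, and noscript content is left out for brevity.

```python
import json
from html.parser import HTMLParser


class MetaFallback(HTMLParser):
    """Collects <title>, <meta> name/property pairs, and JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.json_ld = []
        self._capture = None  # "title" or "ld" while inside those elements
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            key = a.get("property") or a.get("name")
            if key and a.get("content") is not None:
                self.meta[key] = a["content"]
        elif tag == "title":
            self._capture, self._buf = "title", []
        elif tag == "script" and a.get("type") == "application/ld+json":
            self._capture, self._buf = "ld", []

    def handle_data(self, data):
        if self._capture:
            self._buf.append(data)

    def handle_endtag(self, tag):
        text = "".join(self._buf)
        if tag == "title" and self._capture == "title":
            self.title = text.strip()
            self._capture = None
        elif tag == "script" and self._capture == "ld":
            try:
                self.json_ld.append(json.loads(text))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD instead of failing the whole page
            self._capture = None


def fallback_summary(html: str) -> dict:
    """Build a minimal page summary from metadata alone."""
    parser = MetaFallback()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.meta.get("og:title") or parser.title,
        "description": parser.meta.get("og:description") or parser.meta.get("description", ""),
        "json_ld": parser.json_ld,
    }
```

On a client-rendered page whose body is an empty root element, a summary like this still yields a title and description for the agent to work with.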
Python API
from kloakt import extract, fetch, scrape
# Extract clean markdown
page = extract("https://example.com")
print(page.title, page.content, page.meta)
# Cap output length
page = extract("https://example.com", max_chars=3000)
# Wait for SPA content
page = extract("https://example.com", delay=2000)
# Raw fetch
html = fetch("https://example.com", dump="html")
title = fetch("https://example.com", eval_js="document.title")
# Parallel scrape
results = scrape(["https://a.com", "https://b.com"], concurrency=5)
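When several pages must fit into one agent prompt, the `max_chars` argument can be driven by a shared budget. The sketch below is one possible approach: `build_context` is a hypothetical helper, and the extract function is passed in as a parameter so the example runs without Kloakt installed. It relies only on the `title` and `content` attributes and the `max_chars` keyword shown above.

```python
from typing import Callable, Iterable


def build_context(urls: Iterable[str], extract_fn: Callable, budget: int) -> str:
    """Concatenate extracted pages without exceeding `budget` characters in total."""
    out = ""
    for url in urls:
        sep = "\n\n" if out else ""
        remaining = budget - len(out) - len(sep)
        if remaining <= 0:
            break  # budget exhausted, skip the remaining URLs
        page = extract_fn(url, max_chars=remaining)
        # The heading also counts against the budget, so slice the whole section.
        out += sep + f"# {page.title}\n\n{page.content}"[:remaining]
    return out
```

With Kloakt installed, this would be called as `build_context(urls, extract, budget=3000)` after `from kloakt import extract`.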
MCP Server (Claude Code)
Kloakt includes an MCP server for use as a Claude Code tool:
{
"mcpServers": {
"kloakt": {
"command": "python3",
"args": ["/path/to/kloakt/mcp_server.py"]
}
}
}
Exposes kloakt_extract and kloakt_fetch as native tools.
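If an MCP configuration already lists other servers, the Kloakt entry has to be merged in rather than pasted over them. A minimal sketch of that merge follows; `add_kloakt_server` is a hypothetical helper, and where the configuration file lives is left to the reader because this README does not specify it.

```python
import json


def add_kloakt_server(config: dict, server_script: str) -> dict:
    """Return a copy of an MCP config dict with a "kloakt" server entry added."""
    servers = dict(config.get("mcpServers", {}))  # copy so the input is not mutated
    servers["kloakt"] = {"command": "python3", "args": [server_script]}
    return {**config, "mcpServers": servers}


# Example: merge into a config that already defines another (made-up) server.
existing = {"mcpServers": {"other": {"command": "other-server"}}}
merged = add_kloakt_server(existing, "/path/to/kloakt/mcp_server.py")
print(json.dumps(merged, indent=2))
```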
Puppeteer / Playwright
Puppeteer
import puppeteer from 'puppeteer-core';
const browser = await puppeteer.connect({
browserWSEndpoint: 'ws://127.0.0.1:9222/devtools/browser',
});
const page = await browser.newPage();
await page.goto('https://news.ycombinator.com');
const stories = await page.evaluate(() =>
Array.from(document.querySelectorAll('.titleline > a'))
.map(a => ({ title: a.textContent, url: a.href }))
);
await browser.disconnect();
Playwright
import { chromium } from 'playwright-core';
const browser = await chromium.connectOverCDP({
endpointURL: 'ws://127.0.0.1:9222',
});
const page = await browser.newContext().then(ctx => ctx.newPage());
await page.goto('https://en.wikipedia.org/wiki/Web_scraping');
console.log(await page.title());
await browser.close();
Stealth Mode
Build with --features stealth, then pass --stealth at runtime.
- Per-session fingerprint randomization (GPU, screen, canvas, audio, battery)
- Realistic navigator.userAgentData (Chrome 145, high-entropy values)
- event.isTrusted = true for dispatched events
- Native function masking (Function.prototype.toString() → [native code])
- navigator.webdriver = undefined
- 3,520 tracker domains blocked
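Tracker blocking is typically a hostname lookup against a domain list. The sketch below shows one common way to do that lookup, matching a hostname and each of its parent domains against a set. It is an assumption about the general technique, not a description of Kloakt's blocker; the function names and the sample domains are made up.

```python
from urllib.parse import urlsplit


def is_blocked(hostname: str, blocklist: set) -> bool:
    """True if the hostname or any of its parent domains is in the blocklist."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c" so subdomains of a listed domain match.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def url_is_blocked(url: str, blocklist: set) -> bool:
    """Apply the hostname check to a full URL; URLs without a host are allowed."""
    host = urlsplit(url).hostname
    return bool(host) and is_blocked(host, blocklist)


# Hypothetical blocklist entries, for illustration only.
SAMPLE_BLOCKLIST = {"tracker.example", "ads.example.net"}
print(url_is_blocked("https://cdn.tracker.example/pixel.gif", SAMPLE_BLOCKLIST))
```

Matching on whole labels rather than substrings keeps a domain such as `nottracker.example` from being blocked by an entry for `tracker.example`.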
CLI Reference
kloakt extract <URL>
| Flag | Default | Description |
|---|---|---|
| --format | markdown | Output: markdown, text, or links |
| --main | off | Strip nav, header, footer, sidebar |
| --json | off | Structured JSON: title, URL, content, meta |
| --max-chars | unlimited | Truncate content to N characters |
| --delay | 0 | Extra ms to wait after load |
| --stealth | off | Anti-detection mode |
| --selector | none | Wait for CSS selector |
| --wait-until | load | load, domcontentloaded, networkidle0 |
kloakt fetch <URL>
| Flag | Default | Description |
|---|---|---|
| --dump | html | Output: html, text, links, markdown |
| --eval | none | JavaScript expression to evaluate |
| --wait-until | load | Wait condition |
| --selector | none | Wait for CSS selector |
| --stealth | off | Anti-detection mode |
| --quiet | off | Suppress banner |
kloakt serve
| Flag | Default | Description |
|---|---|---|
| --port | 9222 | WebSocket port |
| --proxy | none | HTTP/SOCKS5 proxy URL |
| --stealth | off | Anti-detection + tracker blocking |
| --workers | 1 | Parallel workers |
kloakt scrape <URL...>
| Flag | Default | Description |
|---|---|---|
| --concurrency | 10 | Parallel workers |
| --eval | none | JS expression per page |
| --format | json | Output: json or text |
CDP API
Supports the following Chrome DevTools Protocol domains and methods for Puppeteer/Playwright compatibility.
| Domain | Methods |
|---|---|
| Target | createTarget, closeTarget, attachToTarget, createBrowserContext, disposeBrowserContext |
| Page | navigate, getFrameTree, addScriptToEvaluateOnNewDocument, lifecycleEvents |
| Runtime | evaluate, callFunctionOn, getProperties, addBinding |
| DOM | getDocument, querySelector, querySelectorAll, getOuterHTML, resolveNode |
| Network | enable, setCookies, getCookies, setExtraHTTPHeaders, setUserAgentOverride |
| Fetch | enable, continueRequest, fulfillRequest, failRequest |
| Storage | getCookies, setCookies, deleteCookies |
| Input | dispatchMouseEvent, dispatchKeyEvent |
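For clients that talk to the server without Puppeteer or Playwright, each command in the table is sent as a JSON message over the WebSocket. The sketch below builds such messages using the standard CDP envelope (`id`, `method`, `params`, and `sessionId` for session-scoped commands). `cdp_message` is a hypothetical helper, and sending the text over a WebSocket is left out.

```python
import itertools
import json
from typing import Optional

_ids = itertools.count(1)  # CDP replies echo the id, so each command needs a unique one


def cdp_message(method: str, params: Optional[dict] = None, session_id: Optional[str] = None) -> str:
    """Serialize one CDP command as the JSON text sent over the WebSocket."""
    msg = {"id": next(_ids), "method": method, "params": params or {}}
    if session_id is not None:
        msg["sessionId"] = session_id  # routes the command to an attached target
    return json.dumps(msg)


# Browser-level command: open a new target.
print(cdp_message("Target.createTarget", {"url": "about:blank"}))
# Session-level command; "SESSION_ID" is a placeholder for the attached session's id.
print(cdp_message("Page.navigate", {"url": "https://example.com"}, session_id="SESSION_ID"))
```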
License
Apache 2.0 — Based on Obscura by h4ckf0r0day.
File details
Details for the file kloakt-0.1.1.tar.gz.
File metadata
- Download URL: kloakt-0.1.1.tar.gz
- Upload date:
- Size: 11.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e89920cb497ee0fd352262441659dd5901acb0148017866533e612a4549f9e0b |
| MD5 | 5c75f0c4c232d680f334a0fb674e32a6 |
| BLAKE2b-256 | 5fe6a1dbf16535660ca04da6d70b3e67b7e997b32209cbeceb853c0c2d624f16 |
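A downloaded archive can be checked against the published SHA256 digest before installing. The sketch below is a generic standard-library check; `sha256_of` and `matches` are hypothetical helper names, and the file path is assumed to point at a local copy of the download.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the expected value, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For the source distribution, the call would be `matches("kloakt-0.1.1.tar.gz", "e89920cb497ee0fd352262441659dd5901acb0148017866533e612a4549f9e0b")`.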
File details
Details for the file kloakt-0.1.1-py3-none-any.whl.
File metadata
- Download URL: kloakt-0.1.1-py3-none-any.whl
- Upload date:
- Size: 12.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 860c3f5805d3d6a56743f840a7b8b77848bfdb06343db645b03d56707ad68273 |
| MD5 | 515795c4336d1a9394e4acc2d8eaeddb |
| BLAKE2b-256 | 83e3731ed68baf5b3acf86098c9b3d662f7bca0534f601930ff7a7c65e20abf8 |