Privacy-native deep research agent. Multi-step research without leaving fingerprints.
The first privacy-native research agent. Multi-step, multi-source, multi-LLM. Your queries never leave the shadow.
Why Penumbra exists
gpt-researcher has 40k+ stars. It does deep multi-step research beautifully.
It also sends every one of your queries to OpenAI in cleartext, browses with a fingerprinted Chrome, and leaves a trail of your curiosity across every site it touches.
What you search reveals more about you than what you say.
Penumbra fixes this. It's a drop-in Python library (with a CLI; an MCP server is on the roadmap) that does deep research the way gpt-researcher does, but with privacy as a first-class concern at every layer.
```python
import asyncio
from penumbra import Researcher

async def main():
    async with Researcher(privacy="high") as researcher:
        report = await researcher.run("Compare the latest open-source RAG frameworks in 2026")
    print(report.markdown)

asyncio.run(main())
```
That's it. Behind the scenes:
- Every web request is routed through Tor with fresh circuits per source
- The browser is Playwright headless with randomized fingerprints (not Tor Browser bloat)
- Your query is PII-scrubbed before being sent to any LLM
- Sensitive subqueries can be auto-routed to a local model (Ollama) instead of the cloud
- Every source is verified across multiple search engines before being trusted
- The final report includes a citation graph you can audit
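Penumbra's own scrubber isn't shown here. As a rough illustration of what "PII-scrubbed before being sent to any LLM" can mean, here is a minimal regex-based sketch: matches are swapped for placeholders before the cloud call, and the mapping stays local so the answer can be restored afterwards. The pattern set and the `scrub_pii` / `restore` names are hypothetical. A few regexes will both miss real PII and flag false positives (some date formats look like phone numbers), so treat this as an outline, not a detector.

```python
import re

# Hypothetical pattern set, illustrative only (not Penumbra's actual rules).
# Order matters: IPV4 runs before PHONE so dotted IPs aren't taken for phone numbers.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def scrub_pii(text: str) -> tuple[str, dict[str, str]]:
    """Replace PII matches with numbered placeholders.

    Returns the scrubbed text plus a placeholder -> original mapping
    that never leaves the local machine.
    """
    mapping: dict[str, str] = {}
    for label, pattern in PII_PATTERNS.items():
        def _sub(match: re.Match) -> str:
            token = f"[{label}_{len(mapping)}]"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(_sub, text)
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into text returned by the LLM."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

Only `scrubbed` would be sent to a cloud model; `restore` runs locally on the reply.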
What makes Penumbra different
The privacy-research space already has projects. None of them covers everything Penumbra does:
| Feature | Penumbra | gpt-researcher | Onion-Search-MCP | LLM-Tor | OnionClaw |
|---|---|---|---|---|---|
| Multi-step deep research | ✅ | ✅ | ❌ | ❌ | ⚠️ |
| Native Tor routing | ✅ | ❌ | ✅ | ✅ | ✅ |
| Drop-in Python library | ✅ | ✅ | ❌ (MCP only) | ❌ | ❌ |
| Works without MCP | ✅ | ✅ | ❌ | ❌ | ❌ |
| Playwright headless (fast) | ✅ | ✅ | ❌ (Tor Browser) | n/a | ❌ |
| PII scrubbing before LLM calls | ✅ | ❌ | ❌ | ⚠️ | ❌ |
| Multi-LLM (cloud + local routing) | ✅ | ⚠️ | ❌ | ❌ | ❌ |
| Citation graph + verification | ✅ | ⚠️ | ❌ | ❌ | ❌ |
| Per-source Tor circuit rotation | ✅ | ❌ | ❌ | ⚠️ | ⚠️ |
| Fingerprint randomization | ✅ | ❌ | partial | n/a | partial |
| Tunable privacy levels (0-3) | ✅ | ❌ | ❌ | ❌ | ❌ |
Penumbra is not "gpt-researcher + Tor". It's a different architecture that starts from the threat model and works backwards to the research workflow, not the other way around.
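The "Citation graph + verification" row, together with the promise that every source is verified across multiple search engines, can be pictured as a quorum rule: a source is trusted only if enough independent engines return it. A minimal sketch, assuming each engine yields a plain list of result URLs; the `verify_sources` and `normalize` names, the `min_engines` threshold, and the host-plus-path normalization are hypothetical, not Penumbra's API.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so trivial variants compare equal.

    Drops scheme, query string, fragment, a leading 'www.' and a trailing slash.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/")


def verify_sources(
    results_by_engine: dict[str, list[str]], min_engines: int = 2
) -> dict[str, set[str]]:
    """Keep sources returned by at least `min_engines` engines.

    The result maps each trusted source to the engines that vouched for it,
    which is the raw material for an auditable citation graph.
    """
    seen: dict[str, set[str]] = {}
    for engine, urls in results_by_engine.items():
        for url in urls:
            seen.setdefault(normalize(url), set()).add(engine)
    return {src: engines for src, engines in seen.items() if len(engines) >= min_engines}
```

Raising `min_engines` trades recall for confidence, much like the privacy dial trades speed for protection.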
30-second quickstart
```bash
# Quotes keep shells like zsh from expanding the brackets
pip install "penumbra-research[all]"
playwright install chromium
```
If you want full Tor support (recommended), install Tor on your system:
```bash
# Windows (Chocolatey)
choco install tor
# macOS
brew install tor
# Debian/Ubuntu
sudo apt install tor
```
Then:
```python
import asyncio
from penumbra import Researcher

async def main():
    async with Researcher(privacy="high") as r:
        report = await r.run("State of open-source LLM agents in 2026")
        print(report.markdown)
        report.save("output.md")

asyncio.run(main())
```
Or from the CLI:
```bash
penumbra "State of open-source LLM agents in 2026" --privacy high --output report.md
```
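Penumbra's Tor controller handles circuits for you, and its internals aren't shown here. For the curious, one standard Tor mechanism for giving each source its own circuit without touching the control port is SOCKS-auth stream isolation: Tor's `SocksPort` has `IsolateSOCKSAuth` on by default, so streams that present different SOCKS credentials are not put on the same circuit. A minimal sketch, assuming Tor is listening on its default `127.0.0.1:9050`; `proxy_for_source`, `new_session_salt`, and the credential scheme are hypothetical.

```python
import secrets
from urllib.parse import urlsplit

TOR_SOCKS = "127.0.0.1:9050"  # assumption: Tor's default SocksPort


def new_session_salt() -> str:
    """Fresh random salt; rotating it moves every source onto new circuits."""
    return secrets.token_hex(8)


def proxy_for_source(url: str, session_salt: str) -> str:
    """Build a socks5h proxy URL whose credentials are unique per source host.

    With IsolateSOCKSAuth (Tor's default), distinct username/password pairs
    are kept on separate circuits. The 'socks5h' scheme asks the client to let
    the proxy resolve DNS, so lookups also go through Tor.
    """
    host = urlsplit(url).hostname or "unknown"
    return f"socks5h://{session_salt}:{host}@{TOR_SOCKS}"

# Possible use with requests (needs `pip install "requests[socks]"`):
#   proxy = proxy_for_source(url, salt)
#   requests.get(url, proxies={"http": proxy, "https": proxy})
```

Within one salt, repeated requests to the same host share a circuit while different hosts are isolated; a new salt starts everything fresh.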
Privacy levels
You don't always need maximum paranoia. Penumbra exposes a 0-3 privacy dial:
| Level | Name | Tor | Fingerprint | PII scrub | LLM routing | Speed |
|---|---|---|---|---|---|---|
| 0 | off | ❌ | ❌ | ❌ | Cloud only | ⚡⚡⚡ |
| 1 | low | ❌ | ✅ | ✅ | Cloud only | ⚡⚡ |
| 2 | medium | ✅ | ✅ | ✅ | Cloud (scrubbed) | ⚡ |
| 3 | high | ✅ | ✅ | ✅ | Local for sensitive | 🐢 |
Picking the right level is a tradeoff. Penumbra lets you choose; most tools don't even offer the choice.
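One way to read the privacy-level table is as a lookup from level to feature flags. A minimal sketch, assuming the string names accepted by `Researcher(privacy=...)` map one-to-one onto the numeric levels; the `PrivacyProfile` class, its fields, and `resolve_privacy` are hypothetical, not Penumbra's internals.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal

LlmRouting = Literal["cloud", "cloud_scrubbed", "local_for_sensitive"]


@dataclass(frozen=True)
class PrivacyProfile:
    level: int
    tor: bool
    fingerprint: bool
    pii_scrub: bool
    llm_routing: LlmRouting


# One entry per row of the privacy-level table, keyed by the level's name.
PROFILES = {
    "off": PrivacyProfile(0, tor=False, fingerprint=False, pii_scrub=False, llm_routing="cloud"),
    "low": PrivacyProfile(1, tor=False, fingerprint=True, pii_scrub=True, llm_routing="cloud"),
    "medium": PrivacyProfile(2, tor=True, fingerprint=True, pii_scrub=True, llm_routing="cloud_scrubbed"),
    "high": PrivacyProfile(3, tor=True, fingerprint=True, pii_scrub=True, llm_routing="local_for_sensitive"),
}


def resolve_privacy(privacy: str | int) -> PrivacyProfile:
    """Accept either a level name ("high") or a number (3)."""
    if isinstance(privacy, int):
        for profile in PROFILES.values():
            if profile.level == privacy:
                return profile
        raise ValueError(f"unknown privacy level: {privacy}")
    try:
        return PROFILES[privacy.lower()]
    except KeyError:
        raise ValueError(f"unknown privacy level: {privacy!r}") from None
```

Keeping the profiles frozen means a session can't silently downgrade its own settings halfway through a run.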
Architecture
```text
penumbra/
├── privacy/  → Tor controller, PII scrubber, fingerprint engine
├── llm/      → Provider-agnostic LLM abstraction (Anthropic, OpenAI, Ollama)
├── research/ → Planner, browser, content extractor, citation graph
├── output/   → Markdown / JSON / citation-graph rendering
└── core.py   → The Researcher class that ties it all together
```
The codebase is intentionally compact. The core is ~1500 lines of Python you can audit in an afternoon. No hidden dependencies, no telemetry, no phone-home.
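The `llm/` layer is described as provider-agnostic, and the privacy table says level 3 keeps sensitive subqueries on a local model. A minimal sketch of that routing decision, assuming a tiny shared interface; the `LLMProvider` protocol, `FakeProvider`, `route`, and the keyword-based `is_sensitive` check are all hypothetical stand-ins (a real sensitivity check would need to be far more careful), and the providers are fakes so the example runs without API keys or Ollama.

```python
from dataclasses import dataclass
from typing import Protocol


class LLMProvider(Protocol):
    """The minimal surface every backend would need to expose."""
    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class FakeProvider:
    """Stand-in for a real client such as Anthropic, OpenAI, or Ollama."""
    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt[:40]}"


# Hypothetical keyword heuristic; a real system would use a proper classifier.
SENSITIVE_TERMS = ("diagnosis", "lawsuit", "salary", "immigration")


def is_sensitive(subquery: str) -> bool:
    q = subquery.lower()
    return any(term in q for term in SENSITIVE_TERMS)


def route(subquery: str, cloud: LLMProvider, local: LLMProvider, level: int) -> LLMProvider:
    """At level 3, keep sensitive subqueries on the local model; otherwise use the cloud."""
    if level >= 3 and is_sensitive(subquery):
        return local
    return cloud
```

Because both backends satisfy the same protocol, the planner can call `route(...).complete(prompt)` without knowing which one it got.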
What Penumbra is NOT
- Not a dark-web crawler. It can route to `.onion` sites, but that's not its purpose.
- Not a magic anonymity blanket. If you log into Google with your real account inside a Penumbra session, that's on you.
- Not a Tor Browser replacement. It's headless and meant for programmatic use.
- Not a proxy for chatting with LLMs anonymously — see LLM-Tor for that. Penumbra is about research.
Use Penumbra inside any agent
Penumbra is a library. Plug it into anything:
```python
# Inside a LangGraph node
from penumbra import Researcher

async def research_node(state):
    # Assumes a dict-style (TypedDict) graph state with a "question" key
    async with Researcher(privacy="medium") as r:
        report = await r.run(state["question"])
    return {"research": report.markdown}
```

```python
# As a tool exposed to any agent framework
from penumbra import Researcher

async def private_research(query: str) -> str:
    """Research a topic without leaving fingerprints."""
    async with Researcher(privacy="high") as r:
        report = await r.run(query)
    return report.markdown
```
Roadmap
- v0.1 — Core research engine, Tor routing, PII scrubbing, 3 LLM providers
- v0.2 — MCP server, citation-graph visualization, residential proxy support
- v0.3 — Browser session persistence with per-identity isolation
- v0.4 — Differential-privacy noise on aggregate queries
- v0.5 — Self-hosted search index (no DuckDuckGo dependency)
Contributing
PRs welcome. The codebase is small enough that a weekend can land a meaningful feature.
Read CONTRIBUTING.md for the dev setup, test instructions, and PR checklist. The one rule: every PR must justify itself against the threat model. If a feature makes research better but privacy worse, it doesn't ship without a flag.
See CHANGELOG.md for the release history.
License
MIT. Use it however you want. One request (not a license condition): don't sell something Penumbra-powered without telling your users it's Penumbra-powered.
The penumbra is the region of partial shadow. You're not invisible (that's impossible). You're not exposed (that's the default). You choose the shade.
Download files
File details
Details for the file penumbra_research-0.1.0.tar.gz.
File metadata
- Download URL: penumbra_research-0.1.0.tar.gz
- Upload date:
- Size: 28.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4975eeac3775432738f41e5acc1a8cf58fbc516e934caf6a58310109ad09e361
|
|
| MD5 |
028a7f3da494ffca11a855cdee78de3c
|
|
| BLAKE2b-256 |
d35b602e4f7ef77fff8388f4ad177c86595bc107ebb9a318a4dc54bad227d7ff
|
File details
Details for the file penumbra_research-0.1.0-py3-none-any.whl.
File metadata
- Download URL: penumbra_research-0.1.0-py3-none-any.whl
- Upload date:
- Size: 37.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4cbc87c369590b399f6d6e0796c96c918a9bea030c3f031cb7ff0f9ff26483ed
|
|
| MD5 |
bf4aabada8265f2d2c69ccdbb712c0ef
|
|
| BLAKE2b-256 |
a53da134db724fdc6469dbd1f77a3496b53ddb3ad70d33fa1e2cdaeaffb2b283
|