penumbra-research

Privacy-native deep research agent. Multi-step research without leaving fingerprints.

Penumbra — Deep research without leaving fingerprints.

The first privacy-native research agent. Multi-step, multi-source, multi-LLM. Your queries never leave the shadow.

Status: Beta

Why Penumbra exists

gpt-researcher has 40k+ stars. It does deep multi-step research beautifully. It also sends every one of your queries to OpenAI unredacted, browses with a fingerprintable Chrome, and leaves a trail of your curiosity across every site it touches.

What you search reveals more about you than what you say.

Penumbra fixes this. It's a drop-in Python library (with a CLI, and an MCP server planned for v0.2) that does deep research the way gpt-researcher does, but with privacy as a first-class concern at every layer.

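import asyncio
from penumbra import Researcher

async def main():
    researcher = Researcher(privacy="high")
    report = await researcher.run("Compare the latest open-source RAG frameworks in 2026")
    print(report.markdown)

# `await` is only valid inside a coroutine, so run it through asyncio
asyncio.run(main())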

That's it. Behind the scenes:

Every web request is routed through Tor with fresh circuits per source
The browser is Playwright headless with randomized fingerprints (not Tor Browser bloat)
Your query is PII-scrubbed before being sent to any LLM
Sensitive subqueries can be auto-routed to a local model (Ollama) instead of the cloud
Every source is verified across multiple search engines before being trusted
The final report includes a citation graph you can audit
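
To make the PII-scrubbing step concrete, here is a minimal sketch of what redaction before an LLM call could look like. It is not Penumbra's actual scrubber: the `scrub_pii` function, the three regex patterns, and the placeholder tokens are all assumptions chosen for illustration, and a real scrubber would need much broader coverage (names, addresses, account IDs).

```python
import re

# Hypothetical patterns for illustration only; not Penumbra's actual rule set.
# Order matters: IP addresses are replaced before the looser phone pattern runs.
_PII_PATTERNS = [
    ("[EMAIL]", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("[IP]", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("[PHONE]", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def scrub_pii(text: str) -> str:
    """Replace each match of a known PII pattern with a placeholder token."""
    for placeholder, pattern in _PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    query = "Email jane.doe@example.com or call +1 415 555 0100 about server 10.0.0.12"
    print(scrub_pii(query))
```

The design point is that scrubbing happens on the query string itself, before any provider sees it, so the same function can sit in front of every LLM backend.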

What makes Penumbra different

The privacy-research space already has projects, but none of them combines every capability in the table below.

Feature	Penumbra	gpt-researcher	Onion-Search-MCP	LLM-Tor	OnionClaw
Multi-step deep research	✅	✅	❌	❌	⚠️
Native Tor routing	✅	❌	✅	✅	✅
Drop-in Python library	✅	✅	❌ (MCP only)	❌	❌
Works without MCP	✅	✅	❌	❌	❌
Playwright headless (fast)	✅	✅	❌ (Tor Browser)	n/a	❌
PII scrubbing before LLM calls	✅	❌	❌	⚠️	❌
Multi-LLM (cloud + local routing)	✅	⚠️	❌	❌	❌
Citation graph + verification	✅	⚠️	❌	❌	❌
Per-source Tor circuit rotation	✅	❌	❌	⚠️	⚠️
Fingerprint randomization	✅	❌	partial	n/a	partial
Configurable privacy levels (0-3)	✅	❌	❌	❌	❌

Penumbra is not "gpt-researcher + Tor". It's a different architecture that starts from the threat model and works backwards to the research workflow, not the other way around.

30-second quickstart

pip install "penumbra-research[all]"
playwright install chromium

If you want full Tor support (recommended), install Tor on your system:

# Windows (Chocolatey)
choco install tor

# macOS
brew install tor

# Debian/Ubuntu
sudo apt install tor
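
Once the Tor service is running, a quick sanity check is to see whether anything is listening on its SOCKS port. The sketch below uses only the standard library; `is_port_open` is a hypothetical helper, not part of Penumbra, and it assumes the system Tor daemon's default SOCKS port of 9050 (your `torrc` may differ). It only confirms that a TCP listener exists, not that the listener is actually Tor.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 9050, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 9050 is the usual SOCKS port of a system Tor daemon; adjust if needed.
    print("Tor SOCKS port reachable:", is_port_open())
```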

Then:

import asyncio
from penumbra import Researcher

async def main():
    async with Researcher(privacy="high") as r:
        report = await r.run("State of open-source LLM agents in 2026")
        print(report.markdown)
        report.save("output.md")

asyncio.run(main())

Or from the CLI:

penumbra "State of open-source LLM agents in 2026" --privacy high --output report.md

Privacy levels

You don't always need maximum paranoia. Penumbra exposes a 0-3 privacy dial:

Level	Name	Tor	Fingerprint	PII scrub	LLM routing	Speed
0	`off`	❌	❌	❌	Cloud only	⚡⚡⚡
1	`low`	❌	✅	✅	Cloud only	⚡⚡
2	`medium`	✅	✅	✅	Cloud (scrubbed)	⚡
3	`high`	✅	✅	✅	Local for sensitive	🐢

Picking the right level is a tradeoff. Penumbra lets you choose; most tools don't even offer the choice.
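
The table can also be read as a small lookup structure. The sketch below transcribes it into a frozen dataclass and adds a routing helper. `PrivacyProfile`, `resolve_profile`, and `choose_backend` are hypothetical names for illustration, not Penumbra's internals, and whether the real `Researcher` accepts a numeric level as well as a name is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacyProfile:
    """One row of the privacy-level table (hypothetical structure)."""
    level: int
    name: str
    tor: bool
    fingerprint: bool
    pii_scrub: bool
    llm_routing: str


# Values transcribed from the privacy-level table.
PROFILES = {
    p.name: p
    for p in (
        PrivacyProfile(0, "off", False, False, False, "cloud only"),
        PrivacyProfile(1, "low", False, True, True, "cloud only"),
        PrivacyProfile(2, "medium", True, True, True, "cloud (scrubbed)"),
        PrivacyProfile(3, "high", True, True, True, "local for sensitive"),
    )
}


def resolve_profile(privacy) -> PrivacyProfile:
    """Accept a level name ("high") or a number (3); raise ValueError otherwise."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(privacy, int) and not isinstance(privacy, bool):
        for profile in PROFILES.values():
            if profile.level == privacy:
                return profile
        raise ValueError(f"unknown privacy level: {privacy!r}")
    try:
        return PROFILES[privacy]
    except (KeyError, TypeError):
        raise ValueError(f"unknown privacy level: {privacy!r}") from None


def choose_backend(profile: PrivacyProfile, sensitive: bool) -> str:
    """Return "local" only when the profile routes sensitive subqueries locally."""
    if sensitive and profile.llm_routing == "local for sensitive":
        return "local"
    return "cloud"
```

For example, `choose_backend(resolve_profile("high"), sensitive=True)` picks the local model, while the same subquery at `medium` still goes to the cloud after scrubbing.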

Architecture

penumbra/
├── privacy/          → Tor controller, PII scrubber, fingerprint engine
├── llm/              → Provider-agnostic LLM abstraction (Anthropic, OpenAI, Ollama)
├── research/         → Planner, browser, content extractor, citation graph
├── output/           → Markdown / JSON / citation-graph rendering
└── core.py           → The Researcher class that ties it all together

The codebase is intentionally compact. The core is ~1500 lines of Python you can audit in an afternoon. No hidden dependencies, no telemetry, no phone-home.
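
As an illustration of the citation graph listed under research/, here is a toy version that tracks which search engines surfaced each source and which sources back each claim. It is a sketch under stated assumptions, not Penumbra's implementation: the `CitationGraph` class, its method names, and the two-engine threshold for "verified" are all hypothetical.

```python
from collections import defaultdict


class CitationGraph:
    """Toy citation graph: claims point to sources, sources remember which engines found them."""

    def __init__(self, min_engines: int = 2):
        # Assumed threshold: a source counts as verified once this many engines return it.
        self.min_engines = min_engines
        self._engines_by_source = defaultdict(set)  # url -> {engine names}
        self._sources_by_claim = defaultdict(set)   # claim -> {urls}

    def record_hit(self, url: str, engine: str) -> None:
        """Note that `engine` returned `url` in its results."""
        self._engines_by_source[url].add(engine)

    def cite(self, claim: str, url: str) -> None:
        """Attach `url` as a supporting source for `claim`."""
        self._sources_by_claim[claim].add(url)

    def is_verified(self, url: str) -> bool:
        return len(self._engines_by_source.get(url, ())) >= self.min_engines

    def unverified_claims(self) -> list[str]:
        """Claims with no verified source backing them, sorted for stable output."""
        return sorted(
            claim
            for claim, urls in self._sources_by_claim.items()
            if not any(self.is_verified(u) for u in urls)
        )
```

An audit then amounts to calling `unverified_claims()` on the finished graph and checking that the list is empty, or reviewing whatever it returns.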

What Penumbra is NOT

Not a dark-web crawler. It can route to .onion sites, but that's not its purpose.
Not a magic anonymity blanket. If you log into Google with your real account inside a Penumbra session, that's on you.
Not a Tor Browser replacement. It's headless and meant for programmatic use.
Not a proxy for chatting with LLMs anonymously — see LLM-Tor for that. Penumbra is about research.

Use Penumbra inside any agent

Penumbra is a library. Plug it into anything:

# Inside a LangGraph node
from penumbra import Researcher

async def research_node(state):
    async with Researcher(privacy="medium") as r:
        report = await r.run(state.question)
        return {"research": report.markdown}

# As a tool exposed to any agent framework
from penumbra import Researcher

async def private_research(query: str) -> str:
    """Research a topic without leaving fingerprints."""
    async with Researcher(privacy="high") as r:
        report = await r.run(query)
        return report.markdown

Roadmap

v0.1 — Core research engine, Tor routing, PII scrubbing, 3 LLM providers
v0.2 — MCP server, citation-graph visualization, residential proxy support
v0.3 — Browser session persistence with per-identity isolation
v0.4 — Differential-privacy noise on aggregate queries
v0.5 — Self-hosted search index (no DuckDuckGo dependency)

Contributing

PRs welcome. The codebase is small enough that a meaningful feature can land in a weekend.

Read CONTRIBUTING.md for the dev setup, test instructions, and PR checklist. The one rule: every PR must justify itself against the threat model. If a feature makes research better but privacy worse, it doesn't ship without a flag.

See CHANGELOG.md for the release history.

License

MIT. Use it however you want. Just don't sell something Penumbra-powered without telling your users it's Penumbra-powered.

The penumbra is the region of partial shadow. You're not invisible — that's impossible. You're not exposed — that's default. You choose the shade.

This version

0.1.0

May 23, 2026

Download files

Source Distribution

penumbra_research-0.1.0.tar.gz (28.0 kB)

Uploaded May 23, 2026 Source

Built Distribution

penumbra_research-0.1.0-py3-none-any.whl (37.6 kB)

Uploaded May 23, 2026 Python 3

File details

Details for the file penumbra_research-0.1.0.tar.gz.

File metadata

Download URL: penumbra_research-0.1.0.tar.gz
Upload date: May 23, 2026
Size: 28.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for penumbra_research-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`4975eeac3775432738f41e5acc1a8cf58fbc516e934caf6a58310109ad09e361`
MD5	`028a7f3da494ffca11a855cdee78de3c`
BLAKE2b-256	`d35b602e4f7ef77fff8388f4ad177c86595bc107ebb9a318a4dc54bad227d7ff`

File details

Details for the file penumbra_research-0.1.0-py3-none-any.whl.

File metadata

Download URL: penumbra_research-0.1.0-py3-none-any.whl
Upload date: May 23, 2026
Size: 37.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for penumbra_research-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4cbc87c369590b399f6d6e0796c96c918a9bea030c3f031cb7ff0f9ff26483ed`
MD5	`bf4aabada8265f2d2c69ccdbb712c0ef`
BLAKE2b-256	`a53da134db724fdc6469dbd1f77a3496b53ddb3ad70d33fa1e2cdaeaffb2b283`
