Scan your agentic codebase for unguarded tool calls with real-world side effects

These details have not been verified by PyPI

Project links

Project description

diplomat-agent

You deployed a Python AI agent. Do you know every function it can call that writes to a database, sends an email, charges a card, or deletes data — and which ones have zero checks?

diplomat-agent runs a static AST scan and tells you exactly that. Zero dependencies. About 2 seconds on a 1,000-file repo (large repos take longer with inter-procedural tracing).

pip install diplomat-agent
diplomat-agent scan .

What it looks like

diplomat-agent — governance scan

Scanned: ./my-agent
Tool calls with side effects: 12

⚠ process_refund(amount, customer_id)
  Write protection:       NONE
  Rate limit:             NONE
  → stripe.Refund.create() with no amount limit
  Governance: ❌ UNGUARDED

⚠ delete_user_data(user_id)
  Confirmation step:      NONE
  Batch protection:       NONE
  → session.delete() with no confirmation
  Governance: ❌ UNGUARDED

✓ update_order(order_id)
  Governance: ✅ GUARDED

────────────────────────────────────────────
RESULT: 8 unguarded · 3 partial · 1 guarded (12 total)


Why this matters for AI agents

In a web app, a human clicks a button, and the UI provides validation, confirmation dialogs, and per-session rate limits.

In an agent, an LLM decides which functions to call, with what arguments, how many times. It doesn't know your business rules. It can loop, hallucinate arguments, or get prompt-injected.

Without guards in the code, there's nothing between the LLM's decision and the real-world consequence.

We scanned 16 open-source agent repos: about 71% of their 6,529 tool calls have no guard (measured with inter-procedural tracing).

What it detects

40+ patterns across 8 categories:

Category	Examples
Database writes	`session.commit()`, `.save()`, `.create()`, `.update()`
Database deletes	`session.delete()`, `.remove()`, `DELETE FROM`
HTTP writes	`requests.post()`, `httpx.put()`, `client.patch()`
Payments	`stripe.Charge.create()`, `stripe.Refund.create()`
Email / messaging	`smtp.sendmail()`, `ses.send_email()`, `slack.chat_postMessage()`
Agent invocations	`graph.ainvoke()`, `agent.execute()`, `Runner.run_sync()`
Destructive commands	`subprocess.run()`, `exec()`, `eval()`
Publish / upload	`s3.put_object()`, `client.publish()`
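The detection idea can be sketched with the standard library's `ast` module: parse the source, walk each function, and flag calls whose dotted name matches a known side-effect pattern. This is a minimal illustration, not diplomat-agent's implementation; `SIDE_EFFECT_CALLS` is a hypothetical six-entry subset of the patterns above, matched by exact dotted name only.

```python
import ast

# Hypothetical subset of the patterns above, keyed by exact dotted call name.
SIDE_EFFECT_CALLS = {
    "stripe.Charge.create": "Payments",
    "stripe.Refund.create": "Payments",
    "session.commit": "Database writes",
    "session.delete": "Database deletes",
    "requests.post": "HTTP writes",
    "subprocess.run": "Destructive commands",
}


def dotted_name(node: ast.AST) -> str:
    """Render a Name/Attribute chain such as `stripe.Refund.create`; else ""."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""  # e.g. calls on subscripts or on call results: not matched here


def find_side_effects(source: str):
    """Yield (function, call, category, line) for each matching call.

    A call inside a nested function is reported once per enclosing def.
    """
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                name = dotted_name(node.func)
                if name in SIDE_EFFECT_CALLS:
                    yield func.name, name, SIDE_EFFECT_CALLS[name], node.lineno


SAMPLE = '''
import stripe

def process_refund(amount, customer_id):
    return stripe.Refund.create(amount=amount, customer=customer_id)
'''

for hit in find_side_effects(SAMPLE):
    print(hit)  # ('process_refund', 'stripe.Refund.create', 'Payments', 5)
```

Exact-name matching misses aliases such as `db.session.delete()` or `from stripe import Refund`; a fuller scanner would also have to resolve imports and attribute chains.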

What counts as a guard: input validation, rate limiting, auth checks, confirmation steps, idempotency keys, retry bounds. Full list →
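As an illustration of those guard types, here is the same refund tool with and without guards. It is a hedged sketch: `refund_api` stands in for a call such as `stripe.Refund.create()`, and `MAX_REFUND_CENTS`, the idempotency-key handling, and the in-memory `seen_keys` set are hypothetical choices. The README does not say which exact code shapes diplomat-agent recognizes as guards.

```python
from typing import Callable, Set

MAX_REFUND_CENTS = 10_000  # hypothetical business limit


def process_refund_unguarded(refund_api: Callable[..., dict], amount, customer_id) -> dict:
    # Whatever arguments the LLM produced go straight to the payment API.
    return refund_api(amount=amount, customer=customer_id)


def process_refund_guarded(
    refund_api: Callable[..., dict],
    amount,
    customer_id: str,
    idempotency_key: str,
    seen_keys: Set[str],
) -> dict:
    # Input validation: reject non-integers (including bools) and non-positive amounts.
    if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
        raise ValueError(f"invalid amount: {amount!r}")
    # Amount limit: anything above the cap must take another path (e.g. human review).
    if amount > MAX_REFUND_CENTS:
        raise PermissionError(f"amount {amount} exceeds limit {MAX_REFUND_CENTS}")
    # Idempotency: an agent stuck in a loop cannot issue the same refund twice.
    if idempotency_key in seen_keys:
        return {"status": "duplicate", "idempotency_key": idempotency_key}
    seen_keys.add(idempotency_key)
    return refund_api(amount=amount, customer=customer_id)


def fake_refund_api(**kwargs) -> dict:
    """Stand-in for the real payment call, so the sketch runs offline."""
    return {"status": "ok", **kwargs}


seen: Set[str] = set()
print(process_refund_guarded(fake_refund_api, 2_500, "cus_123", "refund-1", seen))
print(process_refund_guarded(fake_refund_api, 2_500, "cus_123", "refund-1", seen))
```

The second call returns a `duplicate` status instead of refunding again, which is the behavior an idempotency key buys when an LLM repeats a tool call.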

Integrate everywhere

CI — block unguarded PRs

- uses: actions/checkout@v4  # the scan needs the repository files
- name: Diplomat governance scan
  run: |
    pip install diplomat-agent
    diplomat-agent scan . --fail-on-unchecked

IDE — review what the copilot wrote

Works in your IDE with no extension to install:

IDE	How	Setup
Copilot Chat (VS Code, Cursor, Windsurf)	Select "Diplomat Reviewer" in the agent dropdown	Copy `.github/agents/diplomat-reviewer.agent.md`
Claude Code	Ask "scan for unguarded tool calls"	`AGENTS.md` at repo root (included)
Cursor (native)	Auto-activates on Python files	Copy `.cursor/rules/diplomat-reviewer.mdc`

Pre-commit hook

repos:
  - repo: https://github.com/Diplomat-ai/diplomat-agent
    rev: v0.5.0
    hooks:
      - id: diplomat-agent

SARIF — native VS Code Problems panel

diplomat-agent scan . --format sarif --output results.sarif

Open it with the SARIF Viewer extension, or upload it to GitHub Code Scanning.
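For scripts or CI summaries, the SARIF file can also be read directly, since SARIF 2.1.0 is plain JSON with results under `runs[].results[]`. The sketch below uses only fields defined by the SARIF standard; the rule ID, path, and line in `SAMPLE_LOG` are made up, because the README does not document which rule IDs diplomat-agent emits.

```python
import json
from collections import Counter

SAMPLE_LOG = json.loads("""
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "example-scanner"}},
    "results": [{
      "ruleId": "EXAMPLE-UNGUARDED",
      "level": "error",
      "message": {"text": "stripe.Refund.create() with no amount limit"},
      "locations": [{"physicalLocation": {
        "artifactLocation": {"uri": "agent/tools.py"},
        "region": {"startLine": 42}
      }}]
    }]
  }]
}
""")


def count_by_rule(log: dict) -> Counter:
    """Count results per (ruleId, level) across every run in a SARIF log."""
    counts = Counter()
    for run in log.get("runs", []):
        for result in run.get("results", []):
            # A missing level is treated as "warning", the SARIF default
            # when no rule configuration says otherwise.
            counts[(result.get("ruleId", "<none>"), result.get("level", "warning"))] += 1
    return counts


def first_locations(log: dict):
    """Yield (uri, startLine, message) from each result's first physical location."""
    for run in log.get("runs", []):
        for result in run.get("results", []):
            for loc in result.get("locations", [])[:1]:
                phys = loc.get("physicalLocation", {})
                yield (
                    phys.get("artifactLocation", {}).get("uri"),
                    phys.get("region", {}).get("startLine"),
                    result.get("message", {}).get("text", ""),
                )


print(count_by_rule(SAMPLE_LOG))
for uri, line, text in first_locations(SAMPLE_LOG):
    print(f"{uri}:{line}: {text}")
```

For a real scan, open `results.sarif` with `encoding="utf-8"`, parse it with `json.load`, and pass the resulting dict to the same functions.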

Scan only changed files

diplomat-agent scan . --diff-only

Generate your agent's SBOM

diplomat-agent scan . --format registry --output-registry toolcalls.yaml


Like requirements.txt — but for what your agent can do, not what it depends on. Commit it. Diff it in PRs. When your agent gains a new capability, the change shows up in review.
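Because the registry is a plain file, a review bot can also diff two snapshots programmatically. The README does not document the `toolcalls.yaml` schema, so this sketch assumes a hypothetical shape: a top-level `tool_calls` list whose entries have `function` and `governance` fields. It works on already-parsed data, for example the output of PyYAML's `yaml.safe_load` (PyYAML is the optional dependency listed for the registry format).

```python
from typing import Dict, List, Tuple

# ASSUMED schema, for illustration only:
# {"tool_calls": [{"function": "process_refund", "governance": "UNGUARDED"}, ...]}


def index(snapshot: dict) -> Dict[str, str]:
    """Map function name -> governance status for one parsed registry snapshot."""
    return {e["function"]: e.get("governance", "") for e in snapshot.get("tool_calls", [])}


def registry_diff(old: dict, new: dict) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, status_changed) function names, each sorted."""
    before, after = index(old), index(new)
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    changed = sorted(n for n in after.keys() & before.keys() if after[n] != before[n])
    return added, removed, changed


old = {"tool_calls": [
    {"function": "process_refund", "governance": "UNGUARDED"},
    {"function": "update_order", "governance": "GUARDED"},
]}
new = {"tool_calls": [
    {"function": "process_refund", "governance": "GUARDED"},
    {"function": "update_order", "governance": "GUARDED"},
    {"function": "delete_user_data", "governance": "UNGUARDED"},
]}

added, removed, changed = registry_diff(old, new)
print("new capabilities:", added)
print("removed:", removed)
print("guard status changed:", changed)
```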

What is a Behavioral BOM →

Benchmarks

Repo	Type	Tool calls	Unguarded
Skyvern	Application	753	435 (58%)
AutoGPT	Application	668	469 (70%)
Dify	Platform	1,361	967 (71%)
PraisonAI	Framework	1,281	1,106 (86%)
CrewAI	Framework	425	317 (75%)

Application layer: ~62% unguarded across 2,943 tool calls in 9 repos (weighted; v0.5.0 with inter-procedural tracing). Frameworks score higher, but there the absence of guards is by design. We scan both identically. Large repos (>400 tool calls) take longer with inter-procedural tracing (e.g. CrewAI, ~38s).
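The 62% figure is described as weighted, which presumably means unguarded calls pooled over all tool calls rather than an average of per-repo rates. The 9 application repos behind it are not all in the table, so the sketch below cannot reproduce 62%; it only applies both aggregations to the five rows shown, to illustrate that they give different numbers.

```python
# (repo, tool calls, unguarded), copied from the benchmark table above
ROWS = [
    ("Skyvern", 753, 435),
    ("AutoGPT", 668, 469),
    ("Dify", 1361, 967),
    ("PraisonAI", 1281, 1106),
    ("CrewAI", 425, 317),
]

per_repo = {name: unguarded / total for name, total, unguarded in ROWS}
# Weighted (pooled): every tool call counts once, so big repos dominate.
weighted = sum(u for _, _, u in ROWS) / sum(t for _, t, _ in ROWS)
# Unweighted: every repo counts once, regardless of size.
unweighted = sum(per_repo.values()) / len(per_repo)

for name, rate in per_repo.items():
    print(f"{name:<10} {rate:.0%}")
print(f"weighted (pooled): {weighted:.1%}")  # 3294 / 4488 = 73.4%
print(f"unweighted mean:   {unweighted:.1%}")  # 72.0%
```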

Full results on 16 repos →

Output formats

Format	Flag	Use case
Terminal (default)	—	Human review
JSON	`--format json`	IDE agents, automation
SARIF 2.1.0	`--format sarif`	VS Code, GitHub Code Scanning
CSAF 2.0	`--format csaf`	Security teams, CERTs
Markdown	`--format markdown`	Documentation, reports
Registry	`--format registry`	`toolcalls.yaml` SBOM

Acknowledge a tool call

If a function is intentionally unguarded or protected elsewhere, acknowledge it with a `# checked:ok` comment on its `def` line:

import requests
ALERT_URL = "https://alerts.example.com/hook"  # placeholder URL
def send_alert(message):  # checked:ok — protected by API gateway
    requests.post(ALERT_URL, json={"msg": message})
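The marker lives in a comment, and Python's `ast` module discards comments, so a scanner also needs the token stream. The sketch below shows one way to do that with the standard `tokenize` module: collect the line numbers of comments containing `checked:ok` and report functions whose `def` line carries one. This is an illustration under that assumption, not diplomat-agent's code; the README does not say whether the marker is honored anywhere other than the `def` line.

```python
import ast
import io
import tokenize
from typing import List, Set

MARKER = "checked:ok"


def marker_lines(source: str) -> Set[int]:
    """Line numbers whose comment contains the acknowledgement marker."""
    lines = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and MARKER in tok.string:
            lines.add(tok.start[0])
    return lines


def acknowledged_functions(source: str) -> List[str]:
    """Names of functions whose `def` line carries the marker."""
    marked = marker_lines(source)
    return sorted(
        node.name
        for node in ast.walk(ast.parse(source))
        # Since Python 3.8, FunctionDef.lineno is the `def` line, not a decorator line.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.lineno in marked
    )


SAMPLE = '''
def send_alert(message):  # checked:ok
    requests.post(ALERT_URL, json={"msg": message})

def delete_user(user_id):
    session.delete(user_id)
'''

print(acknowledged_functions(SAMPLE))  # ['send_alert']
```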

From scanning to runtime

diplomat-agent finds what your agent can do. diplomat-gate stops it from doing the dangerous parts at runtime.

How diplomat-agent works

Tool	Stage	What it does
diplomat-agent	Know	Maps every tool call with side effects. Static. Pre-deploy.
diplomat-gate	Decide	Enforces CONTINUE / REVIEW / STOP at runtime. < 1ms. Zero deps.
diplomat.run	Prove	Immutable audit trail, dashboard, compliance export.

# Step 1 — find what your agent can do
pip install diplomat-agent
diplomat-agent scan .
# → 12 unguarded tool calls (8 payments, 4 emails)

# Step 2 — protect them at runtime
pip install "diplomat-gate[yaml]"
# → write gate.yaml, wrap your tools with @gate

from diplomat_gate import Gate

gate = Gate.from_yaml("gate.yaml")
verdict = gate.evaluate({"action": "charge_card", "amount": 15000})
# verdict.decision  → STOP
# verdict.violations → [{"policy": "amount_limit", "message": "Amount 15000 exceeds limit of 10000"}]

15+ pre-built policies (payments, emails, shell commands). CONTINUE / REVIEW / STOP in < 1ms. Zero dependencies.
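To make the CONTINUE / REVIEW / STOP model concrete, here is a stdlib-only sketch of an amount policy and of combining several policy verdicts so that the strictest decision wins. It is not diplomat-gate's code or API: `amount_limit`, `evaluate`, and the combination rule are hypothetical, and the REVIEW threshold of 5,000 is invented. Only the three decisions and the 10,000 limit come from the example above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Decision(Enum):
    CONTINUE = 0
    REVIEW = 1
    STOP = 2


@dataclass
class Verdict:
    decision: Decision
    violations: List[Dict[str, str]] = field(default_factory=list)


def amount_limit(action: dict, review_above: int = 5_000, stop_above: int = 10_000) -> Verdict:
    """Hypothetical policy: CONTINUE up to review_above, REVIEW up to stop_above, then STOP."""
    amount = action.get("amount", 0)
    if amount > stop_above:
        return Verdict(Decision.STOP, [{"policy": "amount_limit",
                                        "message": f"Amount {amount} exceeds limit of {stop_above}"}])
    if amount > review_above:
        return Verdict(Decision.REVIEW, [{"policy": "amount_limit",
                                          "message": f"Amount {amount} needs human review"}])
    return Verdict(Decision.CONTINUE)


def evaluate(action: dict, policies: List[Callable[[dict], Verdict]]) -> Verdict:
    """Run every policy; the strictest decision wins and all violations are kept."""
    verdicts = [policy(action) for policy in policies]
    worst = max((v.decision for v in verdicts), key=lambda d: d.value, default=Decision.CONTINUE)
    return Verdict(worst, [item for v in verdicts for item in v.violations])


verdict = evaluate({"action": "charge_card", "amount": 15000}, [amount_limit])
print(verdict.decision)  # Decision.STOP
print(verdict.violations)
```

Taking the maximum severity means adding a policy can only make the gate stricter, never looser.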

diplomat-gate → · diplomat.run → (hosted control plane with hash-chained audit trail)

Standards alignment

Known limitations

Static analysis only — no runtime detection
Python only — TypeScript on the roadmap
Inter-procedural tracing: same-package top-level functions (depth 2). Class methods, cross-package chains, and depth > 2 are not resolved — use # checked:ok for guards in those paths or external packages
MCP scanning: Python only (FastMCP / official SDK) — TypeScript/Node MCP servers are out of scope
MCP scanning: transport-layer auth (OAuth, token gateway) is invisible — "unguarded" means no guard inside the tool function, independent of transport
MCP scanning: @mcp.tool attribute decorator only — bare @tool (from direct import) is not detected in v1
MCP scanning: @server.call_tool() low-level dispatcher is detected with a warning; per-tool resolution is not supported in v1
Full limitations →
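The inter-procedural limit affects where a guard should live if the scanner is to see it. The following hypothetical tool code (all names invented) is annotated according to the limitation stated above: a validation helper defined as a top-level function in the same package is within the traced scope, while the same check reached through a class method is not, so that tool gets a `# checked:ok` acknowledgement instead. Both tools behave identically at runtime; only the static view differs.

```python
from typing import Callable

MAX_AMOUNT = 10_000  # hypothetical limit


def validate_amount(amount: int) -> None:
    # Top-level helper in the same package: within the traced scope.
    if not 0 < amount <= MAX_AMOUNT:
        raise ValueError(f"amount out of range: {amount}")


class Validator:
    def check(self, amount: int) -> None:
        # Class method: not resolved by the tracer, per the limitation above.
        validate_amount(amount)


def refund_traced(amount: int, api: Callable[[int], dict]) -> dict:
    validate_amount(amount)  # guard one call away from the tool function
    return api(amount)


# The guard sits in Validator.check, which static tracing does not follow.
def refund_via_method(amount: int, api: Callable[[int], dict]) -> dict:  # checked:ok
    Validator().check(amount)
    return api(amount)
```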

Roadmap

Python AST scanner (40+ patterns)
toolcalls.yaml behavioral SBOM
CSAF 2.0 + SARIF 2.1.0 output
CI integration (--fail-on-unchecked)
IDE agents (Copilot Chat, Claude Code, Cursor)
Pre-commit hook
--diff-only and --file modes
Inter-procedural tracing: decorators + same-package call chains (depth 2)
MCP server scanning
TypeScript support
VS Code extension (inline diagnostics on save)
PR comment integration

Requirements

Python 3.9+
Zero dependencies (stdlib ast only)
Optional: rich (colored output), pyyaml (registry)

License

Apache 2.0

Project details


Release history

0.5.0 (this version): Jun 5, 2026
0.4.0: Apr 14, 2026
0.2.0: Mar 25, 2026

Download files


Source Distribution

diplomat_agent-0.5.0.tar.gz (132.1 kB)

Uploaded Jun 5, 2026 Source

Built Distribution


diplomat_agent-0.5.0-py3-none-any.whl (64.3 kB)

Uploaded Jun 5, 2026 Python 3

File details

Details for the file diplomat_agent-0.5.0.tar.gz.

File metadata

Download URL: diplomat_agent-0.5.0.tar.gz
Upload date: Jun 5, 2026
Size: 132.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for diplomat_agent-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`3758871a24fb4f2113ec6b7e82dba75be304976de8675892b1dbf0e10ed5c40b`
MD5	`016ee119a6febc60ce362ee35d91c67b`
BLAKE2b-256	`5c92d19bc020699959698f1d596180e1114c8766dc78f51d32843d978a5a31ee`


File details

Details for the file diplomat_agent-0.5.0-py3-none-any.whl.

File metadata

Download URL: diplomat_agent-0.5.0-py3-none-any.whl
Upload date: Jun 5, 2026
Size: 64.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for diplomat_agent-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8f0ff55a65586ea6f889940698359cd6e320538da0e16db85a24e71fb1ff3ced`
MD5	`351df26425c89f352d8ed01063d5f0f2`
BLAKE2b-256	`de8e7716e8dc1792e36a9a37d1ed71ed543b6a7e011fdb5a8137bd89cac1e6ad`
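The published SHA256 digests can be checked locally after downloading either file. A minimal sketch using the standard `hashlib` module; the digests are copied from the tables above, and the file is expected to keep its original name in the current directory.

```python
import hashlib
import io
from pathlib import Path
from typing import BinaryIO

# SHA256 digests as published in the file hashes above.
EXPECTED_SHA256 = {
    "diplomat_agent-0.5.0.tar.gz": "3758871a24fb4f2113ec6b7e82dba75be304976de8675892b1dbf0e10ed5c40b",
    "diplomat_agent-0.5.0-py3-none-any.whl": "8f0ff55a65586ea6f889940698359cd6e320538da0e16db85a24e71fb1ff3ced",
}


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Hash a binary stream in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """True if the file's SHA256 matches the digest published for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    with path.open("rb") as f:
        return sha256_of_stream(f) == expected


# Known-answer check: SHA-256("abc") from the FIPS 180-2 test vectors.
assert sha256_of_stream(io.BytesIO(b"abc")) == (
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
)

# Usage after downloading: verify_download(Path("diplomat_agent-0.5.0.tar.gz"))
```

pip can enforce the same check during installation in hash-checking mode: pin `diplomat-agent==0.5.0 --hash=sha256:<digest>` in a requirements file and install with `--require-hashes`.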
