Scan your agentic codebase for unguarded tool calls with real-world side effects

diplomat-agent

Your AI agent can call stripe.Refund.create() 200 times in a loop. diplomat-agent finds these functions before they ship.

pip install diplomat-agent
diplomat-agent scan .

Scans your Python AI agent code. Reports every function that can change the real world — and shows which ones have no checks. Zero dependencies. 2 seconds on a 1,000-file repo.


What it looks like

diplomat-agent — governance scan

Scanned: ./my-agent
Tool calls with side effects: 12

⚠ process_refund(amount, customer_id)
  Write protection:       NONE
  Rate limit:             NONE
  → stripe.Refund.create() with no amount limit
  Governance: ❌ UNGUARDED

⚠ delete_user_data(user_id)
  Confirmation step:      NONE
  Batch protection:       NONE
  → session.delete() with no confirmation
  Governance: ❌ UNGUARDED

✓ update_order(order_id)
  Governance: ✅ GUARDED

────────────────────────────────────────────
RESULT: 8 unguarded · 3 partial · 1 guarded (12 total)

Why this matters for AI agents

In a web app, a human clicks a button. The UI has validation, confirmation dialogs, rate limits per session.

In an agent, an LLM decides which functions to call, with what arguments, how many times. It doesn't know your business rules. It can loop, hallucinate arguments, or get prompt-injected.

Without guards in the code, there's nothing between the LLM's decision and the real-world consequence.
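
The gap can be illustrated with a small simulation. This is only a sketch: `FakePayments`, `process_refund_unguarded`, and `simulate_runaway_agent` are hypothetical names standing in for a payments SDK, an agent tool, and a model stuck in a loop.

```python
from dataclasses import dataclass, field


@dataclass
class FakePayments:
    """Stand-in for a payments SDK; records refunds instead of moving money."""
    refunds: list = field(default_factory=list)

    def refund(self, customer_id: str, amount: int) -> None:
        self.refunds.append((customer_id, amount))


def process_refund_unguarded(api: FakePayments, customer_id: str, amount: int) -> None:
    # No amount limit and no call budget: whatever the model asks for happens.
    api.refund(customer_id, amount)


def simulate_runaway_agent(tool, api: FakePayments, calls: int = 200) -> int:
    """Mimic a model that keeps selecting the same tool with the same arguments."""
    for _ in range(calls):
        tool(api, "cus_123", 5_000)
    return len(api.refunds)


api = FakePayments()
print(simulate_runaway_agent(process_refund_unguarded, api))  # 200
```

Nothing in `process_refund_unguarded` can stop the loop, so every one of the 200 requests reaches the payments layer.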

We scanned 16 open-source agent repos. 76% of tool calls had zero checks.


What it detects

40+ patterns across 8 categories:

| Category | Examples |
|---|---|
| Database writes | session.commit(), .save(), .create(), .update() |
| Database deletes | session.delete(), .remove(), DELETE FROM |
| HTTP writes | requests.post(), httpx.put(), client.patch() |
| Payments | stripe.Charge.create(), stripe.Refund.create() |
| Email / messaging | smtp.sendmail(), ses.send_email(), slack.chat_postMessage() |
| Agent invocations | graph.ainvoke(), agent.execute(), Runner.run_sync() |
| Destructive commands | subprocess.run(), exec(), eval() |
| Publish / upload | s3.put_object(), client.publish() |
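
To give a feel for how pattern matching over Python source can work with the standard library alone, here is a minimal sketch built on `ast`. It is not diplomat-agent's implementation: the three-entry `SIDE_EFFECT_CALLS` set and the helper names are assumptions made for illustration, and nested functions would be reported more than once.

```python
import ast

# Hypothetical, tiny pattern list used only for this sketch.
SIDE_EFFECT_CALLS = {"stripe.Refund.create", "session.delete", "requests.post"}


def dotted_name(node: ast.AST) -> str:
    """Turn an Attribute/Name chain such as stripe.Refund.create into a string."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""  # e.g. a call on a subscript or on another call's result


def find_side_effects(source: str) -> list:
    """Return (function name, called name, line) for each matching call."""
    hits = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                name = dotted_name(node.func)
                if name in SIDE_EFFECT_CALLS:
                    hits.append((func.name, name, node.lineno))
    return hits


SAMPLE = '''
import stripe

def process_refund(amount, customer_id):
    stripe.Refund.create(amount=amount, customer=customer_id)

def get_balance(customer_id):
    return 0
'''

print(find_side_effects(SAMPLE))  # [('process_refund', 'stripe.Refund.create', 5)]
```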

What counts as a guard: input validation, rate limiting, auth checks, confirmation steps, idempotency keys, retry bounds. Full list →
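
As an illustration of three of these guard types (input validation, rate limiting, idempotency keys), the sketch below wraps a refund side effect. Everything here is hypothetical: the class and constant names, the limits, and the injected `send_refund` callable are assumptions, and whether a scanner recognizes this exact shape as guarded depends on the scanner.

```python
import time

MAX_REFUND_CENTS = 10_000   # hypothetical business limit
MAX_CALLS_PER_MINUTE = 5    # hypothetical rate limit


class GuardError(Exception):
    """Raised when a guard refuses a tool call."""


class GuardedRefunds:
    def __init__(self, send_refund, clock=time.monotonic):
        self._send_refund = send_refund  # the real side effect, injected
        self._clock = clock
        self._recent = []                # timestamps of accepted calls
        self._seen_keys = set()          # idempotency keys already processed

    def process_refund(self, amount, customer_id, idempotency_key):
        # Input validation: reject non-integers and out-of-range amounts.
        if not isinstance(amount, int) or not 0 < amount <= MAX_REFUND_CENTS:
            raise GuardError(f"amount {amount!r} outside 1..{MAX_REFUND_CENTS}")
        # Idempotency: a replayed key returns without a second side effect.
        if idempotency_key in self._seen_keys:
            return "duplicate-ignored"
        # Rate limit: sliding 60-second window over accepted calls.
        now = self._clock()
        self._recent = [t for t in self._recent if now - t < 60]
        if len(self._recent) >= MAX_CALLS_PER_MINUTE:
            raise GuardError("rate limit exceeded")
        self._recent.append(now)
        self._seen_keys.add(idempotency_key)
        self._send_refund(customer_id, amount)
        return "sent"
```

With these checks in place, a looping model hits `GuardError` after five accepted calls in a minute instead of issuing refunds indefinitely.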


Integrate everywhere

CI — block unguarded PRs

- name: Diplomat governance scan
  run: |
    pip install diplomat-agent
    diplomat-agent scan . --fail-on-unchecked

IDE — review what the copilot wrote

Works in your IDE with no extension to install:

| IDE | How | Setup |
|---|---|---|
| Copilot Chat (VS Code, Cursor, Windsurf) | Select "Diplomat Reviewer" in agent dropdown | Copy .github/agents/diplomat-reviewer.agent.md |
| Claude Code | Ask "scan for unguarded tool calls" | AGENTS.md at repo root (included) |
| Cursor (native) | Auto-activates on Python files | Copy .cursor/rules/diplomat-reviewer.mdc |

Pre-commit hook

repos:
  - repo: https://github.com/Diplomat-ai/diplomat-agent
    rev: v0.4.0
    hooks:
      - id: diplomat-agent

SARIF — native VS Code Problems panel

diplomat-agent scan . --format sarif --output results.sarif

Open with SARIF Viewer. Or upload to GitHub Code Scanning.
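
If you would rather post-process the file yourself, a SARIF 2.1.0 log is plain JSON with results under `runs[].results[]`. The sketch below counts results per rule and lists their locations. The sample log is hand-written: its rule ID, tool name, and file paths are invented and do not reflect what diplomat-agent actually emits.

```python
import json
from collections import Counter


def summarize_sarif(log: dict) -> Counter:
    """Count results per ruleId across all runs of a SARIF log."""
    counts = Counter()
    for run in log.get("runs", []):
        for result in run.get("results", []):
            counts[result.get("ruleId", "<no ruleId>")] += 1
    return counts


def locations(log: dict) -> list:
    """Return 'uri:line' strings for every result that carries a location."""
    out = []
    for run in log.get("runs", []):
        for result in run.get("results", []):
            for loc in result.get("locations", []):
                phys = loc.get("physicalLocation", {})
                uri = phys.get("artifactLocation", {}).get("uri", "?")
                line = phys.get("region", {}).get("startLine", "?")
                out.append(f"{uri}:{line}")
    return out


# Hand-written sample; rule IDs and paths are invented for illustration.
SAMPLE = json.loads("""
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "example-scanner"}},
    "results": [
      {"ruleId": "EXAMPLE001", "message": {"text": "unguarded payment call"},
       "locations": [{"physicalLocation": {
         "artifactLocation": {"uri": "agent/tools.py"},
         "region": {"startLine": 12}}}]},
      {"ruleId": "EXAMPLE001", "message": {"text": "unguarded payment call"},
       "locations": [{"physicalLocation": {
         "artifactLocation": {"uri": "agent/billing.py"},
         "region": {"startLine": 40}}}]}
    ]
  }]
}
""")

print(summarize_sarif(SAMPLE))
print(locations(SAMPLE))
```

To run it on real output, replace `SAMPLE` with the result of `json.load()` on an opened `results.sarif`.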

Scan only changed files

diplomat-agent scan . --diff-only

Generate your agent's SBOM

diplomat-agent scan . --format registry --output-registry toolcalls.yaml

Like requirements.txt — but for what your agent can do, not what it depends on. Commit it. Diff it in PRs. When your agent gains a new capability, the change shows up in review.
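
The "diff it in PRs" idea can be sketched without knowing the registry schema, since a line-based diff works on any committed text file. The YAML snippets below are invented for illustration and are not the real `toolcalls.yaml` layout; `added_lines` is a hypothetical helper.

```python
import difflib


def added_lines(old_text: str, new_text: str) -> list:
    """Return lines present in new_text but not in old_text, in order."""
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(), lineterm="", n=0
        )
    )
    body = diff[2:]  # skip the '---' and '+++' file header lines
    return [line[1:] for line in body if line.startswith("+")]


# Invented registry contents, not the actual toolcalls.yaml schema.
OLD = """\
- function: process_refund
  calls: stripe.Refund.create
"""
NEW = OLD + """\
- function: delete_user_data
  calls: session.delete
"""

for line in added_lines(OLD, NEW):
    print(line)
```

Here the two added lines are exactly the new capability, which is what a reviewer would see in the pull request diff.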

What is a Behavioral BOM →


Benchmarks

| Repo | Files | Tool calls | Unguarded | Time |
|---|---|---|---|---|
| Skyvern | 595 | 452 | 345 (76%) | ~2s |
| Dify | 1,000+ | 1,009 | 759 (75%) | ~3s |
| PraisonAI | | 1,028 | 911 (89%) | ~2s |
| CrewAI | | 348 | 273 (78%) | ~1s |

Full results on 16 repos →


Output formats

| Format | Flag | Use case |
|---|---|---|
| Terminal | (default) | Human review |
| JSON | --format json | IDE agents, automation |
| SARIF 2.1.0 | --format sarif | VS Code, GitHub Code Scanning |
| CSAF 2.0 | --format csaf | Security teams, CERTs |
| Markdown | --format markdown | Documentation, reports |
| Registry | --format registry | toolcalls.yaml SBOM |

Acknowledge a tool call

If a function is intentionally unguarded or protected elsewhere:

def send_alert(message):  # checked:ok — protected by API gateway
    requests.post(ALERT_URL, json={"msg": message})
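
For intuition about how an inline acknowledgement like this could be picked up, the sketch below uses the standard `tokenize` module to find comments containing the `checked:ok` marker. It is an assumption about one possible approach, not a description of how diplomat-agent parses the marker; `acknowledged_lines` and the `post` calls in the sample are hypothetical.

```python
import io
import tokenize

MARKER = "checked:ok"


def acknowledged_lines(source: str) -> dict:
    """Map line number -> comment text for comments containing the marker."""
    found = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and MARKER in tok.string:
            found[tok.start[0]] = tok.string.lstrip("# ").strip()
    return found


SAMPLE = '''\
def send_alert(message):  # checked:ok - protected by API gateway
    post(message)

def send_invoice(customer):
    post(customer)
'''

print(acknowledged_lines(SAMPLE))
```

Only line 1 carries the marker, so `send_invoice` would still be reported. Working on tokens rather than raw text avoids false matches on the marker appearing inside string literals.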

From scanning to runtime

diplomat-agent finds the gaps. diplomat-gate protects them at runtime.

pip install diplomat-gate

from diplomat_gate import Gate

gate = Gate.from_yaml("gate.yaml")
verdict = gate.evaluate({"action": "charge_card", "amount": 15000})
# → STOP — Amount 15000 exceeds limit of 10000

15+ pre-built policies (payments, emails, shell commands). CONTINUE / REVIEW / STOP in < 1ms. Zero dependencies.
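
The three-verdict model can be pictured with a small stand-alone sketch. This is not diplomat-gate's code or API: the `evaluate` function, the `Verdict` enum, and both thresholds (including the review threshold, which the text above does not mention) are assumptions chosen to mirror the example.

```python
from enum import Enum


class Verdict(Enum):
    CONTINUE = "CONTINUE"
    REVIEW = "REVIEW"
    STOP = "STOP"


# Hypothetical thresholds; a real deployment would load these from policy config.
HARD_LIMIT = 10_000
REVIEW_ABOVE = 5_000


def evaluate(action: dict) -> tuple:
    """Return (verdict, reason) for a proposed action dict."""
    amount = action.get("amount", 0)
    if amount > HARD_LIMIT:
        return Verdict.STOP, f"Amount {amount} exceeds limit of {HARD_LIMIT}"
    if amount > REVIEW_ABOVE:
        return Verdict.REVIEW, f"Amount {amount} needs human approval"
    return Verdict.CONTINUE, "within limits"


verdict, reason = evaluate({"action": "charge_card", "amount": 15000})
print(verdict.value, reason)  # STOP Amount 15000 exceeds limit of 10000
```

The caller is expected to act on the verdict: run the tool on CONTINUE, pause for a human on REVIEW, and refuse on STOP.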

diplomat-gate → · diplomat.run → (hosted control plane with hash-chained audit trail)


Standards alignment


Known limitations

  • Static analysis only — no runtime detection
  • Python only — TypeScript on the roadmap
  • Intra-procedural + same-package decorators — use # checked:ok for guards in external packages
  • Full limitations →

Roadmap

  • Python AST scanner (40+ patterns)
  • toolcalls.yaml behavioral SBOM
  • CSAF 2.0 + SARIF 2.1.0 output
  • CI integration (--fail-on-unchecked)
  • IDE agents (Copilot Chat, Claude Code, Cursor)
  • Pre-commit hook
  • --diff-only and --file modes
  • Inter-procedural decorator resolution
  • TypeScript support
  • MCP server scanning
  • VS Code extension (inline diagnostics on save)
  • PR comment integration

Requirements

  • Python 3.9+
  • Zero dependencies (stdlib ast only)
  • Optional: rich (colored output), pyyaml (registry)

License

Apache 2.0 — Copyright 2026 Diplomat Services SAS
