Scan your agentic codebase for unguarded tool calls with real-world side effects

diplomat-agent

v0.2.0 — 264 tests, 48+ detection patterns, 11 effect categories

Find every tool call in your AI agent that can change the real world.

diplomat-agent is a static scanner for Python AI agents. It maps every function that writes to a database, calls an external API, sends an email, invokes another agent, or deletes data — and tells you which ones have no checks.

What it finds

We scanned 16 open-source agent repos. Here's what we found:

Category                        Unguarded
Database writes                 3,260
Database deletes                1,305
HTTP writes (POST/PUT/PATCH)    968
Subprocess / exec / eval        697
LLM calls                       464
Emails                          250

76% of tool calls had zero checks.

One example: Khoj's ai_update_memories lets an LLM delete user memories with no human confirmation.

Quick start

pip install diplomat-agent
diplomat-agent .

Output:

diplomat-agent — governance scan

Scanned: ./my-agent
Tool calls with side effects: 12

⚠ research_and_save(query, db_path)
  Write protection:       NONE
  Rate limit:             NONE
  → no rate limit · no auth check
  Governance: ❌ UNGUARDED

⚠ send_notification(user_id, message)
  Write protection:       NONE
  → no confirmation before send
  Governance: ❌ UNGUARDED

✓ process_order(order_id) — # checked:ok — protected by API gateway
  Governance: ✅ CONFIRMED

────────────────────────────────────────────
RESULT: 8 with no checks · 3 partial · 1 confirmed (12 total)

  Fix              → add validation in code, the next scan picks it up
  Acknowledge      → add  # checked:ok  in your source code
  Protected elsewhere → add  # checked:ok — protected by [where]
  CI enforcement   → --fail-on-unchecked blocks PRs with new unreviewed tool calls

What counts as a tool call

Any function that can change state outside the process:

  • Database writes: session.commit(), .save(), .create(), .update()
  • Database deletes: session.delete(), .remove(), DELETE FROM
  • HTTP writes: requests.post(), httpx.put(), client.patch()
  • LLM calls: openai.chat.completions.create(), anthropic.messages.create()
  • Agent invocations: graph.ainvoke(), agent.execute(), Runner.run_sync()
  • Email: smtp.sendmail(), ses_client.send_email()
  • Destructive: subprocess.run(), exec(), eval()
  • Publish: s3.put_object(), client.publish()
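
To make the idea of pattern-based detection concrete, here is a minimal sketch built on the stdlib ast module. It is not diplomat-agent's actual implementation: the SIDE_EFFECT_CALLS table, the helper names, and the exact-match rule are all assumptions made for illustration.

```python
import ast

# Hypothetical pattern table for illustration; not diplomat-agent's real rule set.
SIDE_EFFECT_CALLS = {
    "session.commit": "database write",
    "session.delete": "database delete",
    "requests.post": "HTTP write",
    "subprocess.run": "destructive",
}


def dotted_name(node):
    """Return 'a.b.c' for nested Attribute/Name nodes, or None otherwise."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_side_effects(source):
    """Map each function name to the side-effect categories found in its body."""
    findings = {}
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                name = dotted_name(node.func)
                if name in SIDE_EFFECT_CALLS:
                    findings.setdefault(func.name, []).append(SIDE_EFFECT_CALLS[name])
    return findings


EXAMPLE = '''
def send_alert(message):
    requests.post(ALERT_URL, json={"msg": message})

def add(a, b):
    return a + b
'''

print(find_side_effects(EXAMPLE))
```

Only send_alert is reported, because add contains no call that matches the table. The source is parsed, never executed, so undefined names such as ALERT_URL do not matter.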

What counts as a check

  • Input validation: Field(le=10000), @validator, if ... raise
  • Rate limit: @rate_limit, @throttle
  • Auth: Depends(), Security() (FastAPI)
  • Confirmation: confirm, approve, review in function body
  • Idempotency: idempotency_key, get_or_create, ON CONFLICT
  • Retry bound: max_retries=, @retry(stop=stop_after_attempt())
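
As a rough illustration of how two of these checks could be recognized statically, the sketch below looks for a rate-limit decorator and for an `if` block that raises. The function detect_checks and its matching rules are hypothetical simplifications, not the scanner's real logic.

```python
import ast


def detect_checks(source):
    """Return {function_name: sorted list of check kinds spotted}."""
    results = {}
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        checks = set()
        # Decorators such as @rate_limit or @throttle(...)
        for dec in func.decorator_list:
            target = dec.func if isinstance(dec, ast.Call) else dec
            if isinstance(target, ast.Name) and target.id in {"rate_limit", "throttle"}:
                checks.add("rate limit")
        # An `if` whose body directly raises is treated as input validation here
        for node in ast.walk(func):
            if isinstance(node, ast.If) and any(isinstance(s, ast.Raise) for s in node.body):
                checks.add("input validation")
        results[func.name] = sorted(checks)
    return results


EXAMPLE = '''
@rate_limit
def refund(amount):
    if amount > 10000:
        raise ValueError("amount too large")
    requests.post(REFUND_URL, json={"amount": amount})

def notify(msg):
    requests.post(NOTIFY_URL, json={"msg": msg})
'''

print(detect_checks(EXAMPLE))
```

In this example refund carries two recognizable checks while notify carries none, which is the kind of function the report would flag as unguarded.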

CI integration

Add to your CI pipeline:

- name: Diplomat governance scan
  run: |
    pip install diplomat-agent
    diplomat-agent . --fail-on-unchecked

--fail-on-unchecked blocks the PR if there are new unreviewed tool calls.

If toolcalls.yaml exists in the repo, it is used as the baseline: only findings that are not already in the baseline block the build.
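
The baseline behaviour can be pictured as a set difference. The sketch below assumes findings are identified by a "file::function" string, which is a made-up format for illustration; the real toolcalls.yaml schema is not shown here.

```python
def new_findings(current, baseline):
    """Return findings present in the current scan but absent from the baseline."""
    return sorted(set(current) - set(baseline))


# Hypothetical identifiers; the real registry format may differ.
baseline = {"agent.py::send_alert", "agent.py::save_result"}
current = {"agent.py::send_alert", "agent.py::save_result", "billing.py::refund"}

blocking = new_findings(current, baseline)
exit_code = 1 if blocking else 0  # a non-zero exit code fails the CI step

print(blocking, exit_code)
```

Findings already recorded in the baseline do not affect the exit code; only the newly introduced refund call does.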

Generate the registry

diplomat-agent . --format registry --output-registry toolcalls.yaml

Commit toolcalls.yaml to your repo. Review changes in PRs. The file is a mirror of what your agent can do — it updates on every scan.

Acknowledge a tool call

If a tool call is intentionally unguarded or protected elsewhere:

def send_alert(message):  # checked:ok — protected by API gateway
    requests.post(ALERT_URL, json={"msg": message})

# diplomat:ok, # checked:ok, and # canary:ok all work.
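
For intuition, acknowledgement comments like these can be located with the stdlib tokenize module. This is a sketch under assumptions: the ACK regular expression and the acknowledged_lines helper are hypothetical, and the scanner's own parsing rules may differ.

```python
import io
import re
import tokenize

# Matches the three marker spellings listed above, plus any trailing note.
ACK = re.compile(r"#\s*(diplomat|checked|canary):ok\b(.*)")


def acknowledged_lines(source):
    """Map line number -> free-text note for each acknowledgement comment."""
    notes = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            match = ACK.search(tok.string)
            if match:
                # Drop the separator (space, dash, or em dash) around the note
                notes[tok.start[0]] = match.group(2).strip(" —-")
    return notes


EXAMPLE = (
    "def send_alert(message):  # checked:ok — protected by API gateway\n"
    "    requests.post(ALERT_URL, json={'msg': message})\n"
)

print(acknowledged_lines(EXAMPLE))
```

Using the tokenizer rather than a plain text search means a marker that appears inside a string literal is not mistaken for a comment.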

Frameworks tested

Framework            Coverage
LangGraph            graph.ainvoke(), StateGraph patterns
CrewAI               agent.execute(), tool patterns
OpenAI SDK           client.chat.completions.create()
OpenAI Agents SDK    Runner.run_sync()
LangChain            chain.invoke(), AgentExecutor
Direct API calls     OpenAI, Anthropic, any HTTP client

Requirements

  • Python 3.10+
  • Zero dependencies (stdlib ast module only)
  • Optional: rich for colored terminal output, pyyaml for registry

Benchmarks

Repo                   Tool calls   Unguarded    Time
Skyvern (595 files)    452          345 (76%)    ~2s
Dify (1000+ files)     1,009        759 (75%)    ~3s
PraisonAI              1,028        911 (89%)    ~2s
CrewAI                 348          273 (78%)    ~1s

Known limitations

  • Static analysis only — cannot detect runtime-generated tool calls
  • name_contains patterns (e.g. "refund", "charge") may match internal business methods that aren't actual payment operations (roughly 22% false-positive rate on payment patterns)
  • No inter-procedural analysis (doesn't follow calls across files)
  • No import alias resolution
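
The import alias limitation is easiest to see with a small experiment. The sketch below uses a generic name-based matcher (calls_named is a hypothetical helper, not part of diplomat-agent) to show how an aliased import escapes a purely textual pattern.

```python
import ast


def calls_named(source, dotted):
    """Count calls whose textual name equals `dotted`, e.g. 'requests.post'."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and ast.unparse(node.func) == dotted:
            count += 1
    return count


DIRECT = "import requests\nrequests.post(url, json=payload)\n"
ALIASED = "import requests as http\nhttp.post(url, json=payload)\n"

print(calls_named(DIRECT, "requests.post"))   # the direct call matches
print(calls_named(ALIASED, "requests.post"))  # the aliased call does not
```

If a codebase aliases its HTTP or database clients, it may be worth reviewing those call sites by hand, since a matcher of this kind would not report them.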

Roadmap

  • TypeScript support
  • MCP server scanning
  • PR comment integration
  • Runtime enforcement (Diplomat runtime)

License

Apache 2.0 — Copyright 2026 Diplomat Services SAS
