Scan your agentic codebase for unguarded tool calls with real-world side effects

diplomat-agent

v0.2.0 — 264 tests, 48+ detection patterns, 11 effect categories

Find every tool call in your AI agent that can change the real world.

diplomat-agent is a static scanner for Python AI agents. It maps every function that writes to a database, calls an external API, sends an email, invokes another agent, or deletes data — and tells you which ones have no checks.

What it finds

We scanned 16 open-source agent repos. Here's what we found:

Effect category	Unguarded
Database writes	3,260
Database deletes	1,305
HTTP writes (POST/PUT/PATCH)	968
Subprocess / exec / eval	697
LLM calls	464
Emails	250

76% of tool calls had zero checks.

One example: Khoj's ai_update_memories lets an LLM delete user memories with no human confirmation.

Quick start

pip install diplomat-agent
diplomat-agent .

Output:

diplomat-agent — governance scan

Scanned: ./my-agent
Tool calls with side effects: 12

⚠ research_and_save(query, db_path)
  Write protection:       NONE
  Rate limit:             NONE
  → no rate limit · no auth check
  Governance: ❌ UNGUARDED

⚠ send_notification(user_id, message)
  Write protection:       NONE
  → no confirmation before send
  Governance: ❌ UNGUARDED

✓ process_order(order_id) — # checked:ok — protected by API gateway
  Governance: ✅ CONFIRMED

────────────────────────────────────────────
RESULT: 8 with no checks · 3 partial · 1 confirmed (12 total)

  Fix              → add validation in code, the next scan picks it up
  Acknowledge      → add  # checked:ok  in your source code
  Protected elsewhere → add  # checked:ok — protected by [where]
  CI enforcement   → --fail-on-unchecked blocks PRs with new unreviewed tool calls

What counts as a tool call

Any function that can change state outside the process:

Database writes — session.commit(), .save(), .create(), .update()
Database deletes — session.delete(), .remove(), DELETE FROM
HTTP writes — requests.post(), httpx.put(), client.patch()
LLM calls — openai.chat.completions.create(), anthropic.messages.create()
Agent invocations — graph.ainvoke(), agent.execute(), Runner.run_sync()
Email — smtp.sendmail(), ses_client.send_email()
Destructive — subprocess.run(), exec(), eval()
Publish — s3.put_object(), client.publish()

What counts as a check

Input validation — Field(le=10000), @validator, if ... raise
Rate limit — @rate_limit, @throttle
Auth — Depends(), Security() (FastAPI)
Confirmation — confirm, approve, review in function body
Idempotency — idempotency_key, get_or_create, ON CONFLICT
Retry bound — max_retries=, @retry(stop=stop_after_attempt())
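
To make two of these check types concrete, here is a small function that carries an input-validation guard (the if ... raise pattern) and a retry bound (max_retries=). It is a sketch under stated assumptions: issue_refund and its send parameter are hypothetical names, and whether diplomat-agent classifies this exact shape as guarded has not been verified.

```python
from typing import Callable


def issue_refund(
    order_id: str,
    amount_cents: int,
    send: Callable[[str, int], None],
    max_retries: int = 3,
) -> int:
    """Call send(order_id, amount_cents) with validation and a retry bound.

    `send` stands in for the real side effect (an HTTP POST, say) and is a
    hypothetical callable supplied by the caller.
    """
    # Input validation: reject out-of-range amounts before any side effect.
    if not 0 < amount_cents <= 10_000:
        raise ValueError(f"amount out of range: {amount_cents}")

    # Retry bound: never attempt the call more than max_retries times.
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            send(order_id, amount_cents)
            return attempt  # number of attempts it took
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError(f"gave up after {max_retries} attempts") from last_error
```

Both guards live in the function body itself, which is where a static scan can see them.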

CI integration

Add to your CI pipeline:

- name: Diplomat governance scan
  run: |
    pip install diplomat-agent
    diplomat-agent . --fail-on-unchecked

--fail-on-unchecked blocks the PR if there are new unreviewed tool calls.

If toolcalls.yaml exists in the repo, it is used as the baseline: only findings that are not already recorded in it block the build.
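
The baseline rule amounts to a set difference between the recorded findings and the current scan. The sketch below shows that logic only. The "file:function" identifiers are a hypothetical encoding (the real toolcalls.yaml schema is not described here), and the non-zero exit code is an assumption about how the build gets blocked.

```python
def new_findings(baseline: set[str], current: set[str]) -> list[str]:
    """Return findings present in the current scan but absent from the baseline."""
    return sorted(current - baseline)


def exit_code(baseline: set[str], current: set[str]) -> int:
    """Non-zero when the scan introduced findings the baseline has not seen."""
    return 1 if new_findings(baseline, current) else 0


# Hypothetical identifiers of the form "file:function".
baseline = {"tools.py:send_alert"}
current = {"tools.py:send_alert", "tools.py:delete_user"}
print(new_findings(baseline, current))
print(exit_code(baseline, current))
```

Findings that were already in the baseline never fail the build under this rule, so existing unguarded calls can be worked through gradually while new ones are stopped at review time.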

Generate the registry

diplomat-agent . --format registry --output-registry toolcalls.yaml

Commit toolcalls.yaml to your repo. Review changes in PRs. The file is a mirror of what your agent can do — it updates on every scan.

Acknowledge a tool call

If a tool call is intentionally unguarded or protected elsewhere:

def send_alert(message):  # checked:ok — protected by API gateway
    requests.post(ALERT_URL, json={"msg": message})

# diplomat:ok, # checked:ok, and # canary:ok all work.
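
Acknowledgment markers are comments, and the ast module discards comments when parsing, so any tool that reads them needs another route. One option is the stdlib tokenize module, sketched below. The regex and the acknowledged_lines function are assumptions made for illustration and may not match how diplomat-agent reads the markers.

```python
import io
import re
import tokenize

# The three markers named above; the regex itself is an assumption.
ACK_MARKER = re.compile(r"#\s*(diplomat|checked|canary):ok\b(.*)")


def acknowledged_lines(source: str) -> dict[int, str]:
    """Map line number -> trailing note for every comment carrying a marker."""
    found = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            match = ACK_MARKER.match(tok.string)
            if match:
                # Drop the separator (spaces, hyphens, em dash) before the note.
                found[tok.start[0]] = match.group(2).strip(" -\u2014")
    return found
```

The returned line numbers can then be compared with the lineno of each function definition to decide which findings count as acknowledged.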

Frameworks tested

Framework	Coverage
LangGraph	`graph.ainvoke()`, StateGraph patterns
CrewAI	`agent.execute()`, tool patterns
OpenAI SDK	`client.chat.completions.create()`
OpenAI Agents SDK	`Runner.run_sync()`
LangChain	`chain.invoke()`, AgentExecutor
Direct API calls	OpenAI, Anthropic, any HTTP client

Requirements

Python 3.10+
Zero dependencies (stdlib ast module only)
Optional: rich for colored terminal output, pyyaml for registry

Benchmarks

Repo	Tool calls	Unguarded	Time
Skyvern (595 files)	452	345 (76%)	~2s
Dify (1000+ files)	1,009	759 (75%)	~3s
PraisonAI	1,028	911 (89%)	~2s
CrewAI	348	273 (78%)	~1s

Known limitations

Static analysis only — cannot detect runtime-generated tool calls
name_contains patterns (e.g. "refund", "charge") may match internal business methods that aren't actual payment operations (roughly 22% false-positive rate on payment patterns)
No inter-procedural analysis (doesn't follow calls across files)
No import alias resolution
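
The import alias limitation is easiest to see with a concrete case. In the sketch below, a matcher that compares literal call text sees r.post, not requests.post, so a pattern written against the real module name does not fire. The helper names are hypothetical, and import_aliases only shows the information a resolver would need. It is not a feature of diplomat-agent.

```python
import ast

ALIASED = """
import requests as r

def send_alert(message):
    r.post("https://example.com/alerts", json={"msg": message})
"""


def call_names(source: str) -> list[str]:
    """Collect the literal 'name.attr' text of each simple attribute call."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
        ):
            names.append(f"{node.func.value.id}.{node.func.attr}")
    return names


def import_aliases(source: str) -> dict[str, str]:
    """Map each 'import x as y' alias back to its real module name."""
    aliases = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for item in node.names:
                if item.asname:
                    aliases[item.asname] = item.name
    return aliases


print(call_names(ALIASED))      # the call as written in the source
print(import_aliases(ALIASED))  # what a resolver would need to know
```

Until aliases are resolved, importing modules under their real names keeps such calls visible to pattern-based scanning.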

Roadmap

TypeScript support
MCP server scanning
PR comment integration
Runtime enforcement (Diplomat runtime)

License

Release history

0.4.0 (Apr 14, 2026)
0.2.0 (Mar 25, 2026), this version

Download files

Source Distribution

diplomat_agent-0.2.0.tar.gz (75.4 kB)

Uploaded Mar 25, 2026 Source

Built Distribution

diplomat_agent-0.2.0-py3-none-any.whl (39.9 kB)

Uploaded Mar 25, 2026 Python 3

File details

Details for the file diplomat_agent-0.2.0.tar.gz.

File metadata

Download URL: diplomat_agent-0.2.0.tar.gz
Upload date: Mar 25, 2026
Size: 75.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for diplomat_agent-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`ddb2bcb0c795691af4791b995c580e675b0e34ff81d6bfc7ca300aeaac83e86e`
MD5	`7f1b7c6959403518a94612337f109cde`
BLAKE2b-256	`ca7f914d4d13d5f48f019dd54a0b614cc58438e80f4128b8628c14d185278502`

File details

Details for the file diplomat_agent-0.2.0-py3-none-any.whl.

File metadata

Download URL: diplomat_agent-0.2.0-py3-none-any.whl
Upload date: Mar 25, 2026
Size: 39.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for diplomat_agent-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d80dd43e738e9024a86f54167ffe86910279958e81e786a0809f26b1bfa6ed73`
MD5	`64c60d16a5b77213d1a98d7fb78cdc97`
BLAKE2b-256	`353c318fc3ddd8dad26cfc7ebf8d91ad7706e032917ca4742b3e34fe064309ff`
