Security observability layer for LangGraph and Anthropic SDK AI agents

AgentMoat

A security layer for AI agents. AgentMoat detects prompt injection, firewalls dangerous tool calls, scores cross-agent trust, and keeps a tamper-evident audit trail — across the Anthropic & OpenAI SDKs, LangGraph, and any MCP server. Drop it in with a one-line change; no rewrite of your agent logic.

AgentMoat blocking a prompt-injection attack at the tool layer

Why

Autonomous agents take actions — they read documents, call tools, write files, hit APIs. That turns a prompt-injection string in a web page or a tool result into a way to make the agent do something, not just say something. Most teams have no visibility into what their agents are doing and no enforcement layer between the model's decision and the action. AgentMoat is that layer.

Install

git clone https://github.com/Shashank-016/agentmoat
cd agentmoat
pip install -e ".[langgraph,openai]"   # extras optional; base install works on its own

Quick start (30 seconds)

import anthropic
from agentmoat import GuardedClient

# Wrap your existing client — same interface as anthropic.Anthropic()
client = GuardedClient(
    anthropic.Anthropic(),
    agent_id="researcher",
    policy_path="policy.yaml",   # optional
    mode="observe",              # "observe" | "enforce" | "interactive"
)

# Use it exactly as before. AgentMoat scans inputs, checks tool calls,
# logs every event, and (in enforce mode) blocks dangerous actions.
resp = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize this document..."}],
)

See it catch a real attack:

python examples/mcp_proxy_demo.py
# An agent reads a poisoned document and tries a privileged write —
# AgentMoat blocks it at the tool layer and prints a session report.

Add AgentMoat to your own agent

One line at the point you create your client or graph. Everything downstream is instrumented.

Anthropic SDK

from anthropic import Anthropic
from agentmoat import GuardedClient

client = GuardedClient(Anthropic(), agent_id="my-agent", policy_path="policy.yaml", mode="enforce")

OpenAI SDK

from openai import OpenAI
from agentmoat import GuardedOpenAI

client = GuardedOpenAI(OpenAI(), agent_id="my-agent", policy_path="policy.yaml", mode="enforce")

LangGraph — attach the callback to any graph/runnable:

from agentmoat import AgentMoatCallback

graph.invoke(state, config={"callbacks": [AgentMoatCallback(session_id="run-1")]})

Any MCP tool server — run AgentMoat as a transparent proxy, no agent code change at all. Point your MCP client at AgentMoat instead of the real server:

agentmoat mcp proxy stdio \
  --upstream-cmd "npx -y @modelcontextprotocol/server-filesystem /data" \
  --agent-id my-agent \
  --policy policy.yaml \
  --mode enforce

Async variants (AsyncGuardedClient, AsyncGuardedOpenAI) and streaming are supported with the same interface. Events flow to an in-memory bus, an optional SQLite store, and a hash-chained JSONL audit log; view them via the bundled FastAPI service and React dashboard (see below).

Policy File

version: "1"
agents:
  researcher:
    allowed_tools: [web_search, read_file]
    denied_tools:  [write_file, execute_code]
    rate_limits:
      web_search: 10/minute

  writer:
    allowed_tools: [write_file, read_file]
    denied_tools:  [web_search, execute_code]
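A minimal sketch of how a policy like the one above could be evaluated, including the sliding-window rate limit. This is an illustration, not AgentMoat's ToolPolicyEngine: the class name SketchPolicy, the return strings, and the choice to deny unknown agents and unlisted tools are all assumptions. The policy is written as a dict so the sketch has no YAML dependency.

```python
import time
from collections import defaultdict, deque

# Same shape as the researcher entry in the YAML above.
POLICY = {
    "agents": {
        "researcher": {
            "allowed_tools": ["web_search", "read_file"],
            "denied_tools": ["write_file", "execute_code"],
            "rate_limits": {"web_search": "10/minute"},
        },
    }
}

_WINDOW_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def parse_rate(spec):
    """Turn a string such as '10/minute' into (10, 60)."""
    count, unit = spec.split("/")
    return int(count), _WINDOW_SECONDS[unit]


class SketchPolicy:
    """Hypothetical checker: deny list first, then allow list, then rate limit."""

    def __init__(self, policy):
        self.policy = policy
        self.calls = defaultdict(deque)  # (agent_id, tool) -> call timestamps

    def check(self, agent_id, tool, now=None):
        now = time.monotonic() if now is None else now
        rules = self.policy["agents"].get(agent_id)
        if rules is None:
            return "deny:unknown_agent"      # assumption: unknown agents are denied
        if tool in rules.get("denied_tools", []):
            return "deny:denied_tool"
        if tool not in rules.get("allowed_tools", []):
            return "deny:not_allowed"        # assumption: unlisted tools are denied
        spec = rules.get("rate_limits", {}).get(tool)
        if spec:
            limit, window = parse_rate(spec)
            stamps = self.calls[(agent_id, tool)]
            # Slide the window: forget calls older than `window` seconds.
            while stamps and now - stamps[0] >= window:
                stamps.popleft()
            if len(stamps) >= limit:
                return "deny:rate_limited"
            stamps.append(now)
        return "allow"
```

Passing now explicitly keeps the rate limiter deterministic in tests; in normal use the monotonic clock is read on each call.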

Argument constraints

Tool names are only half the story — write_file("/etc/crontab", payload) passes a name-level check for any agent allowed to use write_file. ToolPolicyEngine.check_arguments() inspects the arguments of a tool call, combining always-on built-in detectors with per-tool rules declared in the policy file:

agents:
  writer:
    tool_constraints:
      write_file:
        path_allowlist: ["/tmp/**", "./output/**"]   # only these globs are permitted
        path_denylist:  ["/etc/**", "~/.ssh/**"]      # these are always blocked
        max_arg_length: 10000                         # flag oversized argument values
      fetch:
        url_denylist: ["169.254.169.254", "localhost", "10.*"]
        # url_allowlist, arg_denylist also supported
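To make the path rules concrete, here is a small sketch of glob-based allow/deny matching using only the standard library. The function name check_path and its return values are hypothetical; the precedence (deny list wins, then the allow list must match) follows the comments in the YAML above, but how AgentMoat itself normalises paths is not stated in this README.

```python
import fnmatch
import os.path


def check_path(path, allowlist=(), denylist=()):
    """Return None if the path is acceptable, else a short reason string."""
    # Normalise first so "/tmp/../etc/crontab" is judged as "/etc/crontab".
    norm = os.path.normpath(os.path.expanduser(path))

    def matches(patterns):
        # fnmatch's "*" also matches "/", so "/tmp/**" covers nested paths.
        return any(
            fnmatch.fnmatchcase(norm, os.path.normpath(os.path.expanduser(p)))
            for p in patterns
        )

    if matches(denylist):
        return "denied_by_denylist"
    if allowlist and not matches(allowlist):
        return "not_in_allowlist"
    return None
```

Normalising before matching matters: without it, a path that starts inside an allowed directory and climbs out with ../ would pass the allow list.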

Built-in detectors run on every tool call regardless of configuration:

Detector	Flag	Triggers on
Path traversal	`constraint:path_traversal`	`../`, `..\`, or URL-encoded `%2e%2e` in any argument
SSRF targets	`constraint:ssrf_target`	URLs/hosts pointing at `169.254.169.254`, `localhost`, `127.0.0.1`, RFC-1918 ranges, `metadata.google.internal`
Shell metacharacters	`constraint:shell_metachar`	`;`, `|`, `&&`, `` ` ``, `$(`, `>`, `<` in arguments to tools whose name suggests command execution (`exec`, `shell`, `command`, `run`, `bash`, `sh`)
Sensitive path access	`constraint:sensitive_path`	`/etc/`, `/root/`, `~/.ssh`, `id_rsa`, `.env`, `credentials`, `/proc/`

Violations are emitted as policy_violation events with severity="critical" and raise AgentMoatException in enforce mode — both from the SDK wrappers (checked against the arguments the model produced, before the agent runtime executes the tool) and from the MCP proxy (checked before the call is forwarded upstream, where blocking actually prevents execution).
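The following sketch shows one way two of the built-in checks could work: a regex for path traversal and an IP-literal test for SSRF targets. The flag strings are taken from the table above; everything else (function names, the hostname set, the decision to skip DNS resolution) is an assumption for illustration rather than AgentMoat's implementation.

```python
import ipaddress
import re
from urllib.parse import urlparse

_TRAVERSAL = re.compile(r"\.\./|\.\.\\|%2e%2e", re.IGNORECASE)
_SSRF_NAMES = {"localhost", "metadata.google.internal"}


def _host_of(value):
    """Pull a hostname out of a URL, or treat the value as a bare host."""
    try:
        parsed = urlparse(value if "://" in value else "//" + value)
        return (parsed.hostname or "").lower()
    except ValueError:
        return ""  # not parseable as a URL or host


def is_ssrf_target(value):
    host = _host_of(value)
    if host in _SSRF_NAMES:
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # an ordinary hostname; resolving it is out of scope here
    # is_private covers the RFC 1918 ranges; link-local covers 169.254.0.0/16.
    return ip.is_private or ip.is_loopback or ip.is_link_local


def scan_arguments(args):
    """Return the sorted constraint flags raised by a tool call's arguments."""
    flags = set()
    for value in args.values():
        if not isinstance(value, str):
            continue
        if _TRAVERSAL.search(value):
            flags.add("constraint:path_traversal")
        if is_ssrf_target(value):
            flags.add("constraint:ssrf_target")
    return sorted(flags)
```

A literal-IP check like this does not catch a public hostname that resolves to an internal address; a fuller check would resolve the name and test the result.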

What Gets Detected

Threat	Detection Method	Default Severity
Jailbreak attempt	Regex: "ignore previous instructions", "you are now DAN"	Critical
System prompt exfiltration	Regex: "print your system prompt", "repeat everything above"	Critical
Role override	Regex: "act as if you have no restrictions"	Critical
Tool abuse via injection	Regex: "call the write_file tool"	Critical
Indirect injection (docs, web)	Regex + embedding similarity	Warning/Critical
Tool policy violation	YAML policy engine	Critical
Rate limit exceeded	Sliding window counter	Critical
Low-trust agent calling sensitive tools	Trust score degradation	Warning
Multi-agent trust chain poisoning	Multiplicative provenance tracking	Warning
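As an illustration of the regex-based rows in the table above, the sketch below scans text for the quoted phrases. The patterns are built from the examples in the table; the category names, the whitespace normalisation, and the scan_text function are assumptions, and the embedding-similarity check is not shown.

```python
import re

# Patterns derived from the example phrases in the table; the real rule set
# is presumably larger.
_PATTERNS = {
    "jailbreak": [r"ignore (all )?previous instructions", r"you are now dan\b"],
    "system_prompt_exfiltration": [
        r"print your system prompt",
        r"repeat everything above",
    ],
    "role_override": [r"act as if you have no restrictions"],
    "tool_abuse": [r"call the \w+ tool"],
}
_COMPILED = {
    name: [re.compile(p, re.IGNORECASE) for p in patterns]
    for name, patterns in _PATTERNS.items()
}


def scan_text(text):
    """Return the sorted names of every category with at least one match."""
    # Collapse runs of whitespace so line breaks cannot split a phrase.
    normalised = " ".join(text.split())
    return sorted(
        name
        for name, patterns in _COMPILED.items()
        if any(p.search(normalised) for p in patterns)
    )
```

Fixed phrases are cheap to check but easy to paraphrase around, which is presumably why the indirect-injection row pairs regex with embedding similarity.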

Response modes

Every guarded client and callback, and the MCP proxy, takes a mode:

Mode	Behavior
`"observe"` (default)	Detect and log everything — never interrupts the agent.
`"enforce"`	Raise `AgentMoatException` (or return a JSON-RPC error from the MCP proxy) on any hard violation. A fixed, pre-decided policy.
`"interactive"`	Route violations to a human (or programmatic approver) for a real-time decision via `ApprovalGate`. A "deny" blocks the call just like enforce mode; an "approve" lets it through.

"interactive" mode is for situations where a blanket policy is too coarse — let a human apply judgment to a specific borderline case instead of pre-encoding every exception:

import anthropic
from agentmoat import GuardedClient, ApprovalGate
from agentmoat.control import ApprovalRequest, ApprovalDecision

def slack_approval_handler(request: ApprovalRequest) -> ApprovalDecision:
    # Post to Slack, wait for a thumbs-up/thumbs-down reaction, etc.
    ...
    return "approve"  # or "deny"

client = GuardedClient(
    anthropic.Anthropic(),
    agent_id="researcher",
    mode="interactive",
    approval_gate=ApprovalGate(handler=slack_approval_handler),
)

Each request emits approval_required, then approval_granted or approval_denied, so the full decision trail lands in the audit log. The default handler (when no approval_gate= is supplied) prompts on the CLI with a y/N confirmation — fine for local development, but register your own handler (Slack, a web UI, a queue) for anything running unattended. A misbehaving or exception-raising handler defaults to "deny" — approval gates fail closed.

Note: trust_flag warnings never hard-block in enforce mode (a low trust score alone shouldn't halt an agent), but in interactive mode they still route through the approval gate — a human's explicit "deny" blocks the call. This gives interactive mode finer-grained control than a blanket policy.
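The fail-closed behaviour and the event trail described above can be sketched in a few lines. This is not ApprovalGate's code: run_gate and the emit callback are hypothetical names, and only the event names and the "approve"/"deny" values come from this README.

```python
def run_gate(handler, request, emit):
    """Ask a handler for a decision; anything other than 'approve' is a deny."""
    emit("approval_required", request)
    try:
        decision = handler(request)
    except Exception:
        decision = "deny"  # a crashing handler must not let the call through
    if decision != "approve":
        decision = "deny"  # None, typos, and unexpected values also fail closed
    emit("approval_granted" if decision == "approve" else "approval_denied", request)
    return decision
```

Treating every value except the exact string "approve" as a denial means a handler that times out and returns nothing blocks the call instead of silently allowing it.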

Kill switch

Independent of mode, any session — or every session in the process — can be halted immediately via KillSwitch:

from agentmoat.control import get_default_kill_switch

switch = get_default_kill_switch()
switch.kill_session("session-123")   # halt one session
switch.kill_all()                    # halt every session in this process
switch.revive_session("session-123") # restore it
switch.status()                      # {"global": False, "killed_sessions": [...]}

A killed session's next intercepted action raises AgentMoatKilled (a subclass of AgentMoatException) — or, for the MCP proxy, returns a JSON-RPC error (AGENTMOAT_SESSION_KILLED) — before any API call or tool execution happens. A critical session_end event with flags=["kill:tripped"] is emitted first, so the halt is visible in the audit trail.
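For readers wondering what sits behind such a switch, here is a minimal in-process sketch. The method names and the shape of status() mirror the snippet above; the class SketchKillSwitch, the SketchKilled exception, the guard helper, and the rule that a global kill overrides a per-session revive are assumptions.

```python
import threading


class SketchKilled(Exception):
    """Stand-in for the exception raised when a killed session acts."""


class SketchKillSwitch:
    def __init__(self):
        self._lock = threading.Lock()
        self._global = False
        self._killed = set()

    def kill_session(self, session_id):
        with self._lock:
            self._killed.add(session_id)

    def revive_session(self, session_id):
        with self._lock:
            self._killed.discard(session_id)

    def kill_all(self):
        with self._lock:
            self._global = True

    def is_killed(self, session_id):
        with self._lock:
            # Assumption: the global flag wins even over a revived session.
            return self._global or session_id in self._killed

    def status(self):
        with self._lock:
            return {"global": self._global, "killed_sessions": sorted(self._killed)}


def guard(switch, session_id):
    """Call before any API request or tool execution for the session."""
    if switch.is_killed(session_id):
        raise SketchKilled(session_id)
```

Because the state lives in one process's memory, a sketch like this shares the limitation noted for the HTTP endpoints: other worker processes would not see the trip without a shared store.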

The same switch is reachable over HTTP once the audit API is running:

curl -X POST http://localhost:8000/control/kill/session-123
curl -X POST http://localhost:8000/control/kill-all
curl -X POST http://localhost:8000/control/revive/session-123
curl http://localhost:8000/control/status

These endpoints affect sessions in the API process only — a multi-process deployment needs a shared backing store (see Roadmap) for one trip to halt every worker. They're also unauthenticated for now; put them behind your own auth/network controls before exposing them.

Tamper-evident audit log

AuditLogger (passed via audit_log= to any guarded client/callback) writes one JSON object per line to a durable JSONL file. By default (chained=True) every record also carries two fields: prev_hash, the SHA-256 record_hash of the previous line (64 zeros for the first line in a fresh file), and its own record_hash, a digest over the record's canonical JSON plus prev_hash. Editing a line invalidates its record_hash, and deleting a line breaks the link from the record that followed it, so either change surfaces when the chain is verified. The chain survives process restarts (it resumes from the last line on disk) and rotations (the new file's first record continues from the rotated file's last hash).

agentmoat audit verify agentmoat_audit.jsonl
# ✓ Chain intact — 1,432 records verified
#   (or, if a line was edited or removed:)
# ✗ Chain broken at line 87 — record was modified or a prior line was deleted

agentmoat audit tail agentmoat_audit.jsonl -n 50
agentmoat audit stats agentmoat_audit.jsonl   # counts by event_type and severity

This gives you a forensic trail suitable as SOC 2 / ISO 27001 evidence: an auditor (or an incident responder) can independently re-verify the chain and confirm that the records in front of them are the ones AgentMoat wrote, in order, rather than a reconstruction. It does not, by itself, prove who tampered with a file; pair it with filesystem-level access controls and off-host replication for full chain-of-custody guarantees.
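The chaining scheme described in this section can be sketched with hashlib and json. The field names prev_hash and record_hash and the 64-zero genesis value come from the text; the exact canonical-JSON form, the way the digest combines the record with prev_hash, and the function names are assumptions, so hashes produced here will not match a real AgentMoat log.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(record, prev_hash):
    # Assumed canonical form: sorted keys, no extra whitespace.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((canonical + prev_hash).encode("utf-8")).hexdigest()


def append_record(lines, record):
    """Append one chained JSON line to an in-memory list of lines."""
    prev_hash = json.loads(lines[-1])["record_hash"] if lines else GENESIS
    entry = dict(record, prev_hash=prev_hash)
    entry["record_hash"] = _digest(record, prev_hash)
    lines.append(json.dumps(entry, sort_keys=True))


def verify(lines):
    """Return None if the chain is intact, else the 1-based line of the first break."""
    expected_prev = GENESIS
    for number, line in enumerate(lines, start=1):
        entry = json.loads(line)
        body = {k: v for k, v in entry.items() if k not in ("prev_hash", "record_hash")}
        if entry.get("prev_hash") != expected_prev:
            return number  # a prior line was removed or reordered
        if entry.get("record_hash") != _digest(body, expected_prev):
            return number  # this line's content was edited
        expected_prev = entry["record_hash"]
    return None
```

In this sketch, dropping lines from the end of the list still verifies cleanly, because no later record remains to expose the gap. That is one concrete reason for the off-host replication recommended above.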

Running the API + Dashboard

# 1. Install
pip install -e ".[langgraph]"

# 2. Start the audit API
uvicorn api.main:app --reload

# 3. Start the dashboard
cd dashboard
npm install
npm run dev
# → http://localhost:5173

# 4. Run the demo
python examples/langgraph_demo.py

See dashboard/README.md for dashboard-specific setup, including how to authenticate against an API started with AGENTMOAT_API_KEY set.

Running Tests

pytest

Trust Scoring

AgentMoat tracks information provenance across agent hops. When a session processes external content (a file, a web page, a user upload), its trust score degrades:

Initial:          1.0  (TRUSTED  — human instructions)
After file read:  0.3  (EXTERNAL — external content processed)
After handoff:    0.21 (EXTERNAL — downstream agent inherits low trust)
After injection:  0.0  (UNTRUSTED — flagged)

When trust drops below 0.5, any attempt to call a sensitive tool (write, execute, send, delete) emits a trust_flag warning even if the tool is otherwise policy-allowed.
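The numbers above are consistent with a multiplicative scheme: 1.0 × 0.3 = 0.3 after external content, then 0.3 × 0.7 = 0.21 after a handoff. The sketch below reproduces that arithmetic. The two factors are inferred from the example values rather than taken from AgentMoat's source, and the class name, method names, and substring match on tool names are hypothetical.

```python
SENSITIVE_VERBS = ("write", "execute", "send", "delete")
FLAG_THRESHOLD = 0.5

# Inferred factors: chosen only because they reproduce the example scores.
EXTERNAL_CONTENT_FACTOR = 0.3
HANDOFF_FACTOR = 0.7


class TrustTracker:
    def __init__(self, score=1.0):
        self.score = score

    def saw_external_content(self):
        self.score = round(self.score * EXTERNAL_CONTENT_FACTOR, 4)

    def hand_off(self):
        """Return the tracker a downstream agent would start with."""
        return TrustTracker(round(self.score * HANDOFF_FACTOR, 4))

    def injection_detected(self):
        self.score = 0.0

    def should_flag(self, tool_name):
        # Assumption: a tool is sensitive if its name contains one of the verbs.
        sensitive = any(verb in tool_name.lower() for verb in SENSITIVE_VERBS)
        return sensitive and self.score < FLAG_THRESHOLD
```

Because the factors multiply, trust can only fall along a chain of handoffs: a downstream agent never ends up more trusted than the agent that fed it.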

Roadmap

OpenAI SDK support — GuardedOpenAI / AsyncGuardedOpenAI wrap openai.OpenAI / AsyncOpenAI
Async GuardedClient — AsyncGuardedClient wraps AsyncAnthropic for async codebases
Streaming support — GuardedStream / AsyncGuardedStream intercept messages.stream()
MCP server integration — transparent stdio + SSE proxy for Model Context Protocol
Tamper-evident audit log — SHA-256 hash-chained JSONL with agentmoat audit verify
Human-in-the-loop approval — mode="interactive" routes violations through ApprovalGate
Kill switch — halt any session (or every session) immediately, programmatically or via /control
OpenTelemetry export — emit spans/traces to any OTEL-compatible backend
Multi-process bus — Redis-backed EventBus for distributed agent deployments
Slack/PagerDuty alerting — push critical events to on-call channels
SARIF export — machine-readable security findings for CI integration
Policy hot-reload — watch policy.yaml for changes without restart

License

MIT — see LICENSE

This version

0.1.0

Jun 10, 2026

Download files

Source Distribution

agentmoat-0.1.0.tar.gz (358.6 kB)

Uploaded Jun 10, 2026 Source

Built Distribution

agentmoat-0.1.0-py3-none-any.whl (81.8 kB)

Uploaded Jun 10, 2026 Python 3

File details

Details for the file agentmoat-0.1.0.tar.gz.

File metadata

Download URL: agentmoat-0.1.0.tar.gz
Upload date: Jun 10, 2026
Size: 358.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for agentmoat-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`c3355692bb1de761217d124d47bc8a819f2ef0a91c09eb874297b58b202989b6`
MD5	`f50768c5619e82a4fd05ecee694fec1e`
BLAKE2b-256	`31315ced29841efae8e5fcc9f8de9db5ff05d70f1de52275d069f1c746038fe1`

File details

Details for the file agentmoat-0.1.0-py3-none-any.whl.

File metadata

Download URL: agentmoat-0.1.0-py3-none-any.whl
Upload date: Jun 10, 2026
Size: 81.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for agentmoat-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`98b8f1577b9fc7c339131df1fcbaa2be1073d4c90df99feefeaaaee785a8e3ee`
MD5	`20bd5dbe98406bf70ba62fcc3663ba7c`
BLAKE2b-256	`7815378156b2fcd1c39f69112cf7ee9c2dee5144f76e8ff5a468441b3fe11c5a`
