
Local-first runtime controls for Python LLM apps and agents: budget circuit breakers, PII/secrets redaction, tool-call policy checks, rate limits, and audit traces — in 2 lines, no hosted proxy.


AgentArmor 🛡️

Local-first runtime controls for Python LLM apps and agents.


Budget circuit breakers, PII/secrets redaction, tool-call policy checks, rate limits, and audit traces — wrapped around your existing OpenAI / Anthropic / Gemini calls in two lines. No hosted proxy, no account, no extra network hops.

Links: Support Matrix | Security Policy | Examples

AgentArmor demo: a budget circuit breaker firing at its dollar limit and an unsafe call being blocked

Status (v1.6). The budget circuit breaker, output redaction, rate limiter, context guard, and flight recorder are deterministic and production-ready. The detectors — prompt injection, toxicity, unicode, exfiltration, and more — are heuristic, defense-in-depth checks: they reduce risk but are not a complete security boundary, and pattern-based detection is bypassable by design. See Benchmarks & limitations and SECURITY.md. Adversarial test cases and edge-case reports are very welcome.

What is AgentArmor?

AgentArmor is an open-source Python SDK that adds runtime controls around your LLM calls: a hard budget circuit breaker, PII/secrets redaction, tool-call policy checks, rate limiting, and a complete local audit trail of every interaction.

It hooks into the openai and anthropic client libraries in-process (and google-genai when the optional Gemini extra is installed), so the controls apply to your existing code, and to anything built on those SDKs, without proxies, accounts, or rewrites. Optional defense-in-depth detectors (prompt injection, toxicity, and more) are documented per feature below, with their limits stated plainly.


Quickstart

Drop-in Mode (Recommended)

Two lines. Zero code changes to your existing agent.

import agentarmor
import openai

# 1. Initialize your shields
agentarmor.init(
    budget="$5.00",            # Circuit breaker — kills runaway spend
    shield=True,               # Prompt injection detection
    # ml_shield=True,          # ML-powered injection detection (requires agentarmor[ml])
    filter=["pii", "secrets"], # Output firewall — blocks leaks
    record=True,               # Flight recorder — replay any session
    rate_limit="10/min",       # Rate limiter — Sliding-window throttling
    context_guard=0.95         # Context guard — Pre-flight token limit
)

# 2. Your existing code — no changes needed!
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze this market..."}]
)

# 3. Get your safety and cost report
print(agentarmor.spent())      # e.g. 0.0035
print(agentarmor.remaining())  # e.g. 4.9965
print(agentarmor.report())     # Full cost/security breakdown

# 4. Tear down the shields
agentarmor.teardown()

agentarmor.init() patches the OpenAI and Anthropic SDKs in-process, so every call is tracked and the configured controls are applied automatically.

Works with Google Gemini too — zero code changes:

import agentarmor
from google import genai

agentarmor.init(budget="$5.00", shield=True, filter=["pii", "secrets"])

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Analyze this market..."
)

print(agentarmor.report())  # Gemini calls tracked automatically

Install

pip install agentarmor

Requires Python 3.10+. No external infrastructure dependencies.

Optional Dependencies

pip install agentarmor[gemini]    # Google Gemini support
pip install agentarmor[ml]        # ML-based injection detection (scikit-learn)
pip install agentarmor[toxicity]  # ML-based toxicity detection (detoxify)
pip install agentarmor[drift]     # Semantic drift detection (sentence-transformers)
pip install agentarmor[all]       # All providers + optional features

Drop-in API

Function | Description
agentarmor.init(...) | Start tracking. Patches OpenAI/Anthropic/Gemini SDKs. Loads chosen shields.
agentarmor.init_from_config(path) | Initialize AgentArmor from a YAML/JSON configuration file.
agentarmor.spent() | Total dollars spent so far in this session.
agentarmor.remaining() | Dollars left in the budget.
agentarmor.report() | Full security and cost breakdown as a dictionary.
agentarmor.teardown() | Stop tracking, unpatch SDKs, and clean up.
agentarmor.validate_mcp_server(name) | Check if an MCP server is trusted.
agentarmor.validate_mcp_tool(name, args) | Validate an MCP tool call against policies.
agentarmor.authenticate_mcp_server(name, token) | Pre-authenticate an MCP server with an auth token.
agentarmor.spawn_agent(id, parent_id, budget) | Register a sub-agent with inherited safety constraints.
agentarmor.end_agent(id) | End a sub-agent and roll up its stats to its parent.
agentarmor.compliance_report(framework) | Export control-mapped compliance evidence (SOC2/HIPAA/GDPR).
agentarmor.init(strict=True) | (v1.3) Raise ConfigurationError on typo'd kwargs with "did you mean?" suggestions.
agentarmor.demo_attacks() | (v1.3) Run ~21 synthetic attacks through active config locally; reports per-module block rates.
agentarmor.last_trace() | (v1.4) Returns the most recent Explain Mode trace.
agentarmor.find_trace(e) | (v1.4) Recover trace from a wrapped exception.
agentarmor.last_trace_status() | (v1.4) Diagnostic: answers "why is last_trace() None?".

Strict Mode (v1.3+)

Catches typo'd kwargs at init() time so misconfigured shields don't silently do nothing.

import agentarmor

# Typo: "unicode_sheild" instead of "unicode_shield"
agentarmor.init(strict=True, unicode_sheild=True)
# raises ConfigurationError: unknown kwarg 'unicode_sheild'. Did you mean 'unicode_shield'?

Without strict=True (the default), typo'd kwargs emit a one-time UserWarning and continue — preserving backwards compatibility. Use strict=True in production to catch silent misconfigurations.

Strict mode also hard-rejects case-typos on the strict kwarg itself (Strict=True, STRICT=True) because silently dropping those would defeat the entire validation.
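The "did you mean?" behavior described above can be approximated with the standard library. The following is a minimal sketch, not AgentArmor's implementation: KNOWN_KWARGS, check_kwargs, and the local ConfigurationError class are hypothetical stand-ins, and the list of accepted names is only a small illustrative subset.

```python
import difflib
import warnings

# Hypothetical subset of accepted keyword names, for illustration only.
KNOWN_KWARGS = ["budget", "filter", "rate_limit", "record", "shield", "unicode_shield"]


class ConfigurationError(ValueError):
    """Local stand-in for the library's exception of the same name."""


def check_kwargs(kwargs: dict, strict: bool = False) -> None:
    """Raise (strict) or warn (non-strict) when a keyword name is unknown."""
    for name in kwargs:
        if name in KNOWN_KWARGS:
            continue
        # get_close_matches returns the most similar known names, best first.
        close = difflib.get_close_matches(name, KNOWN_KWARGS, n=1)
        hint = f" Did you mean {close[0]!r}?" if close else ""
        message = f"unknown kwarg {name!r}.{hint}"
        if strict:
            raise ConfigurationError(message)
        warnings.warn(message, UserWarning)


if __name__ == "__main__":
    try:
        check_kwargs({"unicode_sheild": True}, strict=True)
    except ConfigurationError as exc:
        print(exc)
```

Failing at startup in strict mode is the safer choice for production because a misspelled shield name would otherwise leave that shield disabled without any error.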


Demo Attacks (v1.3+)

Instantly see your shields working against ~21 hand-curated synthetic attacks — no LLM calls, no API keys needed.

import agentarmor

agentarmor.init(shield=True, filter=["pii"], toxicity=True)
report = agentarmor.demo_attacks()
print(report)
# Illustrative output; actual sample counts and rates depend on your version and config:
# AgentArmor — Attack Demo Results
# ================================
# shield (prompt injection):    18/20 blocked  (90%)
# filter (PII):                 5/5  blocked  (100%)
# toxicity:                     12/15 blocked  (80%)
# OVERALL:                      35/40 blocked  (87.5%)

demo_attacks() runs each sample through your active before_request hooks locally and reports per-module block rates. It snapshots and restores module state so it won't pollute your report(). This is a smoke test, NOT a security evaluation — see the benchmarks for measured F1/precision/recall against industry datasets.


Explain Mode (v1.4+)

When a shield blocks (or modifies) an LLM call, agentarmor.last_trace() shows you which shields ran, what each decided, and why. Off by default; near-zero overhead when off; production-safe (PII-redacted by default).

import agentarmor
import openai

agentarmor.init(shield=True, filter=["pii"], explain=True)

# Your existing OpenAI / Anthropic / Gemini code, no changes
client = openai.OpenAI()
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze this market..."}],
)

trace = agentarmor.last_trace()
print(trace.blocked_by)      # "shield": the module that fired (or None)
print(trace.events)          # list of (module, decision, detail, latency_us)
print(trace.silent_modules)  # modules that ran without recording detail
print(trace.closed_reason)   # "after_response" | "blocked" | "stream_close" | "timeout"

When a shield raises, the exception carries the trace:

try:
    client.chat.completions.create(...)
except agentarmor.InjectionDetected as e:
    print(e.trace.blocked_by)    # "shield"
    print(e.trace.events[0].detail)  # {"exception_type": "...", "message": "..."}

If a framework wraps your exception (FastAPI, Celery, Sentry), recover the trace via find_trace:

try:
    client.chat.completions.create(...)
except Exception as e:
    trace = agentarmor.find_trace(e) or agentarmor.last_trace()
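To illustrate why a wrapped exception can still yield a trace, here is a hedged sketch of walking a Python exception chain. It is not the library's find_trace: the function name find_trace_sketch is hypothetical, and the only thing taken from the text above is that the original exception carries a trace attribute.

```python
def find_trace_sketch(exc: BaseException):
    """Walk an exception chain and return the first `.trace` attribute found."""
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against cyclic chains
        trace = getattr(current, "trace", None)
        if trace is not None:
            return trace
        # Prefer the explicit cause (`raise X from Y`), then the implicit context.
        current = current.__cause__ or current.__context__
    return None


if __name__ == "__main__":
    original = ValueError("blocked")
    original.trace = {"blocked_by": "shield"}  # stand-in for a real trace object
    try:
        try:
            raise original
        except ValueError as inner:
            # A framework re-raising its own error type, as in the text above.
            raise RuntimeError("request failed") from inner
    except RuntimeError as outer:
        print(find_trace_sketch(outer))
```

This only works when the wrapper preserves `__cause__` or `__context__`; a framework that serializes the error and raises a fresh one elsewhere leaves no chain to walk, which is presumably why falling back to last_trace() is suggested.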

Full Explain Mode reference — module detail coverage, performance numbers, OpenTelemetry export, redaction security notes, troubleshooting, and version compatibility — lives in FEATURES.md.


Features

AgentArmor's controls fall into three tiers, and the tier tells you how much to trust each one:

  • Core controls (below, in full) — deterministic and production-ready: they do exactly what they say on every call.
  • More deterministic controls — same rule-based reliability, summarized in a table here and documented in full in FEATURES.md.
  • Defense-in-depth detectors and experimental modules — heuristic checks and newer research-grade work. Useful as additional layers, bypassable by a determined attacker, never a complete security boundary. See Benchmarks for measured detection and false-positive rates.

💰 Budget Circuit Breaker

Stop unexpected massive bills. Tracks real-time dollar-denominated token usage across requests. When the configured limit is exceeded, it trips the circuit breaker and raises a BudgetExhausted exception.

import agentarmor
from agentarmor.exceptions import BudgetExhausted

agentarmor.init(budget="$5.00")

try:
    # Run your massive agent loop
    run_agent_loop()
except BudgetExhausted:
    print("Agent stopped. Budget limit reached!")
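The breaker logic itself is simple arithmetic: convert token counts to dollars, add them to a running total, and raise once the total passes the limit. The sketch below shows that idea under stated assumptions. BudgetBreaker and BudgetExhaustedSketch are hypothetical names, and the per-million-token prices in the demo are made-up numbers, not real provider pricing.

```python
class BudgetExhaustedSketch(RuntimeError):
    """Local stand-in for agentarmor.exceptions.BudgetExhausted."""


class BudgetBreaker:
    """Accumulates per-call cost and trips once the limit is exceeded."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    @property
    def remaining_usd(self) -> float:
        return max(0.0, self.limit_usd - self.spent_usd)

    def record(self, prompt_tokens: int, completion_tokens: int,
               usd_per_1m_in: float, usd_per_1m_out: float) -> float:
        """Add one call's cost; raise if the running total passes the limit."""
        cost = (prompt_tokens * usd_per_1m_in
                + completion_tokens * usd_per_1m_out) / 1_000_000
        self.spent_usd += cost
        if self.spent_usd > self.limit_usd:
            raise BudgetExhaustedSketch(
                f"spent ${self.spent_usd:.4f} of ${self.limit_usd:.2f}"
            )
        return cost


if __name__ == "__main__":
    breaker = BudgetBreaker(limit_usd=0.01)
    try:
        while True:  # simulated runaway agent loop
            # Prices here are invented for the demo.
            breaker.record(1000, 1000, usd_per_1m_in=1.0, usd_per_1m_out=2.0)
    except BudgetExhaustedSketch as exc:
        print("tripped:", exc)
```

Note that this sketch trips after the call that crosses the limit has already been paid for, which matches the "when the configured limit is exceeded" wording above; a pre-flight estimate would be needed to stop before that call.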

🔒 Output Firewall

Redact sensitive data from model responses. Automatically scans the LLM's response output before it is returned to your application. Redacts PII (emails, SSNs, US/international phone numbers, IPv4 addresses, IBANs) and secrets (API keys, tokens) on the fly.

Scope: this is an output control — it redacts what the model sends back, before your app or logs see it. It does not prevent PII or secrets in your prompt from being sent to the provider; if that matters, redact your inputs before the call. Redaction is regex-based (see limitations).

agentarmor.init(filter=["pii", "secrets"])

# If the LLM tries to output: "Contact me at admin@company.com or use key sk-123456"
# Your app actually receives: "Contact me at [REDACTED:EMAIL] or use key [REDACTED:API_KEY]"
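Since the text above notes that redaction is regex-based, a small sketch may help show both how it works and why it has limits. The patterns below are deliberately simple illustrations chosen for this example, not the patterns AgentArmor ships; the names PATTERNS and redact are hypothetical.

```python
import re

# Illustrative patterns only; real-world formats vary far more than this.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{6,}\b"),
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a [REDACTED:<LABEL>] marker."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text


if __name__ == "__main__":
    print(redact("Contact me at admin@company.com or use key sk-123456"))
```

A secret in a format that no pattern anticipates (for example, a key split across two lines) passes through untouched, which is the kind of gap the limitations section refers to.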

📼 Flight Recorder

Local debug & replay log of every call. Silently records the exact inputs, outputs, models, timestamps, and latency of every API call to a local JSONL session file. Ideal for debugging rogue agents and replaying sessions.

Scope: the JSONL holds full, unredacted inputs and outputs. On POSIX systems, files are written owner-only (0600), but this is a local debug log, not a tamper-evident audit trail — anything running as your user can still read, modify, or delete it. Don't treat it as forensic evidence or as a compliance control on its own.

agentarmor.init(record=True)
# Sessions are automatically streamed to `.agentarmor/sessions/session_xyz.jsonl`
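As a rough picture of what an owner-only JSONL log involves, the sketch below appends records to a file created with mode 0600 and reads them back. The record field names (model, input, output, timestamp, latency_ms) and the helper names are assumptions made for this example; the actual session file schema is not specified here.

```python
import json
import os
import tempfile
import time


def append_record(path: str, record: dict) -> None:
    """Append one JSON object per line; create the file owner-only if missing."""
    # Mode 0o600 applies only at creation time and only on POSIX systems.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    with os.fdopen(fd, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def read_records(path: str) -> list:
    """Load every non-empty line of a JSONL file as a dict."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        session = os.path.join(tmp, "session_demo.jsonl")
        append_record(session, {
            "model": "gpt-4o",            # hypothetical field names throughout
            "input": "Analyze this market...",
            "output": "(model reply)",
            "timestamp": time.time(),
            "latency_ms": 412,
        })
        for row in read_records(session):
            print(row["model"], row["latency_ms"])
```

File permissions restrict other users, not other processes running as you, which is why the scope note above stops short of calling this a tamper-evident audit trail.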

🚦 Rate Limiter

Prevent API spam and abuse. Sliding-window throttling ensures your agents don't exceed your designated request thresholds (e.g., 10/min, 5/sec).

agentarmor.init(rate_limit="10/min")
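For readers unfamiliar with sliding-window throttling, here is a minimal sketch of the technique using a deque of timestamps. It is an illustration, not AgentArmor's code: SlidingWindowLimiter and parse_rate are hypothetical names, and only the "sec" and "min" units shown above are handled.

```python
import time
from collections import deque

UNIT_SECONDS = {"sec": 1.0, "min": 60.0}


def parse_rate(spec: str) -> tuple:
    """Turn a string like "10/min" into (10, 60.0)."""
    count, unit = spec.split("/")
    return int(count), UNIT_SECONDS[unit]


class SlidingWindowLimiter:
    """Allows at most `max_calls` within any trailing window of `window` seconds."""

    def __init__(self, spec: str, clock=time.monotonic) -> None:
        self.max_calls, self.window = parse_rate(spec)
        self.clock = clock
        self.calls = deque()  # timestamps of accepted calls, oldest first

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter("5/sec")
    print([limiter.allow() for _ in range(7)])
```

Unlike a fixed window that resets on the minute, a sliding window cannot be gamed by bursting at the boundary between two periods.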

🧠 Context Window Guard

Pre-flight token checks. Automatically estimates tokens before sending the prompt to the API. If the prompt plus max_tokens exceeds the model's safe context limit (e.g., 95% of total allowed), the request is immediately blocked with a ContextOverflow exception, saving you from failed requests and truncated contexts.

import agentarmor
from agentarmor.exceptions import ContextOverflow

agentarmor.init(context_guard=0.95)

try:
    # Big prompt that exceeds limits
    client.chat.completions.create(...)
except ContextOverflow:
    print("Prompt too large for the model's context window!")
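The pre-flight check reduces to one comparison: estimated prompt tokens plus max_tokens against a fraction of the model's context limit. The sketch below assumes a crude "about four characters per token" estimate, which is a common rule of thumb for English text and not the estimator AgentArmor uses; check_context, estimate_tokens, and ContextOverflowSketch are hypothetical names.

```python
class ContextOverflowSketch(ValueError):
    """Local stand-in for agentarmor.exceptions.ContextOverflow."""


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Real tokenizers differ.
    return max(1, len(text) // 4)


def check_context(prompt: str, max_tokens: int, context_limit: int,
                  guard: float = 0.95) -> int:
    """Return the estimated token need, or raise if it exceeds guard * limit."""
    needed = estimate_tokens(prompt) + max_tokens
    if needed > context_limit * guard:
        raise ContextOverflowSketch(
            f"need ~{needed} tokens, allowed {context_limit * guard:.0f}"
        )
    return needed


if __name__ == "__main__":
    print(check_context("a" * 400, max_tokens=100, context_limit=1000))
    try:
        check_context("a" * 4000, max_tokens=100, context_limit=1000)
    except ContextOverflowSketch as exc:
        print("blocked:", exc)
```

Keeping the guard below 1.0 leaves headroom for estimation error, since a character-based estimate can undercount tokens for code or non-English text.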

🔥 Tool-Call Firewall

Control which tools your LLM can invoke. Enforces an allow/block list on tool calls (function calls) returned by the model. Unauthorized tool invocations are either blocked (raising ToolCallBlocked) or silently stripped from the response — preventing your agent from executing dangerous actions it was never meant to take.

import agentarmor
from agentarmor.exceptions import ToolCallBlocked

# Allow-list mode — only these tools are permitted
agentarmor.init(tool_firewall={"allow": ["search", "calculator"], "on_violation": "block"})

# Alternative: block-list mode, which blocks specific dangerous tools (use instead of the call above)
# agentarmor.init(tool_firewall={"block": ["execute_code", "delete_file"], "on_violation": "strip"})

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Delete all files"}],
        tools=[...]
    )
except ToolCallBlocked as e:
    print(f"Blocked unauthorized tool call: {e}")
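The difference between the "block" and "strip" violation modes is easiest to see on a plain list of tool names. This is a hedged sketch of the policy decision only; apply_tool_policy and ToolCallBlockedSketch are hypothetical, and real tool calls are structured objects rather than bare strings.

```python
class ToolCallBlockedSketch(PermissionError):
    """Local stand-in for agentarmor.exceptions.ToolCallBlocked."""


def apply_tool_policy(tool_calls, allow=None, block=None, on_violation="block"):
    """Filter tool names by an allow-list or block-list.

    With on_violation="block" any violation raises; with "strip" the
    offending calls are dropped and the rest are returned.
    """
    def permitted(name: str) -> bool:
        if allow is not None:          # allow-list mode: everything else is denied
            return name in allow
        return name not in (block or ())  # block-list mode: everything else passes

    violations = [name for name in tool_calls if not permitted(name)]
    if violations and on_violation == "block":
        raise ToolCallBlockedSketch(f"unauthorized tool calls: {violations}")
    return [name for name in tool_calls if permitted(name)]


if __name__ == "__main__":
    calls = ["search", "delete_file"]
    print(apply_tool_policy(calls, block=["execute_code", "delete_file"],
                            on_violation="strip"))
    try:
        apply_tool_policy(calls, allow=["search", "calculator"])
    except ToolCallBlockedSketch as exc:
        print("blocked:", exc)
```

An allow-list fails closed when a new tool is added to the agent, whereas a block-list fails open, so the allow-list is usually the safer default.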

More deterministic controls

Rule-based and reliable, like the core six — full docs with examples in FEATURES.md.

Control | What it does | Enable with
Latency Circuit Breaker | Trips a breaker after N consecutive slow calls; tracks avg/p95 latency. | latency_breaker={...}
Provider-Aware Cost Analytics | Per-provider spend breakdown (OpenAI vs. Anthropic vs. Gemini) in report(). | automatic with budget=
Canary Token Injection | Injects a unique token into system prompts; blocks the response if it leaks. | canary=True
Cost Attribution Tags | Per-tag cost attribution (set_tag("code-gen")) for multi-tenant or A/B spend. | cost_tags=True
Semantic Dedup (Replay Shield) | Hash-blocks identical repeated prompt+model calls (the loop killer). | dedup=True
Model Downgrade Cascade | Auto-downgrades to cheaper models as the budget depletes. | cascade=[...]
MCP Server Security (v2) | MCP server allow/blocklists, per-tool path & argument policies, result validation. | mcp_firewall={...}
Human-in-the-Loop (HITL) Policy Gate | Risk-tiered human approval for tool calls, with timeouts and auto-deny. | hitl_gate={...}
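As one example from the table, hash-based blocking of repeated prompt+model calls can be sketched in a few lines. This is an illustration of the general technique, with hypothetical names (call_fingerprint, ReplayShield); how AgentArmor canonicalizes a request before hashing is not described here, so the JSON serialization below is an assumption.

```python
import hashlib
import json


def call_fingerprint(model: str, messages: list) -> str:
    """Hash the model name and messages into a stable hex digest."""
    # sort_keys makes the serialization independent of dict key order.
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ReplayShield:
    """Remembers fingerprints of past calls and flags exact repeats."""

    def __init__(self) -> None:
        self.seen = set()

    def is_repeat(self, model: str, messages: list) -> bool:
        fingerprint = call_fingerprint(model, messages)
        if fingerprint in self.seen:
            return True
        self.seen.add(fingerprint)
        return False


if __name__ == "__main__":
    shield = ReplayShield()
    msgs = [{"role": "user", "content": "Summarize the report"}]
    print(shield.is_repeat("gpt-4o", msgs))  # first time
    print(shield.is_repeat("gpt-4o", msgs))  # identical repeat
```

Exact hashing only catches byte-identical repeats, so a loop that appends a counter or timestamp to each prompt would slip past a check like this one.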

Defense-in-depth detectors (heuristic)

Pattern- and classifier-based checks. Bypassable by design — pair them with the deterministic tiers and see Benchmarks for false-positive rates. Full docs in FEATURES.md.

Control | What it does | Enable with
Prompt Shield (pattern-based injection filter) | Regex denylist of known jailbreak phrasings; a cheap first filter. | shield=True
ML-Powered Injection Shield | TF-IDF + logistic-regression classifier; ensemble with the regex layer. | ml_shield=True
Code Safety Shield | Pattern-scans generated Python/JS/SQL/shell for dangerous constructs. | code_shield=True
Toxicity & Content Safety Filter | 7-category toxicity patterns; optional detoxify ML mode. | toxicity=True
Hallucination / Grounding Guard | n-gram / number / proper-noun overlap vs. source docs to flag hallucinations. | grounding={...}
Data Exfiltration Guard | Flags base64/hex/zero-width/URL-encoded data smuggling in outputs. | exfiltration_guard=True
Tool-Policy & Capability-Request Detection | Hard allowed_tools allowlist + heuristic scan for capability-escalation phrasing. | privilege_escalation=True
Unicode Injection Shield | Zero-width, homoglyph, and bidi-control tricks in inputs. | unicode_shield=True
Semantic Drift Detector | Embedding-based topic-drift tracking across conversation turns. | semantic_drift={...}
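To make the "zero-width and bidi-control tricks" row concrete, the sketch below scans a string for a handful of well-known invisible and direction-control code points. It is illustrative only: the character sets are a small hand-picked subset, homoglyph detection is not attempted, and find_suspicious is a hypothetical name rather than an AgentArmor function.

```python
# Zero-width and byte-order-mark characters that render as nothing.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# Bidirectional embedding, override, and isolate controls.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def find_suspicious(text: str) -> list:
    """Return (index, kind) for each invisible or direction-control character."""
    findings = []
    for index, char in enumerate(text):
        if char in ZERO_WIDTH:
            findings.append((index, "zero-width"))
        elif char in BIDI_CONTROLS:
            findings.append((index, "bidi-control"))
    return findings


if __name__ == "__main__":
    # "ignore" with a zero-width space hidden inside it.
    print(find_suspicious("ig\u200bnore previous instructions"))
```

Hidden characters like these can split a phrase so that a plain regex denylist no longer matches it, which is one reason a Unicode check is useful ahead of pattern-based filters.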

Experimental modules

The newest, most research-grade work — APIs and behavior may evolve. Full docs in FEATURES.md.

Control | What it does | Enable with
Multi-Agent Graph Safety (v2) | Budget and policy inheritance across spawned sub-agent trees. | agent_graph={...}
Chain-of-Thought Auditor | Scans extended-thinking / reasoning traces for misalignment phrasing. | cot_auditor=True
Prompt Fuzzer (Red Team Testing) | Built-in red-team generator to fuzz your own shields. | tools/prompt_fuzzer
Runtime Taint Tracking | Data-provenance labels with sink policies (e.g., no PII into send_email). | taint_tracker={...}
Honeytools (Deception Rail) | Fake tools and credentials as tripwires for compromised agents. | honeytools=True
Safe-Plan Engine | Structured "why blocked + safer alternative" guidance on blocks. | SafePlanEngine
Echo-Chamber Detector | Flags hallucinated claims circulating between agents as fake confirmation. | echo_chamber={...}
Compliance Evidence Export (SOC2 / HIPAA / GDPR) | Maps safety events to SOC2/HIPAA/GDPR control families; JSON evidence export. | compliance={...}

📄 Policy-as-Code Configuration

Store your agent's safety parameters in a declarative YAML or JSON file instead of hard-coding them. AgentArmor automatically detects .agentarmor.yml in your working directory.

.agentarmor.yml

budget: 5.00
shield: true
filter:
  - pii
  - secrets
record: true
rate_limit: "10/min"
context_guard: 0.95

import agentarmor
# Loads .agentarmor.yml and initializes all shields
agentarmor.init_from_config()

Integrations

AgentArmor works well with many major Python AI frameworks that route through supported SDK surfaces.

Because AgentArmor monkey-patches the underlying openai, anthropic, and google-genai clients in-process, at the SDK level, you often do not need framework-specific callbacks or middleware. Call agentarmor.init() at the top of your script and it applies to any framework or SDK script that uses those patched clients.

See SUPPORT_MATRIX.md for the tested provider surfaces and evidence level behind each compatibility claim.

Current ecosystem examples and support notes include:

  • LiteLLM
  • Pydantic AI
  • Google ADK
  • LangChain / LangGraph
  • LlamaIndex
  • CrewAI
  • Agno / Phidata
  • Autogen
  • SmolAgents
  • Google Gemini (via google-genai)
  • Custom raw SDK scripts

Hooks & Middleware

AgentArmor is extensible. You can register custom hooks that run just before a request leaves and just after a response arrives. Because AgentArmor handles the patching, the same hooks work for both OpenAI and Anthropic.

import agentarmor
from agentarmor import RequestContext, ResponseContext

@agentarmor.before_request
def inject_timestamp(ctx: RequestContext) -> RequestContext:
    # Invisibly append context to the system prompt
    ctx.messages[0]["content"] += "\nToday is Friday."
    return ctx

@agentarmor.after_response
def custom_analytics(ctx: ResponseContext) -> ResponseContext:
    # Send cost and latency data to your custom dashboard
    print(f"Model {ctx.model} cost {ctx.cost}")
    return ctx

@agentarmor.on_stream_chunk
def censor_profanity(text: str) -> str:
    # Mutate streaming chunks in real-time
    return text.replace("badword", "*******")
    
agentarmor.init()

Supported Models

Built-in automated tracking for standard models across the major providers. Supports both the Chat Completions API and the newer OpenAI Responses/Agents API surface.

Provider | Models | API Surfaces
OpenAI | gpt-4.5, o3-mini, gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo | Chat Completions, Responses API
Anthropic | claude-4, claude-opus-4, claude-sonnet-4-5, claude-haiku-4-5 | Messages
Google | gemini-2.0-pro, gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash | GenerateContent

Note: For models not explicitly listed, generic conservative fallback pricing is used.


Benchmarks

These are reproducible evals on public datasets, run with the shipping configuration — shown with false-positive rates and the places we lose, not just the wins. High recall on some sets (AdvBench, HarmBench) comes with real false-positive cost on others (JailbreakBench, RealToxicityPrompts, HaluEval). The detectors are defense-in-depth, not a security guarantee.

Tested against 10 industry datasets + 2 synthetic benchmarks (5,100+ samples) spanning prompt injection, toxicity, hallucination, data exfiltration, and unicode attacks. Full results at benchmarks/README.md.

Head-to-head comparison — AgentArmor vs LlamaGuard 3 and OpenAI Moderation across six datasets with bootstrap F1 CIs, balance-aware metrics (MCC + balanced-accuracy on imbalanced sets), per-dataset operating-point naming, and honest loss annotations: BENCHMARKS_HEAD_TO_HEAD.md. (Perspective API was dropped from v1 — Google/Jigsaw announced sunset with API EOL 2026-12-31.) Methodology in tasks/head-to-head-report/SPEC.md; operations in RUNBOOK.md.

Harmful Content Detection (Combined: Shield + ML Shield + Toxicity)

Benchmark | Samples | Precision | Recall | F1 | FP Rate
AdvBench | 200 | 100.0% | 91.9% | 95.8% | 0.0%
HarmBench | 200 | 100.0% | 90.0% | 94.7% | 0.0%
Fuzzer Self-Test | 148 | 97.4% | 86.7% | 91.7% | 15.0%
JailbreakBench | 200 | 70.2% | 73.0% | 71.6% | 31.0%

Toxicity & Bias Detection (Built-in ML classifier)

Benchmark | Type | Precision | Recall | F1 | FP Rate
ToxiGen | Implicit hate speech (13 groups) | 100.0% | 58.5% | 73.8% | 0.0%
RealToxicityPrompts | Subtle toxicity | 54.8% | 51.0% | 52.8% | 42.0%

Hallucination Detection (Grounding + TF-IDF semantic similarity)

Benchmark | Type | Precision | Recall | F1 | FP Rate
TruthfulQA | Factual grounding (817 Q&A) | 100.0% | 56.9% | 72.5% | 0.0%
HaluEval | QA/dialogue/summarization | 62.7% | 84.0% | 71.8% | 50.0%

Specialized Detectors

Benchmark | Type | Precision | Recall | F1 | FP Rate
Exfiltration | Base64/hex/steganography/URL | 100.0% | 100.0% | 100.0% | 0.0%
Unicode Injection | Zero-width/homoglyph/bidi/tags | 100.0% | 91.2% | 95.4% | 0.0%

Run benchmarks yourself: pip install datasets scikit-learn && python benchmarks/run_industry_benchmarks.py
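The F1 column in the tables above is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it for three rows copied from those tables as a sanity check; the helper name f1_score is just a local function for this example, not part of the benchmark scripts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the tables above as fractions.
ROWS = {
    "AdvBench": (1.000, 0.919, 0.958),
    "JailbreakBench": (0.702, 0.730, 0.716),
    "HaluEval": (0.627, 0.840, 0.718),
}

if __name__ == "__main__":
    for name, (precision, recall, reported) in ROWS.items():
        computed = f1_score(precision, recall)
        print(f"{name}: computed {computed:.3f}, reported {reported:.3f}")
```

F1 says nothing about the false-positive rate on benign inputs, which is why the tables report that separately: HaluEval and JailbreakBench have similar F1 but 50.0% and 31.0% FP rates respectively.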


The Problem

AI agents are unpredictable by design. A user might try to hijack your system prompt. The model might hallucinate an API key. An agent might get stuck in an infinite loop and make 300 LLM calls.

  1. The Hijack Problem: Users type "ignore previous instructions" and take control of your LLM.
  2. The Output Leak Problem: Your agent accidentally regurgitates a real customer's SSN or an OpenAI API key it saw in context.
  3. The Loop Problem: A stuck agent makes 200 LLM calls in 10 minutes. $50-$200 down the drain before anyone notices.
  4. The Invisible Spend: Tokens aren't dollars. gpt-4o costs roughly 15x more per token than gpt-4o-mini, so the same token count can mean very different bills.

AgentArmor fills the gap: real-time, in-memory, deterministic controls that cap spend, redact secrets, and kill runaway sessions — plus defense-in-depth detectors for injection and unsafe output as an additional layer.

Design Philosophy

  • Zero infrastructure. No Redis, no servers, no cloud accounts. AgentArmor is a pure Python library that runs entirely in your process.
  • Zero code changes. You don't rewrite your codebase to use a special client. Just call agentarmor.init() and the controls apply to your existing code.
  • Data stays local. Everything AgentArmor does runs in-memory and on-disk inside your process. It adds no network hops of its own, so your prompts and responses go only to the LLM provider you already call.
  • Framework agnostic. Works with any framework that uses the openai, anthropic, or google-genai SDKs under the hood — no vendor lock-in.

License

MIT License

Ship your agents with confidence. Set a budget. Set your shields. Move on.
