Guardrails for AI agents: hallucination detection, tool call validation, circuit breakers, rate limiting, and budget enforcement. One decorator. Any LLM framework.
🛡️ agentguard
Runtime budget control and tool-call reliability for AI agents
AI agents overspend, call tools with wrong parameters, and trust broken tool responses. agentguard is a lightweight Python runtime that keeps agent runs inside budget and makes tool calls trustworthy with spend caps, response verification, validation, retries, and tracing.
Works with OpenAI, Anthropic, OpenRouter, Groq, Together AI, Fireworks AI, LangChain, MCP, or any Python function. Only dependency: pydantic.
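```python
import requests
from agentguard import guard

@guard(validate_input=True, verify_response=True, max_retries=3)
def search_web(query: str) -> dict:
    # Let requests URL-encode the query instead of interpolating it into the URL
    return requests.get("https://api.search.com", params={"q": query}, timeout=10).json()
```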
What agentguard is for
agentguard has two core jobs:
- Keep agent runs inside budget with per-call, per-session, and shared multi-agent spend controls.
- Make tool calls trustworthy with input validation, output validation, response verification, and execution safeguards.
Everything else in the library supports those two outcomes: retries, circuit breakers, rate limits, tracing, telemetry, benchmarking, and generated tests.
Why teams reach for agentguard
| Problem | How AI agents fail today | How agentguard fixes it |
|---|---|---|
| Cost spirals & runaway spending | One prompt change, retry loop, or model escalation causes a surprise bill | Per-call and per-session budget enforcement, real usage-based LLM spend tracking, and shared multi-agent budget pools |
| Malformed tool responses | Tool returns missing fields, schema drift, or anomalous values, with no error raised | Multi-signal response verification (timing, schema, patterns, statistical anomalies) |
| Invalid tool parameters | Agent passes wrong types or missing fields | Automatic input/output validation from Python type hints + Pydantic schemas |
| Cascading failures | One failing tool takes down the entire agent | Circuit breakers with CLOSED → OPEN → HALF_OPEN state machine |
| API rate limit violations | Agent exceeds rate limits, gets blocked | Token bucket rate limiting (per-second, per-minute, per-hour) |
| No regression tests | 40% of agent projects fail with no test suite | Auto-generate pytest tests from production traces |
| Framework lock-in | Each LLM framework has its own observability | Framework-agnostic: works with OpenAI, Anthropic, LangChain, MCP, or raw functions |
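The circuit-breaker row refers to a three-state machine. As a rough illustration of how such a breaker behaves, here is a minimal sketch; it is not agentguard's CircuitBreaker, and SimpleCircuitBreaker, its handling of failure_threshold and recovery_timeout, and the injectable clock are all assumptions made for the example:

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class SimpleCircuitBreaker:
    """Illustrative CLOSED -> OPEN -> HALF_OPEN breaker (not agentguard's class)."""

    def __init__(self, failure_threshold=5, recovery_timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state is State.OPEN:
            # After the cool-down, let a single trial call through.
            if self.clock() - self.opened_at >= self.recovery_timeout:
                self.state = State.HALF_OPEN
            else:
                raise RuntimeError("circuit open: call rejected")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        # Any success closes the circuit and resets the failure count.
        self.state = State.CLOSED
        self.failures = 0
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many consecutive failures, opens the circuit.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

Passing a fake clock makes the recovery timeout testable without real sleeps.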
The Problem in Numbers
- 82.6% of Stack Overflow questions about AI agents have no accepted answer (arXiv)
- 40-95% of agent projects fail between prototype and production
- 0 widely-adopted open-source libraries focused on runtime tool response verification in Python agents
Install agentguard
```bash
pip install awesome-agentguard
```
With optional integrations (quote the extras so shells like zsh don't expand the brackets):
```bash
pip install "awesome-agentguard[all]"        # OpenAI + Anthropic + LangChain integrations
pip install "awesome-agentguard[costs]"      # LiteLLM-backed real LLM cost tracking
pip install "awesome-agentguard[rich]"       # Colour terminal output
pip install "awesome-agentguard[dashboard]"  # Local trace dashboard extras
```
Requirements: Python 3.10+ · Only dependency: pydantic>=2.0
Quick Start
1. Put a hard cap on agent spend
Use TokenBudget when you want the run to stop before a retry loop or model change burns money:
```python
import os

from openai import OpenAI

from agentguard import TokenBudget
from agentguard.integrations import guard_openai_client

budget = TokenBudget(
    max_cost_per_session=5.00,
    max_calls_per_session=100,
    alert_threshold=0.80,
)

client = guard_openai_client(
    OpenAI(api_key=os.getenv("OPENAI_API_KEY")),
    budget=budget,
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise this document"}],
)
print(budget.session_spend)
```
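Conceptually, a session cap checks each charge before recording it. The sketch below is a hypothetical SessionBudget, not the TokenBudget API; it only illustrates refusing a call that would break the cost or call cap, and firing a single alert once spend crosses alert_threshold of the cap:

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a charge would push the session over its cap."""


@dataclass
class SessionBudget:
    # Hypothetical stand-in for session limits; field names mirror the example above.
    max_cost_per_session: float
    max_calls_per_session: int
    alert_threshold: float = 0.8
    session_spend: float = 0.0
    session_calls: int = 0
    alerts: list = field(default_factory=list)

    def charge(self, cost: float) -> None:
        # Check both caps *before* recording, so an over-budget call is refused.
        if self.session_calls + 1 > self.max_calls_per_session:
            raise BudgetExceeded("call limit reached")
        if self.session_spend + cost > self.max_cost_per_session:
            raise BudgetExceeded(f"spend would exceed ${self.max_cost_per_session:.2f}")
        self.session_calls += 1
        self.session_spend += cost
        # Emit one alert the first time spend crosses the threshold fraction.
        threshold = self.alert_threshold * self.max_cost_per_session
        if not self.alerts and self.session_spend >= threshold:
            self.alerts.append(f"{self.session_spend / self.max_cost_per_session:.0%} of budget used")
```

Checking before recording means the call that would overshoot is rejected instead of being billed and reported afterwards.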
2. Guard tool calls with one decorator
```python
from agentguard import guard

@guard
def get_weather(city: str) -> dict:
    """Every call is now traced, timed, and validated."""
    return {"temperature": 72, "city": city}

result = get_weather("NYC")
```
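To make "traced, timed, and validated" concrete, here is a deliberately small decorator that checks arguments against plain type hints, times the call, and appends a trace record. mini_guard and TRACE are hypothetical names for this sketch; agentguard's @guard does considerably more (Pydantic schemas, retries, response verification):

```python
import functools
import inspect
import time
import typing

TRACE: list[dict] = []  # in-memory trace log for this sketch


def mini_guard(fn):
    """Illustrative decorator: validate args against simple type hints, then time and trace."""
    sig = inspect.signature(fn)
    hints = typing.get_type_hints(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes (str, int, dict, ...) are checked; generics are skipped.
            if type(expected) is type and not isinstance(value, expected):
                raise TypeError(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        TRACE.append({"tool": fn.__name__, "args": dict(bound.arguments),
                      "ms": elapsed_ms, "result": result})
        return result

    return wrapper


@mini_guard
def get_weather(city: str) -> dict:
    return {"temperature": 72, "city": city}
```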
3. Verify tool responses against expected contracts
Detect when a tool response violates what you've defined as normal: anomalous execution timing, missing required fields, pattern mismatches, or statistically unusual values. Useful for catching schema drift, API contract changes, integration bugs, and misconfigured mocks.
```python
from agentguard import ResponseVerifier

verifier = ResponseVerifier(threshold=0.6)

# Register what normal responses look like for this tool
verifier.register_tool(
    "get_weather",
    expected_latency_ms=(100, 5000),  # Real API: 100 ms to 5 s
    required_fields=["temperature", "humidity"],
    response_patterns=[r'"temperature":\s*-?\d+'],
)

# Check a response that came back suspiciously fast and incomplete
result = verifier.verify(
    tool_name="get_weather",
    execution_time_ms=0.3,  # 0.3 ms: no network call happened
    response={"temperature": 72, "conditions": "sunny"},
)
print(result.is_anomalous)  # True (missing "humidity", sub-ms timing)
print(result.confidence)    # 0.95
print(result.reason)        # "Execution time 0.30ms is below the 2ms minimum for real I/O..."
```
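One plausible way to combine several signals into one verdict is an equal-weight vote compared against a threshold. The function below is a sketch under that assumption; verify_response and Verdict are hypothetical, and its score will not match the confidence agentguard reports (0.95 above), which may weight signals differently or add statistical checks:

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Verdict:
    is_anomalous: bool
    score: float
    reasons: list


def verify_response(response, execution_time_ms, *, expected_latency_ms=(100, 5000),
                    required_fields=(), response_patterns=(), threshold=0.6):
    """Score a dict response against simple expectations (illustrative equal-weight scoring)."""
    reasons, signals = [], []
    lo, hi = expected_latency_ms
    # Signal 1: latency outside the expected window (sub-ms often means no real I/O).
    latency_bad = not (lo <= execution_time_ms <= hi)
    signals.append(latency_bad)
    if latency_bad:
        reasons.append(f"latency {execution_time_ms}ms outside {lo}-{hi}ms")
    # Signal 2: required fields missing.
    missing = [f for f in required_fields if f not in response]
    signals.append(bool(missing))
    if missing:
        reasons.append(f"missing fields: {missing}")
    # Signal 3: the JSON-serialized response fails an expected regex.
    text = json.dumps(response)
    failed = [p for p in response_patterns if not re.search(p, text)]
    signals.append(bool(failed))
    if failed:
        reasons.append(f"pattern mismatch: {failed}")
    score = sum(signals) / len(signals)
    return Verdict(score >= threshold, score, reasons)
```

With the inputs from the example above, two of three signals fire (timing and the missing "humidity" field), giving a score of about 0.67, which clears a 0.6 threshold.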
4. Production-ready protection for tool execution
```python
from agentguard import guard, CircuitBreaker, TokenBudget, RateLimiter

@guard(
    validate_input=True,
    validate_output=True,
    verify_response=True,  # checks timing, schema, patterns
    max_retries=3,
    timeout=30.0,
    budget=TokenBudget(
        max_cost_per_session=5.00,
        max_calls_per_session=100,
        alert_threshold=0.80,
    ).config,
    circuit_breaker=CircuitBreaker(
        failure_threshold=5,
        recovery_timeout=60,
    ).config,
    rate_limit=RateLimiter(
        calls_per_minute=30,
    ).config,
    record=True,  # Save traces for test generation
)
def query_database(sql: str, limit: int = 100) -> list[dict]:
    # `db` stands for your own database client; it is not part of agentguard
    return db.execute(sql, limit=limit)
```
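max_retries=3 implies a retry loop around the tool call. A common pattern, and the one this sketch assumes, is exponential backoff with full jitter. call_with_retry and its base_delay/max_delay parameters are illustrative, and reading max_retries as "retries after the first attempt" is an assumption:

```python
import random
import time


def call_with_retry(fn, *, max_retries=3, base_delay=0.5, max_delay=8.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Retry fn with exponential backoff and full jitter (illustrative sketch)."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Delay ceiling doubles each attempt: 0.5 s, 1 s, 2 s, ... capped at max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter spreads out synchronized retries
```

Injecting sleep lets tests record the chosen delays instead of waiting.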
5. Auto-generate pytest tests from production traces
Record real agent executions, then auto-generate a pytest test suite for regression testing:
```python
from agentguard import TraceRecorder, record_session
from agentguard.testing import TestGenerator

# Record during production
with record_session("./traces", backend="sqlite"):
    result = query_database("SELECT * FROM users LIMIT 10")
    result = get_weather("San Francisco")

# Generate test file
generator = TestGenerator(traces_dir="./traces")
generator.generate_tests(output="tests/test_generated.py")
```
By default, SQLite-backed recording writes to ./traces/agentguard_traces.db. Use trace_backend="jsonl" when you need legacy file-per-session traces.
Generated test file:
```python
"""Auto-generated test suite from agentguard production traces."""

def test_query_database_0():
    """Recorded: query_database('SELECT * FROM users LIMIT 10', limit=100)"""
    result = query_database("SELECT * FROM users LIMIT 10", limit=100)
    assert isinstance(result, list)

def test_get_weather_0():
    """Recorded: get_weather('San Francisco')"""
    result = get_weather("San Francisco")
    assert isinstance(result, dict)
    assert "temperature" in result
```
6. Fluent test assertions for agent tool calls
```python
from agentguard import assert_tool_call

# Build assertions on a recorded trace entry (`entry` comes from your recorded traces)
assert_tool_call(entry).succeeded().within_ms(5000).returned_dict().has_keys("temperature", "humidity")
```
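A fluent builder like this usually works by having each check return the builder itself, so calls can be chained. The class below is a hypothetical stand-in: check_tool_call, and the error/duration_ms/result keys of the entry, are assumptions and not agentguard's trace schema:

```python
class ToolCallAssertion:
    """Chainable checks over one trace entry (illustrative; entry keys are assumed)."""

    def __init__(self, entry: dict):
        self.entry = entry

    def succeeded(self):
        assert self.entry.get("error") is None, f"call failed: {self.entry.get('error')}"
        return self

    def within_ms(self, limit: float):
        took = self.entry["duration_ms"]
        assert took <= limit, f"took {took}ms > {limit}ms"
        return self

    def returned_dict(self):
        assert isinstance(self.entry["result"], dict), "result is not a dict"
        return self

    def has_keys(self, *keys):
        missing = [k for k in keys if k not in self.entry["result"]]
        assert not missing, f"missing keys: {missing}"
        return self


def check_tool_call(entry: dict) -> ToolCallAssertion:
    # Entry point mirroring the chained style shown above.
    return ToolCallAssertion(entry)
```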
7. Replay and diff agent traces
```python
from agentguard.testing import TraceReplayer

replayer = TraceReplayer(traces_dir="./traces")
results = replayer.replay_all(tools={"get_weather": get_weather})
for r in results:
    print(f"{r['tool_name']}: {'PASS' if r['match'] else 'FAIL'}")
```
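Replaying means calling the live tool with the recorded arguments and comparing results. This sketch assumes trace entries shaped as {"tool", "args", "result"} (an assumption, not agentguard's format) and returns the tool_name/match keys used in the loop above, plus a hypothetical diff key holding a key-level diff for dict results:

```python
def replay(traces: list[dict], tools: dict) -> list[dict]:
    """Re-run recorded calls against live tools and diff the results (illustrative sketch)."""
    report = []
    for t in traces:
        fn = tools.get(t["tool"])
        if fn is None:
            report.append({"tool_name": t["tool"], "match": False, "diff": "tool not provided"})
            continue
        live = fn(*t.get("args", []), **t.get("kwargs", {}))
        recorded = t["result"]
        diff = {}
        if isinstance(recorded, dict) and isinstance(live, dict):
            # Key-level diff: a key whose value differs (a missing key counts as None).
            for key in recorded.keys() | live.keys():
                if recorded.get(key) != live.get(key):
                    diff[key] = (recorded.get(key), live.get(key))
        elif recorded != live:
            diff["value"] = (recorded, live)
        report.append({"tool_name": t["tool"], "match": not diff, "diff": diff})
    return report
```

A key-level diff makes schema drift (a renamed field, for instance) visible instead of reporting only FAIL.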
LLM Framework Integrations
Any OpenAI-Compatible Provider (OpenRouter, Groq, Together, Fireworks, etc.)
agentguard works with any OpenAI-compatible API out of the box. One integration covers 10+ providers:
```python
import os

from openai import OpenAI
from agentguard.integrations import guard_tools, Providers

# The same guarded tools work across providers; only the client changes
executor = guard_tools([search_web, get_weather])

# Groq (ultra-low latency)
client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key=os.getenv("GROQ_API_KEY"))

# OpenRouter (300+ models)
# client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=os.getenv("OPENROUTER_API_KEY"))

# Together AI, Fireworks, DeepInfra, Mistral, xAI: same pattern via built-in presets
# client = OpenAI(**Providers.TOGETHER.client_kwargs())

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # Groq model ID; model IDs differ between providers
    tools=executor.tools,
    messages=[{"role": "user", "content": "Search for Python tutorials"}],
)
results = executor.execute_all(response.choices[0].message.tool_calls)
```
Built-in provider presets: OpenAI, OpenRouter, Groq, Together AI, Fireworks AI, DeepInfra, Mistral, Perplexity, Novita AI, xAI. You can also define your own with Provider(name=..., base_url=..., env_key=...).
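A provider preset is essentially a name, a base URL, and the environment variable that holds the API key. The dataclass below is a hypothetical illustration of that idea, not agentguard's Provider class; its client_kwargs() returns the base_url and api_key keyword arguments that openai.OpenAI accepts. The endpoint URLs are the ones used in the example above:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderPreset:
    """Hypothetical provider preset: display name, endpoint, and API-key env var."""
    name: str
    base_url: str
    env_key: str

    def client_kwargs(self) -> dict:
        # Keyword arguments for openai.OpenAI(**kwargs); fail early if the key is unset.
        api_key = os.environ.get(self.env_key)
        if not api_key:
            raise RuntimeError(f"set {self.env_key} to use {self.name}")
        return {"base_url": self.base_url, "api_key": api_key}


GROQ = ProviderPreset("Groq", "https://api.groq.com/openai/v1", "GROQ_API_KEY")
OPENROUTER = ProviderPreset("OpenRouter", "https://openrouter.ai/api/v1", "OPENROUTER_API_KEY")
```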
OpenAI Function Calling with Guardrails
```python
from agentguard.integrations import guard_openai_tools, OpenAIToolExecutor

# Wrap your tools for OpenAI function calling
executor = OpenAIToolExecutor()
executor.register(search_web).register(get_weather)

# `client` is an OpenAI client and `messages` is your conversation so far
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=executor.tools,
)

# Execute all tool calls with guards
results = executor.execute_all(response.choices[0].message.tool_calls)
```
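Exposing a Python function to OpenAI function calling means describing it as a JSON Schema tool definition. The sketch below derives one from type hints; tool_schema and its type mapping are assumptions rather than agentguard's implementation, though the output follows OpenAI's {"type": "function", "function": {...}} tool format:

```python
import inspect
import typing

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean",
               list: "array", dict: "object"}


def tool_schema(fn) -> dict:
    """Build an OpenAI-style function tool definition from a function's type hints (sketch)."""
    hints = typing.get_type_hints(fn)
    props, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        hint = hints.get(name)
        base = typing.get_origin(hint) or hint  # e.g. list[dict] -> list
        # Unannotated or unmapped types fall back to "string" in this sketch.
        props[name] = {"type": _JSON_TYPES.get(base, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {"type": "object", "properties": props, "required": required},
        },
    }
```

Parameters with defaults are left out of "required", so the model may omit them.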
Real LLM Cost Tracking
Wrap supported provider clients to record real token usage and pricing directly from API responses:
```python
import os

from openai import OpenAI

from agentguard import InMemoryCostLedger, TokenBudget
from agentguard.integrations import guard_openai_client

budget = TokenBudget(max_cost_per_session=5.00, max_calls_per_session=100)
budget.config.cost_ledger = InMemoryCostLedger()

client = guard_openai_client(
    OpenAI(api_key=os.getenv("OPENAI_API_KEY")),
    budget=budget,
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise this page"}],
)
print(budget.session_spend)
```
Pricing resolution order:
- model_pricing_overrides
- LiteLLM pricing data
- explicit cost_per_call
- otherwise, usage is tracked and cost is marked unknown
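The resolution order above can be expressed as a simple fallback chain. In this sketch, overrides and catalog are assumed to map a model name to (input, output) USD prices per million tokens, with catalog standing in for LiteLLM's pricing data; resolve_call_cost is a hypothetical helper, not an agentguard function:

```python
from typing import Optional


def resolve_call_cost(model, prompt_tokens, completion_tokens, *, overrides=None,
                      catalog=None, cost_per_call=None) -> Optional[float]:
    """Resolve one call's cost in the documented order (sketch; data shapes are assumed).

    overrides and catalog map model -> (input $ per 1M tokens, output $ per 1M tokens).
    Returns None when nothing prices the call, i.e. the cost is "unknown".
    """
    for table in (overrides or {}, catalog or {}):
        if model in table:
            inp, out = table[model]
            return (prompt_tokens * inp + completion_tokens * out) / 1_000_000
    return cost_per_call  # None here means: usage tracked, cost unknown
```

For example, with catalog={"gpt-4o": (2.5, 10.0)} (illustrative prices, not current list prices), 1,000 prompt tokens and 500 completion tokens resolve to 0.0075 USD.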
Anthropic Claude Tool Use with Guardrails
```python
from agentguard.integrations import guard_anthropic_tools, AnthropicToolExecutor

tools = guard_anthropic_tools([search_web, get_weather])
executor = AnthropicToolExecutor({"search_web": search_web, "get_weather": get_weather})
```
LangChain Agent Tool Validation
```python
from agentguard.integrations import GuardedLangChainTool, guard_langchain_tools

# Wrap existing LangChain tools
guarded = guard_langchain_tools([my_search_tool, my_db_tool])
```
MCP (Model Context Protocol) Server Guards
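```python
from agentguard.integrations import GuardedMCPServer

# Wrap an MCP server with per-tool guards.
# budget_config and cb_config are config objects such as TokenBudget(...).config
# and CircuitBreaker(...).config, as in the production example above.
guarded_server = GuardedMCPServer(original_server, guards={
    "search": {"validate_input": True, "max_retries": 2},
    "database_query": {"budget": budget_config, "circuit_breaker": cb_config},
})
```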
Architecture: How agentguard Protects AI Agent Tool Calls
```text
Your AI Agent (OpenAI / Anthropic / LangChain / etc.)
        |
        | tool call
        v
@guard decorator
  1. Circuit Breaker, Rate Limiter, Budget Enforcer
  2. Input Validation (type hints + Pydantic schemas)
  3. Execute with Retry + Timeout (exponential backoff, jitter)
  4. Response Verification (timing, schema, patterns, anomaly score)
  5. Output Validation
  6. Trace Recording -> Test Generation
        |
        v
Your actual tool
```
CLI: Inspect Agent Traces and Generate Tests
```bash
# Initialize a SQLite trace store
agentguard traces init ./traces
# List recorded agent traces
agentguard traces list ./traces
# Show trace details for a session
agentguard traces show agent_run_001 ./traces
# Get latency and failure statistics
agentguard traces stats ./traces
# Generate JSON report
agentguard traces report ./traces --output report.json
# Import legacy JSONL traces into SQLite
agentguard traces import ./legacy_traces ./traces
# Export traces for replay or offline analysis
agentguard traces export ./traces --output-dir ./trace-export
# Run the local dashboard
agentguard traces serve ./traces --port 8765
# Auto-generate pytest test suite from traces
agentguard generate ./traces --output tests/test_generated.py
```
Full API Reference
Core: Guard Decorator and Tool Registry

| Component | Description |
|---|---|
| @guard | Decorator that wraps any Python function with the full protection stack |
| GuardConfig | Configuration dataclass for all guard options |
| GuardedTool | The wrapper class created by @guard |
| ToolRegistry | Global registry for tool discovery, stats, and health checks |
Validators: Response Verification and Schema Validation

| Component | Description |
|---|---|
| ResponseVerifier | Multi-signal response anomaly detection: timing, schema, patterns, statistical values |
| SchemaValidator | Automatic type-hint and Pydantic-based input/output validation |
| SemanticValidator | Register custom semantic validation checks per tool |
| CustomValidator | Compose arbitrary validation functions into the pipeline |
Guardrails: Circuit Breaker, Rate Limiter, Budget Control

| Component | Description |
|---|---|
| CircuitBreaker | CLOSED → OPEN → HALF_OPEN state machine to prevent cascading failures |
| RateLimiter | Token bucket with per-second/minute/hour rate limiting |
| TokenBudget | Per-call and per-session cost and call-count budget enforcement |
| RetryPolicy | Exponential backoff with jitter and configurable exception filtering |
| timeout | Thread-based (sync) and asyncio (async) timeout enforcement |
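The RateLimiter row describes a token bucket. Here is a minimal sketch of the idea, assuming a bucket that starts full and refills continuously; TokenBucket and per_minute are illustrative names, and agentguard's RateLimiter may size its burst or refill differently:

```python
import time


class TokenBucket:
    """Token bucket: `rate` tokens refill per second, up to `capacity` (sketch)."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def per_minute(calls: int, clock=time.monotonic) -> TokenBucket:
    # 30 calls/minute -> 0.5 tokens/second, with a burst of up to 30 calls.
    return TokenBucket(rate=calls / 60.0, capacity=calls, clock=clock)
```

Letting the burst equal the per-minute allowance is one design choice; a smaller capacity smooths traffic at the cost of rejecting short bursts.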
Testing: Trace Recording and Test Generation

| Component | Description |
|---|---|
| TraceRecorder | Context manager for recording production agent traces |
| TraceReplayer | Replay recorded traces against live tools to detect regressions |
| TestGenerator | Auto-generate pytest test files from production traces |
| assert_tool_call() | Fluent assertion builder for trace entries |
Reporting: Metrics and Observability

| Component | Description |
|---|---|
| ConsoleReporter | Rich-powered colour terminal tables |
| JsonReporter | JSON reports with latency percentiles and anomaly detection |
Comparison with Other AI Agent Safety Tools
| Feature | agentguard | guardrails-ai | NeMo Guardrails | AgentCircuit | Langfuse | LangSmith |
|---|---|---|---|---|---|---|
| Response anomaly detection | โ Multi-signal | โ Text-only | โ | โ | โ | โ |
| Tool call input/output validation | โ Type hints + Pydantic | โ Validators | โ | โ Pydantic | โ | โ |
| Framework-agnostic | โ Any function | โ | โ | โ | โ | โ LangChain-first |
| Circuit breaker | โ | โ | โ | โ | โ | โ |
| Rate limiting | โ | โ | โ | โ | โ | โ |
| Budget enforcement | โ Per-call + session | โ | โ | โ Global | Token tracking | Token tracking |
| Auto test generation | โ From traces | โ | โ | โ | โ | โ |
| Zero dependencies* | โ pydantic only | โ Many | โ NVIDIA stack | โ | โ | โ |
| Self-hosted | โ | โ | โ | โ | โ | โ |
| Open source | โ MIT | โ | โ Apache | โ MIT | โ | โ |
*Core library requires only pydantic>=2.0. No NVIDIA dependencies, no cloud services, no API keys needed.
Who Is This For?
- AI/ML Engineers building production agent systems with OpenAI, Anthropic, or open-source LLMs
- Backend Developers adding LLM-powered features who need reliability guarantees
- Platform Teams managing multi-agent deployments with cost and safety concerns
- Researchers studying agent reliability, response integrity, and tool-call verification
Contributing
See CONTRIBUTING.md for development setup and guidelines.
```bash
git clone https://github.com/rigvedrs/agentguard.git
cd agentguard
pip install -e ".[dev]"
pytest
```
License
MIT. See LICENSE for details.
Stop trusting your AI agents blindly.
⭐ Star on GitHub · Install from PyPI · Report a Bug · Request a Feature
File details

| File | Size | Tags | Uploaded via | Trusted Publishing |
|---|---|---|---|---|
| awesome_agentguard-0.2.0.tar.gz | 332.9 kB | Source | twine/6.2.0 CPython/3.11.9 | No |
| awesome_agentguard-0.2.0-py3-none-any.whl | 181.1 kB | Python 3 | twine/6.2.0 CPython/3.11.9 | No |

File hashes

| File | Algorithm | Hash digest |
|---|---|---|
| awesome_agentguard-0.2.0.tar.gz | SHA256 | 983a75003ac896d09e272c65a69d48af78ccc2755d7d90ba2c11a375b57adb22 |
| awesome_agentguard-0.2.0.tar.gz | MD5 | c1b3df95e5c5f319476887099f9b3249 |
| awesome_agentguard-0.2.0.tar.gz | BLAKE2b-256 | 0d29519f319670d0827e3d200801dbe302291423a71e3aa44922c0cf997618b2 |
| awesome_agentguard-0.2.0-py3-none-any.whl | SHA256 | fdb2ab2dc1eb00925896d8f571079da293fd9ac3cf62bc647871712dcbd3c96b |
| awesome_agentguard-0.2.0-py3-none-any.whl | MD5 | 1c0a77e6af437d87e3c7bf6d235959e8 |
| awesome_agentguard-0.2.0-py3-none-any.whl | BLAKE2b-256 | 20ba534833a68da6ce67bbeea1e2ae88e4bc2546cded8ccfdf56734574ab9e9a |