Production-grade fault tolerance for AI agents — circuit breakers, LLM-aware retry, idempotency, loop detection, fallback chains, async support, health monitoring, and budget enforcement for LangChain, AutoGen, CrewAI, and any LLM pipeline
agentguard — Production-Grade Fault Tolerance for AI Agents
agentguard is a production-ready Python library that adds circuit breakers, LLM-aware retry logic, idempotency, loop detection, and timeout enforcement to any AI agent or LLM pipeline.
Unguarded AI agents fail in production far more often than they should. agentguard stops that.
The Problem
AI agents built with LangChain, AutoGen, CrewAI, or custom LLM pipelines fail catastrophically in production due to:
- Infinite loops — agents repeat the same tool calls indefinitely
- Silent failures — LLM errors swallowed without retry or alerting
- Duplicate actions — the same expensive LLM call fires multiple times
- Rate limit crashes — no intelligent backoff for 429/503 errors
- Token limit blindness — agents don't know when to stop or summarize
- No circuit breaking — one bad model call cascades into total failure
General-purpose retry libraries such as tenacity are not LLM-aware, and agent frameworks such as LangGraph and CrewAI do not address these agent-specific failure modes out of the box.
Features
- Circuit Breaker — Automatically opens after N failures, protecting downstream LLM APIs from cascading overload
- LLM-Aware Retry — Classifies errors (rate limit, token limit, provider outage, hallucinated tool call) and applies appropriate backoff
- Idempotency — Caches results by key to prevent duplicate expensive LLM executions
- Loop Detection — Detects and halts infinite agent action loops before they run up your API bill
- Timeout Enforcement — Hard timeouts on any agent step, with clean error propagation
- Zero Dependencies — Pure Python standard library only; works with any LLM framework
- Full Observability — Built-in stats, logging at every layer, structured error types
Installation
pip install agentguard-llm
Quick Start
GuardedAgent — Full Protection in One Wrapper
from agentguard import GuardedAgent
agent = GuardedAgent(
    name="my_llm_agent",
    max_retries=3,
    circuit_threshold=5,
    timeout=30.0,
    loop_detection=True,
    max_repeated_actions=3,
)
def call_llm(prompt: str) -> str:
    # Your actual LLM call here (OpenAI, Anthropic, etc.)
    return f"Response to: {prompt}"
result = agent.run(call_llm, "What is the capital of France?", action_label="llm_call")
print(result)
print(agent.get_stats())
# {'name': 'my_llm_agent', 'total_calls': 1, 'total_failures': 0, 'circuit': {...}}
Circuit Breaker — Standalone
from agentguard import CircuitBreaker
cb = CircuitBreaker(failure_threshold=3, recovery_timeout=60.0, name="openai")
try:
    response = cb.call(call_llm, "Hello")
except Exception as e:
    print(f"Protected from cascading failure: {e}")
print(cb.get_stats())
# {'name': 'openai', 'state': 'closed', 'failure_count': 0, ...}
LLM-Aware Retry — Decorator Style
from agentguard import llm_retry
@llm_retry(max_attempts=3)
def my_agent_step(query: str) -> str:
    # Automatically retries on rate limits (429) and provider outages (503);
    # stops immediately on token-limit errors (non-retryable)
    return call_llm(query)
result = my_agent_step("Summarize this document")
LLM-Aware Retry — Programmatic
from agentguard import LLMRetry
retry = LLMRetry(max_attempts=5, on_retry=lambda attempt, ftype, err: print(f"Retry {attempt}: {ftype}"))
result = retry.execute(call_llm, "Hello")
Idempotency — Prevent Duplicate Executions
from agentguard import IdempotentAgent
agent = IdempotentAgent(ttl=3600.0)
# First call executes the function
result1 = agent.run(call_llm, "Summarize report", idempotency_key="report-summary-v1")
# Second call with same key returns cached result instantly
result2 = agent.run(call_llm, "Summarize report", idempotency_key="report-summary-v1")
assert result1 == result2 # True — function only ran once
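The mechanics behind this are a keyed result cache with a time-to-live. Here is a minimal, self-contained sketch of how such a TTL-based idempotency store can work; it is an illustration of the technique, not agentguard's actual implementation:

```python
import time

class TTLIdempotencyStore:
    """Caches results by key so repeated calls skip re-execution until the TTL expires."""

    def __init__(self, ttl: float = 3600.0):
        self.ttl = ttl
        self._cache = {}  # key -> (result, stored_at)

    def run(self, func, *args, idempotency_key=None, **kwargs):
        if idempotency_key is not None:
            entry = self._cache.get(idempotency_key)
            if entry is not None:
                result, stored_at = entry
                if time.monotonic() - stored_at < self.ttl:
                    return result  # cache hit: function is not re-executed
        result = func(*args, **kwargs)
        if idempotency_key is not None:
            self._cache[idempotency_key] = (result, time.monotonic())
        return result
```

Note that the key is chosen by the caller; a stable, version-suffixed key (like "report-summary-v1" above) lets you invalidate the cache deliberately by bumping the version.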
Failure Classifier — Understand What Went Wrong
from agentguard import FailureClassifier, FailureType
fc = FailureClassifier()
err = Exception("429 Too Many Requests: rate limit exceeded")
ftype = fc.classify(err)
# FailureType.RATE_LIMIT
print(fc.is_retryable(ftype)) # True
print(fc.get_retry_delay(ftype, attempt=1)) # 10.0 seconds
Architecture
agentguard/
├── agent_wrapper.py # GuardedAgent — orchestrates all protections
├── circuit_breaker.py # CircuitBreaker — CLOSED/OPEN/HALF_OPEN state machine
├── retry.py # LLMRetry + llm_retry decorator — intelligent backoff
├── idempotency.py # IdempotentAgent + IdempotencyStore — result caching
├── failure_classifier.py # FailureClassifier — LLM error pattern recognition
└── exceptions.py # Typed exception hierarchy
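The circuit breaker's CLOSED/OPEN/HALF_OPEN state machine noted above follows the standard pattern: count failures while closed, reject calls while open, and allow a probe call after the recovery timeout. The following is a minimal sketch of that pattern in pure Python, not agentguard's actual code:

```python
import time

class MiniBreaker:
    """Toy circuit breaker: closed -> open after N failures -> half_open after a timeout."""

    def __init__(self, failure_threshold=5, recovery_timeout=60.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.state = "closed"
        self.failure_count = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.state = "half_open"  # allow one probe call through
            else:
                raise RuntimeError("circuit open: call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            # A failed probe, or too many failures, (re)opens the circuit
            if self.state == "half_open" or self.failure_count >= self.failure_threshold:
                self.state = "open"
                self.opened_at = time.monotonic()
            raise
        else:
            self.state = "closed"
            self.failure_count = 0
            return result
```

The key property: once open, downstream LLM APIs receive no traffic at all until the recovery window elapses, which is what prevents a degraded provider from being hammered into a full outage.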
Data Flow
Agent call
│
├─► Loop Detector (infinite loop check)
│
├─► Circuit Breaker (OPEN? → reject immediately)
│
├─► Idempotency Store (seen this key? → return cached)
│
├─► LLM Retry (classify failure → smart backoff → retry)
│
└─► Timeout Thread (hard deadline enforcement)
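The last step, thread-based hard timeouts, can be sketched with the standard library's concurrent.futures. This is one plausible way to implement a "timeout thread" and is an assumption for illustration, not agentguard's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def run_with_timeout(func, timeout, *args, **kwargs):
    """Run func in a worker thread; raise TimeoutError if it misses the deadline."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        raise TimeoutError(f"agent step exceeded {timeout}s deadline")
    finally:
        # Don't block waiting for the worker; a timed-out thread is abandoned,
        # not killed (Python threads cannot be forcibly terminated).
        pool.shutdown(wait=False)
```

The caveat in the final comment is inherent to any thread-based timeout in Python: the deadline bounds how long the *caller* waits, not how long the underlying LLM call runs.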
Failure Classification Logic
| Error Pattern | Classified As | Retryable | Backoff |
|---|---|---|---|
| 429, rate limit, quota exceeded | RATE_LIMIT | Yes | Exponential (5s base, max 60s) |
| 503, service unavailable, overloaded | PROVIDER_OUTAGE | Yes | Exponential (10s base, max 120s) |
| context length, token limit | TOKEN_LIMIT | No | — |
| tool not found, invalid tool | HALLUCINATED_TOOL_CALL | No | — |
| anything else | UNKNOWN | Yes | Exponential (1s base, max 30s) |
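A classifier like this typically works by substring-matching the error message, then looking up a capped exponential schedule. The sketch below illustrates that technique; the delay formula (base × 2^attempt, capped) is an assumption chosen to be consistent with the documented get_retry_delay(RATE_LIMIT, attempt=1) == 10.0 example, and none of this is agentguard's actual code:

```python
PATTERNS = {
    "RATE_LIMIT": ("429", "rate limit", "quota exceeded"),
    "PROVIDER_OUTAGE": ("503", "service unavailable", "overloaded"),
    "TOKEN_LIMIT": ("context length", "token limit"),
    "HALLUCINATED_TOOL_CALL": ("tool not found", "invalid tool"),
}
BACKOFF = {  # failure type -> (base seconds, cap seconds); absence = not retryable
    "RATE_LIMIT": (5.0, 60.0),
    "PROVIDER_OUTAGE": (10.0, 120.0),
    "UNKNOWN": (1.0, 30.0),
}

def classify(error: Exception) -> str:
    msg = str(error).lower()
    for ftype, needles in PATTERNS.items():
        if any(n in msg for n in needles):
            return ftype
    return "UNKNOWN"

def is_retryable(ftype: str) -> bool:
    return ftype in BACKOFF

def retry_delay(ftype: str, attempt: int) -> float:
    base, cap = BACKOFF[ftype]
    return min(base * 2 ** attempt, cap)
```

String matching on error messages is fragile across providers, which is exactly why centralizing it in one classifier (rather than scattering ad-hoc checks through agent code) pays off.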
API Reference
GuardedAgent
GuardedAgent(
    name: str = "agent",
    max_retries: int = 3,
    circuit_threshold: int = 5,
    circuit_recovery: float = 60.0,
    timeout: Optional[float] = None,
    loop_detection: bool = True,
    max_repeated_actions: int = 3,
    idempotency_ttl: float = 3600.0,
    enable_idempotency: bool = True,
)
Methods:
- run(func, *args, action_label=None, idempotency_key=None, **kwargs) — Execute with all protections
- get_stats() — Returns dict with call counts, failure counts, circuit state
- reset_loop_detector() — Clear the loop detection history
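The loop-detection behavior implied by max_repeated_actions and reset_loop_detector() can be illustrated with a small repeated-action tracker: record each action label and halt when the same label occurs N times in a row. This is a hypothetical sketch of the technique, not agentguard's internals:

```python
from collections import deque

class LoopDetector:
    """Raises when the same action label repeats max_repeated_actions times in a row."""

    def __init__(self, max_repeated_actions: int = 3):
        self.max_repeated = max_repeated_actions
        self.history = deque(maxlen=max_repeated_actions)

    def record(self, action_label: str) -> None:
        self.history.append(action_label)
        # Window full and every entry identical => the agent is looping
        if len(self.history) == self.max_repeated and len(set(self.history)) == 1:
            raise RuntimeError(
                f"loop detected: {action_label!r} repeated {self.max_repeated} times"
            )

    def reset(self) -> None:
        self.history.clear()
```

This is why passing a meaningful action_label to run() matters: the label is what the detector compares, so a generic label across unrelated steps would trigger false positives.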
CircuitBreaker
CircuitBreaker(
    failure_threshold: int = 5,
    recovery_timeout: float = 60.0,
    half_open_max_calls: int = 1,
    name: str = "default",
)
Methods:
- call(func, *args, **kwargs) — Protected function call
- get_stats() — Returns state, failure count, last failure time
LLMRetry
LLMRetry(
    max_attempts: int = 3,
    classifier: Optional[FailureClassifier] = None,
    on_retry: Optional[Callable] = None,
)
Methods:
- execute(func, *args, **kwargs) — Execute with retry logic
llm_retry decorator
@llm_retry(max_attempts=3, classifier=None)
def my_func(): ...
IdempotentAgent
IdempotentAgent(store=None, ttl=3600.0)
Methods:
- run(func, *args, idempotency_key=None, **kwargs) — Execute with deduplication
FailureClassifier
Methods:
- classify(error: Exception) -> FailureType
- is_retryable(failure_type: FailureType) -> bool
- get_retry_delay(failure_type: FailureType, attempt: int) -> float
Exceptions
| Exception | When Raised |
|---|---|
| AgentGuardError | Base class for all agentguard errors |
| CircuitOpenError | Circuit breaker is OPEN, call rejected |
| MaxRetriesExceededError | All retry attempts exhausted |
| IdempotencyError | Idempotency store conflict |
| AgentTimeoutError | Agent exceeded timeout limit |
Real-World Integration Examples
With OpenAI
import openai
from agentguard import GuardedAgent
agent = GuardedAgent(name="openai-agent", max_retries=3, timeout=30.0)
def gpt_call(prompt: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
result = agent.run(gpt_call, "Explain quantum computing", action_label="gpt_call")
With LangChain
from agentguard import llm_retry
@llm_retry(max_attempts=3)
def run_chain(input_text: str):
    return my_langchain_chain.invoke({"input": input_text})
With CrewAI / AutoGen
from agentguard import GuardedAgent
guard = GuardedAgent(name="crew-agent", loop_detection=True, max_repeated_actions=5)
# Wrap any crew task execution
result = guard.run(crew.kickoff, action_label="crew_task")
Contributing
Contributions are welcome. Please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/your-feature)
- Run tests: pytest tests/ -v
- Submit a pull request
License
MIT License. See LICENSE for details.
File details
Details for the file agentguard_llm-0.2.0.tar.gz.
File metadata
- Download URL: agentguard_llm-0.2.0.tar.gz
- Upload date:
- Size: 20.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1ce33503e99c9dc00141d8777f6367d09cb97f1976729d193f1582bcec5cf901 |
| MD5 | 7a15d8a8cf8bb5c004665a2759b868f4 |
| BLAKE2b-256 | 50958e8f9df31607a9ac6f09022ec2487dfb0de1cb8d85810707365873cd63d0 |
File details
Details for the file agentguard_llm-0.2.0-py3-none-any.whl.
File metadata
- Download URL: agentguard_llm-0.2.0-py3-none-any.whl
- Upload date:
- Size: 18.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f638568349048355d88e08e9a4f06cd8ff3378a0b3e07da4fa6ef4aee0ea43f6 |
| MD5 | 458d7df41e67a584f06b3fe5e0a54785 |
| BLAKE2b-256 | 4565c1d72051d67da0af81a53adcd4b2b29ba3502564fd71914ad8228fcbea2d |