Chaos Engineering for AI Agents
agent-chaos
Chaos engineering for AI agents.
"Introduce a little anarchy. Upset the established order, and everything becomes chaos. I'm an agent of chaos. Oh, and you know the thing about chaos? It's fair!"
Your agent works in demos. It passes evals. Then it hits production: the LLM sends a 500, the tool returns garbage, the stream cuts mid-response. The agent fails silently, returns wrong answers, or loops forever.
agent-chaos breaks your agent on purpose, before production does. For teams building agents for production, not demos.
pip install agent-chaos
Why does this exist?
LLM APIs are unreliable. They claim certain rate limits, then behave differently. They accept a stream request, then start sending tokens 10 seconds later. They reject mid-stream. They hang for 20 seconds before returning a 500. We've seen providers return "Sorry about that" as an error message.
Production agent backends run multiple LLMs with retry and fallback because things break randomly. What worked last week might not work today. You often don't realize it until production.
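The retry-and-fallback pattern described above can be sketched in plain Python. This is a minimal illustration, not agent-chaos code and not any provider's SDK: ProviderError, call_with_fallback, and the two stand-in providers are hypothetical names invented for the example.

```python
import time
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error standing in for a provider failure (429, 500, timeout)."""


def call_with_fallback(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    retries_per_provider: int = 2,
    base_delay: float = 0.0,
) -> str:
    """Try each provider in order, retrying with backoff before falling back."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(retries_per_provider):
            try:
                return provider(prompt)
            except ProviderError as exc:
                last_error = exc
                # Exponential backoff: base_delay, 2 * base_delay, ...
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all providers failed") from last_error


# Stand-in providers: the primary always fails, the secondary answers.
def flaky_primary(prompt: str) -> str:
    raise ProviderError("500 Internal Server Error")


def stable_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


result = call_with_fallback([flaky_primary, stable_secondary], "order status?")
print(result)
```

Code like this is exactly what tends to go untested: the fallback branch only runs when the primary provider actually fails.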
But the chaos isn't just at the transport layer. There's a semantic layer that's harder to catch.
Tools fail in obvious ways (timeouts, errors), but also in subtle ways: empty responses, partial data, wrong data types, malformed JSON, stale information, or data for the wrong entity entirely. A tool might return a 200 OK with an error message buried in the response body. An LLM-backed tool might hallucinate. With MCP, your agent calls tools you don't control, with schemas that can change without notice.
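To make the subtle failures concrete, here is a minimal sketch of a defensive check an agent could run on a raw tool response. It is plain Python, not part of agent-chaos. The function name validate_tool_result, the "error" key convention, and the example schema are assumptions for illustration.

```python
import json
from typing import Any


def validate_tool_result(raw: str, required: dict[str, type]) -> tuple[bool, str]:
    """Return (ok, reason) for a raw tool response body.

    `required` maps field names to the Python type each field must have.
    """
    if not raw.strip():
        return False, "empty response"
    try:
        data: Any = json.loads(raw)
    except json.JSONDecodeError:
        return False, "malformed JSON"
    if not isinstance(data, dict):
        return False, "expected a JSON object"
    # A "200 OK" body can still carry an error payload.
    if "error" in data:
        return False, f"error in body: {data['error']}"
    for field, expected_type in required.items():
        if field not in data:
            return False, f"missing field: {field}"
        if not isinstance(data[field], expected_type):
            return False, f"wrong type for {field}"
    return True, "ok"


schema = {"order_id": str, "status": str}
print(validate_tool_result('{"order_id": "123", "status": "shipped"}', schema))
print(validate_tool_result('{"error": "Service unavailable"}', schema))
print(validate_tool_result('{"order_id": 123, "status": "shipped"}', schema))
print(validate_tool_result('{"order_id": "123", "sta', schema))
```

A structural check like this catches empty, malformed, and mistyped responses. It cannot catch stale data or data for the wrong entity, which is why semantic evaluation of the final answer still matters.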
Traditional chaos engineering tools (Chaos Monkey, Gremlin) operate at the infrastructure level: network partitions, pod failures. They can't corrupt a tool result or cut an LLM stream mid-response.
agent-chaos injects these failures so you can test how your agent handles them before users find out. It integrates with evaluation frameworks like DeepEval, so you can inject chaos and judge the quality of your agent's response.
Core concepts
Scenarios: baseline + variants
A baseline scenario defines a conversation with your agent. A variant adds chaos:
from agent_chaos import BaselineScenario, Turn
from agent_chaos.chaos import llm_rate_limit, tool_error

# Baseline: happy path
baseline = BaselineScenario(
    name="order-inquiry",
    agent=my_agent,
    turns=[
        Turn("What's the status of order #123?"),
        Turn("Can I get a refund?"),
    ],
)

# Variant: what happens when the LLM rate-limits?
baseline.variant(
    name="llm-rate-limit",
    chaos=[llm_rate_limit().after_calls(1)],
)

# Variant: what happens when the refund API fails?
baseline.variant(
    name="refund-api-down",
    chaos=[tool_error("Service unavailable").for_tool("check_refund")],
)
Chaos and assertions
agent-chaos provides chaos injectors for LLM failures (llm_rate_limit, llm_server_error, llm_timeout), tool failures (tool_error, tool_timeout), data corruption (tool_mutate), and more. These are composable and support targeting specific tools, turns, or call counts.
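As a mental model for targeting by call count, the following sketch wraps a tool so that it starts failing after a set number of calls. This is a conceptual illustration in plain Python, not how agent-chaos is implemented. The names fail_after_calls, InjectedToolError, and get_order are hypothetical, and the "first N calls pass, later calls fail" semantics are an assumption of this sketch.

```python
from typing import Any, Callable


class InjectedToolError(Exception):
    """Raised in place of the real tool result when the injected fault fires."""


def fail_after_calls(
    tool: Callable[..., Any], after: int, message: str
) -> Callable[..., Any]:
    """Wrap `tool` so the first `after` calls pass through and later calls raise."""
    calls = 0

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        nonlocal calls
        calls += 1
        if calls > after:
            raise InjectedToolError(message)
        return tool(*args, **kwargs)

    return wrapper


def get_order(order_id: str) -> dict:
    # Hypothetical tool; a real one would call an API.
    return {"order_id": order_id, "status": "shipped"}


chaotic_get_order = fail_after_calls(get_order, after=1, message="Service unavailable")

print(chaotic_get_order("123"))  # first call passes through
try:
    chaotic_get_order("123")  # second call hits the injected failure
except InjectedToolError as exc:
    print(f"injected: {exc}")
```

Because the wrapper has the same call signature as the tool, the agent under test does not need to know a fault is being injected.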
Built-in assertions include MaxTotalLLMCalls, AllTurnsComplete, TokenBurstDetection, among others. For semantic evaluation, agent-chaos optionally integrates with DeepEval, letting you use any DeepEval metric (like GEval) as an assertion.
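To illustrate the kind of check a call-budget assertion performs, here is a minimal sketch over a recorded trace. It is not the library's MaxTotalLLMCalls implementation: TraceEvent, check_max_llm_calls, and the event kinds are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class TraceEvent:
    kind: str  # "llm_call" or "tool_call" in this sketch
    turn: int


def check_max_llm_calls(trace: list[TraceEvent], limit: int) -> tuple[bool, str]:
    """Pass if the run made at most `limit` LLM calls."""
    count = sum(1 for event in trace if event.kind == "llm_call")
    if count > limit:
        return False, f"{count} LLM calls exceeds limit of {limit}"
    return True, f"{count} LLM calls within limit of {limit}"


# A run where the agent called the LLM repeatedly on turn 2.
trace = [TraceEvent("llm_call", 1), TraceEvent("tool_call", 1)] + [
    TraceEvent("llm_call", 2) for _ in range(6)
]
print(check_max_llm_calls(trace, limit=5))
```

A budget like this is a cheap way to catch an agent that loops under failure, even when each individual call succeeds.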
Both chaos and assertions can be applied per-scenario or per-turn using the at() helper:
from agent_chaos import at
from agent_chaos.chaos import tool_error
from agent_chaos.scenario import CompletesWithin
from agent_chaos.integrations.deepeval import as_assertion
from deepeval.metrics import GEval

# Inject chaos only on turn 2
baseline.variant(
    name="check-refund-fails",
    turns=[
        at(
            2,
            chaos=[tool_error("Service unavailable").for_tool("check_refund")],
            assertions=[
                CompletesWithin(60.0),
                as_assertion(GEval(name="task-completion", criteria="Did the agent complete the user's request?")),
            ],
        ),
    ],
)
Fuzzing
It's difficult to define every failure mode upfront. fuzz_chaos generates random chaos combinations based on a ChaosSpace configuration, so you can explore how your agent behaves under varied conditions.
from agent_chaos import fuzz_chaos, ChaosSpace, LLMFuzzConfig, ToolFuzzConfig

variants = fuzz_chaos(
    baseline,
    n=10,
    space=ChaosSpace(
        llm=LLMFuzzConfig(probability=0.3),
        tool=ToolFuzzConfig(probability=0.5, targets=["get_order", "process_refund"]),
    ),
)
Fuzzing is for exploration, not CI. See examples/ecommerce-support-agent/scenarios/fuzzing.py for more.
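One way to picture probability-based fuzzing is as independent coin flips per fault source. The sketch below samples fault combinations with Python's random module. It is a conceptual illustration, not the fuzz_chaos implementation: the fault names are plain strings, and sample_variant, sample_variants, and the seed parameter are assumptions of this sketch, not documented agent-chaos options.

```python
import random
from typing import List, Sequence, Tuple

Fault = Tuple[str, str]  # (target, fault kind)

LLM_FAULTS = ["rate_limit", "server_error", "timeout"]
TOOL_FAULTS = ["error", "timeout", "mutate"]


def sample_variant(
    rng: random.Random,
    llm_probability: float,
    tool_probability: float,
    tool_targets: Sequence[str],
) -> List[Fault]:
    """Pick a random set of faults: at most one LLM fault and one per tool."""
    faults: List[Fault] = []
    if rng.random() < llm_probability:
        faults.append(("llm", rng.choice(LLM_FAULTS)))
    for tool in tool_targets:
        if rng.random() < tool_probability:
            faults.append((tool, rng.choice(TOOL_FAULTS)))
    return faults


def sample_variants(n: int, seed: int, **space) -> List[List[Fault]]:
    rng = random.Random(seed)  # a fixed seed reproduces the same variants
    return [sample_variant(rng, **space) for _ in range(n)]


space = dict(
    llm_probability=0.3,
    tool_probability=0.5,
    tool_targets=["get_order", "process_refund"],
)
for faults in sample_variants(3, seed=42, **space):
    print(faults)
```

Random sampling explains the exploration-versus-CI split: unseeded runs produce different fault sets each time, so a failure found by fuzzing is best promoted to an explicit variant before it guards a build.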
Examples
The examples/ecommerce-support-agent/ directory contains a complete example with an e-commerce support agent built with pydantic-ai, including:
- scenarios/quickstart.py - baseline scenarios with chaos variants
- scenarios/resilience.py - comprehensive resilience testing
- scenarios/fuzzing.py - automated, random chaos generation
cd examples/ecommerce-support-agent
uv sync
uv run agent-chaos run scenarios/quickstart.py
# in another terminal
uv run agent-chaos ui .agent_chaos_runs
Screenshot: scenario overview showing baselines, chaos variants, and assertion results.
Screenshot: LLM rate limit injected on turn 1. The agent failed to respond, and the turn-coherence assertion caught it.
Screenshot: tool error injected. The agent handles the failure gracefully and offers alternatives.
Status
Under active development.
Supported:
- Anthropic models (via the anthropic SDK)
- Multi-turn conversations
- LLM chaos (rate limits, server errors, timeouts, stream cut/hang, slow chunks)
- Tool chaos (errors, timeouts, mutations)
- User input chaos (prompt injections)
- Optional DeepEval integration for LLM-as-judge assertions
- Scenario fuzzing
Planned:
- OpenAI, Gemini models
- Integration with other evaluation tools
- More chaos types