Chaos Engineering for AI Agents

agent-chaos

Chaos engineering for AI agents.

"Introduce a little anarchy. Upset the established order, and everything becomes chaos. I'm an agent of chaos. Oh, and you know the thing about chaos? It's fair!"

Your agent works in demos. It passes evals. Then it hits production: the LLM API returns a 500, a tool returns garbage, the stream cuts off mid-response. The agent fails silently, returns wrong answers, or loops forever.

agent-chaos breaks your agent on purpose, before production does. For teams building agents for production, not demos.

pip install agent-chaos

Why does this exist?

LLM APIs are unreliable. They advertise one rate limit, then enforce another. They accept a stream request, then wait 10 seconds before sending the first token. They reject requests mid-stream. They hang for 20 seconds before returning a 500. We've seen providers return "Sorry about that" as an error message.

Production agent backends run multiple LLMs with retry and fallback because things break randomly. What worked last week might not work today. You often don't realize it until production.
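The retry-and-fallback pattern mentioned above can be sketched in a few lines. This is a minimal illustration, not part of agent-chaos: `ProviderError`, `call_with_fallback`, and the two stand-in providers are hypothetical names, and a real backend would catch the specific exceptions raised by its LLM SDKs.

```python
import time
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error type standing in for any provider failure."""


def call_with_fallback(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    retries_per_provider: int = 2,
    backoff_seconds: float = 0.0,
) -> str:
    """Try each provider in order, retrying each one before falling back."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(retries_per_provider):
            try:
                return provider(prompt)
            except ProviderError as exc:
                last_error = exc
                # Exponential backoff before the next attempt.
                time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError("all providers failed") from last_error


# Stand-in providers: the first always fails, the second succeeds.
def flaky_primary(prompt: str) -> str:
    raise ProviderError("500 Internal Server Error")


def stable_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


result = call_with_fallback([flaky_primary, stable_secondary], "hello")
print(result)  # echo: hello
```

Code like this is exactly what tends to go untested: the fallback branch only runs when the primary provider misbehaves, which is hard to reproduce on demand without injecting the failure.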

But the chaos isn't just at the transport layer. There's a semantic layer that's harder to catch.

Tools fail in obvious ways (timeouts, errors), but also in subtle ways: empty responses, partial data, wrong data types, malformed JSON, stale information, or data for the wrong entity entirely. A tool might return a 200 OK with an error message buried in the response body. An LLM-backed tool might hallucinate. With MCP, your agent calls tools you don't control, with schemas that can change without notice.
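To make the subtle failures concrete, here is a small sketch of the kind of defensive check an agent's tool layer might apply before trusting a result. The response shape (`order_id`, `status`, `error`) and the names `ToolResultError` and `parse_order_status` are assumptions made up for this example; they are not part of agent-chaos.

```python
import json
from typing import Any, Dict


class ToolResultError(Exception):
    """Raised when a tool response is unusable, even if the call 'succeeded'."""


def parse_order_status(raw: str, expected_order_id: str) -> Dict[str, Any]:
    """Validate a hypothetical order-status tool response before using it."""
    if not raw.strip():
        raise ToolResultError("empty response")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ToolResultError(f"malformed JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ToolResultError("expected a JSON object")
    # A 200 OK can still carry an error message in the body.
    if "error" in data:
        raise ToolResultError(f"error in body: {data['error']}")
    # Missing field or wrong data type.
    if not isinstance(data.get("status"), str):
        raise ToolResultError("missing or non-string 'status'")
    # Data for the wrong entity entirely.
    if data.get("order_id") != expected_order_id:
        raise ToolResultError("response is for a different order")
    return data


ok = parse_order_status('{"order_id": "123", "status": "shipped"}', "123")
print(ok["status"])  # shipped
```

Each `raise` corresponds to one of the failure modes listed above, and each is a branch that only a corrupted tool result will exercise.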

Traditional chaos engineering tools (Chaos Monkey, Gremlin) operate at the infrastructure layer: network partitions, pod failures. They can't corrupt a tool result or cut an LLM stream mid-response.

agent-chaos injects these failures so you can test how your agent handles them before users find out. It integrates with evaluation frameworks like DeepEval, so you can inject chaos and judge the quality of your agent's response.
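As an illustration of an application-level fault that infrastructure tools cannot express, the sketch below cuts a token stream after a fixed number of chunks. It shows the idea only and is not how agent-chaos implements stream chaos; `StreamCut`, `cut_stream_after`, and `collect` are hypothetical names.

```python
from typing import Iterable, Iterator, List, Tuple


class StreamCut(Exception):
    """Stand-in for a connection dropped mid-stream."""


def cut_stream_after(chunks: Iterable[str], n_chunks: int) -> Iterator[str]:
    """Yield the first n_chunks chunks, then raise as if the connection dropped."""
    for index, chunk in enumerate(chunks):
        if index >= n_chunks:
            raise StreamCut(f"stream cut after {n_chunks} chunks")
        yield chunk


def collect(stream: Iterable[str]) -> Tuple[str, bool]:
    """Consume a stream; return the text received and whether it completed."""
    received: List[str] = []
    try:
        for chunk in stream:
            received.append(chunk)
    except StreamCut:
        return "".join(received), False
    return "".join(received), True


chunks = ["Your ", "order ", "has ", "shipped."]
text, complete = collect(cut_stream_after(chunks, 2))
print(repr(text), complete)  # 'Your order ' False
```

The interesting question is what the agent does with the partial text: treating `'Your order '` as a finished answer is the silent failure this kind of test is meant to surface.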

Core concepts

Scenarios: baseline + variants

A baseline scenario defines a conversation with your agent. A variant adds chaos:

from agent_chaos import BaselineScenario, Turn
from agent_chaos.chaos import llm_rate_limit, tool_error

# Baseline: happy path
baseline = BaselineScenario(
    name="order-inquiry",
    agent=my_agent,
    turns=[
        Turn("What's the status of order #123?"),
        Turn("Can I get a refund?"),
    ],
)

# Variant: what happens when the LLM rate-limits?
baseline.variant(
    name="llm-rate-limit",
    chaos=[llm_rate_limit().after_calls(1)],
)

# Variant: what happens when the refund API fails?
baseline.variant(
    name="refund-api-down",
    chaos=[tool_error("Service unavailable").for_tool("check_refund")],
)

Chaos and assertions

agent-chaos provides chaos injectors for LLM failures (llm_rate_limit, llm_server_error, llm_timeout), tool failures (tool_error, tool_timeout), data corruption (tool_mutate), and more. These are composable and support targeting specific tools, turns, or call counts.
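Conceptually, targeting by tool and call count amounts to wrapping a callable and counting invocations. The sketch below is plain Python, written for illustration under that assumption; it is not the library's implementation, and the exact semantics of helpers such as `after_calls` in agent-chaos may differ. `InjectedToolError` and `fail_after_calls` are hypothetical names.

```python
from typing import Any, Callable


class InjectedToolError(Exception):
    """Marks a failure raised by the wrapper rather than by the real tool."""


def fail_after_calls(
    tool: Callable[..., Any], after_calls: int, message: str
) -> Callable[..., Any]:
    """Let the first `after_calls` calls through, then fail every later call."""
    calls = 0

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        nonlocal calls
        calls += 1
        if calls > after_calls:
            raise InjectedToolError(message)
        return tool(*args, **kwargs)

    return wrapper


def check_refund(order_id: str) -> str:
    return f"refund for {order_id} is eligible"


chaotic_check_refund = fail_after_calls(
    check_refund, after_calls=1, message="Service unavailable"
)
first = chaotic_check_refund("123")  # first call passes through
try:
    chaotic_check_refund("123")  # second call fails
    second_failed = False
except InjectedToolError:
    second_failed = True
print(first, second_failed)
```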

Built-in assertions include MaxTotalLLMCalls, AllTurnsComplete, and TokenBurstDetection, among others. For semantic evaluation, agent-chaos optionally integrates with DeepEval, letting you use any DeepEval metric (such as GEval) as an assertion.
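To give a feel for what structural assertions of this kind check, here is a standalone sketch over a hypothetical run record. `RunTrace`, `MaxLLMCalls`, and `EveryTurnCompleted` are invented for this example and are not the agent-chaos classes; the real built-ins may track different data.

```python
from dataclasses import dataclass


@dataclass
class RunTrace:
    """Hypothetical record of one scenario run."""
    llm_calls: int
    turns_expected: int
    turns_completed: int


@dataclass
class MaxLLMCalls:
    """Passes when the run made at most `limit` LLM calls (catches runaway loops)."""
    limit: int

    def check(self, trace: RunTrace) -> bool:
        return trace.llm_calls <= self.limit


@dataclass
class EveryTurnCompleted:
    """Passes when the agent produced a response for every turn."""

    def check(self, trace: RunTrace) -> bool:
        return trace.turns_completed == trace.turns_expected


trace = RunTrace(llm_calls=14, turns_expected=2, turns_completed=2)
results = {
    "max-llm-calls": MaxLLMCalls(limit=10).check(trace),
    "every-turn-completed": EveryTurnCompleted().check(trace),
}
print(results)  # {'max-llm-calls': False, 'every-turn-completed': True}
```

Checks like these catch the "loops forever" and "fails silently" cases cheaply; judging whether the answer is actually right is where an LLM-as-judge metric comes in.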

Both chaos and assertions can be applied per-scenario or per-turn using the at() helper:

from agent_chaos import at
from agent_chaos.chaos import tool_error
from agent_chaos.scenario import CompletesWithin
from agent_chaos.integrations.deepeval import as_assertion
from deepeval.metrics import GEval

# Inject chaos only on turn 2
baseline.variant(
    name="check-refund-fails",
    turns=[
        at(
            2,
            chaos=[tool_error("Service unavailable").for_tool("check_refund")],
            assertions=[
                CompletesWithin(60.0),
                as_assertion(
                    GEval(
                        name="task-completion",
                        criteria="Did the agent complete the user's request?",
                    )
                ),
            ],
        ),
    ],
)

Fuzzing

It's difficult to define every failure mode upfront. fuzz_chaos generates random chaos combinations based on a ChaosSpace configuration, so you can explore how your agent behaves under varied conditions.

from agent_chaos import fuzz_chaos, ChaosSpace, LLMFuzzConfig, ToolFuzzConfig

variants = fuzz_chaos(
    baseline,
    n=10,
    space=ChaosSpace(
        llm=LLMFuzzConfig(probability=0.3),
        tool=ToolFuzzConfig(probability=0.5, targets=["get_order", "process_refund"]),
    ),
)

Fuzzing is for exploration, not CI. See examples/ecommerce-support-agent/scenarios/fuzzing.py for more.
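The idea behind probability-driven fuzzing can be sketched without the library: for each variant, flip a weighted coin per fault source and record which faults fire. This is an illustration only, not how `fuzz_chaos` works internally. The function name, the string labels, and the `seed` parameter are assumptions of this sketch; the seed is there so that a surprising plan can be replayed.

```python
import random
from typing import List


def sample_chaos_plans(
    n: int,
    llm_probability: float,
    tool_probability: float,
    tool_targets: List[str],
    seed: int,
) -> List[List[str]]:
    """Draw n random fault plans; each plan is a list of fault labels."""
    rng = random.Random(seed)  # seeded so the same plans can be regenerated
    llm_faults = ["llm_rate_limit", "llm_server_error", "llm_timeout"]
    tool_faults = ["tool_error", "tool_timeout"]
    plans: List[List[str]] = []
    for _ in range(n):
        plan: List[str] = []
        # At most one LLM fault per plan, chosen with the given probability.
        if rng.random() < llm_probability:
            plan.append(rng.choice(llm_faults))
        # Each targeted tool is faulted independently.
        for target in tool_targets:
            if rng.random() < tool_probability:
                plan.append(f"{rng.choice(tool_faults)}:{target}")
        plans.append(plan)
    return plans


plans = sample_chaos_plans(
    n=10,
    llm_probability=0.3,
    tool_probability=0.5,
    tool_targets=["get_order", "process_refund"],
    seed=7,
)
for plan in plans:
    print(plan or ["(no chaos)"])
```

Some plans come out empty and others stack several faults, which is the point: combinations nobody thought to write down by hand.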

Examples

The examples/ecommerce-support-agent/ directory contains a complete example with an e-commerce support agent built with pydantic-ai, including:

scenarios/quickstart.py - baseline scenarios with chaos variants
scenarios/resilience.py - comprehensive resilience testing
scenarios/fuzzing.py - automated, random chaos generation

cd examples/ecommerce-support-agent
uv sync
uv run agent-chaos run scenarios/quickstart.py

# in another terminal
uv run agent-chaos ui .agent_chaos_runs

Scenario overview showing baselines, chaos variants, and assertion results:

[Screenshot: All scenarios]

LLM rate limit injected on turn 1. Agent failed to respond, caught by turn-coherence assertion:

[Screenshot: Rate limit injected]

Tool error injected. Agent gracefully handles the failure and offers alternatives:

[Screenshot: Tool error injected]

Status

Under active development.

Supported:

Anthropic models (via the anthropic SDK)
Multi-turn conversations
LLM chaos (rate limits, server errors, timeouts, stream cut/hang, slow chunks)
Tool chaos (errors, timeouts, mutations)
User input chaos (prompt injections)
Optional DeepEval integration for LLM-as-judge assertions
Scenario fuzzing

Planned:

OpenAI, Gemini models
Integration with other evaluation tools
More chaos types

Project details

Release history

0.1.3 (this version): Jan 2, 2026
0.1.2: Dec 31, 2025
0.1.1: Dec 30, 2025
0.1.0: Dec 30, 2025

Download files

Source Distribution

agent_chaos-0.1.3.tar.gz (1.5 MB), uploaded Jan 2, 2026, Source

Built Distribution

agent_chaos-0.1.3-py3-none-any.whl (136.7 kB), uploaded Jan 2, 2026, Python 3

File details

Details for the file agent_chaos-0.1.3.tar.gz.

File metadata

Download URL: agent_chaos-0.1.3.tar.gz
Upload date: Jan 2, 2026
Size: 1.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.21 (subcommand: publish; distro: Ubuntu 24.04 noble; CI: true)

File hashes

Hashes for agent_chaos-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`c4a30754f8184123b71b1df63d9c335d6d3635427ffaad75a68f22fa342917ee`
MD5	`94b06a2376121b737b7467d468ac088e`
BLAKE2b-256	`41016ff8291815eedc365e7a71c4db9192b40b84ef3f155db485deb5e8e4e9b1`

File details

Details for the file agent_chaos-0.1.3-py3-none-any.whl.

File metadata

Download URL: agent_chaos-0.1.3-py3-none-any.whl
Upload date: Jan 2, 2026
Size: 136.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.21 (subcommand: publish; distro: Ubuntu 24.04 noble; CI: true)

File hashes

Hashes for agent_chaos-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3fe37e5e6173692ff52a179d3a2c007b631e623f2dc5a2b6199c15a2446c466a`
MD5	`55be321511b2ce43d3f60bfeed591a1c`
BLAKE2b-256	`746f62dd84c6719561f488e7d1a0892322c3e5852a2634c900a5599dfc2508f3`
