Chaos Engineering for AI Agents

agent-chaos

Chaos engineering for AI agents.

The Joker: "Introduce a little anarchy. Upset the established order, and everything becomes chaos. I'm an agent of chaos. Oh, and you know the thing about chaos? It's fair!"

Your agent works in demos. It passes evals. Then it hits production: the LLM rate-limits, the tool API returns garbage, the stream cuts mid-response. The agent fails silently, confidently returns wrong answers, or loops forever.

agent-chaos breaks your agent on purpose—before production does.

Why This Exists

AI agents have failure boundaries that didn't exist before:

Boundary	What Can Break
LLM provider	Rate limits, timeouts, server errors, stream interruptions
Tool execution	API failures, malformed responses, lies
Context/memory	Corrupted retrieval, poisoned history, token overflow

Traditional chaos engineering tools (Chaos Monkey, Gremlin, Litmus) operate at the infrastructure layer—network partitions, pod failures, CPU stress. They don't understand agent-specific failure modes. They can't corrupt a tool result or cut an LLM stream after 10 chunks.
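
To make "cut an LLM stream after 10 chunks" concrete, here is a minimal sketch in plain Python. It does not use agent-chaos and is not how the library is implemented; `StreamCut`, `cut_stream`, and `collect` are hypothetical names chosen for illustration. It shows the kind of fault an infrastructure-level tool cannot express: the connection is healthy, but the response ends partway through.

```python
from typing import Iterable, Iterator, List, Tuple


class StreamCut(ConnectionError):
    """Raised when the simulated stream is interrupted (hypothetical name)."""


def cut_stream(chunks: Iterable[str], after_chunks: int) -> Iterator[str]:
    """Pass chunks through, but raise if the source has more than `after_chunks`."""
    for index, chunk in enumerate(chunks):
        if index >= after_chunks:
            raise StreamCut(f"stream cut after {after_chunks} chunks")
        yield chunk


def collect(stream: Iterator[str]) -> Tuple[str, bool]:
    """Consume a stream; return the text received and whether it completed."""
    received: List[str] = []
    try:
        for chunk in stream:
            received.append(chunk)
    except StreamCut:
        # The caller now knows the text is partial and can retry or report it.
        return "".join(received), False
    return "".join(received), True
```

An agent that ignores the second return value would present a truncated answer as complete, which is the silent failure described above.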

Evaluation tools (Galileo, DeepEval, LangSmith) tell you if your agent worked correctly. They judge past runs. They can't answer: "What happens when the weather API lies?"

agent-chaos injects failures. Eval tools judge outcomes. Use both.

                     ┌─────────────────┐
                     │  agent-chaos    │
                     │  (inject chaos) │
                     └────────┬────────┘
                              │
                              ▼
┌──────────────┐       ┌─────────────┐       ┌──────────────┐
│   CI / Test  │──────▶│  Your Agent │──────▶│  Eval Tools  │
│   Pipeline   │       │             │       │  (judge it)  │
└──────────────┘       └─────────────┘       └──────────────┘

What It Does

Inject chaos at every agent boundary:

from agent_chaos import (
    chaos_context,
    llm_rate_limit,
    llm_stream_cut,
    tool_error,
    tool_mutate,
)

def corrupt_weather(tool_name: str, result: str) -> str:
    # Return a well-formed but false reading
    return result.replace("22°C", "-50°C")

with chaos_context(
    name="resilience-test",
    chaos=[
        llm_rate_limit().after_calls(2),
        llm_stream_cut(after_chunks=10).with_probability(0.3),
        tool_error("Service unavailable").for_tool("get_weather").on_call(1),
        tool_mutate(corrupt_weather),
    ],
) as ctx:
    # my_agent stands in for your own agent object.
    response = my_agent.run("What's the weather in Tokyo?")

    # Confirm that at least one fault actually fired during the run.
    assert ctx.metrics.chaos_injected > 0
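
For readers who want a mental model of tool-level injection, the following is a rough sketch of the idea in plain Python. It is an assumption about the general technique, not agent-chaos's internals or API: `with_chaos`, `ToolFault`, and `get_weather` are hypothetical. The wrapper fails one chosen call and mutates the result of every other call.

```python
from typing import Callable, Optional


class ToolFault(RuntimeError):
    """Simulated tool failure (hypothetical name)."""


def with_chaos(
    tool: Callable[[str], str],
    fail_on_call: Optional[int] = None,
    mutate: Optional[Callable[[str], str]] = None,
) -> Callable[[str], str]:
    """Wrap `tool` so one call raises and other results are optionally mutated."""
    calls = 0

    def wrapped(query: str) -> str:
        nonlocal calls
        calls += 1
        if fail_on_call is not None and calls == fail_on_call:
            raise ToolFault("Service unavailable")
        result = tool(query)
        return mutate(result) if mutate is not None else result

    return wrapped


def get_weather(city: str) -> str:
    """Toy tool standing in for a real weather API."""
    return f"{city}: 22°C"
```

Wrapping `get_weather` with `fail_on_call=1` and a mutation that swaps "22°C" for "-50°C" gives an error on the first call and a false reading on the second, the two tool faults used in the example above.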

Then gate your CI:

agent-chaos run scenarios/ --artifacts-dir ./runs
# Exit code 0 = all scenarios passed
# Exit code 1 = failures detected
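
If your pipeline is driven from Python rather than a shell step, the exit-code contract above can be checked with the standard library. This is a sketch under assumptions: `run_gate` and `main` are hypothetical helpers, and the command list simply repeats the invocation shown above.

```python
import subprocess
import sys
from typing import Sequence

# Same invocation as in the text; adjust the paths to your repository.
CHAOS_COMMAND = ["agent-chaos", "run", "scenarios/", "--artifacts-dir", "./runs"]


def run_gate(command: Sequence[str]) -> bool:
    """Run a command and report whether it exited with code 0."""
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0


def main() -> int:
    """Return the exit code the CI job should use."""
    if run_gate(CHAOS_COMMAND):
        return 0
    print("chaos scenarios failed", file=sys.stderr)
    return 1
```

Calling `sys.exit(main())` from a CI entry point propagates the result, so a failed scenario fails the build.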

The Two Questions

Question	Tool
"Did my agent give the right answer?"	Eval tools (Galileo, DeepEval, LangSmith)
"Did my agent survive when dependencies failed?"	agent-chaos

Evals test correctness. Chaos tests resilience. Production needs both.

What About Edge-Case Inputs?

"What if the user asks something unexpected?"

This is a fair question, but it's not chaos engineering—it's evaluation. A weird user query isn't a fault. It's just input. The user isn't failing; they're being a user.

For testing agent behavior on edge-case inputs: use eval tools with golden datasets. That's what they're built for.

agent-chaos is for failures at external dependencies—the LLM provider, tool APIs, memory systems. Things that break independently of what the user asked.

The exception: multi-agent systems. When Agent A's output becomes Agent B's input, corrupted handoffs are faults at a boundary. That's chaos territory.
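
A corrupted handoff can be sketched without any framework. Everything below is hypothetical (`Handoff`, `corrupt_handoff`, `agent_b`) and only illustrates the boundary: Agent A's output is tampered with in transit, and Agent B either validates it or acts on bad data.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Handoff:
    """Message passed from an upstream agent to a downstream one."""
    task: str
    city: str
    temperature_c: float


def corrupt_handoff(handoff: Handoff) -> Handoff:
    """Simulate a faulty handoff by injecting a physically implausible reading."""
    return replace(handoff, temperature_c=-500.0)


def agent_b(handoff: Handoff) -> str:
    """Downstream agent that validates its input before acting on it."""
    if not -90.0 <= handoff.temperature_c <= 60.0:
        return "rejected: temperature out of range"
    return f"{handoff.city} is {handoff.temperature_c:.0f}°C"
```

A resilience test here asserts that `agent_b(corrupt_handoff(h))` is rejected rather than passed along as a confident answer.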

Who This Is For

Teams shipping agents to production. Not demos. Not prototypes.

If you've been burned by:

Silent failures from flaky tool APIs
Agents that loop forever on rate limits
Confident wrong answers from corrupted context
"Works on my machine" syndrome in agent behavior

This is for you.
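
As one illustration of the second item, a common guard against looping forever on rate limits is a bounded retry with exponential backoff. This is a generic sketch, not part of agent-chaos; `RateLimited` and `call_with_retries` are hypothetical names, and real provider SDKs raise their own exception types.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for a provider's rate-limit error (hypothetical name)."""


def call_with_retries(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited with exponential backoff, then give up."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts:
                raise  # Surface the failure instead of retrying forever.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so a test can record the delays without waiting; a chaos scenario that rate-limits every call should see exactly `max_attempts` calls and then a raised error.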

Status

Under active development. Anthropic provider supported. OpenAI and Gemini planned.

Project details

Release history

0.1.3: Jan 2, 2026
0.1.2: Dec 31, 2025
0.1.1: Dec 30, 2025 (this version)
0.1.0: Dec 30, 2025

Download files

Source Distribution

agent_chaos-0.1.1.tar.gz (344.6 kB view details)

Uploaded Dec 30, 2025 Source

Built Distribution

agent_chaos-0.1.1-py3-none-any.whl (99.5 kB view details)

Uploaded Dec 30, 2025 Python 3

File details

Details for the file agent_chaos-0.1.1.tar.gz.

File metadata

Download URL: agent_chaos-0.1.1.tar.gz
Upload date: Dec 30, 2025
Size: 344.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.20 {"installer":{"name":"uv","version":"0.9.20","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for agent_chaos-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`c068b87500d9064e4aefe34e0d821af5038802f9fadbd07599e593d90aa783ba`
MD5	`67165cd60756fd431106f4d6fc35fcac`
BLAKE2b-256	`4473d48f21a3066e50674c16f86388cdcc6f6e0c07068a70e1bc177b5796b04b`

File details

Details for the file agent_chaos-0.1.1-py3-none-any.whl.

File metadata

Download URL: agent_chaos-0.1.1-py3-none-any.whl
Upload date: Dec 30, 2025
Size: 99.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.20 {"installer":{"name":"uv","version":"0.9.20","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for agent_chaos-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a683268caec165e369556c338d29e5baf6b48916ff33e03b80055dfe0614f87e`
MD5	`50526ceebbf96f185b0c4afd4aaf1792`
BLAKE2b-256	`c1a24c0f0ed1ff1f72f70803855574964ee581a89edac2a883299f711bdb7e54`
