Evaluate agent system robustness through controlled, runtime, non-intrusive LLM API fault injection.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

goutan

These details have not been verified by PyPI

Project description

AgentChaos

Evaluate agent system robustness through controlled, runtime, non-intrusive LLM API fault injection.

Overview

LLM-based agent systems issue multiple API calls per task, and each call can fail (HTTP 5xx, truncation, empty response, encoding corruption, schema violation). Once a faulty response occurs, it propagates through downstream agents and causes task failure. AgentChaos injects controlled faults at the HTTP transport layer — without modifying any agent source code — to evaluate robustness before these failures happen in production.

Quick Start

pip install agentchaos-sdk

import agentchaos

# Inject fault (your agent code needs ZERO changes)
agentchaos.inject("llm_error_single")
result = await my_agent(query)         # agent runs normally, unaware
agentchaos.disable()                   # stop
agentchaos.save_trace("trace.json")   # save full LLM call trace

# examples
git clone https://github.com/floritange/AgentChaos.git
cd AgentChaos
uv sync
uv run python examples/list_faults.py       # list all 65 faults
uv run python examples/agent_openai.py      # OpenAI agent: normal vs faulted
uv run python examples/agent_langchain.py   # LangChain agent
uv run python examples/agent_adk.py         # Google ADK agent
uv run python examples/eval_batch.py        # batch evaluation

How It Works

An HTTP-layer injection mechanism patches the HTTP client at runtime to intercept and modify LLM API responses according to the fault configuration, requiring no changes to any agent system.

Properties:

Works with any framework using OpenAI-compatible APIs (OpenAI, LangChain, ADK, AutoGen, CrewAI, LiteLLM)
Zero code changes — just inject() / disable() around your existing code
Records full execution trace (raw input/output, token usage, timing) for every LLM call
65 pre-built fault configurations covering all real-world failure modes

API

Function	Description
`agentchaos.inject(fault)`	Start fault injection + trace (`None` = trace only)
`agentchaos.disable()`	Stop injection and trace
`agentchaos.save_trace(path)`	Save trace to JSON
`agentchaos.eval(agent_fn, query, faults)`	Batch robustness evaluation
`agentchaos.diagnose(text)`	Detect fault type from output
`agentchaos.list_faults()`	List all 65 experiments

import agentchaos

# Trace only (no fault)
agentchaos.inject(None)
result = await my_agent(query)
agentchaos.disable()
agentchaos.save_trace("trace_normal.json")

# Inject fault + trace
agentchaos.inject("llm_error_single")
result = await my_agent(query)
agentchaos.disable()
agentchaos.save_trace("trace_faulted.json")

# Batch evaluation
report = await agentchaos.eval(my_agent, query, faults="all")
print(report.summary())

Trace Format

{
  "call_index": 0,
  "raw_input": {"model": "gpt-5.5", "messages": [...], "tools": [...]},
  "raw_output": {
    "content": "The answer is 42.",
    "tool_calls": [],
    "finish_reason": "stop",
    "usage": {"prompt_tokens": 306, "completion_tokens": 54, "total_tokens": 360},
    "http_status": 200
  },
  "injected_output": {
    "content": "[API ERROR] HTTP 500: Internal Server Error.",
    "tool_calls": []
  },
  "timing": {"llm_latency_ms": 1523.4, "total_ms": 1524.1},
  "fault_applied": true
}

raw_output = LLM original response. injected_output = what the agent actually receives (only present when fault_applied: true).

Fault Taxonomy

We define a fault taxonomy by adapting the classical fault classification from distributed systems (Avizienis et al., 2004) to LLM API responses. The taxonomy covers crash, omission, and value faults on both content and tool call fields.

Category	Fault Type	Content	Tool Call	Real-world Scenario
Crash	Error	yes	yes	Server overload, HTTP 5xx, rate limiting
Crash	Timeout	yes	yes	Network congestion, backend delay, API latency
Omission	Empty	yes	yes	Safety filter, content policy rejection
Omission	Truncate	yes	yes	Token limit, TCP interruption, incomplete completion
Value	Corrupt	yes	yes	Encoding error, garbled characters
Value	Schema	yes	yes	Parsing error, schema mismatch

From Crash to Value, faults become progressively harder to detect. Crash faults produce obvious error signals and are typically retried. Value faults look like valid output and propagate silently — making them the most dangerous in practice.

65 = (6 fault types x 2 targets x 4 strategies) + 8 compound + 9 positional

Detailed documentation: docs/faults.md

Evaluation Results

Experimental Setup

Agent System	Architecture	Benchmarks
AutoGen	Iterative (coder + executor)	HumanEval, HumanEval+, MBPP, MBPP+, MMLU-Pro, MATH-500
MAD	Debate (proposer + critic)
MapCoder	Pipeline (planner + coder + debugger)
EvoMAC	Iterative (multi-agent collaboration)
Mini-SE	Iterative (SWE agent)	SWE-bench Pro

Backbone LLMs: Claude-Sonnet-4.5, GPT-5.2, DeepSeek-V3.2, Seed-1.8

Metric: Δpass@1 = pass@1 (w/o fault) − pass@1 (w/ fault). Higher = more vulnerable.

RQ1: Overall Robustness Degradation (Claude-Sonnet-4.5)

System	HumanEval	HumanEval+	MBPP	MBPP+	MMLU-Pro	MATH-500
AutoGen	19.44	21.13	17.31	11.61	7.05	8.38
MAD	24.20	24.84	24.49	15.08	20.64	20.70
MapCoder	48.61	49.30	41.07	40.85	38.25	34.27
EvoMAC	18.48	18.18	16.67	14.73	13.63	15.85
Mini-SE	—	—	—	—	—	—

Mini-SE is evaluated only on SWE-bench Pro (Δpass@1 = 0.87%).

RQ2: Impact of Fault Configurations

Content faults cause higher Δpass@1 than tool call faults; only corrupt stays below 7%
Persistent injection causes the highest Δpass@1 — up to 62.39% (MapCoder)
Pipeline systems are most position-sensitive — single early fault drops pass@1 by up to 83.87%
Compound content faults amplify degradation — up to 86.36% (MapCoder)

RQ3: Fault Diagnosis

Existing methods achieve below 53% accuracy on fault type and below 56% on fault step. Truncation — the most harmful fault — is identified with only 4.3% accuracy.

Key Findings

#	Finding
1	All systems degrade under fault injection (Δpass@1 up to 50 pp)
2	Most severe faults are NOT most harmful — truncation/empty propagate silently
3	Most harmful faults are hardest to diagnose (truncation: 4.3% accuracy)
4	Architecture determines robustness — ranking consistent across all LLMs
5	Persistent injection overrides architectural advantages (up to 62.39%)
6	Compound content faults amplify degradation (up to 86.36%)

Documentation

Fault Reference — Complete reference for all 65 fault configurations
Examples — Runnable demos for OpenAI, LangChain, ADK

Citation

If you use AgentChaos in your research, please cite:

@article{agentchaos2026,
  title={AgentChaos: Chaos Engineering for Robust Agent Evaluation via LLM API Fault Injection},
  year={2026}
}

License

MIT -- see LICENSE.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

goutan

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.0

May 7, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentchaos_sdk-0.1.0.tar.gz (930.3 kB view details)

Uploaded May 7, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

agentchaos_sdk-0.1.0-py3-none-any.whl (19.6 kB view details)

Uploaded May 7, 2026 Python 3

File details

Details for the file agentchaos_sdk-0.1.0.tar.gz.

File metadata

Download URL: agentchaos_sdk-0.1.0.tar.gz
Upload date: May 7, 2026
Size: 930.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentchaos_sdk-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`28c10f7380ba41d21ee47762eaee779718e6b19d308825dcf01157cee0f81e90`
MD5	`2d5b7f097e28d57ddb0b8993ea7f57a4`
BLAKE2b-256	`725a733e2551b5d7b414697f1411b7f54152257f334965ac0aca785125f12c53`

See more details on using hashes here.

Provenance

The following attestation bundles were made for agentchaos_sdk-0.1.0.tar.gz:

Publisher: publish.yml on floritange/AgentChaos

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agentchaos_sdk-0.1.0.tar.gz
- Subject digest: 28c10f7380ba41d21ee47762eaee779718e6b19d308825dcf01157cee0f81e90
- Sigstore transparency entry: 1463327329
- Sigstore integration time: May 7, 2026
Source repository:
- Permalink: floritange/AgentChaos@8da82119a73ed9e46fad50971d37b2329029916b
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/floritange
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8da82119a73ed9e46fad50971d37b2329029916b
- Trigger Event: release

File details

Details for the file agentchaos_sdk-0.1.0-py3-none-any.whl.

File metadata

Download URL: agentchaos_sdk-0.1.0-py3-none-any.whl
Upload date: May 7, 2026
Size: 19.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentchaos_sdk-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6db7cb5b5bf93da8c8fdaef35b7db6477ad81a4f634de2f0c83847997d4a06c5`
MD5	`3acdbe5de2dbe87a6c79d3cd80986268`
BLAKE2b-256	`b038100b09c405c425053691bd664d4a715d46fef6722ad0233393eed8a4fd35`

See more details on using hashes here.

Provenance

The following attestation bundles were made for agentchaos_sdk-0.1.0-py3-none-any.whl:

Publisher: publish.yml on floritange/AgentChaos

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agentchaos_sdk-0.1.0-py3-none-any.whl
- Subject digest: 6db7cb5b5bf93da8c8fdaef35b7db6477ad81a4f634de2f0c83847997d4a06c5
- Sigstore transparency entry: 1463327403
- Sigstore integration time: May 7, 2026
Source repository:
- Permalink: floritange/AgentChaos@8da82119a73ed9e46fad50971d37b2329029916b
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/floritange
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8da82119a73ed9e46fad50971d37b2329029916b
- Trigger Event: release

agentchaos-sdk 0.1.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

AgentChaos

Overview

Quick Start

How It Works

API

Trace Format

Fault Taxonomy

Evaluation Results

Experimental Setup

RQ1: Overall Robustness Degradation (Claude-Sonnet-4.5)

RQ2: Impact of Fault Configurations

RQ3: Fault Diagnosis

Key Findings

Documentation

Citation

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance