Chaos Engineering & Failure Diagnosis for AI Agents

AgentChaos

Chaos testing and failure diagnosis for AI agents.

AgentChaos is currently at version 0.2.0. It is a Python toolkit for repeatedly running agent tests, injecting realistic failures, collecting trace-like spans, detecting common failure modes, ingesting framework-shaped traces, and exporting reliability reports.

The v0.1 track is intentionally small: pytest integration, two injectors, two detectors, three metrics, JSON reports, and a CLI summary command. v0.2 is focused on framework-neutral adapter boundaries, runtime ingestion prototypes, trace-based semantic detectors, and release hygiene.

Status

Implemented today:

ChaosTracer for agent, tool, and chat spans
ChaosRunner for repeated callable execution
execute_chaos_test() as the framework-neutral execution service
Injectors: ToolTimeout, ArgSchemaMutation
Detectors: LoopDetector, ArgSchemaViolationDetector, ToolInvocationMismatchDetector
Metrics: step_success_rate_at_k, run_variance, recovery_rate
JSON report exporter
Pytest plugin: @chaos, --chaos, --chaos-report, chaos_tracer
CLI: agentchaos summarize <report.json>
No-API-key local demo
v0.2 development: minimal LangGraph-like adapter prototype
v0.2 development: LangGraph runtime stream/astream_events ingestion adapter with parent reconstruction
v0.2 development: OpenAI Agents-like event model skeleton, without OpenAI SDK imports
v0.2 development: CrewAI-like event model skeleton, without CrewAI SDK imports
v0.2 development: MCP-like event model skeleton, without MCP SDK imports

Not implemented yet:

HTML report
CrewAI, OpenAI Agents, or MCP production runtime adapters
Production-ready framework adapters beyond the current LangGraph runtime ingestion and skeleton prototypes
Benchmark integrations such as tau-bench
Production sampling or hosted dashboard

v0.2 Roadmap

Keep adapter prototypes framework-neutral by mapping runtime-like events into TraceSpan without importing framework SDKs.
Expand trace-based semantic detectors around tool-use reliability while keeping detectors dependent only on internal spans.
Harden release hygiene with public preflight checks, package verification commands, and stable JSON/CLI behavior.
Defer production runtime adapters until the adapter boundary and skeleton tests are stable.
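
The adapter boundary in the roadmap can be pictured roughly as follows. This is a minimal sketch and not AgentChaos's actual code: the Span dataclass, the event keys (id, parent_id, kind, name), and events_to_spans are hypothetical stand-ins for the real TraceSpan model, chosen only to show how plain-dict events could be mapped to spans without importing a framework SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Span:
    """Hypothetical stand-in for an internal trace span (not the real TraceSpan)."""

    span_id: str
    kind: str  # e.g. "agent", "tool", "chat"
    name: str
    parent_id: Optional[str] = None
    attributes: dict[str, Any] = field(default_factory=dict)


def events_to_spans(events: list[dict[str, Any]]) -> list[Span]:
    """Map plain-dict runtime events to spans; no framework SDK is imported."""
    known_ids = {event["id"] for event in events}
    reserved = {"id", "parent_id", "kind", "name"}
    spans = []
    for event in events:
        parent = event.get("parent_id")
        spans.append(
            Span(
                span_id=event["id"],
                kind=event.get("kind", "unknown"),
                name=event.get("name", ""),
                # Drop parents that never appeared in the stream so the tree stays consistent.
                parent_id=parent if parent in known_ids else None,
                # Everything that is not a reserved key is kept as a span attribute.
                attributes={k: v for k, v in event.items() if k not in reserved},
            )
        )
    return spans


events = [
    {"id": "1", "kind": "agent", "name": "flight-agent"},
    {"id": "2", "parent_id": "1", "kind": "tool", "name": "search_flights", "args": {"to": "NRT"}},
    {"id": "3", "parent_id": "99", "kind": "chat", "name": "llm"},  # parent was never seen
]
spans = events_to_spans(events)
```

Because detectors in this design would only ever see Span objects, they stay independent of whichever framework produced the events.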

Quickstart

Clone the repo and install it locally:

git clone https://github.com/jeffery0929/agentchaos.git
cd agentchaos
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Run the no-API-key demo:

pytest examples/basic --chaos --chaos-report chaos_reports/basic.json -q

Summarize the report:

agentchaos summarize chaos_reports/basic.json

Expected result:

AgentChaos report: pytest_suite
tests: 1
total runs: 3
successful runs: 3
failed runs: 0
matched detections: 3

First Test

from agent_chaos import chaos
from agent_chaos.injectors import ToolTimeout


@chaos(injectors=[ToolTimeout(p=0.2)], runs=10)
def test_agent_handles_tool_timeouts(chaos_tracer):
    with chaos_tracer.invoke_agent("flight-agent"):
        # my_agent is a placeholder for your own agent object.
        result = my_agent.run("Book a flight from SFO to NRT")

    assert result.status == "success"

Run it with:

pytest --chaos --chaos-report agentchaos-report.json
agentchaos summarize agentchaos-report.json
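
To make the idea behind a probabilistic timeout injector concrete, here is a minimal sketch. It does not show how ToolTimeout is implemented: with_timeout_injection, search_flights, and the seeded RNG are assumptions made for illustration, and the only behavior taken from the example above is "fail with probability p".

```python
import random
from typing import Any, Callable


def with_timeout_injection(
    tool: Callable[..., Any], p: float, rng: random.Random
) -> Callable[..., Any]:
    """Wrap a tool so that each call raises TimeoutError with probability p."""

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        if rng.random() < p:
            raise TimeoutError(f"injected timeout in {tool.__name__}")
        return tool(*args, **kwargs)

    return wrapped


def search_flights(origin: str, dest: str) -> list[str]:
    """Hypothetical tool used only for this sketch."""
    return [f"{origin}->{dest} 09:00"]


# A seeded RNG keeps the injected failures reproducible across runs.
flaky_search = with_timeout_injection(search_flights, p=0.2, rng=random.Random(0))

outcomes = []
for _ in range(10):
    try:
        flaky_search("SFO", "NRT")
        outcomes.append("ok")
    except TimeoutError:
        outcomes.append("timeout")
```

Running the wrapped tool several times, as the runs=10 argument suggests, is what turns a single pass/fail result into a rate that can be compared across changes.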

@chaos is lazy-loaded from the package root, so ordinary import agent_chaos does not pull in pytest. Pytest is only needed when using the pytest plugin.
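
One common way to get this kind of lazy loading in Python is a module-level __getattr__ (PEP 562). The sketch below shows that general technique, not necessarily what AgentChaos does internally. It builds a throwaway module named lazy_demo in memory and uses the standard-library fractions module as a stand-in for the heavier dependency, so it runs without pytest installed.

```python
import importlib
import sys
import types

# Build a throwaway module in memory so the example is self-contained.
# In a real package this __getattr__ would live in the package's __init__.py.
demo = types.ModuleType("lazy_demo")


def _lazy_getattr(name: str):
    # Called only when normal attribute lookup on the module fails (PEP 562).
    if name == "Fraction":
        value = importlib.import_module("fractions").Fraction
        setattr(demo, name, value)  # cache so later lookups skip this hook
        return value
    raise AttributeError(f"module 'lazy_demo' has no attribute {name!r}")


demo.__getattr__ = _lazy_getattr
sys.modules["lazy_demo"] = demo

import lazy_demo  # resolved from sys.modules; "Fraction" is not loaded yet

loaded_before = "Fraction" in vars(lazy_demo)
half = lazy_demo.Fraction(1, 2)  # first access triggers the deferred import
loaded_after = "Fraction" in vars(lazy_demo)
```

The same pattern lets a package expose a decorator at its root while keeping the dependency behind it optional until the name is first used.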

Report Contents

The JSON report includes:

total, successful, and failed run counts
pass rate
step_success_rate_at_k
run_variance
recovery_rate
per-run detector results
optional span payloads with --chaos-include-spans
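
The report fields above are aggregate statistics over repeated runs. As a rough illustration of that kind of aggregation, the sketch below computes three numbers from per-run records. The record keys (success, fault_injected) and the definitions used for run_variance and recovery_rate are assumptions made for this example; the actual report schema and metric definitions may differ.

```python
from statistics import pvariance
from typing import Optional

# Hypothetical per-run records; the real report schema may differ.
runs = [
    {"success": True, "fault_injected": False},
    {"success": True, "fault_injected": True},
    {"success": False, "fault_injected": True},
    {"success": True, "fault_injected": False},
]


def pass_rate(records: list[dict]) -> float:
    """Share of runs that succeeded."""
    return sum(r["success"] for r in records) / len(records)


def run_variance(records: list[dict]) -> float:
    """Assumed definition: population variance of the 0/1 success indicator."""
    return pvariance([1.0 if r["success"] else 0.0 for r in records])


def recovery_rate(records: list[dict]) -> Optional[float]:
    """Assumed definition: share of fault-injected runs that still succeeded."""
    faulted = [r for r in records if r["fault_injected"]]
    if not faulted:
        return None  # undefined when no fault was injected
    return sum(r["success"] for r in faulted) / len(faulted)
```

With the four sample records, three runs succeed, and one of the two fault-injected runs recovers.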

Why This Exists

Production agent failures are often not clean assertion failures. They show up as loops, bad tool arguments, fabricated observations, retry storms, premature stops, and task drift. AgentChaos focuses on a narrow v0.1 gap:

fault injection + trace-backed failure classification + CI-friendly reports
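
As an illustration of trace-backed classification, the sketch below flags one of the failure modes listed above, a loop, by looking for the same tool being called repeatedly with identical arguments. It is a simplified, hypothetical heuristic and not the logic of LoopDetector: the function name, the (name, args) input shape, and the threshold of three are all assumptions.

```python
import json
from typing import Any


def find_repeated_tool_calls(
    calls: list[tuple[str, dict[str, Any]]], threshold: int = 3
) -> list[str]:
    """Return tool names called `threshold` or more times in a row with identical arguments."""
    flagged: list[str] = []
    previous = None
    streak = 0
    for name, args in calls:
        # Serialize with sorted keys so argument order does not affect equality.
        key = (name, json.dumps(args, sort_keys=True))
        streak = streak + 1 if key == previous else 1
        previous = key
        if streak == threshold and name not in flagged:
            flagged.append(name)
    return flagged


calls = [
    ("search_flights", {"from": "SFO", "to": "NRT"}),
    ("search_flights", {"to": "NRT", "from": "SFO"}),
    ("search_flights", {"from": "SFO", "to": "NRT"}),
    ("book_flight", {"flight_id": "NH7"}),
]
looping = find_repeated_tool_calls(calls)
```

A test that only asserts on the final result would pass here, since the booking still happens; the repeated calls are visible only in the trace.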

Current v0.1 Scope

In scope:

pytest-first local workflow
deterministic local demo
JSON report as the stable output
CLI summary for report inspection
small, testable core modules

Out of scope for v0.1:

hosted UI
SaaS dashboard
HTML report unless the core stabilizes first
framework-specific adapters
public leaderboard

Development Checks

pytest -q
ruff check agent_chaos tests examples
ruff format --check agent_chaos tests examples
mypy agent_chaos tests examples
pytest examples/basic --chaos --chaos-report chaos_reports/basic.json -q
agentchaos summarize chaos_reports/basic.json

Optional Paid OpenAI Dogfood

After configuring a small API budget and adding OPENAI_API_KEY to ignored local env files, run the manual paid smoke test:

python examples/openai_paid_dogfood/run_demo.py
agentchaos summarize chaos_reports/openai-paid-dogfood.json

The default model is gpt-5.4-nano to keep the first paid run cheap.

License

Apache 2.0. See LICENSE.

Project details

This version

0.2.0

Jun 3, 2026

Download files

Source Distribution

agentchaos_core-0.2.0.tar.gz (145.0 kB)

Uploaded Jun 3, 2026 Source

Built Distribution

agentchaos_core-0.2.0-py3-none-any.whl (67.6 kB)

Uploaded Jun 3, 2026 Python 3

File details

Details for the file agentchaos_core-0.2.0.tar.gz.

File metadata

Download URL: agentchaos_core-0.2.0.tar.gz
Upload date: Jun 3, 2026
Size: 145.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for agentchaos_core-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`cfb8b718038f8c592b4be3b12f88c5018472d76603d3595233cb7b77460ac9d3`
MD5	`ce6920727ed98858c31821307e64c47e`
BLAKE2b-256	`a01d18431622f4c439c601cd7672200fed32e0f6737967b0b02611ef8384324b`

File details

Details for the file agentchaos_core-0.2.0-py3-none-any.whl.

File metadata

Download URL: agentchaos_core-0.2.0-py3-none-any.whl
Upload date: Jun 3, 2026
Size: 67.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for agentchaos_core-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0e0a0e8a5c19646db997df093f19e4d5963c7bf1da9ec5fcc886cebca0c5cc32`
MD5	`86917f2c4f9abe78cddf94359eb11229`
BLAKE2b-256	`2ee3cec1d59b0d512d594a04e2a1039b757b8a8308dfb9188b2c00004de7cb41`
