Chaos engineering for AI agents — inject realistic production failures (tool timeouts, malformed responses, cost spirals, prompt injection) and find out what breaks before your users do.
agentfuzz
Chaos engineering for AI agents.
Your agent works in the demo. In production it breaks because a tool times out,
an API returns garbage JSON, a user injects a prompt, or it spirals into an
infinite tool-call loop burning $200 in tokens. agentfuzz finds those failures
before your users do.
Why this exists
Netflix built Chaos Monkey because cloud apps that passed unit tests still went down in production — the failures were in the seams between systems, not the systems themselves. AI agents have the same problem, with a worse blast radius:
- A flaky tool returns malformed JSON → your agent hallucinates plausible-looking arguments and writes them to your database.
- A user pastes a "translate this" prompt that's actually `IGNORE PREVIOUS INSTRUCTIONS` → your support agent emails the customer your system prompt.
- A model upgrade changes how the agent retries a 429 → the agent enters an infinite loop and burns through your monthly token budget in 40 minutes.
These failures don't show up in unit tests because unit tests assume the seams
work. agentfuzz deliberately breaks the seams.
What it does
Wrap your agent. Pick a fault profile. Run. Get a report.
```python
from agentfuzz import Harness, faults

harness = Harness(my_agent)  # my_agent: your own agent object

harness.add(faults.ToolTimeout(rate=0.10))
harness.add(faults.MalformedToolResponse(rate=0.05))
harness.add(faults.PromptInjection.suite("owasp-llm01"))
harness.add(faults.CostSpiral(max_tokens=50_000))
harness.add(faults.LatencyJitter(p99_ms=8000))
harness.add(faults.PartialToolFailure())

report = harness.run(scenarios="tau-bench-airline", iterations=200)
report.html("./report.html")
```
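The `rate=` arguments above are easiest to read as a per-call failure probability. Assuming that reading (it is not taken from agentfuzz's source), the sketch below shows the core idea for a plain Python tool: one seeded random draw per call decides whether the call times out, returns a truncated payload, or passes through unchanged. `inject_faults`, `ToolTimeoutError`, and `get_flight` are hypothetical names used only for illustration.

```python
import json
import random
from typing import Any, Callable


class ToolTimeoutError(Exception):
    """Hypothetical error raised when a simulated tool call 'hangs'."""


def inject_faults(
    tool: Callable[..., str],
    timeout_rate: float,
    malformed_rate: float,
    seed: int = 0,
) -> Callable[..., str]:
    """Wrap a JSON-returning tool so a fraction of calls fail (illustrative only)."""
    rng = random.Random(seed)  # seeded so a failing run can be replayed exactly

    def wrapped(*args: Any, **kwargs: Any) -> str:
        roll = rng.random()  # one uniform draw in [0, 1) per call
        if roll < timeout_rate:
            # Simulate a hang by failing fast instead of actually sleeping.
            raise ToolTimeoutError(f"{tool.__name__} timed out (injected)")
        payload = tool(*args, **kwargs)
        if roll < timeout_rate + malformed_rate:
            return payload[: len(payload) // 2]  # truncated payload, no longer valid JSON
        return payload

    return wrapped


def get_flight(flight_id: str) -> str:
    """Hypothetical stand-in for a real tool."""
    return json.dumps({"id": flight_id, "status": "on time"})


flaky = inject_faults(get_flight, timeout_rate=0.10, malformed_rate=0.05, seed=42)
outcomes = {"ok": 0, "timeout": 0, "malformed": 0}
for _ in range(1_000):
    try:
        json.loads(flaky("UA123"))
        outcomes["ok"] += 1
    except ToolTimeoutError:
        outcomes["timeout"] += 1
    except json.JSONDecodeError:
        outcomes["malformed"] += 1
print(outcomes)
```

With these rates, roughly 10% of calls should time out and 5% should return broken JSON. Seeding the generator is what makes a failing transcript replayable: the same seed produces the same sequence of injected faults.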
You get:
- Pass-rate per fault category — "your agent survives malformed JSON 78% of the time but only 12% of timeout cases."
- Cost-blast radius — "fault X caused token usage to spike 14×."
- Tool-call failure modes — hallucinated arguments, retry storms, infinite loops.
- Prompt-injection survival — OWASP LLM01 suite results.
- Replay traces — the exact transcript that broke your agent, so you can fix it.
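Both headline numbers in the report can be computed as simple aggregates over iterations. The sketch below assumes each iteration is recorded as a fault name, a pass/fail flag, and a token count; `RunResult` and the fault-free `baseline_tokens` figure are hypothetical, and agentfuzz's own report may define these metrics differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunResult:
    """One fuzzing iteration (hypothetical record shape)."""
    fault: str    # fault injected in this iteration, e.g. "ToolTimeout"
    passed: bool  # did the agent still complete the scenario correctly?
    tokens: int   # total tokens spent in this iteration


def pass_rate_by_fault(results: list[RunResult]) -> dict[str, float]:
    """Fraction of iterations that passed, per fault category."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.fault] += 1
        passes[r.fault] += int(r.passed)
    return {fault: passes[fault] / totals[fault] for fault in totals}


def cost_blast_radius(results: list[RunResult], baseline_tokens: float) -> dict[str, float]:
    """Mean tokens per iteration under each fault, as a multiple of a fault-free baseline."""
    spent: dict[str, list[int]] = defaultdict(list)
    for r in results:
        spent[r.fault].append(r.tokens)
    return {fault: sum(t) / len(t) / baseline_tokens for fault, t in spent.items()}


results = [
    RunResult("MalformedToolResponse", True, 1_200),
    RunResult("MalformedToolResponse", True, 1_000),
    RunResult("MalformedToolResponse", False, 1_400),
    RunResult("MalformedToolResponse", True, 1_200),
    RunResult("ToolTimeout", False, 14_000),
    RunResult("ToolTimeout", True, 2_000),
]
print(pass_rate_by_fault(results))  # {'MalformedToolResponse': 0.75, 'ToolTimeout': 0.5}
print(cost_blast_radius(results, baseline_tokens=1_000))  # {'MalformedToolResponse': 1.2, 'ToolTimeout': 8.0}
```

Reporting the cost figure as a multiple of a fault-free baseline, rather than as raw tokens, keeps it comparable across agents that use very different amounts of context.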
Install
```bash
pip install agentfuzz                 # core
pip install "agentfuzz[langgraph]"    # + LangGraph adapter
pip install "agentfuzz[crewai]"       # + CrewAI adapter
pip install "agentfuzz[autogen]"      # + AutoGen adapter
pip install "agentfuzz[all]"          # everything
```
60-second example
```python
from agentfuzz import Harness, faults
from my_app import build_agent  # your own code: returns the agent under test

harness = Harness(build_agent())
harness.add(faults.MalformedToolResponse(rate=0.2))
harness.add(faults.ToolTimeout(rate=0.1))

result = harness.run(iterations=50)
print(result.summary())
# >>> agentfuzz: 32/50 passed (64%)
# >>> MalformedToolResponse: 8 failures
# >>> - 5× hallucinated arguments
# >>> - 3× silent corruption
# >>> ToolTimeout: 10 failures
# >>> - 7× retry storm (avg 14 retries)
# >>> - 3× infinite loop killed at max_tokens
```
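Labels like "retry storm" and "killed at max_tokens" can be derived from a transcript after the fact. A minimal sketch, assuming a trace is a list of tool calls with serialized arguments and per-step token counts; `ToolCall`, `classify`, and the `storm_threshold` of 5 are illustrative assumptions, not agentfuzz's actual classifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One step of an agent transcript (hypothetical shape)."""
    tool: str
    args: str    # serialized arguments, so identical calls compare equal
    tokens: int  # tokens spent on this step


def longest_repeat(trace: list[ToolCall]) -> int:
    """Length of the longest run of back-to-back identical tool calls."""
    best = run = 0
    prev = None
    for call in trace:
        key = (call.tool, call.args)
        run = run + 1 if key == prev else 1
        best = max(best, run)
        prev = key
    return best


def classify(trace: list[ToolCall], max_tokens: int, storm_threshold: int = 5) -> str:
    """Label a transcript: budget blow-up first, then retry storms, else ok."""
    spent = 0
    for step, call in enumerate(trace, start=1):
        spent += call.tokens
        if spent > max_tokens:
            return f"killed at max_tokens (step {step})"
    repeats = longest_repeat(trace)
    if repeats >= storm_threshold:
        return f"retry storm ({repeats - 1} retries)"  # the first call is not a retry
    return "ok"


storm = [ToolCall("search", '{"q": "flights"}', 300)] * 8 + [ToolCall("book", '{"id": 7}', 400)]
runaway = [ToolCall("search", f'{{"page": {i}}}', 2_000) for i in range(40)]
print(classify(storm, max_tokens=50_000))    # retry storm (7 retries)
print(classify(runaway, max_tokens=50_000))  # killed at max_tokens (step 26)
```

Checking the token budget before looking for repeats matters: a loop whose arguments change on every step (like `runaway`) never shows up as a repeat, so the budget is the only thing that catches it.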
Fault library
| Fault | What it simulates |
|---|---|
| `ToolTimeout` | A downstream API hangs past the agent's patience |
| `MalformedToolResponse` | Garbage JSON, truncated payloads, wrong schema |
| `PartialToolFailure` | Tool returns 200, then errors mid-stream |
| `LatencyJitter` | Realistic p50 / p99 latency distribution |
| `CostSpiral` | Detects runaway token usage above a threshold |
| `PromptInjection` | OWASP LLM01 catalog of injection payloads |
| `PromptParaphrase` | Real users mangle messages: typos, filler, contractions |
| `RateLimitBurst` | Cascading 429s from upstream APIs |
| `SchemaDrift` | Tool API changed shape between dev and prod |
| `AuthExpiry` | 401 / 403 responses; tests credential-refresh paths |
| `NetworkPartition` | Connection refused / TLS error, distinct from a timeout |
More planned — see the roadmap.
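One common way to get a "realistic p50 / p99" latency distribution is a lognormal fitted to those two quantiles. The median of a lognormal is exp(mu) and its 99th percentile is exp(mu + z·sigma) with z ≈ 2.326, so both parameters follow directly from the two numbers. Whether `LatencyJitter` actually uses a lognormal is an assumption here, as is the 400 ms median in the example (only `p99_ms=8000` appears in the earlier `LatencyJitter` call).

```python
import math
import random
from statistics import NormalDist


def lognormal_from_quantiles(p50_ms: float, p99_ms: float) -> tuple[float, float]:
    """(mu, sigma) of a lognormal with median p50_ms and 99th percentile p99_ms."""
    if not 0 < p50_ms < p99_ms:
        raise ValueError("need 0 < p50_ms < p99_ms")
    z99 = NormalDist().inv_cdf(0.99)  # about 2.326 standard deviations
    mu = math.log(p50_ms)             # the median of a lognormal is exp(mu)
    sigma = (math.log(p99_ms) - mu) / z99
    return mu, sigma


def sample_latencies(p50_ms: float, p99_ms: float, n: int, seed: int = 0) -> list[float]:
    """Draw n simulated tool latencies in milliseconds."""
    mu, sigma = lognormal_from_quantiles(p50_ms, p99_ms)
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


# Hypothetical median of 400 ms; p99 of 8 s as in the earlier example.
delays = sorted(sample_latencies(p50_ms=400, p99_ms=8_000, n=10_000, seed=1))
print(f"sample p50 = {delays[5_000]:.0f} ms, sample p99 = {delays[9_900]:.0f} ms")
```

A lognormal is a convenient choice because it is strictly positive and right-skewed, so most calls are fast while a small tail is very slow, which is the pattern that exposes agents with tight or missing timeouts.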
Supported agent frameworks
- ✅ LangChain `create_agent` (1.x): `agentfuzz[langgraph]`. The modern entry point. Wrap your tools with `wrap_tools()`, then point `LangGraphAdapter` at the compiled graph.
- ✅ LangGraph `create_react_agent` (0.x): same adapter; both APIs return a `CompiledStateGraph`, which we handle uniformly. See `examples/langgraph_react_agent.py`.
- ✅ CrewAI: `agentfuzz[crewai]`. `wrap_tools()` returns proxy `crewai.tools.BaseTool` instances; `CrewAIAdapter(crew)` drives the harness through `crew.kickoff()`. See `examples/crewai_agent.py`.
- ✅ AutoGen v0.4+: `agentfuzz[autogen]`. `wrap_tools()` returns proxy `autogen_core.tools.FunctionTool` instances; `AutoGenAdapter(agent)` drives any agent or team exposing an async `run(task=...)`. See `examples/autogen_agent.py`.
- ✅ Plain Python callables: any `Callable[[State], State]`. The simplest way to try the tool.
- 🚧 PydanticAI, OpenAI Swarm, LlamaIndex: coming.

The adapter interface is small (`is_available()` + `wrap()`); PRs welcome.
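Given only the two method names above, an adapter can be sketched as a small protocol. The signatures below, the dict-based `State`, and both adapter classes are assumptions for illustration; check the real base class in the agentfuzz source before writing a PR.

```python
import importlib.util
from typing import Any, Callable, Protocol, runtime_checkable

State = dict[str, Any]  # assumption: agent state is a plain dict


@runtime_checkable
class Adapter(Protocol):
    """Hypothetical shape of an adapter: two methods, nothing else."""

    def is_available(self) -> bool: ...

    def wrap(self, agent: Any) -> Callable[[State], State]: ...


class CallableAdapter:
    """Adapter for plain Callable[[State], State] agents."""

    def is_available(self) -> bool:
        return True  # no optional dependency to import

    def wrap(self, agent: Any) -> Callable[[State], State]:
        if not callable(agent):
            raise TypeError("agent must be callable")

        def run(state: State) -> State:
            # Pass a shallow copy so the caller's input stays intact for replay.
            return agent(dict(state))

        return run


class OptionalDependencyAdapter(CallableAdapter):
    """How a framework adapter might gate on its extra being installed.

    It reuses the callable wrap() only to keep the sketch short; a real
    framework adapter would translate the framework's own run API instead.
    """

    def __init__(self, module_name: str) -> None:
        self.module_name = module_name

    def is_available(self) -> bool:
        # find_spec returns None when a top-level module cannot be found.
        return importlib.util.find_spec(self.module_name) is not None


def shouting_agent(state: State) -> State:
    """Toy agent used only for this example."""
    state["reply"] = state["message"].upper()
    return state


run = CallableAdapter().wrap(shouting_agent)
print(run({"message": "hello"}))  # {'message': 'hello', 'reply': 'HELLO'}
print(OptionalDependencyAdapter("crewai").is_available())  # True only if crewai is installed
```

Keeping `is_available()` separate from `wrap()` lets a harness list or skip adapters whose optional extras are missing without importing those frameworks at all.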
Status
Alpha (v0.4.0). The API will change. Built and tested on Python 3.10–3.13. The fault catalog is informed by production multi-agent deployments at enterprise scale, but every codebase fails in its own special way, so please file an issue when you find a fault we should ship.
Why I'm building this
I've spent the last decade architecting AI systems for enterprises — including multi-agent platforms running across 2,600+ production sites. The failures that hurt are almost never the ones the unit tests check for. They're the quiet, partial, half-degraded ones in the seams.
This is the tool I wish I'd had.
License
Apache 2.0. Use it commercially. Cite it in papers. Build a paid product on top. Just don't claim you wrote it.
Citing
If you use agentfuzz in research or production reports:
```bibtex
@software{agentfuzz,
  author = {Tirumalasetti, Pavan Subhash},
  title  = {agentfuzz: Chaos engineering for AI agents},
  year   = {2026},
  url    = {https://github.com/SubhashPavan/agentfuzz},
}
```
File details
Details for the file agentfuzz-0.4.0.tar.gz.
File metadata
- Download URL: agentfuzz-0.4.0.tar.gz
- Size: 34.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | c4cb6fc8ff432f4a35a396d7b0554de52139bab3fea1c416d7fd420245ebfc64 |
| MD5 | 36c3160902130b0e3d77162ce3cc7316 |
| BLAKE2b-256 | a0f3e6b09b1d9e13088a3126a590521c4419d66ba98123fcece56eceedb95435 |

Provenance
The following attestation bundles were made for agentfuzz-0.4.0.tar.gz:
- Publisher: publish.yml on SubhashPavan/agentfuzz
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agentfuzz-0.4.0.tar.gz
- Subject digest: c4cb6fc8ff432f4a35a396d7b0554de52139bab3fea1c416d7fd420245ebfc64
- Sigstore transparency entry: 1568408053
- Permalink: SubhashPavan/agentfuzz@d4b65b12111358a3be5fad988aaf33d01387f0ab
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/SubhashPavan
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d4b65b12111358a3be5fad988aaf33d01387f0ab
- Trigger Event: release
File details
Details for the file agentfuzz-0.4.0-py3-none-any.whl.
File metadata
- Download URL: agentfuzz-0.4.0-py3-none-any.whl
- Size: 49.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 919661b9dd193b8f792ec3b3f189438395e88cb6fae0bb9a67c3822567528f69 |
| MD5 | d91adbbfc180b03bce932ff8bea82ee7 |
| BLAKE2b-256 | 1efcae9418011a32031852a4571ef39a5596e29ab12fc47407e1f0f845689439 |

Provenance
The following attestation bundles were made for agentfuzz-0.4.0-py3-none-any.whl:
- Publisher: publish.yml on SubhashPavan/agentfuzz
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agentfuzz-0.4.0-py3-none-any.whl
- Subject digest: 919661b9dd193b8f792ec3b3f189438395e88cb6fae0bb9a67c3822567528f69
- Sigstore transparency entry: 1568408106
- Permalink: SubhashPavan/agentfuzz@d4b65b12111358a3be5fad988aaf33d01387f0ab
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/SubhashPavan
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d4b65b12111358a3be5fad988aaf33d01387f0ab
- Trigger Event: release
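The SHA256 digests listed above can be checked locally before installing. A minimal sketch using only the standard library (`hashlib`, `pathlib`); `sha256_of`, `verify`, and `EXPECTED_SHA256` are names chosen for this example, and the file is streamed in 1 MiB chunks so large downloads are never read into memory at once.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hashes published for agentfuzz 0.4.0.
EXPECTED_SHA256 = {
    "agentfuzz-0.4.0.tar.gz": "c4cb6fc8ff432f4a35a396d7b0554de52139bab3fea1c416d7fd420245ebfc64",
    "agentfuzz-0.4.0-py3-none-any.whl": "919661b9dd193b8f792ec3b3f189438395e88cb6fae0bb9a67c3822567528f69",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

Usage: `verify(Path("agentfuzz-0.4.0.tar.gz"))` after downloading the sdist. pip can also enforce digests at install time in hash-checking mode, using a requirements line such as `agentfuzz==0.4.0 --hash=sha256:<digest>`.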