Chaos engineering for AI agents — inject realistic production failures (tool timeouts, malformed responses, cost spirals, prompt injection) and find out what breaks before your users do.

agentfuzz

Chaos engineering for AI agents.

Your agent works in the demo. In production it breaks because a tool times out, an API returns garbage JSON, a user injects a prompt, or it spirals into an infinite tool-call loop burning $200 in tokens. agentfuzz finds those failures before your users do.

Why this exists

Netflix built Chaos Monkey because cloud apps that passed unit tests still went down in production — the failures were in the seams between systems, not the systems themselves. AI agents have the same problem, with a worse blast radius:

  • A flaky tool returns malformed JSON → your agent hallucinates plausible-looking arguments and writes them to your database.
  • A user pastes a "translate this" prompt that's actually IGNORE PREVIOUS INSTRUCTIONS → your support agent emails the customer your system prompt.
  • A model upgrade changes how the agent retries a 429 → the agent enters an infinite loop and burns through your monthly token budget in 40 minutes.

These failures don't show up in unit tests because unit tests assume the seams work. agentfuzz deliberately breaks the seams.
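
To make "breaking the seams" concrete, here is a minimal sketch of the general idea in plain Python. It is not agentfuzz's implementation: `with_timeout_fault` and `lookup_order` are hypothetical names invented for illustration, and the wrapper simply raises `TimeoutError` on a configurable fraction of calls.

```python
import random
from typing import Any, Callable


def with_timeout_fault(
    tool: Callable[..., Any], rate: float, rng: random.Random
) -> Callable[..., Any]:
    """Wrap `tool` so that roughly `rate` of its calls raise TimeoutError."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # rng.random() is in [0.0, 1.0), so rate=0.0 never fails
        # and rate=1.0 always fails.
        if rng.random() < rate:
            raise TimeoutError("injected fault: tool timed out")
        return tool(*args, **kwargs)

    return wrapper


def lookup_order(order_id: str) -> dict:
    # Hypothetical stand-in for a real downstream tool.
    return {"order_id": order_id, "status": "shipped"}


# A seeded RNG keeps the fault pattern reproducible between runs.
flaky_lookup = with_timeout_fault(lookup_order, rate=0.5, rng=random.Random(0))

outcomes = []
for i in range(20):
    try:
        flaky_lookup(f"A{i}")
        outcomes.append("ok")
    except TimeoutError:
        outcomes.append("timeout")

print(outcomes.count("ok"), "ok,", outcomes.count("timeout"), "timeouts")
```

Seeding the random source is the design choice worth copying: a failure you cannot reproduce is hard to fix.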

What it does

Wrap your agent. Pick a fault profile. Run. Get a report.

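from agentfuzz import Harness, faults

# my_agent is the agent under test, e.g. any plain Python callable.
harness = Harness(my_agent)

# Register the faults to inject during the run.
harness.add(faults.ToolTimeout(rate=0.10))                # hanging tool calls
harness.add(faults.MalformedToolResponse(rate=0.05))      # garbage or truncated JSON
harness.add(faults.PromptInjection.suite("owasp-llm01"))  # OWASP LLM01 payloads
harness.add(faults.CostSpiral(max_tokens=50_000))         # runaway token usage
harness.add(faults.LatencyJitter(p99_ms=8000))            # slow tool responses
harness.add(faults.PartialToolFailure())                  # errors mid-stream

# Run the scenarios under fault injection and write an HTML report.
report = harness.run(scenarios="tau-bench-airline", iterations=200)
report.html("./report.html")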

You get:

  • Pass-rate per fault category — "your agent survives malformed JSON 78% of the time but only 12% of timeout cases."
  • Cost-blast radius — "fault X caused token usage to spike 14×."
  • Tool-call failure modes — hallucinated arguments, retry storms, infinite loops.
  • Prompt-injection survival — OWASP LLM01 suite results.
  • Replay traces — the exact transcript that broke your agent, so you can fix it.
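
As a rough illustration of how the first two metrics can be derived, the sketch below computes a pass rate per fault category and a cost-blast ratio from a list of run records. The record fields (`fault`, `passed`, `tokens`) and both function names are assumptions made for this example, not agentfuzz's report schema.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical run records; fault=None marks a baseline run with no fault.
runs = [
    {"fault": None, "passed": True, "tokens": 1_000},
    {"fault": None, "passed": True, "tokens": 1_000},
    {"fault": "ToolTimeout", "passed": False, "tokens": 14_000},
    {"fault": "ToolTimeout", "passed": True, "tokens": 2_000},
    {"fault": "MalformedToolResponse", "passed": True, "tokens": 1_500},
    {"fault": "MalformedToolResponse", "passed": True, "tokens": 1_500},
]


def pass_rate_by_fault(records):
    """Fraction of passing runs for each injected fault category."""
    grouped = defaultdict(list)
    for record in records:
        if record["fault"] is not None:
            grouped[record["fault"]].append(record["passed"])
    return {fault: sum(results) / len(results) for fault, results in grouped.items()}


def cost_blast_radius(records, fault):
    """Mean token usage under `fault` divided by the baseline mean."""
    baseline = mean(r["tokens"] for r in records if r["fault"] is None)
    faulted = mean(r["tokens"] for r in records if r["fault"] == fault)
    return faulted / baseline


print(pass_rate_by_fault(runs))
print(cost_blast_radius(runs, "ToolTimeout"))
```

Comparing against fault-free baseline runs is what turns raw token counts into a multiplier like "14×".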

Install

pip install agentfuzz                       # core
pip install "agentfuzz[langgraph]"          # + LangGraph adapter
pip install "agentfuzz[crewai]"             # + CrewAI adapter
pip install "agentfuzz[autogen]"            # + AutoGen adapter
pip install "agentfuzz[all]"                # everything

60-second example

from agentfuzz import Harness, faults
from my_app import build_agent

harness = Harness(build_agent())
harness.add(faults.MalformedToolResponse(rate=0.2))
harness.add(faults.ToolTimeout(rate=0.1))

result = harness.run(iterations=50)
print(result.summary())
# >>> agentfuzz: 32/50 passed (64%)
# >>>   MalformedToolResponse: 8 failures
# >>>     - 5× hallucinated arguments
# >>>     - 3× silent corruption
# >>>   ToolTimeout: 10 failures
# >>>     - 7× retry storm (avg 14 retries)
# >>>     - 3× infinite loop killed at max_tokens
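
A retry storm like the one in the summary above usually points at an unbounded retry loop around a tool call. One common mitigation, sketched here under assumptions, is a hard cap on attempts with exponential backoff. `call_with_bounded_retries` is a hypothetical helper and not part of agentfuzz; the `sleep` parameter exists only so the example runs without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_bounded_retries(
    tool: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `tool`, retrying on TimeoutError at most `max_attempts` times."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return tool()
        except TimeoutError as exc:
            last_error = exc
            # Back off 0.5s, 1s, 2s, ... but do not sleep after the final attempt.
            if attempt < max_attempts - 1:
                sleep(base_delay_s * 2**attempt)
    raise RuntimeError(f"tool failed after {max_attempts} attempts") from last_error


calls = {"n": 0}


def always_times_out() -> str:
    # Simulates a downstream API that never answers.
    calls["n"] += 1
    raise TimeoutError("simulated hang")


delays = []  # records the requested sleeps instead of actually sleeping
try:
    call_with_bounded_retries(always_times_out, max_attempts=3, sleep=delays.append)
except RuntimeError as exc:
    print(exc, "| calls:", calls["n"], "| delays:", delays)
```

Raising a distinct error once the cap is hit gives the agent a clear signal to stop calling the tool instead of looping.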

Fault library (v0.1)

Fault                   What it simulates
----------------------  --------------------------------------------------
ToolTimeout             A downstream API hangs past the agent's patience
MalformedToolResponse   Garbage JSON, truncated payloads, wrong schema
PartialToolFailure      Tool returns 200 then errors mid-stream
LatencyJitter           Realistic p50 / p99 latency distribution
CostSpiral              Detects runaway token usage above a threshold
PromptInjection         OWASP LLM01 catalog of injection payloads
RateLimitBurst          Cascading 429s from upstream APIs
SchemaDrift             Tool API changed shape between dev and prod

More planned — see the roadmap.
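
To show what two of these faults look like from the agent's side, the sketch below truncates a JSON payload (one form of malformed response) and drops an expected key (a simple form of schema drift), then validates both before use. `truncate_payload` and `parse_tool_response` are hypothetical helpers written for this example; agentfuzz may generate and detect these faults differently.

```python
import json


def truncate_payload(payload: str, keep_fraction: float = 0.6) -> str:
    """Simulate a truncated response by cutting the payload short."""
    return payload[: int(len(payload) * keep_fraction)]


def parse_tool_response(raw: str, required_keys: set[str]) -> dict:
    """Parse a tool response, rejecting invalid JSON and missing keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"tool returned invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("tool response is not a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"tool response is missing keys: {sorted(missing)}")
    return data


required = {"order_id", "status"}
good = json.dumps({"order_id": "A1", "status": "shipped"})
truncated = truncate_payload(good)        # cut off mid-object
drifted = json.dumps({"order_id": "A1"})  # "status" field no longer present

for label, raw in [("good", good), ("truncated", truncated), ("drifted", drifted)]:
    try:
        parse_tool_response(raw, required)
        print(label, "-> accepted")
    except ValueError as exc:
        print(label, "-> rejected:", exc)
```

Failing loudly at the parsing boundary keeps a bad payload from reaching the model, where it might be papered over with plausible-looking arguments.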

Supported agent frameworks

  • LangGraph (agentfuzz[langgraph]) — wrap your tools with wrap_tools(), point a LangGraphAdapter at your compiled graph, done. See examples/langgraph_react_agent.py.
  • Plain Python callables — any Callable[[State], State]. Simplest way to try the tool.
  • 🚧 CrewAI, AutoGen, PydanticAI, OpenAI Swarm, LlamaIndex — coming.

The adapter interface is small (is_available() + wrap()); PRs welcome.
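
For anyone considering a PR, here is a rough sketch of what an adapter with those two methods could look like. Only the method names `is_available()` and `wrap()` come from this README. The signatures, the `Adapter` protocol, both example classes, `pick_adapter`, and the `State` alias are assumptions for illustration, so check the source for the real interface.

```python
import importlib.util
from typing import Any, Callable, Protocol

State = dict[str, Any]  # assumed shape; the README only names the type
Agent = Callable[[State], State]


class Adapter(Protocol):
    """Assumed shape of the two-method adapter interface."""

    def is_available(self) -> bool: ...

    def wrap(self, agent: Any) -> Agent: ...


class CallableAdapter:
    """Hypothetical adapter for plain Python callables."""

    def is_available(self) -> bool:
        return True  # no optional dependency required

    def wrap(self, agent: Any) -> Agent:
        if not callable(agent):
            raise TypeError("expected a callable taking and returning a state dict")
        return agent


class OptionalFrameworkAdapter:
    """Hypothetical adapter that depends on an optional framework package."""

    module_name = "hypothetical_agent_framework"  # placeholder, not a real package

    def is_available(self) -> bool:
        # True only when the optional dependency can be imported.
        return importlib.util.find_spec(self.module_name) is not None

    def wrap(self, agent: Any) -> Agent:
        raise NotImplementedError("framework-specific wrapping would go here")


def pick_adapter(adapters: list[Adapter]) -> Adapter:
    """Return the first adapter whose dependency is installed."""
    for adapter in adapters:
        if adapter.is_available():
            return adapter
    raise RuntimeError("no adapter available")


def echo_agent(state: State) -> State:
    # Simplest possible agent: a function from state to state.
    return {**state, "reply": f"echo: {state['input']}"}


adapter = pick_adapter([OptionalFrameworkAdapter(), CallableAdapter()])
agent = adapter.wrap(echo_agent)
print(agent({"input": "hi"}))
```

Keeping availability checks separate from wrapping lets the core package import cleanly when an optional framework is not installed.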

Status

Alpha (v0.1). API will change. Built and tested on Python 3.10–3.13. The fault catalog is informed by production multi-agent deployments at enterprise scale — but every codebase fails in its own special way, so file issues when you find a fault we should ship.

Why I'm building this

I've spent the last decade architecting AI systems for enterprises — including multi-agent platforms running across 2,600+ production sites. The failures that hurt are almost never the ones the unit tests check for. They're the quiet, partial, half-degraded ones in the seams.

This is the tool I wish I'd had.

Pavan Subhash Tirumalasetti

License

Apache 2.0. Use it commercially. Cite it in papers. Build a paid product on top. Just don't claim you wrote it.

Citing

If you use agentfuzz in research or production reports:

@software{agentfuzz,
  author  = {Tirumalasetti, Pavan Subhash},
  title   = {agentfuzz: Chaos engineering for AI agents},
  year    = {2026},
  url     = {https://github.com/SubhashPavan/agentfuzz},
}
