Chaos engineering for AI agents — inject realistic production failures (tool timeouts, malformed responses, cost spirals, prompt injection) and find out what breaks before your users do.

agentfuzz

Chaos engineering for AI agents.

Your agent works in the demo. In production it breaks because a tool times out, an API returns garbage JSON, a user injects a prompt, or it spirals into an infinite tool-call loop burning $200 in tokens. agentfuzz finds those failures before your users do.


Why this exists

Netflix built Chaos Monkey because cloud apps that passed unit tests still went down in production — the failures were in the seams between systems, not the systems themselves. AI agents have the same problem, with a worse blast radius:

  • A flaky tool returns malformed JSON → your agent hallucinates plausible-looking arguments and writes them to your database.
  • A user pastes a "translate this" prompt that's actually IGNORE PREVIOUS INSTRUCTIONS → your support agent emails the customer your system prompt.
  • A model upgrade changes how the agent retries a 429 → the agent enters an infinite loop and burns through your monthly token budget in 40 minutes.

These failures don't show up in unit tests because unit tests assume the seams work. agentfuzz deliberately breaks the seams.
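To make the retry-loop failure above concrete, here is a minimal sketch in plain Python of a retry budget: the call is retried on a simulated 429, but only a fixed number of times. It is an illustration, not part of agentfuzz; `RateLimited`, `call_with_retry_budget`, `always_429`, and `flaky` are hypothetical names, and a real client would also back off between attempts.

```python
class RateLimited(Exception):
    """Stand-in for an HTTP 429 from an upstream API (hypothetical)."""


def call_with_retry_budget(tool, max_attempts=3):
    """Call `tool`, retrying on RateLimited at most `max_attempts` times.

    Returns (result, attempts_used). Re-raises RateLimited once the
    budget is exhausted instead of looping forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(), attempt
        except RateLimited:
            if attempt == max_attempts:
                raise  # budget exhausted: fail loudly
    raise ValueError("max_attempts must be at least 1")


def always_429():
    """A tool that is permanently rate limited."""
    raise RateLimited("429 Too Many Requests")


calls = {"n": 0}


def flaky():
    """Fails twice with a simulated 429, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited("429 Too Many Requests")
    return "ok"


print(call_with_retry_budget(flaky))  # prints ('ok', 3)
```

Without the `max_attempts` check, `always_429` would keep the loop running until something external stopped it, which is the kind of behavior a fault-injection run is meant to surface.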

What it does

Wrap your agent. Pick a fault profile. Run. Get a report.

from agentfuzz import Harness, faults

harness = Harness(my_agent)  # my_agent: your existing agent (a supported framework object or a plain callable)

harness.add(faults.ToolTimeout(rate=0.10))
harness.add(faults.MalformedToolResponse(rate=0.05))
harness.add(faults.PromptInjection.suite("owasp-llm01"))
harness.add(faults.CostSpiral(max_tokens=50_000))
harness.add(faults.LatencyJitter(p99_ms=8000))
harness.add(faults.PartialToolFailure())

report = harness.run(scenarios="tau-bench-airline", iterations=200)
report.html("./report.html")

You get:

  • Pass-rate per fault category — "your agent survives malformed JSON 78% of the time but only 12% of timeout cases."
  • Cost-blast radius — "fault X caused token usage to spike 14×."
  • Tool-call failure modes — hallucinated arguments, retry storms, infinite loops.
  • Prompt-injection survival — OWASP LLM01 suite results.
  • Replay traces — the exact transcript that broke your agent, so you can fix it.
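The first two metrics in the list above can be sketched as simple aggregations over per-iteration records. This is an illustration only: `RunRecord` and both functions are hypothetical and do not reflect agentfuzz's actual report schema, and "cost-blast radius" is assumed here to mean mean token usage under a fault divided by a fault-free baseline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One harness iteration (hypothetical shape, not agentfuzz's schema)."""
    fault: str    # name of the injected fault, e.g. "ToolTimeout"
    passed: bool  # did the agent still complete the scenario correctly?
    tokens: int   # total tokens consumed in this iteration


def pass_rate_by_fault(records):
    """Return {fault name: fraction of iterations that passed}."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in records:
        totals[r.fault] += 1
        passes[r.fault] += r.passed  # True counts as 1, False as 0
    return {fault: passes[fault] / totals[fault] for fault in totals}


def cost_blast_radius(records, baseline_tokens):
    """Return {fault name: mean tokens under that fault / baseline tokens}."""
    totals = defaultdict(int)
    tokens = defaultdict(int)
    for r in records:
        totals[r.fault] += 1
        tokens[r.fault] += r.tokens
    return {f: (tokens[f] / totals[f]) / baseline_tokens for f in totals}


# Made-up records purely to exercise the functions.
records = [
    RunRecord("MalformedToolResponse", True, 1_200),
    RunRecord("MalformedToolResponse", False, 1_800),
    RunRecord("ToolTimeout", False, 14_000),
    RunRecord("ToolTimeout", False, 14_000),
]
print(pass_rate_by_fault(records))
# prints {'MalformedToolResponse': 0.5, 'ToolTimeout': 0.0}
print(cost_blast_radius(records, baseline_tokens=1_000))
# prints {'MalformedToolResponse': 1.5, 'ToolTimeout': 14.0}
```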

Install

pip install agentfuzz                       # core
pip install "agentfuzz[langgraph]"          # + LangGraph adapter
pip install "agentfuzz[crewai]"             # + CrewAI adapter
pip install "agentfuzz[autogen]"            # + AutoGen adapter
pip install "agentfuzz[all]"                # everything

60-second example

from agentfuzz import Harness, faults
from my_app import build_agent

harness = Harness(build_agent())
harness.add(faults.MalformedToolResponse(rate=0.2))
harness.add(faults.ToolTimeout(rate=0.1))

result = harness.run(iterations=50)
print(result.summary())
# >>> agentfuzz: 32/50 passed (64%)
# >>>   MalformedToolResponse: 8 failures
# >>>     - 5× hallucinated arguments
# >>>     - 3× silent corruption
# >>>   ToolTimeout: 10 failures
# >>>     - 7× retry storm (avg 14 retries)
# >>>     - 3× infinite loop killed at max_tokens
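For readers who want to see the idea behind a rate-based fault without installing anything, the following is a self-contained sketch in plain Python. It does not use or reproduce agentfuzz's internals; `with_timeout_fault`, `ToolTimeoutError`, `lookup_order`, and `naive_agent` are hypothetical names chosen for illustration.

```python
import random


class ToolTimeoutError(Exception):
    """Raised by the wrapper to simulate a tool that never answers in time."""


def with_timeout_fault(tool, rate, rng):
    """Wrap `tool` so that roughly a fraction `rate` of calls raise a timeout."""
    def faulty_tool(*args, **kwargs):
        if rng.random() < rate:
            raise ToolTimeoutError(f"{tool.__name__} timed out (injected)")
        return tool(*args, **kwargs)
    return faulty_tool


def lookup_order(order_id):
    """Stand-in for a real downstream API call."""
    return {"order_id": order_id, "status": "shipped"}


def naive_agent(tool, order_id):
    """An agent step with no timeout handling: any injected fault propagates."""
    return tool(order_id)["status"]


rng = random.Random(0)  # seeded so repeated runs inject the same faults
faulty = with_timeout_fault(lookup_order, rate=0.2, rng=rng)

passed = 0
for i in range(50):
    try:
        naive_agent(faulty, order_id=i)
        passed += 1
    except ToolTimeoutError:
        pass  # counted as a failed iteration
print(f"{passed}/50 passed")
```

Seeding the random generator is a deliberate choice in this sketch: a failing iteration can be replayed exactly, which is what makes a failure report actionable.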

Fault library

| Fault | What it simulates |
| --- | --- |
| ToolTimeout | A downstream API hangs past the agent's patience |
| MalformedToolResponse | Garbage JSON, truncated payloads, wrong schema |
| PartialToolFailure | Tool returns 200 then errors mid-stream |
| LatencyJitter | Realistic p50 / p99 latency distribution |
| CostSpiral | Detects runaway token usage above a threshold |
| PromptInjection | OWASP LLM01 catalog of injection payloads |
| PromptParaphrase | Real users mangle messages: typos, filler, contractions |
| RateLimitBurst | Cascading 429s from upstream APIs |
| SchemaDrift | Tool API changed shape between dev and prod |
| AuthExpiry | 401 / 403; tests credential-refresh paths |
| NetworkPartition | Connection refused / TLS error (distinct from timeout) |

More planned — see the roadmap.
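As an illustration of what the malformed-response and schema-drift rows in the fault library describe, here is a small sketch of the two failure shapes (truncated JSON, and valid JSON with the wrong keys) together with a parser that refuses both. The schema in `REQUIRED_KEYS` and the helper names are assumptions made for this example, not anything agentfuzz ships.

```python
import json

REQUIRED_KEYS = {"order_id", "status"}  # hypothetical tool schema


def truncate(payload, keep):
    """Simulate a truncated payload by keeping only the first `keep` characters."""
    return payload[:keep]


def parse_tool_response(raw):
    """Return the parsed dict, or None if the response is unusable.

    Returning None forces the caller to handle the failure explicitly
    instead of guessing at missing fields.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # garbage or truncated JSON
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None  # valid JSON, but not the expected shape
    return data


good = json.dumps({"order_id": 7, "status": "shipped"})
print(parse_tool_response(good))                # the parsed dict
print(parse_tool_response(truncate(good, 12)))  # prints None (truncated JSON)
print(parse_tool_response('{"id": 7}'))         # prints None (wrong schema)
```

An agent that passes such a check's `None` straight to the model, or skips the check entirely, is where hallucinated arguments tend to come from.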

Supported agent frameworks

  • LangChain create_agent (1.x): agentfuzz[langgraph]. The modern entry point. Wrap your tools with wrap_tools(), point LangGraphAdapter at the compiled graph.
  • LangGraph create_react_agent (0.x): same adapter; both APIs return a CompiledStateGraph we handle uniformly. See examples/langgraph_react_agent.py.
  • CrewAI: agentfuzz[crewai]. wrap_tools() returns proxy crewai.tools.BaseTool instances; CrewAIAdapter(crew) drives the harness through crew.kickoff(). See examples/crewai_agent.py.
  • Plain Python callables: any Callable[[State], State]. Simplest way to try the tool.
  • 🚧 AutoGen, PydanticAI, OpenAI Swarm, LlamaIndex: coming.

The adapter interface is small (is_available() + wrap()); PRs welcome.
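A two-method adapter of this kind might look roughly like the sketch below. Only the method names `is_available()` and `wrap()` come from the text above; the signatures, the `Adapter` protocol, both example classes, and the assumption that a framework agent exposes `.invoke()` are hypothetical, so check the project's source before writing a real adapter.

```python
import importlib.util
from typing import Any, Callable, Protocol


class Adapter(Protocol):
    """Hypothetical adapter shape; the real signatures may differ."""

    def is_available(self) -> bool:
        """Return True if the target framework can be imported."""
        ...

    def wrap(self, agent: Any) -> Callable[[dict], dict]:
        """Return a uniform state -> state callable for the harness to drive."""
        ...


class PlainCallableAdapter:
    """Adapter for agents that are already plain state -> state callables."""

    def is_available(self) -> bool:
        return True  # no third-party dependency needed

    def wrap(self, agent: Any) -> Callable[[dict], dict]:
        if not callable(agent):
            raise TypeError("agent must be callable")
        return agent


class OptionalFrameworkAdapter:
    """Adapter that is only usable when an optional package is installed."""

    def __init__(self, module_name: str) -> None:
        self.module_name = module_name

    def is_available(self) -> bool:
        # find_spec returns None when a top-level module cannot be found.
        return importlib.util.find_spec(self.module_name) is not None

    def wrap(self, agent: Any) -> Callable[[dict], dict]:
        # Assumes the framework's agent object exposes an .invoke(state) method.
        return lambda state: agent.invoke(state)


echo = PlainCallableAdapter().wrap(lambda state: {**state, "done": True})
print(echo({"x": 1}))  # prints {'x': 1, 'done': True}
```

Keeping the availability check separate from `wrap()` lets a harness skip adapters whose optional dependency is missing without importing it.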

Status

Alpha (v0.1). API will change. Built and tested on Python 3.10–3.13. The fault catalog is informed by production multi-agent deployments at enterprise scale — but every codebase fails in its own special way, so file issues when you find a fault we should ship.

Why I'm building this

I've spent the last decade architecting AI systems for enterprises — including multi-agent platforms running across 2,600+ production sites. The failures that hurt are almost never the ones the unit tests check for. They're the quiet, partial, half-degraded ones in the seams.

This is the tool I wish I'd had.

Pavan Subhash Tirumalasetti

License

Apache 2.0. Use it commercially. Cite it in papers. Build a paid product on top. Just don't claim you wrote it.

Citing

If you use agentfuzz in research or production reports:

@software{agentfuzz,
  author  = {Tirumalasetti, Pavan Subhash},
  title   = {agentfuzz: Chaos engineering for AI agents},
  year    = {2026},
  url     = {https://github.com/SubhashPavan/agentfuzz},
}
