Chaos engineering framework for AI agents - stress-test agent reliability

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

arielshad

These details have not been verified by PyPI

Project description

BalaganAgent

Chaos Engineering for AI Agents

Everyone demos agents. Nobody stress-tests them.

Quick Start • Features • Documentation • Examples • Contributing

BalaganAgent is a reliability testing framework that stress-tests AI agents through controlled fault injection. Your agent will fail in production, and you should know how it handles failure.

Why BalaganAgent?
Features
Quick Start
Use Cases
Chaos Levels
Fault Injectors
Metrics
Reports
Project Structure
Community
Contributing
Roadmap
License

Why BalaganAgent?

AI agents are entering production environments, but reliability practices for them are still immature. BalaganAgent brings the battle-tested principles of chaos engineering (think Chaos Monkey, Gremlin) to the world of AI agents.

The Problem

Agents fail silently in production
Tool calls time out, return garbage, or hallucinate
Context gets corrupted, budgets get exhausted
Nobody knows until users complain

The Solution

Inject failures in development, not production
Measure recovery time (MTTR)
Score reliability like we score SLAs
Find breaking points before users do

Industry Adoption

BalaganAgent is designed for teams that take AI reliability seriously:

Production AI systems requiring SLO compliance
Enterprise deployments with strict reliability requirements
Development teams practicing chaos engineering
Organizations building mission-critical AI agents

Features

Fault Injection

Tool Failures: Exceptions, timeouts, empty responses, malformed data, rate limits
Delays: Fixed, random, spike patterns, degrading latency
Hallucinations: Wrong values, fabricated data, contradictions, fake references
Context Corruption: Truncation, reordering, noise injection, encoding issues
Budget Exhaustion: Token limits, cost caps, rate limiting, call quotas
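
To make the idea of probabilistic fault injection concrete, here is a minimal sketch in plain Python. It does not use BalaganAgent's API: `with_fault_injection` and the `search` tool are hypothetical names invented for this illustration, and raising `TimeoutError` stands in for just one of the failure modes listed above.

```python
import random
from typing import Any, Callable


def with_fault_injection(
    tool: Callable[..., Any],
    probability: float,
    rng: random.Random,
) -> Callable[..., Any]:
    """Return a wrapper that may raise an injected fault before calling the tool."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # rng.random() is in [0.0, 1.0), so probability=0.0 never fires
        # and probability=1.0 always fires.
        if rng.random() < probability:
            raise TimeoutError("injected fault: simulated tool timeout")
        return tool(*args, **kwargs)

    return wrapper


def search(query: str) -> dict:
    """Hypothetical tool used only for this illustration."""
    return {"query": query, "results": []}


rng = random.Random(42)  # seeded so a run can be reproduced
flaky_search = with_fault_injection(search, probability=0.3, rng=rng)

outcomes = []
for _ in range(10):
    try:
        flaky_search("AI safety")
        outcomes.append("ok")
    except TimeoutError:
        outcomes.append("fault")
print(outcomes)
```

Seeding the random generator is a deliberate choice here: it lets the same sequence of faults be replayed when debugging how an agent reacted.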

Metrics & Analysis

MTTR (Mean Time To Recovery): How fast does your agent recover?
Recovery Quality: Did it recover correctly or just fail gracefully?
Reliability Score: SRE-grade scoring (five nines to one nine)
Error Budget Tracking: Know when to freeze changes
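
As a rough illustration of what "five nines to one nine" means, the sketch below maps an availability ratio to a nines label. The thresholds are the conventional SRE ones; the function name and the "below one nine" label are assumptions for this example and may not match how BalaganAgent assigns grades.

```python
# Conventional availability thresholds, highest first.
NINES_THRESHOLDS = [
    (0.99999, "five nines"),
    (0.9999, "four nines"),
    (0.999, "three nines"),
    (0.99, "two nines"),
    (0.9, "one nine"),
]


def nines_grade(availability: float) -> str:
    """Return the highest nines label whose threshold the availability meets."""
    for threshold, label in NINES_THRESHOLDS:
        if availability >= threshold:
            return label
    return "below one nine"


for value in (0.99995, 0.995, 0.873):
    print(f"{value:.3%} -> {nines_grade(value)}")
```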

Reports

Terminal output with colors
JSON for programmatic analysis
Markdown for documentation
HTML dashboards

Quick Start

Get up and running in minutes.

Installation

pip install balagan-agent

Verify installation:

balaganagent --version

CrewAI Integration

Using CrewAI? Check out our CrewAI Integration Guide for a complete step-by-step walkthrough!

Quick example:

from crewai import Crew
from balaganagent.wrappers.crewai import CrewAIWrapper

# Your existing CrewAI setup (agent and task are defined elsewhere)
crew = Crew(agents=[agent], tasks=[task])

# Add chaos testing (3 lines!)
wrapper = CrewAIWrapper(crew, chaos_level=0.5)
wrapper.configure_chaos(enable_tool_failures=True, enable_delays=True)
result = wrapper.kickoff()

# Check metrics
metrics = wrapper.get_metrics()
print(f"Success rate: {metrics['aggregate']['operations']['success_rate']:.1%}")

→ Full CrewAI Integration Guide

Basic Usage

from balaganagent import ChaosEngine, AgentWrapper

# Your agent with tools
class MyAgent:
    def search(self, query: str) -> dict:
        return {"results": [...]}

    def calculate(self, expr: str) -> float:
        # eval() keeps the demo short; never call it on untrusted input
        return eval(expr)

# Wrap it with chaos
agent = MyAgent()
wrapper = AgentWrapper(agent)
wrapper.configure_chaos(chaos_level=0.5)  # level 0.5 = 5% base failure rate (see Chaos Levels)

# Now calls might fail randomly!
result = wrapper.call_tool("search", "test query")

Run an Experiment

from balaganagent import ChaosEngine
from balaganagent.runner import scenario, ExperimentRunner

# Define a test scenario
test = (
    scenario("search-reliability")
    .description("Test search under chaos")
    .call("search", "AI safety")
    .call("search", "machine learning")
    .call("calculate", "2 + 2")
    .with_chaos(level=0.75)
    .build()
)

# Run it
runner = ExperimentRunner()
runner.set_agent(MyAgent())
result = runner.run_scenario(test)

print(f"Success Rate: {result.experiment_result.success_rate:.1%}")
print(f"MTTR: {result.mttr_stats['mttr_seconds']:.2f}s")

CLI Usage

# Run a demo
balaganagent demo --chaos-level 0.5

# Initialize a new project
balaganagent init my-chaos-tests

# Run a scenario file
balaganagent run scenarios/search_test.json --chaos-level 0.75

# Run stress tests
balaganagent stress scenarios/critical_path.json --iterations 100

Use Cases

BalaganAgent helps you answer critical questions about your agents:

Pre-Production Validation: Will my agent handle API timeouts gracefully?
Integration Testing: Does my agent recover when tools return malformed data?
Load Testing: How does performance degrade under high failure rates?
Reliability Engineering: What's my agent's actual MTTR and recovery rate?
SLO Compliance: Can I maintain 99.9% availability under chaos?
Regression Testing: Did my recent changes break failure handling?

Chaos Levels

The chaos_level parameter controls fault injection probability:

Level	Base Failure Rate	Use Case
0.0	0%	Baseline (no chaos)
0.25	2.5%	Light testing
0.5	5%	Moderate chaos
1.0	10%	Standard chaos
2.0	20%	Stress testing
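
Every row in the table is consistent with a linear mapping of 10% base failure rate per unit of chaos level. The sketch below encodes that inferred relationship; the constant, the function name, and the clamp at 100% are assumptions drawn from the table, not BalaganAgent's actual implementation.

```python
# Assumption inferred from the table above: level 1.0 corresponds to 10%.
BASE_RATE_PER_LEVEL = 0.10


def base_failure_rate(chaos_level: float) -> float:
    """Map a chaos level to a failure probability, clamped to at most 1.0."""
    if chaos_level < 0:
        raise ValueError("chaos_level must be non-negative")
    return min(1.0, chaos_level * BASE_RATE_PER_LEVEL)


for level in (0.0, 0.25, 0.5, 1.0, 2.0):
    print(f"chaos_level={level}: {base_failure_rate(level):.1%}")
```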

Fault Injectors

Tool Failure Injector

Simulates various tool failure modes:

from balaganagent.injectors import ToolFailureInjector
from balaganagent.injectors.tool_failure import ToolFailureConfig, FailureMode

injector = ToolFailureInjector(ToolFailureConfig(
    probability=0.1,
    failure_modes=[
        FailureMode.TIMEOUT,
        FailureMode.RATE_LIMIT,
        FailureMode.SERVICE_UNAVAILABLE,
    ]
))

Delay Injector

Simulates network latency patterns:

from balaganagent.injectors import DelayInjector
from balaganagent.injectors.delay import DelayConfig, DelayPattern, LatencySimulator

# Use presets
injector = LatencySimulator.create("poor")  # High latency, high jitter

# Or configure manually
injector = DelayInjector(DelayConfig(
    pattern=DelayPattern.SPIKE,
    min_delay_ms=50,
    max_delay_ms=200,
    spike_probability=0.1,
    spike_multiplier=10,
))
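
One plausible reading of the spike parameters is sketched below in plain Python: a base delay is drawn between the minimum and maximum, and with probability `spike_probability` it is multiplied by `spike_multiplier`. This is an illustration of the pattern under that assumption, not the DelayInjector's actual sampling logic, and `sample_delay_ms` is a hypothetical helper.

```python
import random


def sample_delay_ms(
    rng: random.Random,
    min_delay_ms: float,
    max_delay_ms: float,
    spike_probability: float,
    spike_multiplier: float,
) -> float:
    """Draw one delay; occasionally scale it up to simulate a latency spike."""
    delay = rng.uniform(min_delay_ms, max_delay_ms)
    if rng.random() < spike_probability:
        delay *= spike_multiplier
    return delay


rng = random.Random(7)  # seeded for reproducibility
samples = [sample_delay_ms(rng, 50, 200, 0.1, 10) for _ in range(1000)]
spikes = sum(1 for s in samples if s > 200)
print(f"max delay: {max(samples):.0f} ms, spikes: {spikes} of {len(samples)}")
```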

Hallucination Injector

Corrupts data to test the agent's ability to detect bad information:

from balaganagent.injectors import HallucinationInjector
from balaganagent.injectors.hallucination import HallucinationConfig, HallucinationType

injector = HallucinationInjector(HallucinationConfig(
    probability=0.05,
    severity=0.5,  # 0=subtle, 1=obvious
    hallucination_types=[
        HallucinationType.WRONG_VALUE,
        HallucinationType.FABRICATED_DATA,
        HallucinationType.NONEXISTENT_REFERENCE,
    ]
))

Budget Exhaustion Injector

Tests behavior when resources run out:

from balaganagent.injectors import BudgetExhaustionInjector
from balaganagent.injectors.budget import BudgetExhaustionConfig

injector = BudgetExhaustionInjector(BudgetExhaustionConfig(
    token_limit=10000,
    cost_limit_dollars=1.00,
    rate_limit_per_minute=60,
    fail_hard=True,  # Raise an exception instead of returning an error
))
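
The difference between failing hard and returning an error can be sketched with a small token budget tracker. `TokenBudget` and `BudgetExceeded` are hypothetical names for this illustration, and the returned dictionary shape is an assumption; the real injector may behave differently.

```python
class BudgetExceeded(Exception):
    """Raised when a spend would exceed the budget and fail_hard is set."""


class TokenBudget:
    def __init__(self, token_limit: int, fail_hard: bool = True) -> None:
        self.token_limit = token_limit
        self.fail_hard = fail_hard
        self.used = 0

    def spend(self, tokens: int) -> dict:
        """Record token usage, or signal exhaustion according to fail_hard."""
        if self.used + tokens > self.token_limit:
            if self.fail_hard:
                raise BudgetExceeded("token budget exhausted")
            # Soft failure: the caller receives an error value to handle.
            return {"ok": False, "error": "token budget exhausted"}
        self.used += tokens
        return {"ok": True, "remaining": self.token_limit - self.used}


soft = TokenBudget(token_limit=10000, fail_hard=False)
print(soft.spend(6000))
print(soft.spend(6000))
```

An agent that only checks for exceptions will miss the soft failure, which is why testing both modes is useful.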

Metrics

MTTR Calculator

from balaganagent.metrics import MTTRCalculator

calc = MTTRCalculator()

# Record failure and recovery
calc.record_failure("search", "timeout")
# ... agent recovers ...
calc.record_recovery("search", "timeout", retries=2)

stats = calc.get_recovery_stats()
print(f"MTTR: {stats['mttr_seconds']:.2f}s")
print(f"Recovery Rate: {stats['recovery_rate']:.1%}")
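
For readers new to the metric, the following plain-Python sketch shows the arithmetic behind MTTR and recovery rate. The incident timestamps are made up, `recovery_stats` is a hypothetical helper, and the dictionary keys simply mirror the ones printed above; this is not the MTTRCalculator's implementation.

```python
from statistics import mean
from typing import Optional

# (failure_time_s, recovery_time_s); None means the agent never recovered.
# These values are invented for the example.
incidents: list[tuple[float, Optional[float]]] = [
    (0.0, 1.2),
    (5.0, 5.8),
    (9.0, None),
    (12.0, 14.0),
]


def recovery_stats(items: list[tuple[float, Optional[float]]]) -> dict:
    """Compute mean time to recovery and the share of failures recovered."""
    durations = [rec - fail for fail, rec in items if rec is not None]
    return {
        "mttr_seconds": mean(durations) if durations else 0.0,
        "recovery_rate": len(durations) / len(items) if items else 0.0,
    }


stats = recovery_stats(incidents)
print(f"MTTR: {stats['mttr_seconds']:.2f}s")
print(f"Recovery Rate: {stats['recovery_rate']:.1%}")
```

Note that unrecovered failures are excluded from MTTR in this sketch and show up only in the recovery rate, so the two numbers should be read together.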

Reliability Scorer

from balaganagent.metrics import ReliabilityScorer

scorer = ReliabilityScorer(slos={
    "availability": 0.99,
    "latency_p99_ms": 2000,
})

# Record operations
for result in agent_results:
    scorer.record_operation(
        success=result.success,
        latency_ms=result.latency,
    )

report = scorer.calculate_score()
print(f"Grade: {report.grade.value}")
print(f"Availability: {report.availability:.3%}")
print(f"Error Budget Remaining: {report.error_budget_remaining:.1%}")
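
The error budget figure can be understood with a short worked calculation. Under the usual SRE definition, an availability SLO of 0.99 over 1,000 operations allows about 10 failures; the remaining budget is the unspent share of that allowance. The helper below is a hypothetical sketch of that definition and may differ from how ReliabilityScorer computes `error_budget_remaining`.

```python
def error_budget_remaining(slo: float, total_ops: int, failed_ops: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    allowed_failures = (1.0 - slo) * total_ops
    if allowed_failures <= 0:
        # No failures are permitted: the budget is intact only if nothing failed.
        return 1.0 if failed_ops == 0 else 0.0
    return 1.0 - failed_ops / allowed_failures


# With a 99% SLO, 1000 operations allow about 10 failures; 4 have been spent.
remaining = error_budget_remaining(slo=0.99, total_ops=1000, failed_ops=4)
print(f"Error Budget Remaining: {remaining:.1%}")
```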

Scenarios

Scenarios can be defined in code or JSON:

{
  "name": "critical-path-test",
  "description": "Test the critical user journey",
  "operations": [
    {"tool": "authenticate", "args": ["user123"]},
    {"tool": "fetch_profile", "args": ["user123"]},
    {"tool": "search", "args": ["recent orders"]},
    {"tool": "process_request", "kwargs": {"action": "refund"}}
  ],
  "chaos_config": {
    "chaos_level": 0.5,
    "enable_tool_failures": true,
    "enable_delays": true,
    "enable_budget_exhaustion": true
  }
}
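
Before handing a scenario file to the runner, it can help to check its shape. The sketch below parses the JSON above with the standard library and applies a few sanity checks. Treating `name`, `operations`, and each operation's `tool` as required is an assumption made for this example, and `load_scenario` is a hypothetical helper rather than part of BalaganAgent.

```python
import json


def load_scenario(text: str) -> dict:
    """Parse a scenario document and fill in empty args/kwargs defaults."""
    data = json.loads(text)
    for key in ("name", "operations"):
        if key not in data:
            raise ValueError(f"scenario is missing required key: {key!r}")
    for index, op in enumerate(data["operations"]):
        if "tool" not in op:
            raise ValueError(f"operation {index} has no 'tool' field")
        op.setdefault("args", [])
        op.setdefault("kwargs", {})
    return data


example = """
{
  "name": "critical-path-test",
  "description": "Test the critical user journey",
  "operations": [
    {"tool": "authenticate", "args": ["user123"]},
    {"tool": "fetch_profile", "args": ["user123"]},
    {"tool": "search", "args": ["recent orders"]},
    {"tool": "process_request", "kwargs": {"action": "refund"}}
  ],
  "chaos_config": {
    "chaos_level": 0.5,
    "enable_tool_failures": true,
    "enable_delays": true,
    "enable_budget_exhaustion": true
  }
}
"""

scenario_data = load_scenario(example)
print([op["tool"] for op in scenario_data["operations"]])
```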

Stress Testing

Find your agent's breaking point:

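from balaganagent.runner import ExperimentRunner

runner = ExperimentRunner()
runner.set_agent(agent)

results = runner.run_stress_test(
    test,  # a built scenario, e.g. `test` from "Run an Experiment"
    iterations=100,
    chaos_levels=[0.1, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0],
)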

for level, data in results["levels"].items():
    print(f"Chaos {level}: {data['pass_rate']:.1%} pass rate")
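
Given results in the shape printed above, the breaking point can be taken as the lowest chaos level whose pass rate drops below a chosen threshold. The sketch below works on made-up sample data; `find_breaking_point` and the 90% threshold are assumptions for illustration, not part of the library.

```python
from typing import Optional


def find_breaking_point(levels: dict, min_pass_rate: float = 0.9) -> Optional[object]:
    """Return the lowest chaos level whose pass rate is below the threshold."""
    # key=float lets this work whether the level keys are floats or strings.
    for level in sorted(levels, key=float):
        if levels[level]["pass_rate"] < min_pass_rate:
            return level
    return None


# Invented results that mimic the structure used in the loop above.
sample_results = {
    "levels": {
        0.1: {"pass_rate": 1.0},
        0.25: {"pass_rate": 0.98},
        0.5: {"pass_rate": 0.95},
        0.75: {"pass_rate": 0.88},
        1.0: {"pass_rate": 0.71},
    }
}

print(find_breaking_point(sample_results["levels"]))
```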

Reports

Generate reports in multiple formats:

from balaganagent.reporting import ReportGenerator

gen = ReportGenerator()
report = gen.generate_from_results(results, metrics)

# Terminal (with colors)
print(gen.to_terminal(report))

# Save as files
gen.save(report, "report.json", format="json")
gen.save(report, "report.md", format="markdown")
gen.save(report, "report.html", format="html")

Example Output

============================================================
  BALAGANAGENT EXPERIMENT REPORT
============================================================

  Generated: 2024-01-15T10:30:00
  Status: WARNING

SUMMARY
  Experiments: 5 (Completed: 4, Failed: 1)
  Operations:  150 (Success Rate: 87.3%)
  Faults:      23 injected

RELIABILITY
  Score: 0.82
  Grade: 99%
  MTTR:  1.3s

EXPERIMENTS
  search-reliability [completed]
    Duration: 12.34s | Success: 90.0% | Recovery: 85.0%

  calculate-stress [completed]
    Duration: 8.21s | Success: 95.0% | Recovery: 100.0%

RECOMMENDATIONS
  1. Recovery rate is 85.0%. Agents should implement better recovery mechanisms.
  2. Most frequent fault type: tool_failure. Focus testing on this failure mode.

============================================================

Project Structure

balaganagent/
├── __init__.py          # Main exports
├── engine.py            # Chaos engine core
├── experiment.py        # Experiment definitions
├── wrapper.py           # Agent wrapping
├── runner.py            # Experiment runner
├── reporting.py         # Report generation
├── cli.py               # Command-line interface
├── injectors/           # Fault injectors
│   ├── base.py          # Base injector class
│   ├── tool_failure.py  # Tool failure injection
│   ├── delay.py         # Latency injection
│   ├── hallucination.py # Data corruption
│   ├── context.py       # Context corruption
│   └── budget.py        # Budget exhaustion
└── metrics/             # Metrics collection
    ├── collector.py     # General metrics
    ├── mttr.py          # MTTR calculation
    ├── recovery.py      # Recovery quality
    └── reliability.py   # Reliability scoring

Documentation

Development Guide - Set up your development environment
Contributing Guide - How to contribute to BalaganAgent
Security Policy - Vulnerability reporting process
Changelog - Version history and release notes
CrewAI Integration - Step-by-step CrewAI setup

Examples

Check out real-world examples:

Meeting Notes Agent - Real agent under chaos
CrewAI Integration - CrewAI with chaos testing
Stress Testing - Finding breaking points
BDD Scenarios - Behavior-driven chaos scenarios

Community

GitHub Discussions: Ask questions and share ideas
Issue Tracker: Report bugs and request features
LinkedIn: Follow ariel-shadkhan for updates

Contributing

We welcome contributions of all kinds, including:

Bug reports and feature requests
Code contributions (new injectors, wrappers, metrics)
Documentation improvements
Example agents and scenarios
Blog posts and tutorials

Please read our Contributing Guide and Development Guide to get started.

Contributors

Thanks to all our contributors!

Roadmap

Current (v0.1.x)

✅ Core chaos engine
✅ Basic injectors (tool failure, delay, hallucination, context, budget)
✅ CrewAI, AutoGen, LangChain wrappers
✅ MTTR and reliability metrics
✅ Multi-format reporting

Coming Soon (v0.2.x)

🔄 Real-time chaos injection during agent execution
🔄 Advanced metrics (latency percentiles, error budgets)
🔄 Chaos schedules and campaigns
🔄 Web dashboard for visualization
🔄 More agent framework integrations (LangGraph, AutoGPT)

Future (v0.3.x+)

📋 Distributed chaos experiments
📋 ML-powered failure prediction
📋 Custom injector plugins
📋 Production chaos (with safeguards)
📋 Cost impact analysis

Have an idea? Open a discussion!

Comparison

BalaganAgent vs Manual Testing

Aspect	Manual Testing	BalaganAgent
Coverage	Limited scenarios	Comprehensive failure modes
Consistency	Varies by tester	Reproducible experiments
Metrics	Manual tracking	Automated MTTR, recovery rate
Scale	Time-consuming	Run hundreds of tests easily
Integration	N/A	Built-in CI/CD support

BalaganAgent vs Traditional Chaos Tools

Tools like Chaos Monkey and Gremlin are infrastructure-focused. BalaganAgent is purpose-built for AI agents:

Agent-aware: Understands LLMs, tools, context, prompts
Semantic failures: Injects hallucinations, not just network errors
Agent metrics: MTTR, recovery quality, reliability scoring
Framework integration: Works with CrewAI, AutoGen, LangChain

Credits

Built with inspiration from:

Chaos Monkey - Netflix's pioneering chaos engineering
Gremlin - Enterprise chaos engineering platform
pytest - Python testing framework
The entire chaos engineering community

License

Apache License 2.0 - see LICENSE for details

"Hope is not a strategy. Test your agents."

Made with ❤️ by the reliability community

⭐ Star on GitHub • 📦 PyPI Package • 🐛 Report Bug • 💡 Request Feature

Project details

Release history

Version	Release date
0.5.0	Feb 15, 2026
0.4.1	Feb 15, 2026
0.4.0	Jan 29, 2026
0.3.1 (this version)	Jan 28, 2026
0.3.0	Jan 28, 2026
0.2.1	Jan 28, 2026
0.1.3	Jan 28, 2026
0.1.2	Jan 28, 2026
0.1.1	Jan 28, 2026
0.1.0	Jan 28, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

balagan_agent-0.3.1.tar.gz (134.3 kB view details)

Uploaded Jan 28, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

balagan_agent-0.3.1-py3-none-any.whl (77.1 kB view details)

Uploaded Jan 28, 2026 Python 3

File details

Details for the file balagan_agent-0.3.1.tar.gz.

File metadata

Download URL: balagan_agent-0.3.1.tar.gz
Upload date: Jan 28, 2026
Size: 134.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for balagan_agent-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`6f5d3a7bdce78dfa0b75074412c0f51c1bbe4b83b509aa171a3f5fe669ededb7`
MD5	`44ab0b9f96c972e6be985fb4056b8109`
BLAKE2b-256	`f72a6615f3e102441820c7962279be0d08dd2705ce509e06cef2e2ac021f2bdb`

See more details on using hashes here.

Provenance

The following attestation bundles were made for balagan_agent-0.3.1.tar.gz:

Publisher: publish.yml on arielshad/balagan-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: balagan_agent-0.3.1.tar.gz
- Subject digest: 6f5d3a7bdce78dfa0b75074412c0f51c1bbe4b83b509aa171a3f5fe669ededb7
- Sigstore transparency entry: 868843539
- Sigstore integration time: Jan 28, 2026
Source repository:
- Permalink: arielshad/balagan-agent@31df9a6221a41c15626f1048a676b58eda86cfd7
- Branch / Tag: refs/heads/main
- Owner: https://github.com/arielshad
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@31df9a6221a41c15626f1048a676b58eda86cfd7
- Trigger Event: push

File details

Details for the file balagan_agent-0.3.1-py3-none-any.whl.

File metadata

Download URL: balagan_agent-0.3.1-py3-none-any.whl
Upload date: Jan 28, 2026
Size: 77.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for balagan_agent-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ba57cbe74b6fb60822f6e1d2341b3cfdd2db4dde0f1629446d4fb818472d77c8`
MD5	`1de0a799ca87426e4fb6354a0078c0de`
BLAKE2b-256	`0e5ef1de37cda06e85954759e85f1bc862646da1763a89072038167df28374e6`

See more details on using hashes here.

Provenance

The following attestation bundles were made for balagan_agent-0.3.1-py3-none-any.whl:

Publisher: publish.yml on arielshad/balagan-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: balagan_agent-0.3.1-py3-none-any.whl
- Subject digest: ba57cbe74b6fb60822f6e1d2341b3cfdd2db4dde0f1629446d4fb818472d77c8
- Sigstore transparency entry: 868843543
- Sigstore integration time: Jan 28, 2026
Source repository:
- Permalink: arielshad/balagan-agent@31df9a6221a41c15626f1048a676b58eda86cfd7
- Branch / Tag: refs/heads/main
- Owner: https://github.com/arielshad
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@31df9a6221a41c15626f1048a676b58eda86cfd7
- Trigger Event: push

balagan-agent 0.3.1
