Chaos engineering framework for AI agents - stress-test agent reliability
Project description
BalaganAgent
Chaos Engineering for AI Agents
Everyone demos agents. Nobody stress-tests them.
Quick Start • Features • Documentation • Examples • Contributing
BalaganAgent is a reliability testing framework that stress-tests AI agents through controlled fault injection—because your agent will fail in production, and you should know how it handles it.
Table of Contents
- Why BalaganAgent?
- Features
- Quick Start
- Use Cases
- Chaos Levels
- Fault Injectors
- Metrics
- Reports
- Project Structure
- Community
- Contributing
- Roadmap
- License
Why BalaganAgent?
AI agents are entering production environments, but there's zero reliability discipline. BalaganAgent brings the battle-tested principles of chaos engineering (think Chaos Monkey, Gremlin) to the world of AI agents.
The Problem
- Agents fail silently in production
- Tool calls time out, return garbage, or hallucinate
- Context gets corrupted, budgets get exhausted
- Nobody knows until users complain
The Solution
- Inject failures in development, not production
- Measure recovery time (MTTR)
- Score reliability like we score SLAs
- Find breaking points before users do
Industry Adoption
BalaganAgent is designed for teams that take AI reliability seriously:
- Production AI systems requiring SLO compliance
- Enterprise deployments with strict reliability requirements
- Development teams practicing chaos engineering
- Organizations building mission-critical AI agents
Features
Fault Injection
- Tool Failures: Exceptions, timeouts, empty responses, malformed data, rate limits
- Delays: Fixed, random, spike patterns, degrading latency
- Hallucinations: Wrong values, fabricated data, contradictions, fake references
- Context Corruption: Truncation, reordering, noise injection, encoding issues
- Budget Exhaustion: Token limits, cost caps, rate limiting, call quotas
Metrics & Analysis
- MTTR (Mean Time To Recovery): How fast does your agent recover?
- Recovery Quality: Did it recover correctly or just fail gracefully?
- Reliability Score: SRE-grade scoring (five nines to one nine)
- Error Budget Tracking: Know when to freeze changes
Reports
- Terminal output with colors
- JSON for programmatic analysis
- Markdown for documentation
- HTML dashboards
Quick Start
Get up and running in minutes.
Installation
pip install balagan-agent
Verify installation:
balaganagent --version
CrewAI Integration
Using CrewAI? Check out our CrewAI Integration Guide for a complete step-by-step walkthrough!
Quick example:
from balaganagent.wrappers.crewai import CrewAIWrapper
# Your existing CrewAI setup
crew = Crew(agents=[agent], tasks=[task])
# Add chaos testing (3 lines!)
wrapper = CrewAIWrapper(crew, chaos_level=0.5)
wrapper.configure_chaos(enable_tool_failures=True, enable_delays=True)
result = wrapper.kickoff()
# Check metrics
metrics = wrapper.get_metrics()
print(f"Success rate: {metrics['aggregate']['operations']['success_rate']:.1%}")
→ Full CrewAI Integration Guide
Basic Usage
from balaganagent import ChaosEngine, AgentWrapper
# Your agent with tools
class MyAgent:
def search(self, query: str) -> dict:
return {"results": [...]}
def calculate(self, expr: str) -> float:
return eval(expr)
# Wrap it with chaos
agent = MyAgent()
wrapper = AgentWrapper(agent)
wrapper.configure_chaos(chaos_level=0.5) # 50% chaos intensity
# Now calls might fail randomly!
result = wrapper.call_tool("search", "test query")
Run an Experiment
from balaganagent import ChaosEngine
from balaganagent.runner import scenario, ExperimentRunner
# Define a test scenario
test = (
scenario("search-reliability")
.description("Test search under chaos")
.call("search", "AI safety")
.call("search", "machine learning")
.call("calculate", "2 + 2")
.with_chaos(level=0.75)
.build()
)
# Run it
runner = ExperimentRunner()
runner.set_agent(MyAgent())
result = runner.run_scenario(test)
print(f"Success Rate: {result.experiment_result.success_rate:.1%}")
print(f"MTTR: {result.mttr_stats['mttr_seconds']:.2f}s")
CLI Usage
# Run a demo
balaganagent demo --chaos-level 0.5
# Initialize a new project
balaganagent init my-chaos-tests
# Run a scenario file
balaganagent run scenarios/search_test.json --chaos-level 0.75
# Run stress tests
balaganagent stress scenarios/critical_path.json --iterations 100
Use Cases
BalaganAgent helps you answer critical questions about your agents:
- Pre-Production Validation: Will my agent handle API timeouts gracefully?
- Integration Testing: Does my agent recover when tools return malformed data?
- Load Testing: How does performance degrade under high failure rates?
- Reliability Engineering: What's my agent's actual MTTR and recovery rate?
- SLO Compliance: Can I maintain 99.9% availability under chaos?
- Regression Testing: Did my recent changes break failure handling?
Chaos Levels
The chaos_level parameter controls fault injection probability:
| Level | Base Failure Rate | Use Case |
|---|---|---|
| 0.0 | 0% | Baseline (no chaos) |
| 0.25 | 2.5% | Light testing |
| 0.5 | 5% | Moderate chaos |
| 1.0 | 10% | Standard chaos |
| 2.0 | 20% | Stress testing |
Fault Injectors
Tool Failure Injector
Simulates various tool failure modes:
from balaganagent.injectors import ToolFailureInjector
from balaganagent.injectors.tool_failure import ToolFailureConfig, FailureMode
injector = ToolFailureInjector(ToolFailureConfig(
probability=0.1,
failure_modes=[
FailureMode.TIMEOUT,
FailureMode.RATE_LIMIT,
FailureMode.SERVICE_UNAVAILABLE,
]
))
Delay Injector
Simulates network latency patterns:
from balaganagent.injectors import DelayInjector
from balaganagent.injectors.delay import DelayConfig, DelayPattern, LatencySimulator
# Use presets
injector = LatencySimulator.create("poor") # High latency, high jitter
# Or configure manually
injector = DelayInjector(DelayConfig(
pattern=DelayPattern.SPIKE,
min_delay_ms=50,
max_delay_ms=200,
spike_probability=0.1,
spike_multiplier=10,
))
Hallucination Injector
Corrupts data to test agent's ability to detect bad information:
from balaganagent.injectors import HallucinationInjector
from balaganagent.injectors.hallucination import HallucinationConfig, HallucinationType
injector = HallucinationInjector(HallucinationConfig(
probability=0.05,
severity=0.5, # 0=subtle, 1=obvious
hallucination_types=[
HallucinationType.WRONG_VALUE,
HallucinationType.FABRICATED_DATA,
HallucinationType.NONEXISTENT_REFERENCE,
]
))
Budget Exhaustion Injector
Tests behavior when resources run out:
from balaganagent.injectors import BudgetExhaustionInjector
from balaganagent.injectors.budget import BudgetExhaustionConfig
injector = BudgetExhaustionInjector(BudgetExhaustionConfig(
token_limit=10000,
cost_limit_dollars=1.00,
rate_limit_per_minute=60,
fail_hard=True, # Raise exception vs return error
))
Metrics
MTTR Calculator
from balaganagent.metrics import MTTRCalculator
calc = MTTRCalculator()
# Record failure and recovery
calc.record_failure("search", "timeout")
# ... agent recovers ...
calc.record_recovery("search", "timeout", retries=2)
stats = calc.get_recovery_stats()
print(f"MTTR: {stats['mttr_seconds']:.2f}s")
print(f"Recovery Rate: {stats['recovery_rate']:.1%}")
Reliability Scorer
from balaganagent.metrics import ReliabilityScorer
scorer = ReliabilityScorer(slos={
"availability": 0.99,
"latency_p99_ms": 2000,
})
# Record operations
for result in agent_results:
scorer.record_operation(
success=result.success,
latency_ms=result.latency,
)
report = scorer.calculate_score()
print(f"Grade: {report.grade.value}")
print(f"Availability: {report.availability:.3%}")
print(f"Error Budget Remaining: {report.error_budget_remaining:.1%}")
Scenarios
Scenarios can be defined in code or JSON:
{
"name": "critical-path-test",
"description": "Test the critical user journey",
"operations": [
{"tool": "authenticate", "args": ["user123"]},
{"tool": "fetch_profile", "args": ["user123"]},
{"tool": "search", "args": ["recent orders"]},
{"tool": "process_request", "kwargs": {"action": "refund"}}
],
"chaos_config": {
"chaos_level": 0.5,
"enable_tool_failures": true,
"enable_delays": true,
"enable_budget_exhaustion": true
}
}
Stress Testing
Find your agent's breaking point:
runner = ExperimentRunner()
runner.set_agent(agent)
results = runner.run_stress_test(
scenario,
iterations=100,
chaos_levels=[0.1, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0],
)
for level, data in results["levels"].items():
print(f"Chaos {level}: {data['pass_rate']:.1%} pass rate")
Reports
Generate reports in multiple formats:
from balaganagent.reporting import ReportGenerator
gen = ReportGenerator()
report = gen.generate_from_results(results, metrics)
# Terminal (with colors)
print(gen.to_terminal(report))
# Save as files
gen.save(report, "report.json", format="json")
gen.save(report, "report.md", format="markdown")
gen.save(report, "report.html", format="html")
Example Output
============================================================
BALAGANAGENT EXPERIMENT REPORT
============================================================
Generated: 2024-01-15T10:30:00
Status: WARNING
SUMMARY
Experiments: 5 (Completed: 4, Failed: 1)
Operations: 150 (Success Rate: 87.3%)
Faults: 23 injected
RELIABILITY
Score: 0.82
Grade: 99%
MTTR: 1.3s
EXPERIMENTS
search-reliability [completed]
Duration: 12.34s | Success: 90.0% | Recovery: 85.0%
calculate-stress [completed]
Duration: 8.21s | Success: 95.0% | Recovery: 100.0%
RECOMMENDATIONS
1. Recovery rate is 85.0%. Agents should implement better recovery mechanisms.
2. Most frequent fault type: tool_failure. Focus testing on this failure mode.
============================================================
Project Structure
balaganagent/
├── __init__.py # Main exports
├── engine.py # Chaos engine core
├── experiment.py # Experiment definitions
├── wrapper.py # Agent wrapping
├── runner.py # Experiment runner
├── reporting.py # Report generation
├── cli.py # Command-line interface
├── injectors/ # Fault injectors
│ ├── base.py # Base injector class
│ ├── tool_failure.py # Tool failure injection
│ ├── delay.py # Latency injection
│ ├── hallucination.py # Data corruption
│ ├── context.py # Context corruption
│ └── budget.py # Budget exhaustion
└── metrics/ # Metrics collection
├── collector.py # General metrics
├── mttr.py # MTTR calculation
├── recovery.py # Recovery quality
└── reliability.py # Reliability scoring
Documentation
- Development Guide - Set up your development environment
- Contributing Guide - How to contribute to BalaganAgent
- Security Policy - Vulnerability reporting process
- Changelog - Version history and release notes
- CrewAI Integration - Step-by-step CrewAI setup
Examples
Check out real-world examples:
- Meeting Notes Agent - Real agent under chaos
- CrewAI Integration - CrewAI with chaos testing
- Stress Testing - Finding breaking points
- BDD Scenarios - Behavior-driven chaos scenarios
Community
- GitHub Discussions: Ask questions and share ideas
- Issue Tracker: Report bugs and request features
- Linkedin: Follow for updates ariel-shadkhan
Contributing
We welcome contributions of all kinds! Whether it's:
- Bug reports and feature requests
- Code contributions (new injectors, wrappers, metrics)
- Documentation improvements
- Example agents and scenarios
- Blog posts and tutorials
Please read our Contributing Guide and Development Guide to get started.
Contributors
Thanks to all our contributors!
Roadmap
Current (v0.1.x)
- ✅ Core chaos engine
- ✅ Basic injectors (tool failure, delay, hallucination, context, budget)
- ✅ CrewAI, AutoGen, LangChain wrappers
- ✅ MTTR and reliability metrics
- ✅ Multi-format reporting
Coming Soon (v0.2.x)
- 🔄 Real-time chaos injection during agent execution
- 🔄 Advanced metrics (latency percentiles, error budgets)
- 🔄 Chaos schedules and campaigns
- 🔄 Web dashboard for visualization
- 🔄 More agent framework integrations (LangGraph, AutoGPT)
Future (v0.3.x+)
- 📋 Distributed chaos experiments
- 📋 ML-powered failure prediction
- 📋 Custom injector plugins
- 📋 Production chaos (with safeguards)
- 📋 Cost impact analysis
Have an idea? Open a discussion!
Comparison
BalaganAgent vs Manual Testing
| Aspect | Manual Testing | BalaganAgent |
|---|---|---|
| Coverage | Limited scenarios | Comprehensive failure modes |
| Consistency | Varies by tester | Reproducible experiments |
| Metrics | Manual tracking | Automated MTTR, recovery rate |
| Scale | Time-consuming | Run 100s of tests easily |
| Integration | N/A | Built-in CI/CD support |
BalaganAgent vs Traditional Chaos Tools
Tools like Chaos Monkey and Gremlin are infrastructure-focused. BalaganAgent is purpose-built for AI agents:
- Agent-aware: Understands LLMs, tools, context, prompts
- Semantic failures: Injects hallucinations, not just network errors
- Agent metrics: MTTR, recovery quality, reliability scoring
- Framework integration: Works with CrewAI, AutoGen, LangChain
Credits
Built with inspiration from:
- Chaos Monkey - Netflix's pioneering chaos engineering
- Gremlin - Enterprise chaos engineering platform
- pytest - Python testing framework
- The entire chaos engineering community
License
Apache License 2.0 - see LICENSE for details
Star History
"Hope is not a strategy. Test your agents."
Made with ❤️ by the reliability community
⭐ Star on GitHub • 📦 PyPI Package • 🐛 Report Bug • 💡 Request Feature
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file balagan_agent-0.3.1.tar.gz.
File metadata
- Download URL: balagan_agent-0.3.1.tar.gz
- Upload date:
- Size: 134.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6f5d3a7bdce78dfa0b75074412c0f51c1bbe4b83b509aa171a3f5fe669ededb7
|
|
| MD5 |
44ab0b9f96c972e6be985fb4056b8109
|
|
| BLAKE2b-256 |
f72a6615f3e102441820c7962279be0d08dd2705ce509e06cef2e2ac021f2bdb
|
Provenance
The following attestation bundles were made for balagan_agent-0.3.1.tar.gz:
Publisher:
publish.yml on arielshad/balagan-agent
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
balagan_agent-0.3.1.tar.gz -
Subject digest:
6f5d3a7bdce78dfa0b75074412c0f51c1bbe4b83b509aa171a3f5fe669ededb7 - Sigstore transparency entry: 868843539
- Sigstore integration time:
-
Permalink:
arielshad/balagan-agent@31df9a6221a41c15626f1048a676b58eda86cfd7 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/arielshad
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@31df9a6221a41c15626f1048a676b58eda86cfd7 -
Trigger Event:
push
-
Statement type:
File details
Details for the file balagan_agent-0.3.1-py3-none-any.whl.
File metadata
- Download URL: balagan_agent-0.3.1-py3-none-any.whl
- Upload date:
- Size: 77.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ba57cbe74b6fb60822f6e1d2341b3cfdd2db4dde0f1629446d4fb818472d77c8
|
|
| MD5 |
1de0a799ca87426e4fb6354a0078c0de
|
|
| BLAKE2b-256 |
0e5ef1de37cda06e85954759e85f1bc862646da1763a89072038167df28374e6
|
Provenance
The following attestation bundles were made for balagan_agent-0.3.1-py3-none-any.whl:
Publisher:
publish.yml on arielshad/balagan-agent
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
balagan_agent-0.3.1-py3-none-any.whl -
Subject digest:
ba57cbe74b6fb60822f6e1d2341b3cfdd2db4dde0f1629446d4fb818472d77c8 - Sigstore transparency entry: 868843543
- Sigstore integration time:
-
Permalink:
arielshad/balagan-agent@31df9a6221a41c15626f1048a676b58eda86cfd7 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/arielshad
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@31df9a6221a41c15626f1048a676b58eda86cfd7 -
Trigger Event:
push
-
Statement type: