Agent reliability simulator — chaos engineering for AI agents


cascade

Agent reliability simulator -- chaos engineering for AI agents.

cascade models what actually happens when multi-step agent systems fail: retries that help, retries that waste money, fallback paths that degrade quality, and corrupt intermediate outputs that poison downstream steps.

At a Glance

Monte Carlo simulation for multi-step agent pipelines
Failure injection for hallucination, refusal, tool failure, latency, and context loss
Strategy comparison across retry, fallback, checkpoint, parallel, human review, and adaptive control
Cost, latency, and reliability tradeoff analysis in one run
Report generation for engineering and decision-making, not just toy metrics

The Problem

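Accuracy compounds catastrophically in multi-step agent pipelines. When every step must succeed and steps fail independently, end-to-end success is the per-step accuracy raised to the power of the number of steps: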

Steps	Per-Step Accuracy	End-to-End Success
5	95%	77%
10	95%	60%
10	85%	20%
20	90%	12%
50	95%	8%
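The table can be reproduced in a few lines of Python, assuming each step fails independently and any single failure sinks the run:

```python
def end_to_end_success(per_step_accuracy: float, n_steps: int) -> float:
    """Probability that all n independent steps succeed."""
    return per_step_accuracy ** n_steps


# (steps, per-step accuracy) rows from the table above
rows = [(5, 0.95), (10, 0.95), (10, 0.85), (20, 0.90), (50, 0.95)]

for steps, accuracy in rows:
    success = end_to_end_success(accuracy, steps)
    print(f"{steps:>3} steps @ {accuracy:.0%}: {success:.0%}")
```

Real pipelines can deviate from this independence assumption, which is one reason to simulate rather than simply multiply.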

A 95%-accurate agent on a 50-step task succeeds only about 8% of the time. Netflix built Chaos Monkey to test the resilience of distributed systems; cascade is the equivalent for AI agents.

The Solution

cascade is a Monte Carlo simulation framework that models multi-step AI agent pipelines, injects realistic failure modes, and measures end-to-end reliability under different resilience strategies.

What you get:

Quantified reliability for any agent pipeline architecture
Strategy comparison with cost modeling (retry, fallback, parallel, checkpoint, adaptive)
Pareto frontier visualization: cost vs. reliability tradeoffs
Cascading corruption modeling -- the hardest failure mode, where one step's bad output propagates into the downstream steps that consume it

Quick Start

pip install cascade-agent-sim

Minimal simulation:

from cascade import Pipeline, Step, Simulator, FailureConfig
from cascade import strategies

# Define your agent pipeline
pipeline = Pipeline(steps=[
    Step(name="research", model="sonnet", tools=["web_search", "read_file"]),
    Step(name="analyze", model="sonnet", tools=["python_exec"], depends_on=["research"]),
    Step(name="draft", model="sonnet", tools=["write_file"], depends_on=["analyze"]),
    Step(name="review", model="opus", tools=["read_file"], depends_on=["draft"]),
    Step(name="revise", model="sonnet", tools=["write_file"], depends_on=["review"]),
    Step(name="publish", model="haiku", tools=["api_call"], depends_on=["revise"]),
])

# Configure failure injection
failures = FailureConfig(
    hallucination_rate=0.05,
    refusal_rate=0.02,
    tool_failure_rate=0.03,
    context_overflow_at=100_000,
    cascade_propagation=0.8,
)

# Run 10,000 simulations
sim = Simulator(pipeline, failures, n_simulations=10_000, seed=42)
results = sim.run()

# Compare resilience strategies
from cascade import Comparator
comp = Comparator(pipeline, failures, n_simulations=10_000, seed=42)
comparison = comp.compare([
    strategies.naive(),
    strategies.retry(max_attempts=3),
    strategies.parallel(n=3, vote="majority"),
    strategies.checkpoint(interval=2),
    strategies.adaptive(escalation_threshold=2),
])

comparison.print_table()
comparison.recommend()

Output:

Strategy Comparison (10,000 simulations each):
+-----------------------+----------+-----------+----------+------------+
| Strategy              | Success  | Avg Cost  | Avg Time | Failures   |
+-----------------------+----------+-----------+----------+------------+
| Naive                 |  54.0%   |  $0.0318  |   6.1s   |      4,599 |
| Retry(3)              |  99.3%   |  $0.0451  |   8.5s   |         73 |
| Parallel(3)           |  84.8%   |  $0.1146  |   7.3s   |      1,525 |
| Checkpoint(2)         |  99.9%   |  $0.0453  |   8.6s   |          8 |
| Adaptive              |  99.3%   |  $0.0451  |   8.5s   |         73 |
+-----------------------+----------+-----------+----------+------------+

Recommendation: Retry(3) (99.3% success at 1.4x baseline cost)
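As a rough sanity check, the Naive row can be approximated by hand. This back-of-the-envelope model assumes the three per-step failure modes configured in the Quick Start (hallucination 5%, refusal 2%, tool failure 3%) strike independently, that any one of them fails the step, and it ignores latency spikes and context overflow. It is not how cascade computes its numbers.

```python
from math import prod

# Per-step failure rates from the FailureConfig in the Quick Start
failure_rates = {"hallucination": 0.05, "refusal": 0.02, "tool_failure": 0.03}
n_steps = 6  # research, analyze, draft, review, revise, publish

# A step survives only if no failure mode fires (assumed independent)
per_step_success = prod(1 - rate for rate in failure_rates.values())
naive_success = per_step_success ** n_steps

# Cost multiple of Retry(3) over Naive, from the Avg Cost column above
cost_multiple = 0.0451 / 0.0318

print(f"per-step success: {per_step_success:.4f}")
print(f"naive end-to-end: {naive_success:.1%}")
print(f"retry cost multiple: {cost_multiple:.1f}x")
```

The estimate of about 54.2% sits close to the simulated 54.0% in the Naive row, and the ratio of average costs gives the 1.4x multiple quoted in the recommendation.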

Architecture

graph TD
    A[Pipeline Definition] --> C[Simulation Engine]
    B[Failure Injector] --> C
    C --> D[Resilience Strategy Comparator]
    D --> E[Report Generator]

    subgraph "Failure Modes"
        B1[Hallucination]
        B2[Refusal]
        B3[Tool Failure]
        B4[Context Overflow]
        B5[Cascading Corruption]
        B6[Latency Spike]
    end

    subgraph "Strategies"
        S1[Naive]
        S2[Retry]
        S3[Fallback]
        S4[Parallel Redundancy]
        S5[Checkpoint + Rollback]
        S6[Human-in-the-Loop]
        S7[Adaptive]
    end

    B1 & B2 & B3 & B4 & B5 & B6 --> B
    S1 & S2 & S3 & S4 & S5 & S6 & S7 --> D
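The core loop of a Monte Carlo reliability simulator is small. The sketch below is a simplified stand-in, not cascade's engine. It assumes a linear pipeline, a single combined per-step failure probability (0.097, roughly 1 - 0.95 * 0.98 * 0.97 for the Quick Start rates), and a retry strategy that always detects a failed attempt and re-rolls it up to `max_attempts` times. The names `simulate` and `RunStats` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class RunStats:
    success_rate: float
    avg_attempts: float  # model calls per run, a proxy for cost


def simulate(n_steps: int, step_failure: float, max_attempts: int,
             n_simulations: int, seed: int) -> RunStats:
    """Monte Carlo estimate for a linear pipeline with per-step retries."""
    rng = random.Random(seed)  # seeded for reproducible runs
    successes = 0
    total_attempts = 0
    for _ in range(n_simulations):
        run_ok = True
        for _ in range(n_steps):
            for _attempt in range(max_attempts):
                total_attempts += 1
                if rng.random() >= step_failure:  # this attempt succeeded
                    break
            else:
                run_ok = False  # every attempt failed, so the run fails here
                break
        successes += run_ok
    return RunStats(successes / n_simulations, total_attempts / n_simulations)


naive = simulate(6, 0.097, max_attempts=1, n_simulations=10_000, seed=42)
retry = simulate(6, 0.097, max_attempts=3, n_simulations=10_000, seed=42)
print(f"naive: {naive.success_rate:.1%} success, {naive.avg_attempts:.2f} calls/run")
print(f"retry: {retry.success_rate:.1%} success, {retry.avg_attempts:.2f} calls/run")
```

Seeding the generator plays the same role as `seed=42` in the Quick Start: repeated runs of the sketch give identical numbers.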

Failure Models

Failure Mode	Description	Default
Hallucination	Agent produces plausible but incorrect output (wrong tool args, fabricated data, incorrect reasoning, format errors)	5%
Refusal	Safety filter blocks a legitimate action (false positive)	2%
Tool Failure	External API returns an error, timeout, or rate limit	3%
Context Overflow	Context window fills up, losing earlier information	At 128K tokens
Cascading Corruption	Hallucinated output propagates to downstream steps	80% propagation
Latency Spike	Individual step takes 10x longer than expected	1%
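Cascading corruption is what separates agent pipelines from simple independent-failure models. One plausible reading of the 80% propagation default, assumed here rather than taken from cascade's code: a step that receives corrupt input passes the corruption on with probability `propagation`, and it can also hallucinate on its own with probability h. In a linear chain the probability that step k's output is corrupt then follows c_k = 1 - (1 - propagation * c_{k-1}) * (1 - h). The function name `corruption_by_step` is hypothetical.

```python
def corruption_by_step(n_steps: int, hallucination: float,
                       propagation: float) -> list[float]:
    """Probability that each step's output is corrupt (assumed model).

    A step's output is clean only if it neither inherits corruption from
    its input (probability propagation * previous) nor hallucinates itself.
    Each step is assumed to depend only on the step before it.
    """
    probs = []
    previous = 0.0  # the first step receives clean input
    for _ in range(n_steps):
        inherited = propagation * previous
        current = 1 - (1 - inherited) * (1 - hallucination)
        probs.append(current)
        previous = current
    return probs


for p in (0.0, 0.8, 1.0):
    final = corruption_by_step(6, hallucination=0.05, propagation=p)[-1]
    print(f"propagation={p:.1f}: final output corrupt {final:.1%}")
```

With propagation below 1, corruption partially washes out along the chain; at 1.0, any upstream hallucination reaches the final output.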

Resilience Strategies

from cascade import strategies

strategies.naive()                    # No retry, fail fast
strategies.retry(max_attempts=3)      # Simple retry
strategies.fallback(models=["sonnet", "haiku"])  # Try models in order
strategies.parallel(n=3, vote="majority")  # Run N agents, majority vote
strategies.checkpoint(interval=5)     # Checkpoint every N steps, rollback on failure
strategies.human_in_loop(at_steps=[5, 10])  # Human verification at key steps
strategies.adaptive(                  # Escalate after repeated failures
    escalation_threshold=2,
    escalation_strategy="parallel",
)
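The retry and parallel strategies trade extra model calls for per-step reliability. Under an idealized model (independent attempts, retries triggered only by detected failures, and a vote that succeeds only when more than half of the agents are correct), the per-step numbers can be computed directly. This is an illustrative sketch with hypothetical function names, not how cascade implements these strategies.

```python
from math import comb


def retry_success(p: float, max_attempts: int) -> float:
    """Per-step success when each failed attempt is detected and retried."""
    return 1 - (1 - p) ** max_attempts


def retry_expected_calls(p: float, max_attempts: int) -> float:
    """Expected calls per step: attempt i+1 happens only if the first i failed."""
    return sum((1 - p) ** i for i in range(max_attempts))


def majority_vote_success(p: float, n: int) -> float:
    """Per-step success when n independent agents vote and a strict majority must be right."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


p = 0.90  # illustrative per-step success rate
print(f"retry(3):    {retry_success(p, 3):.3f} success, {retry_expected_calls(p, 3):.2f} calls")
print(f"parallel(3): {majority_vote_success(p, 3):.3f} success, 3.00 calls")
```

Under these assumptions a retry is paid for only when an attempt fails, while every parallel agent is paid for up front, so retry buys more reliability per call. That is in line with the Retry(3) and Parallel(3) rows in the Quick Start output, though the simulator models more than this sketch does.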

What It Helps You Answer

How fast does reliability collapse as workflows get longer?
Which strategy buys the most reliability per unit cost?
Where do checkpoint intervals actually matter?
How much damage does one bad intermediate result cause downstream?
When is human review worth the latency?

CLI

# Run a single simulation
cascade simulate pipeline.json --strategy retry --simulations 10000

# Compare strategies
cascade compare pipeline.json --strategies naive,retry,parallel,checkpoint,adaptive

# Export results
cascade compare pipeline.json -o results.json --pareto pareto.png --heatmap heatmap.png

API Reference

Core Classes

Pipeline -- DAG of Steps defining the agent workflow
Step -- Single agent action with model, tools, and dependencies
FailureConfig -- Failure injection probabilities and parameters
Simulator -- Monte Carlo simulation engine
Comparator -- Multi-strategy comparison orchestrator
StrategyComparison -- Results container with table, plot, and recommend methods
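`Pipeline` is described as a DAG of `Step`s. A minimal stand-in for that structure can lean on the standard library's `graphlib` module (Python 3.9+) to validate dependencies and derive an execution order. `SketchStep` and `execution_order` are hypothetical names, not part of cascade's API, and the diamond's step names are made up to mirror the customer-support example's parallel research paths.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class SketchStep:  # hypothetical stand-in for cascade's Step
    name: str
    depends_on: list[str] = field(default_factory=list)


def execution_order(steps: list[SketchStep]) -> list[str]:
    """Return a dependency-respecting order, rejecting unknown or cyclic deps."""
    names = {s.name for s in steps}
    graph = {}
    for s in steps:
        missing = set(s.depends_on) - names
        if missing:
            raise ValueError(f"{s.name} depends on unknown steps: {sorted(missing)}")
        graph[s.name] = set(s.depends_on)  # node -> its predecessors
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes forming the detected cycle
        raise ValueError(f"pipeline has a cycle: {exc.args[1]}") from exc


# Diamond: one intake step fans out to two research paths that join again
diamond = [
    SketchStep("intake"),
    SketchStep("search_kb", depends_on=["intake"]),
    SketchStep("search_tickets", depends_on=["intake"]),
    SketchStep("respond", depends_on=["search_kb", "search_tickets"]),
]
print(execution_order(diamond))
```

`static_order()` guarantees that `intake` comes first and `respond` last; the two search steps may appear in either order, which is the freedom a simulator could use to run them in parallel.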

Key Functions

strategies.naive() / retry() / fallback() / parallel() / checkpoint() / human_in_loop() / adaptive() -- Strategy factories
build_report(result) -- Build structured report from SimulationResult
format_report(report) -- Format report as human-readable text
export_json(report, path) -- Export report to JSON

Statistical Utilities

proportion_ci(successes, total) -- Wilson score CI for success rates
mean_ci(values) -- t-distribution CI for means
summarize(values) -- Distribution summary (mean, median, percentiles)
pareto_frontier(costs, rates) -- Compute Pareto-optimal strategies
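Two of these utilities, the Wilson interval and the Pareto frontier, are easy to sketch in plain Python for readers who want to see what they compute. The sketch uses hypothetical names (`wilson_interval`, `pareto_indices`) rather than cascade's `proportion_ci` and `pareto_frontier`, assumes lower cost and higher success rate are both preferred, and is not the library's implementation.

```python
from math import sqrt


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1 + z * z / total
    center = (p_hat + z * z / (2 * total)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def pareto_indices(costs: list[float], rates: list[float]) -> list[int]:
    """Indices of strategies not dominated on (lower cost, higher success rate)."""
    frontier = []
    for i, (cost_i, rate_i) in enumerate(zip(costs, rates)):
        dominated = any(
            cost_j <= cost_i and rate_j >= rate_i and (cost_j < cost_i or rate_j > rate_i)
            for cost_j, rate_j in zip(costs, rates)
        )
        if not dominated:
            frontier.append(i)
    return frontier


# Figures from the Quick Start output: Naive, Retry(3), Parallel(3), Checkpoint(2), Adaptive
costs = [0.0318, 0.0451, 0.1146, 0.0453, 0.0451]
rates = [0.540, 0.993, 0.848, 0.999, 0.993]
print(wilson_interval(9_927, 10_000))  # Retry(3): 10,000 runs minus 73 failures
print(pareto_indices(costs, rates))    # → [0, 1, 3, 4]
```

Parallel(3) drops off the frontier because Retry(3) is both cheaper and more reliable; Retry(3) and Adaptive tie exactly, so neither dominates the other.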

Examples

See the examples/ directory:

research_pipeline.py -- 6-step research agent with full strategy comparison
coding_pipeline.py -- 10-step coding agent demonstrating the compounding problem
customer_support.py -- Diamond-shaped pipeline with parallel research paths

Demo

Run the offline walkthrough with:

uv run python examples/demo.py

For larger reliability studies and strategy comparisons, see examples/.

Development

git clone https://github.com/sushaan-k/cascade.git
cd cascade
pip install -e ".[dev]"
pytest -v
ruff check src/ tests/
mypy src/cascade/

Contributing

Contributions are welcome. Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Write tests for your changes
Ensure all tests pass (pytest -v)
Ensure code passes linting (ruff check .)
Submit a pull request

License

MIT License. See LICENSE for details.

Project details


This version

0.1.0

Apr 8, 2026

Download files


Source Distribution

floww-0.1.0.tar.gz (135.7 kB)

Uploaded Apr 8, 2026 Source

Built Distribution


floww-0.1.0-py3-none-any.whl (29.6 kB)

Uploaded Apr 8, 2026 Python 3

File details

Details for the file floww-0.1.0.tar.gz.

File metadata

Download URL: floww-0.1.0.tar.gz
Upload date: Apr 8, 2026
Size: 135.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for floww-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`63d16e1404725650d72b3b7104603aa552bbbb75160a3c3bfb86960722adde51`
MD5	`e62a7578a8f9c6896609cd616620aa35`
BLAKE2b-256	`a26fcb812453869825b340609bd06770edfeb142d5e19acd2c8dca9348adfc06`


File details

Details for the file floww-0.1.0-py3-none-any.whl.

File metadata

Download URL: floww-0.1.0-py3-none-any.whl
Upload date: Apr 8, 2026
Size: 29.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for floww-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cbd1f4989eb22812595eee5c7f3de6653436ccb7d22808519932e6d152b45407`
MD5	`cc4823f83a89cf4a72c8513cc4799172`
BLAKE2b-256	`7fade70eacb1cd79883895465e4f8babda3a836559eaf8f9d6c4760f39033bb5`
