Resilience middleware that makes AI agents self-healing — retries, checkpointing, circuit breakers, cost budgets, and graceful degradation for any agent framework.
🛡️ Tardigrade
Resilience middleware that makes AI agents self-healing.
Retries · Checkpointing · Circuit breakers · Cost budgets · Graceful degradation
Works with LangGraph, CrewAI, OpenAI SDK, or raw API calls.
An agent that is 85% accurate per step succeeds end to end only about 20% of the time over 10 steps.
Tardigrade wraps your agent with production-grade resilience, so a failure at step 7
does not mean starting over from scratch.
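The 20% figure follows from compounding: if each step succeeds independently with probability p, all n steps succeed with probability p^n. A quick check, assuming failures are independent across steps:

```python
# Probability that every step in a chain succeeds, assuming steps fail independently.
def end_to_end_success(per_step: float, steps: int) -> float:
    return per_step ** steps


p = end_to_end_success(0.85, 10)
print(f"{p:.1%}")  # → 19.7%
```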
Install
```bash
pip install tardigrade-ai
# With the real-time dashboard (quotes keep shells like zsh from expanding the brackets):
pip install "tardigrade-ai[dashboard]"
```
Quickstart
```python
from tardigrade import armor, Workflow, RetryConfig, BudgetConfig, StepCostReport

# Retry this step up to 3 times. Each step returns (result, cost report) so the
# workflow can track spend against its budget.
@armor(name="fetch", retry=RetryConfig(max_attempts=3))
def fetch_data(url: str) -> dict:
    return {"result": "..."}, StepCostReport(input_tokens=500, output_tokens=200, model="gpt-5.4")

@armor(name="analyze")
def analyze(data: dict) -> str:
    return "analysis", StepCostReport(input_tokens=2000, output_tokens=1000, model="gpt-5.4")

# Cap the total spend for this workflow at $1.00.
with Workflow("my_pipeline", budget=BudgetConfig(max_budget_usd=1.00)) as wf:
    data = fetch_data("https://api.example.com")
    result = analyze(data)
```
Pass a `run_id` to `Workflow` (as in the checkpoint example below). If the workflow crashes,
rerun it with the same `run_id` and Tardigrade resumes from the last checkpoint instead of starting from scratch.
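Conceptually, resuming means saving each completed step's output under the run's ID and, on a rerun, returning the saved output instead of executing the step again. A minimal in-memory sketch of that idea follows; `CheckpointStore` and `run_step` are hypothetical names, not Tardigrade's API, and a real store would persist to disk or a database so it survives a process crash.

```python
from typing import Any, Callable


class CheckpointStore:
    """Hypothetical in-memory store: (run_id, step name) -> saved output."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], Any] = {}

    def get(self, run_id: str, step: str) -> tuple[bool, Any]:
        key = (run_id, step)
        return key in self._data, self._data.get(key)

    def put(self, run_id: str, step: str, value: Any) -> None:
        self._data[(run_id, step)] = value


def run_step(store: CheckpointStore, run_id: str, step: str, fn: Callable[[], Any]) -> Any:
    """Return the checkpointed output if this run already finished `step`, else run it."""
    found, value = store.get(run_id, step)
    if found:
        return value
    value = fn()
    store.put(run_id, step, value)  # checkpoint only after the step succeeds
    return value


calls: list[str] = []


def fetch_step() -> str:
    calls.append("fetch")
    return "data"


def crashing_analyze() -> str:
    raise RuntimeError("crash at step 2")


store = CheckpointStore()
try:  # first attempt: step 1 succeeds, step 2 crashes
    run_step(store, "run-001", "fetch", fetch_step)
    run_step(store, "run-001", "analyze", crashing_analyze)
except RuntimeError:
    pass

# Rerun with the same run_id: "fetch" comes from its checkpoint, not a second call.
run_step(store, "run-001", "fetch", fetch_step)
result = run_step(store, "run-001", "analyze", lambda: "analysis")
print(calls)  # → ['fetch']
```

Keying checkpoints by `run_id` is what lets a rerun find the earlier attempt's progress; a different ID starts a fresh run.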
Features
| Feature | What it does | Docs |
|---|---|---|
| Automatic retries | Exponential backoff with jitter. Configurable per step. | API |
| Checkpointing | Resume multi-step workflows from the last successful step. | API |
| Circuit breakers | Switch to a fallback model when your primary provider is down. | API |
| Cost budgets | Hard spending limits per workflow with policy-driven enforcement. | API |
| Graceful degradation | Collect partial results instead of crashing on step failures. | API |
| Real-time dashboard | Terminal UI showing workflow progress, costs, and circuit states. | API |
| Framework-agnostic | Works with any Python code. No lock-in to LangGraph or CrewAI. | Examples |
| Structured logging | Every event is emitted as a structured log entry via structlog. | API |
Before and After
| Scenario | Without Tardigrade | With Tardigrade |
|---|---|---|
| Step 7 of 10 fails | Start over. Waste $3 and 4 minutes. | Resume from step 7. About $0.15 and 30 seconds. |
| Provider goes down | Cascade failure across all agents. | Circuit breaker routes to a backup model. |
| Agent runs away on tokens | Surprise bills and no hard stop. | Hard budget cap with degradation or stop. |
| Debugging a failure | Wall of JSON, grep, and prayer. | Dashboard plus structured event history. |
Examples
Retry a flaky step
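```python
from tardigrade import RetryConfig, armor

class RateLimitedError(RuntimeError):
    pass

# Up to 4 attempts; only the listed exception types trigger a retry.
@armor(retry=RetryConfig(max_attempts=4, retryable_exceptions=(RateLimitedError, TimeoutError)))
def call_provider(prompt: str) -> str:
    ...  # call your model provider here
```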
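Tardigrade describes its retries as exponential backoff with jitter, restricted to the exceptions in `retryable_exceptions`. A standalone sketch of that pattern, assuming "full jitter" (a random wait between 0 and a capped exponential bound); this is not Tardigrade's implementation, and `base_delay`, `max_delay`, and the injectable `sleep` are illustrative parameters, not `RetryConfig` fields.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    retryable_exceptions: tuple[type[BaseException], ...] = (TimeoutError,),
    base_delay: float = 0.5,  # hypothetical: upper bound of the first retry delay
    max_delay: float = 30.0,  # hypothetical: cap on any single delay
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable_exceptions:  # other exception types propagate immediately
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**(attempt - 1))].
            bound = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, bound))
    raise AssertionError("unreachable")


attempts = 0


def flaky() -> str:
    global attempts
    attempts += 1
    if attempts < 3:
        raise TimeoutError("slow provider")
    return "ok"


delays: list[float] = []
print(retry_with_backoff(flaky, sleep=delays.append))  # → ok
```

Randomizing the wait spreads retries from many clients over time, so they do not all hit a recovering provider at the same instant.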
Switch to a fallback model when the primary is down
```python
from tardigrade import CircuitBreakerConfig, armor

def call_claude_haiku(prompt: str) -> str:
    return "fallback response"

# After 3 failures, calls are routed to the fallback instead of the primary.
@armor(circuit_breaker=CircuitBreakerConfig(failure_threshold=3, fallback=call_claude_haiku))
def call_gpt54(prompt: str) -> str:
    ...  # primary provider call
```
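A typical circuit breaker counts consecutive failures of the primary; once `failure_threshold` is reached it opens and serves calls from the fallback until a cooldown passes, then lets a trial call through. The standalone sketch below shows that state machine. It is not Tardigrade's implementation: `reset_timeout`, the half-open behavior, and the injectable `clock` are assumptions for illustration, not documented `CircuitBreakerConfig` fields.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Illustrative closed -> open -> half-open breaker that routes to a fallback."""

    def __init__(
        self,
        primary: Callable[..., Any],
        fallback: Callable[..., Any],
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,  # hypothetical cooldown before a trial call
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.primary = primary
        self.fallback = fallback
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0  # consecutive failures of the primary
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # cooldown elapsed: allow a trial call
        return "open"

    def call(self, *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            return self.fallback(*args, **kwargs)  # skip the primary entirely
        try:
            result = self.primary(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            return self.fallback(*args, **kwargs)
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result


now = [0.0]  # fake clock so the example runs instantly
primary_calls: list[str] = []


def primary(prompt: str) -> str:
    primary_calls.append(prompt)
    raise ConnectionError("provider down")


def fallback(prompt: str) -> str:
    return "fallback response"


breaker = CircuitBreaker(primary, fallback, failure_threshold=3, clock=lambda: now[0])
answers = [breaker.call(f"q{i}") for i in range(4)]
print(breaker.state, len(primary_calls))  # → open 3
now[0] = 31.0
print(breaker.state)  # → half-open
```

The fourth call never reaches the primary: while the circuit is open, the breaker stops sending traffic to a provider that is known to be failing.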
Resume a checkpointed workflow
```python
from tardigrade import Workflow, armor

@armor(name="fetch")
def fetch() -> str:
    return "data"

# Rerunning with the same run_id resumes from the last checkpoint.
with Workflow("pipeline", run_id="run-001"):
    fetch()
```
Enforce a workflow budget
```python
from tardigrade import BudgetConfig, StepCostReport, Workflow, armor

@armor(name="summarize")
def summarize(text: str) -> str:
    return "summary", StepCostReport(input_tokens=500, output_tokens=200, model="gpt-5.4")

# Cap the total spend for this workflow at $0.50.
with Workflow("pipeline", budget=BudgetConfig(max_budget_usd=0.50)):
    summarize("hello")
```
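Each `StepCostReport` carries token counts and a model name, which is enough to convert a step into dollars and check a running total against `max_budget_usd`. The sketch below shows only that arithmetic; `CostReport`, `BudgetTracker`, `BudgetExceeded`, and the prices are hypothetical placeholders, not Tardigrade's classes or any provider's real rates.

```python
from dataclasses import dataclass

# Placeholder prices in USD per million tokens. NOT real pricing for any model.
PRICES_PER_MILLION = {"example-model": {"input": 2.00, "output": 8.00}}


@dataclass
class CostReport:  # hypothetical stand-in for a per-step cost report
    input_tokens: int
    output_tokens: int
    model: str


class BudgetExceeded(RuntimeError):
    pass


class BudgetTracker:
    def __init__(self, max_budget_usd: float) -> None:
        self.max_budget_usd = max_budget_usd
        self.spent_usd = 0.0

    def record(self, report: CostReport) -> float:
        """Add one step's cost; raise once the running total passes the cap."""
        price = PRICES_PER_MILLION[report.model]
        cost = (
            report.input_tokens * price["input"] + report.output_tokens * price["output"]
        ) / 1_000_000
        self.spent_usd += cost
        if self.spent_usd > self.max_budget_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.4f} of ${self.max_budget_usd:.2f} budget"
            )
        return cost


tracker = BudgetTracker(max_budget_usd=0.50)
cost = tracker.record(CostReport(input_tokens=500, output_tokens=200, model="example-model"))
print(f"${cost:.4f}")  # → $0.0026

try:
    tracker.record(CostReport(input_tokens=100_000, output_tokens=50_000, model="example-model"))
except BudgetExceeded as exc:
    print("stopped:", exc)  # → stopped: spent $0.6026 of $0.50 budget
```

Tardigrade describes its cap as ending in either degradation or a stop; this sketch models only the stop, and it notices overspend only after a step reports its cost.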
Return partial results instead of crashing
```python
from tardigrade import DegradationConfig, DegradationPolicy, Workflow, armor

@armor(name="enrich")
def enrich(data: dict) -> dict:
    raise ConnectionError("provider down")

# COLLECT records the failing step instead of raising, so wf.result is available.
with Workflow("pipeline", degradation=DegradationConfig(policy=DegradationPolicy.COLLECT)) as wf:
    enrich({"id": 1})

assert wf.result is not None
assert wf.result.status == "failed"
```
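With `DegradationPolicy.COLLECT`, the example above ends with a result object instead of an exception. The same idea as a standalone sketch: each step runs in turn, failures are recorded next to successful outputs, and an overall status is derived at the end. `collect_steps`, `CollectedResult`, and the `"succeeded"` and `"partial"` statuses are hypothetical; only `"failed"` mirrors a value the example above actually checks.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class CollectedResult:
    outputs: dict[str, Any] = field(default_factory=dict)
    errors: dict[str, BaseException] = field(default_factory=dict)

    @property
    def status(self) -> str:
        # Hypothetical rule: no errors -> "succeeded"; some outputs -> "partial"; none -> "failed".
        if not self.errors:
            return "succeeded"
        return "partial" if self.outputs else "failed"


def collect_steps(steps: dict[str, Callable[[], Any]]) -> CollectedResult:
    """Run every step; record failures instead of raising them."""
    result = CollectedResult()
    for name, fn in steps.items():
        try:
            result.outputs[name] = fn()
        except Exception as exc:  # keep going after a failed step
            result.errors[name] = exc
    return result


def enrich() -> dict:
    raise ConnectionError("provider down")


res = collect_steps({"fetch": lambda: {"id": 1}, "enrich": enrich})
print(res.status, list(res.outputs), list(res.errors))  # → partial ['fetch'] ['enrich']
only_failures = collect_steps({"enrich": enrich})
print(only_failures.status)  # → failed
```

The steps here are independent; a real pipeline also has to decide what a later step receives when a step it depends on has failed.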
Extended runnable examples live in docs/examples.
Dashboard
```bash
# CLI
tardigrade dashboard
```

```python
# Programmatic
from tardigrade import Dashboard

Dashboard().start_in_thread()
```
Demo
Run the self-contained demo script:

```bash
uv run demo/demo_workflow.py
```

If vhs is installed, record the hero GIF:

```bash
vhs demo/hero.tape
```
API and Project Docs
Download files
File details
Details for the file tardigrade_ai-0.1.0.tar.gz.
File metadata
- Download URL: tardigrade_ai-0.1.0.tar.gz
- Upload date:
- Size: 237.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3acd88f2fa8723848bef9c1ff42d19e84b66f5bdaba7762d933955e5bf13cc6b |
| MD5 | 97738a91a2da3c0e80dcbf797be1e069 |
| BLAKE2b-256 | 1aac0ce74f9305cb625b4ca2084dbf486ad1d81f2c5b553becaae8e8beefb74a |
Provenance
The following attestation bundles were made for tardigrade_ai-0.1.0.tar.gz:
Publisher: ci.yml on Cole-Godfrey/tardigrade
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tardigrade_ai-0.1.0.tar.gz
- Subject digest: 3acd88f2fa8723848bef9c1ff42d19e84b66f5bdaba7762d933955e5bf13cc6b
- Sigstore transparency entry: 1154841818
- Sigstore integration time:
- Permalink: Cole-Godfrey/tardigrade@5322ca2e8bbdf00b660a79c49424e7a29e7dfe8f
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Cole-Godfrey
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@5322ca2e8bbdf00b660a79c49424e7a29e7dfe8f
- Trigger Event: push
File details
Details for the file tardigrade_ai-0.1.0-py3-none-any.whl.
File metadata
- Download URL: tardigrade_ai-0.1.0-py3-none-any.whl
- Upload date:
- Size: 31.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5ef5d09e854802a42ff7fbc5d4e4a6926ccb6fe700aea96925508f97f6466118 |
| MD5 | 50a25d12b81605a141f3e718fb570818 |
| BLAKE2b-256 | 21433da4ff5e62e3e469081a84b5bdcbb1df54d4d1524f5327535d82112c0171 |
Provenance
The following attestation bundles were made for tardigrade_ai-0.1.0-py3-none-any.whl:
Publisher: ci.yml on Cole-Godfrey/tardigrade
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tardigrade_ai-0.1.0-py3-none-any.whl
- Subject digest: 5ef5d09e854802a42ff7fbc5d4e4a6926ccb6fe700aea96925508f97f6466118
- Sigstore transparency entry: 1154841821
- Sigstore integration time:
- Permalink: Cole-Godfrey/tardigrade@5322ca2e8bbdf00b660a79c49424e7a29e7dfe8f
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Cole-Godfrey
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@5322ca2e8bbdf00b660a79c49424e7a29e7dfe8f
- Trigger Event: push