
Resilience middleware that makes AI agents self-healing — retries, checkpointing, circuit breakers, cost budgets, and graceful degradation for any agent framework.

🛡️ Tardigrade

Resilience middleware that makes AI agents self-healing.

Retries · Checkpointing · Circuit breakers · Cost budgets · Graceful degradation
Works with LangGraph, CrewAI, OpenAI SDK, or raw API calls.

Python 3.11+ · License: Apache 2.0

[Image: Tardigrade dashboard demo]

An agent with 85% accuracy per step has just 20% end-to-end success over 10 steps.
Tardigrade wraps your agent with production-grade resilience so step 7 failing
does not mean starting over from scratch.
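
The 20% figure comes from compounding per-step success rates. A quick check of the arithmetic, assuming every step succeeds independently with the same probability (a simplification; the helper name is illustrative only):

```python
def end_to_end_success(per_step: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps."""
    return per_step ** steps


rate = end_to_end_success(0.85, 10)
print(f"{rate:.1%}")  # 19.7%
```

Under the same assumption, each additional step multiplies the overall success rate by 0.85 again, which is why long pipelines benefit most from resuming rather than restarting.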

Install

pip install tardigrade-ai

# With the real-time dashboard:
pip install tardigrade-ai[dashboard]

Quickstart

from tardigrade import armor, Workflow, RetryConfig, BudgetConfig, StepCostReport

# Each step returns its result alongside a StepCostReport (token counts and model).
@armor(name="fetch", retry=RetryConfig(max_attempts=3))
def fetch_data(url: str) -> dict:
    return {"result": "..."}, StepCostReport(input_tokens=500, output_tokens=200, model="gpt-5.4")

@armor(name="analyze")
def analyze(data: dict) -> str:
    return "analysis", StepCostReport(input_tokens=2000, output_tokens=1000, model="gpt-5.4")

with Workflow("my_pipeline", budget=BudgetConfig(max_budget_usd=1.00)) as wf:
    data = fetch_data("https://api.example.com")
    result = analyze(data)

If the workflow crashes, rerun it with the same run_id (set via Workflow(..., run_id=...)) and Tardigrade resumes from the last checkpoint instead of starting from scratch.

Features

| Feature | What it does | Docs |
| --- | --- | --- |
| Automatic retries | Exponential backoff with jitter. Configurable per step. | API |
| Checkpointing | Resume multi-step workflows from the last successful step. | API |
| Circuit breakers | Switch to a fallback model when your primary provider is down. | API |
| Cost budgets | Hard spending limits per workflow with policy-driven enforcement. | API |
| Graceful degradation | Collect partial results instead of crashing on step failures. | API |
| Real-time dashboard | Terminal UI showing workflow progress, costs, and circuit states. | API |
| Framework-agnostic | Works with any Python code. No lock-in to LangGraph or CrewAI. | Examples |
| Structured logging | Every event is a structured log event via structlog. | API |

Before and After

| Scenario | Without Tardigrade | With Tardigrade |
| --- | --- | --- |
| Step 7 of 10 fails | Start over. Waste $3 and 4 minutes. | Resume from step 7. About $0.15 and 30 seconds. |
| Provider goes down | Cascade failure across all agents. | Circuit breaker routes to a backup model. |
| Agent runs away on tokens | Surprise bills and no hard stop. | Hard budget cap with degradation or stop. |
| Debugging a failure | Wall of JSON, grep, and prayer. | Dashboard plus structured event history. |

Examples

Retry a flaky step

from tardigrade import RetryConfig, armor

class RateLimitedError(RuntimeError):
    pass

@armor(retry=RetryConfig(max_attempts=4, retryable_exceptions=(RateLimitedError, TimeoutError)))
def call_provider(prompt: str) -> str:
    ...
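
For readers unfamiliar with the technique, here is a minimal standalone sketch of retrying with exponential backoff and jitter. It is not Tardigrade's implementation: the function name, its parameters, and the "full jitter" strategy (sleeping a uniform random time up to an exponentially growing ceiling) are assumptions chosen for illustration.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    retryable: Tuple[Type[BaseException], ...] = (TimeoutError,),
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds or max_attempts is reached (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: pick a random wait between 0 and the ceiling.
            sleep(random.uniform(0.0, ceiling))
    raise AssertionError("unreachable")


# Usage: a step that times out twice, then succeeds.
attempts = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


waits = []  # record the delays instead of actually sleeping
result = retry_with_backoff(flaky, sleep=waits.append)
```

Exceptions outside the retryable tuple propagate immediately, which mirrors the intent of listing retryable_exceptions explicitly in the example above.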

Switch to a fallback model when the primary is down

from tardigrade import CircuitBreakerConfig, armor

def call_claude_haiku(prompt: str) -> str:
    return "fallback response"

@armor(circuit_breaker=CircuitBreakerConfig(failure_threshold=3, fallback=call_claude_haiku))
def call_gpt54(prompt: str) -> str:
    ...
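
The circuit breaker pattern behind this example can be sketched in a few lines. The class below is a conceptual illustration, not Tardigrade's code: the class name, the reset_after cool-down, the injected clock, and the choice to serve the fallback on every individual failure are all assumptions.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, retries after a cool-down."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the cool-down has elapsed, allow a trial call through.
        return self.clock() - self.opened_at < self.reset_after

    def call(self, primary: Callable[..., Any], fallback: Callable[..., Any], *args: Any) -> Any:
        if self.is_open():
            return fallback(*args)  # skip the primary entirely while open
        try:
            result = primary(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback(*args)
        self.failures = 0  # a success closes the circuit
        self.opened_at = None
        return result


# Usage: the primary always fails, so the breaker opens after three calls.
primary_calls = []


def failing_primary(prompt: str) -> str:
    primary_calls.append(prompt)
    raise ConnectionError("provider down")


def backup(prompt: str) -> str:
    return "fallback response"


now = [0.0]  # fake clock so the example does not depend on real time
breaker = CircuitBreaker(failure_threshold=3, reset_after=30.0, clock=lambda: now[0])
answers = [breaker.call(failing_primary, backup, "hi") for _ in range(5)]
```

After the third failure the primary is no longer called at all, which is what prevents a struggling provider from being hammered by every agent at once.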

Resume a checkpointed workflow

from tardigrade import Workflow, armor

@armor(name="fetch")
def fetch() -> str:
    return "data"

with Workflow("pipeline", run_id="run-001"):
    fetch()
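
Conceptually, resuming means that a step whose result was already saved under the same run identifier is not executed again. The sketch below illustrates that idea with a JSON file per run; the storage format, file name, and run_step helper are assumptions for illustration and say nothing about how Tardigrade actually persists checkpoints.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def run_step(store: Path, name: str, fn: Callable[[], Any]) -> Any:
    """Return the saved result for `name` if present; otherwise run fn and save it."""
    saved = json.loads(store.read_text()) if store.exists() else {}
    if name in saved:
        return saved[name]  # checkpoint hit: skip the work
    result = fn()
    saved[name] = result
    store.write_text(json.dumps(saved))  # checkpoint after success
    return result


calls = []


def fetch() -> str:
    calls.append("fetch")
    return "data"


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "run-001.json"  # one file per run id (hypothetical layout)
    first = run_step(store, "fetch", fetch)   # executes fetch and checkpoints it
    second = run_step(store, "fetch", fetch)  # same run id: served from the checkpoint
```

Here fetch runs once even though the step is requested twice, which is the behavior the "same run_id" rerun relies on.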

Enforce a workflow budget

from tardigrade import BudgetConfig, StepCostReport, Workflow, armor

@armor(name="summarize")
def summarize(text: str) -> str:
    return "summary", StepCostReport(input_tokens=500, output_tokens=200, model="gpt-5.4")

with Workflow("pipeline", budget=BudgetConfig(max_budget_usd=0.50)):
    summarize("hello")
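
To make the budget idea concrete, here is a standalone sketch of turning token counts into a dollar cost and enforcing a hard cap. The price table uses made-up numbers for a made-up model, and BudgetTracker and BudgetExceeded are hypothetical names, not Tardigrade classes.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens; not real provider pricing.
PRICES = {"example-model": {"input": 2.00, "output": 8.00}}


class BudgetExceeded(RuntimeError):
    """Raised when recording a step would push spending over the cap."""


@dataclass
class BudgetTracker:
    max_budget_usd: float
    spent_usd: float = 0.0

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        price = PRICES[model]
        cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
        if self.spent_usd + cost > self.max_budget_usd:
            raise BudgetExceeded(
                f"step costs ${cost:.4f}, budget left ${self.max_budget_usd - self.spent_usd:.4f}"
            )
        self.spent_usd += cost
        return cost


tracker = BudgetTracker(max_budget_usd=0.50)
small = tracker.record("example-model", input_tokens=500, output_tokens=200)

try:
    tracker.record("example-model", input_tokens=100_000, output_tokens=100_000)
    stopped = False
except BudgetExceeded:
    stopped = True  # the second step would cost $1.00, over the $0.50 cap
```

This sketch rejects a step once its reported cost is known; whether a real system stops, degrades, or switches models at that point is a policy decision.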

Return partial results instead of crashing

from tardigrade import DegradationConfig, DegradationPolicy, Workflow, armor

@armor(name="enrich")
def enrich(data: dict) -> dict:
    raise ConnectionError("provider down")

with Workflow("pipeline", degradation=DegradationConfig(policy=DegradationPolicy.COLLECT)) as wf:
    enrich({"id": 1})

assert wf.result is not None
assert wf.result.status == "failed"
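
The collect-style behavior can be pictured as a loop that records each failure and keeps going instead of raising. The following is a conceptual sketch only; CollectedResult, run_collecting, and the status strings are assumptions and do not describe Tardigrade's result object.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class CollectedResult:
    status: str = "completed"
    outputs: Dict[str, Any] = field(default_factory=dict)
    errors: Dict[str, str] = field(default_factory=dict)


def run_collecting(steps: List[Tuple[str, Callable[[], Any]]]) -> CollectedResult:
    """Run every step, keeping successful outputs and recording failures."""
    result = CollectedResult()
    for name, fn in steps:
        try:
            result.outputs[name] = fn()
        except Exception as exc:
            result.errors[name] = repr(exc)  # remember what went wrong
            result.status = "failed"         # but keep running later steps
    return result


def enrich() -> dict:
    raise ConnectionError("provider down")


partial = run_collecting([
    ("fetch", lambda: {"id": 1}),
    ("enrich", enrich),
    ("summarize", lambda: "summary"),
])
```

The caller still sees that the run failed, but the outputs of fetch and summarize remain available for inspection or reuse.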

Extended runnable examples live in docs/examples.

Dashboard

# CLI
tardigrade dashboard

# Programmatic
from tardigrade import Dashboard

Dashboard().start_in_thread()

Demo

Run the self-contained demo script:

uv run demo/demo_workflow.py

Record the hero GIF when vhs is installed:

vhs demo/hero.tape

API and Project Docs

Download files

Source Distribution

tardigrade_ai-0.1.0.tar.gz (237.1 kB view details)

Uploaded Source

Built Distribution

tardigrade_ai-0.1.0-py3-none-any.whl (31.3 kB view details)

Uploaded Python 3

File details

Details for the file tardigrade_ai-0.1.0.tar.gz.

File metadata

  • Download URL: tardigrade_ai-0.1.0.tar.gz
  • Upload date:
  • Size: 237.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tardigrade_ai-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3acd88f2fa8723848bef9c1ff42d19e84b66f5bdaba7762d933955e5bf13cc6b
MD5 97738a91a2da3c0e80dcbf797be1e069
BLAKE2b-256 1aac0ce74f9305cb625b4ca2084dbf486ad1d81f2c5b553becaae8e8beefb74a

Provenance

The following attestation bundles were made for tardigrade_ai-0.1.0.tar.gz:

Publisher: ci.yml on Cole-Godfrey/tardigrade

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tardigrade_ai-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tardigrade_ai-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 31.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tardigrade_ai-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5ef5d09e854802a42ff7fbc5d4e4a6926ccb6fe700aea96925508f97f6466118
MD5 50a25d12b81605a141f3e718fb570818
BLAKE2b-256 21433da4ff5e62e3e469081a84b5bdcbb1df54d4d1524f5327535d82112c0171

Provenance

The following attestation bundles were made for tardigrade_ai-0.1.0-py3-none-any.whl:

Publisher: ci.yml on Cole-Godfrey/tardigrade
