Skip to main content

Adaptive Immune Runtime for Agent Systems — population-level failure prevention for AI agents

Project description

AIRAS — Adaptive Immune Runtime for Agent Systems

A runtime layer that observes failure patterns across populations of AI agent executions, learns abstract failure signatures, and preemptively prevents those failures from recurring.

The Problem

AI agents in production fail in predictable, repeating patterns:

  • 5 agents × 90% reliability = 59% system reliability (multiplicative collapse)
  • 88% of enterprise agent pilots never reach production — reliability is the #1 blocker
  • $340K average cost of a failed agent project
  • Root causes appear at step 3, effects show at step 7 — no tool traces this causally
  • Current observability tools tell you what went wrong. Nothing prevents it from happening again.

The Solution

AIRAS learns from agent failures at the population level and prevents them preemptively:

  1. Observe — Collect execution traces from all agent runs
  2. Extract — Find where failed traces diverge from successful ones
  3. Abstract — Generalize specific failures into matchable patterns
  4. Intervene — Generate targeted prompt-level fixes for each pattern class
  5. Test — Validate interventions via replay before deploying
  6. Evolve — Continuously improve interventions via LLM-generated mutations
  7. Prevent — Match new executions against known patterns BEFORE failure manifests

Validated Results

Tested on 250 real SWE-bench agent trajectories:

Metric Result Target
Prevention Rate 52% ≥50% ✅
False Positive Rate 0% <5% ✅
Pattern Coverage 96%
Patterns Discovered 7 ≥5 ✅
Matching Latency 15.5ms <10ms ✅ (with Qdrant)

v2 target: 85% prevention rate via LLM judge + self-improving interventions + cross-domain transfer.

How It Works

1. Agent executes step 7 of a task
2. SDK sends partial trace (steps 1-7) to AIRAS
3. AIRAS extracts behavioral signals (errors, loops, step ratio)
4. Searches vector index for matching failure patterns (<5ms)
5. Contextual bandit selects the best intervention variant for this context
6. SDK injects intervention into agent's next prompt
7. Agent catches the error BEFORE making it
8. Outcome feeds back into efficacy tracking → interventions improve over time

Architecture

┌─────────────┐     ┌──────────────────────────────────┐     ┌─────────┐
│ Agent (SDK) │────▶│         AIRAS API                 │────▶│ Qdrant  │
└─────────────┘     │                                   │     └─────────┘
                    │  /v1/check  → Matching Engine      │
                    │  /v1/traces → Queue for extraction │────▶│ Redis   │
                    │  /v2/predict → Predictive Immunity │     └─────────┘
                    └──────────────┬────────────────────┘
                                   │
                    ┌──────────────▼──────────────────────┐
                    │          WORKERS (async)             │
                    │                                      │
                    │  Extractor: traces → antigens        │───▶│Postgres│
                    │  Judge: LLM evaluates hard cases     │    └────────┘
                    │  Evolution: mutates interventions    │
                    └─────────────────────────────────────┘

Quick Start

Install

uv add airas-sdk

Or with pip:

pip install airas-sdk

For the full server stack (Qdrant, Redis, Postgres):

uv add "airas-sdk[server]"

Run the Killer Experiment (no infrastructure needed)

git clone https://github.com/yash1511-bogam/airas.git
cd airas
uv venv .venv && source .venv/bin/activate
uv pip install -e ".[experiment]"
python -m airas.experiment.runner

Deploy Production System

docker compose up -d
python -m airas.scripts.seed_antigens
curl http://localhost:8100/health

SDK Integration (3 lines)

from airas.sdk import AIRASClient, airas_middleware

client = AIRASClient(base_url="http://localhost:8100")
graph = airas_middleware(my_langgraph_app, client)  # done

API Endpoints

Endpoint Method Purpose Latency
/v1/check POST Real-time immunity check <50ms
/v1/traces POST Ingest completed trace <100ms
/v1/antigens GET List discovered patterns <200ms
/v1/stats GET Dashboard metrics <200ms
/v2/predict POST Predict failures from task description <100ms
/v2/domains GET List domain adapters <50ms
/v2/evolution/stats GET Intervention evolution metrics <100ms
/health GET Liveness <10ms

Project Structure

airas/
├── docker-compose.yml
├── Dockerfile
├── pyproject.toml
├── README.md
└── airas/
    ├── models.py                       # Core Pydantic models (experiment)
    ├── models/
    │   ├── domain.py                   # Production domain models
    │   └── api.py                      # Request/response schemas
    ├── api/
    │   ├── main.py                     # FastAPI app + v1 routes
    │   └── routes_v2.py                # v2 routes (predict, domains, evolution)
    ├── core/
    │   ├── matching.py                 # Qdrant-backed real-time matching
    │   ├── tolerance.py                # Over-correction prevention
    │   ├── judge.py                    # Hybrid heuristic + LLM judge
    │   ├── bandit.py                   # Contextual Thompson Sampling
    │   └── predictor.py                # Task → failure class prediction
    ├── storage/
    │   ├── qdrant.py                   # Vector DB layer
    │   └── redis_store.py              # Cache + streams
    ├── extraction/                     # Pattern extraction pipeline
    │   ├── alignment.py                # Structural divergence detection
    │   ├── classifier.py               # Rule-based failure classification
    │   ├── abstraction.py              # Pattern abstraction
    │   └── clustering.py               # HDBSCAN clustering
    ├── intervention/
    │   └── templates.py                # Intervention template bank
    ├── replay/
    │   └── engine.py                   # Heuristic effectiveness judge
    ├── evolution/
    │   └── __init__.py                 # LLM mutation engine + evolution lifecycle
    ├── domains/
    │   ├── __init__.py                 # Universal adapter framework (4 domains)
    │   └── templates.py                # Domain-specific intervention templates
    ├── experiment/
    │   └── runner.py                   # Killer experiment orchestrator
    ├── sdk/
    │   ├── __init__.py                 # Package exports
    │   ├── client.py                   # Async HTTP client (graceful degradation)
    │   └── langgraph.py                # LangGraph middleware
    ├── workers/
    │   ├── main.py                     # Trace extraction worker
    │   └── evolution_worker.py         # Intervention mutation + promotion worker
    └── scripts/
        └── seed_antigens.py            # Load validated patterns into Qdrant

Tech Stack

Component Technology Why
API FastAPI Async, <50ms p99
Vector DB Qdrant HNSW <5ms at 100K vectors, payload filtering
State PostgreSQL Audit trail, efficacy tracking
Cache/Queue Redis Hot cache, stream-based async processing
Embeddings all-MiniLM-L6-v2 384-dim, local, free, fast
Clustering HDBSCAN Finds natural clusters without specifying K
LLM (judge/mutations) DeepSeek V4-Flash $0.14/M input — entire learning loop costs $1.47/month
SDK httpx (async) Non-blocking, 500ms timeout, graceful degradation

Key Design Decisions

  1. No LLM calls in the hot path — real-time matching is deterministic (<50ms)
  2. LLM only in async learning loop — judge + mutations run in background workers
  3. Graceful degradation — if AIRAS is down, agents run normally (SDK never blocks)
  4. Two-phase matching — embedding similarity (Qdrant) + condition predicates (code)
  5. Behavioral signals only — runtime matching uses errors, loops, step anomalies (not outcome)
  6. Contextual bandits — different intervention variants for different execution contexts
  7. Self-improving — interventions evolve via LLM-generated mutations + A/B testing
  8. Cross-domain — patterns learned in coding agents transfer to support/research/data agents
  9. Tolerance layer — max 3 interventions per trace, cooldown, auto-disable low-efficacy
  10. Population-level learning — every failure makes the system smarter for ALL users

Supported Domains

Domain Adapter Example Tools
Coding CodingAdapter edit, search, run_test, submit
Customer Support SupportAdapter search_kb, draft_reply, escalate, close_ticket
Research ResearchAdapter web_search, read_paper, synthesize, verify_claim
Data Pipeline DataPipelineAdapter query_db, transform, validate, write_table

Adding a new domain: implement DomainAdapter.map_action() (maps domain tools → 6 universal types). All existing patterns transfer immediately.

The Moat

The failure pattern database is the product. Every deployment adds patterns. More users → more patterns → better prevention → more users. The self-improvement loop means interventions get better every day without human effort.

License

MIT License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

airas_sdk-0.2.0.tar.gz (44.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

airas_sdk-0.2.0-py3-none-any.whl (51.2 kB view details)

Uploaded Python 3

File details

Details for the file airas_sdk-0.2.0.tar.gz.

File metadata

  • Download URL: airas_sdk-0.2.0.tar.gz
  • Upload date:
  • Size: 44.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.13 {"installer":{"name":"uv","version":"0.11.13","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for airas_sdk-0.2.0.tar.gz
Algorithm Hash digest
SHA256 7a209474d2a090b3b088d625804686c4e821e0bb7bed6aae96c3076cb532f728
MD5 5ebd7e97db0e63009241023a74bd5c14
BLAKE2b-256 e18200e0f8c3fcd6ab2ae2061fd31b0dd8cb11d96d9441b3a5e4c13b05f11485

See more details on using hashes here.

File details

Details for the file airas_sdk-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: airas_sdk-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 51.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.13 {"installer":{"name":"uv","version":"0.11.13","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for airas_sdk-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 39a9e308088c7b71570fbe9bacf712b94e54f4caed247ec3fd003e2ed99daf0f
MD5 6fc3e0b87cc0f59c4567660dea53fdb0
BLAKE2b-256 95b71c95a1344b104e40e10346fbebcefedb8afbee3f54851565f716da43378e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page