Adaptive Immune Runtime for Agent Systems — population-level failure prevention for AI agents
Project description
AIRAS — Adaptive Immune Runtime for Agent Systems
A runtime layer that observes failure patterns across populations of AI agent executions, learns abstract failure signatures, and preemptively prevents those failures from recurring.
The Problem
AI agents in production fail in predictable, repeating patterns:
- 5 agents × 90% reliability = 59% system reliability (multiplicative collapse)
- 88% of enterprise agent pilots never reach production — reliability is the #1 blocker
- $340K average cost of a failed agent project
- Root causes appear at step 3, effects show at step 7 — no tool traces this causally
- Current observability tools tell you what went wrong. Nothing prevents it from happening again.
The Solution
AIRAS learns from agent failures at the population level and prevents them preemptively:
- Observe — Collect execution traces from all agent runs
- Extract — Find where failed traces diverge from successful ones
- Abstract — Generalize specific failures into matchable patterns
- Intervene — Generate targeted prompt-level fixes for each pattern class
- Test — Validate interventions via replay before deploying
- Evolve — Continuously improve interventions via LLM-generated mutations
- Prevent — Match new executions against known patterns BEFORE failure manifests
Validated Results
Tested on 250 real SWE-bench agent trajectories:
| Metric | Result | Target |
|---|---|---|
| Prevention Rate | 52% | ≥50% ✅ |
| False Positive Rate | 0% | <5% ✅ |
| Pattern Coverage | 96% | — |
| Patterns Discovered | 7 | ≥5 ✅ |
| Matching Latency | 15.5ms | <10ms ✅ (with Qdrant) |
v2 target: 85% prevention rate via LLM judge + self-improving interventions + cross-domain transfer.
How It Works
1. Agent executes step 7 of a task
2. SDK sends partial trace (steps 1-7) to AIRAS
3. AIRAS extracts behavioral signals (errors, loops, step ratio)
4. Searches vector index for matching failure patterns (<5ms)
5. Contextual bandit selects the best intervention variant for this context
6. SDK injects intervention into agent's next prompt
7. Agent catches the error BEFORE making it
8. Outcome feeds back into efficacy tracking → interventions improve over time
Architecture
┌─────────────┐ ┌──────────────────────────────────┐ ┌─────────┐
│ Agent (SDK) │────▶│ AIRAS API │────▶│ Qdrant │
└─────────────┘ │ │ └─────────┘
│ /v1/check → Matching Engine │
│ /v1/traces → Queue for extraction │────▶│ Redis │
│ /v2/predict → Predictive Immunity │ └─────────┘
└──────────────┬────────────────────┘
│
┌──────────────▼──────────────────────┐
│ WORKERS (async) │
│ │
│ Extractor: traces → antigens │───▶│Postgres│
│ Judge: LLM evaluates hard cases │ └────────┘
│ Evolution: mutates interventions │
└─────────────────────────────────────┘
Quick Start
Install
uv add airas-sdk
Or with pip:
pip install airas-sdk
For the full server stack (Qdrant, Redis, Postgres):
uv add "airas-sdk[server]"
Run the Killer Experiment (no infrastructure needed)
git clone https://github.com/yash1511-bogam/airas.git
cd airas
uv venv .venv && source .venv/bin/activate
uv pip install -e ".[experiment]"
python -m airas.experiment.runner
Deploy Production System
docker compose up -d
python -m airas.scripts.seed_antigens
curl http://localhost:8100/health
SDK Integration (3 lines)
from airas.sdk import AIRASClient, airas_middleware
client = AIRASClient(base_url="http://localhost:8100")
graph = airas_middleware(my_langgraph_app, client) # done
API Endpoints
| Endpoint | Method | Purpose | Latency |
|---|---|---|---|
/v1/check |
POST | Real-time immunity check | <50ms |
/v1/traces |
POST | Ingest completed trace | <100ms |
/v1/antigens |
GET | List discovered patterns | <200ms |
/v1/stats |
GET | Dashboard metrics | <200ms |
/v2/predict |
POST | Predict failures from task description | <100ms |
/v2/domains |
GET | List domain adapters | <50ms |
/v2/evolution/stats |
GET | Intervention evolution metrics | <100ms |
/health |
GET | Liveness | <10ms |
Project Structure
airas/
├── docker-compose.yml
├── Dockerfile
├── pyproject.toml
├── README.md
└── airas/
├── models.py # Core Pydantic models (experiment)
├── models/
│ ├── domain.py # Production domain models
│ └── api.py # Request/response schemas
├── api/
│ ├── main.py # FastAPI app + v1 routes
│ └── routes_v2.py # v2 routes (predict, domains, evolution)
├── core/
│ ├── matching.py # Qdrant-backed real-time matching
│ ├── tolerance.py # Over-correction prevention
│ ├── judge.py # Hybrid heuristic + LLM judge
│ ├── bandit.py # Contextual Thompson Sampling
│ └── predictor.py # Task → failure class prediction
├── storage/
│ ├── qdrant.py # Vector DB layer
│ └── redis_store.py # Cache + streams
├── extraction/ # Pattern extraction pipeline
│ ├── alignment.py # Structural divergence detection
│ ├── classifier.py # Rule-based failure classification
│ ├── abstraction.py # Pattern abstraction
│ └── clustering.py # HDBSCAN clustering
├── intervention/
│ └── templates.py # Intervention template bank
├── replay/
│ └── engine.py # Heuristic effectiveness judge
├── evolution/
│ └── __init__.py # LLM mutation engine + evolution lifecycle
├── domains/
│ ├── __init__.py # Universal adapter framework (4 domains)
│ └── templates.py # Domain-specific intervention templates
├── experiment/
│ └── runner.py # Killer experiment orchestrator
├── sdk/
│ ├── __init__.py # Package exports
│ ├── client.py # Async HTTP client (graceful degradation)
│ └── langgraph.py # LangGraph middleware
├── workers/
│ ├── main.py # Trace extraction worker
│ └── evolution_worker.py # Intervention mutation + promotion worker
└── scripts/
└── seed_antigens.py # Load validated patterns into Qdrant
Tech Stack
| Component | Technology | Why |
|---|---|---|
| API | FastAPI | Async, <50ms p99 |
| Vector DB | Qdrant | HNSW <5ms at 100K vectors, payload filtering |
| State | PostgreSQL | Audit trail, efficacy tracking |
| Cache/Queue | Redis | Hot cache, stream-based async processing |
| Embeddings | all-MiniLM-L6-v2 | 384-dim, local, free, fast |
| Clustering | HDBSCAN | Finds natural clusters without specifying K |
| LLM (judge/mutations) | DeepSeek V4-Flash | $0.14/M input — entire learning loop costs $1.47/month |
| SDK | httpx (async) | Non-blocking, 500ms timeout, graceful degradation |
Key Design Decisions
- No LLM calls in the hot path — real-time matching is deterministic (<50ms)
- LLM only in async learning loop — judge + mutations run in background workers
- Graceful degradation — if AIRAS is down, agents run normally (SDK never blocks)
- Two-phase matching — embedding similarity (Qdrant) + condition predicates (code)
- Behavioral signals only — runtime matching uses errors, loops, step anomalies (not outcome)
- Contextual bandits — different intervention variants for different execution contexts
- Self-improving — interventions evolve via LLM-generated mutations + A/B testing
- Cross-domain — patterns learned in coding agents transfer to support/research/data agents
- Tolerance layer — max 3 interventions per trace, cooldown, auto-disable low-efficacy
- Population-level learning — every failure makes the system smarter for ALL users
Supported Domains
| Domain | Adapter | Example Tools |
|---|---|---|
| Coding | CodingAdapter |
edit, search, run_test, submit |
| Customer Support | SupportAdapter |
search_kb, draft_reply, escalate, close_ticket |
| Research | ResearchAdapter |
web_search, read_paper, synthesize, verify_claim |
| Data Pipeline | DataPipelineAdapter |
query_db, transform, validate, write_table |
Adding a new domain: implement DomainAdapter.map_action() (maps domain tools → 6 universal types). All existing patterns transfer immediately.
The Moat
The failure pattern database is the product. Every deployment adds patterns. More users → more patterns → better prevention → more users. The self-improvement loop means interventions get better every day without human effort.
License
MIT License
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file airas_sdk-0.2.0.tar.gz.
File metadata
- Download URL: airas_sdk-0.2.0.tar.gz
- Upload date:
- Size: 44.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.13 {"installer":{"name":"uv","version":"0.11.13","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7a209474d2a090b3b088d625804686c4e821e0bb7bed6aae96c3076cb532f728
|
|
| MD5 |
5ebd7e97db0e63009241023a74bd5c14
|
|
| BLAKE2b-256 |
e18200e0f8c3fcd6ab2ae2061fd31b0dd8cb11d96d9441b3a5e4c13b05f11485
|
File details
Details for the file airas_sdk-0.2.0-py3-none-any.whl.
File metadata
- Download URL: airas_sdk-0.2.0-py3-none-any.whl
- Upload date:
- Size: 51.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.13 {"installer":{"name":"uv","version":"0.11.13","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
39a9e308088c7b71570fbe9bacf712b94e54f4caed247ec3fd003e2ed99daf0f
|
|
| MD5 |
6fc3e0b87cc0f59c4567660dea53fdb0
|
|
| BLAKE2b-256 |
95b71c95a1344b104e40e10346fbebcefedb8afbee3f54851565f716da43378e
|