Adaptive Runtime Layer for Stateful AI Systems

Adaptive Runtime

Runtime Intelligence Layer for Stateful AI Systems


Not a chatbot framework. Not an LLM wrapper. Not a workflow builder.

An adaptive runtime intelligence layer: the missing piece between your AI logic and production reality.


The Problem

Most AI frameworks solve the model problem.
Nobody solves the runtime problem.

Your AI agent in development:   Works perfectly.
Your AI agent in production:    Crashes. Forgets state. Retries blindly. Dies silently.

Production AI systems fail because of:

  • 💥 No crash recovery: state is lost on restart
  • 🧠 No memory: the agent forgets context between sessions
  • 🔁 Retry chaos: blind retries with no back-off
  • 📉 No confidence scoring: decisions are made without a measure of certainty
  • 🌊 No contextual awareness: the agent can't adapt to changing conditions

Adaptive Runtime fixes this.


See It Running

Adaptive Runtime Demo

[16:08:13][RUNTIME]          Event received: service_overload
[16:08:13][CONTEXT_ENGINE]   risk=high  stability=low  pressure=0.65
[16:08:13][CONFIDENCE_ENGINE] confidence=0.84
[16:08:13][DECISION_ENGINE]  ACTION: RESTART_SERVICE
[16:08:13][STATE_ENGINE]     State persisted
[16:08:13][RECOVERY_ENGINE]  Checkpoint #3 created

  → restart_service  [high]  conf=0.840

[16:08:14][RUNTIME]          Event received: anomaly_detected
[16:08:14][CONTEXT_ENGINE]   risk=low   stability=stable  pressure=0.32
[16:08:14][CONFIDENCE_ENGINE] confidence=0.62
[16:08:14][DECISION_ENGINE]  ACTION: FLAG_FOR_REVIEW
[16:08:14][STATE_ENGINE]     State persisted

  → flag_for_review  [low]   conf=0.620

The runtime thinks, decides, remembers, and recovers automatically.


How It Works

Event (CPU spike, anomaly, timeout, auth failure...)
  │
  ▼
┌─────────────────┐
│  Context Engine │  → Analyzes conditions: risk, stability, pressure score
└────────┬────────┘
         │
         ▼
┌──────────────────────┐
│  Confidence Engine   │  → Calculates adaptive confidence (with decay + history)
└────────┬─────────────┘
         │
         ▼
┌──────────────────┐
│  Decision Engine │  → Selects action: restart / throttle / rollback / recover...
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│   State Engine   │  → Persists state to SQLite (survives crashes)
└────────┬─────────┘
         │
         ▼
┌──────────────────────┐
│   Recovery Engine    │  → Creates checkpoint, handles retry with back-off
└──────────────────────┘
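
The flow above can be sketched in a few lines of plain Python. This is a toy illustration, not the package's implementation: every function name, threshold, and the in-memory `state` and `checkpoints` stand-ins are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str
    confidence: float
    risk: str


def analyze_context(event: dict) -> dict:
    """Hypothetical context step: classify risk from the event's severity."""
    severity = float(event.get("severity", 0.0))
    risk = "high" if severity >= 0.7 else "low"
    return {"risk": risk, "severity": severity}


def score_confidence(context: dict) -> float:
    """Hypothetical confidence step: the severity, clamped to [0, 1]."""
    return max(0.0, min(1.0, context["severity"]))


def decide(context: dict, confidence: float) -> Decision:
    """Hypothetical decision step: act only on high risk with enough confidence."""
    if context["risk"] == "high" and confidence >= 0.7:
        action = "restart_service"
    else:
        action = "flag_for_review"
    return Decision(action, confidence, context["risk"])


def process(event: dict, state: dict, checkpoints: list) -> Decision:
    """Run one event through the five stages in the order shown in the diagram."""
    context = analyze_context(event)            # 1. Context Engine
    confidence = score_confidence(context)      # 2. Confidence Engine
    decision = decide(context, confidence)      # 3. Decision Engine
    state["last_action"] = decision.action      # 4. State Engine (in-memory stand-in)
    state["events_seen"] = state.get("events_seen", 0) + 1
    checkpoints.append(dict(state))             # 5. Recovery Engine (snapshot copy)
    return decision


state, checkpoints = {}, []
first = process({"type": "service_overload", "severity": 0.82}, state, checkpoints)
second = process({"type": "anomaly_detected", "severity": 0.40}, state, checkpoints)
print(first.action, second.action, len(checkpoints))
```

Each stage only reads the output of the stage before it, which is what makes the decision easy to trace after the fact.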

Quick Start

pip install adaptive-runtime

import asyncio
from adaptive_runtime import Runtime

async def main():
    runtime = Runtime(agent_id="my-agent")
    await runtime.start()

    result = await runtime.process({
        "type": "service_overload",
        "severity": 0.82,
        "cpu": 94,
        "memory": 88,
    })

    print(result.action)      # "restart_service"
    print(result.confidence)  # 0.7831
    print(result.reason)      # "high_resource_pressure"
    print(result.priority)    # "high"

    await runtime.stop()

asyncio.run(main())

That's it. No API keys. No cloud setup. No GPU. Runs on a $5 VPS.


Killer Example: Adaptive Monitoring System

import asyncio
from adaptive_runtime import Runtime

async def monitor():
    runtime = Runtime(agent_id="prod-monitor", checkpoint_every=5)

    # Subscribe to critical events
    @runtime.bus.subscribe("anomaly_detected")
    async def on_anomaly(event):
        print(f"  ⚠ Anomaly handler fired - severity={event['severity']}")

    await runtime.start()

    # Simulate real production events
    events = [
        {"type": "service_overload", "severity": 0.91, "cpu": 96, "memory": 92},
        {"type": "anomaly_detected",  "severity": 0.74, "error_rate": 0.6},
        {"type": "auth_failure",      "severity": 0.55},
        {"type": "timeout",           "severity": 0.45, "latency_ms": 4200},
        {"type": "recovery_needed",   "severity": 0.30},
    ]

    for event in events:
        result = await runtime.process(event)
        print(f"  [{result.priority.upper()}] {event['type']:25s} → {result.action}")

    # Runtime remembers everything
    history = await runtime.event_history(limit=5)
    print(f"\n  Last {len(history)} events remembered across sessions.")

    await runtime.stop()

asyncio.run(monitor())

Output:

  [HIGH]    service_overload          → scale_up_immediate
  [NORMAL]  anomaly_detected          → flag_for_review
  ⚠ Anomaly handler fired - severity=0.74
  [NORMAL]  auth_failure              → trigger_security_audit
  [LOW]     timeout                   → cache_warmup
  [LOW]     recovery_needed           → run_recovery

  Last 5 events remembered across sessions.
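
The `@runtime.bus.subscribe(...)` decorator in the example implies an async pub/sub bus. A minimal sketch of that pattern is below. The `EventBus` class and its `publish` method are assumptions written for illustration and may differ from what `event_bus.py` actually does.

```python
import asyncio
from collections import defaultdict


class EventBus:
    """Hypothetical async pub/sub bus keyed by event type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type):
        """Decorator that registers an async handler for one event type."""
        def decorator(handler):
            self._handlers[event_type].append(handler)
            return handler
        return decorator

    async def publish(self, event):
        """Run every handler for the event's type concurrently; return how many ran."""
        handlers = self._handlers.get(event["type"], [])
        await asyncio.gather(*(handler(event) for handler in handlers))
        return len(handlers)


bus = EventBus()
seen = []


@bus.subscribe("anomaly_detected")
async def on_anomaly(event):
    seen.append(event["severity"])


async def demo():
    fired = await bus.publish({"type": "anomaly_detected", "severity": 0.74})
    ignored = await bus.publish({"type": "timeout", "severity": 0.45})
    return fired, ignored


fired, ignored = asyncio.run(demo())
print(fired, ignored, seen)
```

Events with no subscriber are simply dropped here. A production bus would likely also isolate handler exceptions so one failing subscriber cannot block the others.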

Why Not LangChain?

This question will come up. Here's the honest answer:

                    LangChain / AutoGen            Adaptive Runtime
Purpose             LLM orchestration              Runtime behavior
Core abstraction    Prompt chains                  Stateful events
Intelligence        Language model                 Probabilistic engine
Dependencies        Heavy (openai, tiktoken, ...)  Minimal (pydantic, aiosqlite)
GPU required        Sometimes                      Never
Crash recovery      ❌                             ✅ Built-in
State persistence   External setup                 ✅ Built-in SQLite
Confidence scoring  ❌                             ✅ Adaptive
Runs on $5 VPS      Barely                         ✅ Designed for it
Use case            Chat, RAG, agents              Runtime resilience

TL;DR: LangChain makes LLMs useful. Adaptive Runtime makes AI systems reliable.
They solve different problems. Use both, or use this standalone.


Runtime Philosophy

Most AI problems in production are not model problems.
They are runtime problems.

Adaptive Runtime is built around the belief that future AI systems need:

  • Memory: state that survives crashes and restarts
  • Resilience: self-healing with checkpoints and retry logic
  • Contextual behavior: decisions that adapt to real conditions
  • Confidence awareness: knowing how certain a decision is
  • Lightweight cognition: intelligence without neural dependency

Not just prompts. Not just workflows. Runtime intelligence.


The 5 Core Engines

1. State Engine

Persistent agent memory. Survives crashes. SQLite by default.

await state_engine.save_state({"health": "ok", "version": "1.2"})
state = await state_engine.load_state()          # Restored after restart
await state_engine.patch_state({"last": "ok"})   # Partial update
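
To show why SQLite-backed state survives a restart, here is a small sketch using the standard-library `sqlite3` module. The package itself lists `aiosqlite` as a dependency, so treat this synchronous `SqliteStateStore` class, its table layout, and the temp-file path as assumptions for illustration only.

```python
import json
import os
import sqlite3
import tempfile


class SqliteStateStore:
    """Hypothetical store: one JSON blob per agent, keyed by agent_id."""

    def __init__(self, path, agent_id):
        self.path = path
        self.agent_id = agent_id
        self._execute(
            "CREATE TABLE IF NOT EXISTS state "
            "(agent_id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )

    def _execute(self, sql, params=()):
        conn = sqlite3.connect(self.path)
        try:
            with conn:  # commits on success, rolls back on error
                return conn.execute(sql, params).fetchall()
        finally:
            conn.close()

    def save_state(self, state):
        """Overwrite the whole state for this agent."""
        self._execute(
            "INSERT OR REPLACE INTO state (agent_id, data) VALUES (?, ?)",
            (self.agent_id, json.dumps(state)),
        )

    def load_state(self):
        """Return the stored state, or an empty dict if none exists yet."""
        rows = self._execute(
            "SELECT data FROM state WHERE agent_id = ?", (self.agent_id,)
        )
        return json.loads(rows[0][0]) if rows else {}

    def patch_state(self, partial):
        """Partial update: merge new keys into the stored state."""
        state = self.load_state()
        state.update(partial)
        self.save_state(state)


db_path = os.path.join(tempfile.mkdtemp(), "state.db")
store = SqliteStateStore(db_path, "my-agent")
store.save_state({"health": "ok", "version": "1.2"})
store.patch_state({"last": "ok"})

# A fresh store object on the same file stands in for a process restart.
restored = SqliteStateStore(db_path, "my-agent").load_state()
print(restored)
```

Because every write is committed to disk before the call returns, a crash after `save_state` loses nothing that was already saved.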

2. Context Engine

Transforms raw signals into contextual understanding, no ML needed.

ctx = context_engine.analyze({
    "type": "service_overload", "cpu": 94, "memory": 88, "severity": 0.82
})
# → risk="high", stability="low", context="resource_pressure", pressure=0.65
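
"No ML needed" usually means threshold rules over a weighted score. The sketch below shows one way that could look. The weights, thresholds, and the `"medium"`, `"degraded"`, and `"general"` labels are invented for this example, so it will not reproduce the `pressure=0.65` value shown above.

```python
def analyze(event: dict) -> dict:
    """Classify an event with plain threshold rules (all weights are assumptions)."""
    cpu = event.get("cpu", 0) / 100.0
    memory = event.get("memory", 0) / 100.0
    severity = float(event.get("severity", 0.0))

    # Assumed weighting: CPU counts slightly more than memory or severity.
    pressure = round(0.4 * cpu + 0.3 * memory + 0.3 * severity, 2)

    if pressure >= 0.75:
        risk, stability = "high", "low"
    elif pressure >= 0.40:
        risk, stability = "medium", "degraded"
    else:
        risk, stability = "low", "stable"

    # Label the situation by its dominant signal.
    context = "resource_pressure" if max(cpu, memory) >= 0.85 else "general"
    return {"risk": risk, "stability": stability, "context": context, "pressure": pressure}


busy = analyze({"type": "service_overload", "cpu": 94, "memory": 88, "severity": 0.82})
calm = analyze({"type": "timeout", "cpu": 20, "memory": 30, "severity": 0.10})
print(busy)
print(calm)
```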

3. Confidence Engine

Adaptive probabilistic scoring with historical weighting and decay.

conf = confidence_engine.calculate(event, context_risk="high")
# → conf.final = 0.7831  (lower when risk is high, adapts from history)

confidence_engine.record_outcome(success=True, confidence=0.78, context_risk="high")
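
One simple way to get "historical weighting and decay" is an exponentially decayed success rate blended with the event's severity. The sketch below takes that approach. The `ConfidenceTracker` class, the 50/50 blend, the decay factor, and the risk penalties are all assumptions, not the package's actual formula.

```python
class ConfidenceTracker:
    """Hypothetical confidence score: severity blended with a decayed success rate."""

    def __init__(self, decay=0.9, prior=0.5):
        self.decay = decay            # how quickly old outcomes fade
        self.prior = prior            # success rate assumed before any history
        self.success_weight = 0.0
        self.total_weight = 0.0

    def record_outcome(self, success: bool) -> None:
        """Fade existing evidence, then add the newest outcome at full weight."""
        self.success_weight = self.success_weight * self.decay + (1.0 if success else 0.0)
        self.total_weight = self.total_weight * self.decay + 1.0

    def history_rate(self) -> float:
        if self.total_weight == 0.0:
            return self.prior
        return self.success_weight / self.total_weight

    def calculate(self, severity: float, context_risk: str) -> float:
        """Blend signal strength with history, then subtract a risk penalty."""
        base = 0.5 * severity + 0.5 * self.history_rate()
        penalty = {"high": 0.10, "medium": 0.05}.get(context_risk, 0.0)
        return max(0.0, min(1.0, base - penalty))


tracker = ConfidenceTracker()
before = tracker.calculate(0.82, "high")
for _ in range(3):
    tracker.record_outcome(success=True)
after = tracker.calculate(0.82, "high")
relaxed = tracker.calculate(0.82, "low")
print(round(before, 2), round(after, 2), round(relaxed, 2))
```

Successful outcomes raise later scores, and the same event scores lower under high risk than under low risk, which matches the behavior described above.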

4. Decision Engine

Explainable rule-based action selection. Extensible with custom rules.

decision = decision_engine.decide(event, "resource_pressure", "high", 0.78)
# → action="restart_service", reason="high_resource_pressure", priority="high"

# Add your own rules:
custom_rules = [("my_context", "high", 0.70, "my_action", "my_reason")]
engine = DecisionEngine(custom_rules=custom_rules)
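
Rule-based selection of this kind can be sketched as a first-match scan over a rule table. The example assumes the third field of each rule tuple is a minimum confidence threshold and that custom rules are checked before defaults. Both are guesses about the tuple format above, and the `no_rule_matched` fallback reason is hypothetical.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    context: str
    risk: str
    min_confidence: float   # assumed meaning of the third tuple field
    action: str
    reason: str


DEFAULT_RULES = [
    Rule("resource_pressure", "high", 0.70, "restart_service", "high_resource_pressure"),
]


def decide(context, risk, confidence, custom_rules=()):
    """Return the first matching rule's action; custom rules take precedence."""
    rules = [Rule(*r) for r in custom_rules] + DEFAULT_RULES
    for rule in rules:
        if (rule.context == context
                and rule.risk == risk
                and confidence >= rule.min_confidence):
            return {"action": rule.action, "reason": rule.reason}
    # Nothing matched: defer to a human instead of acting.
    return {"action": "flag_for_review", "reason": "no_rule_matched"}


print(decide("resource_pressure", "high", 0.78))
print(decide("resource_pressure", "high", 0.50))
print(decide("my_context", "high", 0.75,
             custom_rules=[("my_context", "high", 0.70, "my_action", "my_reason")]))
```

Because each decision carries the reason of the rule that produced it, the output stays explainable without extra logging.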

5. Recovery Engine

Crash recovery, checkpoint snapshots, exponential back-off retry.

await recovery_engine.create_checkpoint(state)    # Save checkpoint
state = await recovery_engine.restore_latest()    # Restore after crash
result = await recovery_engine.retry(fn, fallback=fallback_fn)  # Retry with back-off
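
Exponential back-off with a fallback is the core of the retry call above. A minimal standalone sketch follows. The `attempts`, `base_delay`, and `max_delay` parameters and the jitter are assumptions, not the signature of `recovery_engine.retry`.

```python
import asyncio
import random


async def retry(fn, *, fallback=None, attempts=3, base_delay=0.01, max_delay=1.0):
    """Call fn up to `attempts` times, doubling the wait after each failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return await fn()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                delay = min(max_delay, base_delay * 2 ** attempt)
                # Jitter keeps many clients from retrying in lockstep.
                await asyncio.sleep(delay + random.uniform(0, delay / 2))
    if fallback is not None:
        return await fallback()
    raise last_error


calls = {"flaky": 0}


async def flaky():
    """Fails twice, then succeeds."""
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


async def always_fails():
    raise TimeoutError("still down")


async def fallback_fn():
    return "fallback"


async def demo():
    recovered = await retry(flaky)
    degraded = await retry(always_fails, fallback=fallback_fn)
    return recovered, degraded


recovered, degraded = asyncio.run(demo())
print(recovered, degraded, calls["flaky"])
```

Without a fallback the last exception is re-raised, so the caller still sees the real failure instead of a silent `None`.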

Designed for Constrained Environments

✅ Raspberry Pi
✅ $5 VPS (512MB RAM)
✅ Old laptop
✅ Edge devices
✅ Offline / air-gapped systems
✅ Serverless (cold start friendly)

No GPU. No cloud lock-in. No heavy ML frameworks.
Just Python + asyncio + SQLite.


Project Structure

adaptive_runtime/
│
├── core/
│   ├── state_engine.py       # State persistence and memory
│   ├── context_engine.py     # Event → contextual classification
│   ├── confidence_engine.py  # Adaptive probabilistic confidence
│   ├── decision_engine.py    # Rule-based action selection
│   └── recovery_engine.py    # Crash recovery + retry orchestration
│
├── runtime/
│   ├── runtime_manager.py    # Main orchestrator (Runtime class)
│   ├── event_bus.py          # Async pub/sub event bus
│   └── cache.py              # TTL-based in-memory cache
│
├── storage/
│   ├── sqlite_store.py       # Async SQLite persistence
│   └── memory_store.py       # In-process ephemeral store (testing)
│
├── observability/
│   ├── logger.py             # Structured color logger
│   └── metrics.py            # Lightweight in-memory metrics
│
├── examples/
│   ├── agent_demo.py         # Basic event processing
│   ├── monitoring_demo.py    # Continuous monitoring + event bus
│   └── automation_demo.py    # Retry + crash recovery
│
└── tests/
    └── test_engines.py       # 12 unit tests covering all engines

Run the Examples

# Clone
git clone https://github.com/stateflow-dev/adaptive-runtime.git
cd adaptive-runtime

# Install
pip install pydantic aiosqlite

# Run demos
python examples/agent_demo.py
python examples/monitoring_demo.py
python examples/automation_demo.py

# Run tests
pip install pytest pytest-asyncio
pytest tests/ -v
# → 12 passed

Roadmap

Feature                          Status
✅ 5 Core Engines                Tier 1 (Released)
✅ SQLite + Memory store         Tier 1 (Released)
✅ Async event bus               Tier 1 (Released)
✅ Retry + crash recovery        Tier 1 (Released)
🔜 REST API adapter (FastAPI)    Tier 2
🔜 Multi-agent orchestration     Tier 2
🔜 Plugin system                 Tier 2
🔜 Real-time dashboard           Tier 2
🔜 Distributed runtime           Tier 3

Benchmarks

Measured on a mid-range Windows laptop (Python 3.10, SQLite, no GPU).

Metric                 Result
Cold start             446 ms
Idle memory            29 MB
CPU idle usage         ~0%
SQLite save latency    36.5 ms avg (n=50)
SQLite load latency    2.7 ms avg (n=50)
Event processing       109.2 ms avg (n=50)
GPU required           ❌ Never

Runs comfortably on a $5 VPS (512MB RAM). No GPU. No cloud lock-in.
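
To collect comparable averages on your own hardware, a timing loop like the one below is enough. It is a generic sketch: the `measure_ms` helper is hypothetical and the timed lambda is a placeholder workload, not the harness that produced the numbers above.

```python
import statistics
import time


def measure_ms(fn, n=50):
    """Time n calls of fn and report latency in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"n": n, "avg_ms": statistics.mean(samples), "max_ms": max(samples)}


# Placeholder workload; swap in a state save or an event-processing call.
stats = measure_ms(lambda: sum(range(10_000)))
print(f"avg={stats['avg_ms']:.3f} ms over n={stats['n']}")
```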


Contributing

Issues and PRs welcome. Please open an issue first for major changes.


License

MIT © Stateflow Labs


"The biggest AI problems in production are not model problems.
They are runtime problems."
