Local-first, audit-grade stability/guard library for AI agents (with optional robotics extras)

Ultrastable

Ultrastable is a local-first, audit-grade guard layer for AI agents. It keeps agents within viable boundaries by monitoring essential variables, running deterministic detectors, and applying typed interventions when tokens, spend, context pressure, or retries become risky. The implementation focuses on guarding runtime behavior, enforcing viability, and producing finance-grade evidence, all while keeping the core NumPy-only and offline-first.

Agent Runtime Governance

  • Essential variables: spend_usd, tokens_total, retries, tool_calls, D(H)
  • Deterministic detectors: lexical repeat, tool loop, context pressure, budget breach
  • Typed interventions: RESET_REPLAN, RESET_CONTEXT_TRIM, BUDGET_HARD_STOP
  • Evidence & replay: tamper-evident JSONL ledgers, hash-chain validation, deterministic replay
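
The detectors above are deterministic by design. As a rough illustration of the kind of check a lexical-repeat detector performs (a sketch only; Ultrastable's built-in detector may use different features and thresholds), here is a word n-gram repeat ratio:

```python
from collections import Counter

def lexical_repeat_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that are duplicates (0.0 = no repetition)."""
    words = text.split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())  # extra occurrences beyond the first
    return repeated / len(ngrams)

# A looping agent repeating the same phrase scores high:
print(lexical_repeat_ratio("call the api again " * 10) > 0.5)        # True
print(lexical_repeat_ratio("a fresh unique sentence with no repeats"))  # 0.0
```

Because the ratio depends only on the text, the same transcript always yields the same score, which is what makes the verdict replayable.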

Why Ultrastable?

CI matrix: Ubuntu 22.04 (Python 3.10) and Ubuntu 24.04 (Python 3.12).

  • Guard runtime behavior: detect lexical repeats, runaway tool loops, or escalating costs and respond with deterministic interventions.
  • Enforce viability: encode budgets/caps as PolicyPacks, hash them, and log every Run/Step/Trigger with schema v1.2 RunEvents that include policy_hash.
  • Finance-grade evidence: append-only JSONL ledgers with per-entry event_hash/prev_hash hash chains plus CLI spend/unit-econ/zombie reports for CFO-ready attribution.
  • Offline-first + minimal core: ultrastable.core depends only on NumPy/stdlib; CLI/exporters/robotics live behind extras.
  • Dual-domain: the same primitives stabilize agent loops (tokens/$) and robotics/control loops (battery, torque, temperature).
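
Recording a policy_hash on every RunEvent depends on hashing a canonical serialization of the policy. A minimal sketch of that idea, assuming canonical-JSON hashing and hypothetical field names rather than Ultrastable's exact scheme:

```python
import hashlib
import json

def policy_hash(policy: dict) -> str:
    """Deterministic hash of a policy dict via canonical JSON (sorted keys,
    no whitespace), so semantically identical packs hash identically."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

pack = {"budget_usd": 5.0, "max_retries": 3, "context_budget_chars": 2000}
# Key order does not matter: same policy, same hash.
reordered = {"max_retries": 3, "context_budget_chars": 2000, "budget_usd": 5.0}
print(policy_hash(pack) == policy_hash(reordered))  # True
```

Logging this hash alongside each Run/Step/Trigger lets an auditor prove which budgets/caps were in force when an intervention fired.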

Installation & Quick Check

User install (PyPI):

pip install ultrastable

Extras (optional):

# CLI + reports
pip install "ultrastable[cli]"
# Full agent/cortex toolchain (CLI + OTLP telemetry)
pip install "ultrastable[cortex]"
# Robotics demos
pip install "ultrastable[robotics]"

Development install:

pip install -e ".[dev]"

To mirror the dependency set used in CI/experiments (core + extras):

pip install -r requirements.txt

Quick sanity check:

python -c "import ultrastable, ultrastable.core; print(ultrastable.core.ping())"

Run the automated experiment suite (writes ledgers/reports under runs/experiments):

python scripts/run_ultrastable_experiments.py --output-dir runs/experiments --keep-artifacts

Need a fast CI sanity check? Use the curated experiments.json smoke plan:

python scripts/run_ultrastable_experiments.py --plan experiments.json --output-dir runs/smoke --keep-artifacts

Want to exercise every major feature (agent demos, reports, exports, robotics examples) in under 45 minutes? Use the full-suite runner:

python scripts/run_ultrastable_full_suite.py --output-dir runs/full-suite --keep-artifacts

Need the AFMB failure-mode smoke suite to run offline in under 10 minutes (no paid APIs)?

ultrastable benchmark run \
  --config benchmarks/afmb_baselines.json \
  --output-dir runs/afmb-suite --seed 123

Or use the convenience wrapper script (delegates to the packaged runner): python scripts/run_afmb_suite.py --output-dir runs/afmb-suite.

Install Matrix

  • Core library / embeddable guards: pip install ultrastable (NumPy only). Minimal footprint; tests/test_imports.py ensures no optional deps are loaded.
  • CLI + demos + reports: pip install "ultrastable[cli]" (typer, rich). Enables the Typer CLI with colorized output and interactive prompts.
  • AI Cortex / FinOps guardrails: pip install "ultrastable[cortex]" (pydantic, httpx, rich, typer, opentelemetry-*). Full AgentGuard stack including report/export tooling and telemetry hooks.
  • Robotics demos + DriveWrapper: pip install "ultrastable[robotics]" (gymnasium, torch). Heavier extra; only required for DriveReward/DriveWrapper/MobileHomeostat2D.

Extras can be combined (e.g., pip install "ultrastable[cortex,robotics]"). Development installs still use pip install -e ".[dev]" for lint/test tooling.

60-second LangChain integration

Guard an existing LangChain or LangGraph workflow without rewriting it—install the connector and attach a callback:

pip install ultrastable ultrastable-langchain langchain-openai

Set your model provider credentials (e.g., OPENAI_API_KEY) and drop this snippet into your chain runner:

from langchain_openai import ChatOpenAI
from ultrastable.agent import AgentGuard
from ultrastable.cli.demos import build_agent_loop_controller
from ultrastable.ledger import JsonlLedger
from ultrastable_langchain import (
    UltrastableCallbackHandler,
    llm_run_to_guard_step,
    pre_step_context_from_prompts,
)

ledger = JsonlLedger("runs/langchain_demo.jsonl", redaction="metadata-only")
guard = AgentGuard(
    controller=build_agent_loop_controller(),
    ledger=ledger,
    context_budget_chars=2000,
)
handler = UltrastableCallbackHandler()
llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[handler], tags=["support-desk"])

guard.start_run(run_id="support-demo", tags={"agent": "tickets"})
llm.invoke("Draft a 1 sentence welcome.")
context = pre_step_context_from_prompts(handler.llm_runs[-1].prompts, base_id="welcome")
if context:
    guard.pre_step(context)
guard_step = llm_run_to_guard_step(handler.llm_runs[-1])
guard.post_step(
    step_id=guard_step.result.step_id,
    role=guard_step.result.role,
    kind=guard_step.result.kind,
    model=guard_step.result.model,
    prompt_text=guard_step.result.prompt_text,
    response_text=guard_step.result.response_text,
    metrics=guard_step.metrics,
    tags={"tenant": "acme-co", **(guard_step.result.tags or {})},
)
guard.end_run()
ledger.close()

UltrastableCallbackHandler records LLM/tool spans, the bridge helpers convert them into AgentGuard payloads, and JsonlLedger writes a tamper-evident log under runs/. Iterate over handler.tool_runs with tool_run_to_guard_step to emit tool calls (with deterministic tool_args_hash) so ToolLoopDetector and spend reports see the full interaction history.
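
As a sketch of how a deterministic tool_args_hash enables loop detection (illustrative only; ToolLoopDetector's real window and thresholds may differ):

```python
import hashlib
import json
from collections import deque

def tool_args_hash(tool_name: str, args: dict) -> str:
    """Deterministic hash of a tool call: name + canonical-JSON args."""
    payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class SimpleToolLoopDetector:
    """Flags a loop when the same (tool, args) hash recurs within a window."""
    def __init__(self, window: int = 5, max_repeats: int = 2):
        self.recent = deque(maxlen=window)
        self.max_repeats = max_repeats

    def observe(self, tool_name: str, args: dict) -> bool:
        h = tool_args_hash(tool_name, args)
        self.recent.append(h)
        return self.recent.count(h) > self.max_repeats

det = SimpleToolLoopDetector()
for _ in range(3):
    looping = det.observe("search", {"q": "same query"})
print(looping)  # True: third identical call exceeds max_repeats
```

Hashing name plus canonicalized arguments means two calls count as "the same" only when they would do the same work, so retries with changed parameters do not trip the detector.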

Documentation

Baseline harness helpers live under the optional ultrastable.benchmark namespace and are only imported when explicitly requested, so importing the core ultrastable package never pulls in baseline/batch experiment code.

CLI Highlights

# Validate and hash a PolicyPack
ultrastable validate policy configs/guard.json

# Tamper-evident ledger validation (hash chain)
ultrastable ledger validate runs/agent_loop.jsonl --hash-chain

# Run demos (built-in or PolicyPack-backed)
ultrastable demo agent-loop --output runs/agent_loop.jsonl --policy-pack configs/guard.json
ultrastable demo budget-cap --output runs/budget_cap.jsonl

# Replay ledgers deterministically
ultrastable replay runs/agent_loop.jsonl --policy-pack configs/guard.json --deterministic

# Export deterministic routing + budget configs
ultrastable export routing-policy runs/agent_loop.jsonl --output configs/routing.json
ultrastable export budget-policy runs/agent_loop.jsonl --output configs/budget_policy.json

# Finance-grade reports
ultrastable report spend runs/agent_loop.jsonl --by customer --filter agent=loop
ultrastable report unit-econ runs/agent_loop.jsonl --metric cost_per_task --task-map mappings/tasks.json
ultrastable report zombie runs/agent_loop.jsonl --start-time 2025-06-01T00:00:00Z --end-time 2025-06-07T23:59:59Z

Spend reports now emit a health block that surfaces the latest D(H) trend (start/end/min/max plus median and p90) alongside an intervention_effect_size summary of the recorded ΔD(H) deltas. Use it to see whether interventions are actually reducing the health distance—even when you are only skimming the CLI output. When you pass --format text, the CLI prints a short D(H) trend plus ΔD(H) summary line so on-call engineers can confirm recovery direction at a glance without scrolling through JSON.
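
The health block's statistics can be reproduced from a raw D(H) series. A rough sketch of the trend stats, with a mean ΔD(H) as a stand-in for the effect-size summary (field names here are illustrative, not the report's exact schema):

```python
import statistics

def dh_trend(dh_series: list[float]) -> dict:
    """Summarize a D(H) series: start/end/min/max plus median and p90."""
    s = sorted(dh_series)
    p90 = s[min(len(s) - 1, int(0.9 * (len(s) - 1)))]  # nearest-rank-style p90
    return {
        "start": dh_series[0], "end": dh_series[-1],
        "min": min(dh_series), "max": max(dh_series),
        "median": statistics.median(dh_series), "p90": p90,
    }

def mean_delta(dh_series: list[float]) -> float:
    """Mean step-to-step ΔD(H); negative means health distance is falling."""
    return statistics.mean(b - a for a, b in zip(dh_series, dh_series[1:]))

# Interventions should push D(H) down over the run:
series = [0.9, 0.7, 0.8, 0.4, 0.3, 0.2]
trend = dh_trend(series)
print(trend["start"] > trend["end"])  # True: the run recovered
print(mean_delta(series) < 0)         # True: deltas trend negative
```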

ultrastable ledger validate also supports --hash-chain to recompute every event_hash and prove the log has not been modified: any mid-chain modification, reordering, or insertion yields a prev_hash mismatch. Omit the flag for a quick JSON/schema sanity check. The legacy ultrastable validate ledger remains as an alias for existing scripts, and ledgers written prior to the hash-chain rollout still pass this command; the CLI simply reports that the hash chain is absent, so older archives remain readable.

ultrastable replay performs the same hash-chain verification (when ledgers carry event_hash/prev_hash pairs) before running controllers, and includes a hash_chain block in the emitted report metadata so audits can prove the replay consumed unmodified evidence. Legacy ledgers that predate hash chaining are noted as such instead of failing.
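
The hash-chain idea can be sketched in a few lines: each entry hashes its own body plus the previous entry's hash, so any mid-chain edit breaks validation from that point on. This is a minimal illustration of the technique, not Ultrastable's exact hashing scheme:

```python
import hashlib
import json

def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def chain_events(events: list[dict]) -> list[dict]:
    """Attach prev_hash/event_hash to each event, forming a hash chain."""
    prev, out = None, []
    for ev in events:
        body = dict(ev, prev_hash=prev)
        out.append(dict(body, event_hash=_digest(body)))
        prev = out[-1]["event_hash"]
    return out

def validate_chain(entries: list[dict]) -> bool:
    """Recompute every hash and check linkage, as --hash-chain does."""
    prev = None
    for e in entries:
        body = {k: v for k, v in e.items() if k != "event_hash"}
        if e["prev_hash"] != prev or _digest(body) != e["event_hash"]:
            return False
        prev = e["event_hash"]
    return True

ledger = chain_events([{"type": "run_start"}, {"type": "step"}, {"type": "run_end"}])
print(validate_chain(ledger))   # True
ledger[1]["type"] = "tampered"  # any mid-chain edit invalidates the chain
print(validate_chain(ledger))   # False
```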

The inspect command and all report subcommands support --format json|text plus --output FILE to control machine-readable artifacts. The inspect summary highlights steps.tool_calls and interventions.outcomes so tool loops and their intervention results are visible without opening the raw JSONL.

PolicyPacks must be JSON; YAML parsing has been removed to keep the core dependency-free.

CLI demos/dashboards honor --redaction metadata-only|selective-text|full-text|none (with none acting as a convenience alias for full-text). Use --redaction none only when you explicitly want prompt/response bodies persisted to disk; the default metadata-only mode hashes/redacts those fields. selective-text keeps error strings for debugging while still hashing prompts/responses, and full-text (none) preserves every raw body for short-lived investigations—treat ledgers produced in that mode as sensitive. See docs/privacy.md for the full tradeoff matrix.
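
As a sketch of the metadata-only vs full-text tradeoff (field names are illustrative, not Ultrastable's ledger schema):

```python
import hashlib

def redact_step(prompt: str, response: str, mode: str = "metadata-only") -> dict:
    """Illustrate the two extremes of the redaction spectrum: hash bodies
    (default) or persist them verbatim (full-text / none)."""
    digest = lambda s: hashlib.sha256(s.encode("utf-8")).hexdigest()
    if mode == "metadata-only":
        return {"prompt_hash": digest(prompt), "response_hash": digest(response)}
    if mode == "full-text":  # "none" is the convenience alias for this mode
        return {"prompt_text": prompt, "response_text": response}
    raise ValueError(f"unknown redaction mode: {mode}")

entry = redact_step("secret prompt", "secret answer")
print(sorted(entry))  # ['prompt_hash', 'response_hash']: raw bodies never persisted
```

The hashes still let replay and audits confirm that prompts/responses were unchanged, without the ledger itself becoming sensitive material.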

Observability (Grafana/OTEL) Quick Start

Ultrastable emits spans/metrics via OTLP/HTTP when telemetry is enabled. For a fast local check:

  1. Start a local OTEL Collector (HTTP receiver → debug exporter):
docker run --rm -p 4318:4318 \
  -v "$(pwd)/collector-minimal.yaml:/etc/otelcol/config.yaml:ro" \
  otel/opentelemetry-collector:latest --config /etc/otelcol/config.yaml
  2. Point Ultrastable to it and send telemetry:
export ULTRASTABLE_OTLP_ENDPOINT=http://localhost:4318
python3 scripts/run_integrations_smoke.py --otel auto
  3. Import the dashboard template:
  • Open Grafana and import docs/grafana_dashboard.json (see docs/grafana_dashboard.md).
  • For hosted setups (Grafana Cloud), configure your collector to export traces via otlphttp and metrics via prometheusremotewrite using your Cloud endpoints/tokens.

Examples

Run any example with python examples/<name>.py:

  • agent_loop_offline.py, budget_cap.py, tool_loop.py, context_pressure.py
  • battery_agent.py — trust-battery agent that halts when feedback is poor
  • robotics_drive_demo.py, mobile_homeostat_demo.py (writes runs/mobile_homeostat_trajectory.svg), dipaolo_replication.py
  • replay_policy_change.py

Robotics (optional)

Robotics/control primitives live behind the robotics extra and are not part of the core agent governance path. See docs under docs/robotics/* and the examples above for optional demos.

Benchmark Quickstart (AFMB)

Run the offline AFMB smoke suite and emit artifacts in one go using the CLI:

ultrastable benchmark run \
  --config benchmarks/afmb_baselines.json \
  --output-dir runs/afmb --seed 123 --emit-kpis

Validate artifacts:

ultrastable benchmark validate config benchmarks/afmb_baselines.json
ultrastable benchmark validate manifest runs/afmb/manifest.json
ultrastable benchmark validate results runs/afmb/results.json
ultrastable benchmark validate cases benchmarks/afmb_cases.json

Artifacts in runs/afmb/ include manifest.json, results.json with a kpis block, report.md, a ledgers/ directory per case, and a combined ledger for aggregation.

See the case registry at benchmarks/afmb_cases.json and the packaged runner in ultrastable/benchmark/afmb_suite.py for scenario definitions and programmatic access.

License & Contributions

Ultrastable is released under the MIT License (see LICENSE). Contributions are welcome—see CONTRIBUTING.md for branching/testing/style guidelines before opening a merge request.

Acknowledgement: A Note on the Name

The name “Ultrastable” is a small salute to W. Ross Ashby and his work on ultrastability and homeostatic systems (e.g., An Introduction to Cybernetics; Design for a Brain: The Origin of Adaptive Behaviour). Ideas such as essential variables, viability, and homeostasis inform the way we model agent health and design controllers in this library.
