
Local-first, audit-grade stability/guard library for AI agents (with optional robotics extras)


Ultrastable

Ultrastable is a local-first, audit-grade guard layer for AI agents, designed for agent runtime governance. It keeps agents within viable boundaries by monitoring essential variables, running deterministic detectors, and applying typed interventions when tokens, spend, context pressure, or retries become risky. It focuses on three jobs (guarding runtime behavior, enforcing viability, and producing finance-grade evidence) while keeping the core NumPy-only and offline-first.

Agent Runtime Governance

  • Essential variables: spend_usd, tokens_total, retries, tool_calls, and D(H) (the health distance)
  • Deterministic detectors: lexical repeat, tool loop, context pressure, budget breach
  • Typed interventions: RESET_REPLAN, RESET_CONTEXT_TRIM, BUDGET_HARD_STOP
  • Evidence & replay: tamper-evident JSONL ledgers, hash-chain validation, deterministic replay
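To make this vocabulary concrete, here is a minimal sketch of how essential variables, viability bounds, and typed interventions can fit together. The classes ViabilityBounds and Snapshot, their fields, the thresholds, and the mapping from each breached variable to an intervention are all hypothetical illustrations, not Ultrastable's API or its actual detector logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ViabilityBounds:
    """Hypothetical caps on a few essential variables."""
    max_spend_usd: float
    max_tokens_total: int
    max_retries: int
    max_context_chars: int


@dataclass
class Snapshot:
    """Hypothetical reading of the essential variables after a step."""
    spend_usd: float
    tokens_total: int
    retries: int
    context_chars: int
    repeated_output: bool = False  # e.g. set by a lexical-repeat detector


def choose_intervention(s: Snapshot, b: ViabilityBounds) -> Optional[str]:
    """Return a typed intervention name, or None while the run is viable.

    Checks are ordered so the hardest constraint (budget) wins; the same
    inputs always produce the same decision.
    """
    if s.spend_usd > b.max_spend_usd or s.tokens_total > b.max_tokens_total:
        return "BUDGET_HARD_STOP"
    if s.context_chars > b.max_context_chars:
        return "RESET_CONTEXT_TRIM"
    if s.repeated_output or s.retries > b.max_retries:
        return "RESET_REPLAN"
    return None
```

Because the decision is a pure function of the snapshot and the bounds, replaying the same ledger against the same bounds yields the same interventions, which is the property deterministic replay relies on.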

Why Ultrastable?

See also the technical note on contracts and compatibility: docs/technical_note.md.

CI matrix: Ubuntu 22.04 (Python 3.10) and Ubuntu 24.04 (Python 3.12).

  • Guard runtime behavior: detect lexical repeats, runaway tool loops, or escalating costs and respond with deterministic interventions.
  • Enforce viability: encode budgets/caps as PolicyPacks, hash them, and log every Run/Step/Trigger with schema v1.2 RunEvents that include policy_hash.
  • Finance-grade evidence: append-only JSONL ledgers with per-entry event_hash/prev_hash hash chains plus CLI spend/unit-econ/zombie reports for CFO-ready attribution.
  • Offline-first + minimal core: ultrastable.core depends only on NumPy/stdlib; CLI/exporters/robotics live behind extras.
  • Dual-domain: the same primitives stabilize agent loops (tokens/$) and robotics/control loops (battery, torque, temperature).
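The "hash them" step in the viability bullet can be illustrated with a canonical-JSON hash. This is a sketch under assumptions: Ultrastable's exact canonicalization and its RunEvent v1.2 schema are not described here, so policy_hash and the run_event record shape below are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def policy_hash(policy: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of a policy."""
    # Sorted keys and compact separators make the hash independent of
    # key order and whitespace in the source file.
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_event(kind: str, payload: Dict[str, Any], policy: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical event record that carries the active policy's hash."""
    # Fixed keys come last so a payload cannot overwrite them.
    return {**payload, "kind": kind, "policy_hash": policy_hash(policy)}
```

Two PolicyPacks that differ only in key order hash identically, while any change to a cap produces a different policy_hash, so every logged event can be tied to the exact policy in force.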

Installation & Quick Check

User install (PyPI):

pip install ultrastable

Extras (optional):

# CLI + reports
pip install "ultrastable[cli]"
# Full agent/cortex toolchain (CLI + OTLP telemetry)
pip install "ultrastable[cortex]"
# Robotics demos
pip install "ultrastable[robotics]"

Development install:

pip install -e ".[dev]"

To mirror the dependency set used in CI/experiments (core + extras):

pip install -r requirements.txt

Quick sanity check:

python -c "import ultrastable, ultrastable.core; print(ultrastable.core.ping())"

Run the automated experiment suite (writes ledgers/reports under runs/experiments):

python scripts/run_ultrastable_experiments.py --output-dir runs/experiments --keep-artifacts

Need a fast CI sanity check? Use the curated experiments.json smoke plan:

python scripts/run_ultrastable_experiments.py --plan experiments.json --output-dir runs/smoke --keep-artifacts

Want to exercise every major feature (agent demos, reports, exports, robotics examples) in under 45 minutes? Use the full-suite runner:

python scripts/run_ultrastable_full_suite.py --output-dir runs/full-suite --keep-artifacts

Need the AFMB failure-mode smoke suite to run offline in under 10 minutes (no paid APIs)?

ultrastable benchmark run \
  --config benchmarks/afmb_baselines.json \
  --output-dir runs/afmb-suite --seed 123

Or use the convenience wrapper script (delegates to the packaged runner): python scripts/run_afmb_suite.py --output-dir runs/afmb-suite.

Install Matrix

| Goal | Command | Extras pulled in | Notes |
|---|---|---|---|
| Core library / embeddable guards | pip install ultrastable | NumPy only | Minimal footprint; tests/test_imports.py ensures no optional deps are loaded. |
| CLI + demos + reports | pip install "ultrastable[cli]" | typer, rich | Enables the Typer CLI with colorized output + interactive prompts. |
| AI Cortex / FinOps guardrails | pip install "ultrastable[cortex]" | pydantic, httpx, rich, typer, opentelemetry-* | Full AgentGuard stack including report/export tooling and telemetry hooks. |
| Robotics demos + DriveWrapper | pip install "ultrastable[robotics]" | gymnasium, torch | Heavier extra; only required for DriveReward/DriveWrapper/MobileHomeostat2D. |

Extras can be combined (e.g., pip install "ultrastable[cortex,robotics]"). Development installs still use pip install -e ".[dev]" for lint/test tooling.

Dependency Matrix (integration smoke)

Run the optional dependency-version matrix locally to sanity-check minimum vs. latest versions of key extras. This creates ephemeral, per-scenario virtualenvs under .venv_matrix/ and executes lightweight smoke tests:

python scripts/run_dependency_matrix.py            # core, events, cli, otlp, cortex @ min/latest
python scripts/run_dependency_matrix.py --recreate # drop/recreate venvs
python scripts/run_dependency_matrix.py --include-robotics  # also exercise robotics extra

Minimum versions are pinned in constraints/min-versions.txt and used with pip's -c flag to force resolver decisions toward those minima.

Exporters & Backpressure

Ultrastable includes non-blocking exporters for forwarding events (stdout, JSONL files, HTTP/OTLP). Exporters use a bounded in-memory queue with a hard cap; when the queue is full, the default behavior is to drop the incoming batch and return accepted=False in ExportResult so producers can detect loss. You can opt into backpressure by setting on_full="block" (with an optional block_timeout) on exporter constructors to make export() block until space is available or the timeout elapses.
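The drop-versus-block policy described above can be sketched with Python's standard queue.Queue. Only the on_full and block_timeout names and the ExportResult.accepted flag come from the description; BoundedExporter and max_batches are hypothetical, and the background worker that drains the queue and ships batches is omitted. This is not Ultrastable's implementation.

```python
import queue
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ExportResult:
    accepted: bool


class BoundedExporter:
    """Sketch of an exporter front end with a hard-capped in-memory queue."""

    def __init__(self, max_batches: int = 1024, on_full: str = "drop",
                 block_timeout: Optional[float] = None) -> None:
        if on_full not in ("drop", "block"):
            raise ValueError("on_full must be 'drop' or 'block'")
        self._queue: "queue.Queue[List[Any]]" = queue.Queue(maxsize=max_batches)
        self._on_full = on_full
        self._block_timeout = block_timeout

    def export(self, batch: List[Any]) -> ExportResult:
        try:
            if self._on_full == "block":
                # Backpressure: wait for space; timeout=None waits indefinitely.
                self._queue.put(batch, block=True, timeout=self._block_timeout)
            else:
                # Default: never block the producer, fail fast when full.
                self._queue.put_nowait(batch)
        except queue.Full:
            # The incoming batch is dropped; the caller can detect the loss.
            return ExportResult(accepted=False)
        return ExportResult(accepted=True)
```

Dropping by default keeps a slow collector from stalling the agent loop; producers that cannot tolerate loss opt into blocking and bound the wait with block_timeout.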

60-second LangChain integration

Guard an existing LangChain or LangGraph workflow without rewriting it—install the connector and attach a callback:

pip install ultrastable ultrastable-langchain langchain-openai

Set your model provider credentials (e.g., OPENAI_API_KEY) and drop this snippet into your chain runner:

from langchain_openai import ChatOpenAI
from ultrastable.agent import AgentGuard
from ultrastable.cli.demos import build_agent_loop_controller
from ultrastable.ledger import JsonlLedger
from ultrastable_langchain import (
    UltrastableCallbackHandler,
    llm_run_to_guard_step,
    pre_step_context_from_prompts,
)

# Tamper-evident JSONL ledger; metadata-only redaction hashes prompt/response bodies.
ledger = JsonlLedger("runs/langchain_demo.jsonl", redaction="metadata-only")
# AgentGuard wires the demo controller to the ledger.
guard = AgentGuard(
    controller=build_agent_loop_controller(),
    ledger=ledger,
    context_budget_chars=2000,  # context budget, in characters
)
# The callback handler records LLM/tool spans while LangChain runs.
handler = UltrastableCallbackHandler()
llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[handler], tags=["support-desk"])

guard.start_run(run_id="support-demo", tags={"agent": "tickets"})
llm.invoke("Draft a one-sentence welcome.")

# Build the pre-step context from the prompts captured for the last LLM run.
context = pre_step_context_from_prompts(handler.llm_runs[-1].prompts, base_id="welcome")
if context:
    guard.pre_step(context)

# Convert the recorded LLM run into AgentGuard step fields plus metrics, then log it.
guard_step = llm_run_to_guard_step(handler.llm_runs[-1])
guard.post_step(
    step_id=guard_step.result.step_id,
    role=guard_step.result.role,
    kind=guard_step.result.kind,
    model=guard_step.result.model,
    prompt_text=guard_step.result.prompt_text,
    response_text=guard_step.result.response_text,
    metrics=guard_step.metrics,
    tags={"tenant": "acme-co", **(guard_step.result.tags or {})},
)
guard.end_run()
ledger.close()

UltrastableCallbackHandler records LLM/tool spans, the bridge helpers convert them into AgentGuard payloads, and JsonlLedger writes a tamper-evident log under runs/. Iterate over handler.tool_runs with tool_run_to_guard_step to emit tool calls (with deterministic tool_args_hash) so ToolLoopDetector and spend reports see the full interaction history.
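The text above mentions a deterministic tool_args_hash feeding ToolLoopDetector. As a rough illustration only (the hashing scheme, window, and threshold are assumptions, and ToolLoopSketch is a hypothetical stand-in rather than Ultrastable's ToolLoopDetector), hashing a canonical JSON encoding of the tool name and arguments makes repeated calls comparable regardless of argument order, and a sliding window can then flag a loop:

```python
import hashlib
import json
from collections import deque
from typing import Any, Deque, Dict


def tool_args_hash(tool_name: str, args: Dict[str, Any]) -> str:
    # Canonical JSON keeps the hash stable regardless of argument order.
    canonical = json.dumps({"tool": tool_name, "args": args}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ToolLoopSketch:
    """Flags a loop when one (tool, args) hash appears `threshold` times
    within the last `window` tool calls."""

    def __init__(self, window: int = 6, threshold: int = 3) -> None:
        self._recent: Deque[str] = deque(maxlen=window)
        self._threshold = threshold

    def observe(self, tool_name: str, args: Dict[str, Any]) -> bool:
        h = tool_args_hash(tool_name, args)
        self._recent.append(h)  # oldest hash falls off once the window is full
        return self._recent.count(h) >= self._threshold
```

Storing only the hash keeps raw tool arguments out of the ledger while still letting detectors and spend reports recognize identical calls.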

Documentation

Baseline harness helpers live under the optional ultrastable.benchmark namespace and are only imported when explicitly requested, so importing the core ultrastable package never pulls in baseline/batch experiment code.

CLI Highlights

# Validate and hash a PolicyPack
ultrastable validate policy configs/guard.json

# Tamper-evident ledger validation (hash chain)
ultrastable ledger validate runs/agent_loop.jsonl --hash-chain

# Run demos (built-in or PolicyPack-backed)
ultrastable demo agent-loop --output runs/agent_loop.jsonl --policy-pack configs/guard.json
ultrastable demo budget-cap --output runs/budget_cap.jsonl

# Replay ledgers deterministically
ultrastable replay runs/agent_loop.jsonl --policy-pack configs/guard.json --deterministic

# Export deterministic routing + budget configs
ultrastable export routing-policy runs/agent_loop.jsonl --output configs/routing.json
ultrastable export budget-policy runs/agent_loop.jsonl --output configs/budget_policy.json

# Finance-grade reports
ultrastable report spend runs/agent_loop.jsonl --by customer --filter agent=loop
ultrastable report unit-econ runs/agent_loop.jsonl --metric cost_per_task --task-map mappings/tasks.json
ultrastable report zombie runs/agent_loop.jsonl --start-time 2025-06-01T00:00:00Z --end-time 2025-06-07T23:59:59Z

# Package run artifacts into an evidence bundle
ultrastable evidence bundle create runs/afmb --output runs/afmb_bundle.tar.gz
# Verify an evidence bundle deterministically
ultrastable evidence bundle verify runs/afmb_bundle.tar.gz

# Compare PolicyPacks at field level
ultrastable policy diff configs/base_policy.json configs/candidate_policy.json --format text

# Lint a PolicyPack before rollout (treat warnings as errors)
ultrastable policy lint configs/guard.json --strict
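ultrastable policy diff compares PolicyPacks at field level. One simple way to get a field-level diff is to flatten both JSON documents into dotted paths and compare path by path. The sketch below assumes that approach; flatten and policy_diff are hypothetical helpers, not the CLI's implementation or output format.

```python
from typing import Any, Dict, List, Tuple


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted paths; lists and scalars are leaves."""
    if isinstance(obj, dict) and obj:
        out: Dict[str, Any] = {}
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            out.update(flatten(value, path))
        return out
    return {prefix: obj}


def policy_diff(base: Dict[str, Any], candidate: Dict[str, Any]) -> List[Tuple[str, str, Any, Any]]:
    """Return (change, path, old, new) tuples sorted by path."""
    a, b = flatten(base), flatten(candidate)
    changes: List[Tuple[str, str, Any, Any]] = []
    for path in sorted(a.keys() | b.keys()):
        if path not in b:
            changes.append(("removed", path, a[path], None))
        elif path not in a:
            changes.append(("added", path, None, b[path]))
        elif a[path] != b[path]:
            changes.append(("changed", path, a[path], b[path]))
    return changes
```

Sorting by path makes the output deterministic, so the same pair of files always produces the same diff.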

Evidence bundling is privacy-hardened: any JSONL ledgers included in the archive are redacted to metadata-only (prompt/response bodies removed, hashes preserved) and their hash chain is recomputed for determinism inside the bundle. Original files on disk are not modified.
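Redaction changes entry contents, so every event_hash after the first redacted entry must be recomputed for the bundled copy to validate. A minimal sketch of that redact-then-rechain pass follows; the field names prompt_text/response_text, the *_sha256 digest fields, the all-zero genesis value, and the hashing layout are assumptions, not Ultrastable's ledger format.

```python
import hashlib
import json
from typing import Any, Dict, List

BODY_FIELDS = ("prompt_text", "response_text")  # assumed body field names


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def event_hash(entry: Dict[str, Any], prev_hash: str) -> str:
    # Hash the entry (minus its own hash fields), chained to the previous link.
    body = {k: v for k, v in entry.items() if k not in ("event_hash", "prev_hash")}
    return _sha256(prev_hash + json.dumps(body, sort_keys=True, separators=(",", ":")))


def redact_and_rechain(entries: List[Dict[str, Any]], genesis: str = "0" * 64) -> List[Dict[str, Any]]:
    """Return metadata-only copies with a freshly computed hash chain."""
    out: List[Dict[str, Any]] = []
    prev = genesis
    for entry in entries:
        redacted = dict(entry)  # copy: the original records are left untouched
        for field in BODY_FIELDS:
            if isinstance(redacted.get(field), str):
                # Keep a digest of the body and drop the body itself.
                redacted[f"{field}_sha256"] = _sha256(redacted.pop(field))
        redacted["prev_hash"] = prev
        redacted["event_hash"] = event_hash(redacted, prev)
        prev = redacted["event_hash"]
        out.append(redacted)
    return out
```

Because the function works on copies and uses a fixed canonical encoding, bundling the same ledger twice yields byte-identical entries.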

Spend reports now emit a health block that surfaces the latest D(H) trend (start, end, min, max, median, and p90) alongside an intervention_effect_size summary of the recorded ΔD(H) deltas. Use it to check whether interventions are actually reducing the health distance, even when you are only skimming the CLI output. With --format text, the CLI prints a short D(H) trend plus a ΔD(H) summary line so on-call engineers can confirm the direction of recovery at a glance without scrolling through JSON.
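The arithmetic behind such a health block is straightforward. The sketch below computes a D(H) trend and a simple ΔD(H) summary; the output keys, the nearest-rank p90 definition, and the convention that ΔD(H) is "after minus before" (so a negative delta means the intervention moved the agent closer to healthy) are assumptions, not the report's documented schema.

```python
import math
import statistics
from typing import List


def p90(values: List[float]) -> float:
    # Nearest-rank percentile: one of several common definitions.
    ordered = sorted(values)
    rank = math.ceil(90 * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def health_block(dh: List[float], deltas: List[float]) -> dict:
    """Summarize a D(H) series and the ΔD(H) recorded per intervention."""
    trend = {
        "start": dh[0], "end": dh[-1],
        "min": min(dh), "max": max(dh),
        "median": statistics.median(dh), "p90": p90(dh),
    }
    effect = {
        "n": len(deltas),
        "mean_delta": statistics.fmean(deltas) if deltas else 0.0,
        # Share of interventions that reduced the health distance.
        "improved_fraction": sum(d < 0 for d in deltas) / len(deltas) if deltas else 0.0,
    }
    return {"dh_trend": trend, "intervention_effect_size": effect}
```

A falling trend together with a negative mean_delta is the signal to look for: interventions are pulling D(H) back toward the viable region.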

ultrastable ledger validate also supports --hash-chain, which recomputes every event_hash and checks each prev_hash link to verify the log has not been modified: any mid-chain modification, reordering, or insertion produces a hash mismatch. Omit the flag for a quick JSON/schema sanity check. The legacy ultrastable validate ledger command remains as an alias for existing scripts. Ledgers written before the hash-chain rollout still pass this command; the CLI simply reports that the hash chain is absent, so older archives remain readable.

ultrastable replay performs the same hash-chain verification (when ledgers carry event_hash/prev_hash pairs) before running controllers, and includes a hash_chain block in the emitted report metadata so audits can show the replay consumed unmodified evidence. Legacy ledgers that predate hash chaining are noted as such instead of failing.
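A hash-chain check of this kind can be sketched as follows. The hashing layout, the all-zero genesis value, and the returned dictionary are assumptions for illustration; append_entry and validate_chain are hypothetical helpers, not Ultrastable's code. Legacy input without any event_hash is reported as "absent" rather than failing, mirroring the behavior described above.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed prev_hash of the first entry


def compute_event_hash(entry: Dict[str, Any], prev_hash: str) -> str:
    # Hash everything except the hash fields themselves, chained to prev_hash.
    body = {k: v for k, v in entry.items() if k not in ("event_hash", "prev_hash")}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> Dict[str, Any]:
    prev = chain[-1]["event_hash"] if chain else GENESIS
    entry = dict(payload, prev_hash=prev)
    entry["event_hash"] = compute_event_hash(entry, prev)
    chain.append(entry)
    return entry


def validate_chain(entries: List[Dict[str, Any]]) -> Dict[str, Any]:
    if not any("event_hash" in e for e in entries):
        # Legacy ledger: report the chain as absent instead of failing.
        return {"ok": True, "status": "absent"}
    expected_prev = GENESIS
    for i, entry in enumerate(entries):
        if entry.get("prev_hash") != expected_prev:
            return {"ok": False, "status": "broken", "index": i, "reason": "prev_hash mismatch"}
        if compute_event_hash(entry, expected_prev) != entry.get("event_hash"):
            return {"ok": False, "status": "broken", "index": i, "reason": "event_hash mismatch"}
        expected_prev = entry["event_hash"]
    return {"ok": True, "status": "valid", "entries": len(entries)}
```

An edited entry fails its own event_hash check, while a reordered or inserted entry breaks the prev_hash link of the entry that follows. A plain chain like this cannot, on its own, tell that trailing entries were cut off; that needs an external anchor such as a recorded final hash or entry count.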

inspect and all report subcommands support --format json|text plus --output FILE to control machine-readable artifacts. The inspect summary highlights steps.tool_calls and interventions.outcomes so tool loops and their intervention results are visible without opening the raw JSONL.

PolicyPacks must be JSON; YAML parsing has been removed to keep the core dependency-free.

CLI demos/dashboards honor --redaction metadata-only|selective-text|full-text|none (with none acting as a convenience alias for full-text). Use --redaction none only when you explicitly want prompt/response bodies persisted to disk; the default metadata-only mode hashes/redacts those fields. selective-text keeps error strings for debugging while still hashing prompts/responses, and full-text (none) preserves every raw body for short-lived investigations—treat ledgers produced in that mode as sensitive. See docs/privacy.md for the full tradeoff matrix.
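The redaction modes can be summarized as a small per-entry policy. In the sketch below, the mode names and the none alias come from the text above; the field names prompt_text, response_text, and error and the "sha256:" digest prefix are assumptions, and redact/resolve_mode are hypothetical helpers rather than Ultrastable's implementation.

```python
import hashlib
from typing import Any, Dict

MODES = ("metadata-only", "selective-text", "full-text")
ALIASES = {"none": "full-text"}  # convenience alias
BODY_FIELDS = ("prompt_text", "response_text")  # assumed field names


def _digest(text: str) -> str:
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def resolve_mode(mode: str) -> str:
    mode = ALIASES.get(mode, mode)
    if mode not in MODES:
        raise ValueError(f"unknown redaction mode: {mode!r}")
    return mode


def redact(entry: Dict[str, Any], mode: str = "metadata-only") -> Dict[str, Any]:
    mode = resolve_mode(mode)
    out = dict(entry)  # never mutate the caller's record
    if mode == "full-text":
        return out  # raw bodies kept: treat the resulting ledger as sensitive
    for field in BODY_FIELDS:
        if isinstance(out.get(field), str):
            out[field] = _digest(out[field])
    # selective-text keeps error strings for debugging; metadata-only hashes them too.
    if mode == "metadata-only" and isinstance(out.get("error"), str):
        out["error"] = _digest(out["error"])
    return out
```

Hashing instead of deleting keeps entries comparable (identical prompts produce identical digests) without persisting the text itself.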

Observability (Grafana/OTEL) Quick Start

Ultrastable emits spans/metrics via OTLP/HTTP when telemetry is enabled. For a fast local check:

  1. Start a local OTEL Collector (HTTP receiver → debug exporter):
docker run --rm -p 4318:4318 \
  -v "$(pwd)/collector-minimal.yaml:/etc/otelcol/config.yaml:ro" \
  otel/opentelemetry-collector:latest --config /etc/otelcol/config.yaml
  2. Point Ultrastable to it and send telemetry:
export ULTRASTABLE_OTLP_ENDPOINT=http://localhost:4318
python3 scripts/run_integrations_smoke.py --otel auto
  3. Import the dashboard template:
  • Open Grafana and import docs/grafana_dashboard.json (see docs/grafana_dashboard.md).
  • For hosted setups (Grafana Cloud), configure your collector to export traces via otlphttp and metrics via prometheusremotewrite using your Cloud endpoints/tokens.

Examples

Run any example with python examples/<name>.py:

  • agent_loop_offline.py, budget_cap.py, tool_loop.py, context_pressure.py
  • battery_agent.py — trust-battery agent that halts when feedback is poor
  • robotics_drive_demo.py, mobile_homeostat_demo.py (writes runs/mobile_homeostat_trajectory.svg), dipaolo_replication.py
  • replay_policy_change.py

Robotics (optional)

Robotics/control primitives live behind the robotics extra and are not part of the core agent governance path. See docs under docs/robotics/* and the examples above for optional demos.

Benchmark Quickstart (AFMB)

Run the offline AFMB smoke suite and emit artifacts in one go using the CLI:

ultrastable benchmark run \
  --config benchmarks/afmb_baselines.json \
  --output-dir runs/afmb --seed 123 --emit-kpis

Validate artifacts:

ultrastable benchmark validate config benchmarks/afmb_baselines.json
ultrastable benchmark validate manifest runs/afmb/manifest.json
ultrastable benchmark validate results runs/afmb/results.json
ultrastable benchmark validate cases benchmarks/afmb_cases.json

Artifacts in runs/afmb/ include manifest.json, results.json with a kpis block, report.md, a ledgers/ directory per case, and a combined ledger for aggregation.

See the case registry at benchmarks/afmb_cases.json and the packaged runner in ultrastable/benchmark/afmb_suite.py for scenario definitions and programmatic access.

License & Contributions

Ultrastable is released under the MIT License (see LICENSE). Contributions are welcome—see CONTRIBUTING.md for branching/testing/style guidelines before opening a merge request.

Acknowledgement: A Note on the Name

The name “Ultrastable” is a small salute to W. Ross Ashby and his work on ultrastability and homeostatic systems (e.g., An Introduction to Cybernetics; Design for a Brain: The Origin of Adaptive Behaviour). Ideas such as essential variables, viability, and homeostasis inform the way we model agent health and design controllers in this library.



Download files


Source Distribution

ultrastable-0.4.2.tar.gz (182.2 kB)


Built Distribution


ultrastable-0.4.2-py3-none-any.whl (152.6 kB)


File details

Details for the file ultrastable-0.4.2.tar.gz.

File metadata

  • Download URL: ultrastable-0.4.2.tar.gz
  • Size: 182.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for ultrastable-0.4.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | c8b4f3f2ed88bb4883009628157dd652dd4909ae322d3d364dad79ddcdde9b81 |
| MD5 | 1c84fd4cc306c5a6dd7058b9caaa3e2d |
| BLAKE2b-256 | 366f6c0dc71d7a09bb5eb294f0604f9573b5341cb97e7d9df022e7222b540b95 |


File details

Details for the file ultrastable-0.4.2-py3-none-any.whl.

File metadata

  • Download URL: ultrastable-0.4.2-py3-none-any.whl
  • Size: 152.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for ultrastable-0.4.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 66f3aae35e3340d0c8f503d8df334cfde322710cea9252e31d543da085349b3c |
| MD5 | 511916af887d5a731e8eb68acbc469b0 |
| BLAKE2b-256 | d4b1f2594608d9e52447a4759affa8ded9df0538e3a1e55ac15d7075cac178b4 |

