Local-first, audit-grade stability/guard library for AI agents (with optional robotics extras)

Ultrastable

Ultrastable is a local-first, audit-grade guard layer for AI agents. It keeps agents within viable boundaries by monitoring essential variables, running deterministic detectors, and applying typed interventions when tokens, spend, context pressure, or retries become risky. The implementation stays focused on guarding runtime behavior, enforcing viability, and producing finance-grade evidence—all while keeping the core NumPy-only and offline-first.

Why Ultrastable?

  • Guard runtime behavior: detect lexical repeats, runaway tool loops, or escalating costs and respond with deterministic interventions.
  • Enforce viability: encode budgets/caps as PolicyPacks, hash them, and log every Run/Step/Trigger with schema v1.2 RunEvents that include policy_hash.
  • Finance-grade evidence: append-only JSONL ledgers with per-entry event_hash/prev_hash hash chains plus CLI spend/unit-econ/zombie reports for CFO-ready attribution.
  • Offline-first + minimal core: ultrastable.core depends only on NumPy/stdlib; CLI/exporters/robotics live behind extras.
  • Dual-domain: the same primitives stabilize agent loops (tokens/$) and robotics/control loops (battery, torque, temperature).
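As a rough illustration of the "enforce viability" idea, the sketch below checks observed essential variables against caps and reports which ones are breached. The `Cap` class, the `breached_caps` helper, and the variable names are hypothetical stand-ins and are not the library's PolicyPack API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cap:
    """Upper bound on one essential variable (hypothetical structure)."""
    name: str
    limit: float


def breached_caps(observed: dict, caps: list) -> list:
    """Return the names of variables whose observed value exceeds its cap."""
    return [cap.name for cap in caps if observed.get(cap.name, 0.0) > cap.limit]


# Illustrative caps for an agent loop; the numbers are made up.
caps = [Cap("tokens", 8000), Cap("spend_usd", 0.50), Cap("retries", 3)]
observed = {"tokens": 9100, "spend_usd": 0.12, "retries": 3}

print(breached_caps(observed, caps))  # ['tokens']
```

The same shape would apply to a robotics loop by swapping in variables such as battery, torque, or temperature.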

Installation & Quick Check

User install (PyPI):

pip install ultrastable

Extras (optional):

# CLI + reports
pip install "ultrastable[cli]"
# Full agent/cortex toolchain (CLI + OTLP telemetry)
pip install "ultrastable[cortex]"
# Robotics demos
pip install "ultrastable[robotics]"

Development install:

pip install -e ".[dev]"

To mirror the dependency set used in CI/experiments (core + extras):

pip install -r requirements.txt

Quick sanity check:

python -c "import ultrastable, ultrastable.core; print(ultrastable.core.ping())"

Run the automated experiment suite (writes ledgers/reports under runs/experiments):

python scripts/run_ultrastable_experiments.py --output-dir runs/experiments --keep-artifacts

Need a fast CI sanity check? Use the curated experiments.json smoke plan:

python scripts/run_ultrastable_experiments.py --plan experiments.json --output-dir runs/smoke --keep-artifacts

Want to exercise every major feature (agent demos, reports, exports, robotics examples) in under 45 minutes? Use the full-suite runner:

python scripts/run_ultrastable_full_suite.py --output-dir runs/full-suite --keep-artifacts

Need the AFMB failure-mode smoke suite to run offline in under 10 minutes (no paid APIs)?

python scripts/run_afmb_suite.py --output-dir runs/afmb-suite --keep-artifacts

Install Matrix

Goal | Command | Extras pulled in | Notes
Core library / embeddable guards | pip install ultrastable | NumPy only | Minimal footprint; tests/test_imports.py ensures no optional deps are loaded.
CLI + demos + reports | pip install "ultrastable[cli]" | typer, rich | Enables the Typer CLI with colorized output + interactive prompts.
AI Cortex / FinOps guardrails | pip install "ultrastable[cortex]" | pydantic, httpx, rich, typer, opentelemetry-* | Full AgentGuard stack including report/export tooling and telemetry hooks.
Robotics demos + DriveWrapper | pip install "ultrastable[robotics]" | gymnasium, torch | Heavier extra; only required for DriveReward/DriveWrapper/MobileHomeostat2D.

Extras can be combined (e.g., pip install "ultrastable[cortex,robotics]"). Development installs still use pip install -e ".[dev]" for lint/test tooling.

60-second LangChain integration

Guard an existing LangChain or LangGraph workflow without rewriting it—install the connector and attach a callback:

pip install ultrastable ultrastable-langchain langchain-openai

Set your model provider credentials (e.g., OPENAI_API_KEY) and drop this snippet into your chain runner:

from langchain_openai import ChatOpenAI
from ultrastable.agent import AgentGuard
from ultrastable.cli.demos import build_agent_loop_controller
from ultrastable.ledger import JsonlLedger
from ultrastable_langchain import (
    UltrastableCallbackHandler,
    llm_run_to_guard_step,
    pre_step_context_from_prompts,
)

# Ledger for the run; metadata-only redaction keeps raw prompt/response bodies off disk.
ledger = JsonlLedger("runs/langchain_demo.jsonl", redaction="metadata-only")
guard = AgentGuard(
    controller=build_agent_loop_controller(),
    ledger=ledger,
    context_budget_chars=2000,
)

# The callback handler records LLM/tool spans as the model runs.
handler = UltrastableCallbackHandler()
llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[handler], tags=["support-desk"])

guard.start_run(run_id="support-demo", tags={"agent": "tickets"})
llm.invoke("Draft a 1 sentence welcome.")

# Convert the most recent recorded LLM run into AgentGuard payloads.
context = pre_step_context_from_prompts(handler.llm_runs[-1].prompts, base_id="welcome")
if context:
    guard.pre_step(context)
guard_step = llm_run_to_guard_step(handler.llm_runs[-1])
guard.post_step(
    step_id=guard_step.result.step_id,
    role=guard_step.result.role,
    kind=guard_step.result.kind,
    model=guard_step.result.model,
    prompt_text=guard_step.result.prompt_text,
    response_text=guard_step.result.response_text,
    metrics=guard_step.metrics,
    tags={"tenant": "acme-co", **(guard_step.result.tags or {})},
)

# Finish the run and close the ledger file.
guard.end_run()
ledger.close()

UltrastableCallbackHandler records LLM/tool spans, the bridge helpers convert them into AgentGuard payloads, and JsonlLedger writes a tamper-evident log under runs/. Iterate over handler.tool_runs with tool_run_to_guard_step to emit tool calls (with deterministic tool_args_hash) so ToolLoopDetector and spend reports see the full interaction history.
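To make the role of a deterministic tool_args_hash more concrete, here is a minimal standalone sketch, assuming canonical JSON plus SHA-256 as the hashing scheme and a simple repeat count as the loop signal. Neither `tool_args_hash` nor `repeated_calls` below is the library's implementation, and the real ToolLoopDetector may use different rules.

```python
import hashlib
import json
from collections import Counter


def tool_args_hash(args: dict) -> str:
    """Hash tool arguments so that key order does not affect the result."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def repeated_calls(calls: list, threshold: int = 3) -> list:
    """Return tool names invoked at least `threshold` times with identical args."""
    counts = Counter((name, tool_args_hash(args)) for name, args in calls)
    return sorted({name for (name, _), n in counts.items() if n >= threshold})


calls = [
    ("search", {"q": "refund policy", "limit": 5}),
    ("search", {"limit": 5, "q": "refund policy"}),  # same args, different key order
    ("lookup", {"ticket": 42}),
    ("search", {"q": "refund policy", "limit": 5}),
]

print(repeated_calls(calls))  # ['search']
```

Because the hash ignores key order, the three search calls count as the same invocation, which is what lets a detector recognize a loop even when argument dictionaries are built in a different order each time.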

Documentation

Baseline harness helpers live under the optional ultrastable.benchmark namespace and are only imported when explicitly requested, so importing the core ultrastable package never pulls in baseline/batch experiment code.

CLI Highlights

# Validate and hash a PolicyPack
ultrastable validate policy configs/guard.json

# Tamper-evident ledger validation (hash chain)
ultrastable ledger validate runs/agent_loop.jsonl --hash-chain

# Run demos (built-in or PolicyPack-backed)
ultrastable demo agent-loop --output runs/agent_loop.jsonl --policy-pack configs/guard.json
ultrastable demo budget-cap --output runs/budget_cap.jsonl

# Replay ledgers deterministically
ultrastable replay runs/agent_loop.jsonl --policy-pack configs/guard.json --deterministic

# Export deterministic routing + budget configs
ultrastable export routing-policy runs/agent_loop.jsonl --output configs/routing.json
ultrastable export budget-policy runs/agent_loop.jsonl --output configs/budget_policy.json

# Finance-grade reports
ultrastable report spend runs/agent_loop.jsonl --by customer --filter agent=loop
ultrastable report unit-econ runs/agent_loop.jsonl --metric cost_per_task --task-map mappings/tasks.json
ultrastable report zombie runs/agent_loop.jsonl --start-time 2025-06-01T00:00:00Z --end-time 2025-06-07T23:59:59Z

Spend reports now emit a health block that surfaces the latest D(H) trend (start/end/min/max plus median and p90) alongside an intervention_effect_size summary of the recorded ΔD(H) deltas. Use it to see whether interventions are actually reducing the health distance—even when you are only skimming the CLI output. When you pass --format text, the CLI prints a short D(H) trend plus ΔD(H) summary line so on-call engineers can confirm recovery direction at a glance without scrolling through JSON.
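The statistics in the health block can be approximated with a few lines of standard-library Python. This is only a sketch under assumptions: the `health_summary` function, its output keys, the nearest-rank percentile definition, and the use of a plain mean for the ΔD(H) summary are illustrative choices, not the report's actual schema or formula.

```python
import math
import statistics


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def health_summary(distances: list, deltas: list) -> dict:
    """Summarize a D(H) series and the recorded per-intervention deltas."""
    return {
        "start": distances[0],
        "end": distances[-1],
        "min": min(distances),
        "max": max(distances),
        "median": statistics.median(distances),
        "p90": percentile(distances, 90),
        # Negative mean delta: interventions reduced the health distance on average.
        "mean_delta": statistics.fmean(deltas) if deltas else 0.0,
    }


distances = [0.8, 0.9, 0.6, 0.4, 0.2]   # made-up D(H) values per step
deltas = [-0.3, -0.2, -0.2]             # made-up ΔD(H) per intervention
summary = health_summary(distances, deltas)

print(summary["end"] < summary["start"])  # True
```

A falling D(H) from start to end combined with a negative mean delta is the pattern that suggests interventions are helping rather than merely firing.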

ultrastable ledger validate supports --hash-chain, which recomputes every event_hash to prove the log has not been modified: any mid-chain modification, reordering, or insertion yields a prev_hash mismatch. Omit the flag for a quick JSON/schema sanity check. The legacy ultrastable validate ledger remains as an alias for existing scripts. Ledgers written before the hash-chain rollout still pass this command; the CLI simply reports that the hash chain is absent, so older archives remain readable.

ultrastable replay performs the same hash-chain verification (when ledgers carry event_hash/prev_hash pairs) before running controllers, and it includes a hash_chain block in the emitted report metadata so audits can prove the replay consumed unmodified evidence. Legacy ledgers that predate hash chaining are noted as such instead of failing.
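The general hash-chain technique behind this kind of validation can be sketched as follows. The exact bytes Ultrastable hashes, its genesis value, and its entry layout are not documented here, so `GENESIS`, `entry_hash`, `append`, and `first_broken_index` are assumptions made for illustration; only the event_hash/prev_hash field names come from the text above.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed prev_hash for the first entry


def entry_hash(payload: dict, prev_hash: str) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps(
        {"payload": payload, "prev_hash": prev_hash},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append an entry linked to the current tail of the chain."""
    prev = chain[-1]["event_hash"] if chain else GENESIS
    chain.append(
        {"payload": payload, "prev_hash": prev, "event_hash": entry_hash(payload, prev)}
    )


def first_broken_index(chain: list) -> Optional[int]:
    """Return the index of the first entry that fails verification, else None."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        recomputed = entry_hash(entry["payload"], prev)
        if entry["prev_hash"] != prev or entry["event_hash"] != recomputed:
            return i
        prev = entry["event_hash"]
    return None


chain = []
for step in ("start", "step-1", "step-2"):
    append(chain, {"event": step})
print(first_broken_index(chain))  # None

chain[1]["payload"]["event"] = "edited"  # tamper with a mid-chain entry
print(first_broken_index(chain))  # 1
```

Because each entry's hash covers the previous hash, editing, reordering, or inserting an entry changes what every later entry should link to, which is why verification can stop at the first mismatch.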

inspect and all report subcommands support --format json|text plus --output FILE to control machine-readable artifacts. The inspect summary highlights steps.tool_calls and interventions.outcomes so tool loops and their intervention results are visible without opening the raw JSONL.

PolicyPacks must be JSON; YAML parsing has been removed to keep the core dependency-free.

CLI demos/dashboards honor --redaction metadata-only|selective-text|full-text|none (with none acting as a convenience alias for full-text). Use --redaction none only when you explicitly want prompt/response bodies persisted to disk; the default metadata-only mode hashes/redacts those fields. selective-text keeps error strings for debugging while still hashing prompts/responses, and full-text (none) preserves every raw body for short-lived investigations—treat ledgers produced in that mode as sensitive. See docs/privacy.md for the full tradeoff matrix.
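One way to picture the redaction modes is the sketch below. It assumes that hashing means a SHA-256 digest, that metadata-only also hashes error strings, and that records carry an `error` field; the `redact` helper and the record layout are hypothetical, while prompt_text and response_text mirror the field names used in the LangChain snippet.

```python
import hashlib

MODES = {"metadata-only", "selective-text", "full-text"}


def normalize_mode(mode: str) -> str:
    """Treat "none" as an alias for full-text and reject unknown modes."""
    mode = "full-text" if mode == "none" else mode
    if mode not in MODES:
        raise ValueError(f"unknown redaction mode: {mode}")
    return mode


def digest(text: str) -> str:
    """Replace a text body with a stable fingerprint."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def redact(record: dict, mode: str) -> dict:
    """Return a copy of the record with text fields hashed according to the mode."""
    mode = normalize_mode(mode)
    out = dict(record)
    if mode != "full-text":
        # Both metadata-only and selective-text hash prompt/response bodies.
        for key in ("prompt_text", "response_text"):
            if key in out:
                out[key] = digest(out[key])
    if mode == "metadata-only" and "error" in out:
        # Assumption: only selective-text keeps raw error strings.
        out["error"] = digest(out["error"])
    return out


record = {"prompt_text": "hi", "response_text": "hello", "error": "timeout"}
print(redact(record, "selective-text")["error"])  # timeout
print(redact(record, "none") == record)  # True
```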

Observability (Grafana/OTEL) Quick Start

Ultrastable emits spans/metrics via OTLP/HTTP when telemetry is enabled. For a fast local check:

  1. Start a local OTEL Collector (HTTP receiver → debug exporter):
docker run --rm -p 4318:4318 \
  -v "$(pwd)/collector-minimal.yaml:/etc/otelcol/config.yaml:ro" \
  otel/opentelemetry-collector:latest --config /etc/otelcol/config.yaml
  2. Point Ultrastable to it and send telemetry:
export ULTRASTABLE_OTLP_ENDPOINT=http://localhost:4318
python3 scripts/run_integrations_smoke.py --otel auto
  3. Import the dashboard template:
  • Open Grafana and import docs/grafana_dashboard.json (see docs/grafana_dashboard.md).
  • For hosted setups (Grafana Cloud), configure your collector to export traces via otlphttp and metrics via prometheusremotewrite using your Cloud endpoints/tokens.

Examples

Run any example with python examples/<name>.py:

  • agent_loop_offline.py, budget_cap.py, tool_loop.py, context_pressure.py
  • battery_agent.py — trust-battery agent that halts when feedback is poor
  • robotics_drive_demo.py, mobile_homeostat_demo.py (writes runs/mobile_homeostat_trajectory.svg), dipaolo_replication.py
  • replay_policy_change.py

License & Contributions

Ultrastable is released under the MIT License (see LICENSE). Contributions are welcome—see CONTRIBUTING.md for branching/testing/style guidelines before opening a merge request.

Acknowledgement: A Note on the Name

The name “Ultrastable” is a small salute to W. Ross Ashby and his work on ultrastability and homeostatic systems (e.g., An Introduction to Cybernetics; Design for a Brain: The Origin of Adaptive Behaviour). Ideas such as essential variables, viability, and homeostasis inform the way we model agent health and design controllers in this library.
