Local-first, audit-grade stability/guard library for AI agents (with optional robotics extras)
Ultrastable
Ultrastable is a local-first, audit-grade guard layer for AI agents. It keeps agents within viable boundaries by monitoring essential variables, running deterministic detectors, and applying typed interventions when tokens, spend, context pressure, or retries become risky. The implementation stays focused on guarding runtime behavior, enforcing viability, and producing finance-grade evidence—all while keeping the core NumPy-only and offline-first.
Why Ultrastable?
- Guard runtime behavior: detect lexical repeats, runaway tool loops, or escalating costs and respond with deterministic interventions.
- Enforce viability: encode budgets/caps as PolicyPacks, hash them, and log every Run/Step/Trigger with schema v1.2 RunEvents that include policy_hash.
- Finance-grade evidence: append-only JSONL ledgers with per-entry event_hash/prev_hash hash chains plus CLI spend/unit-econ/zombie reports for CFO-ready attribution.
- Offline-first + minimal core: ultrastable.core depends only on NumPy/stdlib; CLI/exporters/robotics live behind extras.
- Dual-domain: the same primitives stabilize agent loops (tokens/$) and robotics/control loops (battery, torque, temperature).
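To make the "hash them" step concrete, here is a minimal sketch of hashing a policy so that the digest is stable across key ordering. It assumes SHA-256 over a canonical JSON encoding; the function name `policy_hash`, the example fields, and the choice of digest are illustrative assumptions, not Ultrastable's documented implementation.

```python
import hashlib
import json


def policy_hash(policy: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON encoding of `policy`.

    Assumption: sorted keys and compact separators define "canonical" here.
    The library's actual canonicalization may differ.
    """
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical PolicyPack contents; the real schema may use other field names.
pack_a = {"budgets": {"max_usd": 5.0, "max_tokens": 20000}, "max_retries": 3}
pack_b = {"max_retries": 3, "budgets": {"max_tokens": 20000, "max_usd": 5.0}}

# Same content in a different key order yields the same digest.
assert policy_hash(pack_a) == policy_hash(pack_b)
```

Logging a digest like this with every event lets an auditor tie each recorded step to the exact policy that was in force, which is the role policy_hash plays in the RunEvents described above.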
Installation & Quick Check
User install (PyPI):
pip install ultrastable
Extras (optional):
# CLI + reports
pip install "ultrastable[cli]"
# Full agent/cortex toolchain (CLI + OTLP telemetry)
pip install "ultrastable[cortex]"
# Robotics demos
pip install "ultrastable[robotics]"
Development install:
pip install -e .[dev]
To mirror the dependency set used in CI/experiments (core + extras):
pip install -r requirements.txt
Quick sanity check:
python -c "import ultrastable, ultrastable.core; print(ultrastable.core.ping())"
Run the automated experiment suite (writes ledgers/reports under runs/experiments):
python scripts/run_ultrastable_experiments.py --output-dir runs/experiments --keep-artifacts
Need a fast CI sanity check? Use the curated experiments.json smoke plan:
python scripts/run_ultrastable_experiments.py --plan experiments.json --output-dir runs/smoke --keep-artifacts
Want to exercise every major feature (agent demos, reports, exports, robotics examples) in under 45 minutes? Use the full-suite runner:
python scripts/run_ultrastable_full_suite.py --output-dir runs/full-suite --keep-artifacts
Need the AFMB failure-mode smoke suite to run offline in under 10 minutes (no paid APIs)?
python scripts/run_afmb_suite.py --output-dir runs/afmb-suite --keep-artifacts
Install Matrix
| Goal | Command | Extras pulled in | Notes |
|---|---|---|---|
| Core library / embeddable guards | pip install ultrastable | NumPy only | Minimal footprint; tests/test_imports.py ensures no optional deps are loaded. |
| CLI + demos + reports | pip install "ultrastable[cli]" | typer, rich | Enables the Typer CLI with colorized output + interactive prompts. |
| AI Cortex / FinOps guardrails | pip install "ultrastable[cortex]" | pydantic, httpx, rich, typer, opentelemetry-* | Full AgentGuard stack including report/export tooling and telemetry hooks. |
| Robotics demos + DriveWrapper | pip install "ultrastable[robotics]" | gymnasium, torch | Heavier extra; only required for DriveReward/DriveWrapper/MobileHomeostat2D. |
Extras can be combined (e.g., pip install "ultrastable[cortex,robotics]"). Development installs still use pip install -e .[dev] for lint/test tooling.
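The Install Matrix notes that tests/test_imports.py guards the minimal footprint. The contents of that test are not shown here, so the following is only a sketch of one way such a check could work: import a package in a fresh interpreter and report which optional modules ended up in sys.modules. The helper name `optional_modules_loaded` is hypothetical.

```python
import subprocess
import sys


def optional_modules_loaded(package: str, optional: list[str]) -> list[str]:
    """Import `package` in a fresh interpreter and list the optional modules it loaded.

    A fresh subprocess is used so modules already imported by the caller
    cannot mask (or fake) a leak.
    """
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({package!r})\n"
        f"print(','.join(m for m in {optional!r} if m in sys.modules))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    output = result.stdout.strip()
    return output.split(",") if output else []


# Demonstrated on a stdlib module so the sketch runs without Ultrastable installed.
leaked = optional_modules_loaded("json", ["typer", "torch"])
print(leaked)
```

With the core package installed, the same helper could be called as optional_modules_loaded("ultrastable", ["typer", "rich", "torch", "gymnasium"]) and asserted to return an empty list.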
60-second LangChain integration
Guard an existing LangChain or LangGraph workflow without rewriting it—install the connector and attach a callback:
pip install ultrastable ultrastable-langchain langchain-openai
Set your model provider credentials (e.g., OPENAI_API_KEY) and drop this snippet into your chain runner:
from langchain_openai import ChatOpenAI
from ultrastable.agent import AgentGuard
from ultrastable.cli.demos import build_agent_loop_controller
from ultrastable.ledger import JsonlLedger
from ultrastable_langchain import (
    UltrastableCallbackHandler,
    llm_run_to_guard_step,
    pre_step_context_from_prompts,
)

# Ledger + guard: metadata-only redaction keeps prompt/response bodies off disk.
ledger = JsonlLedger("runs/langchain_demo.jsonl", redaction="metadata-only")
guard = AgentGuard(
    controller=build_agent_loop_controller(),
    ledger=ledger,
    context_budget_chars=2000,
)

# The callback handler records each LLM run made through this model.
handler = UltrastableCallbackHandler()
llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[handler], tags=["support-desk"])

guard.start_run(run_id="support-demo", tags={"agent": "tickets"})
llm.invoke("Draft a 1 sentence welcome.")

# Convert the recorded run into guard inputs.
context = pre_step_context_from_prompts(handler.llm_runs[-1].prompts, base_id="welcome")
if context:
    guard.pre_step(context)

guard_step = llm_run_to_guard_step(handler.llm_runs[-1])
guard.post_step(
    step_id=guard_step.result.step_id,
    role=guard_step.result.role,
    kind=guard_step.result.kind,
    model=guard_step.result.model,
    prompt_text=guard_step.result.prompt_text,
    response_text=guard_step.result.response_text,
    metrics=guard_step.metrics,
    tags={"tenant": "acme-co", **(guard_step.result.tags or {})},
)

guard.end_run()
ledger.close()
UltrastableCallbackHandler records LLM/tool spans, the bridge helpers convert them into AgentGuard payloads, and JsonlLedger writes a tamper-evident log under runs/. Iterate over handler.tool_runs with tool_run_to_guard_step to emit tool calls (with deterministic tool_args_hash) so ToolLoopDetector and spend reports see the full interaction history.
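To illustrate why a deterministic tool_args_hash helps loop detection, here is a self-contained sketch that does not use the connector at all. It assumes the hash is a SHA-256 digest of canonical JSON and that a "loop" means the same tool being called with identical arguments several times; both `tool_args_hash` as written here and `repeated_calls` are hypothetical stand-ins, not the connector's or ToolLoopDetector's actual logic.

```python
import hashlib
import json
from collections import Counter


def tool_args_hash(tool_name: str, args: dict) -> str:
    """Digest a tool call so identical calls always map to the same hash."""
    payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def repeated_calls(calls: list[tuple[str, dict]], threshold: int = 3) -> list[str]:
    """Return the hashes of calls that occur at least `threshold` times."""
    counts = Counter(tool_args_hash(name, args) for name, args in calls)
    return [digest for digest, n in counts.items() if n >= threshold]


# Hypothetical call history: the same search issued three times, one fetch.
history = [
    ("search", {"q": "refund policy", "k": 5}),
    ("search", {"k": 5, "q": "refund policy"}),  # same args, different key order
    ("fetch", {"url": "https://example.com"}),
    ("search", {"q": "refund policy", "k": 5}),
]

flagged = repeated_calls(history)
assert flagged == [tool_args_hash("search", {"q": "refund policy", "k": 5})]
```

Because the digest ignores key order, repeats are caught even when the agent serializes its arguments differently between calls, and the ledger never has to store the raw arguments to detect them.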
Documentation
- docs/concepts.md: essential variables, viability policies, interventions.
- docs/quickstart.md: guided CLI walkthrough (mirrors CI smoke tests).
- docs/ai/agent_guard.md, docs/ai/detectors_interventions.md, docs/ai/reports_and_gitops.md: AI guard tutorials.
- docs/robotics/homeostatic_reward.md, docs/robotics/mobile_homeostat.md, docs/robotics/plasticity_resets.md: robotics demos + extras.
- docs/registry.md: coupling/plasticity registries and IDs.
- docs/howto_detectors.md, docs/howto_interventions.md, docs/cli.md: practical guides.
- docs/event_schema.md: schema v1.2 (RunEvent policy_hash, PolicySwitchEvent provenance).
- docs/experiments.md: instructions for the experiment runner.
- docs/langchain_connector_limitations.md: known limitations of the ultrastable-langchain connector's first public alpha.
- docs/benchmark_manifest.md: manifest.json schema + helpers for benchmark runs.
- docs/benchmark_results.md: results.json schema + helpers for benchmark outputs.
- docs/benchmark_harness_config.md: AFMB harness config format (afmb_baselines.json) covering timeout/retry/tool/budget baselines.
- ROADMAP.md: current milestone focus and upcoming goals.
Baseline harness helpers live under the optional ultrastable.benchmark namespace and are only
imported when explicitly requested, so importing the core ultrastable package never pulls in
baseline/batch experiment code.
CLI Highlights
# Validate and hash a PolicyPack
ultrastable validate policy configs/guard.json
# Tamper-evident ledger validation (hash chain)
ultrastable ledger validate runs/agent_loop.jsonl --hash-chain
# Run demos (built-in or PolicyPack-backed)
ultrastable demo agent-loop --output runs/agent_loop.jsonl --policy-pack configs/guard.json
ultrastable demo budget-cap --output runs/budget_cap.jsonl
# Replay ledgers deterministically
ultrastable replay runs/agent_loop.jsonl --policy-pack configs/guard.json --deterministic
# Export deterministic routing + budget configs
ultrastable export routing-policy runs/agent_loop.jsonl --output configs/routing.json
ultrastable export budget-policy runs/agent_loop.jsonl --output configs/budget_policy.json
# Finance-grade reports
ultrastable report spend runs/agent_loop.jsonl --by customer --filter agent=loop
ultrastable report unit-econ runs/agent_loop.jsonl --metric cost_per_task --task-map mappings/tasks.json
ultrastable report zombie runs/agent_loop.jsonl --start-time 2025-06-01T00:00:00Z --end-time 2025-06-07T23:59:59Z
Spend reports now emit a health block that surfaces the latest D(H) trend
(start/end/min/max plus median and p90) alongside an intervention_effect_size
summary of the recorded ΔD(H) deltas. Use it to see whether interventions are
actually reducing the health distance—even when you are only skimming the CLI
output. When you pass --format text, the CLI prints a short D(H) trend plus
ΔD(H) summary line so on-call engineers can confirm recovery direction at a
glance without scrolling through JSON.
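The statistics named in the health block can be sketched with the standard library. This is an illustration under assumptions: the dictionary keys, the nearest-rank definition of p90, and the mean/median summary of the ΔD(H) deltas are choices made for this example, and the CLI's actual field names and percentile method may differ.

```python
import math
import statistics


def percentile_nearest_rank(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def health_trend(distances: list[float]) -> dict:
    """Summarize a non-empty D(H) series in recorded order."""
    return {
        "start": distances[0],
        "end": distances[-1],
        "min": min(distances),
        "max": max(distances),
        "median": statistics.median(distances),
        "p90": percentile_nearest_rank(distances, 90),
    }


def intervention_effect(deltas: list[float]) -> dict:
    """Summarize ΔD(H) deltas; a negative mean means interventions shrank the distance."""
    return {
        "count": len(deltas),
        "mean": statistics.fmean(deltas),
        "median": statistics.median(deltas),
    }


# Hypothetical series: distance to the healthy region falls over five steps.
trend = health_trend([0.9, 0.7, 0.8, 0.4, 0.2])
effect = intervention_effect([-0.2, 0.1, -0.4, -0.2])
print(trend, effect)
```

Reading the two together gives the recovery direction: an end value below the start and a negative mean delta both indicate the agent moving back toward its viable region.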
ultrastable ledger validate also supports --hash-chain to recompute every
event_hash and prove the log has not been modified—any mid-chain modification,
reordering, or insertion yields a prev_hash mismatch; omit the flag for a quick
JSON/schema sanity check. The legacy ultrastable validate ledger remains as an
alias for existing scripts. Ledgers written prior to the hash-chain rollout still
pass this command—the CLI simply reports that the hash chain is absent so older
archives remain readable.
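The event_hash/prev_hash mechanism can be illustrated with a small in-memory hash chain. This sketch assumes SHA-256 over canonical JSON and an all-zero sentinel for the first entry; the entry layout (`payload`, `prev_hash`, `event_hash`) and the helper names are illustrative, not the ledger's actual on-disk schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed sentinel used as prev_hash for the first entry


def _digest(body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_event(chain: list[dict], payload: dict) -> dict:
    """Append an entry whose hash covers both its payload and the previous hash."""
    prev_hash = chain[-1]["event_hash"] if chain else GENESIS
    body = {"payload": payload, "prev_hash": prev_hash}
    entry = {**body, "event_hash": _digest(body)}
    chain.append(entry)
    return entry


def verify_chain(chain: list[dict]) -> tuple[bool, str]:
    """Recompute every hash and check that each entry links to its predecessor."""
    expected_prev = GENESIS
    for index, entry in enumerate(chain):
        if entry["prev_hash"] != expected_prev:
            return False, f"prev_hash mismatch at entry {index}"
        body = {"payload": entry["payload"], "prev_hash": entry["prev_hash"]}
        if _digest(body) != entry["event_hash"]:
            return False, f"event_hash mismatch at entry {index}"
        expected_prev = entry["event_hash"]
    return True, "ok"


ledger: list[dict] = []
for step in range(3):
    append_event(ledger, {"step": step})

assert verify_chain(ledger) == (True, "ok")
```

Because each event_hash covers the previous entry's hash, editing, reordering, or inserting an entry mid-chain breaks the links from that point on. A chain alone does not detect truncation of trailing entries; that would need an externally recorded head hash.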
ultrastable replay performs the same hash-chain verification (when ledgers
carry event_hash/prev_hash pairs) before running controllers and includes a
hash_chain block in the emitted report metadata so audits can prove the replay
consumed unmodified evidence; legacy ledgers that predate hash chaining are
noted as such instead of failing.
inspect and all report subcommands support --format json|text plus --output FILE to control machine-readable artifacts.
The inspect summary highlights steps.tool_calls and interventions.outcomes
so tool loops and their intervention results are visible without opening the
raw JSONL.
PolicyPacks must be JSON; YAML parsing has been removed to keep the core dependency-free.
CLI demos/dashboards honor --redaction metadata-only|selective-text|full-text|none
(with none acting as a convenience alias for full-text). Use --redaction none
only when you explicitly want prompt/response bodies persisted to disk; the
default metadata-only mode hashes/redacts those fields. selective-text keeps
error strings for debugging while still hashing prompts/responses, and full-text
(none) preserves every raw body for short-lived investigations—treat ledgers
produced in that mode as sensitive. See docs/privacy.md for the full tradeoff
matrix.
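The difference between the redaction modes can be sketched as a small pure function. This is an assumption-laden illustration: the record fields (`prompt_text`, `response_text`, `error`), the `_hash` suffix, and the decision to hash error strings in metadata-only mode are hypothetical choices for this example rather than the ledger's documented behavior.

```python
import hashlib

MODES = {"metadata-only", "selective-text", "full-text"}


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def redact_step(record: dict, mode: str = "metadata-only") -> dict:
    """Return a copy of `record` redacted according to `mode`."""
    if mode == "none":  # convenience alias for full-text
        mode = "full-text"
    if mode not in MODES:
        raise ValueError(f"unknown redaction mode: {mode}")

    out = dict(record)
    if mode == "full-text":
        return out  # raw bodies preserved; treat the result as sensitive

    # Both remaining modes replace prompt/response bodies with digests.
    for field in ("prompt_text", "response_text"):
        if out.get(field) is not None:
            out[f"{field}_hash"] = _sha256(out.pop(field))

    # Assumed here: metadata-only also hashes error strings; selective-text keeps them.
    if mode == "metadata-only" and out.get("error") is not None:
        out["error_hash"] = _sha256(out.pop("error"))
    return out


step = {"step_id": "s1", "prompt_text": "hi", "response_text": "hello", "error": "timeout"}
print(redact_step(step))
print(redact_step(step, "selective-text"))
```

Hashing rather than dropping the bodies keeps repeat detection possible (identical prompts produce identical digests) without persisting the text itself.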
Observability (Grafana/OTEL) Quick Start
Ultrastable emits spans/metrics via OTLP/HTTP when telemetry is enabled. For a fast local check:
- Start a local OTEL Collector (HTTP receiver → debug exporter):
docker run --rm -p 4318:4318 \
-v "$(pwd)/collector-minimal.yaml:/etc/otelcol/config.yaml:ro" \
otel/opentelemetry-collector:latest --config /etc/otelcol/config.yaml
- Point Ultrastable to it and send telemetry:
export ULTRASTABLE_OTLP_ENDPOINT=http://localhost:4318
python3 scripts/run_integrations_smoke.py --otel auto
- Import the dashboard template:
  - Open Grafana and import docs/grafana_dashboard.json (see docs/grafana_dashboard.md).
  - For hosted setups (Grafana Cloud), configure your collector to export traces via otlphttp and metrics via prometheusremotewrite using your Cloud endpoints/tokens.
Examples
Run any example with python examples/<name>.py:
- agent_loop_offline.py, budget_cap.py, tool_loop.py, context_pressure.py
- battery_agent.py: trust-battery agent that halts when feedback is poor
- robotics_drive_demo.py, mobile_homeostat_demo.py (writes runs/mobile_homeostat_trajectory.svg), dipaolo_replication.py
- replay_policy_change.py
License & Contributions
Ultrastable is released under the MIT License (see LICENSE). Contributions are
welcome—see CONTRIBUTING.md for branching/testing/style guidelines before
opening a merge request.
Acknowledgement: A Note on the Name
The name “Ultrastable” is a small salute to W. Ross Ashby and his work on ultrastability and homeostatic systems (e.g., An Introduction to Cybernetics; Design for a Brain - The origin of adaptive behaviour). Ideas such as essential variables, viability, and homeostasis inform the way we model agent health and design controllers in this library.