Local-first, audit-grade stability/guard library for AI agents (with optional robotics extras)

Ultrastable

Ultrastable is a local-first, audit-grade guard layer for AI agents, built for agent runtime governance. It keeps agents within viable boundaries in three ways. It monitors essential variables, runs deterministic detectors, and applies typed interventions when tokens, spend, context pressure, or retries become risky. The library stays focused on three jobs: guarding runtime behavior, enforcing viability, and producing finance-grade evidence. Throughout, the core stays NumPy-only and offline-first.

Agent Runtime Governance

Essential variables: spend_usd, tokens_total, retries, tool_calls, D(H)
Deterministic detectors: lexical repeat, tool loop, context pressure, budget breach
Typed interventions: RESET_REPLAN, RESET_CONTEXT_TRIM, BUDGET_HARD_STOP
Evidence & replay: tamper-evident JSONL ledgers, hash-chain validation, deterministic replay
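The sketch below makes this vocabulary concrete. It uses plain Python and is independent of Ultrastable's real classes. It shows how essential variables might be checked against viability bounds and mapped to one of the typed interventions above. Several pieces are assumptions for illustration only: the `Bounds`/`EssentialVariables` classes, the `context_chars` field (a stand-in for context pressure), the thresholds, and the priority order. `tool_calls` and D(H) are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Intervention(Enum):
    # Names mirror the typed interventions listed above.
    RESET_REPLAN = "RESET_REPLAN"
    RESET_CONTEXT_TRIM = "RESET_CONTEXT_TRIM"
    BUDGET_HARD_STOP = "BUDGET_HARD_STOP"


@dataclass(frozen=True)
class Bounds:
    # Hypothetical viability caps (not the PolicyPack schema).
    max_spend_usd: float
    max_tokens_total: int
    max_retries: int
    max_context_chars: int


@dataclass
class EssentialVariables:
    # context_chars is an assumed proxy for context pressure.
    spend_usd: float = 0.0
    tokens_total: int = 0
    retries: int = 0
    context_chars: int = 0


def choose_intervention(ev: EssentialVariables, b: Bounds) -> Optional[Intervention]:
    """Return the first intervention whose trigger fires, in an assumed priority order."""
    # Budget breaches first: stop before any more money or tokens are spent.
    if ev.spend_usd > b.max_spend_usd or ev.tokens_total > b.max_tokens_total:
        return Intervention.BUDGET_HARD_STOP
    # Context pressure: trim the context instead of ending the run.
    if ev.context_chars > b.max_context_chars:
        return Intervention.RESET_CONTEXT_TRIM
    # Repeated retries suggest the current plan is stuck, so replan.
    if ev.retries > b.max_retries:
        return Intervention.RESET_REPLAN
    return None  # still inside the viable region


bounds = Bounds(max_spend_usd=1.00, max_tokens_total=50_000, max_retries=3, max_context_chars=2000)
print(choose_intervention(EssentialVariables(spend_usd=1.25), bounds))  # Intervention.BUDGET_HARD_STOP
```

A fixed check order keeps the decision deterministic. The same state always yields the same intervention, which is what makes replaying a ledger meaningful.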

Why Ultrastable?

See also the technical note on contracts and compatibility: docs/technical_note.md

CI matrix: Ubuntu 22.04 (Python 3.10) and Ubuntu 24.04 (Python 3.12).

Guard runtime behavior: detect lexical repeats, runaway tool loops, or escalating costs and respond with deterministic interventions.
Enforce viability: encode budgets/caps as PolicyPacks, hash them, and log every Run/Step/Trigger with schema v1.2 RunEvents that include policy_hash.
Finance-grade evidence: append-only JSONL ledgers with per-entry event_hash/prev_hash hash chains plus CLI spend/unit-econ/zombie reports for CFO-ready attribution.
Offline-first + minimal core: ultrastable.core depends only on NumPy/stdlib; CLI/exporters/robotics live behind extras.
Dual-domain: the same primitives stabilize agent loops (tokens/$) and robotics/control loops (battery, torque, temperature).
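Ultrastable's lexical-repeat detector is described as deterministic, but its algorithm is not spelled out here. The sketch below shows one common deterministic approach; it is an assumption, not the library's method. It compares the word-trigram set of the latest response against the last few responses using Jaccard similarity, and fires when the similarity reaches a threshold. `LexicalRepeatDetector`, `window`, and `threshold` are hypothetical names.

```python
import re
from collections import deque
from typing import Deque, FrozenSet, Tuple

Shingles = FrozenSet[Tuple[str, ...]]


def shingles(text: str, n: int = 3) -> Shingles:
    """Lowercased word n-grams; texts shorter than n words become one tuple."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return frozenset([tuple(words)]) if words else frozenset()
    return frozenset(tuple(words[i : i + n]) for i in range(len(words) - n + 1))


def jaccard(a: Shingles, b: Shingles) -> float:
    union = a | b
    # Two empty responses count as identical.
    return len(a & b) / len(union) if union else 1.0


class LexicalRepeatDetector:
    """Hypothetical detector: fires when a response nearly duplicates a recent one."""

    def __init__(self, window: int = 3, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.recent: Deque[Shingles] = deque(maxlen=window)

    def observe(self, response_text: str) -> bool:
        current = shingles(response_text)
        fired = any(jaccard(current, prev) >= self.threshold for prev in self.recent)
        self.recent.append(current)  # oldest response falls out once the window is full
        return fired
```

The detector uses no randomness, no embeddings, and no network calls, so it stays compatible with a NumPy/stdlib-only, offline core.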

Installation & Quick Check

User install (PyPI):

pip install ultrastable

Extras (optional):

# CLI + reports
pip install "ultrastable[cli]"
# Full agent/cortex toolchain (CLI + OTLP telemetry)
pip install "ultrastable[cortex]"
# Robotics demos
pip install "ultrastable[robotics]"

Development install:

pip install -e ".[dev]"

To mirror the dependency set used in CI/experiments (core + extras):

pip install -r requirements.txt

Quick sanity check:

python -c "import ultrastable, ultrastable.core; print(ultrastable.core.ping())"

Run the automated experiment suite (writes ledgers/reports under runs/experiments):

python scripts/run_ultrastable_experiments.py --output-dir runs/experiments --keep-artifacts

Need a fast CI sanity check? Use the curated experiments.json smoke plan:

python scripts/run_ultrastable_experiments.py --plan experiments.json --output-dir runs/smoke --keep-artifacts

Want to exercise every major feature (agent demos, reports, exports, robotics examples) in under 45 minutes? Use the full-suite runner:

python scripts/run_ultrastable_full_suite.py --output-dir runs/full-suite --keep-artifacts

Need the AFMB failure-mode smoke suite to run offline in under 10 minutes (no paid APIs)?

ultrastable benchmark run \
  --config benchmarks/afmb_baselines.json \
  --output-dir runs/afmb-suite --seed 123

Or use the convenience wrapper script (delegates to the packaged runner): python scripts/run_afmb_suite.py --output-dir runs/afmb-suite.

Install Matrix

Goal	Command	Extras pulled in	Notes
Core library / embeddable guards	`pip install ultrastable`	NumPy only	Minimal footprint; `tests/test_imports.py` ensures no optional deps are loaded.
CLI + demos + reports	`pip install "ultrastable[cli]"`	`typer`, `rich`	Enables the Typer CLI with colorized output + interactive prompts.
AI Cortex / FinOps guardrails	`pip install "ultrastable[cortex]"`	`pydantic`, `httpx`, `rich`, `typer`, `opentelemetry-*`	Full AgentGuard stack including report/export tooling and telemetry hooks.
Robotics demos + DriveWrapper	`pip install "ultrastable[robotics]"`	`gymnasium`, `torch`	Heavier extra; only required for DriveReward/DriveWrapper/MobileHomeostat2D.

Extras can be combined (e.g., pip install "ultrastable[cortex,robotics]"). Development installs still use pip install -e ".[dev]" for lint/test tooling.

Dependency Matrix (integration smoke)

Run the optional dependency-version matrix locally to sanity-check minimum vs. latest versions of key extras. This creates ephemeral, per-scenario virtualenvs under .venv_matrix/ and executes lightweight smoke tests:

python scripts/run_dependency_matrix.py            # core, events, cli, otlp, cortex @ min/latest
python scripts/run_dependency_matrix.py --recreate # drop/recreate venvs
python scripts/run_dependency_matrix.py --include-robotics  # also exercise robotics extra

Minimum versions are pinned in constraints/min-versions.txt and used with pip's -c flag to force resolver decisions toward those minima.

Exporters & Backpressure

Ultrastable includes non-blocking exporters for forwarding events (stdout, JSONL files, HTTP/OTLP). Exporters use a bounded in-memory queue with a hard cap; when the queue is full, the default behavior is to drop the incoming batch and return accepted=False in ExportResult so producers can detect loss. You can opt into backpressure by setting on_full="block" (with an optional block_timeout) on exporter constructors to make export() block until space is available or the timeout elapses.
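The sketch below illustrates the drop-versus-block semantics using the standard library's `queue.Queue`. It is not Ultrastable's exporter code. The `on_full` and `block_timeout` parameter names and the `accepted` flag come from the description above. `BoundedExporter`, `max_batches`, `drain_one`, and this stand-in `ExportResult` are hypothetical.

```python
import queue
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ExportResult:
    # Stand-in exposing only the accepted flag described above.
    accepted: bool


class BoundedExporter:
    """Illustrative exporter front end with a hard-capped in-memory queue."""

    def __init__(self, max_batches: int = 2, on_full: str = "drop",
                 block_timeout: Optional[float] = None) -> None:
        if on_full not in ("drop", "block"):
            raise ValueError("on_full must be 'drop' or 'block'")
        self._queue: "queue.Queue[List[Any]]" = queue.Queue(maxsize=max_batches)
        self._on_full = on_full
        self._block_timeout = block_timeout

    def export(self, batch: List[Any]) -> ExportResult:
        try:
            if self._on_full == "block":
                # Backpressure: wait for space; block_timeout=None waits indefinitely.
                self._queue.put(batch, block=True, timeout=self._block_timeout)
            else:
                # Default: never wait; reject the incoming batch when the queue is full.
                self._queue.put_nowait(batch)
        except queue.Full:
            return ExportResult(accepted=False)  # producer can detect the loss
        return ExportResult(accepted=True)

    def drain_one(self) -> List[Any]:
        # A background sender would call this and ship the batch (stdout, file, HTTP/OTLP).
        return self._queue.get_nowait()
```

Dropping by default keeps the agent loop's latency independent of a slow or unreachable collector. Blocking trades that latency guarantee for completeness of the exported stream.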

60-second LangChain integration

Guard an existing LangChain or LangGraph workflow without rewriting it—install the connector and attach a callback:

pip install ultrastable ultrastable-langchain langchain-openai

Set your model provider credentials (e.g., OPENAI_API_KEY) and drop this snippet into your chain runner:

from langchain_openai import ChatOpenAI
from ultrastable.agent import AgentGuard
from ultrastable.cli.demos import build_agent_loop_controller
from ultrastable.ledger import JsonlLedger
from ultrastable_langchain import (
    UltrastableCallbackHandler,
    llm_run_to_guard_step,
    pre_step_context_from_prompts,
)

# Tamper-evident ledger; metadata-only keeps prompt/response bodies out of the file.
ledger = JsonlLedger("runs/langchain_demo.jsonl", redaction="metadata-only")
guard = AgentGuard(
    controller=build_agent_loop_controller(),
    ledger=ledger,
    context_budget_chars=2000,
)
# The callback handler records LLM/tool spans as LangChain runs.
handler = UltrastableCallbackHandler()
llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[handler], tags=["support-desk"])

guard.start_run(run_id="support-demo", tags={"agent": "tickets"})
try:
    llm.invoke("Draft a 1 sentence welcome.")
    last_run = handler.llm_runs[-1]

    # Pass the recorded prompt context to the guard before recording the step.
    context = pre_step_context_from_prompts(last_run.prompts, base_id="welcome")
    if context:
        guard.pre_step(context)

    # Convert the recorded LLM span into an AgentGuard post_step payload.
    guard_step = llm_run_to_guard_step(last_run)
    guard.post_step(
        step_id=guard_step.result.step_id,
        role=guard_step.result.role,
        kind=guard_step.result.kind,
        model=guard_step.result.model,
        prompt_text=guard_step.result.prompt_text,
        response_text=guard_step.result.response_text,
        metrics=guard_step.metrics,
        tags={"tenant": "acme-co", **(guard_step.result.tags or {})},
    )
finally:
    # End the run and close the ledger even if the model call raises.
    guard.end_run()
    ledger.close()

UltrastableCallbackHandler records LLM/tool spans, the bridge helpers convert them into AgentGuard payloads, and JsonlLedger writes a tamper-evident log under runs/. Iterate over handler.tool_runs with tool_run_to_guard_step to emit tool calls (with deterministic tool_args_hash) so ToolLoopDetector and spend reports see the full interaction history.
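The connector's exact tool_args_hash recipe and ToolLoopDetector's rules are not documented here. The sketch below shows one common, assumed approach. Tool arguments are hashed from canonical JSON (sorted keys, compact separators), so that the same call always yields the same digest. A loop is then flagged when the same (tool, args hash) pair appears `limit` times within a sliding window of recent calls. `tool_args_hash` is reused as a function name for readability only. `ToolLoopSketch`, `window`, and `limit` are hypothetical.

```python
import hashlib
import json
from collections import deque
from typing import Any, Deque, Dict, Tuple


def tool_args_hash(args: Dict[str, Any]) -> str:
    """Deterministic digest: canonical JSON (sorted keys, compact separators), then SHA-256."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ToolLoopSketch:
    """Hypothetical check: same tool + same args seen `limit` times in the last `window` calls."""

    def __init__(self, window: int = 6, limit: int = 3) -> None:
        self.limit = limit
        self.calls: Deque[Tuple[str, str]] = deque(maxlen=window)

    def observe(self, tool_name: str, args: Dict[str, Any]) -> bool:
        key = (tool_name, tool_args_hash(args))
        self.calls.append(key)
        return self.calls.count(key) >= self.limit
```

Hashing the canonical form means that key order in the argument dict cannot hide a repeat. Storing only digests also keeps raw tool arguments out of the ledger.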

Documentation

docs/concepts.md — essential variables, viability policies, interventions.
docs/quickstart.md — guided CLI walkthrough (mirrors CI smoke tests).
docs/ai/agent_guard.md, docs/ai/detectors_interventions.md, docs/ai/reports_and_gitops.md — AI guard tutorials.
docs/robotics/homeostatic_reward.md, docs/robotics/mobile_homeostat.md, docs/robotics/plasticity_resets.md — robotics demos + extras.
docs/registry.md — coupling/plasticity registries and IDs.
docs/howto_detectors.md, docs/howto_interventions.md, docs/cli.md — practical guides.
docs/event_schema.md — schema v1.2 (RunEvent policy_hash, PolicySwitchEvent provenance).
docs/experiments.md — instructions for the experiment runner.
docs/langchain_connector_limitations.md — known limitations of the ultrastable-langchain connector’s first public alpha.
docs/benchmark_manifest.md — manifest.json schema + helpers for benchmark runs.
docs/benchmark_results.md — results.json schema + helpers for benchmark outputs.
docs/benchmark_harness_config.md — AFMB harness config format (afmb_baselines.json) covering timeout/retry/tool/budget baselines.
ROADMAP.md — current milestone focus and upcoming goals.

Baseline harness helpers live under the optional ultrastable.benchmark namespace and are only imported when explicitly requested, so importing the core ultrastable package never pulls in baseline/batch experiment code.

CLI Highlights

# Validate and hash a PolicyPack
ultrastable validate policy configs/guard.json

# Tamper-evident ledger validation (hash chain)
ultrastable ledger validate runs/agent_loop.jsonl --hash-chain

# Run demos (built-in or PolicyPack-backed)
ultrastable demo agent-loop --output runs/agent_loop.jsonl --policy-pack configs/guard.json
ultrastable demo budget-cap --output runs/budget_cap.jsonl

# Replay ledgers deterministically
ultrastable replay runs/agent_loop.jsonl --policy-pack configs/guard.json --deterministic

# Export deterministic routing + budget configs
ultrastable export routing-policy runs/agent_loop.jsonl --output configs/routing.json
ultrastable export budget-policy runs/agent_loop.jsonl --output configs/budget_policy.json

# Finance-grade reports
ultrastable report spend runs/agent_loop.jsonl --by customer --filter agent=loop
ultrastable report unit-econ runs/agent_loop.jsonl --metric cost_per_task --task-map mappings/tasks.json
ultrastable report zombie runs/agent_loop.jsonl --start-time 2025-06-01T00:00:00Z --end-time 2025-06-07T23:59:59Z

# Package run artifacts into an evidence bundle
ultrastable evidence bundle create runs/afmb --output runs/afmb_bundle.tar.gz
# Verify an evidence bundle deterministically
ultrastable evidence bundle verify runs/afmb_bundle.tar.gz

# Compare PolicyPacks at field level
ultrastable policy diff configs/base_policy.json configs/candidate_policy.json --format text

# Lint a PolicyPack before rollout (treat warnings as errors)
ultrastable policy lint configs/guard.json --strict

Evidence bundling is privacy-hardened: any JSONL ledgers included in the archive are redacted to metadata-only (prompt/response bodies removed, hashes preserved) and their hash chain is recomputed for determinism inside the bundle. Original files on disk are not modified.

Spend reports now emit a health block that surfaces the latest D(H) trend (start/end/min/max plus median and p90) alongside an intervention_effect_size summary of the recorded ΔD(H) deltas. Use it to see whether interventions are actually reducing the health distance—even when you are only skimming the CLI output. When you pass --format text, the CLI prints a short D(H) trend plus ΔD(H) summary line so on-call engineers can confirm recovery direction at a glance without scrolling through JSON.
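The exact statistics behind the health block are not specified beyond the names above. The sketch below computes a similar block with the standard library, under two assumptions. First, ΔD(H) is recorded as D(H) after an intervention minus D(H) before it, so negative values mean recovery. Second, p90 uses the nearest-rank method. `health_block`, `p90`, and the `mean_delta`/`improved` keys are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def p90(values: List[float]) -> float:
    """Nearest-rank 90th percentile (assumed; the CLI may interpolate differently)."""
    ordered = sorted(values)
    rank = math.ceil(0.9 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def health_block(dh_series: List[float], intervention_deltas: List[float]) -> Dict[str, object]:
    """Summarize a non-empty D(H) trace and the per-intervention ΔD(H) values."""
    trend = {
        "start": dh_series[0],
        "end": dh_series[-1],
        "min": min(dh_series),
        "max": max(dh_series),
        "median": median(dh_series),
        "p90": p90(dh_series),
    }
    n = len(intervention_deltas)
    effect = {
        "count": n,
        "mean_delta": sum(intervention_deltas) / n if n else 0.0,
        # Under the assumed sign convention, a negative delta moved the agent toward health.
        "improved": sum(1 for d in intervention_deltas if d < 0),
    }
    return {"trend": trend, "intervention_effect_size": effect}
```

Reading `end` against `start`, together with the count of improving interventions, gives roughly the at-a-glance recovery check that the text-format summary line is meant to provide.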

With --hash-chain, ultrastable ledger validate recomputes every event_hash to prove the log has not been modified. Any mid-chain modification, reordering, or insertion yields a prev_hash mismatch. Omit the flag for a quick JSON/schema sanity check. The legacy ultrastable validate ledger command remains as an alias for existing scripts. Ledgers written before the hash-chain rollout still pass this command; the CLI simply reports that the hash chain is absent, so older archives remain readable.

ultrastable replay performs the same hash-chain verification before running controllers, when ledgers carry event_hash/prev_hash pairs. It also includes a hash_chain block in the emitted report metadata, so audits can prove the replay consumed unmodified evidence. Legacy ledgers that predate hash chaining are noted as such instead of failing.

The inspect command and all report subcommands support --format json|text, plus --output FILE for writing machine-readable artifacts. The inspect summary highlights steps.tool_calls and interventions.outcomes, so tool loops and their intervention results are visible without opening the raw JSONL.

PolicyPacks must be JSON; YAML parsing has been removed to keep the core dependency-free.

CLI demos/dashboards honor --redaction metadata-only|selective-text|full-text|none (with none acting as a convenience alias for full-text). Use --redaction none only when you explicitly want prompt/response bodies persisted to disk; the default metadata-only mode hashes/redacts those fields. selective-text keeps error strings for debugging while still hashing prompts/responses, and full-text (none) preserves every raw body for short-lived investigations—treat ledgers produced in that mode as sensitive. See docs/privacy.md for the full tradeoff matrix.
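The sketch below models the mode semantics described above as a pure function over one event. It is not Ultrastable's implementation. The field names prompt_text/response_text echo the AgentGuard.post_step arguments in the LangChain example. The `error` field, the `sha256:` prefix, and the `redact` helper are assumptions.

```python
import hashlib
from typing import Any, Dict

BODY_FIELDS = ("prompt_text", "response_text")
ERROR_FIELDS = ("error",)  # assumed field name for error strings


def _sha256(text: str) -> str:
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def redact(event: Dict[str, Any], mode: str = "metadata-only") -> Dict[str, Any]:
    """Return a copy of `event` with text fields handled according to the redaction mode."""
    if mode == "none":
        mode = "full-text"  # convenience alias
    if mode not in ("metadata-only", "selective-text", "full-text"):
        raise ValueError(f"unknown redaction mode: {mode}")
    out = dict(event)  # never mutate the caller's event
    if mode == "full-text":
        return out  # raw bodies persisted: treat this ledger as sensitive
    # selective-text keeps error strings; metadata-only hashes them as well.
    hashed = BODY_FIELDS if mode == "selective-text" else BODY_FIELDS + ERROR_FIELDS
    for field in hashed:
        if isinstance(out.get(field), str):
            out[field] = _sha256(out[field])
    return out
```

The bodies are replaced by hashes rather than deleted. Identical prompts therefore still compare equal across runs, which keeps repeat and loop analysis possible on metadata-only ledgers.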

Observability (Grafana/OTEL) Quick Start

Ultrastable emits spans/metrics via OTLP/HTTP when telemetry is enabled. For a fast local check:

Start a local OTEL Collector (HTTP receiver → debug exporter):

docker run --rm -p 4318:4318 \
  -v "$(pwd)/collector-minimal.yaml:/etc/otelcol/config.yaml:ro" \
  otel/opentelemetry-collector:latest --config /etc/otelcol/config.yaml

Point Ultrastable to it and send telemetry:

export ULTRASTABLE_OTLP_ENDPOINT=http://localhost:4318
python3 scripts/run_integrations_smoke.py --otel auto

Import the dashboard template:

Open Grafana and import docs/grafana_dashboard.json (see docs/grafana_dashboard.md).
For hosted setups (Grafana Cloud), configure your collector to export traces via otlphttp and metrics via prometheusremotewrite using your Cloud endpoints/tokens.

Examples

Run any example with python examples/<name>.py:

agent_loop_offline.py, budget_cap.py, tool_loop.py, context_pressure.py
battery_agent.py — trust-battery agent that halts when feedback is poor
robotics_drive_demo.py, mobile_homeostat_demo.py (writes runs/mobile_homeostat_trajectory.svg), dipaolo_replication.py
replay_policy_change.py

Robotics (optional)

Robotics/control primitives live behind the robotics extra and are not part of the core agent governance path. See docs under docs/robotics/* and the examples above for optional demos.

Benchmark Quickstart (AFMB)

Run the offline AFMB smoke suite and emit artifacts in one go using the CLI:

ultrastable benchmark run \
  --config benchmarks/afmb_baselines.json \
  --output-dir runs/afmb --seed 123 --emit-kpis

Validate artifacts:

ultrastable benchmark validate config benchmarks/afmb_baselines.json
ultrastable benchmark validate manifest runs/afmb/manifest.json
ultrastable benchmark validate results runs/afmb/results.json
ultrastable benchmark validate cases benchmarks/afmb_cases.json

Artifacts in runs/afmb/ include manifest.json, results.json with a kpis block, report.md, a ledgers/ directory per case, and a combined ledger for aggregation.

See the case registry at benchmarks/afmb_cases.json and the packaged runner in ultrastable/benchmark/afmb_suite.py for scenario definitions and programmatic access.

License & Contributions

Ultrastable is released under the MIT License (see LICENSE). Contributions are welcome—see CONTRIBUTING.md for branching/testing/style guidelines before opening a merge request.

Acknowledgement: A Note on the Name

The name “Ultrastable” is a small salute to W. Ross Ashby and his work on ultrastability and homeostatic systems (e.g., An Introduction to Cybernetics and Design for a Brain: The Origin of Adaptive Behaviour). Ideas such as essential variables, viability, and homeostasis inform how we model agent health and design controllers in this library.

Release history

0.4.2 (this version): Mar 5, 2026
0.4.1: Mar 2, 2026
0.4.0: Mar 1, 2026

Download files

Source Distribution

ultrastable-0.4.2.tar.gz (182.2 kB)

Uploaded Mar 5, 2026 Source

Built Distribution

ultrastable-0.4.2-py3-none-any.whl (152.6 kB)

Uploaded Mar 5, 2026 Python 3

File details

Details for the file ultrastable-0.4.2.tar.gz.

File metadata

Download URL: ultrastable-0.4.2.tar.gz
Upload date: Mar 5, 2026
Size: 182.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for ultrastable-0.4.2.tar.gz
Algorithm	Hash digest
SHA256	`c8b4f3f2ed88bb4883009628157dd652dd4909ae322d3d364dad79ddcdde9b81`
MD5	`1c84fd4cc306c5a6dd7058b9caaa3e2d`
BLAKE2b-256	`366f6c0dc71d7a09bb5eb294f0604f9573b5341cb97e7d9df022e7222b540b95`

File details

Details for the file ultrastable-0.4.2-py3-none-any.whl.

File metadata

Download URL: ultrastable-0.4.2-py3-none-any.whl
Upload date: Mar 5, 2026
Size: 152.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for ultrastable-0.4.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`66f3aae35e3340d0c8f503d8df334cfde322710cea9252e31d543da085349b3c`
MD5	`511916af887d5a731e8eb68acbc469b0`
BLAKE2b-256	`d4b1f2594608d9e52447a4759affa8ded9df0538e3a1e55ac15d7075cac178b4`
