Agent Stability Engine: reproducible stability testing primitives for autonomous agents.

Agent Stability Engine (ASE)

Covers the Week 1-12 implementation: ASE core metrics, mutation stress testing, arbitration, contradiction analysis, taxonomy severity scoring, drift tracking, long-horizon stability, self-healing remediation, a benchmark runner, regression gating, release/demo packaging, and a CLI.

Quickstart

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
python -m pytest
python -m ruff check .
python -m black --check .
python -m mypy src

CLI

export OPENAI_API_KEY="..."

python -m agent_stability_engine.cli evaluate --prompt "Explain checksums" --run-count 5 --seed 42 --asi-profile reasoning_focus --mutation-limit 6 --output out/eval.json --manifest-output out/eval.manifest.json
python -m agent_stability_engine.cli evaluate --agent-provider openai --agent-model gpt-4o-mini --prompt "Explain checksums" --run-count 3 --seed 42 --output out/eval-openai.json
python -m agent_stability_engine.cli benchmark --suite examples/benchmarks/default_suite.json --run-count 5 --seed 42 --asi-profile safety_strict --mutation-limit 6 --output out/bench.json --manifest-output out/bench.manifest.json
python -m agent_stability_engine.cli regress --suite examples/benchmarks/reasoning_suite.json --baseline examples/baselines/reasoning_suite.baseline.json --run-count 3 --seed 42 --output out/regress-reasoning.json
python -m agent_stability_engine.cli drift --current-report out/eval.json --baseline-report out/baseline_eval.json --output out/drift.json
python -m agent_stability_engine.cli horizon --prompt "Plan migration strategy" --horizon 6 --run-count 5 --seed 42 --output out/horizon.json
python -m agent_stability_engine.cli heal --prompt "Provide triage steps" --run-count 5 --seed 42 --max-attempts 2 --output out/heal.json --manifest-output out/heal.manifest.json
python -m agent_stability_engine.cli demo --output-dir out/demo --run-count 3 --seed 42 --horizon 4 --manifest-output out/demo.manifest.json
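The same commands can be driven from a script. The following is a minimal sketch, not part of the package: `build_evaluate_command` is a hypothetical helper that only assembles the argument list for the `evaluate` subcommand, using flags that appear in the examples above. It does not import or call any ASE Python API.

```python
import shlex
import sys


def build_evaluate_command(prompt, run_count=5, seed=42, output="out/eval.json"):
    """Return the argv list for the `evaluate` subcommand.

    Hypothetical helper: only flags shown in the CLI examples are used.
    """
    return [
        sys.executable, "-m", "agent_stability_engine.cli", "evaluate",
        "--prompt", prompt,
        "--run-count", str(run_count),
        "--seed", str(seed),
        "--output", output,
    ]


if __name__ == "__main__":
    cmd = build_evaluate_command("Explain checksums")
    # Print the command in shell-quoted form for inspection.
    print(shlex.join(cmd))
    # To actually run it (requires the package to be installed):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.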

GitHub Action (Regression Gate)

Use the bundled action to gate PRs on benchmark ASI regressions.

name: ASE Regression Gate

on:
  pull_request:

jobs:
  ase-regress:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ruthwikdovala/Stabilium@main
        with:
          suite: examples/benchmarks/reasoning_suite.json
          baseline: examples/baselines/reasoning_suite.baseline.json
          run-count: "3"
          seed: "42"

For OpenAI-backed runs, add the agent-provider and openai-api-key inputs to the step:

      - uses: ruthwikdovala/Stabilium@main
        with:
          suite: examples/benchmarks/reasoning_suite.json
          baseline: examples/baselines/reasoning_suite.baseline.json
          agent-provider: openai
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
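To illustrate what gating on a regression means, here is a minimal sketch of the general idea. It is not the action's implementation: the function name `passes_gate`, the `tolerance` parameter, and the assumption that a higher ASI score is better are all hypothetical, and the real report schema is not shown here.

```python
import sys


def passes_gate(baseline_asi: float, current_asi: float, tolerance: float = 0.02) -> bool:
    """Return True when the current score has not dropped below
    the baseline by more than `tolerance`.

    Assumption: a higher ASI score is better. Names and the default
    tolerance are illustrative only.
    """
    return current_asi >= baseline_asi - tolerance


if __name__ == "__main__":
    # Illustrative values, not real benchmark results.
    ok = passes_gate(baseline_asi=0.80, current_asi=0.79)
    # A non-zero exit code is what makes a CI step fail.
    sys.exit(0 if ok else 1)
```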

Release Docs

  • docs/RELEASE_CHECKLIST.md
  • docs/DEMO_RUNBOOK.md

Download files

Source Distribution

agent_stability_engine-0.1.0.tar.gz (38.3 kB view details)

Uploaded Source

Built Distribution

agent_stability_engine-0.1.0-py3-none-any.whl (38.9 kB view details)

Uploaded Python 3

File details

Details for the file agent_stability_engine-0.1.0.tar.gz.

File metadata

  • Download URL: agent_stability_engine-0.1.0.tar.gz
  • Upload date:
  • Size: 38.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for agent_stability_engine-0.1.0.tar.gz
Algorithm Hash digest
SHA256 21602eaa5ba142ebff2437afe9f70c801f7390c8edaa87d03ef8a36aaf73ac57
MD5 64ca41b05a97a3e059ec8970ef9c4fb6
BLAKE2b-256 f0bd9bf55441a650fd5e094c6c9486b8c202e91c713c169726050b8d60301cb8
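A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only the standard library; the helper names `sha256_of` and `verify` are illustrative, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib

# SHA256 digest listed above for agent_stability_engine-0.1.0.tar.gz
EXPECTED_SHA256 = "21602eaa5ba142ebff2437afe9f70c801f7390c8edaa87d03ef8a36aaf73ac57"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest matches the expected value."""
    return sha256_of(path) == expected.lower()


# Usage (assumes the file was downloaded to the current directory):
#   verify("agent_stability_engine-0.1.0.tar.gz")
```

pip can perform the same check at install time through its hash-checking mode (`--require-hashes` with a requirements file).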

File details

Details for the file agent_stability_engine-0.1.0-py3-none-any.whl.

File hashes

Hashes for agent_stability_engine-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5f2bbeb4c912c05a82b49bc5df99f585199da3ecddefb3267852007da2aca29a
MD5 7f275b5ee23393a2db3d808223b3ed72
BLAKE2b-256 324a2b069ea8f818c84d8e2192957a6f64e8661356c7cbbacb4913ea0651e36a
