CLI-first LLM stability analyzer for measuring output consistency across repeated prompt runs.

ai-stability

ai-stability is a CLI-first LLM stability analyzer for developers who want to measure output consistency, detect prompt variance, and inspect unstable model behavior locally.

It runs the same prompt multiple times against the same model, compares the responses, computes a simple stability score, and saves a local JSON artifact for replay and debugging.

Why It Exists

LLM outputs often vary even when the prompt, model, and calling code stay the same. That makes it harder to:

  • evaluate prompt reliability
  • spot regressions during model upgrades
  • understand whether output drift is minor wording variance or meaningful behavior change
  • build confidence in AI-powered developer tooling

ai-stability is intentionally narrow and local-first:

  • one prompt file in
  • repeated model calls
  • simple, explicit similarity scoring
  • readable terminal output
  • JSON artifact saved locally for replay and debugging

Features

  • CLI-first workflow with no database, dashboard, or hosted backend
  • repeated prompt execution against the same model
  • explicit pairwise similarity and aggregate stability scoring
  • run-by-run output review
  • inline reference-vs-run diffing for fast variance inspection
  • local JSON artifact saving for debugging and replay
  • provider abstraction with OpenAI implemented first

Requirements

  • Python 3.11+
  • An OpenAI API key in OPENAI_API_KEY

Install

Recommended for end users

pipx install ai-stability

For development (Windows commands shown; on macOS or Linux, activate with source .venv/bin/activate)

python -m venv .venv
.venv\Scripts\activate
python -m pip install -e .[dev]

Configure

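Set your API key in the shell. The example uses PowerShell syntax; in bash or zsh, use export OPENAI_API_KEY="your_api_key" instead: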

$env:OPENAI_API_KEY="your_api_key"

You can copy .env.example for reference, but the CLI reads the key from the environment.
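
As a rough illustration of reading the key from the environment, the sketch below fails early with a clear message when the variable is missing. The helper name require_api_key is hypothetical and is not taken from the project's source; only the OPENAI_API_KEY variable name comes from this README.

```python
import os
from collections.abc import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI key from the environment or raise a clear error.

    Hypothetical helper: the real CLI may validate the key differently.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set in the environment")
    return key
```

Passing the mapping as a parameter keeps the check easy to test without touching the real environment.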

Quick Start

Create a prompt file, for example prompt.txt:

Explain the tradeoffs between unit tests and integration tests in five bullet points.

Run the analyzer:

ai-stability run prompt.txt --n 5 --provider openai --model gpt-4.1-mini

If you want to invoke it through the module instead of the installed script:

python -m ai_stability run prompt.txt --n 5 --provider openai --model gpt-4.1-mini

Example with a custom JSON output path:

ai-stability run prompt.txt --n 5 --provider openai --model gpt-4.1-mini --out results\sample-run.json

CLI Command

ai-stability run PROMPT_FILE --n 5 --provider openai --model MODEL_NAME

Current options:

  • --n: number of repeated runs, minimum 2
  • --provider: currently openai
  • --model: target model name
  • --temperature: sampling temperature, default 1.0
  • --out: optional output file or output directory for the JSON artifact
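
The option list above could be modeled with argparse roughly as follows. This is a sketch under stated assumptions, not the project's cli.py: the actual CLI may use a different framework, and treating --n, --provider, and --model as required is an assumption, since their defaults are not documented here. Only the temperature default of 1.0 and the minimum of 2 runs come from this README.

```python
import argparse


def at_least_two(value: str) -> int:
    """Parse --n and enforce the documented minimum of 2 runs."""
    number = int(value)
    if number < 2:
        raise argparse.ArgumentTypeError("--n must be at least 2")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented 'run' options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="ai-stability")
    commands = parser.add_subparsers(dest="command", required=True)

    run = commands.add_parser("run", help="run a prompt repeatedly and score stability")
    run.add_argument("prompt_file", help="path to the prompt text file")
    run.add_argument("--n", type=at_least_two, required=True, help="number of repeated runs")
    run.add_argument("--provider", choices=["openai"], required=True)
    run.add_argument("--model", required=True, help="target model name")
    run.add_argument("--temperature", type=float, default=1.0)
    run.add_argument("--out", default=None, help="output file or directory for the JSON artifact")
    return parser
```

Validating --n in the type callback means an invalid value is rejected with a usage error before any model call is made.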

How Scoring Works

The v1 scoring heuristic is intentionally simple and inspectable:

  1. normalize whitespace in each output
  2. compute pairwise text similarity with Python's difflib.SequenceMatcher
  3. average all pairwise similarity scores
  4. convert the average to a 0-100 stability score
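
The four steps above can be sketched in a few lines of standard-library Python. The function names are illustrative and are not taken from scoring.py; the exact normalization the project applies may differ from the simple whitespace collapse assumed here.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Step 1: collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())


def pairwise_similarities(outputs: list[str]) -> list[float]:
    """Step 2: SequenceMatcher ratio for every unordered pair of outputs."""
    cleaned = [normalize(output) for output in outputs]
    return [SequenceMatcher(None, a, b).ratio() for a, b in combinations(cleaned, 2)]


def stability_score(outputs: list[str]) -> float:
    """Steps 3 and 4: average the pairwise ratios and scale to 0-100."""
    if len(outputs) < 2:
        raise ValueError("need at least two outputs to compare")
    similarities = pairwise_similarities(outputs)
    return 100.0 * sum(similarities) / len(similarities)
```

With n runs this compares n * (n - 1) / 2 pairs, so five runs produce ten similarity values. Outputs that differ only in whitespace score 100 under this sketch.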

Stability labels:

  • 80-100: High stability
  • 50-79: Medium stability
  • 0-49: Low stability
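
A minimal sketch of the label mapping is shown below. The function name is hypothetical, and how the project treats fractional scores such as 79.5 is not stated here; this sketch assumes the lower bounds 80 and 50 act as thresholds.

```python
def stability_label(score: float) -> str:
    """Map a 0-100 stability score to one of the three documented labels.

    Assumption: 80 and 50 are inclusive lower thresholds, so 79.5 is "Medium".
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "High stability"
    if score >= 50:
        return "Medium stability"
    return "Low stability"
```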

What the CLI Prints

  • summary first
  • average and pairwise similarity
  • final stability score and label
  • each run output
  • a simple reference-vs-run diff for variation review
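
To give a feel for a reference-vs-run diff, the sketch below uses difflib.unified_diff from the standard library. Which run the project uses as the reference and the diff format it prints are not specified in this README, so both the function name and the unified format are assumptions.

```python
from difflib import unified_diff


def reference_diff(reference: str, run_output: str, run_index: int) -> str:
    """Return a line-based unified diff between a reference output and one run.

    An empty string means the two outputs have identical lines.
    """
    diff_lines = unified_diff(
        reference.splitlines(),
        run_output.splitlines(),
        fromfile="reference",
        tofile=f"run {run_index}",
        lineterm="",  # inputs have no trailing newlines, so emit none
    )
    return "\n".join(diff_lines)
```

A caller could, for example, treat the first collected output as the reference and diff every later run against it.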

JSON Artifact

By default, results are written to results/ai-stability-YYYYMMDD-HHMMSS.json.

The JSON artifact includes:

  • prompt metadata
  • provider and model
  • all collected outputs
  • pairwise similarities
  • stability score and label
  • human-readable diffs
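
The default file name pattern and a save step might look like the sketch below. The function names and any keys placed in the artifact dictionary are hypothetical; only the results/ai-stability-YYYYMMDD-HHMMSS.json pattern comes from this README, and the real schema should be checked against a generated file.

```python
import json
from datetime import datetime
from pathlib import Path


def default_artifact_path(now: datetime, directory: str = "results") -> Path:
    """Build results/ai-stability-YYYYMMDD-HHMMSS.json from a timestamp."""
    return Path(directory) / f"ai-stability-{now:%Y%m%d-%H%M%S}.json"


def save_artifact(artifact: dict, path: Path) -> Path:
    """Write the artifact as indented JSON, creating parent folders if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(artifact, indent=2), encoding="utf-8")
    return path
```

Because the artifact is plain JSON, a saved run can be reloaded later with json.loads(path.read_text(encoding="utf-8")) for replay or comparison.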

Example Workflow

ai-stability run prompt.txt --n 5 --provider openai --model gpt-4.1-mini

Use this when you want to compare how stable a model is for a fixed prompt before shipping a prompt change, swapping models, or debugging flaky output behavior.

Run Tests

python -m pytest

Repository Structure

src/ai_stability/
  cli.py
  runner.py
  scoring.py
  diffing.py
  output.py
  storage.py
  providers/
    base.py
    openai_provider.py
tests/
  test_scoring.py
  test_runner.py

Release Process

ai-stability is published on PyPI.

Future releases are intended to be published from GitHub Actions with PyPI Trusted Publishing.

Typical release flow:

  1. update the version in pyproject.toml and src/ai_stability/__init__.py
  2. commit and push the release commit
  3. create and push a Git tag like v0.1.1
  4. let the publish.yml workflow run tests, build distributions, publish to PyPI, and create or update the matching GitHub release automatically

PyPI Trusted Publishing still requires one-time configuration on PyPI for this repository before automated publishing will succeed.

Example of tagging and pushing a release:

git tag v0.1.1
git push origin v0.1.1

Files to Review First

  • src/ai_stability/cli.py
  • src/ai_stability/runner.py
  • src/ai_stability/scoring.py
  • src/ai_stability/providers/openai_provider.py

Roadmap Notes

  • V1 runs requests sequentially on purpose.
  • Only OpenAI is implemented, but the provider boundary is small and ready for Anthropic later.
  • The scoring heuristic is intentionally simple and inspectable rather than statistically sophisticated.
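
To illustrate what a small provider boundary with sequential execution could look like, here is a sketch using typing.Protocol. The interface, the method name complete, and the offline EchoProvider are all hypothetical; the project's providers/base.py may define a different shape.

```python
from typing import Protocol


class Provider(Protocol):
    """Hypothetical provider boundary: one call in, one text response out."""

    def complete(self, prompt: str, model: str, temperature: float) -> str: ...


class EchoProvider:
    """Offline stand-in that makes no network calls, useful for tests."""

    def complete(self, prompt: str, model: str, temperature: float) -> str:
        return f"[{model}] {prompt}"


def run_sequentially(
    provider: Provider, prompt: str, model: str, n: int, temperature: float = 1.0
) -> list[str]:
    """Call the provider n times, one request after another, and collect outputs."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return [provider.complete(prompt, model, temperature) for _ in range(n)]
```

Under this shape, adding another provider would mean writing one more class with a matching complete method, with no change to the runner.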

Release Files

Version 0.1.2 is available as a source distribution and a built distribution:

  • Source distribution: ai_stability-0.1.2.tar.gz (13.9 kB)
      SHA256 4143ff59a2db71c1f4c0a7a9c5eb507446de0791ceafa63b80b1c806c021f3b7
      MD5 12a6f2397f3466a10450c9392a0ed7c9
      BLAKE2b-256 ebd393a01cecbd6092f8af06f17c6c4b9346b32ec6b554adf8199d072f23ac6c
  • Built distribution: ai_stability-0.1.2-py3-none-any.whl (13.8 kB, Python 3)
      SHA256 c18ec0538a2f2339a30620a9e7f27e51ad4974d0314167a824914ae3c1165a99
      MD5 b899389c271b05dd18cd78b0753ffe9f
      BLAKE2b-256 fab49b88441d91488628145d9bddfd931ff9a0237acdeec51a384bee63db4035

Both files were uploaded using Trusted Publishing via twine/6.1.0 CPython/3.13.7. Their attestation bundles name the publisher as publish.yml on BuildWithAbid/ai-stability.
