CLI-first LLM stability analyzer for measuring output consistency across repeated prompt runs.

ai-stability

ai-stability is a CLI-first LLM stability analyzer for developers who want to measure output consistency, detect prompt variance, and inspect unstable model behavior locally.

It runs the same prompt multiple times against the same model, compares the responses, computes a simple stability score, and saves a local JSON artifact for replay and debugging.

Why It Exists

LLM outputs often vary even when the prompt, model, and calling code stay the same. That makes it harder to:

evaluate prompt reliability
spot regressions during model upgrades
understand whether output drift is minor wording variance or meaningful behavior change
build confidence in AI-powered developer tooling

ai-stability is intentionally narrow and local-first:

one prompt file in
repeated model calls
simple, explicit similarity scoring
readable terminal output
JSON artifact saved locally for replay and debugging

Features

CLI-first workflow with no database, dashboard, or hosted backend
repeated prompt execution against the same model
explicit pairwise similarity and aggregate stability scoring
run-by-run output review
inline reference-vs-run diffing for fast variance inspection
local JSON artifact saving for debugging and replay
provider abstraction with OpenAI implemented first

Requirements

Python 3.11+
An OpenAI API key in OPENAI_API_KEY

Install

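python -m venv .venv
.venv\Scripts\activate
python -m pip install -e .[dev]

The activation path above is the Windows one. On macOS or Linux, activate with source .venv/bin/activate instead, and in zsh quote the extras as ".[dev]".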

Configure

Set your API key in the shell. In PowerShell:

$env:OPENAI_API_KEY="your_api_key"

In bash or zsh:

export OPENAI_API_KEY="your_api_key"

You can copy .env.example for reference, but the CLI reads the key from the environment.
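
As a rough illustration of what "reads the key from the environment" can look like, here is a minimal sketch. The helper name require_api_key and the error message are hypothetical and are not taken from the ai-stability source; only the variable name OPENAI_API_KEY comes from this README.

```python
import os
from collections.abc import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI key from the environment or fail with a clear message.

    Hypothetical helper for illustration; the real CLI may handle this differently.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set in the environment")
    return key
```

Failing early like this keeps a missing key from surfacing later as a less obvious provider error.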

Quick Start

Create a prompt file, for example prompt.txt:

Explain the tradeoffs between unit tests and integration tests in five bullet points.

Run the analyzer:

ai-stability run prompt.txt --n 5 --provider openai --model gpt-4.1-mini

To invoke it through the module instead of the installed script:

python -m ai_stability run prompt.txt --n 5 --provider openai --model gpt-4.1-mini

Example with a custom JSON output path:

ai-stability run prompt.txt --n 5 --provider openai --model gpt-4.1-mini --out results\sample-run.json

CLI Command

ai-stability run PROMPT_FILE --n 5 --provider openai --model MODEL_NAME

Current options:

--n: number of repeated runs, minimum 2
--provider: currently openai
--model: target model name
--temperature: sampling temperature, default 1.0
--out: optional output file or output directory for the JSON artifact
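
To make the option list concrete, the sketch below models it with argparse from the standard library. This is an assumption for illustration only: the README does not say which CLI framework ai-stability uses, and marking --n, --provider, and --model as required is a guess, since no defaults are documented for them.

```python
import argparse


def at_least_two(value: str) -> int:
    """Parse --n and enforce the documented minimum of 2 runs."""
    n = int(value)
    if n < 2:
        raise argparse.ArgumentTypeError("--n must be at least 2")
    return n


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented `run` options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="ai-stability")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run", help="run a prompt repeatedly and score stability")
    run.add_argument("prompt_file")
    # Required here because the README documents no defaults for these three.
    run.add_argument("--n", type=at_least_two, required=True)
    run.add_argument("--provider", choices=["openai"], required=True)
    run.add_argument("--model", required=True)
    run.add_argument("--temperature", type=float, default=1.0)
    run.add_argument("--out", default=None)
    return parser
```

Validating --n in the type function means a value below 2 is rejected at parse time, before any model call is made.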

How Scoring Works

The v1 scoring heuristic is intentionally simple and inspectable:

normalize whitespace in each output
compute pairwise text similarity with Python's difflib.SequenceMatcher
average all pairwise similarity scores
convert the average to a 0-100 stability score
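
The four steps above can be sketched in a few lines of standard-library Python. This is a minimal reading of the description, not the code in scoring.py: the function names are hypothetical, and details such as rounding or how empty outputs are handled are not stated in this README.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Collapse every run of whitespace to a single space and trim the ends."""
    return " ".join(text.split())


def pairwise_similarities(outputs: list[str]) -> list[float]:
    """Return one SequenceMatcher ratio per unordered pair of outputs."""
    cleaned = [normalize(o) for o in outputs]
    return [SequenceMatcher(None, a, b).ratio() for a, b in combinations(cleaned, 2)]


def stability_score(outputs: list[str]) -> float:
    """Average the pairwise ratios and scale the result to 0-100."""
    sims = pairwise_similarities(outputs)
    if not sims:
        raise ValueError("need at least two outputs to compare")
    return 100.0 * sum(sims) / len(sims)
```

With n runs there are n * (n - 1) / 2 pairs, so --n 5 yields 10 similarity values. SequenceMatcher compares character sequences, which means the score reflects surface wording rather than meaning.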

Stability labels:

80-100: High stability
50-79: Medium stability
0-49: Low stability
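
One way to map a score onto these labels is shown below. The bands are listed as whole numbers, so how a fractional score such as 79.5 is treated is not specified; this sketch assumes each band starts at its lower bound (80 and 50) and the function name is hypothetical.

```python
def stability_label(score: float) -> str:
    """Map a 0-100 score to a label, treating 80 and 50 as inclusive lower bounds."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "High stability"
    if score >= 50:
        return "Medium stability"
    return "Low stability"
```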

What the CLI Prints

summary first
average and pairwise similarity
final stability score and label
each run output
a simple reference-vs-run diff for variation review
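
For a sense of what a reference-vs-run diff can look like, here is a sketch built on difflib.unified_diff. It assumes one run (for example the first) serves as the reference, which this README does not state, and the function name is hypothetical; the output format of diffing.py may differ.

```python
from difflib import unified_diff


def reference_diff(reference: str, run: str, run_index: int) -> str:
    """Return a line-based unified diff of one run against the reference output.

    An empty string means the two outputs are identical line for line.
    """
    lines = unified_diff(
        reference.splitlines(),
        run.splitlines(),
        fromfile="reference",
        tofile=f"run {run_index}",
        lineterm="",
    )
    return "\n".join(lines)
```

Lines prefixed with - appear only in the reference and lines prefixed with + appear only in the compared run.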

JSON Artifact

By default, results are written to results/ai-stability-YYYYMMDD-HHMMSS.json.

The JSON artifact includes:

prompt metadata
provider and model
all collected outputs
pairwise similarities
stability score and label
human-readable diffs
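
The default path pattern and the save step could be sketched as follows. Only the results/ai-stability-YYYYMMDD-HHMMSS.json pattern comes from this README; the function names and any payload keys are hypothetical, and the actual schema written by storage.py is not documented here.

```python
import json
from datetime import datetime
from pathlib import Path


def default_artifact_path(now: datetime, root: Path = Path("results")) -> Path:
    """Build results/ai-stability-YYYYMMDD-HHMMSS.json from a timestamp."""
    return root / f"ai-stability-{now:%Y%m%d-%H%M%S}.json"


def save_artifact(payload: dict, path: Path) -> Path:
    """Write the payload as indented JSON, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path
```

A call such as save_artifact({"model": "gpt-4.1-mini", "stability_score": 87.5}, default_artifact_path(datetime.now())) would produce one timestamped file per run; the two keys shown are placeholders only.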

Example Workflow

ai-stability run prompt.txt --n 5 --provider openai --model gpt-4.1-mini

Use this when you want to compare how stable a model is for a fixed prompt before shipping a prompt change, swapping models, or debugging flaky output behavior.

Run Tests

python -m pytest

Repository Structure

src/ai_stability/
  cli.py
  runner.py
  scoring.py
  diffing.py
  output.py
  storage.py
  providers/
    base.py
    openai_provider.py
tests/
  test_scoring.py
  test_runner.py

Files to Review First

src/ai_stability/cli.py
src/ai_stability/runner.py
src/ai_stability/scoring.py
src/ai_stability/providers/openai_provider.py

Roadmap Notes

V1 runs requests sequentially on purpose.
Only OpenAI is implemented, but the provider boundary is small and ready for Anthropic later.
The scoring heuristic is intentionally simple and inspectable rather than statistically sophisticated.
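
To illustrate how small a provider boundary with sequential execution can be, here is a sketch. Every name in it (Provider, complete, EchoProvider, collect_outputs) is hypothetical; the real interface lives in providers/base.py and is not shown in this README. EchoProvider is an offline stand-in so the example runs without an API key.

```python
from typing import Protocol


class Provider(Protocol):
    """Hypothetical provider interface: one prompt in, one text completion out."""

    def complete(self, prompt: str, model: str, temperature: float) -> str: ...


class EchoProvider:
    """Offline stand-in used only for this example; it makes no network calls."""

    def __init__(self) -> None:
        self.calls = 0

    def complete(self, prompt: str, model: str, temperature: float) -> str:
        self.calls += 1
        return f"[{model}] reply {self.calls} to: {prompt}"


def collect_outputs(
    provider: Provider, prompt: str, model: str, n: int, temperature: float = 1.0
) -> list[str]:
    """Call the provider n times, one request after another, and keep every output."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return [provider.complete(prompt, model, temperature) for _ in range(n)]
```

Under this shape, adding another vendor would mean writing one more class with a matching complete method, while the sequential loop stays unchanged.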

Release history

0.1.2 (Apr 6, 2026)
0.1.1 (Apr 6, 2026)
0.1.0 (Apr 6, 2026, this version)

Download files

Source Distribution

ai_stability-0.1.0.tar.gz (13.6 kB)

Uploaded Apr 6, 2026 Source

Built Distribution

ai_stability-0.1.0-py3-none-any.whl (13.4 kB)

Uploaded Apr 6, 2026 Python 3

File details

Details for the file ai_stability-0.1.0.tar.gz.

File metadata

Download URL: ai_stability-0.1.0.tar.gz
Upload date: Apr 6, 2026
Size: 13.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for ai_stability-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`3dc6454026f3bb578d8b609b35dd918ae35fb6c567ddd56ac6b481095c4aa50d`
MD5	`111d39aa802d29890e9c902018881d25`
BLAKE2b-256	`05c55ac04ce9c7a0105ba69d7847635f04c187c77061dd436df4d645a1bf0ee5`

File details

Details for the file ai_stability-0.1.0-py3-none-any.whl.

File metadata

Download URL: ai_stability-0.1.0-py3-none-any.whl
Upload date: Apr 6, 2026
Size: 13.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for ai_stability-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`803b25c4b1a5a95e3f1125b5342cd955e5ad0061de67955cf7f220b3ece67d27`
MD5	`963880b4a925d6ab0a6d594dee93c7d7`
BLAKE2b-256	`a53461f7fc4e28fbc9aa4538d281ecd79cb6eb6b74ab32d07fbcbdcba8855b79`
