Package for generating Prompt Stability Score (PSS). PSS estimates the stability of outcomes resulting from variations in language model prompt specifications.

promptstability

Package for generating Prompt Stability Scores (PSS). See the paper here, which outlines a technique for investigating the stability of outcomes resulting from variations in language model prompt specifications. Replication material is available here.

The current library supports both:

  • cumulative intra-PSS, where repeated runs are accumulated over time
  • adjacent intra-PSS, where each run is compared with the immediately previous run

It also supports post hoc rescoring from saved annotation tables, which is useful when you want to recompute stability summaries without rerunning the model.
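To make the difference between the cumulative and adjacent modes concrete, here is a minimal sketch in plain Python. It assumes that "alpha" in the mode names (and "KA" in the examples below) refers to Krippendorff's alpha, and it implements only the nominal, no-missing-data case. The function names `nominal_alpha`, `cumulative_scores`, and `adjacent_scores` are hypothetical and not part of promptstability, and the library's own scoring (for example, any bootstrapping) may differ.

```python
from collections import Counter
from itertools import permutations


def nominal_alpha(runs):
    """Krippendorff's alpha for nominal labels with no missing data.

    runs[r][i] is the label that run r assigned to item i.
    """
    n_items = len(runs[0])
    coincidences = Counter()  # (label_a, label_b) -> weighted pair count
    for i in range(n_items):
        labels = [run[i] for run in runs]
        m = len(labels)
        # Every ordered pair of ratings within an item counts 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return float("nan")  # only one label ever used: alpha is undefined
    return 1.0 - observed / expected


def cumulative_scores(runs):
    """Score runs 0..k together, for each k from the second run onwards."""
    return [nominal_alpha(runs[:k]) for k in range(2, len(runs) + 1)]


def adjacent_scores(runs):
    """Score each run only against the run immediately before it."""
    return [nominal_alpha(runs[k - 1:k + 1]) for k in range(1, len(runs))]


# Made-up labels: three runs over four texts; the third run flips one label.
runs = [[0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 1, 1]]
print("cumulative:", cumulative_scores(runs))
print("adjacent:", adjacent_scores(runs))
```

In this toy example the single flipped label lowers the adjacent score for the last pair of runs more than it lowers the cumulative score, because the cumulative window still includes the two earlier runs that agree with each other.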

Requirements

  • Python 3.8 to 3.10 (Python 3.11 and above are not supported due to dependency limitations)
  • Other dependencies are installed automatically via pip
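If you want a script to fail early on an unsupported interpreter, a check along the following lines may help. This is only a sketch of the version range stated above; `is_supported_python` is a hypothetical helper and not part of promptstability.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True for Python 3.8 through 3.10, the range listed above."""
    major_minor = tuple(version_info[:2])
    return (3, 8) <= major_minor <= (3, 10)


if not is_supported_python():
    print(
        "Warning: promptstability lists Python 3.8 to 3.10 as supported; "
        f"this interpreter is {sys.version_info[0]}.{sys.version_info[1]}."
    )
```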

Installation

Install this library using pip:

pip install promptstability

Example Usage

Here we provide instructions for using promptstability with OpenAI and Ollama.

from promptstability.core import (
    PromptStabilityAnalysis,
    get_api_key,
    load_example_data,
)

# Load data (news articles)
df = load_example_data()
print(df.head())
example_data = list(df['body'].values)  # Use the article bodies as the texts to annotate

# Define the prompt texts
original_text = 'The following are some news articles about the economy.'
prompt_postfix = 'Respond 0 for positive news, or 1 for negative news. Guess if you do not know. Respond nothing else.'

a) OpenAI Example (e.g., GPT-4o-mini)

from openai import OpenAI

# Initialize OpenAI client
# First set the OPENAI_API_KEY environment variable
APIKEY = get_api_key('openai')
client = OpenAI(api_key=APIKEY)

OPENAI_MODEL = 'gpt-4o-mini'

# Define the OpenAI annotation function
def annotate_openai(text, prompt, temperature=0.1):
    try:
        response = client.chat.completions.create(
            model=OPENAI_MODEL,
            temperature=temperature,
            messages=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": text}
            ]
        )
    except Exception as e:
        print(f"OpenAI exception: {e}")
        raise  # re-raise with the original traceback

    return ''.join(choice.message.content for choice in response.choices)

# Instantiate the analysis class using OpenAI’s annotation function.
# Note on warnings: Pegasus emits an automatic warning about model weights, which you can ignore.
psa_openai = PromptStabilityAnalysis(annotation_function=annotate_openai, data=example_data)

# Run intra-prompt stability analysis using the method `intra_pss`
print("Running OpenAI intra-prompt analysis...")
ka_openai_intra, annotated_openai_intra = psa_openai.intra_pss(
    original_text,
    prompt_postfix,
    iterations=5,   # minimal iterations
    plot=True,
    save_path='news_intra.png',
    save_csv="news_intra.csv"
)
print("OpenAI intra-prompt KA scores:", ka_openai_intra)

# Optional: compute both cumulative and adjacent intra-PSS plus summary diagnostics
score_map, rescored_annotations, intra_summaries = psa_openai.intra_pss(
    original_text,
    prompt_postfix,
    iterations=5,
    analysis_modes=["cumulative_alpha", "adjacent_alpha"],
    return_summaries=True
)
print("Intra summaries:", intra_summaries)

# Run inter-prompt stability analysis using the method `inter_pss`
print("Running OpenAI inter-prompt analysis...")
temperatures = [0.1, 0.5, 2.0] # in practice, you would set more temperatures than this
ka_openai_inter, annotated_openai_inter = psa_openai.inter_pss(
    original_text,
    prompt_postfix,
    nr_variations=3,
    temperatures=temperatures,
    iterations=1,
    plot=True,
    save_path='news_inter.png',
    save_csv="news_inter.csv"
)
print("OpenAI inter-prompt KA scores:", ka_openai_inter)
print("Inter summaries:", psa_openai.summarize_inter_scores(ka_openai_inter))

b) Ollama Example (e.g., your local deepseek-r1:8b)

import ollama

# Make sure that your Ollama server is running locally and that 'deepseek-r1:8b' is available.
OLLAMA_MODEL = 'deepseek-r1:8b'

# Define the Ollama annotation function
def annotate_ollama(text, prompt, temperature=0.1):
    try:
        response = ollama.chat(
            model=OLLAMA_MODEL,
            messages=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": text}
            ],
            # Pass the temperature through; it was previously accepted but ignored
            options={"temperature": temperature}
        )
    except Exception as e:
        print(f"Ollama exception: {e}")
        raise  # re-raise with the original traceback
    return response['message']['content']

# Instantiate the analysis class using Ollama’s annotation function.
# Note on warnings: Pegasus emits an automatic warning about model weights, which you can ignore.
psa_ollama = PromptStabilityAnalysis(annotation_function=annotate_ollama, data=example_data)

# Run intra-prompt stability analysis using the method `intra_pss`
print("Running Ollama intra-prompt analysis...")
ka_ollama_intra, annotated_ollama_intra = psa_ollama.intra_pss(
    original_text,
    prompt_postfix,
    iterations=5,
    plot=False
)
print("Ollama intra-prompt KA scores:", ka_ollama_intra)

# Run inter-prompt stability analysis using the method `inter_pss`
temperatures = [0.1, 2.0, 5.0]  # or whichever temperatures you want to test
print("Running Ollama inter-prompt analysis...")
ka_ollama_inter, annotated_ollama_inter = psa_ollama.inter_pss(
    original_text,
    prompt_postfix,
    nr_variations=3,
    temperatures=temperatures,
    iterations=1,
    plot=False
)
print("Ollama inter-prompt KA scores:", ka_ollama_inter)

Post hoc rescoring from saved annotations

If you already have a long-format annotation table with id, annotation, and iteration columns, you can rescore it directly:

rescored_map, rescored_df = psa_ollama.score_intra_annotations(
    annotated_ollama_intra,
    bootstrap_samples=100,
    analysis_modes=["cumulative_alpha", "adjacent_alpha"]
)

print(psa_ollama.summarize_intra_scores(rescored_map))
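As an illustration of the long format described above, the sketch below flattens per-iteration label lists into one row per (id, iteration) pair. The helper `to_long_rows` and the labels are made up for this example, and the zero-based `iteration` values are an assumption; check a CSV saved by `intra_pss` for the exact conventions the library uses.

```python
def to_long_rows(runs):
    """Flatten per-iteration label lists into long-format rows.

    runs[k][i] is the (made-up) label given to text i in iteration k.
    """
    rows = []
    for iteration, labels in enumerate(runs):
        for text_id, label in enumerate(labels):
            rows.append(
                {"id": text_id, "annotation": label, "iteration": iteration}
            )
    return rows


# Two iterations over three texts.
runs = [[0, 1, 0], [0, 1, 1]]
rows = to_long_rows(runs)
for row in rows:
    print(row)
```

Passing `rows` to `pandas.DataFrame` yields a table with the three columns named above.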

API Documentation

Our full API reference documentation is hosted on Read the Docs and includes detailed information on all modules, classes, and functions.

You can access the documentation here:

PromptStability API Documentation

This documentation is automatically updated whenever changes are pushed to the repository.

Development

To contribute to this library, first check out the code. Then create a new virtual environment:

cd promptstability
python -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest
