Prompt Sensitivity Analyzer — measure robustness, entropy, and token importance of LLM prompts.

promptgrad

Prompt Sensitivity Analyzer — measure how robust your LLM prompts are to linguistic perturbations.

Python 3.9+ · License: MIT


Prompt engineering is chaos. The same instruction, phrased slightly differently, can produce wildly different outputs. promptgrad quantifies this instability.

Given a prompt, it:

  1. Perturbs it with six linguistic strategies (synonym swap, word deletion, paraphrase, casing, punctuation, sentence reorder)
  2. Embeds every variant using TF-IDF (built-in, zero dependencies) or semantic models
  3. Computes cosine similarity shifts, output variance, and Shannon entropy
  4. Detects unstable instructions and surfaces which tokens drive the instability
  5. Outputs a robustness score (0–1), a per-token sensitivity heatmap, and a strategy-level radar chart
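Steps 2 and 3 can be sketched in plain Python. This is an illustration of the general technique (TF-IDF vectors compared by cosine similarity), not promptgrad's actual implementation; the helper names `tokenize`, `tfidf_vectors`, `cosine`, and `similarity_to_original` are hypothetical, and the hand-written variants stand in for what the perturbation step would generate.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,!?;:\"'()").lower() for word in text.split())
    return [t for t in tokens if t]

def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict of token -> weight) per text."""
    docs = [Counter(tokenize(t)) for t in texts]
    n_docs = len(docs)
    # Document frequency: in how many texts each token appears.
    doc_freq = Counter(token for doc in docs for token in doc)
    # Smoothed IDF keeps weights positive even for tokens present in every text.
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}
    return [{tok: count * idf[tok] for tok, count in doc.items()} for doc in docs]

def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_to_original(original, variants):
    """Cosine similarity of each variant to the original prompt."""
    vectors = tfidf_vectors([original] + list(variants))
    return [cosine(vectors[0], v) for v in vectors[1:]]

original = "Summarize the document in three sentences."
variants = [
    "summarize the document in three sentences",  # casing + punctuation
    "Summarize the document in sentences.",       # word deletion
    "Condense the document in three sentences.",  # synonym swap
]
sims = similarity_to_original(original, variants)
mean_similarity = sum(sims) / len(sims)
print(sims, mean_similarity)
```

Because this tokenizer ignores case and punctuation, the first variant is identical to the original as far as TF-IDF is concerned. The synonym swap is penalised even though the meaning is preserved, which is the "lexical only" limitation that semantic embedding models are meant to address.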

Installation

# Minimal install (TF-IDF embeddings, no ML dependencies)
pip install promptgrad

# With local semantic embeddings (recommended)
pip install "promptgrad[local]"

# With OpenAI embeddings
pip install "promptgrad[openai]"

# With visualisation
pip install "promptgrad[viz]"

# Everything
pip install "promptgrad[all]"

Quick Start

from promptgrad import PromptAnalyzer

analyzer = PromptAnalyzer()

report = analyzer.analyze("Summarize the document in three sentences.")

print(report.robustness_score)      # e.g. 0.847
print(report.stability_label)       # "STABLE"
print(report.top_sensitive_tokens()) # [("Summarize", 0.91), ("three", 0.72), ...]

for warning in report.warnings:
    print(warning)

Visualise

analyzer.plot(report)                 # All four charts
analyzer.plot(report, kind="heatmap") # Just the token heatmap
analyzer.plot(report, kind="gauge")   # Just the robustness gauge
analyzer.plot(report, save_path="report.png", show=False)

Compare multiple prompts

result = analyzer.compare([
    "Summarize the document in three sentences.",
    "Please provide a summary of the document.",
    "What are the key points in this document?",
])

for prompt, score in result["ranked"]:
    print(f"{score:.3f}  {prompt}")

CLI

# Analyse a prompt
promptgrad analyze "List the top 5 programming languages."

# With semantic embeddings and save plot
promptgrad analyze "Explain quantum computing." \
    --backend sentence_transformers \
    --plot report.png

# Export JSON report
promptgrad analyze "Write a haiku about rain." --json report.json

# Compare prompts from a file (one per line)
promptgrad compare prompts.txt

Outputs

Field                   Type              Description
robustness_score        float             0.0 (unstable) → 1.0 (robust)
stability_label         str               VERY STABLE / STABLE / FRAGILE / UNSTABLE
mean_cosine_similarity  float             Average similarity to original across all perturbations
embedding_shift_std     float             Standard deviation of L2 shifts
entropy                 float             Shannon entropy of the similarity distribution
output_variance         float | None      Variance of LLM output embeddings (if provided)
token_importance        dict[str, float]  Token → sensitivity score (0–1)
per_strategy            dict[str, float]  Mean cosine similarity per perturbation type
warnings                list[str]         Human-readable instability warnings
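
To make the entropy field more concrete, here is one way a Shannon entropy over similarity scores could be computed. The equal-width binning, the bin count, and the base-2 logarithm are all assumptions made for this sketch; promptgrad may discretise the distribution differently, and `similarity_entropy` is a hypothetical name.

```python
import math
from collections import Counter

def similarity_entropy(similarities, bins=10):
    """Shannon entropy (in bits) of similarities bucketed into equal-width bins on [0, 1]."""
    if not similarities:
        return 0.0
    # Clamp the bin index so that 1.0 lands in the last bin and negatives in the first.
    counts = Counter(min(max(int(s * bins), 0), bins - 1) for s in similarities)
    total = len(similarities)
    # Sum of p * log2(1 / p) over the occupied bins.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

tight = similarity_entropy([0.91, 0.93, 0.95, 0.97])   # every score falls in one bin
spread = similarity_entropy([0.15, 0.45, 0.75, 0.95])  # four different bins
print(tight, spread)
```

Under this definition, a prompt whose perturbations all land at roughly the same similarity has entropy near zero, while a prompt whose perturbations scatter across the range has higher entropy.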

Embedding Backends

Backend                Requires                                    Quality       Use when
tfidf                  nothing                                     lexical only  Fast tests, CI, offline
sentence_transformers  pip install "promptgrad[local]"             semantic      Recommended default
openai                 pip install "promptgrad[openai]" + API key  semantic      Production / highest quality

# Auto-selects sentence_transformers if installed, else tfidf
analyzer = PromptAnalyzer(embedding_backend="auto")

# Explicit
analyzer = PromptAnalyzer(embedding_backend="sentence_transformers",
                          model_name="all-MiniLM-L6-v2")

analyzer = PromptAnalyzer(embedding_backend="openai",
                          model="text-embedding-3-small",
                          api_key="sk-...")
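
The "auto" behaviour described above follows a common optional-dependency pattern: check whether a package is importable and fall back if it is not. The sketch below shows that pattern with the standard library only; `pick_backend` is a hypothetical helper written for illustration, and promptgrad's own selection logic may differ.

```python
import importlib.util

def pick_backend(preferred="sentence_transformers", fallback="tfidf"):
    """Return `preferred` if a top-level module by that name is importable, else `fallback`."""
    # find_spec returns None for a missing top-level module without importing it.
    return preferred if importlib.util.find_spec(preferred) is not None else fallback

print(pick_backend())  # depends on what is installed in the current environment
```

Checking with `find_spec` avoids paying the import cost of a large ML package just to find out whether it is available.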

Perturbation Strategies

Strategy               What it does
synonym_substitution   Swaps words with near-synonyms
word_deletion          Removes individual non-stopword tokens
paraphrase             Structural rewrites (imperative ↔ question, etc.)
casing_variation       Title, UPPER, lower, mIxEd case
punctuation_variation  Adds, removes, or changes terminal punctuation
order_shuffle          Shuffles sentence order in multi-sentence prompts
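
As a rough illustration of what two of these strategies involve, the sketch below implements word deletion and casing variation in plain Python. It does not use promptgrad's classes; the function names, the tiny stopword list, and the random mixed-case scheme are assumptions made for the example.

```python
import random

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "in", "of", "to", "and", "or", "for", "on"}

def word_deletions(prompt):
    """Return one variant per non-stopword token, each with that token removed."""
    words = prompt.split()
    variants = []
    for i, word in enumerate(words):
        if word.strip(".,!?;:").lower() in STOPWORDS:
            continue  # deleting stopwords rarely changes the instruction
        variants.append(" ".join(words[:i] + words[i + 1:]))
    return variants

def casing_variations(prompt, seed=None):
    """Return Title, UPPER, lower, and randomly mixed-case versions of the prompt."""
    rng = random.Random(seed)  # seeded so the mixed-case variant is reproducible
    mixed = "".join(ch.upper() if rng.random() < 0.5 else ch.lower() for ch in prompt)
    return [prompt.title(), prompt.upper(), prompt.lower(), mixed]

prompt = "Summarize the document in three sentences."
deleted = word_deletions(prompt)
cased = casing_variations(prompt, seed=0)
for variant in deleted + cased:
    print(variant)
```

Passing a seed keeps the generated variants stable between runs, which matters if scores are compared over time.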

Use all of them (default) or pick a subset:

analyzer = PromptAnalyzer(perturbations=["synonym_substitution", "word_deletion"])

Bring your own:

from promptgrad import Perturbation, PerturbationResult

class BackTranslation(Perturbation):
    name = "back_translation"

    def apply(self, prompt, n=5, seed=None):
        # ... call translation API ...
        # Placeholder so the example runs: replace with the round-tripped text.
        translated = prompt
        return [PerturbationResult(original=prompt, perturbed=translated, strategy=self.name)]

analyzer = PromptAnalyzer(perturbations=[BackTranslation()])

Measuring Output Variance

If you have LLM responses for the perturbed prompts, pass them in to compute semantic output variance:

import openai

from promptgrad import PromptAnalyzer
from promptgrad.perturbations import SynonymSubstitution

client = openai.OpenAI()  # reads OPENAI_API_KEY from the environment
analyzer = PromptAnalyzer()
my_prompt = "Summarize the document in three sentences."

def complete(prompt):
    """Return the model's reply to a single-turn user prompt."""
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content

# Generate outputs for each perturbed variant
perturbed = SynonymSubstitution().apply(my_prompt, n=10)
outputs = [complete(r.perturbed) for r in perturbed]

report = analyzer.analyze(my_prompt, output_texts=outputs)
print(report.output_variance)
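
One plausible reading of "variance of output embeddings" is the mean squared distance of each embedding from their centroid. The sketch below computes that quantity on plain lists of floats. The definition is an assumption made for illustration, `output_variance` here is a standalone hypothetical function rather than the library's code, and the 2-D vectors are toy stand-ins for real embeddings.

```python
def output_variance(embeddings):
    """Mean squared Euclidean distance of each embedding from the centroid.

    Returns None when no embeddings are given, mirroring the optional field.
    """
    if not embeddings:
        return None
    n = len(embeddings)
    dim = len(embeddings[0])
    centroid = [sum(vec[d] for vec in embeddings) / n for d in range(dim)]
    squared_distances = (
        sum((vec[d] - centroid[d]) ** 2 for d in range(dim)) for vec in embeddings
    )
    return sum(squared_distances) / n

identical = output_variance([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
scattered = output_variance([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
print(identical, scattered)
```

With this definition, outputs that all mean the same thing score close to zero, and the value grows as the responses drift apart in embedding space.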

Development

git clone https://github.com/Liodon-AI/promptgrad
cd promptgrad
pip install -e ".[dev]"
pytest -v

License

MIT © Liodon AI
