Prompt Sensitivity Analyzer — measure robustness, entropy, and token importance of LLM prompts.
promptgrad
Prompt Sensitivity Analyzer — measure how robust your LLM prompts are to linguistic perturbations.
Prompt engineering is chaos. The same instruction, phrased slightly differently, can produce wildly different outputs. promptgrad quantifies this instability.
Given a prompt, it:
- Perturbs it with six linguistic strategies (synonym swap, word deletion, paraphrase, casing, punctuation, sentence reorder)
- Embeds every variant using TF-IDF (built-in, zero dependencies) or semantic models
- Computes cosine similarity shifts, output variance, and Shannon entropy
- Detects unstable instructions and surfaces which tokens drive the instability
- Outputs a robustness score (0–1), a per-token sensitivity heatmap, and a strategy-level radar chart
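The embed-and-compare step can be illustrated with a minimal, dependency-free sketch. This is not promptgrad's internal code: the tokenizer, the smoothed IDF formula, and the helper names `tokenize`, `tfidf_vectors`, and `cosine` are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (a dict) per text, using a smoothed IDF."""
    docs = [Counter(tokenize(t)) for t in texts]
    n_docs = len(docs)
    # Iterating a Counter yields each distinct term once, so this counts documents.
    doc_freq = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values()) or 1
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in doc.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


original = "Summarize the document in three sentences."
variants = [
    "summarize the document in three sentences",    # casing + punctuation
    "Summarize the document in sentences.",         # word deletion
    "Condense the document into three sentences.",  # synonym swap
]

vectors = tfidf_vectors([original] + variants)
similarities = [cosine(vectors[0], v) for v in vectors[1:]]
mean_similarity = sum(similarities) / len(similarities)
print(similarities, mean_similarity)
```

Because this tokenizer ignores case and punctuation, the first variant is indistinguishable from the original, which shows why a purely lexical embedding cannot see some perturbations that a semantic model might.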
Installation
# Minimal install (TF-IDF embeddings, no ML dependencies)
pip install promptgrad
# With local semantic embeddings (recommended)
pip install "promptgrad[local]"
# With OpenAI embeddings
pip install "promptgrad[openai]"
# With visualisation
pip install "promptgrad[viz]"
# Everything
pip install "promptgrad[all]"
Quick Start
from promptgrad import PromptAnalyzer
analyzer = PromptAnalyzer()
report = analyzer.analyze("Summarize the document in three sentences.")
print(report.robustness_score) # e.g. 0.847
print(report.stability_label) # "STABLE"
print(report.top_sensitive_tokens()) # [("Summarize", 0.91), ("three", 0.72), ...]
for warning in report.warnings:
    print(warning)
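One plausible way to obtain per-token sensitivity scores like those returned by `top_sensitive_tokens()` is leave-one-out deletion: remove each token in turn and measure how far the embedding moves. The sketch below assumes a toy weighted bag-of-words embedding with a hypothetical stopword list. It is an illustration of the idea, not the library's actual method.

```python
import math
from collections import Counter

# Hypothetical stopword list and weights, chosen only for this illustration.
STOPWORDS = {"a", "an", "the", "in", "of", "to", "and"}


def embed(tokens):
    """Weighted bag-of-words: stopwords count for little, content words fully."""
    vec = Counter()
    for tok in tokens:
        key = tok.lower()
        vec[key] += 0.1 if key in STOPWORDS else 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def token_sensitivity(prompt):
    """Score each token by how far the embedding moves when it is deleted."""
    tokens = prompt.rstrip(".?!").split()
    base = embed(tokens)
    raw = {}
    for i, tok in enumerate(tokens):
        reduced = embed(tokens[:i] + tokens[i + 1:])
        raw[tok] = 1.0 - cosine(base, reduced)  # repeated tokens keep the last score
    top = max(raw.values()) or 1.0
    return {tok: score / top for tok, score in raw.items()}  # scale to 0-1


scores = token_sensitivity("Summarize the document in three sentences.")
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

With this toy embedding all content words tie at the top and stopwords score near zero. A semantic backend would be expected to separate the content words from one another as well.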
Visualise
analyzer.plot(report) # All four charts
analyzer.plot(report, kind="heatmap") # Just the token heatmap
analyzer.plot(report, kind="gauge") # Just the robustness gauge
analyzer.plot(report, save_path="report.png", show=False)
Compare multiple prompts
result = analyzer.compare([
    "Summarize the document in three sentences.",
    "Please provide a summary of the document.",
    "What are the key points in this document?",
])
for prompt, score in result["ranked"]:
    print(f"{score:.3f} {prompt}")
CLI
# Analyse a prompt
promptgrad analyze "List the top 5 programming languages."
# With semantic embeddings and save plot
promptgrad analyze "Explain quantum computing." \
    --backend sentence_transformers \
    --plot report.png
# Export JSON report
promptgrad analyze "Write a haiku about rain." --json report.json
# Compare prompts from a file (one per line)
promptgrad compare prompts.txt
Outputs
| Field | Type | Description |
|---|---|---|
| `robustness_score` | `float` | 0.0 (unstable) → 1.0 (robust) |
| `stability_label` | `str` | VERY STABLE / STABLE / FRAGILE / UNSTABLE |
| `mean_cosine_similarity` | `float` | Average similarity to original across all perturbations |
| `embedding_shift_std` | `float` | Standard deviation of L2 shifts |
| `entropy` | `float` | Shannon entropy of the similarity distribution |
| `output_variance` | `float \| None` | Variance of LLM output embeddings (if provided) |
| `token_importance` | `dict[str, float]` | Token → sensitivity score (0–1) |
| `per_strategy` | `dict[str, float]` | Mean cosine similarity per perturbation type |
| `warnings` | `list[str]` | Human-readable instability warnings |
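To make the `entropy` and `embedding_shift_std` fields more concrete, here is a rough sketch of how such statistics could be computed. The binning scheme (ten equal-width bins over 0–1), the use of the population standard deviation, and all sample numbers are assumptions. The library may define these quantities differently.

```python
import math
import statistics


def shannon_entropy(values, bins=10):
    """Entropy in bits of values in [0, 1] after histogramming into equal bins."""
    counts = [0] * bins
    for v in values:
        idx = max(0, min(int(v * bins), bins - 1))  # clamp so 1.0 lands in the last bin
        counts[idx] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def l2_shift(a, b):
    """Euclidean distance between two dense embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical cosine similarities of eight variants to the original prompt.
tight = [0.97, 0.95, 0.96, 0.98, 0.95, 0.97, 0.96, 0.99]
spread = [0.97, 0.41, 0.88, 0.15, 0.63, 0.92, 0.30, 0.75]

print(shannon_entropy(tight))   # all values share one bin, so entropy is zero
print(shannon_entropy(spread))  # values occupy many bins, so entropy is higher

# Hypothetical 3-dimensional embeddings: the original and three variants.
original = [0.2, 0.7, 0.1]
variants = [[0.2, 0.6, 0.2], [0.1, 0.7, 0.2], [0.6, 0.1, 0.3]]
shifts = [l2_shift(original, v) for v in variants]
shift_std = statistics.pstdev(shifts)
print(shift_std)
```

Under this reading, low entropy means the perturbations all moved the prompt by a similar amount, while high entropy means their effects were scattered.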
Embedding Backends
| Backend | Requires | Quality | Use when |
|---|---|---|---|
| `tfidf` | nothing | lexical only | Fast tests, CI, offline |
| `sentence_transformers` | `pip install promptgrad[local]` | semantic | Recommended default |
| `openai` | `pip install promptgrad[openai]` + API key | semantic | Production / highest quality |
# Auto-selects sentence_transformers if installed, else tfidf
analyzer = PromptAnalyzer(embedding_backend="auto")
# Explicit
analyzer = PromptAnalyzer(embedding_backend="sentence_transformers",
                          model_name="all-MiniLM-L6-v2")
analyzer = PromptAnalyzer(embedding_backend="openai",
                          model="text-embedding-3-small",
                          api_key="sk-...")
Perturbation Strategies
| Strategy | What it does |
|---|---|
| `synonym_substitution` | Swaps words with near-synonyms |
| `word_deletion` | Removes individual non-stopword tokens |
| `paraphrase` | Structural rewrites (imperative ↔ question, etc.) |
| `casing_variation` | Title, UPPER, lower, mIxEd case |
| `punctuation_variation` | Adds, removes, or changes terminal punctuation |
| `order_shuffle` | Shuffles sentence order in multi-sentence prompts |
Use all of them (default) or pick a subset:
analyzer = PromptAnalyzer(perturbations=["synonym_substitution", "word_deletion"])
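For a feel of what two of these strategies produce, the sketch below re-implements word deletion and casing variation in plain Python. The function names, the stopword list, and the random mixed-case scheme are illustrative assumptions and do not reflect promptgrad's own implementation.

```python
import random

# Illustrative stopword list; a real implementation would likely use a larger one.
STOPWORDS = {"a", "an", "the", "in", "of", "to", "and", "for"}


def word_deletions(prompt, n=5, seed=None):
    """Return up to n variants, each with one non-stopword token removed."""
    words = prompt.split()
    candidates = [i for i, w in enumerate(words)
                  if w.strip(".,!?").lower() not in STOPWORDS]
    rng = random.Random(seed)
    rng.shuffle(candidates)
    return [" ".join(words[:i] + words[i + 1:]) for i in candidates[:n]]


def casing_variations(prompt, seed=None):
    """Return title, upper, lower, and randomly mixed-case variants."""
    rng = random.Random(seed)
    mixed = "".join(c.upper() if rng.random() < 0.5 else c.lower() for c in prompt)
    return [prompt.title(), prompt.upper(), prompt.lower(), mixed]


prompt = "List the top 5 programming languages."
deleted = word_deletions(prompt, n=3, seed=0)
cased = casing_variations(prompt, seed=0)
print(deleted)
print(cased)
```

Passing a fixed `seed` keeps the variants reproducible between runs, which matters when robustness scores are compared over time.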
Bring your own:
from promptgrad import Perturbation, PerturbationResult, PromptAnalyzer

class BackTranslation(Perturbation):
    name = "back_translation"

    def apply(self, prompt, n=5, seed=None):
        # ... call translation API ...
        translated = prompt  # placeholder: replace with the back-translated text
        return [PerturbationResult(original=prompt, perturbed=translated, strategy=self.name)]

analyzer = PromptAnalyzer(perturbations=[BackTranslation()])
Measuring Output Variance
If you have LLM responses for the perturbed prompts, pass them in to compute semantic output variance:
import openai
from promptgrad import PromptAnalyzer
from promptgrad.perturbations import SynonymSubstitution

client = openai.OpenAI()  # reads OPENAI_API_KEY from the environment

def complete(prompt):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

analyzer = PromptAnalyzer()
my_prompt = "Summarize the document in three sentences."

# Generate outputs for each perturbed variant
perturbed = SynonymSubstitution().apply(my_prompt, n=10)
outputs = [complete(r.perturbed) for r in perturbed]
report = analyzer.analyze(my_prompt, output_texts=outputs)
print(report.output_variance)
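As a rough guide to what a number like `output_variance` can mean, the sketch below computes the mean squared distance of a set of embedding vectors from their centroid. This definition and the toy 2-dimensional vectors are assumptions made for illustration. The library's exact formula may differ.

```python
def embedding_variance(vectors):
    """Mean squared Euclidean distance of each vector from the centroid."""
    n = len(vectors)
    dims = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / n for d in range(dims)]
    return sum(
        sum((v[d] - centroid[d]) ** 2 for d in range(dims)) for v in vectors
    ) / n


# Hypothetical output embeddings: identical answers vs. answers that diverge.
consistent = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
divergent = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(embedding_variance(consistent))
print(embedding_variance(divergent))
```

A value near zero indicates the model answered every variant in much the same way, while larger values indicate that small wording changes altered the response.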
Development
git clone https://github.com/Liodon-AI/promptgrad
cd promptgrad
pip install -e ".[dev]"
pytest -v
License
MIT © Liodon AI