Git-style diff and version control for LLM prompts
The Problem
Prompts are code. Treat them like it.
You iterate on prompts dozens of times. You tweak a system message, change a few words, restructure instructions. But you have no history, no way to compare versions, and no idea if the new version is actually better.
promptdiff fixes that. Track every version, see exactly what changed (both textually and semantically), evaluate regressions, and maintain a changelog. All from the command line.
Features
- 📦 Version Control - Store and track every prompt version with messages and metadata
- 🔀 Smart Diffs - Line-level text diffs with additions, deletions, and similarity scores
- 🧠 Similarity Scoring - Word-overlap (Jaccard) similarity built in, OpenAI embeddings optional
- 🏷️ Tags & Registry - Organize prompts with tags, find them by name or label
- 📊 Evaluation - Run prompt versions against test cases and score results
- 📋 Changelog - Auto-generate version history with diff stats
- 💻 CLI First - Beautiful terminal output powered by Rich
Quick Start
pip install llm-promptdiff
Initialize and start tracking
> **New here?** Start with the [Getting Started Guide](GETTING_STARTED.md).
# Initialize a promptdiff repo
promptdiff init
# Add your first prompt version
echo "Summarize this text: {text}" | promptdiff add summarizer -m "Initial version"
# Iterate on it
echo "You are an expert summarizer. Summarize the text below in 2 sentences.
Text: {text}
Summary:" | promptdiff add summarizer -m "Added role and structure"
# See what changed
promptdiff diff summarizer 1 2
Terminal Output
Diff: summarizer v1 -> v2
- Summarize this text: {text}
+ You are an expert summarizer. Summarize the text below in 2 sentences.
+
+ Text: {text}
+
+ Summary:
Text similarity: 32.5%
Word-overlap similarity: 54.2%
Changes: +4 -1
# View version history
promptdiff log summarizer
# List all tracked prompts
promptdiff list
# Generate a changelog
promptdiff changelog summarizer
Python API
from promptdiff import PromptStore, PromptDiff, PromptRegistry
# Initialize
store = PromptStore(".")
store.init()
# Track versions
store.add("my-prompt", "Hello {name}", message="v1")
store.add("my-prompt", "Hi there, {name}!", message="More friendly")
# Compare
differ = PromptDiff()
v1 = store.get_version("my-prompt", 1)
v2 = store.get_version("my-prompt", 2)
result = differ.full_diff(v1.content, v2.content, 1, 2)
print(f"Similarity: {result.similarity_ratio:.1%}")
print(f"Semantic: {result.semantic_similarity:.1%}")
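The README does not show how `similarity_ratio` or the `+`/`-` counts are computed. As a rough illustration only, a line-level diff with a similarity ratio can be sketched with the standard library's `difflib`. The `line_diff` helper below is hypothetical and is not part of promptdiff's API.

```python
import difflib


def line_diff(old: str, new: str) -> dict:
    """Return added lines, removed lines, and a 0..1 similarity ratio."""
    old_lines = old.splitlines()
    new_lines = new.splitlines()
    added, removed = [], []
    # ndiff prefixes each line with "+ ", "- ", "  ", or "? " (hint lines).
    for line in difflib.ndiff(old_lines, new_lines):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
    # Character-level ratio over the whole text; promptdiff may use another measure.
    ratio = difflib.SequenceMatcher(None, old, new).ratio()
    return {"added": added, "removed": removed, "ratio": ratio}


result = line_diff("Hello {name}", "Hi there, {name}!")
print(f"Changes: +{len(result['added'])} -{len(result['removed'])}")
print(f"Similarity: {result['ratio']:.1%}")
```

A character-level ratio is sensitive to small wording changes, which is usually what you want when reviewing prompt edits.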
Version Control
Every prompt gets its own directory with numbered versions and metadata:
.promptdiff/
  prompts/
    summarizer/
      meta.json    # name, tags, version history
      v1.txt       # version 1 content
      v2.txt       # version 2 content
      v3.txt       # version 3 content
Each version stores a content hash, timestamp, message, and arbitrary metadata. Duplicate content is detected and skipped automatically.
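To make the duplicate check concrete, here is a minimal sketch of hash-based deduplication. It assumes SHA-256 and assumes that "duplicate" means "same content as the latest version"; the README states neither, and the `content_hash` and `add_version` helpers are hypothetical rather than promptdiff's own functions.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def content_hash(text: str) -> str:
    """Hex SHA-256 digest of the prompt text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def add_version(history: list, content: str, message: str = "",
                metadata: Optional[dict] = None) -> Optional[dict]:
    """Append a version record, or return None if the content is unchanged."""
    digest = content_hash(content)
    # Assumption: a duplicate is content whose hash matches the latest version.
    if history and history[-1]["hash"] == digest:
        return None
    record = {
        "version": len(history) + 1,
        "hash": digest,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "message": message,
        "metadata": metadata or {},
    }
    history.append(record)
    return record


history = []
first = add_version(history, "Summarize this text: {text}", "Initial version")
duplicate = add_version(history, "Summarize this text: {text}", "No real change")
print(first["version"], duplicate, len(history))
```

Hashing the content rather than comparing full strings keeps the check cheap and gives each version a stable identifier.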
Similarity Scoring
Beyond line-level text diffs, promptdiff computes similarity between versions:
- Built-in: Jaccard word-overlap similarity (zero dependencies)
- Optional: OpenAI embedding cosine similarity for true semantic comparison (`pip install llm-promptdiff[embeddings]`)
The built-in scorer measures word overlap, which is useful for detecting surface-level changes. For actual semantic similarity (detecting meaning changes), use the optional embeddings integration.
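The difference between the two scorers is easier to see in code. The sketch below shows one common way to compute Jaccard word overlap and cosine similarity. The tokenization (lowercased `\w+` matches) is an assumption, and both function names are illustrative rather than promptdiff's internals.

```python
import math
import re


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the shared word set divided by the size of the combined word set."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def cosine_similarity(u: list, v: list) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Shared wording scores well on word overlap...
print(jaccard_similarity("Summarize this text", "Summarize the text below"))
# ...but a paraphrase with no shared words scores zero.
print(jaccard_similarity("big dog", "large hound"))
```

The second call illustrates the limitation described above: two phrases with the same meaning but no shared words get a word-overlap score of zero, which is the case embeddings are meant to handle.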
Evaluation
Run prompt versions against test cases to catch regressions:
from promptdiff.eval import PromptEvaluator, TestCase
evaluator = PromptEvaluator(
    runner=my_llm_runner,      # your function: (template, vars) -> output
    scorer=my_custom_scorer,   # your function: (output, expected) -> float
)
cases = [
    TestCase("short_text", {"text": "AI is cool."}, "AI is interesting."),
    TestCase("long_text", {"text": long_article}, expected_summary),
]
result = evaluator.evaluate("summarizer", 3, prompt_content, cases)
print(f"Score: {result.mean_score:.1%}")
Built-in scorers: exact_match_scorer, contains_scorer, similarity_scorer.
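The runner and scorer are plain callables with the signatures noted in the comments above: `(template, vars) -> output` and `(output, expected) -> float`. A minimal sketch of what such functions might look like follows. `echo_runner` and `keyword_scorer` are hypothetical stand-ins: the runner only fills in the template instead of calling a model, and the scorer is not one of the built-ins.

```python
def echo_runner(template: str, variables: dict) -> str:
    """Stand-in runner: fills the template instead of calling an LLM."""
    # A real runner would send the filled prompt to a model and return its reply.
    return template.format(**variables)


def keyword_scorer(output: str, expected: str) -> float:
    """Fraction of the expected words that also appear in the output."""
    expected_words = expected.lower().split()
    if not expected_words:
        return 1.0
    output_words = set(output.lower().split())
    hits = sum(1 for word in expected_words if word in output_words)
    return hits / len(expected_words)


output = echo_runner("Summarize this text: {text}", {"text": "AI is cool."})
score = keyword_scorer(output, "AI is interesting.")
print(f"Score: {score:.1%}")
```

Note that `str.format` raises an error on prompts containing literal braces (for example, JSON snippets), so a real runner may need a different substitution scheme.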
Changelog
Auto-generate changelogs from your version history:
promptdiff changelog summarizer
## v3 (2025-01-15)
**Added constraint to focus on facts**
- Text similarity: 92.3%
- Semantic similarity: 87.1%
- Changes: +2 -0
## v2 (2025-01-14)
**Improved with role and clearer instructions**
- Text similarity: 32.5%
- Semantic similarity: 54.2%
- Changes: +4 -1
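For reference, output in this shape can be produced from per-version records with a few lines of Python. This is only a sketch: the record fields (`version`, `date`, `message`, and so on) are assumed names for illustration, not promptdiff's actual `meta.json` schema, and the values are copied from the sample output above.

```python
def render_changelog(entries: list) -> str:
    """Render version records as Markdown, newest version first."""
    lines = []
    for entry in sorted(entries, key=lambda e: e["version"], reverse=True):
        lines.append(f"## v{entry['version']} ({entry['date']})")
        lines.append(f"**{entry['message']}**")
        lines.append(f"- Text similarity: {entry['text_similarity']:.1%}")
        lines.append(f"- Semantic similarity: {entry['semantic_similarity']:.1%}")
        lines.append(f"- Changes: +{entry['added']} -{entry['removed']}")
    return "\n".join(lines)


# Hypothetical records mirroring the sample changelog.
entries = [
    {"version": 2, "date": "2025-01-14",
     "message": "Improved with role and clearer instructions",
     "text_similarity": 0.325, "semantic_similarity": 0.542,
     "added": 4, "removed": 1},
    {"version": 3, "date": "2025-01-15",
     "message": "Added constraint to focus on facts",
     "text_similarity": 0.923, "semantic_similarity": 0.871,
     "added": 2, "removed": 0},
]
changelog = render_changelog(entries)
print(changelog)
```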
CI Integration
Add prompt regression checks to your CI pipeline:
# .github/workflows/prompt-check.yml
- name: Check prompt quality
  run: |
    pip install llm-promptdiff
    promptdiff eval summarizer 3
Or use the Python API in your test suite:
from promptdiff import PromptStore, PromptDiff

def test_prompt_similarity():
    """Ensure new version isn't too different from production."""
    store = PromptStore(".")
    differ = PromptDiff()
    # Compare the production version (2) against the candidate (3)
    v_prod = store.get_version("summarizer", 2)
    v_new = store.get_version("summarizer", 3)
    result = differ.full_diff(v_prod.content, v_new.content)
    assert result.similarity_ratio > 0.7, "Prompt changed too much!"
CLI Reference
| Command | Description |
|---|---|
| `promptdiff init` | Initialize a new promptdiff repository |
| `promptdiff add <name> -m "msg"` | Add a new prompt version |
| `promptdiff diff <name> <v1> <v2>` | Show diff between versions |
| `promptdiff log <name>` | Show version history |
| `promptdiff list` | List all tracked prompts |
| `promptdiff changelog <name>` | Generate changelog |
| `promptdiff eval <name> <version>` | Evaluate a prompt version |
License
MIT License. Copyright (c) 2025 Manas Vardhan.