Git-style diff and version control for LLM prompts



The Problem

Prompts are code. Treat them like it.

You iterate on prompts dozens of times. You tweak a system message, change a few words, restructure instructions. But you have no history, no way to compare versions, and no idea if the new version is actually better.

promptdiff fixes that. Track every version, see exactly what changed (both textually and semantically), catch regressions with evaluations, and maintain a changelog, all from the command line.

Features

  • 📦 Version Control - Store and track every prompt version with messages and metadata
  • 🔀 Smart Diffs - Line-level text diffs with additions, deletions, and similarity scores
  • 🧠 Similarity Scoring - Word-overlap (Jaccard) similarity built in, OpenAI embeddings optional
  • 🏷️ Tags & Registry - Organize prompts with tags, find them by name or label
  • 📊 Evaluation - Run prompt versions against test cases and score results
  • 📋 Changelog - Auto-generate version history with diff stats
  • 💻 CLI First - Beautiful terminal output powered by Rich

Quick Start

pip install promptdiff

Initialize and start tracking

> **New here?** Start with the [Getting Started Guide](GETTING_STARTED.md).

# Initialize a promptdiff repo
promptdiff init

# Add your first prompt version
echo "Summarize this text: {text}" | promptdiff add summarizer -m "Initial version"

# Iterate on it
echo "You are an expert summarizer. Summarize the text below in 2 sentences.

Text: {text}

Summary:" | promptdiff add summarizer -m "Added role and structure"

# See what changed
promptdiff diff summarizer 1 2

Terminal Output

Diff: summarizer v1 -> v2

- Summarize this text: {text}
+ You are an expert summarizer. Summarize the text below in 2 sentences.
+
+ Text: {text}
+
+ Summary:

Text similarity:       32.5%
Word-overlap similarity: 54.2%
Changes: +4 -1

More commands

# View version history
promptdiff log summarizer

# List all tracked prompts
promptdiff list

# Generate a changelog
promptdiff changelog summarizer
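
The text similarity shown in the diff output above can be reproduced conceptually with Python's standard difflib; this is an illustrative sketch of the idea, not promptdiff's actual implementation:

```python
import difflib

v1 = "Summarize this text: {text}"
v2 = ("You are an expert summarizer. "
      "Summarize the text below in 2 sentences.\n\nText: {text}\n\nSummary:")

# Character-level similarity, as difflib's SequenceMatcher computes it
ratio = difflib.SequenceMatcher(None, v1, v2).ratio()

# Line-level diff, analogous to the +/- output above
diff_lines = list(difflib.unified_diff(
    v1.splitlines(), v2.splitlines(), lineterm=""))
for line in diff_lines:
    print(line)
```

Line-level diffing keeps the output readable for multi-line prompts, while the ratio gives a single number you can threshold on.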

Python API

from promptdiff import PromptStore, PromptDiff, PromptRegistry

# Initialize
store = PromptStore(".")
store.init()

# Track versions
store.add("my-prompt", "Hello {name}", message="v1")
store.add("my-prompt", "Hi there, {name}!", message="More friendly")

# Compare
differ = PromptDiff()
v1 = store.get_version("my-prompt", 1)
v2 = store.get_version("my-prompt", 2)
result = differ.full_diff(v1.content, v2.content, 1, 2)

print(f"Similarity: {result.similarity_ratio:.1%}")
print(f"Semantic:   {result.semantic_similarity:.1%}")

Version Control

Every prompt gets its own directory with numbered versions and metadata:

.promptdiff/
  prompts/
    summarizer/
      meta.json      # name, tags, version history
      v1.txt         # version 1 content
      v2.txt         # version 2 content
      v3.txt         # version 3 content

Each version stores a content hash, timestamp, message, and arbitrary metadata. Duplicate content is detected and skipped automatically.
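
Duplicate detection presumably compares content hashes before writing a new version; a minimal sketch of that idea, assuming SHA-256 hashing and a simplified metadata shape (promptdiff's exact scheme may differ):

```python
import hashlib

def content_hash(text: str) -> str:
    """Hash prompt content so identical versions can be detected."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

versions: list[dict] = []  # stand-in for the version list in meta.json

def add_version(content: str, message: str) -> bool:
    """Append a new version unless it duplicates the latest one."""
    h = content_hash(content)
    if versions and versions[-1]["hash"] == h:
        return False  # duplicate content: skip
    versions.append({"version": len(versions) + 1,
                     "hash": h, "message": message})
    return True

add_version("Hello {name}", "v1")
added_dup = add_version("Hello {name}", "same content again")  # skipped
add_version("Hi there, {name}!", "friendlier")
```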

Similarity Scoring

Beyond line-level text diffs, promptdiff computes similarity between versions:

  • Built-in: Jaccard word-overlap similarity (zero dependencies)
  • Optional: OpenAI embedding cosine similarity for true semantic comparison (pip install promptdiff[embeddings])

The built-in scorer measures word overlap, which is useful for detecting surface-level changes. For actual semantic similarity (detecting meaning changes), use the optional embeddings integration.
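
Jaccard word-overlap is simply the size of the intersection over the size of the union of the two versions' word sets; a dependency-free sketch:

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Word-overlap similarity: |A ∩ B| / |A ∪ B| over lowercase word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0  # two empty prompts are identical
    return len(wa & wb) / len(wa | wb)

# {summarize, text} overlap out of 5 distinct words total
print(jaccard_similarity("Summarize this text", "Summarize the text below"))
```

Because it ignores word order and meaning, this score reacts to vocabulary changes but not to rephrasings that preserve the words, which is exactly why the embeddings option exists.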

Evaluation

Run prompt versions against test cases to catch regressions:

from promptdiff.eval import PromptEvaluator, TestCase

evaluator = PromptEvaluator(
    runner=my_llm_runner,       # your function: (template, vars) -> output
    scorer=my_custom_scorer,    # your function: (output, expected) -> float
)

cases = [
    TestCase("short_text", {"text": "AI is cool."}, "AI is interesting."),
    TestCase("long_text", {"text": long_article}, expected_summary),
]

result = evaluator.evaluate("summarizer", 3, prompt_content, cases)
print(f"Score: {result.mean_score:.1%}")

Built-in scorers: exact_match_scorer, contains_scorer, similarity_scorer.
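
The runner and scorer are plain callables, so they are easy to stub out in tests. Hypothetical stand-ins (an echo-style runner and a difflib-based scorer, not promptdiff's built-ins) might look like this:

```python
import difflib

def my_llm_runner(template: str, variables: dict) -> str:
    """Stand-in runner: fill the template instead of calling a real LLM."""
    return template.format(**variables)

def my_custom_scorer(output: str, expected: str) -> float:
    """Score output against the expectation via character-level similarity."""
    return difflib.SequenceMatcher(None, output, expected).ratio()

out = my_llm_runner("Summarize this text: {text}", {"text": "AI is cool."})
score = my_custom_scorer(out, "Summarize this text: AI is cool.")
```

Swapping in a real runner means replacing the template fill with an actual model call; the scorer contract stays the same.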

Changelog

Auto-generate changelogs from your version history:

promptdiff changelog summarizer

## v3 (2025-01-15)
**Added constraint to focus on facts**
- Text similarity: 92.3%
- Semantic similarity: 87.1%
- Changes: +2 -0

## v2 (2025-01-14)
**Improved with role and clearer instructions**
- Text similarity: 32.5%
- Semantic similarity: 54.2%
- Changes: +4 -1
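
A changelog like the one above is straightforward to assemble from per-version metadata; an illustrative sketch (the field names here are assumptions, not promptdiff's actual schema):

```python
versions = [  # hypothetical metadata, newest first
    {"version": 3, "date": "2025-01-15",
     "message": "Added constraint to focus on facts",
     "text_sim": 0.923, "semantic_sim": 0.871, "added": 2, "removed": 0},
    {"version": 2, "date": "2025-01-14",
     "message": "Improved with role and clearer instructions",
     "text_sim": 0.325, "semantic_sim": 0.542, "added": 4, "removed": 1},
]

lines = []
for v in versions:
    lines.append(f"## v{v['version']} ({v['date']})")
    lines.append(f"**{v['message']}**")
    lines.append(f"- Text similarity: {v['text_sim']:.1%}")
    lines.append(f"- Semantic similarity: {v['semantic_sim']:.1%}")
    lines.append(f"- Changes: +{v['added']} -{v['removed']}")
    lines.append("")
changelog = "\n".join(lines)
print(changelog)
```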

CI Integration

Add prompt regression checks to your CI pipeline:

# .github/workflows/prompt-check.yml
- name: Check prompt quality
  run: |
    pip install promptdiff
    promptdiff eval summarizer 3

Or use the Python API in your test suite:

def test_prompt_similarity():
    """Ensure new version isn't too different from production."""
    store = PromptStore(".")
    differ = PromptDiff()
    v_prod = store.get_version("summarizer", 2)
    v_new = store.get_version("summarizer", 3)
    result = differ.full_diff(v_prod.content, v_new.content)
    assert result.similarity_ratio > 0.7, "Prompt changed too much!"

CLI Reference

| Command | Description |
| --- | --- |
| `promptdiff init` | Initialize a new promptdiff repository |
| `promptdiff add <name> -m "msg"` | Add a new prompt version |
| `promptdiff diff <name> <v1> <v2>` | Show diff between versions |
| `promptdiff log <name>` | Show version history |
| `promptdiff list` | List all tracked prompts |
| `promptdiff changelog <name>` | Generate changelog |
| `promptdiff eval <name> <version>` | Evaluate a prompt version |

License

MIT License. Copyright (c) 2025 Manas Vardhan.
