
Git for prompts - version, diff, and test your LLM prompts across any model


PromptDiff

Git for prompts. Version your LLM prompts, diff them word-by-word, and run any version against Gemini, Groq, or any other LiteLLM-supported model — all from the terminal.

promptdiff commit summarizer "Summarize: {text}" -m "v1: baseline"
promptdiff commit summarizer "Summarize concisely in one sentence: {text}" -m "v2: tighter constraint"
promptdiff diff summarizer 1 2
promptdiff run summarizer 2 --model gemini/gemini-2.5-flash

Why PromptDiff

If you've ever overwritten a prompt and lost the version that actually worked, or pasted two prompt drafts into a doc just to eyeball what changed — PromptDiff is the tool for that. It treats prompts as versioned, diffable artifacts the same way Git treats code, and adds one thing Git can't: running any version against a real LLM and recording exactly what it produced, so you can trace output back to the exact prompt and model that generated it.

How it's different from other prompt tools

The prompt-tooling space has grown a lot — Promptfoo is a mature, YAML-config-based open-source tool with strong CI/eval integration, and platforms like PromptLayer, Langfuse, and Braintrust offer hosted, full-lifecycle prompt management with branching, RBAC, and observability.

PromptDiff isn't trying to compete with those. It's deliberately smaller: a local-first CLI for a single developer iterating on prompts, with zero config files, zero hosted accounts, and zero cost beyond free-tier LLM API usage. If you outgrow it — multiple collaborators, non-engineer prompt editors, compliance requirements — those other tools are the right next step. If you're a solo developer who just wants git commit-style version control for prompts without standing up infrastructure, PromptDiff is built for exactly that.

Features

  • Version control for prompts: every commit is an immutable, numbered version (v1, v2, v3...) scoped per prompt
  • Word-level diffs: see exactly which words changed between versions, rendered with color in your terminal (green additions, red strikethrough removals)
  • Multi-model runs: execute any prompt version against multiple LLMs in a single command and compare cost, latency, and output side by side
  • Output comparison: diff-output shows what two versions actually produced, not just how their text differs
  • Free-tier friendly: built and tested against the Gemini and Groq free tiers; rate-limit and quota failures are recorded as failed runs and never crash the tool
  • Local-first: everything lives in a local SQLite database, no account or server required
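To make the "immutable, numbered version scoped per prompt" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, the `commit` helper, and the `translator` prompt are all hypothetical and chosen for illustration; PromptDiff's real schema is not shown in this README and may differ.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not PromptDiff's actual database layout.
SCHEMA = """
CREATE TABLE versions (
    prompt  TEXT    NOT NULL,
    number  INTEGER NOT NULL,
    content TEXT    NOT NULL,
    message TEXT    NOT NULL,
    PRIMARY KEY (prompt, number)
);
"""


def commit(conn: sqlite3.Connection, prompt: str, content: str, message: str) -> int:
    """Append a new version for `prompt` and return its number (1, 2, 3, ...)."""
    row = conn.execute(
        "SELECT COALESCE(MAX(number), 0) FROM versions WHERE prompt = ?",
        (prompt,),
    ).fetchone()
    number = row[0] + 1
    # Only INSERT is ever issued, so earlier versions are never rewritten.
    conn.execute(
        "INSERT INTO versions (prompt, number, content, message) VALUES (?, ?, ?, ?)",
        (prompt, number, content, message),
    )
    conn.commit()
    return number


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
first = commit(conn, "summarizer", "Summarize: {text}", "v1: baseline")
second = commit(
    conn, "summarizer", "Summarize concisely in one sentence: {text}", "v2: tighter constraint"
)
other = commit(conn, "translator", "Translate to French: {text}", "v1: baseline")
print(first, second, other)  # numbering restarts for each prompt
```

The composite primary key is one way to guarantee that a version number can never be reused within a prompt, while still letting every prompt start at 1.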

Installation

pip install promptdiff-cli

(Or, to run from source — see Development below.)

No API key is required to install or to use commit, log, and diff. You'll only need one for run and diff-output, since those actually call a model.

Quick start

The commit, log, and diff steps below work immediately after install. Only run and diff-output call a model and need a key; see API keys below.

# Create a project and switch to it
promptdiff project create my-app
promptdiff use my-app

# Create a prompt and commit your first version
promptdiff add summarizer
promptdiff commit summarizer "Summarize this in one sentence: {text}" -m "v1: baseline"

# Edit and commit a new version
promptdiff commit summarizer "Summarize concisely, max 20 words: {text}" -m "v2: tighter constraint"

# See version history
promptdiff log summarizer

# See exactly what changed
promptdiff diff summarizer 1 2

# Run a version against a model (needs an API key — see below)
promptdiff run summarizer 2 --model gemini/gemini-2.5-flash

# Run the same version against multiple models at once
promptdiff run summarizer 2 --model gemini/gemini-2.5-flash --model groq/llama-3.3-70b-versatile

# Compare what two versions actually produced
promptdiff diff-output summarizer 1 2 --model gemini/gemini-2.5-flash

API keys

PromptDiff uses LiteLLM under the hood, so it works with any LiteLLM-supported provider. It's built and tested primarily against the Gemini and Groq free tiers.

Copy .env.example to .env in your project directory and fill in the keys you have:

GEMINI_API_KEY=your_key_here
GROQ_API_KEY=your_key_here
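If you want to check your .env before calling run, a small pre-flight script along these lines may help. It is only a sketch: how PromptDiff itself loads .env is not described here, and `parse_env`, `missing_key`, and the provider-to-variable mapping are hypothetical helpers built around the two keys listed above.

```python
from typing import Dict, Optional

# Provider prefix -> environment variable. The two entries mirror the keys
# listed above; the mapping itself is an illustrative assumption.
REQUIRED_KEY = {"gemini": "GEMINI_API_KEY", "groq": "GROQ_API_KEY"}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_key(model: str, env: Dict[str, str]) -> Optional[str]:
    """Return the variable a model needs but lacks, or None if nothing is missing."""
    provider = model.split("/", 1)[0]
    needed = REQUIRED_KEY.get(provider)
    if needed is None or env.get(needed):
        return None
    return needed


env = parse_env("GEMINI_API_KEY=abc123\nGROQ_API_KEY=\n")
print(missing_key("gemini/gemini-2.5-flash", env))       # None
print(missing_key("groq/llama-3.3-70b-versatile", env))  # GROQ_API_KEY
```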

Commands

| Command | What it does |
| --- | --- |
| promptdiff project create <name> | Create a new project |
| promptdiff project list | List all projects |
| promptdiff use <project> | Set the current project (so you don't need --project on every command) |
| promptdiff add <prompt> | Add a new prompt to the current project |
| promptdiff commit <prompt> <content_or_file> -m "<message>" | Commit a new version. Accepts a literal string or a path to a file |
| promptdiff log <prompt> | Show version history |
| promptdiff diff <prompt> <v1> <v2> | Word-level diff between two versions |
| promptdiff run <prompt> <version> --model <model> | Run a version against one or more models (repeat --model to run against several at once) |
| promptdiff diff-output <prompt> <v1> <v2> [--model <model>] | Compare what two versions actually produced. If a version was run against multiple models, --model is required to avoid an ambiguous comparison |

Every command accepts -p / --project to override the current project for that one call.
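The diff-output rule above (--model becomes mandatory once a version has been run against several models) can be sketched as a small selection function. This is an illustration of the rule only, not PromptDiff's code: `pick_output`, `AmbiguousModelError`, and the dict-of-outputs shape are hypothetical.

```python
from typing import Dict, Optional


class AmbiguousModelError(ValueError):
    """Raised when a version has outputs from several models and none was chosen."""


def pick_output(outputs: Dict[str, str], model: Optional[str] = None) -> str:
    """Select one recorded output for a version.

    `outputs` maps model name -> recorded output text (a hypothetical shape).
    """
    if model is not None:
        if model not in outputs:
            raise KeyError(f"no recorded run for model {model!r}")
        return outputs[model]
    if not outputs:
        raise KeyError("version has no recorded runs")
    if len(outputs) == 1:
        # Only one model was ever run, so the choice is unambiguous.
        return next(iter(outputs.values()))
    raise AmbiguousModelError(
        "version was run against several models; pass --model: "
        + ", ".join(sorted(outputs))
    )


single = {"gemini/gemini-2.5-flash": "A short summary."}
multi = {
    "gemini/gemini-2.5-flash": "A short summary.",
    "groq/llama-3.3-70b-versatile": "Another summary.",
}
print(pick_output(single))
print(pick_output(multi, model="groq/llama-3.3-70b-versatile"))
```

Failing loudly in the multi-model case avoids silently comparing, say, a Gemini output for one version against a Groq output for the other.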

Architecture

  • Database: SQLite, 5 tables (projects, prompts, versions, runs, outputs) via SQLAlchemy + Alembic migrations
  • Diff engine: Python's difflib, line-level structure with word-level precision inside changed lines
  • Runner: LiteLLM for provider-agnostic model calls; failures (rate limits, timeouts, bad keys) are recorded as failed runs rather than crashing the CLI
  • CLI: Typer + Rich
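As a rough illustration of the diff engine described above, the sketch below uses difflib.SequenceMatcher twice: once over lines, then over words inside lines that changed. It is an assumption about how such an engine could be built, not PromptDiff's implementation. The function names and the `[-removed-]` / `{+added+}` plain-text markers are hypothetical stand-ins for the colored terminal output.

```python
from difflib import SequenceMatcher
from typing import List


def word_diff(old: str, new: str) -> str:
    """Mark removed words as [-word-] and added words as {+word+}."""
    a, b = old.split(), new.split()
    parts: List[str] = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.extend(a[i1:i2])
            continue
        if tag in ("delete", "replace"):
            parts.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            parts.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(parts)


def diff_text(old: str, new: str) -> List[str]:
    """Line-level pass first; word-level detail only inside changed lines."""
    a, b = old.splitlines(), new.splitlines()
    out: List[str] = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend("  " + line for line in a[i1:i2])
        elif tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same number of lines on both sides: pair them up word by word.
            out.extend("~ " + word_diff(x, y) for x, y in zip(a[i1:i2], b[j1:j2]))
        else:
            out.extend("- " + line for line in a[i1:i2])
            out.extend("+ " + line for line in b[j1:j2])
    return out


print(word_diff("a b c", "a x c"))  # a [-b-] {+x+} c
for line in diff_text("keep\nold line", "keep\nnew line"):
    print(line)
```

Splitting on whitespace treats punctuation as part of a word, so "Summarize:" and "Summarize" count as different tokens; a real engine might tokenize more finely.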

Development

git clone https://github.com/Ragu3175/promptdiff.git
cd promptdiff
python -m venv venv
venv\Scripts\activate   # Windows
# source venv/bin/activate   # macOS/Linux
pip install -r requirements.txt
pip install -e .
pytest -v

Status

Core CLI is complete and tested: commit, log, diff, run, and diff-output all work end-to-end against real Gemini and Groq calls. A REST API layer and web UI are planned but not yet built — the CLI is fully usable on its own today.

Contributing

Issues and PRs welcome. This is an early-stage solo project — if you run into something broken or have an idea, please open an issue.

License

MIT — see LICENSE.

Author

Built by Raguram R.
