
promptlab


Git for your prompts. Version, diff, validate, and A/B test LLM prompts with confidence.

pip install promptlab-ai

🎬 Demo

promptlab demo: version, diff, and A/B test prompts
Creating prompts, diffing versions, running A/B tests, and promoting the winner to production


The Problem

Your prompts are the most important code you write, but you manage them as raw strings:

  • โŒ Edited inline, no version history
  • โŒ "Did that last prompt change work?" โ†’ No way to know
  • โŒ Typo in a variable โ†’ silent hallucination
  • โŒ A/B testing prompts โ†’ custom scripts every time
  • โŒ Deploying a bad prompt โ†’ rollback is copy-paste

The Solution

$ promptlab init
Created .prompts/ directory

$ promptlab list
┌──────────────────────┬─────────┬──────────────────┐
│ Prompt               │ Version │ Last Modified    │
├──────────────────────┼─────────┼──────────────────┤
│ system_prompt        │ v3      │ 2026-04-28 14:30 │
│ search_tool_prompt   │ v2      │ 2026-04-25 09:15 │
│ summarizer           │ v5      │ 2026-05-01 16:42 │
└──────────────────────┴─────────┴──────────────────┘

$ promptlab diff system_prompt v2 v3
  You are a helpful assistant.
- Be concise. Maximum 2 sentences.
+ Be thorough. Provide detailed explanations with examples.
+ Always cite sources when making factual claims.

Quick Start

1. Initialize

promptlab init
# Creates .prompts/ directory with schema

2. Create a prompt

from promptlab import Prompt

# Define a typed prompt template
system = Prompt(
    name="order_analyst",
    template="""You are an order analyst assistant.

The user will ask about maintenance order {{order_id}}.
Plant: {{plant}}
Priority: {{priority}}

Rules:
- Be concise and factual
- Always include the order number in your response
- If unsure, say so
""",
    variables={"order_id": str, "plant": str, "priority": str},
    metadata={"author": "team-alpha", "model": "gpt-4o"},
)

# Render with type validation:
rendered = system.render(order_id="4002310", plant="1010", priority="High")

# Raises TypeError if you pass wrong types or miss a variable:
system.render(order_id=123)  # TypeError: 'order_id' must be str, got int
system.render(order_id="4002310")  # TypeError: missing required variable 'plant'
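Conceptually, the type check amounts to comparing the provided keyword arguments against the declared `{name: type}` schema. The helper below is a hypothetical sketch of that behavior in plain Python, not the library's actual internals:

```python
# Hypothetical sketch of the check Prompt.render performs (illustrative only).
def validate_variables(declared: dict, provided: dict) -> None:
    for name, expected in declared.items():
        if name not in provided:
            raise TypeError(f"missing required variable '{name}'")
        if not isinstance(provided[name], expected):
            raise TypeError(
                f"'{name}' must be {expected.__name__}, "
                f"got {type(provided[name]).__name__}"
            )

declared = {"order_id": str, "plant": str, "priority": str}
validate_variables(
    declared, {"order_id": "4002310", "plant": "1010", "priority": "High"}
)  # passes silently

try:
    validate_variables(declared, {"order_id": 123})
except TypeError as exc:
    print(exc)  # 'order_id' must be str, got int
```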

3. Version your prompts

from promptlab import PromptStore

store = PromptStore(".prompts")

# Save a new version (auto-increments)
store.save(system)  # → v1

# Edit and save again
system.template += "\n- Always be polite"
store.save(system)  # → v2

# Load a specific version
v1 = store.load("order_analyst", version=1)
latest = store.load("order_analyst")  # latest version

4. Diff versions

from promptlab import diff_prompts

changes = diff_prompts(store, "order_analyst", v1=1, v2=2)
print(changes)
# + - Always be polite

Or from the CLI:

promptlab diff order_analyst v1 v2
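The output is a standard unified diff. For intuition, the same comparison can be reproduced with the stdlib difflib module (this is an illustrative sketch, not promptlab's implementation):

```python
import difflib

# Two versions of a prompt, as plain strings (example data).
v1 = "You are a helpful assistant.\nBe concise. Maximum 2 sentences.\n"
v2 = (
    "You are a helpful assistant.\n"
    "Be thorough. Provide detailed explanations with examples.\n"
    "Always cite sources when making factual claims.\n"
)

# Produce a unified diff, line by line.
for line in difflib.unified_diff(
    v1.splitlines(), v2.splitlines(),
    fromfile="v1", tofile="v2", lineterm="",
):
    print(line)
```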

5. A/B test prompts

from promptlab import ABTest

test = ABTest(
    prompt_name="summarizer",
    version_a=3,
    version_b=4,
    dataset="eval/summarize_test.jsonl",
    metric="length",  # or custom function
)

results = test.run()
print(results)
# Version A (v3): avg_length=45.2, avg_latency=1.2s
# Version B (v4): avg_length=32.1, avg_latency=0.9s
# Winner: v4 (shorter, faster)
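The metric parameter also accepts a custom function. The exact signature promptlab expects isn't documented here, but a plausible shape is a callable that scores a single model output and returns a float, e.g.:

```python
# Hypothetical custom metric: does the output cite a source?
# Assumes the metric receives the model output as a string
# and returns a score between 0.0 and 1.0.
def citation_rate(output: str) -> float:
    return 1.0 if "http" in output or "[source]" in output.lower() else 0.0

# Would then be passed in place of the built-in metric name:
# test = ABTest(..., metric=citation_rate)
print(citation_rate("See https://example.com for details."))  # 1.0
```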

6. Deploy

# Promote a version to "production"
store.promote("order_analyst", version=2, env="production")

# In your app:
prompt = store.load("order_analyst", env="production")

CLI Commands

promptlab init                          # Initialize prompt store
promptlab list                          # List all prompts with versions
promptlab show <name>                   # Show latest prompt content
promptlab show <name> --version 3       # Show specific version
promptlab diff <name> v1 v2             # Diff two versions
promptlab validate                      # Validate all prompts (types, variables)
promptlab promote <name> v3 production  # Promote version to env
promptlab history <name>                # Show version history
promptlab export <name> --format json   # Export prompt as JSON

File Structure

.prompts/
├── prompts.yaml          # Registry of all prompts
├── order_analyst/
│   ├── v1.yaml           # Version 1
│   ├── v2.yaml           # Version 2 (current)
│   └── metadata.yaml     # Author, model, env mappings
├── summarizer/
│   ├── v1.yaml
│   ├── v2.yaml
│   ├── v3.yaml
│   └── metadata.yaml
└── eval/
    └── summarize_test.jsonl  # A/B test datasets
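Under this layout, resolving the latest version is just a matter of scanning the vN.yaml filenames. A sketch with pathlib (assumed behavior, not the library's code):

```python
import tempfile
from pathlib import Path

def latest_version(prompt_dir: Path) -> int:
    # Pick the highest N among vN.yaml files; metadata.yaml is not matched.
    return max(int(p.stem[1:]) for p in prompt_dir.glob("v[0-9]*.yaml"))

# Demo against a throwaway directory mimicking .prompts/summarizer/
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    for name in ("v1.yaml", "v2.yaml", "v3.yaml", "metadata.yaml"):
        (d / name).touch()
    print(latest_version(d))  # 3
```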

Each version file:

# .prompts/order_analyst/v2.yaml
version: 2
created: "2026-04-28T14:30:00Z"
template: |
  You are an order analyst assistant.
  The user will ask about maintenance order {{order_id}}.
  ...
variables:
  order_id: { type: str, required: true }
  plant: { type: str, required: true }
  priority: { type: str, required: true, default: "Medium" }
metadata:
  author: team-alpha
  model: gpt-4o
  note: "Added politeness rule"
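A validate pass can catch drift between the template and this schema. Conceptually, one such check amounts to comparing the {{placeholders}} found in the template against the declared variable names; the stdlib sketch below illustrates the idea (it is not promptlab's implementation):

```python
import re

def undeclared_placeholders(template: str, declared: set[str]) -> set[str]:
    # Collect every {{name}} placeholder, then report those
    # that have no matching entry in the declared schema.
    found = set(re.findall(r"\{\{\s*(\w+)\s*\}\}", template))
    return found - declared

template = "Order {{order_id}} at plant {{plant}}, priority {{priorty}}"
print(undeclared_placeholders(template, {"order_id", "plant", "priority"}))
# {'priorty'} -- the kind of typo that would otherwise fail silently
```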

Features

Feature        Description
-------------  ---------------------------------------------------
Versioning     Auto-incrementing versions, full history
Type Safety    Pydantic-validated variables, catches typos
Diffing        Compare any two versions, unified diff format
A/B Testing    Run evaluations with custom metrics
Environments   Promote versions to dev/staging/production
Validation     CI-ready: promptlab validate catches broken prompts
Git-friendly   YAML files, meaningful diffs in PRs
Templates      Jinja2-style {{variable}} with defaults
Export         JSON, YAML, or raw text output
Zero LLM deps  Core has no LLM SDK dependency

CI Integration

# .github/workflows/prompts.yml
- name: Validate prompts
  run: promptlab validate
  # Fails if: missing variables, type errors, broken templates

Contributing

git clone https://github.com/naveenkumarbaskaran/promptlab.git
cd promptlab
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest

License

MIT
