LLMPromptVault

Version and compare your LLM prompts. No API key required. Bring your own model.


Why LLMPromptVault Exists

Prompt engineering is iterative.

You tweak wording.
You test outputs.
You switch models.
You measure cost.
You forget what worked.

LLMPromptVault gives structure to that chaos.

It helps you:

  • Track prompt versions
  • Log LLM responses with metrics
  • Compare prompt variants side-by-side
  • Analyze performance over time

What LLMPromptVault Does

LLMPromptVault is a prompt lifecycle management library, not an LLM client.

It does three core things:

  1. Version your prompts: track every change, as you would with Git
  2. Log responses: store outputs with latency and token usage
  3. Compare prompts: see side-by-side differences with metrics

⚠️ LLMPromptVault never calls any LLM.

You call your own model (OpenAI, Claude, Gemini, Ollama, local models) and pass the response to LLMPromptVault.
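Because the library only records what you hand it, latency has to be measured around your own model call. A minimal sketch, assuming a hypothetical `fake_llm` stand-in for your real client and a hypothetical helper `timed_call`; neither is part of LLMPromptVault:

```python
import time
from typing import Callable, Tuple


def timed_call(llm: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Call `llm` with `prompt` and return (response, latency in milliseconds)."""
    start = time.perf_counter()
    response = llm(prompt)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return response, latency_ms


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call (OpenAI, Claude, Ollama, ...).
    return f"echo: {prompt}"


response, latency_ms = timed_call(fake_llm, "Summarize this: hello")
print(response, round(latency_ms, 3))
```

The measured value can then be supplied as `latency_ms` when a run is logged.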


Installation

pip install llmpromptvault

Only one dependency: pyyaml.
Everything else uses Python's standard library.


Quick Start

from llmpromptvault import Prompt, Compare

# 1. Define two versions of a prompt
v1 = Prompt("summarize", template="Summarize this: {text}", version="v1")
v2 = Prompt("summarize", template="Summarize in 3 bullet points: {text}", version="v2")

# 2. YOU call your LLM (any model, any way you like)
r1 = your_llm(v1.render(text="Some article content..."))
r2 = your_llm(v2.render(text="Some article content..."))

# 3. Compare and log results
cmp = Compare(v1, v2)
cmp.log(r1, r2)
cmp.show()

Example Output:

────────────────────────────────────────────────────────────
  LLMPROMPTVAULT COMPARISON
────────────────────────────────────────────────────────────
  Prompt A                       summarize (v1)
  Prompt B                       summarize (v2)
────────────────────────────────────────────────────────────

  ── Response A ──
  Here is a summary of the article...

  ── Response B ──
  • Key point one
  • Key point two
  • Key point three

────────────────────────────────────────────────────────────
  Metric                           Prompt A     Prompt B
  ──────────────────────────── ──────────── ────────────
  Word count                             12           18
  Char count                             68          112
  Latency (ms)                        820.0        950.0
  Tokens                                 45           62
────────────────────────────────────────────────────────────
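The word and character counts in the table are the kind of metric that can be derived from the response text alone. A rough sketch of one way to compute them, assuming whitespace-delimited words; the library's exact counting rules are not stated here, and `text_metrics` is a hypothetical helper:

```python
def text_metrics(response: str) -> dict:
    """Return simple length metrics for a model response."""
    return {
        "word_count": len(response.split()),  # whitespace-delimited words
        "char_count": len(response),          # includes spaces and newlines
    }


sample = "- Key point one\n- Key point two\n- Key point three"
metrics = text_metrics(sample)
print(metrics)
```

Latency and token counts, by contrast, cannot be derived from the text and have to come from your own model call.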

Core API

Prompt: Define and Version Prompts

from llmpromptvault import Prompt

p = Prompt(
    name="classify",
    template="Classify this text as positive or negative: {text}",
    version="v1",
    description="Sentiment classifier",
    tags=["classify", "sentiment"],
)

# Variables required by template
p.variables()      # ['text']

# Render prompt (no LLM call happens here)
rendered = p.render(text="I love this product!")

# You call your LLM
response = your_llm(rendered)

# Log run metadata
p.log(
    rendered_prompt=rendered,
    response=response,
    model="gpt-4o-mini",
    latency_ms=820,
    tokens=45,
)

# Aggregate statistics
p.stats()

# Raw run history
p.runs(last_n=10)
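The templates above use `{text}`-style placeholders, which look like Python's `str.format` fields. Assuming that is the syntax in use (the source does not say how `variables()` is implemented), placeholder names can be listed with the standard library's `string.Formatter`; `template_variables` is a hypothetical helper:

```python
from string import Formatter


def template_variables(template: str) -> list:
    """List unique named placeholders in a str.format-style template, in order."""
    names = []
    for _literal, field_name, _spec, _conversion in Formatter().parse(template):
        # field_name is None for trailing literal text and "" for a bare "{}".
        if field_name and field_name not in names:
            names.append(field_name)
    return names


template = "Classify this text as positive or negative: {text}"
found = template_variables(template)
rendered_text = template.format(text="I love this product!")
print(found)
print(rendered_text)
```

A check like this is handy before rendering, since a missing keyword argument to `str.format` raises `KeyError`.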

Versioning

# Create a new version; v1 is preserved automatically
v2 = p.update(
    new_template="You are a sentiment expert. Classify as positive/negative/neutral: {text}"
)

# View full history
p.history()

Versioning is explicit and controlled: a new version is created only when you call update(), and the previous version is preserved.
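To see what changed between two versions of a template, Python's standard `difflib` is enough. The sketch below works on plain strings and does not use LLMPromptVault's own history or diff features, whose output format is not shown here:

```python
import difflib

v1_template = "Classify this text as positive or negative: {text}"
v2_template = (
    "You are a sentiment expert. "
    "Classify as positive/negative/neutral: {text}"
)

# unified_diff expects sequences of lines; each template is a single line here.
diff_lines = list(
    difflib.unified_diff(
        [v1_template],
        [v2_template],
        fromfile="classify (v1)",
        tofile="classify (v2)",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

For multi-line templates, passing `template.splitlines()` instead of a one-element list gives a line-by-line diff.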


Save & Load YAML

Export prompts as human-readable YAML:

p.save("prompts/classify.yaml")

Load anywhere:

p = Prompt.load("prompts/classify.yaml")

Example YAML:

name: classify
version: v1
description: Sentiment classifier
template: "Classify this text as positive or negative: {text}"
tags:
  - classify
  - sentiment

Compare: Side-by-Side Prompt Evaluation

from llmpromptvault import Compare

cmp = Compare(v1, v2)

cmp.log(
    response_a=response_v1,
    response_b=response_v2,
    model="gpt-4o",
    latency_ms_a=820,
    latency_ms_b=950,
    tokens_a=45,
    tokens_b=62,
)

cmp.show()
cmp.diff()
cmp.summary()

You can:

  • Compare output length
  • Compare token usage
  • Compare latency
  • Aggregate results across multiple runs
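Aggregating across runs amounts to averaging per-variant metrics. A small sketch with hand-written run records; the record layout and the `aggregate` helper are assumptions for illustration, not the library's storage format, and the second record's numbers are made up:

```python
from statistics import mean

# Hypothetical run records: one dict per logged comparison run.
runs = [
    {"latency_ms_a": 820, "latency_ms_b": 950, "tokens_a": 45, "tokens_b": 62},
    {"latency_ms_a": 780, "latency_ms_b": 1010, "tokens_a": 47, "tokens_b": 58},
]


def aggregate(records: list) -> dict:
    """Average every numeric field across all run records."""
    keys = records[0].keys()
    return {key: mean(r[key] for r in records) for key in keys}


summary = aggregate(runs)
for key, value in summary.items():
    print(f"{key:<14}{value:>8.1f}")
```

Averages over several runs are more trustworthy than a single pair of responses, since latency and output length vary from call to call.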

Registry: Share Prompts Across Projects

from llmpromptvault import Registry

reg = Registry("./shared_prompts")

reg.push(v1)
reg.push(v2)

reg.list()
reg.versions("classify")

latest = reg.pull("classify")
specific = reg.pull("classify", "v1")

reg.delete("classify", "v1")

Registry is useful for:

  • Team collaboration
  • Shared prompt libraries
  • Reproducible experiments

Works With Any LLM

Because LLMPromptVault never calls an LLM itself, it works with anything.

OpenAI

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": rendered}]
)

p.log(rendered, response.choices[0].message.content, model="gpt-4o")

Anthropic

import anthropic

client = anthropic.Anthropic(api_key="...")

response = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    messages=[{"role": "user", "content": rendered}]
)

p.log(rendered, response.content[0].text, model="claude-haiku-4-5-20251001")

Ollama (Local)

import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": rendered, "stream": False}
)

p.log(rendered, response.json()["response"], model="llama3")
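Ollama's non-streaming `/api/generate` response also carries timing and token counters, which map naturally onto the `latency_ms` and `tokens` arguments shown in the Core API section. A hedged sketch: the field names `total_duration` (nanoseconds), `prompt_eval_count`, and `eval_count` follow the Ollama API documentation but should be verified against your installed version; `ollama_metrics` is a hypothetical helper, and the sample payload is invented for illustration:

```python
def ollama_metrics(payload: dict) -> dict:
    """Extract latency and token counts from a parsed /api/generate response."""
    # Counters may be missing from a response, so default each one to 0.
    prompt_tokens = payload.get("prompt_eval_count", 0)
    output_tokens = payload.get("eval_count", 0)
    total_ns = payload.get("total_duration", 0)
    return {
        "latency_ms": total_ns / 1_000_000,  # nanoseconds -> milliseconds
        "tokens": prompt_tokens + output_tokens,
    }


# Invented sample payload, shaped like the parsed JSON body of the response.
sample = {
    "response": "Positive",
    "total_duration": 820_000_000,
    "prompt_eval_count": 30,
    "eval_count": 15,
}
metrics = ollama_metrics(sample)
print(metrics)
```

The resulting values could be passed to the log call as `latency_ms=metrics["latency_ms"]` and `tokens=metrics["tokens"]`.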

Typical Project Structure

your_project/
├── prompts/
│   └── classify.yaml
├── .promptvault/
│   ├── history.json
│   └── runs.db
└── main.py

Add .promptvault/ to .gitignore to keep run logs local, or commit it to share analytics with your team.


License

MIT © LLMPromptVault Contributors
