Prompt Version Control — production-grade git for LLM prompts
Project description
promptvc
Prompt Version Control - A Git-Like CLI Tool and Execution Core for LLM Prompts.
1. Problem Statement
Prompts are load-bearing logical components in modern software architectures. They dictate data structures, govern agentic execution paths, and transform codebases. However, prompt engineering and integration workflows frequently lack the basic discipline applied to traditional source code.
In modern development environments, prompts suffer from the following systemic issues:
- Out-of-band management: Prompts are copy-pasted into Notion documents, shared over chat tools, or hardcoded directly in application source code.
- Missing version history: Prompt text is edited in place on production databases or server parameters with no version lineage, no change documentation, and no rollback mechanism.
- Silent regressions: Adjusting a prompt to fix an edge case on one input often degrades output quality across other inputs without triggering errors.
- Lack of observability: Latency, token consumption, and API cost metrics are rarely logged or mapped directly to the version of the prompt that generated them.
This lack of control introduces vulnerabilities when deploying LLM integrations to production. A single prompt modification can break backend parsing logic, escalate runtime API costs, or compromise model performance without developer visibility.
2. What This Tool Is (and Is NOT)
- It IS a local, version-controlled prompt registry: Prompts are saved in a structured, local format (
.promptvc/spaces/*.json) using sequential, immutable version identifiers. - It IS a declarative test and evaluation runner: Supports assertion verification (JSON validation, token counts, regex checks, semantic similarity) against evaluation datasets directly in the console.
- It IS a multi-provider execution abstraction: Runs templates across local instances (Ollama) and cloud APIs (OpenAI, Anthropic, Gemini) using a unified invocation format.
- It IS NOT a visual prompt playground: There are no web-based node diagrams, drag-and-drop elements, or third-party cloud hosting requirements.
- It IS NOT a framework-level runtime wrapper: It does not force you to write application code inside specific chains, agent classes, or SDK models.
3. Why This Tool Matters
By treating prompts as immutable, versioned code assets, this tool establishes a reliable local iteration loop:
- Reproducibility: Every execution log, evaluation result, and file change is mapped directly to a specific prompt space name, version ID, and SHA-256 hash.
- Testability: Automated test suites verify prompt outputs against assertions before updates are deployed.
- Control: Historical versions can be explicitly locked to protect stable production paths from unintended modifications.
- Transparency: Integrated token modeling estimates exact API costs and response latency across providers, preventing production cost overruns.
4. Core Features
Immutable Versioning and Space Management
Prompts are organized into logical partitions called "spaces" (e.g., summarize, code_generation). Each commit registers a new immutable version.
- Auto-incrementing IDs (v1, v2, v3).
- Version hashes computed over raw prompt text.
- Author and message metadata logs.
- Lock gates: Running
lockmarks a version as read-only, raising errors if a developer attempts to modify or commit over it.
Schema-Based Variable Injection
Prompts support template parameters using {{variable}} brackets. A version can optionally declare a JSON validation schema defining:
- Variable types (e.g., string, boolean).
- Required vs. optional flags.
- Default values (used automatically if no override is supplied).
- Documentation descriptions for interactive prompts.
Declarative Unit Testing Engine
The test module provides a local, CI-ready testing framework:
- JSON-defined test cases specifying inputs, assertions, and checks.
- Composite Scoring: Runs rules and model evaluations to aggregate a normalized case score from 0.0 to 1.0.
- Rule-Based Assertions:
contains,not_contains,regex,json_valid,min_tokens,max_tokens, andgolden. - Jaccard semantic similarity: The
goldenassertion measures similarity of token word-sets between execution output and a stored golden file. - LLM-as-a-Judge: Validates free-form model assertions (e.g. style constraints, safety guidelines) using configurable evaluation prompts against LLMs.
- Regression Delta & Gates: Compares suite performance against a base version, generating a comparative score delta table, and failing CI with threshold gates.
- Automated updates: The
test goldencommand runs the prompt version and overwrites or creates golden file records.
Multi-Step Orchestration Pipelines
Execute multi-step prompt workflows sequentially using JSON declarations:
- Downstream step templates can reference upstream step outputs using the
{{ steps.step_id.output }}syntax. - Global pipeline variables are referenced using the
{{ input.variable_name }}syntax. - Validates parameter inputs, prompt spaces, and execution paths before requesting provider resources.
Interactive Shell REPL
A stateful interactive command loop for rapid prompt debugging:
- Persistent variable binds:
var code="def add(a, b): return a + b". - Quick provider and model switching:
set provider anthropicorset model gpt-4o. - Shell runtime metrics tracking:
costaggregates accumulated token usage, latency, and dollar costs across the current session.
Diff-Based File Editing
Modify filesystem resources safely using LLM instructions:
- Safe reader checks: Attempts UTF-8, UTF-8-sig, UTF-16, and Latin-1 encodings, preserving the original file encoding during modifications.
- Unified diff validation: Parses model output as a strict unified diff, validates target contexts against original file lines, and handles space-stripped context lines.
- Atomic Backups & Rollback: Automatically saves a
.bakbackup of target files before modifying them, and restores the backup if anything goes wrong. - Idempotency checking: Automatically compares the SHA-256 fingerprint of the target file to prevent duplicate modifications.
- Approval gate: Applies modifications only after interactive validation, logging the change log entry.
Observability and Trace Logging
Gain deep runtime insight with complete execution logging:
- Transactional tracing: Automatically logs inputs, outputs, tokens, latencies, model configurations, scores, and errors to
.promptvc/traces.jsonl. - CLI querying: Search, filter, and inspect traces interactively using
promptvc trace.
Schema & Dataset Validation
Enforce strict consistency across prompt definitions and evaluations:
- Prompt validation: Runs
promptvc validate promptto verify template consistency, variable names, and schema defaults. - Dataset validation: Runs
promptvc validate datasetto verify the structure, types, and schema compatibility of bulk evaluation files.
5. CLI Reference
init
Initialize a promptvc repository in the current workspace.
- Syntax:
promptvc init - Behavior: Creates the
.promptvc/directory structure. - When to use: Set up a new local workspace.
- Example:
promptvc init
status
Provide a high-level overview of the current workspace.
- Syntax:
promptvc status - Behavior: Inspects the registry and displays active spaces, version counts, execution runs, and recent actions.
- When to use: Check workspace state.
- Example:
promptvc status
commit
Commit a new version to a prompt space.
- Syntax:
promptvc commit <name> [flags] - Flags:
--prompt <string>: Raw prompt text. If omitted, opens interactive multi-line terminal input.--message <string>: Commit message. If omitted, opens interactive terminal prompt.
- Behavior: Resolves prompt and message, validates that the space exists, checks that the latest version is not locked, and serializes the new version.
- When to use: Register a new iteration of a prompt template.
- Example:
promptvc commit translate --prompt "Translate this text to French: {{text}}" --message "v1 translation prompt"
log
Display execution commit history for a prompt space.
- Syntax:
promptvc log <name> - Behavior: Renders a structured history table containing version IDs, messages, token counts, lock status, and dates.
- When to use: Audit how a prompt space has evolved over time.
- Example:
promptvc log translate
get
Display the raw prompt content of a specific version.
- Syntax:
promptvc get <name> <version> - Behavior: Prints the raw template string. Supports the
latestversion alias. - When to use: View prompt text without metadata formatting.
- Example:
promptvc get translate latest
inspect
Display detailed metadata and schema information for a version.
- Syntax:
promptvc inspect <name> <version> - Behavior: Parses version records and outputs raw prompt text, variables, validation schema fields, lock states, and example CLI commands. Supports the
latestversion alias. - When to use: Verify a prompt's required template arguments.
- Example:
promptvc inspect translate v1
diff
Compute the token, character, and text difference between two prompt versions.
- Syntax:
promptvc diff <name> <v1> <v2> [flags] - Flags:
--text: Display unified diff lines (likegit diff).--stat: Display comparison metrics table (characters, words, and tokens).
- Behavior: Calculates delta metrics between target versions.
- When to use: Analyze changes between versions.
- Example:
promptvc diff translate v1 v2 --text
lock
Lock a prompt version to prevent modification.
- Syntax:
promptvc lock <name> <version> - Behavior: Sets the
lockedproperty totruein the space record. Succeeding commits or evaluations targeting this version will block modifications. Supports thelatestversion alias. - When to use: Mark a version as a production release.
- Example:
promptvc lock translate v1
list
List all registered prompt spaces.
- Syntax:
promptvc list - Behavior: Returns a table listing all space names, their latest active version, and version counts.
- When to use: Discover prompt spaces in the workspace.
- Example:
promptvc list
run
Execute a prompt version against a provider.
- Syntax:
promptvc run <name> <version> [flags] - Flags:
--provider <string>: Target provider (openai, anthropic, gemini, ollama, mock).--model <string>: Provider model override.--timeout <int>: Timeout limit in seconds.--max-tokens <int>: Output tokens limit.--stream: Stream tokens to stdout.--var <key=value>: Template variable binding. Repeatable.--dry-run: Renders the template to stdout without executing it.--non-interactive: Disable interactive terminal inputs.
- Behavior: Resolves template variables, renders the template, calls the provider, and prints output alongside token usage and latency. Supports the
latestversion alias. - When to use: Test prompt templates with specific inputs.
- Example:
promptvc run translate v1 --var text="Hello world" --provider openai
eval
Evaluate a prompt version against a dataset.
- Syntax:
promptvc eval <name> <version> [flags] - Flags:
--dataset <path>: Required. Path to JSON dataset file.--provider,--model,--timeout,--max-tokens,--stream,--non-interactive.
- Behavior: Executes the prompt template against each item in the dataset. Saves results to the space database. Supports the
latestversion alias. - When to use: Verify prompt output quality across batch datasets.
- Example:
promptvc eval translate v1 --dataset data.json --provider ollama --model llama3
compare
Evaluate two prompt versions on a dataset and display outputs side-by-side.
- Syntax:
promptvc compare <name> <v1> <v2> [flags] - Flags:
--dataset <path>: Required. Dataset file path.--provider,--model,--timeout,--max-tokens,--stream.
- Behavior: Runs evaluations for both versions and prints side-by-side outputs.
- When to use: Run comparative evaluations before releasing prompt updates.
- Example:
promptvc compare translate v1 v2 --dataset data.json
apply
Apply a prompt to a target file or directory using LLM-generated diffs.
- Syntax:
promptvc apply <name> <version> [flags] - Flags:
--file <path>: Target file path to modify.--dir <path>: Target directory path to modify.--glob <pattern>: Filter pattern when using--dir(default:*).--provider,--model,--timeout,--max-tokens,--stream,--var,--dry-run,--non-interactive.
- Behavior: Reads target files, requests unified diffs from the provider, shows diff lines, and applies updates upon user approval. Logs change metadata. Supports the
latestversion alias. - When to use: Run automated refactoring prompts on codebase files.
- Example:
promptvc apply refactor v1 --file src/main.py --provider openai
changes
Display the file change history for a prompt space.
- Syntax:
promptvc changes <name> - Behavior: Displays a table detailing execution timestamps, prompt versions used, and modified file paths.
- When to use: Audit which code files were modified by which prompts.
- Example:
promptvc changes refactor
config
View or modify configuration parameters.
- Syntax:
promptvc config <action> [key] [value] - Actions:
set: Bind value to config key.get: Print value of config key.list: Output entire config JSON object.
- Behavior: Reads and modifies configuration settings at
.promptvc/config.json. - When to use: Set default models, timeouts, or API keys.
- Example:
promptvc config set provider openai promptvc config set models.openai gpt-4o-mini
test
Manage and execute automated assertion test suites.
- Syntax:
promptvc test <subcommand> [flags] - Subcommands:
run <name> <version> --suite <path>: Run assertion test suite.--threshold <float>: Optional. Minimum average score (0.0 to 1.0) to pass (CI exit-code gate).--compare <version>: Optional. Version ID (e.g. v1) to check for regressions. Displays delta metrics table.--deterministic: Optional. Run only rules and skip LLM-as-a-judge assertions for speed/cost.
golden <name> <version> --suite <path>: Run cases and update stored golden files with the outputs.list [--dir <path>]: List all test suite JSON files recursively (default path is.).
- When to use: Validate prompt changes inside CI pipelines or local verification environments.
- Example:
promptvc test run summarize v2 --suite tests/suite.json --compare v1 --threshold 0.8
validate
Validate dataset files or committed prompt version schemas.
- Syntax:
promptvc validate <subcommand> - Subcommands:
dataset <file>: Checks if a dataset file is well-formed JSON, containsinputstructures, and matches expected schemas.prompt <name> <version>: Checks consistency of schema defaults, variable naming, types, and properties.
- When to use: Avoid executing corrupted runs by preemptively validating templates and test parameters.
- Example:
promptvc validate dataset test_inputs.json promptvc validate prompt summarize latest
trace
Query and inspect execution runs log.
- Syntax:
promptvc trace <name> [version] [flags] - Flags:
--last <int>: Retrieve the last N execution traces (default: 20).--json: Print raw trace logs as a JSON list.
- Behavior: Retrieves execution logs from
.promptvc/traces.jsonl, formats them as a clean summary table showing token count, latency, scores, and errors, and displays details for the latest run. - When to use: Audit runtime execution performance, debug outputs, or examine recent scoring history.
- Example:
promptvc trace summarize --last 10
pipe
Execute multi-step prompt workflows sequentially.
- Syntax:
promptvc pipe <subcommand> - Subcommands:
run <pipeline_file> [--var key=value] [--provider name]: Runs the specified multi-step pipeline.validate <pipeline_file>: Verifies pipeline syntax and reference bindings without executing.
- When to use: Compose chained tasks where downstream steps consume upstream step outputs.
- Example:
promptvc pipe run translate_and_summarize.json --var text="Hello world"
shell
Launch the stateful interactive REPL.
- Syntax:
promptvc shell - Behavior: Opens a command-line prompt loop allowing variable bindings, quick model/provider switching, and real-time cost and latency tracking.
- When to use: Iterative debugging and prompt hacking.
- Example:
promptvc shell
Global Flag Behaviors
--version: Print program version and exit.--json: Output command results in machine-readable JSON format where applicable.--provider: Invocation provider override. Resolution order:--providerflag -> value inconfig.json->mock.--model: Invocation model override. Resolution order:--modelflag -> configured default inconfig.json-> provider-native default.--var: Declares template inputs. Parsed askey=valuestrings.--non-interactive: Disables interactive console fallbacks. If required parameters are missing, exits with status 1.--timeout: Invocations terminate and raise errors if they exceed this value in seconds.--max-tokens: Instructs the provider to clamp model response length to this token count limit.--stream: Intercepts API response frames and writes them directly to stdout.
6. End-to-End Developer Workflows
Scenario A: Prompt Creation and Iteration
Initialize the project space and register a summarization template:
$ promptvc init
$ promptvc commit summarize --prompt "Summarize this: {{text}}" --message "v1 basic summary"
Verify version status:
$ promptvc log summarize
Test the template with a sample input:
$ promptvc run summarize v1 --var text="Structured version control improves pipeline reliability." --provider openai
Refine the template by committing a second iteration:
$ promptvc commit summarize --prompt "Summarize this in five words: {{text}}" --message "v2 shorter constraint"
Scenario B: Batch Evaluation and Comparison
Generate a local evaluation dataset named inputs.json:
[
{"input": "Machine learning architectures benefit from clear telemetry integration."},
{"input": "Static analysis parsing prevents runtime execution exceptions."}
]
Run a comparison run between v1 and v2:
$ promptvc compare summarize v1 v2 --dataset inputs.json --provider openai
Review outputs side-by-side, then lock the stable iteration:
$ promptvc lock summarize v2
Scenario C: Safe Codebase Modification
Create a code refactoring prompt space:
$ promptvc commit fix_imports --prompt "Refactor import blocks to keep standard libraries sorted: {{code}}" --message "v1 import sorter"
Apply the prompt to a target python file:
$ promptvc apply fix_imports v1 --file src/main.py --provider openai
Review the diff output in the terminal console. Select y to apply the patch. Check space changes logs:
$ promptvc changes fix_imports
Scenario D: CI/CD Pipeline Assertion Tests
Define a test suite in tests/summarize_suite.json:
[
{
"id": "ml_summary",
"input": {
"text": "Telemetry integrations support pipeline diagnostics."
},
"assertions": [
{ "type": "contains", "value": "telemetry" },
{ "type": "max_tokens", "value": 50 }
]
}
]
Run assertions inside automated build pipelines using --non-interactive:
$ promptvc test run summarize v2 --suite tests/summarize_suite.json --provider openai --non-interactive
7. Programmatic API (Python SDK)
While promptvc provides a comprehensive CLI, the underlying execution engine can be imported directly into Python applications. The library exports a high-level developer SDK at the root namespace, as well as a lower-level core interface (PromptRepo) for advanced repository actions.
7.1 Quick Setup & SDK Imports
First, ensure that your repository is initialized (typically via promptvc init or programmatically with PromptRepo().init_repo()). Then, import the SDK:
import promptvc
All primary execution functions, context wrappers, and result objects are exported directly from the top-level package.
7.2 Single-Shot Execution (run)
Use promptvc.run() to quickly substitute variables and execute a specific prompt version against a provider:
import promptvc
# Run prompt version with OpenAI
result = promptvc.run(
name="translator",
version="v1", # Can also use "latest"
provider="openai", # openai, anthropic, gemini, ollama, mock
model="gpt-4o-mini", # Optional model override
temperature=0.3, # Optional sampling temperature
# Template variables are passed as keyword arguments:
language="Spanish",
text="Hello, world!"
)
if result.ok:
print(f"Output: {result.output}")
print(f"Total Tokens: {result.tokens}")
print(f"Latency: {result.latency_ms} ms")
if result.cost:
print(f"Cost USD: {result.cost_usd}")
else:
print(f"Execution failed: {result.error}")
7.3 Wrapper Decorator (@prompt)
Power any Python function with a versioned prompt space using the @promptvc.prompt decorator. The function's keyword arguments will map directly to template variables, and the decorated function returns a RunResult object:
import promptvc
@promptvc.prompt("summarizer", version="latest", provider="anthropic", model="claude-3-5-sonnet")
def summarize(text: str) -> promptvc.RunResult:
"""This function is powered by promptvc 'summarizer' version 'latest'"""
pass
# Invoke the function
result = summarize(text="Prompt Version Control enforces immutability and version safety...")
print(f"Summary: {result.output}")
print(f"Model Used: {result.model}")
7.4 Context Manager (run_context)
Use the run_context context manager when you want granular execution control, automatic trace logging, or want to compute aggregate costs/latencies over dynamic sequences:
import promptvc
with promptvc.run_context("classifier", "v2", provider="gemini") as ctx:
# Run a classification task
result = ctx.run(text="The pizza was delicious!")
print(f"Classified Output: {result.output}")
print(f"Context Latency: {ctx.latency_ms} ms")
if ctx.cost:
print(f"Accumulated Cost: {promptvc.format_cost(ctx.cost.total_cost_usd)}")
7.5 Batch Execution (batch_run)
To evaluate a prompt template across multiple input dictionaries in parallel, use promptvc.batch_run(). It executes tasks concurrently using a local thread pool:
import promptvc
inputs = [
{"text": "First article contents..."},
{"text": "Second article contents..."},
{"text": "Third article contents..."}
]
# Run all inputs in parallel using up to 4 worker threads
batch_result = promptvc.batch_run(
name="summarizer",
version="v1",
inputs=inputs,
provider="openai",
max_workers=4
)
print(f"Success Rate: {batch_result.success_rate * 100}%")
print(f"Total Combined Tokens: {batch_result.total_tokens}")
if batch_result.total_cost_usd is not None:
print(f"Total Batch Cost: {promptvc.format_cost(batch_result.total_cost_usd)}")
for idx, run_res in enumerate(batch_result.results):
print(f"\n[Run {idx+1}] Output: {run_res.output}")
7.6 SDK Result Interfaces
RunResult
Returned by run(), @prompt, and individual items in BatchResult.
.ok(bool): True if execution succeeded..output(str): The raw text response from the model..tokens(int|None): Sum of input and output tokens..input_tokens(int|None): Tokens in prompt template..output_tokens(int|None): Tokens in completion response..latency_ms(float): Execution duration in milliseconds..cost(CostBreakdown|None): Detailed pricing model breakdown..cost_usd(float|None): Convenience getter for total USD cost..model(str): Model identifier used by the provider..trace_id(str): Unique trace GUID generated for telemetry lookup..error(str|None): Error traceback string if execution failed.
BatchResult
Returned by batch_run().
.results(List[RunResult]): List of execution results (matches input order)..total_tokens(int): Sum of all tokens consumed across the batch..total_cost_usd(float|None): Sum of all costs across the batch..total_latency_ms(float): Total wall-clock time elapsed for the batch sequence..success_rate(float): Percentage (0.0 to 1.0) of successful runs..success_count(int): Number of successful runs..error_count(int): Number of failed runs.
7.7 Low-level Core API (PromptRepo)
If you need programmatic control over repository commands (such as initializing registries, committing prompt modifications, managing version locks, or computing prompt diffs):
from promptvc.core import PromptRepo
from promptvc.utils.template import render_template
from promptvc.core.diff import compute_diff, format_diff
# 1. Load the Repository
# By default, manages the local `.promptvc` directory in the current working directory.
repo = PromptRepo()
# Initialize if not already initialized
if not repo.storage.is_initialized:
repo.init_repo()
# 2. Register & Commit a Prompt Version programmatically
meta = repo.commit(
name="translator",
prompt="Translate this text to {{language}}: {{text}}",
message="v1 initial translation prompt",
schema={
"variables": {
"language": {"type": "string", "required": True},
"text": {"type": "string", "required": True}
}
}
)
# 3. Retrieve Prompt Template and Metadata
# Returns the raw prompt string
prompt_template = repo.get("translator", "v1")
# Returns dictionary containing version author, hash, lock status, and tokens
version_meta = repo.get_version_meta("translator", "v1")
# 4. Render template variables manually
rendered = render_template(prompt_template, {"language": "French", "text": "Hello"})
print(rendered) # "Translate this text to French: Hello"
# 5. Lock Stable Versions
# Prevents further modifications or commits to "v1"
repo.lock("translator", "v1")
# 6. Compute Prompt Differences
# Get token metrics difference between two versions
token_diff = repo.token_diff("translator", "v1", "v2")
# Compute line-by-line unified diffs
diff_lines = compute_diff("Hello world", "Hello beautiful world")
print(format_diff(diff_lines))
See examples/api_usage.py for a complete runnable demonstration.
8. Architecture Overview
Component Diagram
+--------------------------------------------------------------+
| CLI Interface Layer |
| (main.py) |
+--------------------------------------------------------------+
|
| Dispatches Commands
v
+--------------------------------------------------------------+
| Core Logic Controller |
| (repo.py) |
+--------------------------------------------------------------+
| | |
| Locks Mutations | Resolves Templates | Serializes State
v v v
+--------------+ +--------------+ +--------------+
| Lock Guard | | Template Eng | | Storage Eng |
| (lock.py) | | (template.py)| | (storage.py) |
+--------------+ +--------------+ +--------------+
|
| Write Path
v
+--------------+
| Local Disk |
| .promptvc/ |
+--------------+
Subsystems
- CLI Layer (
cli/main.py): Translates command strings to handler calls, configures Windows UTF-8 stdout, and coordinates interactive inputs when parameters are missing. - Core Logic Controller (
core/repo.py): Coordinates access to version registry, validation schemas, evaluation metrics, and runtime hooks. - Lock Guard (
core/lock.py): Enforces mutability rules. Blocks write requests targeting locked records. - Storage Engine (
core/storage.py): Manages local space file serialization. Implements transactional JSON writing to protect files from network or system interruptions. - Template System (
utils/template.py): Isolates template parameters, validates arguments, formats defaults, and returns clean prompt buffers. - Provider Layer (
providers/): Implements vendor connections. Normalizes requests and returns standardized metadata envelopes.
9. Execution Model
Run vs. Eval vs. Compare
The execution engine processes invocations through three distinct runtime pathways:
- run: Executes a single template on one input set. Parameters are resolved from CLI overrides (
--var), validation schema defaults, or interactive prompts. Results are persisted to the database runs array. - eval: Batches executions against a dataset array. Every array item must expose an
inputproperty. Evaluated outcomes, latencies, and token logs are persisted in the evaluations database table. - compare: Compares version metrics side-by-side on a shared dataset. It evaluates v1 and v2 on identical inputs at runtime. Comparison metrics are not stored.
Telemetry Mapping
The provider layer extracts token telemetry (prompt tokens, response tokens, and total tokens) from API response packages. Cost estimates are determined using the model's cost rate coefficients located in utils/cost.py.
10. Developer Experience
Reproducibility
Every prompt commit includes a unique SHA-256 hash. Because files are versioned locally in a declarative format, developers can replicate model inputs at any point by pulling historical commits.
Traceability
The apply command logs changes inside the space database configuration file. Every filesystem modification is mapped directly to the version of the prompt that suggested the patch.
Local Mock Debugging
The mock provider returns reversed prompt buffers. This enables developers to test template layouts, pipeline flows, and test assertions locally without incurring model costs or API latency.
11. Comparison with Existing Tools
| Metric | promptvc | LangSmith / W&B | OpenAI Evals | Basic Scripting |
|---|---|---|---|---|
| Storage Locality | Local filesystem (.promptvc/) |
Cloud-hosted dashboard | Local / Cloud datasets | None / In-code strings |
| Telemetry Profile | Latency and Cost checks | Trace trees and logs | Evaluation frameworks | Manual tracking |
| Code Modification | Diff-based patching | None | None | None |
| Dependencies | Standard Library only | External packages | Python framework core | Custom scripts |
| CI Integration | CLI --non-interactive |
Cloud webhooks | Python command files | Custom setups |
12. Use Cases
Automated Pre-Commit Assertions
Run prompt assertion suites locally on every git commit. If output Jaccard similarity drifts past defined thresholds, block commits to enforce prompt stability.
Air-Gapped LLM Integrations
Integrate prompts locally with offline databases using Ollama and promptvc in security-restricted environments.
Programmatic Code Refactoring
Apply prompt patches to multi-file directory structures in batches to clean up deprecated APIs, sort imports, or apply style rules.
13. Stability and Reliability
The codebase incorporates several checks to ensure reliability in production environments:
- Transactional Serialization: Database writes write first to a temporary file before renaming it to replace the target. This ensures that space registries are not corrupted if the process crashes mid-write.
- Encoding Auto-Detection: Modifying codebase files via
applyuses layered encoding checks to prevent silent character corruption in non-ASCII codebases. - Stream Reconfiguration: At module startup,
console.pydetects Windows shells and configures stdout/stderr streams to UTF-8 to prevent encoder faults when printing Unicode elements. - Validation Gating: Command handlers validate file path parameters and system configurations before calling provider APIs, preventing unnecessary token expenditure on bad configurations.
- Provider Lazy Registration: Postpones loading provider libraries until they are executed, preventing crash sequences on machines missing dependencies for providers they do not use.
- Backup and Rollback Safety: Automatically writes
.bakbackups of codebase files duringapplyactions, safely rolling back changes if a diff fails to apply cleanly. - Idempotency Verification: Validates the target file using SHA-256 fingerprints before modifications to guarantee that a prompt change is not double-applied.
- Self-Healing Connections: Combines backoff-with-jitter retry logic for model execution calls to gracefully handle rate-limit (HTTP 429) errors and temporary connection drops.
14. Installation and Quick Start
Installation
Clone the repository and perform an editable installation using pip:
git clone https://github.com/uayushdubey/prompt-version-control.git
cd prompt-version-control
pip install -e .
Verify system setup:
$ promptvc status
Quick Start Configuration
Bind default model settings:
$ promptvc config set provider openai
$ promptvc config set api_keys.openai "your-api-key"
$ promptvc config set models.openai "gpt-4o-mini"
15. Workflow Integration & DevOps Guides
Integrating promptvc into your development lifecycle ensures that prompt adjustments are treated with the same validation rigor as traditional codebase changes.
15.1 Day-to-Day Iteration Cycle
For active development, the standard workflow cycle proceeds as follows:
graph TD
A["1. init / check status"] --> B["2. commit / edit template"]
B --> C["3. run / quick test"]
C --> D["4. test run / verify assertions"]
D --> E["5. lock / production release"]
- Initialize / Check Status:
Ensure you have a configured workspace.
promptvc init promptvc status
- Draft & Commit:
Commit your template with a description and optional variable schemas.
promptvc commit sentiment_analyzer --prompt "Classify the sentiment of: {{text}}" --message "v1 base sentiment prompt"
- Execute & Debug:
Run the prompt locally using the configured provider to verify outputs.
promptvc run sentiment_analyzer latest --var text="This tool works beautifully!" --stream
- Define & Execute Test Suite:
Run test suites to prevent regression or silent performance drift.
promptvc test run sentiment_analyzer latest --suite tests/sentiment_suite.json
- Lock Version:
Mark the tested version as read-only once verified, making it ready for production integration.
promptvc lock sentiment_analyzer latest
15.2 Local Git Hooks (Pre-commit Validation)
You can prevent developers from committing broken prompts or regressions to the codebase by adding verification checks in git hooks.
Create or edit .git/hooks/pre-commit:
#!/bin/sh
echo "=== Running promptvc pre-commit hooks ==="
# 1. Validate prompt schemas
promptvc validate prompt sentiment_analyzer latest
if [ $? -ne 0 ]; then
echo "❌ Prompt validation failed!"
exit 1
fi
# 2. Run assertion suites (Fail commit if average score falls below threshold)
promptvc test run sentiment_analyzer latest --suite tests/sentiment_suite.json --non-interactive --threshold 0.85
if [ $? -ne 0 ]; then
echo "❌ Regression detected or assertions failed! Aborting commit."
exit 1
fi
echo "✅ All prompt assertions passed."
exit 0
Make the hook executable:
chmod +x .git/hooks/pre-commit
15.3 Continuous Integration (GitHub Actions)
You can integrate promptvc into your CI/CD pipeline to automatically execute assertion suites on every pull request.
Create .github/workflows/promptvc-verify.yml:
name: Verify Prompts
on:
push:
branches: [ main, dev ]
pull_request:
branches: [ main ]
jobs:
verify:
runs-on: ubuntu-latest
steps:
- name: Checkout Codebase
uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Install promptvc
run: |
pip install .
- name: Configure Defaults & Secrets
run: |
promptvc config set provider openai
promptvc config set api_keys.openai "${{ secrets.OPENAI_API_KEY }}"
promptvc config set models.openai "gpt-4o-mini"
- name: Run Test Assertions
run: |
# Fails pipeline execution (exit code 1) if criteria are not met
promptvc test run sentiment_analyzer latest --suite tests/sentiment_suite.json --non-interactive --threshold 0.80
15.4 Web Application Runtime Integration (FastAPI Example)
Avoid hardcoding prompts in python files. Keep your application clean and isolated by importing promptvc programmatically to resolve locked templates and run models dynamically.
import os
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from promptvc.core import PromptRepo
from promptvc.providers import get_provider
from promptvc.utils.template import render_template
from promptvc.config import get_config_value
app = FastAPI()
# 1. Instantiate the repository (looks for .promptvc in current directory)
repo = PromptRepo()
class AnalysisRequest(BaseModel):
text: str
@app.post("/analyze-sentiment")
async def analyze_sentiment(req: AnalysisRequest):
try:
# 2. Retrieve the locked production prompt template (e.g. pinned to v1)
prompt_template = repo.get("sentiment_analyzer", "v1")
# 3. Inject application variables
rendered = render_template(prompt_template, {
"text": req.text
})
# 4. Resolve the configured provider and execute
provider_name = get_config_value("provider", "openai")
provider = get_provider(provider_name)
result = provider.run(rendered)
# 5. Log execution trace to promptvc registry
run_record = {
"version": "v1",
"output": result["output"],
"tokens": result["tokens"],
"timestamp": repo._utc_now_iso()
}
repo.storage.append_run("sentiment_analyzer", run_record)
return {
"sentiment": result["output"].strip(),
"tokens_consumed": result["tokens"],
"timestamp": run_record["timestamp"]
}
except Exception as exc:
raise HTTPException(status_code=500, detail=str(exc))
16. Roadmap
- Remote Registry Sync: Implement commands to push and pull prompt spaces to cloud systems (PostgreSQL, S3) to support team environments.
- Scoring Dashboard: Build local static site report generation detailing cost trends, latency performance, and test history graphs.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file promptrepo-0.2.0.tar.gz.
File metadata
- Download URL: promptrepo-0.2.0.tar.gz
- Upload date:
- Size: 117.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
37149786f33926b26d881a9f6607a2f8ce50a1d13f7ca369233bdb9aa4962e11
|
|
| MD5 |
77b033bc763157ab8e46eece1251a53e
|
|
| BLAKE2b-256 |
b855f880f6936f0c40ef71db5d8f510d19435100ffc0415dbdbc956720d56b41
|
File details
Details for the file promptrepo-0.2.0-py3-none-any.whl.
File metadata
- Download URL: promptrepo-0.2.0-py3-none-any.whl
- Upload date:
- Size: 105.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
aa593a18dce88cb838bf860a1b8c165ea5344ef574a93e1ce6272119ced95b1b
|
|
| MD5 |
9e7b54b79648e2f94fa7711469b9e70b
|
|
| BLAKE2b-256 |
0a73fd344bbdce7464f5d7c4a1500a50d06f9a77563357a26666711abb2c399a
|