
Prompt Version Control — production-grade git for LLM prompts


promptvc

Prompt Version Control - A Git-Like CLI Tool and Execution Core for LLM Prompts.


1. Problem Statement

Prompts are load-bearing logical components in modern software architectures. They dictate data structures, govern agentic execution paths, and transform codebases. However, prompt engineering and integration workflows frequently lack the basic discipline applied to traditional source code.

In modern development environments, prompts suffer from the following systemic issues:

  • Out-of-band management: Prompts are copy-pasted into Notion documents, shared over chat tools, or hardcoded directly in application source code.
  • Missing version history: Prompt text is edited in place on production databases or server parameters with no version lineage, no change documentation, and no rollback mechanism.
  • Silent regressions: Adjusting a prompt to fix an edge case on one input often degrades output quality across other inputs without triggering errors.
  • Lack of observability: Latency, token consumption, and API cost metrics are rarely logged or mapped directly to the version of the prompt that generated them.

This lack of control introduces vulnerabilities when deploying LLM integrations to production. A single prompt modification can break backend parsing logic, escalate runtime API costs, or compromise model performance without developer visibility.


2. What This Tool Is (and Is NOT)

  • It IS a local, version-controlled prompt registry: Prompts are saved in a structured, local format (.promptvc/spaces/*.json) using sequential, immutable version identifiers.
  • It IS a declarative test and evaluation runner: Supports assertion verification (JSON validation, token counts, regex checks, golden-output similarity) against evaluation datasets directly in the console.
  • It IS a multi-provider execution abstraction: Runs templates across local instances (Ollama) and cloud APIs (OpenAI, Anthropic, Gemini) using a unified invocation format.
  • It IS NOT a visual prompt playground: There are no web-based node diagrams, drag-and-drop elements, or third-party cloud hosting requirements.
  • It IS NOT a framework-level runtime wrapper: It does not force you to write application code inside specific chains, agent classes, or SDK models.

3. Why This Tool Matters

By treating prompts as immutable, versioned code assets, this tool establishes a reliable local iteration loop:

  • Reproducibility: Every execution log, evaluation result, and file change is mapped directly to a specific prompt space name, version ID, and SHA-256 hash.
  • Testability: Automated test suites verify prompt outputs against assertions before updates are deployed.
  • Control: Historical versions can be explicitly locked to protect stable production paths from unintended modifications.
  • Transparency: Integrated token accounting estimates API costs and records response latency across providers, helping prevent production cost overruns.

4. Core Features

Immutable Versioning and Space Management

Prompts are organized into logical partitions called "spaces" (e.g., summarize, code_generation). Each commit registers a new immutable version.

  • Auto-incrementing IDs (v1, v2, v3).
  • Version hashes computed over raw prompt text.
  • Author and message metadata logs.
  • Lock gates: Running lock marks a version as read-only, raising errors if a developer attempts to modify or commit over it.
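The versioning scheme above can be illustrated with a minimal sketch. It assumes the hash is a SHA-256 digest of the UTF-8 encoded prompt text and that IDs are simply "v" plus a counter; the helper names prompt_hash and next_version_id are hypothetical and are not part of the promptvc API.

```python
import hashlib


def prompt_hash(prompt_text: str) -> str:
    """Return the SHA-256 hex digest of the raw prompt text (UTF-8 assumed)."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def next_version_id(existing_ids: list) -> str:
    """Return the next sequential ID, e.g. ['v1', 'v2'] -> 'v3'."""
    highest = max((int(version[1:]) for version in existing_ids), default=0)
    return f"v{highest + 1}"


# Identical prompt text always yields the same hash, so the hash identifies
# content, while the version ID identifies position in the history.
first = prompt_hash("Summarize this: {{text}}")
second = prompt_hash("Summarize this: {{text}}")
print(first == second, next_version_id(["v1", "v2"]))
```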

Schema-Based Variable Injection

Prompts support template parameters using {{variable}} brackets. A version can optionally declare a JSON validation schema defining:

  • Variable types (e.g., string, boolean).
  • Required vs. optional flags.
  • Default values (used automatically if no override is supplied).
  • Documentation descriptions for interactive prompts.
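As a rough illustration of how schema defaults and required flags could interact with {{variable}} substitution, consider the sketch below. It is not promptvc's template engine: the "default" key, the resolve_variables and render helpers, and the error types are assumptions made for the example. Only the {"variables": {...}} layout with "type" and "required" follows the schema shown in the SDK section.

```python
import re

# Matches {{name}} with optional whitespace inside the brackets.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve_variables(schema: dict, overrides: dict) -> dict:
    """Merge overrides with schema defaults; fail on missing required values."""
    resolved = {}
    for name, spec in schema.get("variables", {}).items():
        if name in overrides:
            resolved[name] = overrides[name]
        elif "default" in spec:  # hypothetical key name
            resolved[name] = spec["default"]
        elif spec.get("required", False):
            raise ValueError(f"missing required variable: {name}")
    return resolved


def render(template: str, values: dict) -> str:
    """Replace every placeholder with its resolved value."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for template variable: {name}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


schema = {
    "variables": {
        "language": {"type": "string", "required": False, "default": "French"},
        "text": {"type": "string", "required": True},
    }
}
values = resolve_variables(schema, {"text": "Hello"})
print(render("Translate this text to {{language}}: {{text}}", values))
```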

Declarative Unit Testing Engine

The test module provides a local, CI-ready testing framework:

  • JSON-defined test cases specifying inputs, assertions, and checks.
  • Composite Scoring: Runs rules and model evaluations to aggregate a normalized case score from 0.0 to 1.0.
  • Rule-Based Assertions: contains, not_contains, regex, json_valid, min_tokens, max_tokens, and golden.
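  • Jaccard similarity: The golden assertion measures the Jaccard overlap between the word-token sets of the execution output and a stored golden file. This is a lexical measure, so paraphrases that share few words score low even when the meaning matches.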
  • LLM-as-a-Judge: Validates free-form model assertions (e.g. style constraints, safety guidelines) using configurable evaluation prompts against LLMs.
  • Regression Delta & Gates: Compares suite performance against a base version, generating a comparative score delta table, and failing CI with threshold gates.
  • Automated updates: The test golden command runs the prompt version and overwrites or creates golden file records.
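For reference, Jaccard similarity over word sets is the size of the intersection divided by the size of the union. The sketch below shows the calculation; the lowercasing and the \w+ tokenization are assumptions, and promptvc may tokenize differently.

```python
import re


def word_set(text: str) -> set:
    """Lowercase the text and collect its distinct word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(output: str, golden: str) -> float:
    """Return |A & B| / |A | B| for the two word sets (1.0 if both are empty)."""
    a, b = word_set(output), word_set(golden)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two of four distinct words are shared, so the score is 0.5.
print(jaccard("the cat sat", "the cat ran"))
```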

Multi-Step Orchestration Pipelines

Execute multi-step prompt workflows sequentially using JSON declarations:

  • Downstream step templates can reference upstream step outputs using the {{ steps.step_id.output }} syntax.
  • Global pipeline variables are referenced using the {{ input.variable_name }} syntax.
  • Validates parameter inputs, prompt spaces, and execution paths before requesting provider resources.
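The reference syntax above can be sketched as a small resolver that runs steps in order and feeds each output forward. The step keys "id" and "template", the run_pipeline helper, and the execute callback are hypothetical stand-ins; the actual pipeline file format is not shown in this document.

```python
import re

# Matches {{ steps.<id>.output }} and {{ input.<name> }}.
REFERENCE = re.compile(r"\{\{\s*(steps|input)\.([\w.]+)\s*\}\}")


def resolve_references(template: str, inputs: dict, step_outputs: dict) -> str:
    """Substitute pipeline inputs and upstream step outputs into a template."""
    def substitute(match):
        scope, path = match.group(1), match.group(2)
        if scope == "input":
            return str(inputs[path])
        step_id, _, field = path.partition(".")
        if field != "output":
            raise KeyError(f"unsupported step field: {path}")
        return str(step_outputs[step_id])

    return REFERENCE.sub(substitute, template)


def run_pipeline(steps: list, inputs: dict, execute) -> dict:
    """Run steps sequentially; each step can read all earlier outputs."""
    outputs = {}
    for step in steps:
        prompt = resolve_references(step["template"], inputs, outputs)
        outputs[step["id"]] = execute(prompt)
    return outputs


steps = [
    {"id": "translate", "template": "Translate: {{ input.text }}"},
    {"id": "summarize", "template": "Summarize: {{ steps.translate.output }}"},
]
# str.upper stands in for a provider call so the sketch runs offline.
results = run_pipeline(steps, {"text": "hello"}, execute=str.upper)
print(results["summarize"])
```

Referencing a step that has not run yet raises KeyError here, which mirrors the idea of validating reference bindings before any provider is called.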

Interactive Shell REPL

A stateful interactive command loop for rapid prompt debugging:

  • Persistent variable binds: var code="def add(a, b): return a + b".
  • Quick provider and model switching: set provider anthropic or set model gpt-4o.
  • Shell runtime metrics tracking: cost aggregates accumulated token usage, latency, and dollar costs across the current session.

Diff-Based File Editing

Modify filesystem resources safely using LLM instructions:

  • Safe reader checks: Attempts UTF-8, UTF-8-sig, UTF-16, and Latin-1 encodings, preserving the original file encoding during modifications.
  • Unified diff validation: Parses model output as a strict unified diff, validates target contexts against original file lines, and handles space-stripped context lines.
  • Atomic Backups & Rollback: Automatically saves a .bak backup of target files before modifying them, and restores the backup if anything goes wrong.
  • Idempotency checking: Automatically compares the SHA-256 fingerprint of the target file to prevent duplicate modifications.
  • Approval gate: Applies modifications only after interactive approval, then records a change log entry.
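The encoding fallback and backup-and-rollback behavior described above might look roughly like the following. This is a sketch under assumptions, not the apply implementation: read_with_encoding and apply_with_backup are hypothetical names, and the encodings are tried in the order listed above.

```python
import shutil
from pathlib import Path

ENCODINGS = ("utf-8", "utf-8-sig", "utf-16", "latin-1")


def read_with_encoding(path: Path):
    """Decode the file with the first encoding that succeeds."""
    raw = path.read_bytes()
    for encoding in ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Not reached in practice: latin-1 can decode any byte sequence.
    raise ValueError(f"could not decode {path}")


def apply_with_backup(path: Path, transform) -> None:
    """Write transform(text) back in the original encoding, restoring on error."""
    text, encoding = read_with_encoding(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)
    try:
        path.write_bytes(transform(text).encode(encoding))
    except Exception:
        shutil.copy2(backup, path)  # roll back to the saved copy
        raise
```

Reading and writing bytes rather than text avoids platform newline translation, so line endings in the target file are left as they were.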

Observability and Trace Logging

Gain deep runtime insight with complete execution logging:

  • Transactional tracing: Automatically logs inputs, outputs, tokens, latencies, model configurations, scores, and errors to .promptvc/traces.jsonl.
  • CLI querying: Search, filter, and inspect traces interactively using promptvc trace.
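Because the trace log is a JSON Lines file, it can also be read with a few lines of Python. The sketch below assumes each record carries "name" and "version" fields; those field names are guesses for illustration, so check an actual traces.jsonl entry before relying on them.

```python
import json
from pathlib import Path


def load_traces(path, name, version=None, last=20):
    """Return the last N trace records for a space, optionally for one version."""
    records = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("name") != name:  # hypothetical field name
            continue
        if version is not None and record.get("version") != version:
            continue
        records.append(record)
    return records[-last:] if last > 0 else []
```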

Schema & Dataset Validation

Enforce strict consistency across prompt definitions and evaluations:

  • Prompt validation: promptvc validate prompt verifies template consistency, variable names, and schema defaults.
  • Dataset validation: promptvc validate dataset verifies the structure, types, and schema compatibility of bulk evaluation files.

5. CLI Reference

init

Initialize a promptvc repository in the current workspace.

  • Syntax: promptvc init
  • Behavior: Creates the .promptvc/ directory structure.
  • When to use: Set up a new local workspace.
  • Example:
    promptvc init
    

status

Provide a high-level overview of the current workspace.

  • Syntax: promptvc status
  • Behavior: Inspects the registry and displays active spaces, version counts, execution runs, and recent actions.
  • When to use: Check workspace state.
  • Example:
    promptvc status
    

commit

Commit a new version to a prompt space.

  • Syntax: promptvc commit <name> [flags]
  • Flags:
    • --prompt <string>: Raw prompt text. If omitted, opens interactive multi-line terminal input.
    • --message <string>: Commit message. If omitted, opens interactive terminal prompt.
  • Behavior: Resolves prompt and message, validates that the space exists, checks that the latest version is not locked, and serializes the new version.
  • When to use: Register a new iteration of a prompt template.
  • Example:
    promptvc commit translate --prompt "Translate this text to French: {{text}}" --message "v1 translation prompt"
    

log

Display the commit history for a prompt space.

  • Syntax: promptvc log <name>
  • Behavior: Renders a structured history table containing version IDs, messages, token counts, lock status, and dates.
  • When to use: Audit how a prompt space has evolved over time.
  • Example:
    promptvc log translate
    

get

Display the raw prompt content of a specific version.

  • Syntax: promptvc get <name> <version>
  • Behavior: Prints the raw template string. Supports the latest version alias.
  • When to use: View prompt text without metadata formatting.
  • Example:
    promptvc get translate latest
    

inspect

Display detailed metadata and schema information for a version.

  • Syntax: promptvc inspect <name> <version>
  • Behavior: Parses version records and outputs raw prompt text, variables, validation schema fields, lock states, and example CLI commands. Supports the latest version alias.
  • When to use: Verify a prompt's required template arguments.
  • Example:
    promptvc inspect translate v1
    

diff

Compute the token, character, and text difference between two prompt versions.

  • Syntax: promptvc diff <name> <v1> <v2> [flags]
  • Flags:
    • --text: Display unified diff lines (like git diff).
    • --stat: Display comparison metrics table (characters, words, and tokens).
  • Behavior: Calculates delta metrics between target versions.
  • When to use: Analyze changes between versions.
  • Example:
    promptvc diff translate v1 v2 --text
    

lock

Lock a prompt version to prevent modification.

  • Syntax: promptvc lock <name> <version>
  • Behavior: Sets the locked property to true in the space record. Subsequent commits or other operations that would modify this version are rejected. Supports the latest version alias.
  • When to use: Mark a version as a production release.
  • Example:
    promptvc lock translate v1
    

list

List all registered prompt spaces.

  • Syntax: promptvc list
  • Behavior: Returns a table listing all space names, their latest active version, and version counts.
  • When to use: Discover prompt spaces in the workspace.
  • Example:
    promptvc list
    

run

Execute a prompt version against a provider.

  • Syntax: promptvc run <name> <version> [flags]
  • Flags:
    • --provider <string>: Target provider (openai, anthropic, gemini, ollama, mock).
    • --model <string>: Provider model override.
    • --timeout <int>: Timeout limit in seconds.
    • --max-tokens <int>: Output tokens limit.
    • --stream: Stream tokens to stdout.
    • --var <key=value>: Template variable binding. Repeatable.
    • --dry-run: Renders the template to stdout without executing it.
    • --non-interactive: Disable interactive terminal inputs.
  • Behavior: Resolves template variables, renders the template, calls the provider, and prints output alongside token usage and latency. Supports the latest version alias.
  • When to use: Test prompt templates with specific inputs.
  • Example:
    promptvc run translate v1 --var text="Hello world" --provider openai
    

eval

Evaluate a prompt version against a dataset.

  • Syntax: promptvc eval <name> <version> [flags]
  • Flags:
    • --dataset <path>: Required. Path to JSON dataset file.
    • --provider, --model, --timeout, --max-tokens, --stream, --non-interactive.
  • Behavior: Executes the prompt template against each item in the dataset. Saves results to the space database. Supports the latest version alias.
  • When to use: Verify prompt output quality across batch datasets.
  • Example:
    promptvc eval translate v1 --dataset data.json --provider ollama --model llama3
    

compare

Evaluate two prompt versions on a dataset and display outputs side-by-side.

  • Syntax: promptvc compare <name> <v1> <v2> [flags]
  • Flags:
    • --dataset <path>: Required. Dataset file path.
    • --provider, --model, --timeout, --max-tokens, --stream.
  • Behavior: Runs evaluations for both versions and prints side-by-side outputs.
  • When to use: Run comparative evaluations before releasing prompt updates.
  • Example:
    promptvc compare translate v1 v2 --dataset data.json
    

apply

Apply a prompt to a target file or directory using LLM-generated diffs.

  • Syntax: promptvc apply <name> <version> [flags]
  • Flags:
    • --file <path>: Target file path to modify.
    • --dir <path>: Target directory path to modify.
    • --glob <pattern>: Filter pattern when using --dir (default: *).
    • --provider, --model, --timeout, --max-tokens, --stream, --var, --dry-run, --non-interactive.
  • Behavior: Reads target files, requests unified diffs from the provider, shows diff lines, and applies updates upon user approval. Logs change metadata. Supports the latest version alias.
  • When to use: Run automated refactoring prompts on codebase files.
  • Example:
    promptvc apply refactor v1 --file src/main.py --provider openai
    

changes

Display the file change history for a prompt space.

  • Syntax: promptvc changes <name>
  • Behavior: Displays a table detailing execution timestamps, prompt versions used, and modified file paths.
  • When to use: Audit which code files were modified by which prompts.
  • Example:
    promptvc changes refactor
    

config

View or modify configuration parameters.

  • Syntax: promptvc config <action> [key] [value]
  • Actions:
    • set: Bind value to config key.
    • get: Print value of config key.
    • list: Output entire config JSON object.
  • Behavior: Reads and modifies configuration settings at .promptvc/config.json.
  • When to use: Set default models, timeouts, or API keys.
  • Example:
    promptvc config set provider openai
    promptvc config set models.openai gpt-4o-mini
    

test

Manage and execute automated assertion test suites.

  • Syntax: promptvc test <subcommand> [flags]
  • Subcommands:
    • run <name> <version> --suite <path>: Run assertion test suite.
      • --threshold <float>: Optional. Minimum average score (0.0 to 1.0) to pass (CI exit-code gate).
      • --compare <version>: Optional. Version ID (e.g. v1) to check for regressions. Displays delta metrics table.
      • --deterministic: Optional. Run only rules and skip LLM-as-a-judge assertions for speed/cost.
    • golden <name> <version> --suite <path>: Run cases and update stored golden files with the outputs.
    • list [--dir <path>]: List all test suite JSON files recursively (default path is .).
  • When to use: Validate prompt changes inside CI pipelines or local verification environments.
  • Example:
    promptvc test run summarize v2 --suite tests/suite.json --compare v1 --threshold 0.8
    

validate

Validate dataset files or committed prompt version schemas.

  • Syntax: promptvc validate <subcommand>
  • Subcommands:
    • dataset <file>: Checks if a dataset file is well-formed JSON, contains input structures, and matches expected schemas.
    • prompt <name> <version>: Checks consistency of schema defaults, variable naming, types, and properties.
  • When to use: Avoid executing corrupted runs by preemptively validating templates and test parameters.
  • Example:
    promptvc validate dataset test_inputs.json
    promptvc validate prompt summarize latest
    

trace

Query and inspect the execution run log.

  • Syntax: promptvc trace <name> [version] [flags]
  • Flags:
    • --last <int>: Retrieve the last N execution traces (default: 20).
    • --json: Print raw trace logs as a JSON list.
  • Behavior: Retrieves execution logs from .promptvc/traces.jsonl, formats them as a clean summary table showing token count, latency, scores, and errors, and displays details for the latest run.
  • When to use: Audit runtime execution performance, debug outputs, or examine recent scoring history.
  • Example:
    promptvc trace summarize --last 10
    

pipe

Execute multi-step prompt workflows sequentially.

  • Syntax: promptvc pipe <subcommand>
  • Subcommands:
    • run <pipeline_file> [--var key=value] [--provider name]: Runs the specified multi-step pipeline.
    • validate <pipeline_file>: Verifies pipeline syntax and reference bindings without executing.
  • When to use: Compose chained tasks where downstream steps consume upstream step outputs.
  • Example:
    promptvc pipe run translate_and_summarize.json --var text="Hello world"
    

shell

Launch the stateful interactive REPL.

  • Syntax: promptvc shell
  • Behavior: Opens a command-line prompt loop allowing variable bindings, quick model/provider switching, and real-time cost and latency tracking.
  • When to use: Iterative debugging and prompt hacking.
  • Example:
    promptvc shell
    

Global Flag Behaviors

  • --version: Print program version and exit.
  • --json: Output command results in machine-readable JSON format where applicable.
  • --provider: Invocation provider override. Resolution order: --provider flag -> value in config.json -> mock.
  • --model: Invocation model override. Resolution order: --model flag -> configured default in config.json -> provider-native default.
  • --var: Declares template inputs. Parsed as key=value strings.
  • --non-interactive: Disables interactive console fallbacks. If required parameters are missing, exits with status 1.
  • --timeout: Invocations terminate and raise errors if they exceed this value in seconds.
  • --max-tokens: Instructs the provider to clamp model response length to this token count limit.
  • --stream: Intercepts API response frames and writes them directly to stdout.

6. End-to-End Developer Workflows

Scenario A: Prompt Creation and Iteration

Initialize the project space and register a summarization template:

$ promptvc init
$ promptvc commit summarize --prompt "Summarize this: {{text}}" --message "v1 basic summary"

Verify version status:

$ promptvc log summarize

Test the template with a sample input:

$ promptvc run summarize v1 --var text="Structured version control improves pipeline reliability." --provider openai

Refine the template by committing a second iteration:

$ promptvc commit summarize --prompt "Summarize this in five words: {{text}}" --message "v2 shorter constraint"

Scenario B: Batch Evaluation and Comparison

Generate a local evaluation dataset named inputs.json:

[
  {"input": "Machine learning architectures benefit from clear telemetry integration."},
  {"input": "Static analysis parsing prevents runtime execution exceptions."}
]

Run a comparison between v1 and v2:

$ promptvc compare summarize v1 v2 --dataset inputs.json --provider openai

Review outputs side-by-side, then lock the stable iteration:

$ promptvc lock summarize v2

Scenario C: Safe Codebase Modification

Create a code refactoring prompt space:

$ promptvc commit fix_imports --prompt "Refactor import blocks to keep standard libraries sorted: {{code}}" --message "v1 import sorter"

Apply the prompt to a target Python file:

$ promptvc apply fix_imports v1 --file src/main.py --provider openai

Review the diff output in the terminal. Select y to apply the patch, then check the change log for the space:

$ promptvc changes fix_imports

Scenario D: CI/CD Pipeline Assertion Tests

Define a test suite in tests/summarize_suite.json:

[
  {
    "id": "ml_summary",
    "input": {
      "text": "Telemetry integrations support pipeline diagnostics."
    },
    "assertions": [
      { "type": "contains", "value": "telemetry" },
      { "type": "max_tokens", "value": 50 }
    ]
  }
]

Run assertions inside automated build pipelines using --non-interactive:

$ promptvc test run summarize v2 --suite tests/summarize_suite.json --provider openai --non-interactive
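To make the suite format concrete, here is a minimal sketch of how rule-based assertions like the ones above could be scored. It is illustrative only: the whitespace token count, the case-insensitive contains check, and the "fraction of assertions passed" score are all assumptions, not promptvc's actual scoring rules.

```python
import re


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


# One predicate per rule type; each takes (output, expected value).
CHECKS = {
    "contains": lambda out, value: value.lower() in out.lower(),
    "not_contains": lambda out, value: value.lower() not in out.lower(),
    "regex": lambda out, value: re.search(value, out) is not None,
    "min_tokens": lambda out, value: count_tokens(out) >= value,
    "max_tokens": lambda out, value: count_tokens(out) <= value,
}


def score_case(output: str, assertions: list) -> float:
    """Return the fraction of assertions that pass, from 0.0 to 1.0."""
    if not assertions:
        return 1.0
    passed = sum(1 for a in assertions if CHECKS[a["type"]](output, a["value"]))
    return passed / len(assertions)


assertions = [
    {"type": "contains", "value": "telemetry"},
    {"type": "max_tokens", "value": 50},
]
print(score_case("Telemetry helps diagnose pipelines.", assertions))
```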

7. Programmatic API (Python SDK)

While promptvc provides a comprehensive CLI, the underlying execution engine can be imported directly into Python applications. The library exports a high-level developer SDK at the root namespace, as well as a lower-level core interface (PromptRepo) for advanced repository actions.

7.1 Quick Setup & SDK Imports

First, ensure that your repository is initialized (typically via promptvc init or programmatically with PromptRepo().init_repo()). Then, import the SDK:

import promptvc

All primary execution functions, context wrappers, and result objects are exported directly from the top-level package.


7.2 Single-Shot Execution (run)

Use promptvc.run() to quickly substitute variables and execute a specific prompt version against a provider:

import promptvc

# Run prompt version with OpenAI
result = promptvc.run(
    name="translator",
    version="v1",          # Can also use "latest"
    provider="openai",     # openai, anthropic, gemini, ollama, mock
    model="gpt-4o-mini",   # Optional model override
    temperature=0.3,       # Optional sampling temperature
    # Template variables are passed as keyword arguments:
    language="Spanish",
    text="Hello, world!"
)

if result.ok:
    print(f"Output: {result.output}")
    print(f"Total Tokens: {result.tokens}")
    print(f"Latency: {result.latency_ms} ms")
    if result.cost:
        print(f"Cost USD: {result.cost_usd}")
else:
    print(f"Execution failed: {result.error}")

7.3 Wrapper Decorator (@prompt)

Power any Python function with a versioned prompt space using the @promptvc.prompt decorator. The function's keyword arguments will map directly to template variables, and the decorated function returns a RunResult object:

import promptvc

@promptvc.prompt("summarizer", version="latest", provider="anthropic", model="claude-3-5-sonnet")
def summarize(text: str) -> promptvc.RunResult:
    """This function is powered by promptvc 'summarizer' version 'latest'"""
    pass

# Invoke the function
result = summarize(text="Prompt Version Control enforces immutability and version safety...")
print(f"Summary: {result.output}")
print(f"Model Used: {result.model}")

7.4 Context Manager (run_context)

Use the run_context context manager when you want granular execution control, automatic trace logging, or want to compute aggregate costs/latencies over dynamic sequences:

import promptvc

with promptvc.run_context("classifier", "v2", provider="gemini") as ctx:
    # Run a classification task
    result = ctx.run(text="The pizza was delicious!")
    
    print(f"Classified Output: {result.output}")
    print(f"Context Latency: {ctx.latency_ms} ms")
    if ctx.cost:
        print(f"Accumulated Cost: {promptvc.format_cost(ctx.cost.total_cost_usd)}")

7.5 Batch Execution (batch_run)

To evaluate a prompt template across multiple input dictionaries in parallel, use promptvc.batch_run(). It executes tasks concurrently using a local thread pool:

import promptvc

inputs = [
    {"text": "First article contents..."},
    {"text": "Second article contents..."},
    {"text": "Third article contents..."}
]

# Run all inputs in parallel using up to 4 worker threads
batch_result = promptvc.batch_run(
    name="summarizer",
    version="v1",
    inputs=inputs,
    provider="openai",
    max_workers=4
)

print(f"Success Rate: {batch_result.success_rate:.1%}")
print(f"Total Combined Tokens: {batch_result.total_tokens}")
if batch_result.total_cost_usd is not None:
    print(f"Total Batch Cost: {promptvc.format_cost(batch_result.total_cost_usd)}")

for idx, run_res in enumerate(batch_result.results):
    print(f"\n[Run {idx+1}] Output: {run_res.output}")

7.6 SDK Result Interfaces

RunResult

Returned by run(), @prompt, and individual items in BatchResult.

  • .ok (bool): True if execution succeeded.
  • .output (str): The raw text response from the model.
  • .tokens (int | None): Sum of input and output tokens.
  • .input_tokens (int | None): Tokens in the rendered prompt sent to the model.
  • .output_tokens (int | None): Tokens in completion response.
  • .latency_ms (float): Execution duration in milliseconds.
  • .cost (CostBreakdown | None): Detailed pricing model breakdown.
  • .cost_usd (float | None): Convenience getter for total USD cost.
  • .model (str): Model identifier used by the provider.
  • .trace_id (str): Unique trace GUID generated for telemetry lookup.
  • .error (str | None): Error traceback string if execution failed.

BatchResult

Returned by batch_run().

  • .results (List[RunResult]): List of execution results (matches input order).
  • .total_tokens (int): Sum of all tokens consumed across the batch.
  • .total_cost_usd (float | None): Sum of all costs across the batch.
  • .total_latency_ms (float): Total wall-clock time elapsed for the batch sequence.
  • .success_rate (float): Fraction (0.0 to 1.0) of runs that succeeded.
  • .success_count (int): Number of successful runs.
  • .error_count (int): Number of failed runs.

7.7 Low-level Core API (PromptRepo)

For programmatic control over repository commands (such as initializing registries, committing prompt modifications, managing version locks, or computing prompt diffs), use PromptRepo directly:

from promptvc.core import PromptRepo
from promptvc.utils.template import render_template
from promptvc.core.diff import compute_diff, format_diff

# 1. Load the Repository
# By default, manages the local `.promptvc` directory in the current working directory.
repo = PromptRepo()

# Initialize if not already initialized
if not repo.storage.is_initialized:
    repo.init_repo()

# 2. Register & Commit a Prompt Version programmatically
meta = repo.commit(
    name="translator",
    prompt="Translate this text to {{language}}: {{text}}",
    message="v1 initial translation prompt",
    schema={
        "variables": {
            "language": {"type": "string", "required": True},
            "text": {"type": "string", "required": True}
        }
    }
)

# 3. Retrieve Prompt Template and Metadata
# Returns the raw prompt string
prompt_template = repo.get("translator", "v1") 
# Returns dictionary containing version author, hash, lock status, and tokens
version_meta = repo.get_version_meta("translator", "v1") 

# 4. Render template variables manually
rendered = render_template(prompt_template, {"language": "French", "text": "Hello"})
print(rendered) # "Translate this text to French: Hello"

# 5. Lock Stable Versions
# Prevents further modifications or commits to "v1"
repo.lock("translator", "v1")

# 6. Compute Prompt Differences
# Get token metrics difference between two versions
# (assumes a second version, "v2", has already been committed to this space)
token_diff = repo.token_diff("translator", "v1", "v2")

# Compute line-by-line unified diffs
diff_lines = compute_diff("Hello world", "Hello beautiful world")
print(format_diff(diff_lines))

See examples/api_usage.py for a complete runnable demonstration.


8. Architecture Overview

Component Diagram

+--------------------------------------------------------------+
|                     CLI Interface Layer                      |
|                         (main.py)                            |
+--------------------------------------------------------------+
                               |
                               | Dispatches Commands
                               v
+--------------------------------------------------------------+
|                     Core Logic Controller                    |
|                         (repo.py)                            |
+--------------------------------------------------------------+
       |                       |                        |
       | Locks Mutations       | Resolves Templates     | Serializes State
       v                       v                        v
+--------------+        +--------------+        +--------------+
|  Lock Guard  |        | Template Eng |        | Storage Eng  |
|  (lock.py)   |        | (template.py)|        | (storage.py) |
+--------------+        +--------------+        +--------------+
                                                        |
                                                        | Write Path
                                                        v
                                                +--------------+
                                                | Local Disk   |
                                                | .promptvc/   |
                                                +--------------+

Subsystems

  1. CLI Layer (cli/main.py): Translates command strings to handler calls, configures Windows UTF-8 stdout, and coordinates interactive inputs when parameters are missing.
  2. Core Logic Controller (core/repo.py): Coordinates access to version registry, validation schemas, evaluation metrics, and runtime hooks.
  3. Lock Guard (core/lock.py): Enforces mutability rules. Blocks write requests targeting locked records.
  4. Storage Engine (core/storage.py): Manages local space file serialization. Implements transactional JSON writes (write to a temporary file, then rename) so that a crash or interruption mid-write does not corrupt space files.
  5. Template System (utils/template.py): Isolates template parameters, validates arguments, formats defaults, and returns clean prompt buffers.
  6. Provider Layer (providers/): Implements vendor connections. Normalizes requests and returns standardized metadata envelopes.

9. Execution Model

Run vs. Eval vs. Compare

The execution engine processes invocations through three distinct runtime pathways:

  • run: Executes a single template on one input set. Parameters are resolved from CLI overrides (--var), validation schema defaults, or interactive prompts. Results are persisted to the database runs array.
  • eval: Batches executions against a dataset array. Every array item must expose an input property. Evaluated outcomes, latencies, and token logs are persisted in the evaluations database table.
  • compare: Compares version metrics side-by-side on a shared dataset. It evaluates v1 and v2 on identical inputs at runtime. Comparison metrics are not stored.
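The dataset requirement for eval (a JSON array whose items each expose an input property) can be checked up front with a few lines. The sketch below is a simplified stand-in for promptvc validate dataset; the function name and error messages are invented for the example.

```python
import json


def validate_dataset(raw: str) -> list:
    """Return a list of problems; an empty list means the dataset looks usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, list):
        return ["dataset must be a JSON array"]
    errors = []
    for index, item in enumerate(data):
        if not isinstance(item, dict) or "input" not in item:
            errors.append(f"item {index}: missing 'input' property")
    return errors


good = '[{"input": "First text."}, {"input": "Second text."}]'
bad = '[{"input": "ok"}, {"text": "wrong key"}]'
print(validate_dataset(good), validate_dataset(bad))
```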

Telemetry Mapping

The provider layer extracts token telemetry (prompt tokens, response tokens, and total tokens) from API response packages. Cost estimates are determined using the model's cost rate coefficients located in utils/cost.py.
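A cost estimate of this kind is usually a linear function of token counts. The sketch below shows the arithmetic with made-up rates; the Rate class, the RATES table, and the per-million-token pricing unit are assumptions and do not reflect the contents of utils/cost.py or any provider's real prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    input_per_million: float   # USD per 1M prompt tokens
    output_per_million: float  # USD per 1M response tokens


# Placeholder numbers for illustration only.
RATES = {"example-model": Rate(input_per_million=1.0, output_per_million=2.0)}


def estimate_cost(model: str, input_tokens: int, output_tokens: int):
    """Return an estimated USD cost, or None when the model has no known rate."""
    rate = RATES.get(model)
    if rate is None:
        return None
    total = (input_tokens * rate.input_per_million
             + output_tokens * rate.output_per_million)
    return total / 1_000_000


print(estimate_cost("example-model", 1_000_000, 500_000))
```

Returning None for unknown models, rather than 0.0, keeps "free" and "unpriced" distinguishable, which matches the optional cost fields on the SDK result objects.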


10. Developer Experience

Reproducibility

Every prompt commit records a SHA-256 hash of the raw prompt text. Because versions are stored locally in a declarative format, developers can reproduce the exact model input at any point by retrieving a historical version.

Traceability

The apply command records each change in the space's record file. Every filesystem modification is mapped directly to the prompt version that produced the patch.

Local Mock Debugging

The mock provider returns reversed prompt buffers. This enables developers to test template layouts, pipeline flows, and test assertions locally without incurring model costs or API latency.


11. Comparison with Existing Tools

| Metric            | promptvc                      | LangSmith / W&B        | OpenAI Evals           | Basic Scripting        |
|-------------------|-------------------------------|------------------------|------------------------|------------------------|
| Storage Locality  | Local filesystem (.promptvc/) | Cloud-hosted dashboard | Local / Cloud datasets | None / In-code strings |
| Telemetry Profile | Latency and Cost checks       | Trace trees and logs   | Evaluation frameworks  | Manual tracking        |
| Code Modification | Diff-based patching           | None                   | None                   | None                   |
| Dependencies      | Standard Library only         | External packages      | Python framework core  | Custom scripts         |
| CI Integration    | CLI --non-interactive         | Cloud webhooks         | Python command files   | Custom setups          |

12. Use Cases

Automated Pre-Commit Assertions

Run prompt assertion suites locally on every git commit. If the Jaccard similarity between outputs and their golden files falls below the defined threshold, block the commit to enforce prompt stability.
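One way to wire this up is a small Python script used as a git pre-commit hook. The sketch below only assembles flags documented in the CLI reference (--suite, --threshold, --deterministic, --non-interactive); the space name, version, suite path, and threshold are placeholders, and it assumes the command exits non-zero when the threshold gate fails.

```python
import subprocess


def build_command(space: str, version: str, suite: str, threshold: float) -> list:
    """Assemble the promptvc test invocation for a non-interactive hook."""
    return [
        "promptvc", "test", "run", space, version,
        "--suite", suite,
        "--threshold", str(threshold),
        "--deterministic",   # skip LLM-as-a-judge checks to keep the hook fast
        "--non-interactive",
    ]


def main() -> int:
    """Run the suite and return its exit code; non-zero blocks the commit."""
    command = build_command("summarize", "v2", "tests/summarize_suite.json", 0.8)
    return subprocess.run(command).returncode


# In an actual hook file (.git/hooks/pre-commit), finish with:
#     import sys; sys.exit(main())
```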

Air-Gapped LLM Integrations

Run prompts against offline databases entirely on local infrastructure, using Ollama and promptvc, in security-restricted environments.

Programmatic Code Refactoring

Apply prompt patches to multi-file directory structures in batches to clean up deprecated APIs, sort imports, or apply style rules.


13. Stability and Reliability

The codebase incorporates several checks to ensure reliability in production environments:

  • Transactional Serialization: Database writes go to a temporary file first, which is then renamed over the target. This ensures that space registries are not corrupted if the process crashes mid-write.
  • Encoding Auto-Detection: Modifying codebase files via apply uses layered encoding checks to prevent silent character corruption in non-ASCII codebases.
  • Stream Reconfiguration: At module startup, console.py detects Windows shells and configures stdout/stderr streams to UTF-8 to prevent encoder faults when printing Unicode elements.
  • Validation Gating: Command handlers validate file path parameters and system configurations before calling provider APIs, preventing unnecessary token expenditure on bad configurations.
  • Provider Lazy Registration: Defers loading provider libraries until a provider is first used, preventing import failures on machines that lack dependencies for providers they do not use.
  • Backup and Rollback Safety: Automatically writes .bak backups of codebase files during apply actions, safely rolling back changes if a diff fails to apply cleanly.
  • Idempotency Verification: Validates the target file using SHA-256 fingerprints before modifications to guarantee that a prompt change is not double-applied.
  • Self-Healing Connections: Wraps model execution calls in backoff-with-jitter retry logic to handle rate-limit (HTTP 429) errors and temporary connection drops gracefully.
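The retry behavior described in the last bullet can be sketched as exponential backoff with "full jitter". This is an illustration under stated assumptions, not the project's code: `RateLimitError`, `call_with_retry`, and the delay parameters are hypothetical names and values.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


def call_with_retry(fn, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call fn(), retrying transient failures with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except (RateLimitError, ConnectionError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Full jitter: wait a random time up to the capped exponential delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))


# Usage: a call that is rate-limited twice before succeeding.
state = {"calls": 0}


def flaky_call():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


waits = []
outcome = call_with_retry(flaky_call, sleep=waits.append)  # record waits instead of sleeping
print(outcome, len(waits))
```

Randomizing the wait spreads retries from concurrent clients over time, so they do not all hit the provider again at the same instant.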

14. Installation and Quick Start

Installation

Clone the repository and perform an editable installation using pip:

git clone https://github.com/uayushdubey/prompt-version-control.git
cd prompt-version-control
pip install -e .

Verify system setup:

$ promptvc status

Quick Start Configuration

Bind default model settings:

$ promptvc config set provider openai
$ promptvc config set api_keys.openai "your-api-key"
$ promptvc config set models.openai "gpt-4o-mini"

15. Workflow Integration & DevOps Guides

Integrating promptvc into your development lifecycle ensures that prompt adjustments are treated with the same validation rigor as traditional codebase changes.

15.1 Day-to-Day Iteration Cycle

For active development, the standard workflow cycle proceeds as follows:

graph TD
    A["1. init / check status"] --> B["2. commit / edit template"]
    B --> C["3. run / quick test"]
    C --> D["4. test run / verify assertions"]
    D --> E["5. lock / production release"]
  1. Initialize / Check Status: Ensure you have a configured workspace.
    promptvc init
    promptvc status
    
  2. Draft & Commit: Commit your template with a description and optional variable schemas.
    promptvc commit sentiment_analyzer --prompt "Classify the sentiment of: {{text}}" --message "v1 base sentiment prompt"
    
  3. Execute & Debug: Run the prompt locally using the configured provider to verify outputs.
    promptvc run sentiment_analyzer latest --var text="This tool works beautifully!" --stream
    
  4. Define & Execute Test Suite: Run test suites to prevent regression or silent performance drift.
    promptvc test run sentiment_analyzer latest --suite tests/sentiment_suite.json
    
  5. Lock Version: Mark the tested version as read-only once verified, making it ready for production integration.
    promptvc lock sentiment_analyzer latest
    

15.2 Local Git Hooks (Pre-commit Validation)

You can prevent developers from committing broken prompts or regressions to the codebase by adding verification checks in git hooks.

Create or edit .git/hooks/pre-commit:

#!/bin/sh
echo "=== Running promptvc pre-commit hooks ==="

# 1. Validate prompt schemas
promptvc validate prompt sentiment_analyzer latest
if [ $? -ne 0 ]; then
  echo "❌ Prompt validation failed!"
  exit 1
fi

# 2. Run assertion suites (Fail commit if average score falls below threshold)
promptvc test run sentiment_analyzer latest --suite tests/sentiment_suite.json --non-interactive --threshold 0.85
if [ $? -ne 0 ]; then
  echo "❌ Regression detected or assertions failed! Aborting commit."
  exit 1
fi

echo "✅ All prompt assertions passed."
exit 0

Make the hook executable:

chmod +x .git/hooks/pre-commit

15.3 Continuous Integration (GitHub Actions)

You can integrate promptvc into your CI/CD pipeline to automatically execute assertion suites on every pull request.

Create .github/workflows/promptvc-verify.yml:

name: Verify Prompts

on:
  push:
    branches: [ main, dev ]
  pull_request:
    branches: [ main ]

jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Codebase
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install promptvc
        run: |
          pip install .

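      - name: Configure Defaults & Secrets
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          promptvc config set provider openai
          # Pass the secret through the environment instead of interpolating it into the script
          promptvc config set api_keys.openai "$OPENAI_API_KEY"
          promptvc config set models.openai "gpt-4o-mini"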

      - name: Run Test Assertions
        run: |
          # Fails pipeline execution (exit code 1) if criteria are not met
          promptvc test run sentiment_analyzer latest --suite tests/sentiment_suite.json --non-interactive --threshold 0.80

15.4 Web Application Runtime Integration (FastAPI Example)

Avoid hardcoding prompts in Python files. Keep application code isolated from prompt text by importing promptvc programmatically to resolve locked templates and run models at request time.

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from promptvc.core import PromptRepo
from promptvc.providers import get_provider
from promptvc.utils.template import render_template
from promptvc.config import get_config_value

app = FastAPI()

# 1. Instantiate the repository (looks for .promptvc in current directory)
repo = PromptRepo()

class AnalysisRequest(BaseModel):
    text: str

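# Plain `def`: FastAPI runs the handler in a threadpool, so the blocking
# provider call below does not stall the event loop.
@app.post("/analyze-sentiment")
def analyze_sentiment(req: AnalysisRequest):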
    try:
        # 2. Retrieve the locked production prompt template (e.g. pinned to v1)
        prompt_template = repo.get("sentiment_analyzer", "v1")
        
        # 3. Inject application variables
        rendered = render_template(prompt_template, {
            "text": req.text
        })
        
        # 4. Resolve the configured provider and execute
        provider_name = get_config_value("provider", "openai")
        provider = get_provider(provider_name)
        result = provider.run(rendered)
        
        # 5. Log execution trace to promptvc registry
        run_record = {
            "version": "v1",
            "output": result["output"],
            "tokens": result["tokens"],
            "timestamp": repo._utc_now_iso()
        }
        repo.storage.append_run("sentiment_analyzer", run_record)
        
        return {
            "sentiment": result["output"].strip(),
            "tokens_consumed": result["tokens"],
            "timestamp": run_record["timestamp"]
        }
        
    except Exception as exc:
        # Chain the original exception so the root cause stays in server logs
        raise HTTPException(status_code=500, detail=str(exc)) from exc

16. Roadmap

  • Remote Registry Sync: Implement commands to push and pull prompt spaces to cloud systems (PostgreSQL, S3) to support team environments.
  • Scoring Dashboard: Build local static site report generation detailing cost trends, latency performance, and test history graphs.
