CLI for prompt versioning and A/B testing across AI providers
PromptPilot
A fast, lightweight Python library and CLI tool for versioning, testing, and optimizing your AI prompts across multiple providers.
Quick Start
# Install from PyPI
pip install promptpilot
# Initialize a new prompt
promptpilot init my-summary --description "Summarize text in 3 paragraphs"
# Run an A/B test
promptpilot abtest my-summary --input sample.txt --provider openai
Features
- Version control for your AI prompts, with or without Git
- A/B testing to compare prompts based on token usage and response quality
- Multi-provider support for OpenAI, Claude, Llama, and HuggingFace
- Fast startup time with lazy loading of dependencies
- Live progress feedback during operations
- Extensible architecture for custom providers and formatters
- Python API for integration into your own workflows
Requirements
- Python 3.11+
- API keys for the services you want to use
Installation
pip install promptpilot
For development:
git clone https://github.com/doganarif/promptpilot.git
cd promptpilot
pip install -e .
Configuration
Create a .env file in your project root with your API keys:
# Required API keys (for the providers you use)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
LLAMA_API_KEY=llm-...
# Optional default models
PROMPTCTL_DEFAULT_MODEL=gpt-4o
PROMPTCTL_CLAUDE_MODEL=claude-3-opus-20240229
PROMPTCTL_LLAMA_MODEL=llama-3-70b-instruct
# Optional API URL for Llama
LLAMA_API_URL=https://api.llama-api.com
PromptPilot will automatically detect and use this file.
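Before a run, it can help to confirm that the file actually holds the keys your providers need. The following is a minimal sketch using only the standard library. The helpers `read_env_file` and `missing_keys` are hypothetical and not part of PromptPilot, and the provider-to-key mapping is an assumption based on the variable names in the example above.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Assumed mapping from provider name to key name (taken from the example above).
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "llama": "LLAMA_API_KEY",
}


def missing_keys(providers, env):
    """Return the key names that are absent or empty for the given providers."""
    return [PROVIDER_KEYS[p] for p in providers if not env.get(PROVIDER_KEYS[p])]


if __name__ == "__main__":
    env = read_env_file()
    print(missing_keys(["openai", "claude"], env))
```

An empty list means every key for the listed providers is present and non-empty.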
CLI Usage Examples
Initialize a new prompt
promptpilot init text-extractor \
  --description "Extract key information from documents" \
  --author "Jane Smith"
This creates prompts/text-extractor.yml with a template and metadata.
Version Management
# Save current prompt state as a new version
promptpilot save summary -m "Added more detailed instructions"
# Show difference between versions
promptpilot diff summary
# List all versions of a prompt
promptpilot list summary
# List all available prompts
promptpilot list
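To get a feel for what a comparison between two prompt versions looks like, here is a small sketch built on Python's `difflib`. It is not how `promptpilot diff` is implemented, and its output format may differ; `diff_prompts` is a hypothetical helper, and the two prompt strings are shortened stand-ins.

```python
import difflib


def diff_prompts(old, new, old_label="v1", new_label="v2"):
    """Return a unified diff between two prompt texts as a single string."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    )
    return "\n".join(lines)


old = "Summarize the following text in about 3 paragraphs:\n{text}"
new = "Create a concise summary of the following text.\n{text}"
print(diff_prompts(old, new))
```

Identical prompts produce an empty string, which makes the helper easy to use as a "has anything changed?" check.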
Run an A/B test
# Basic A/B test with OpenAI
promptpilot abtest summary --input article.txt --provider openai
# Testing with Claude
promptpilot abtest summary --input article.txt --provider claude --model claude-3-sonnet
# Include full responses in the output
promptpilot abtest summary --input article.txt --include-responses
# Output results in JSON format
promptpilot abtest summary --input article.txt --format json > results.json
Python API Examples
Basic A/B Testing
from promptpilot.utils import load_prompt_versions
from promptpilot.models import Prompt
from promptpilot.providers import get_provider
from promptpilot.runner import ABTestRunner
# Load two prompt versions
prev, curr = load_prompt_versions("summary")
# Create prompt objects
prompt_a = Prompt(name="summary_v1", template=prev, version=1)
prompt_b = Prompt(name="summary_v2", template=curr, version=2)
# Initialize provider (OpenAI by default)
provider = get_provider("openai", model="gpt-4o")
# Run the A/B test
runner = ABTestRunner(provider)
with open("article.txt", encoding="utf-8") as f:
    input_text = f.read()
result = runner.run_test(
    prompt_a=prompt_a,
    prompt_b=prompt_b,
    variables={"text": input_text},
    include_responses=True,
)
# Display results
runner.display_results(result)
# Get the winner
winner_name, token_count = runner.get_winner(result)
print(f"Winner: {winner_name} with {token_count} tokens")
Testing Multiple Prompt Variations
from promptpilot.models import Prompt
from promptpilot.providers import get_provider
from promptpilot.runner import MultiPromptTestRunner
# Create multiple prompt variants
prompts = [
    Prompt(name="concise",
           template="Summarize this text briefly:\n\n{text}",
           version=1),
    Prompt(name="detailed",
           template="Provide a comprehensive summary of the following text:\n\n{text}",
           version=2),
    Prompt(name="bullet_points",
           template="Extract the key points from this text as bullet points:\n\n{text}",
           version=3),
]
# Initialize provider
provider = get_provider("claude", model="claude-3-opus-20240229")
# Test all prompts with the same input
runner = MultiPromptTestRunner(provider)
with open("research_paper.txt", encoding="utf-8") as f:
    input_text = f.read()
result = runner.run_test(
    prompts=prompts,
    variables={"text": input_text},
)
# Display results
runner.display_results(result)
# Get prompts ranked by efficiency
ranked_prompts = runner.get_ranked_prompts(result)
print("Prompts ranked by token efficiency:", ranked_prompts)
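Ranking by token efficiency can be pictured as a sort on total token usage. The sketch below assumes that "efficiency" means the fewest prompt plus completion tokens; the README does not spell out the actual metric. `Usage` and `rank_by_tokens` are hypothetical names, not PromptPilot classes, and the token counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical per-prompt token record (not a PromptPilot class)."""
    name: str
    prompt_tokens: int
    completion_tokens: int

    @property
    def total(self):
        return self.prompt_tokens + self.completion_tokens


def rank_by_tokens(usages):
    """Return prompt names ordered from fewest to most total tokens."""
    # sorted() is stable, so ties keep their original order.
    return [u.name for u in sorted(usages, key=lambda u: u.total)]


# Made-up numbers for the three prompt variants defined above.
usages = [
    Usage("concise", prompt_tokens=120, completion_tokens=60),
    Usage("detailed", prompt_tokens=130, completion_tokens=290),
    Usage("bullet_points", prompt_tokens=125, completion_tokens=105),
]
print(rank_by_tokens(usages))  # ['concise', 'bullet_points', 'detailed']
```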
Batch Testing Across Multiple Inputs
from pathlib import Path
from promptpilot.models import Prompt, TestCase
from promptpilot.providers import get_provider
from promptpilot.runner import BatchTestRunner
# Create prompt variants
prompt_a = Prompt(name="generic",
                  template="Summarize this text:\n\n{text}",
                  version=1)
prompt_b = Prompt(name="domain_specific",
                  template="Summarize this scientific text for a general audience:\n\n{text}",
                  version=2)
# Create test cases with different inputs
test_cases = [
    TestCase(
        name="news_article",
        variables={"text": Path("news.txt").read_text(encoding="utf-8")},
        description="General news article",
    ),
    TestCase(
        name="research_paper",
        variables={"text": Path("research.txt").read_text(encoding="utf-8")},
        description="Scientific research paper",
    ),
    TestCase(
        name="technical_doc",
        variables={"text": Path("technical.txt").read_text(encoding="utf-8")},
        description="Technical documentation",
    ),
]
# Initialize provider
provider = get_provider("openai", model="gpt-4o")
# Run batch test
runner = BatchTestRunner(provider)
result = runner.run_batch_test(
    prompt_a=prompt_a,
    prompt_b=prompt_b,
    test_cases=test_cases,
)
# Display results
runner.display_results(result)
# Get overall winner
overall_winner = runner.get_overall_winner(result)
print(f"Overall best prompt: {overall_winner.name}")
# Get best prompt for a specific case
best_for_research = runner.get_best_prompt_for_case(result, "research_paper")
print(f"Best prompt for research papers: {best_for_research}")
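One plausible way to aggregate a batch is to pick the cheapest prompt per test case, then name the prompt that wins the most cases as the overall winner. The sketch below follows that rule and breaks ties by total tokens. This is an assumption, not PromptPilot's documented scoring; the functions are hypothetical and the token counts are made up.

```python
from collections import Counter


def best_for_case(case_tokens):
    """Name of the prompt with the fewest tokens for one test case."""
    return min(case_tokens, key=case_tokens.get)


def overall_winner(batch):
    """Prompt that wins the most cases; ties are broken by lowest total tokens."""
    wins = Counter(best_for_case(tokens) for tokens in batch.values())
    totals = Counter()
    for tokens in batch.values():
        totals.update(tokens)  # adds each prompt's tokens to its running total
    return min(totals, key=lambda name: (-wins[name], totals[name]))


# Made-up token counts: {test case: {prompt name: total tokens}}.
batch = {
    "news_article": {"generic": 210, "domain_specific": 260},
    "research_paper": {"generic": 340, "domain_specific": 300},
    "technical_doc": {"generic": 280, "domain_specific": 295},
}
print(best_for_case(batch["research_paper"]))  # domain_specific
print(overall_winner(batch))                   # generic
```

A prompt can lose overall and still be the best choice for one kind of input, which is why the per-case lookup is worth keeping alongside the overall result.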
Project Structure
my-project/
├── .env                    # API keys and config
├── prompts/                # Directory for prompt templates
│   ├── summary.yml         # Example prompt
│   ├── extractor.yml       # Another prompt
│   └── versions/           # Version history
│       └── summary/        # Directory for summary versions
│           ├── v1_1714042811.yml
│           └── v2_1714042897.yml
├── inputs/                 # Test inputs
│   ├── article.txt
│   └── paper.txt
└── scripts/                # Your Python scripts
    └── batch_test.py
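The version files in the tree above follow a `v<number>_<digits>.yml` pattern. The sketch below lists them in order for your own scripts. It assumes the digits are a Unix timestamp in seconds, which the README does not confirm; `parse_version_file` and `list_versions` are hypothetical helpers, not part of PromptPilot.

```python
import re
from datetime import datetime, timezone
from pathlib import Path

VERSION_FILE = re.compile(r"^v(\d+)_(\d+)\.yml$")


def parse_version_file(filename):
    """Split 'v2_1714042897.yml' into (version, UTC datetime); None if no match."""
    match = VERSION_FILE.match(filename)
    if match is None:
        return None
    version = int(match.group(1))
    # Assumption: the numeric suffix is seconds since the Unix epoch.
    saved_at = datetime.fromtimestamp(int(match.group(2)), tz=timezone.utc)
    return version, saved_at


def list_versions(prompt_name, root="prompts/versions"):
    """Return (version, saved_at, path) tuples for a prompt, lowest version first."""
    folder = Path(root) / prompt_name
    if not folder.is_dir():
        return []
    found = []
    for path in folder.iterdir():
        parsed = parse_version_file(path.name)
        if parsed is not None:
            found.append((*parsed, path))
    return sorted(found, key=lambda item: item[0])


if __name__ == "__main__":
    for version, saved_at, path in list_versions("summary"):
        print(version, saved_at.isoformat(), path)
```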
Prompt File Format
Each prompt is stored as a YAML file:
name: summary
description: Summarize text in three paragraphs
version: 2
author: Jane Smith
created: 2025-04-21
updated: 2025-04-21
prompt: |
  Create a concise summary of the following text. Focus on the main points
  and key details. Use about 3 paragraphs and make it engaging:
  {text}
variables:
  - name: text
    description: Input text for the prompt
    required: true
metadata:
  recommended_models: ["gpt-4o", "claude-3-opus"]
  token_estimate:
    input_multiplier: 1.0
    base_tokens: 50
history:
  - version: 1
    date: 2025-04-21
    prompt: |
      Summarize the following text in about 3 paragraphs:
      {text}
    message: "Initial version"
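The `{text}` placeholder and the `variables` list suggest a simple fill-in step before the prompt is sent to a provider. The sketch below shows how such a step could work with Python's built-in `str.format`. Whether PromptPilot substitutes variables this way is an assumption; `template_fields` and `render` are hypothetical helpers.

```python
import string


def template_fields(template):
    """Names of the {placeholders} that appear in a template."""
    parsed = string.Formatter().parse(template)
    return {name for _, name, _, _ in parsed if name}


def render(template, variables):
    """Fill a template, raising a clear error when a placeholder has no value."""
    missing = template_fields(template) - set(variables)
    if missing:
        raise KeyError(f"missing variables: {sorted(missing)}")
    return template.format(**variables)


template = "Summarize the following text in about 3 paragraphs:\n{text}"
print(render(template, {"text": "PromptPilot versions prompts."}))
```

Under `str.format` rules, a prompt that needs literal braces (for example, a JSON snippet) would have to double them as `{{` and `}}`.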
Best Practices
- Make small, incremental changes between prompt versions to isolate effects
- Use consistent test inputs to ensure fair comparisons
- Save versions regularly using promptpilot save to track your improvements
- Consider both token efficiency and output quality in your evaluations
- Use descriptive prompt and test case names for better organization
Troubleshooting
- API Key Issues: Ensure your .env file contains the correct API keys
- Slow Startup: Try updating to the latest version with pip install -U promptpilot
- Import Errors: Install missing dependencies with pip install 'promptpilot[all]'
- Provider Not Found: Check that you've specified a supported provider name
- Version History Issues: Use promptpilot save to explicitly save versions
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contact
Arif Dogan - me@arif.sh
Project Link: https://github.com/doganarif/promptpilot