CLI for prompt versioning and A/B testing across AI providers

These details have not been verified by PyPI

Project description

PromptPilot

A fast, lightweight Python library and CLI tool for versioning, testing, and optimizing your AI prompts across multiple providers.

🚀 Quick Start

# Install from PyPI
pip install promptpilot

# Initialize a new prompt
promptpilot init my-summary --description "Summarize text in 3 paragraphs"

# Run an A/B test
promptpilot abtest my-summary --input sample.txt --provider openai

✨ Features

Version control for your AI prompts, with or without Git
A/B testing to compare prompts based on token usage and response quality
Multi-provider support for OpenAI, Claude, Llama, and HuggingFace
Fast startup time with lazy loading of dependencies
Live progress feedback during operations
Extensible architecture for custom providers and formatters
Python API for integration into your own workflows

📋 Requirements

Python 3.11+
API keys for the services you want to use

🔧 Installation

pip install promptpilot

For development:

git clone https://github.com/doganarif/promptpilot.git
cd promptpilot
pip install -e .

⚙️ Configuration

Create a .env file in your project root with your API keys:

# Required API keys (for the providers you use)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
LLAMA_API_KEY=llm-...

# Optional default models
PROMPTCTL_DEFAULT_MODEL=gpt-4o
PROMPTCTL_CLAUDE_MODEL=claude-3-opus-20240229
PROMPTCTL_LLAMA_MODEL=llama-3-70b-instruct

# Optional API URL for Llama
LLAMA_API_URL=https://api.llama-api.com

PromptPilot will automatically detect and use this file.

📖 CLI Usage Examples

Initialize a new prompt

promptpilot init text-extractor \
  --description "Extract key information from documents" \
  --author "Jane Smith"

This creates prompts/text-extractor.yml with a template and metadata.

Version Management

# Save current prompt state as a new version
promptpilot save summary -m "Added more detailed instructions"

# Show difference between versions
promptpilot diff summary

# List all versions of a prompt
promptpilot list summary

# List all available prompts
promptpilot list

Run an A/B test

# Basic A/B test with OpenAI
promptpilot abtest summary --input article.txt --provider openai

# Testing with Claude
promptpilot abtest summary --input article.txt --provider claude --model claude-3-sonnet

# Include full responses in the output
promptpilot abtest summary --input article.txt --include-responses

# Output results in JSON format
promptpilot abtest summary --input article.txt --format json > results.json

💻 Python API Examples

Basic A/B Testing

from promptpilot.utils import load_prompt_versions
from promptpilot.models import Prompt
from promptpilot.providers import get_provider
from promptpilot.runner import ABTestRunner

# Load two prompt versions
prev, curr = load_prompt_versions("summary")

# Create prompt objects
prompt_a = Prompt(name="summary_v1", template=prev, version=1)
prompt_b = Prompt(name="summary_v2", template=curr, version=2)

# Initialize provider (OpenAI by default)
provider = get_provider("openai", model="gpt-4o")

# Run the A/B test
runner = ABTestRunner(provider)
input_text = open("article.txt").read()

result = runner.run_test(
    prompt_a=prompt_a,
    prompt_b=prompt_b,
    variables={"text": input_text},
    include_responses=True
)

# Display results
runner.display_results(result)

# Get the winner
winner_name, token_count = runner.get_winner(result)
print(f"Winner: {winner_name} with {token_count} tokens")

Testing Multiple Prompt Variations

from promptpilot.models import Prompt
from promptpilot.providers import get_provider
from promptpilot.runner import MultiPromptTestRunner

# Create multiple prompt variants
prompts = [
    Prompt(name="concise", 
           template="Summarize this text briefly:\n\n{text}", 
           version=1),
    Prompt(name="detailed", 
           template="Provide a comprehensive summary of the following text:\n\n{text}", 
           version=2),
    Prompt(name="bullet_points", 
           template="Extract the key points from this text as bullet points:\n\n{text}", 
           version=3)
]

# Initialize provider
provider = get_provider("claude", model="claude-3-opus-20240229")

# Test all prompts with the same input
runner = MultiPromptTestRunner(provider)
input_text = open("research_paper.txt").read()

result = runner.run_test(
    prompts=prompts,
    variables={"text": input_text}
)

# Display results
runner.display_results(result)

# Get prompts ranked by efficiency
ranked_prompts = runner.get_ranked_prompts(result)
print("Prompts ranked by token efficiency:", ranked_prompts)

Batch Testing Across Multiple Inputs

from promptpilot.models import Prompt, TestCase
from promptpilot.providers import get_provider
from promptpilot.runner import BatchTestRunner

# Create prompt variants
prompt_a = Prompt(name="generic", 
                  template="Summarize this text:\n\n{text}", 
                  version=1)
                  
prompt_b = Prompt(name="domain_specific", 
                  template="Summarize this scientific text for a general audience:\n\n{text}", 
                  version=2)

# Create test cases with different inputs
test_cases = [
    TestCase(
        name="news_article",
        variables={"text": open("news.txt").read()},
        description="General news article"
    ),
    TestCase(
        name="research_paper",
        variables={"text": open("research.txt").read()},
        description="Scientific research paper"
    ),
    TestCase(
        name="technical_doc",
        variables={"text": open("technical.txt").read()},
        description="Technical documentation"
    )
]

# Initialize provider
provider = get_provider("openai", model="gpt-4o")

# Run batch test
runner = BatchTestRunner(provider)
result = runner.run_batch_test(
    prompt_a=prompt_a,
    prompt_b=prompt_b,
    test_cases=test_cases
)

# Display results
runner.display_results(result)

# Get overall winner
overall_winner = runner.get_overall_winner(result)
print(f"Overall best prompt: {overall_winner.name}")

# Get best prompt for a specific case
best_for_research = runner.get_best_prompt_for_case(result, "research_paper")
print(f"Best prompt for research papers: {best_for_research}")

📁 Project Structure

my-project/
├── .env                 # API keys and config
├── prompts/             # Directory for prompt templates
│   ├── summary.yml      # Example prompt
│   ├── extractor.yml    # Another prompt
│   └── versions/        # Version history
│       └── summary/     # Directory for summary versions 
│           ├── v1_1714042811.yml
│           └── v2_1714042897.yml
├── inputs/              # Test inputs
│   ├── article.txt
│   └── paper.txt
└── scripts/             # Your Python scripts
    └── batch_test.py

🔍 Prompt File Format

Each prompt is stored as a YAML file:

name: summary
description: Summarize text in three paragraphs
version: 2
author: Jane Smith
created: 2025-04-21
updated: 2025-04-21

prompt: |
  Create a concise summary of the following text. Focus on the main points
  and key details. Use about 3 paragraphs and make it engaging:
  
  {text}

variables:
  - name: text
    description: Input text for the prompt
    required: true

metadata:
  recommended_models: ["gpt-4o", "claude-3-opus"]
  token_estimate:
    input_multiplier: 1.0
    base_tokens: 50

history:
  - version: 1
    date: 2025-04-21
    prompt: |
      Summarize the following text in about 3 paragraphs:
      
      {text}
    message: "Initial version"

💡 Best Practices

Make small, incremental changes between prompt versions to isolate effects
Use consistent test inputs to ensure fair comparisons
Save versions regularly using promptpilot save to track your improvements
Consider both token efficiency and output quality in your evaluations
Use descriptive prompt and test case names for better organization

🛠️ Troubleshooting

API Key Issues: Ensure your .env file contains the correct API keys
Slow Startup: Try updating to the latest version with pip install -U promptpilot
Import Errors: Install missing dependencies with pip install 'promptpilot[all]'
Provider Not Found: Check that you've specified a supported provider name
Version History Issues: Use promptpilot save to explicitly save versions

📚 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📧 Contact

Arif Dogan - me@arif.sh

Project Link: https://github.com/doganarif/promptpilot

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.0.1

Apr 21, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

promptpilot_cli-0.0.1.tar.gz (24.8 kB view details)

Uploaded Apr 21, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

promptpilot_cli-0.0.1-py3-none-any.whl (31.8 kB view details)

Uploaded Apr 21, 2025 Python 3

File details

Details for the file promptpilot_cli-0.0.1.tar.gz.

File metadata

Download URL: promptpilot_cli-0.0.1.tar.gz
Upload date: Apr 21, 2025
Size: 24.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.11.0 Darwin/24.4.0

File hashes

Hashes for promptpilot_cli-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`474753c7431af638748844d1bf5b53f3194f125519beb587cfd0eb9cebc210f3`
MD5	`44aef63b0eb1da5c168fc6f6d3991b96`
BLAKE2b-256	`d9870a595d1ac453ea85748068c6b085f62d44a2aba190cf076b3b8fe724cf9c`

See more details on using hashes here.

File details

Details for the file promptpilot_cli-0.0.1-py3-none-any.whl.

File metadata

Download URL: promptpilot_cli-0.0.1-py3-none-any.whl
Upload date: Apr 21, 2025
Size: 31.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.11.0 Darwin/24.4.0

File hashes

Hashes for promptpilot_cli-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`787b82e73c1ddb8603a104f1f1d2da35a8df5f9977bdccc0196008d54de7b4eb`
MD5	`103075b964f31d26913ca4b96102a90f`
BLAKE2b-256	`407bdffbe3c66f0bc04b697fb8091e73d764459549b981c5184e3c759edfb097`

See more details on using hashes here.

promptpilot-cli 0.0.1

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

PromptPilot

🚀 Quick Start

✨ Features

📋 Requirements

🔧 Installation

⚙️ Configuration

📖 CLI Usage Examples

Initialize a new prompt

Version Management

Run an A/B test

💻 Python API Examples

Basic A/B Testing

Testing Multiple Prompt Variations

Batch Testing Across Multiple Inputs

📁 Project Structure

🔍 Prompt File Format

💡 Best Practices

🛠️ Troubleshooting

📚 Contributing

📄 License

📧 Contact

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes