prompt-optimizer-cli
A CLI tool and Python library for optimizing LLM prompts through systematic testing, version control, and performance metrics. Think of it as "pytest for prompts": test multiple prompt variations, measure quality, and automatically select the best performer.
Features
- Prompt Testing: Run multiple prompt variations against test cases
- Quality Metrics: Score outputs on accuracy, conciseness, tone, and cost
- LLM-as-Judge: AI-powered evaluation using any LLM as a judge
- Prometheus Metrics: Built-in observability through a Prometheus-compatible /metrics endpoint
- Version Control: Track prompt evolution with history and diffs
- Auto-Selection: Identify and select the best-performing prompt variant
- CLI & Library: Use as a command-line tool or Python import
- Multi-LLM Support: Works with Anthropic Claude, OpenAI GPT, and local Ollama models
Quick Start
# Install from PyPI
pip install prompt-optimizer-cli
# Initialize a project
prompt-optimizer init
# Optimize a prompt
prompt-optimizer optimize prompts/example.yaml \
--test-cases tests/example_tests.yaml \
--strategies concise,detailed \
--llm claude-sonnet-4 \
--output results.json
Installation
From PyPI
pip install prompt-optimizer-cli
From Source
git clone https://github.com/kmcallorum/prompt-optimizer.git
cd prompt-optimizer
pip install -e .
With Development Dependencies
pip install -e ".[dev]"
Using Docker
docker-compose build
docker-compose run prompt-optimizer --help
Usage
CLI Commands
# Initialize new project with example files
prompt-optimizer init
# Test a prompt against test cases
prompt-optimizer test prompt.yaml --test-cases tests.yaml --llm claude-sonnet-4
# Optimize with multiple strategies
prompt-optimizer optimize prompt.yaml \
--strategies concise,detailed,cot \
--test-cases tests.yaml \
--llm claude-sonnet-4 \
--output results.json
# Use LLM-as-judge for AI-powered evaluation
prompt-optimizer optimize prompt.yaml \
--test-cases tests.yaml \
--llm claude-sonnet-4 \
--judge gpt-4o \
--output results.json
# Compare two prompts
prompt-optimizer compare prompt1.yaml prompt2.yaml --test-cases tests.yaml
# View prompt history
prompt-optimizer history my-prompt
# Generate report from results
prompt-optimizer report results.json --format html --output report.html
# Display a prompt file
prompt-optimizer show prompt.yaml
Python Library
from prompt_optimizer import Prompt, TestCase, optimize_prompt
# Define a prompt
prompt = Prompt(
    template="Summarize this text in {{ length }}: {{ text }}",
    variables={"length": "one sentence", "text": ""},
    system_message="You are a helpful summarization assistant.",
    name="summarizer",
)
# Define test cases
test_cases = [
    TestCase(
        input_variables={
            "text": "Long article text here...",
            "length": "one sentence",
        },
        expected_properties={"length": "<30 words"},
    )
]
# Run optimization
results = optimize_prompt(
    prompt,
    test_cases,
    strategies=["concise", "detailed"],
    llm="claude-sonnet-4",
)
print(f"Best variant: {results.best_variant.strategy}")
print(f"Score: {results.best_weighted_score:.2%}")
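The example above reports a best variant and a weighted score. How the library combines per-criterion scores is not documented here, so the following is only a minimal sketch of one plausible approach: a weighted mean over criteria, followed by picking the maximum. The `WEIGHTS` values, the `VariantScores` class, and both helper functions are hypothetical and are not part of the `prompt_optimizer` API.

```python
from dataclasses import dataclass

# Hypothetical weights; the library's real weighting may differ.
WEIGHTS = {"accuracy": 0.5, "conciseness": 0.3, "includes": 0.2}


@dataclass
class VariantScores:
    """Illustrative container: one strategy and its per-criterion scores."""
    strategy: str
    scores: dict  # criterion name -> score in [0, 1]


def weighted_score(scores, weights=WEIGHTS):
    """Weighted mean over the criteria that have both a score and a weight."""
    total_weight = sum(weights[name] for name in scores if name in weights)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(weights[name] * value
                       for name, value in scores.items() if name in weights)
    return weighted_sum / total_weight


def pick_best(variants):
    """Return the variant with the highest weighted score."""
    return max(variants, key=lambda v: weighted_score(v.scores))


variants = [
    VariantScores("concise", {"accuracy": 0.8, "conciseness": 0.9, "includes": 1.0}),
    VariantScores("detailed", {"accuracy": 0.9, "conciseness": 0.4, "includes": 1.0}),
]
best = pick_best(variants)
print(f"Best variant: {best.strategy}")
print(f"Score: {weighted_score(best.scores):.2%}")
```

Normalizing by the total weight of the criteria actually present keeps scores comparable when a test case defines only some of the expected properties.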
File Formats
Prompt File (YAML)
template: |
  Answer the following question: {{ question }}
  Requirements:
  - Be concise
  - Be accurate
system_message: "You are a helpful AI assistant."
variables:
  question: ""
metadata:
  author: "developer"
  version: "1.0"
  tags: ["qa", "concise"]
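The `{{ question }}` placeholder in the template is filled from the test case's input variables. The placeholder syntax resembles Jinja, but which engine the library uses is not stated here, so this is a sketch of the substitution step using only a regular expression. The `render` function and `PLACEHOLDER` pattern are hypothetical names.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, variables):
    """Replace each {{ name }} with its value; fail loudly on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing template variable: {name}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


template = "Answer the following question: {{ question }}"
rendered = render(template, {"question": "What is the capital of France?"})
print(rendered)
```

Raising on a missing variable, rather than leaving the placeholder in place, surfaces typos in test files before any LLM call is made.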
Test Cases (YAML)
name: "QA Test Suite"
test_cases:
  - input_variables:
      question: "What is the capital of France?"
    expected_output: "Paris"
    expected_properties:
      tone: "neutral"
      length: "<20 words"
  - input_variables:
      question: "Explain quantum computing"
    expected_properties:
      length: "50-150 words"
      includes: ["qubits", "superposition"]
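The `length` property above appears in two forms: an upper bound (`"<20 words"`) and a range (`"50-150 words"`). As an illustration of how such constraints could be checked, here is a small sketch that handles exactly those two forms. The `check_length` function is hypothetical, and whether the library treats range bounds as inclusive is an assumption.

```python
import re


def check_length(response, constraint):
    """Return True if the response's word count satisfies the constraint."""
    words = len(response.split())

    # Form 1: "<N words" is a strict upper bound.
    upper = re.fullmatch(r"<\s*(\d+)\s*words", constraint)
    if upper:
        return words < int(upper.group(1))

    # Form 2: "A-B words" is treated here as an inclusive range (assumption).
    span = re.fullmatch(r"(\d+)\s*-\s*(\d+)\s*words", constraint)
    if span:
        low, high = int(span.group(1)), int(span.group(2))
        return low <= words <= high

    raise ValueError(f"unrecognized length constraint: {constraint!r}")


print(check_length("Paris", "<20 words"))
print(check_length("Paris", "50-150 words"))
```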
Supported LLMs
| Provider | Models | Environment Variable |
|---|---|---|
| Anthropic | claude-sonnet-4, claude-opus-4 | ANTHROPIC_API_KEY |
| OpenAI | gpt-4o, gpt-4-turbo, gpt-3.5-turbo | OPENAI_API_KEY |
| Ollama | llama3, mistral, etc. | N/A (local) |
Specify the LLM with the --llm flag:
prompt-optimizer optimize prompt.yaml --llm claude-sonnet-4
prompt-optimizer optimize prompt.yaml --llm gpt-4o
prompt-optimizer optimize prompt.yaml --llm ollama:llama3
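The `--llm` values above suggest a convention: a bare model name for hosted providers and a `provider:model` prefix for Ollama. The sketch below shows how a model string might be mapped to a provider and its required environment variable, based only on the table above. The function names and the prefix-matching rules are assumptions, not the tool's actual resolution logic.

```python
import os

# Environment variables taken from the table above; Ollama needs none.
REQUIRED_ENV = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}


def resolve_provider(llm):
    """Split a model string into (provider, model). Prefix rules are a guess."""
    if ":" in llm:
        provider, model = llm.split(":", 1)
        return provider, model
    if llm.startswith("claude"):
        return "anthropic", llm
    if llm.startswith("gpt"):
        return "openai", llm
    raise ValueError(f"cannot infer provider for model: {llm!r}")


def missing_api_key(llm, env=None):
    """Return the name of the unset API key variable, or None if nothing is missing."""
    env = os.environ if env is None else env
    provider, _ = resolve_provider(llm)
    var = REQUIRED_ENV.get(provider)
    if var and not env.get(var):
        return var
    return None


print(resolve_provider("ollama:llama3"))
print(missing_api_key("gpt-4o", env={}))
```

Checking for the key before running a long optimization avoids failing partway through a test suite.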
Optimization Strategies
| Strategy | Description |
|---|---|
| concise | Makes responses shorter and more direct |
| detailed | Adds context and thorough explanations |
| cot | Adds chain-of-thought reasoning |
| structured | Formats output with sections and bullet points |
| few_shot | Adds example-based prompting |
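One simple way to picture a strategy is as a transformation that turns a base template into a variant. The sketch below appends an instruction to the template for four of the strategies listed above. The instruction texts and the `apply_strategy` function are invented for illustration; the library's real transformations are not shown in this document and may work differently. `few_shot` is left out because it would also need example data.

```python
# Hypothetical instruction text per strategy; not the library's wording.
STRATEGY_SUFFIXES = {
    "concise": "Respond as briefly and directly as possible.",
    "detailed": "Include relevant context and explain your answer thoroughly.",
    "cot": "Think through the problem step by step before answering.",
    "structured": "Format the answer with section headings and bullet points.",
}


def apply_strategy(template, strategy):
    """Return a prompt variant: the base template plus a strategy instruction."""
    if strategy not in STRATEGY_SUFFIXES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return f"{template.rstrip()}\n\n{STRATEGY_SUFFIXES[strategy]}"


base = "Answer the following question: {{ question }}"
variant = apply_strategy(base, "cot")
print(variant)
```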
Evaluation Criteria
Built-in scoring functions:
- accuracy: Compares output to expected result using sequence matching
- conciseness: Scores based on word count and length constraints
- includes: Checks for required keywords in response
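To make the descriptions above concrete, here is a rough sketch of an accuracy scorer based on sequence matching and a keyword-inclusion scorer, using only the standard library. These are illustrative stand-ins, not the library's built-in functions: the use of `difflib.SequenceMatcher`, the lowercasing, and the function names are all assumptions.

```python
from difflib import SequenceMatcher


def accuracy_score(response, expected):
    """Similarity ratio in [0, 1] between response and expected output."""
    a = response.strip().lower()
    b = expected.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def includes_score(response, keywords):
    """Fraction of required keywords found in the response (case-insensitive)."""
    if not keywords:
        return 1.0
    text = response.lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in text)
    return hits / len(keywords)


print(accuracy_score("Paris", "paris"))
print(includes_score("Qubits are the basic unit.", ["qubits", "superposition"]))
```

Sequence matching rewards near-verbatim answers, so a correct but differently worded response can score low on accuracy; that is one reason to combine it with other criteria.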
Custom evaluators can be added:
from prompt_optimizer import TestCase
from prompt_optimizer.evaluator import EVALUATORS
def custom_scorer(response: str, test_case: TestCase) -> float:
    # Your scoring logic
    return 0.8
# Register the scorer under the name "custom"
EVALUATORS["custom"] = custom_scorer
LLM-as-Judge
Use an LLM to evaluate response quality instead of rule-based scoring:
# Use GPT-4o as judge while testing with Claude
prompt-optimizer optimize prompt.yaml \
--test-cases tests.yaml \
--llm claude-sonnet-4 \
--judge gpt-4o
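The equivalent call from Python passes judge_llm:
from prompt_optimizer import optimize_prompt, Prompt, TestCase
# my_prompt is a Prompt and test_cases is a list of TestCase objects defined elsewhere
results = optimize_prompt(
    prompt=my_prompt,
    test_cases=test_cases,
    llm="claude-sonnet-4",
    judge_llm="gpt-4o",  # AI-based evaluation
)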
The LLM judge evaluates responses on:
- accuracy - How well the response matches expected output
- relevance - How on-topic the response is
- coherence - How well-structured and logical the response is
- completeness - Whether all aspects of the prompt are addressed
- conciseness - Whether the response is appropriately brief
Prometheus Metrics
Built-in observability for production deployments:
# Start metrics server
prompt-optimizer metrics --port 8000
# Metrics available at http://localhost:8000/metrics
Metrics can also be served from Python:
from prompt_optimizer import init_metrics, optimize_prompt, start_http_server
# Initialize and start metrics server
init_metrics()
start_http_server(8000)
# Run optimizations - metrics are automatically recorded
results = optimize_prompt(...)
Available metrics:
- prompt_optimizer_optimizations_total - Total optimization runs
- prompt_optimizer_optimization_duration_seconds - Optimization duration
- prompt_optimizer_variants_evaluated_total - Variants evaluated
- prompt_optimizer_test_cases_run_total - Test cases run
- prompt_optimizer_llm_requests_total - LLM API requests
- prompt_optimizer_llm_tokens_total - Tokens used (input/output)
- prompt_optimizer_llm_cost_usd_total - Total cost in USD
- prompt_optimizer_best_variant_score - Best variant score
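These counters are served in the Prometheus text exposition format, which is easy to read without a full Prometheus setup. The sketch below parses a scrape and derives the average cost per optimization run. The sample text is made up for illustration: the numeric values and the `direction` label name are assumptions, not output captured from the tool.

```python
def parse_metrics(text):
    """Sum sample values per metric name, across all label sets.

    Summing is meaningful for counters; it assumes no trailing timestamps.
    """
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(None, 1)
        name = series.split("{", 1)[0]  # drop the {label="..."} part
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


# Illustrative scrape; values and label names are invented.
SAMPLE = """\
# HELP prompt_optimizer_optimizations_total Total optimization runs
# TYPE prompt_optimizer_optimizations_total counter
prompt_optimizer_optimizations_total 4
prompt_optimizer_llm_tokens_total{direction="input"} 1200
prompt_optimizer_llm_tokens_total{direction="output"} 300
prompt_optimizer_llm_cost_usd_total 0.5
"""

totals = parse_metrics(SAMPLE)
cost_per_run = (totals["prompt_optimizer_llm_cost_usd_total"]
                / totals["prompt_optimizer_optimizations_total"])
print(f"Average cost per optimization: ${cost_per_run:.3f}")
```

To use this against a running server, the text could be fetched from http://localhost:8000/metrics with `urllib.request.urlopen` and passed to `parse_metrics`.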
Configuration
Environment variables:
export ANTHROPIC_API_KEY=your-api-key
export OPENAI_API_KEY=your-api-key
Development
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run with coverage
pytest --cov=src/prompt_optimizer --cov-report=html
# Lint
ruff check src tests
# Type check
mypy src
Project Structure
prompt-optimizer/
├── src/prompt_optimizer/
│   ├── __init__.py
│   ├── cli.py          # Click-based CLI
│   ├── core.py         # Core optimization logic
│   ├── prompt.py       # Prompt models
│   ├── evaluator.py    # Scoring functions
│   ├── storage.py      # Version control
│   ├── reporters.py    # Result reporting
│   └── llm_clients/    # LLM integrations
├── tests/
├── examples/
├── Dockerfile
└── docker-compose.yml
License
MIT