prompt-optimizer-cli

Requires Python 3.11+. License: MIT.

A CLI tool and Python library for optimizing LLM prompts through systematic testing, version control, and performance metrics. Think "pytest for prompts": test multiple prompt variations, measure quality, and automatically select the best performer.

Features

  • Prompt Testing: Run multiple prompt variations against test cases
  • Quality Metrics: Score outputs on accuracy, conciseness, tone, and cost
  • LLM-as-Judge: AI-powered evaluation using any LLM as a judge
  • Prometheus Metrics: Built-in observability with Prometheus metrics
  • Version Control: Track prompt evolution with history and diffs
  • Auto-Selection: Identify and select the best-performing prompt variant
  • CLI & Library: Use as a command-line tool or Python import
  • Multi-LLM Support: Works with Anthropic Claude, OpenAI GPT, and local Ollama models

Quick Start

# Install from PyPI
pip install prompt-optimizer-cli

# Initialize a project
prompt-optimizer init

# Optimize a prompt
prompt-optimizer optimize prompts/example.yaml \
    --test-cases tests/example_tests.yaml \
    --strategies concise,detailed \
    --llm claude-sonnet-4 \
    --output results.json

Installation

From PyPI

pip install prompt-optimizer-cli

From Source

git clone https://github.com/kmcallorum/prompt-optimizer.git
cd prompt-optimizer
pip install -e .

With Development Dependencies

pip install -e ".[dev]"

Using Docker

docker-compose build
docker-compose run prompt-optimizer --help

Usage

CLI Commands

# Initialize new project with example files
prompt-optimizer init

# Test a prompt against test cases
prompt-optimizer test prompt.yaml --test-cases tests.yaml --llm claude-sonnet-4

# Optimize with multiple strategies
prompt-optimizer optimize prompt.yaml \
    --strategies concise,detailed,cot \
    --test-cases tests.yaml \
    --llm claude-sonnet-4 \
    --output results.json

# Use LLM-as-judge for AI-powered evaluation
prompt-optimizer optimize prompt.yaml \
    --test-cases tests.yaml \
    --llm claude-sonnet-4 \
    --judge gpt-4o \
    --output results.json

# Compare two prompts
prompt-optimizer compare prompt1.yaml prompt2.yaml --test-cases tests.yaml

# View prompt history
prompt-optimizer history my-prompt

# Generate report from results
prompt-optimizer report results.json --format html --output report.html

# Display a prompt file
prompt-optimizer show prompt.yaml

Python Library

from prompt_optimizer import Prompt, TestCase, optimize_prompt

# Define a prompt
prompt = Prompt(
    template="Summarize this text in {{ length }}: {{ text }}",
    variables={"length": "one sentence", "text": ""},
    system_message="You are a helpful summarization assistant.",
    name="summarizer",
)

# Define test cases
test_cases = [
    TestCase(
        input_variables={
            "text": "Long article text here...",
            "length": "one sentence"
        },
        expected_properties={"length": "<30 words"}
    )
]

# Run optimization
results = optimize_prompt(
    prompt,
    test_cases,
    strategies=["concise", "detailed"],
    llm="claude-sonnet-4"
)

print(f"Best variant: {results.best_variant.strategy}")
print(f"Score: {results.best_weighted_score:.2%}")
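
The template above uses `{{ name }}` placeholders that are filled from `input_variables`. The README does not show how the library renders them, so the following is only a minimal stand-in that uses a regular expression; `render_template` and `PLACEHOLDER` are hypothetical names, not part of prompt_optimizer.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, variables: dict) -> str:
    """Replace each {{ name }} placeholder with its value from `variables`."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            # Fail loudly rather than send a half-filled prompt to an LLM.
            raise KeyError(f"missing template variable: {name}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


rendered = render_template(
    "Summarize this text in {{ length }}: {{ text }}",
    {"length": "one sentence", "text": "Long article text here..."},
)
print(rendered)
# Summarize this text in one sentence: Long article text here...
```

Raising on a missing variable is a design choice of this sketch; the library may instead fall back to the defaults declared in `variables`.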

File Formats

Prompt File (YAML)

template: |
  Answer the following question: {{ question }}

  Requirements:
  - Be concise
  - Be accurate

system_message: "You are a helpful AI assistant."

variables:
  question: ""

metadata:
  author: "developer"
  version: "1.0"
  tags: ["qa", "concise"]

Test Cases (YAML)

name: "QA Test Suite"

test_cases:
  - input_variables:
      question: "What is the capital of France?"
    expected_output: "Paris"
    expected_properties:
      tone: "neutral"
      length: "<20 words"

  - input_variables:
      question: "Explain quantum computing"
    expected_properties:
      length: "50-150 words"
      includes: ["qubits", "superposition"]
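
The `length` properties above come in two shapes, an upper bound ("<20 words") and a range ("50-150 words"). As a rough sketch of how such a constraint could be checked, assuming "<N" is a strict bound and ranges are inclusive (the README does not specify either), with hypothetical function names:

```python
import re


def parse_length_constraint(spec: str) -> tuple:
    """Return inclusive (min_words, max_words) bounds for a constraint string."""
    spec = spec.strip()
    if m := re.fullmatch(r"<\s*(\d+)\s*words?", spec):
        # "<20 words" is treated as strict, i.e. at most 19 words.
        return 0, int(m.group(1)) - 1
    if m := re.fullmatch(r"(\d+)\s*-\s*(\d+)\s*words?", spec):
        # "50-150 words" is treated as an inclusive range.
        return int(m.group(1)), int(m.group(2))
    raise ValueError(f"unrecognized length constraint: {spec!r}")


def satisfies_length(response: str, spec: str) -> bool:
    """Check a response's whitespace-separated word count against the bounds."""
    low, high = parse_length_constraint(spec)
    count = len(response.split())
    return low <= count <= high


print(satisfies_length("Paris", "<20 words"))     # True
print(satisfies_length("Paris", "50-150 words"))  # False
```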

Supported LLMs

Provider    Models                               Environment Variable
Anthropic   claude-sonnet-4, claude-opus-4       ANTHROPIC_API_KEY
OpenAI      gpt-4o, gpt-4-turbo, gpt-3.5-turbo   OPENAI_API_KEY
Ollama      llama3, mistral, etc.                N/A (local)

Specify the LLM with the --llm flag:

prompt-optimizer optimize prompt.yaml --llm claude-sonnet-4
prompt-optimizer optimize prompt.yaml --llm gpt-4o
prompt-optimizer optimize prompt.yaml --llm ollama:llama3
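
Before a long run it can be useful to confirm that the API key for the chosen model is set. The sketch below maps a model string to a provider using simple prefix checks and looks up the variable from the table above. The prefix heuristic and the names `resolve_provider` and `missing_env_var` are assumptions for illustration; the library's own resolution logic is not shown in this README.

```python
import os
from typing import Optional

# Provider -> required environment variable, per the table above.
# Ollama runs locally and needs no key.
PROVIDER_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "ollama": None,
}


def resolve_provider(llm: str) -> str:
    """Guess the provider from a model string (hypothetical heuristic)."""
    if llm.startswith("ollama:"):
        return "ollama"
    if llm.startswith("claude-"):
        return "anthropic"
    if llm.startswith("gpt-"):
        return "openai"
    raise ValueError(f"unknown model: {llm}")


def missing_env_var(llm: str, env=os.environ) -> Optional[str]:
    """Return the name of the required but unset variable, or None."""
    var = PROVIDER_ENV_VARS[resolve_provider(llm)]
    if var is not None and not env.get(var):
        return var
    return None


# An empty mapping stands in for an environment with no keys set.
print(missing_env_var("gpt-4o", env={}))         # OPENAI_API_KEY
print(missing_env_var("ollama:llama3", env={}))  # None
```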

Optimization Strategies

Strategy     Description
concise      Makes responses shorter and more direct
detailed     Adds context and thorough explanations
cot          Adds chain-of-thought reasoning
structured   Formats output with sections and bullet points
few_shot     Adds example-based prompting
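
One way to picture a strategy is as a function that rewrites the base template into a variant. The sketch below appends an instruction per strategy. The instruction wording and the names `STRATEGY_SUFFIXES` and `make_variants` are invented for illustration, and the library's real transformations may work quite differently. `few_shot` is left out because it needs example data.

```python
# Hypothetical instruction text per strategy; not taken from the library.
STRATEGY_SUFFIXES = {
    "concise": "Answer as briefly and directly as possible.",
    "detailed": "Include relevant context and explain your answer thoroughly.",
    "cot": "Think through the problem step by step before answering.",
    "structured": "Format the answer with section headings and bullet points.",
}


def make_variants(template: str, strategies: list) -> dict:
    """Build one prompt variant per strategy by appending its instruction."""
    variants = {}
    for name in strategies:
        if name not in STRATEGY_SUFFIXES:
            raise ValueError(f"unsupported strategy in this sketch: {name}")
        variants[name] = f"{template.rstrip()}\n\n{STRATEGY_SUFFIXES[name]}"
    return variants


variants = make_variants(
    "Answer the following question: {{ question }}",
    ["concise", "cot"],
)
for name, text in variants.items():
    print(f"--- {name} ---\n{text}\n")
```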

Evaluation Criteria

Built-in scoring functions:

  • accuracy: Compares output to expected result using sequence matching
  • conciseness: Scores based on word count and length constraints
  • includes: Checks for required keywords in response
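
To make the `accuracy` and `includes` descriptions concrete, here is a minimal sketch that uses `difflib.SequenceMatcher` from the standard library for sequence matching and a case-insensitive substring test for keywords. Whether the library uses `difflib`, how it normalizes text, and how it scores partial keyword matches are all assumptions here.

```python
from difflib import SequenceMatcher


def accuracy_score(response: str, expected: str) -> float:
    """Similarity in [0, 1] between response and expected output."""
    a = response.strip().lower()
    b = expected.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def includes_score(response: str, keywords: list) -> float:
    """Fraction of required keywords that appear in the response."""
    if not keywords:
        return 1.0  # nothing required, so nothing is missing
    text = response.lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in text)
    return hits / len(keywords)


print(accuracy_score("Paris", "paris"))  # 1.0
print(includes_score("Qubits only", ["qubits", "superposition"]))  # 0.5
```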

Custom evaluators can be added:

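from prompt_optimizer import TestCase
from prompt_optimizer.evaluator import EVALUATORS

def custom_scorer(response: str, test_case: TestCase) -> float:
    # Your scoring logic (0.8 is a placeholder value)
    return 0.8

# Register the scorer under the name "custom"
EVALUATORS["custom"] = custom_scorer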
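
As a slightly more realistic scorer with the same `(response, test_case) -> float` shape, the sketch below gives full marks only when the response parses as JSON. To stay runnable without the package installed it registers into a local stand-in dictionary; in real use the registration would go into `prompt_optimizer.evaluator.EVALUATORS` as shown above. The name `json_validity_scorer` is hypothetical.

```python
import json


def json_validity_scorer(response: str, test_case: object) -> float:
    """Return 1.0 if the response parses as JSON, else 0.0 (test_case unused)."""
    try:
        json.loads(response)
    except json.JSONDecodeError:
        return 0.0
    return 1.0


# Local stand-in for the library's EVALUATORS registry.
EVALUATORS = {}
EVALUATORS["json_valid"] = json_validity_scorer

print(EVALUATORS["json_valid"]('{"answer": "Paris"}', None))  # 1.0
print(EVALUATORS["json_valid"]("Paris", None))                # 0.0
```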

LLM-as-Judge

Use an LLM to evaluate response quality instead of rule-based scoring:

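# Use GPT-4o as judge while testing with Claude
prompt-optimizer optimize prompt.yaml \
    --test-cases tests.yaml \
    --llm claude-sonnet-4 \
    --judge gpt-4o

The same option is available from Python through the judge_llm argument:

from prompt_optimizer import optimize_prompt, Prompt, TestCase

# my_prompt is a Prompt and test_cases is a list of TestCase objects,
# defined as in the Python Library example
results = optimize_prompt(
    prompt=my_prompt,
    test_cases=test_cases,
    llm="claude-sonnet-4",
    judge_llm="gpt-4o",  # AI-based evaluation
)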

The LLM judge evaluates responses on:

  • accuracy - How well the response matches expected output
  • relevance - How on-topic the response is
  • coherence - How well-structured and logical the response is
  • completeness - Whether all aspects of the prompt are addressed
  • conciseness - Whether the response is appropriately brief
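
The five criteria have to be combined into one number before variants can be ranked. The README does not say how the library weights them, so the following sketch assumes per-criterion scores in [0, 1] and a weighted mean with equal weights by default; `weighted_score` and `JUDGE_CRITERIA` are hypothetical names.

```python
from typing import Optional

JUDGE_CRITERIA = ("accuracy", "relevance", "coherence", "completeness", "conciseness")


def weighted_score(scores: dict, weights: Optional[dict] = None) -> float:
    """Weighted mean of per-criterion scores; equal weights if none are given."""
    weights = weights or {name: 1.0 for name in JUDGE_CRITERIA}
    missing = [name for name in weights if name not in scores]
    if missing:
        raise ValueError(f"judge did not score: {missing}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


# Illustrative judge output, not real results.
judged = {
    "accuracy": 1.0,
    "relevance": 0.5,
    "coherence": 0.5,
    "completeness": 0.5,
    "conciseness": 0.5,
}
print(f"{weighted_score(judged):.2%}")  # 60.00%
print(f"{weighted_score(judged, {'accuracy': 3, 'conciseness': 1}):.2%}")  # 87.50%
```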

Prometheus Metrics

Built-in observability for production deployments:

# Start metrics server
prompt-optimizer metrics --port 8000

# Metrics available at http://localhost:8000/metrics

From Python:

from prompt_optimizer import init_metrics, optimize_prompt, start_http_server

# Initialize and start metrics server
init_metrics()
start_http_server(8000)

# Run optimizations - metrics are automatically recorded
results = optimize_prompt(...)

Available metrics:

  • prompt_optimizer_optimizations_total - Total optimization runs
  • prompt_optimizer_optimization_duration_seconds - Optimization duration
  • prompt_optimizer_variants_evaluated_total - Variants evaluated
  • prompt_optimizer_test_cases_run_total - Test cases run
  • prompt_optimizer_llm_requests_total - LLM API requests
  • prompt_optimizer_llm_tokens_total - Tokens used (input/output)
  • prompt_optimizer_llm_cost_usd_total - Total cost in USD
  • prompt_optimizer_best_variant_score - Best variant score
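
The `/metrics` endpoint serves the Prometheus text exposition format, which is easy to inspect without a full Prometheus setup. The sketch below parses a sample payload into a dictionary. The sample values are made up, the sample omits labels because the README does not list them, and the parser assumes no trailing timestamps; `parse_metrics` is a hypothetical helper.

```python
# Illustrative payload; metric names are from the list above, values are invented.
SAMPLE = """\
# HELP prompt_optimizer_optimizations_total Total optimization runs
# TYPE prompt_optimizer_optimizations_total counter
prompt_optimizer_optimizations_total 3.0
# HELP prompt_optimizer_best_variant_score Best variant score
# TYPE prompt_optimizer_best_variant_score gauge
prompt_optimizer_best_variant_score 0.87
"""


def parse_metrics(text: str) -> dict:
    """Map each sample line ('name value' or 'name{labels} value') to a float."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated token on the line.
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples


metrics = parse_metrics(SAMPLE)
print(metrics["prompt_optimizer_optimizations_total"])  # 3.0
print(metrics["prompt_optimizer_best_variant_score"])   # 0.87
```

Against a running server, the same function could be fed the body of an HTTP GET to http://localhost:8000/metrics.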

Configuration

Environment variables:

export ANTHROPIC_API_KEY=your-api-key
export OPENAI_API_KEY=your-api-key

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=src/prompt_optimizer --cov-report=html

# Lint
ruff check src tests

# Type check
mypy src

Project Structure

prompt-optimizer/
├── src/prompt_optimizer/
│   ├── __init__.py
│   ├── cli.py              # Click-based CLI
│   ├── core.py             # Core optimization logic
│   ├── prompt.py           # Prompt models
│   ├── evaluator.py        # Scoring functions
│   ├── storage.py          # Version control
│   ├── reporters.py        # Result reporting
│   └── llm_clients/        # LLM integrations
├── tests/
├── examples/
├── Dockerfile
└── docker-compose.yml

License

MIT
