A CLI tool and Python library for optimizing LLM prompts through systematic testing and evaluation

prompt-optimizer-cli

A CLI tool and Python library for optimizing LLM prompts through systematic testing, version control, and performance metrics. Think of it as "pytest for prompts": test multiple prompt variations, measure quality, and automatically select the best performer.

Features

Prompt Testing: Run multiple prompt variations against test cases
Quality Metrics: Score outputs on accuracy, conciseness, tone, and cost
LLM-as-Judge: AI-powered evaluation using any LLM as a judge
Prometheus Metrics: Built-in observability through a Prometheus metrics endpoint
Version Control: Track prompt evolution with history and diffs
Auto-Selection: Identify and select the best-performing prompt variant
CLI & Library: Use as a command-line tool or Python import
Multi-LLM Support: Works with Anthropic Claude, OpenAI GPT, and local Ollama models

Quick Start

# Install from PyPI
pip install prompt-optimizer-cli

# Initialize a project
prompt-optimizer init

# Optimize a prompt
prompt-optimizer optimize prompts/example.yaml \
    --test-cases tests/example_tests.yaml \
    --strategies concise,detailed \
    --llm claude-sonnet-4 \
    --output results.json

Installation

From PyPI

pip install prompt-optimizer-cli

From Source

git clone https://github.com/kmcallorum/prompt-optimizer.git
cd prompt-optimizer
pip install -e .

With Development Dependencies

pip install -e ".[dev]"

Using Docker

docker-compose build
docker-compose run prompt-optimizer --help

Usage

CLI Commands

# Initialize new project with example files
prompt-optimizer init

# Test a prompt against test cases
prompt-optimizer test prompt.yaml --test-cases tests.yaml --llm claude-sonnet-4

# Optimize with multiple strategies
prompt-optimizer optimize prompt.yaml \
    --strategies concise,detailed,cot \
    --test-cases tests.yaml \
    --llm claude-sonnet-4 \
    --output results.json

# Use LLM-as-judge for AI-powered evaluation
prompt-optimizer optimize prompt.yaml \
    --test-cases tests.yaml \
    --llm claude-sonnet-4 \
    --judge gpt-4o \
    --output results.json

# Compare two prompts
prompt-optimizer compare prompt1.yaml prompt2.yaml --test-cases tests.yaml

# View prompt history
prompt-optimizer history my-prompt

# Generate report from results
prompt-optimizer report results.json --format html --output report.html

# Display a prompt file
prompt-optimizer show prompt.yaml
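
The output of `history` is not shown here. As a rough illustration of what a diff between two prompt versions could look like, here is a minimal sketch using the standard library's `difflib`. The function name `diff_prompts` and the version labels are hypothetical and are not part of prompt-optimizer.

```python
import difflib


def diff_prompts(old: str, new: str, old_label: str = "v1", new_label: str = "v2") -> str:
    """Return a unified diff between two prompt template versions."""
    diff = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    )
    return "\n".join(diff)


old_template = "Answer the following question: {{ question }}"
new_template = "Answer the following question concisely: {{ question }}"
print(diff_prompts(old_template, new_template))
```

Identical templates produce an empty string, which makes it easy to skip unchanged versions.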

Python Library

from prompt_optimizer import Prompt, TestCase, optimize_prompt

# Define a prompt
prompt = Prompt(
    template="Summarize this text in {{ length }}: {{ text }}",
    variables={"length": "one sentence", "text": ""},
    system_message="You are a helpful summarization assistant.",
    name="summarizer",
)

# Define test cases
test_cases = [
    TestCase(
        input_variables={
            "text": "Long article text here...",
            "length": "one sentence"
        },
        expected_properties={"length": "<30 words"}
    )
]

# Run optimization
results = optimize_prompt(
    prompt,
    test_cases,
    strategies=["concise", "detailed"],
    llm="claude-sonnet-4"
)

print(f"Best variant: {results.best_variant.strategy}")
print(f"Score: {results.best_weighted_score:.2%}")
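
The example above prints a `best_weighted_score`, but the weighting itself is not documented here. One plausible reading is a weighted mean over per-criterion scores, with the highest mean winning. The sketch below assumes that reading; `VariantScores`, `weighted_score`, `select_best`, and the weights are all hypothetical names and values, not the library's API.

```python
from dataclasses import dataclass


@dataclass
class VariantScores:
    strategy: str
    scores: dict[str, float]  # criterion name -> score in [0, 1]


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean over the criteria listed in `weights` (missing scores count as 0)."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(scores.get(name, 0.0) * w for name, w in weights.items()) / total_weight


def select_best(variants: list[VariantScores], weights: dict[str, float]) -> VariantScores:
    """Pick the variant with the highest weighted score."""
    return max(variants, key=lambda v: weighted_score(v.scores, weights))


# Assumed weights and scores, for illustration only.
weights = {"accuracy": 0.6, "conciseness": 0.4}
variants = [
    VariantScores("concise", {"accuracy": 0.8, "conciseness": 0.9}),
    VariantScores("detailed", {"accuracy": 0.9, "conciseness": 0.5}),
]
best = select_best(variants, weights)
print(f"Best variant: {best.strategy}")
```

With these numbers the `concise` variant scores 0.84 against 0.74 for `detailed`, so a higher accuracy alone does not guarantee selection.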

File Formats

Prompt File (YAML)

template: |
  Answer the following question: {{ question }}

  Requirements:
  - Be concise
  - Be accurate

system_message: "You are a helpful AI assistant."

variables:
  question: ""

metadata:
  author: "developer"
  version: "1.0"
  tags: ["qa", "concise"]

Test Cases (YAML)

name: "QA Test Suite"

test_cases:
  - input_variables:
      question: "What is the capital of France?"
    expected_output: "Paris"
    expected_properties:
      tone: "neutral"
      length: "<20 words"

  - input_variables:
      question: "Explain quantum computing"
    expected_properties:
      length: "50-150 words"
      includes: ["qubits", "superposition"]

Supported LLMs

Provider	Models	Environment Variable
Anthropic	claude-sonnet-4, claude-opus-4	`ANTHROPIC_API_KEY`
OpenAI	gpt-4o, gpt-4-turbo, gpt-3.5-turbo	`OPENAI_API_KEY`
Ollama	llama3, mistral, etc.	N/A (local)

Specify the LLM with the --llm flag:

prompt-optimizer optimize prompt.yaml --llm claude-sonnet-4
prompt-optimizer optimize prompt.yaml --llm gpt-4o
prompt-optimizer optimize prompt.yaml --llm ollama:llama3
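
Before a long run it can help to confirm that the API key for the chosen model is set. The sketch below maps a `--llm` value to a provider and its environment variable, using the table above. Inferring the provider from the `claude-` and `gpt-` prefixes is an assumption about naming, and `resolve_provider` and `missing_api_key` are hypothetical helpers rather than library functions.

```python
import os
from typing import Optional

# Environment variable per provider, from the table above; Ollama runs locally.
API_KEY_VARS = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY", "ollama": None}


def resolve_provider(model_spec: str) -> tuple[str, str]:
    """Split a --llm value into (provider, model name)."""
    if model_spec.startswith("ollama:"):
        return "ollama", model_spec.split(":", 1)[1]
    if model_spec.startswith("claude-"):
        return "anthropic", model_spec
    if model_spec.startswith("gpt-"):
        return "openai", model_spec
    raise ValueError(f"unrecognized model: {model_spec!r}")


def missing_api_key(model_spec: str, env=os.environ) -> Optional[str]:
    """Return the name of the unset API key variable, or None if nothing is missing."""
    provider, _ = resolve_provider(model_spec)
    var = API_KEY_VARS[provider]
    if var is not None and not env.get(var):
        return var
    return None


print(resolve_provider("ollama:llama3"))
```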

Optimization Strategies

Strategy	Description
`concise`	Makes responses shorter and more direct
`detailed`	Adds context and thorough explanations
`cot`	Adds chain-of-thought reasoning
`structured`	Formats output with sections and bullet points
`few_shot`	Adds example-based prompting
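
How each strategy rewrites a prompt is not specified here. One simple way to picture it is that every strategy appends an instruction to the base template, producing one variant per strategy. The sketch below assumes exactly that; the instruction text and the `make_variants` helper are hypothetical, and `few_shot` is left out because it would need example data.

```python
# Hypothetical instruction text per strategy; the real tool may rewrite prompts differently.
STRATEGY_SUFFIXES = {
    "concise": "Keep the answer short and direct.",
    "detailed": "Provide context and a thorough explanation.",
    "cot": "Think through the problem step by step before answering.",
    "structured": "Format the answer with sections and bullet points.",
}


def make_variants(template: str, strategies: list[str]) -> dict[str, str]:
    """Return one template variant per requested strategy, in request order."""
    variants = {}
    for name in strategies:
        if name not in STRATEGY_SUFFIXES:
            raise ValueError(f"unknown strategy: {name!r}")
        variants[name] = f"{template.rstrip()}\n\n{STRATEGY_SUFFIXES[name]}"
    return variants


for name, text in make_variants("Summarize: {{ text }}", ["concise", "cot"]).items():
    print(f"[{name}]\n{text}\n")
```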

Evaluation Criteria

Built-in scoring functions:

accuracy: Compares output to expected result using sequence matching
conciseness: Scores based on word count and length constraints
includes: Checks for required keywords in response

Custom evaluators can be added:

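from prompt_optimizer import TestCase
from prompt_optimizer.evaluator import EVALUATORS

def custom_scorer(response: str, test_case: TestCase) -> float:
    # Your scoring logic; return a score between 0 and 1
    return 0.8

# Register the scorer under the name "custom"
EVALUATORS["custom"] = custom_scorer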
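
For a sense of what rule-based scoring of this kind involves, here is a standalone sketch of an accuracy score based on sequence matching and an `includes` score based on keyword presence. It uses only the standard library and is not the package's implementation. Lowercasing before comparison and scoring `includes` as a fraction are both assumptions.

```python
from difflib import SequenceMatcher


def accuracy_score(response: str, expected: str) -> float:
    """Similarity ratio in [0, 1] between the response and the expected output."""
    return SequenceMatcher(None, response.lower().strip(), expected.lower().strip()).ratio()


def includes_score(response: str, keywords: list[str]) -> float:
    """Fraction of required keywords found in the response (case-insensitive)."""
    if not keywords:
        return 1.0
    text = response.lower()
    return sum(1 for k in keywords if k.lower() in text) / len(keywords)


print(accuracy_score("Paris", "paris"))
print(includes_score("Qubits exploit superposition.", ["qubits", "superposition"]))
```

Sequence matching penalizes a correct but longer answer such as "Paris is the capital", which is one reason to pair it with a keyword check or an LLM judge.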

LLM-as-Judge

Use an LLM to evaluate response quality instead of rule-based scoring:

# Use GPT-4o as judge while testing with Claude
prompt-optimizer optimize prompt.yaml \
    --test-cases tests.yaml \
    --llm claude-sonnet-4 \
    --judge gpt-4o

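from prompt_optimizer import optimize_prompt, Prompt, TestCase

# my_prompt (a Prompt) and test_cases (a list of TestCase) are assumed
# to be defined as in the Python Library section
results = optimize_prompt(
    prompt=my_prompt,
    test_cases=test_cases,
    llm="claude-sonnet-4",
    judge_llm="gpt-4o",  # AI-based evaluation
)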

The LLM judge evaluates responses on:

accuracy - How well the response matches expected output
relevance - How on-topic the response is
coherence - How well-structured and logical the response is
completeness - Whether all aspects of the prompt are addressed
conciseness - Whether the response is appropriately brief
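
The judge prompt and reply format are not documented here. As an illustration of the general pattern, the sketch below asks a judge for one score per criterion as JSON and validates the reply. The prompt wording, the 0 to 1 scale, and both function names are assumptions, and the reply is a hard-coded stand-in so no API call is made.

```python
import json

CRITERIA = ["accuracy", "relevance", "coherence", "completeness", "conciseness"]


def build_judge_prompt(question: str, response: str) -> str:
    """Ask the judge model for a JSON object with one score per criterion."""
    keys = ", ".join(CRITERIA)
    return (
        "Rate the response to the prompt below.\n"
        f"Return only a JSON object with these keys, each scored from 0 to 1: {keys}.\n\n"
        f"Prompt: {question}\nResponse: {response}"
    )


def parse_judge_reply(reply: str) -> dict[str, float]:
    """Validate the judge's JSON and clamp each score to [0, 1]."""
    data = json.loads(reply)
    missing = [c for c in CRITERIA if c not in data]
    if missing:
        raise ValueError(f"judge reply is missing criteria: {missing}")
    return {c: min(1.0, max(0.0, float(data[c]))) for c in CRITERIA}


# Stand-in for a judge model's reply; note the out-of-range conciseness value.
reply = '{"accuracy": 1, "relevance": 0.9, "coherence": 0.8, "completeness": 0.7, "conciseness": 1.2}'
scores = parse_judge_reply(reply)
print(sum(scores.values()) / len(scores))
```

Validating and clamping matters because a judge model can omit a key or return a value outside the requested range.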

Prometheus Metrics

Built-in observability for production deployments:

# Start metrics server
prompt-optimizer metrics --port 8000

# Metrics available at http://localhost:8000/metrics

from prompt_optimizer import init_metrics, optimize_prompt, start_http_server

# Initialize and start metrics server
init_metrics()
start_http_server(8000)

# Run optimizations - metrics are automatically recorded
results = optimize_prompt(...)

Available metrics:

prompt_optimizer_optimizations_total - Total optimization runs
prompt_optimizer_optimization_duration_seconds - Optimization duration
prompt_optimizer_variants_evaluated_total - Variants evaluated
prompt_optimizer_test_cases_run_total - Test cases run
prompt_optimizer_llm_requests_total - LLM API requests
prompt_optimizer_llm_tokens_total - Tokens used (input/output)
prompt_optimizer_llm_cost_usd_total - Total cost in USD
prompt_optimizer_best_variant_score - Best variant score
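
Prometheus endpoints serve metrics as plain text, one sample per line, so they can be inspected from a script without a Prometheus server. The sketch below sums sample values per metric name. The sample text is made up for illustration, the `provider` label is an assumption about how the metrics are labeled, and the parser assumes samples carry no trailing timestamp.

```python
from collections import defaultdict


def sum_metrics(exposition_text: str) -> dict[str, float]:
    """Sum sample values per metric name, ignoring labels and comment lines."""
    totals: dict[str, float] = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field on the line.
        series, value = line.rsplit(None, 1)
        name = series.split("{", 1)[0]
        totals[name] += float(value)
    return dict(totals)


# Made-up sample in the Prometheus text format; the label name is an assumption.
sample = """\
# TYPE prompt_optimizer_llm_requests_total counter
prompt_optimizer_llm_requests_total{provider="anthropic"} 12
prompt_optimizer_llm_requests_total{provider="openai"} 3
prompt_optimizer_optimizations_total 2
"""
print(sum_metrics(sample))
```

To run this against a live server, the text could be fetched with `urllib.request.urlopen("http://localhost:8000/metrics").read().decode()` and passed to `sum_metrics`.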

Configuration

Environment variables:

export ANTHROPIC_API_KEY=your-api-key
export OPENAI_API_KEY=your-api-key

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=src/prompt_optimizer --cov-report=html

# Lint
ruff check src tests

# Type check
mypy src

Project Structure

prompt-optimizer/
├── src/prompt_optimizer/
│   ├── __init__.py
│   ├── cli.py              # Click-based CLI
│   ├── core.py             # Core optimization logic
│   ├── prompt.py           # Prompt models
│   ├── evaluator.py        # Scoring functions
│   ├── storage.py          # Version control
│   ├── reporters.py        # Result reporting
│   └── llm_clients/        # LLM integrations
├── tests/
├── examples/
├── Dockerfile
└── docker-compose.yml

License

MIT

Release history

0.3.6 (this version): Jan 12, 2026
0.3.5: Jan 12, 2026
0.3.4: Jan 12, 2026
0.3.2: Jan 12, 2026
0.3.0: Jan 12, 2026

Download files

Source Distribution

prompt_optimizer_cli-0.3.6.tar.gz (22.7 kB view details)

Uploaded Jan 12, 2026 Source

Built Distribution

prompt_optimizer_cli-0.3.6-py3-none-any.whl (29.2 kB view details)

Uploaded Jan 12, 2026 Python 3

File details

Details for the file prompt_optimizer_cli-0.3.6.tar.gz.

File metadata

Download URL: prompt_optimizer_cli-0.3.6.tar.gz
Upload date: Jan 12, 2026
Size: 22.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for prompt_optimizer_cli-0.3.6.tar.gz
Algorithm	Hash digest
SHA256	`b75507964f11f46ebe415badfa61067b22b7131f94b8565d983aad79d50e9cca`
MD5	`7f04d3bf209dcf9604a701f3372ee513`
BLAKE2b-256	`73241bafbb7be00064eb864ec78e51a112a237ad3dea3608c858590f0fe122a5`

File details

Details for the file prompt_optimizer_cli-0.3.6-py3-none-any.whl.

File metadata

Download URL: prompt_optimizer_cli-0.3.6-py3-none-any.whl
Upload date: Jan 12, 2026
Size: 29.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for prompt_optimizer_cli-0.3.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8f9b15d2cc5316a7362822facf50abe20aabad5fa961c06c162d223eba110630`
MD5	`6816a9b68024bc48def77a96ea4b274a`
BLAKE2b-256	`1364908e85b064d56e98bbf3b814af0cd3479e136ad4ad67c19587e15e5fe4f4`
