⚡ PromptBench

A/B test prompts across LLM providers from your terminal.

CI · Python 3.9+ · License: MIT · PyPI

Compare how different LLMs respond to the same prompt — see latency, token usage, cost, and responses side by side in one command.

$ promptbench "Explain quantum computing in one sentence"

  ⚡ PROMPTBENCH RESULTS
  ────────────────────────────────────────────────────────────

  Prompt: Explain quantum computing in one sentence

  Model                               Latency     Tokens         Cost
  ─────────────────────────────────── ────────── ────────── ────────────
  claude-sonnet-4-20250514                 1.24s        142     $0.0006
  gemini-2.0-flash                    ⚡💰 312ms         98     $0.0000
  gpt-4o                                  845ms        127     $0.0010

  ┌─ claude-sonnet-4-20250514
  │ Quantum computing uses qubits that can exist in superpositions of
  │ 0 and 1 simultaneously, enabling parallel computation that can
  │ solve certain problems exponentially faster than classical computers.
  └───────────────────────────────────────────────────────────

  ┌─ gemini-2.0-flash
  │ Quantum computing harnesses quantum mechanical phenomena like
  │ superposition and entanglement to process information in ways
  │ impossible for traditional computers.
  └───────────────────────────────────────────────────────────

  ┌─ gpt-4o
  │ Quantum computing leverages the principles of quantum mechanics —
  │ superposition and entanglement — to perform computations that
  │ would be infeasible for classical computers.
  └───────────────────────────────────────────────────────────

  Total cost: $0.0016 · Avg latency: 799ms · 3 model(s)

Why PromptBench?

  • One command, multiple models — no switching between playgrounds
  • Side-by-side comparison — latency, tokens, cost, and full responses
  • Supports major providers — OpenAI, Anthropic, Google Gemini
  • Multiple output formats — terminal table, JSON, CSV, Markdown
  • Fast parallel execution — all models run concurrently by default
  • Batch mode — test multiple prompts from a file
  • Zero required dependencies — install only the providers you need
  • Pipe-friendly — works with stdin for scripting workflows
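The "fast parallel execution" point is the kind of fan-out that Python's `concurrent.futures` handles well. A minimal sketch of the idea (not PromptBench's actual implementation; `call_model` is a stand-in for a real provider call):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def call_model(name: str, prompt: str) -> dict:
    # Stand-in for a real provider API call; simulates a little latency.
    time.sleep(0.01)
    return {"model": name, "response": f"{name} answer to: {prompt}"}

def run_parallel(models: list[str], prompt: str) -> list[dict]:
    # Fan out one request per model; collect results in input order.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = [pool.submit(call_model, m, prompt) for m in models]
        return [f.result() for f in futures]

results = run_parallel(["gpt4o", "sonnet", "flash"], "Explain recursion")
```

Because every model call is network-bound, threads are enough here; total wall time is roughly the slowest model rather than the sum of all three.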

Install

pip install promptbench-cli

Install with provider SDKs:

# Individual providers
pip install "promptbench-cli[openai]"
pip install "promptbench-cli[anthropic]"
pip install "promptbench-cli[google]"

# All providers at once
pip install "promptbench-cli[all]"

Setup

Export your API keys:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="AI..."

You only need keys for the providers you want to use. PromptBench will warn you if a key is missing.
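The missing-key check amounts to a few lines of standard-library code. A sketch using the same environment variable names as above (the function and dict names here are illustrative, not PromptBench's API):

```python
import os

# Environment variable expected for each provider.
PROVIDER_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Google": "GOOGLE_API_KEY",
}

def missing_providers(env=os.environ) -> list[str]:
    # Return the providers whose API key is unset or empty.
    return [name for name, var in PROVIDER_KEYS.items() if not env.get(var)]
```

Passing `env` explicitly keeps the check testable without touching the real process environment.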

Usage

Basic comparison (default: GPT-4o, Claude Sonnet, Gemini Flash)

promptbench "Explain quantum computing in one sentence"

Pick specific models

promptbench "Write a haiku about coding" -m gpt4o sonnet flash

With a system prompt

promptbench "Summarize this text" -s "You are a concise technical writer" -m gpt4mini haiku

Batch prompts from a file

# prompts.txt — one prompt per line
promptbench -f prompts.txt -m gpt4o sonnet
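If you generate prompts files programmatically, the format is just one prompt per line. A rough sketch of such a loader (whether PromptBench itself skips blank lines is not documented here, so this version does it defensively):

```python
from pathlib import Path

def load_prompts(path: str) -> list[str]:
    # One prompt per line; strip whitespace and drop empty lines.
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```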

JSON output

promptbench "What is Python?" -o json

CSV output

promptbench "What is Python?" -o csv > results.csv

Save results to file

promptbench "Compare REST vs GraphQL" --save results.json

Pipe from stdin

echo "What is the meaning of life?" | promptbench -m gpt4o sonnet

List all supported models

promptbench --list-models

Supported Models

Alias                   Model                       Provider
──────────────────────  ──────────────────────────  ─────────
gpt4o / gpt4            gpt-4o                      OpenAI
gpt4mini                gpt-4o-mini                 OpenAI
gpt3.5                  gpt-3.5-turbo               OpenAI
sonnet / claude-sonnet  claude-sonnet-4-20250514    Anthropic
haiku / claude-haiku    claude-haiku-4-5-20251001   Anthropic
flash / gemini-flash    gemini-2.0-flash            Google
gemini-pro              gemini-1.5-pro              Google

You can also use full model names directly (e.g., gpt-4-turbo, gemini-1.5-flash).
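Alias resolution with a pass-through for full model names likely reduces to a dictionary lookup. A sketch built from the table above (not the package's actual code):

```python
# Alias → full model ID, taken from the Supported Models table.
ALIASES = {
    "gpt4o": "gpt-4o", "gpt4": "gpt-4o",
    "gpt4mini": "gpt-4o-mini",
    "gpt3.5": "gpt-3.5-turbo",
    "sonnet": "claude-sonnet-4-20250514", "claude-sonnet": "claude-sonnet-4-20250514",
    "haiku": "claude-haiku-4-5-20251001", "claude-haiku": "claude-haiku-4-5-20251001",
    "flash": "gemini-2.0-flash", "gemini-flash": "gemini-2.0-flash",
    "gemini-pro": "gemini-1.5-pro",
}

def resolve_model(name: str) -> str:
    # Known aliases map to full model IDs; anything else passes through unchanged.
    return ALIASES.get(name, name)
```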

Python Library Usage

from promptbench.runner import run_bench
from promptbench.display import display_comparison

run = run_bench(
    prompt="Explain recursion simply",
    models=["gpt4o", "sonnet", "flash"],
    temperature=0.5,
)

print(display_comparison(run))

# Access individual results
for result in run.results:
    print(f"{result.model}: {result.latency_ms:.0f}ms, {result.cost_usd:.6f} USD")
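The per-result records lend themselves to quick post-processing, such as picking the fastest and cheapest model. A sketch assuming the `model`, `latency_ms`, and `cost_usd` attributes shown above, with a plain dataclass standing in for PromptBench's own result type:

```python
from dataclasses import dataclass

@dataclass
class Result:
    # Stand-in for a PromptBench per-model result record.
    model: str
    latency_ms: float
    cost_usd: float

def fastest_and_cheapest(results: list[Result]) -> tuple[str, str]:
    # Winners by latency and by cost, respectively.
    fastest = min(results, key=lambda r: r.latency_ms)
    cheapest = min(results, key=lambda r: r.cost_usd)
    return fastest.model, cheapest.model

# Figures from the example run at the top of this page.
demo = [
    Result("claude-sonnet-4-20250514", 1240, 0.0006),
    Result("gemini-2.0-flash", 312, 0.0),
    Result("gpt-4o", 845, 0.0010),
]
```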

Configuration Flags

Flag               Description                                Default
─────────────────  ─────────────────────────────────────────  ──────────────────
-m, --models       Models to test                             gpt4o sonnet flash
-s, --system       System prompt                              None
-t, --temperature  Sampling temperature                       0.7
--max-tokens       Max output tokens                          1024
-f, --file         Prompts file path                          None
-o, --output       Output format: table, json, csv, markdown  table
--full             Show full responses (no truncation)        Off
--no-parallel      Run models sequentially                    Off
--save             Save results to JSON file                  None
--list-models      List supported models                      –
--version          Show version                               –

Contributing

See CONTRIBUTING.md for development setup, how to add providers, and PR guidelines.

License

MIT — see LICENSE for details.
