# ⚡ PromptBench

A/B test prompts across LLM providers from your terminal.

Compare how different LLMs respond to the same prompt — see latency, token usage, cost, and responses side by side in one command.
```
$ promptbench "Explain quantum computing in one sentence"

⚡ PROMPTBENCH RESULTS
────────────────────────────────────────────────────────────
Prompt: Explain quantum computing in one sentence

Model                               Latency    Tokens     Cost
─────────────────────────────────── ────────── ────────── ────────────
claude-sonnet-4-20250514            1.24s      142        $0.0006
gemini-2.0-flash ⚡💰               312ms      98         $0.0000
gpt-4o                              845ms      127        $0.0010

┌─ claude-sonnet-4-20250514
│ Quantum computing uses qubits that can exist in superpositions of
│ 0 and 1 simultaneously, enabling parallel computation that can
│ solve certain problems exponentially faster than classical computers.
└───────────────────────────────────────────────────────────

┌─ gemini-2.0-flash
│ Quantum computing harnesses quantum mechanical phenomena like
│ superposition and entanglement to process information in ways
│ impossible for traditional computers.
└───────────────────────────────────────────────────────────

┌─ gpt-4o
│ Quantum computing leverages the principles of quantum mechanics —
│ superposition and entanglement — to perform computations that
│ would be infeasible for classical computers.
└───────────────────────────────────────────────────────────

Total cost: $0.0016 · Avg latency: 799ms · 3 model(s)
```
## Why PromptBench?
- One command, multiple models — no switching between playgrounds
- Side-by-side comparison — latency, tokens, cost, and full responses
- Supports major providers — OpenAI, Anthropic, Google Gemini
- Multiple output formats — terminal table, JSON, CSV, Markdown
- Fast parallel execution — all models run concurrently by default
- Batch mode — test multiple prompts from a file
- Zero required dependencies — install only the providers you need
- Pipe-friendly — works with stdin for scripting workflows
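The concurrent fan-out behind "all models run concurrently" can be sketched with Python's standard library. This is only an illustrative pattern, not PromptBench's actual implementation, and `query_model` below is a hypothetical stand-in for a real provider call:

```python
from concurrent.futures import ThreadPoolExecutor

def query_model(model: str, prompt: str) -> str:
    # Stand-in for a real provider call; a real version would hit an
    # API and return the response text.
    return f"{model}: response to {prompt!r}"

def fan_out(models, prompt):
    # Query every model concurrently, collecting results in input order.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        return list(pool.map(lambda m: query_model(m, prompt), models))

results = fan_out(["gpt4o", "sonnet", "flash"], "Explain recursion")
```

With a thread pool, total wall time is roughly the slowest model's latency rather than the sum of all latencies.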
## Install

```bash
pip install promptbench-cli
```

Install with provider SDKs:

```bash
# Individual providers
pip install "promptbench-cli[openai]"
pip install "promptbench-cli[anthropic]"
pip install "promptbench-cli[google]"

# All providers at once
pip install "promptbench-cli[all]"
```
## Setup

Export your API keys:

```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="AI..."
```
You only need keys for the providers you want to use. PromptBench will warn you if a key is missing.
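If you want to check up front which keys are set, a few lines of Python will do. The environment variable names come from the exports above; `missing_keys` is just a local helper for this sketch, not part of PromptBench:

```python
import os

# Environment variables each provider expects (from the setup above).
PROVIDER_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Google": "GOOGLE_API_KEY",
}

def missing_keys(env=os.environ):
    # Return the providers whose API key is absent or empty.
    return [name for name, var in PROVIDER_KEYS.items() if not env.get(var)]
```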
## Usage

Basic comparison (default: GPT-4o, Claude Sonnet, Gemini Flash):

```bash
promptbench "Explain quantum computing in one sentence"
```
Pick specific models:

```bash
promptbench "Write a haiku about coding" -m gpt4o sonnet flash
```
With a system prompt:

```bash
promptbench "Summarize this text" -s "You are a concise technical writer" -m gpt4mini haiku
```
Batch prompts from a file:

```bash
# prompts.txt — one prompt per line
promptbench -f prompts.txt -m gpt4o sonnet
```
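For example, a prompts.txt might look like this, with each line sent as its own prompt:

```
Explain quantum computing in one sentence
Write a haiku about coding
Summarize the CAP theorem
```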
JSON output:

```bash
promptbench "What is Python?" -o json
```
CSV output:

```bash
promptbench "What is Python?" -o csv > results.csv
```
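A results.csv can then be post-processed with Python's csv module. The column names used below (model, cost, and so on) are assumptions for illustration — inspect the header of your actual export first:

```python
import csv
import io

# Illustrative CSV matching the demo run above; real column names may differ.
sample = """model,latency_ms,tokens,cost
gpt-4o,845,127,0.0010
gemini-2.0-flash,312,98,0.0000
claude-sonnet-4-20250514,1240,142,0.0006
"""

def cheapest(csv_text: str) -> str:
    # Return the model with the lowest value in the assumed "cost" column.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return min(rows, key=lambda r: float(r["cost"]))["model"]

print(cheapest(sample))  # gemini-2.0-flash
```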
Save results to file:

```bash
promptbench "Compare REST vs GraphQL" --save results.json
```
Pipe from stdin:

```bash
echo "What is the meaning of life?" | promptbench -m gpt4o sonnet
```
List all supported models:

```bash
promptbench --list-models
```
## Supported Models

| Alias | Model | Provider |
|---|---|---|
| gpt4o / gpt4 | gpt-4o | OpenAI |
| gpt4mini | gpt-4o-mini | OpenAI |
| gpt3.5 | gpt-3.5-turbo | OpenAI |
| sonnet / claude-sonnet | claude-sonnet-4-20250514 | Anthropic |
| haiku / claude-haiku | claude-haiku-4-5-20251001 | Anthropic |
| flash / gemini-flash | gemini-2.0-flash | Google |
| gemini-pro | gemini-1.5-pro | Google |
You can also use full model names directly (e.g., gpt-4-turbo, gemini-1.5-flash).
## Python Library Usage

```python
from promptbench.runner import run_bench
from promptbench.display import display_comparison

run = run_bench(
    prompt="Explain recursion simply",
    models=["gpt4o", "sonnet", "flash"],
    temperature=0.5,
)

print(display_comparison(run))

# Access individual results
for result in run.results:
    print(f"{result.model}: {result.latency_ms:.0f}ms, {result.cost_usd:.6f} USD")
```
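The summary line from the demo (total cost, average latency) can be recomputed from the per-result fields used in the loop above. The dataclass below is only a stand-in mirroring those fields so the example runs on its own, with the numbers from the demo run:

```python
from dataclasses import dataclass

@dataclass
class Result:
    # Stand-in mimicking the fields used above: model, latency_ms, cost_usd.
    model: str
    latency_ms: float
    cost_usd: float

results = [
    Result("claude-sonnet-4-20250514", 1240, 0.0006),
    Result("gemini-2.0-flash", 312, 0.0000),
    Result("gpt-4o", 845, 0.0010),
]

total_cost = sum(r.cost_usd for r in results)
avg_latency = sum(r.latency_ms for r in results) / len(results)
fastest = min(results, key=lambda r: r.latency_ms)

print(f"${total_cost:.4f} · {avg_latency:.0f}ms · fastest: {fastest.model}")
```

With the demo's numbers this reproduces the $0.0016 total and 799ms average shown in the sample output.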
## Configuration Flags

| Flag | Description | Default |
|---|---|---|
| -m, --models | Models to test | gpt4o sonnet flash |
| -s, --system | System prompt | None |
| -t, --temperature | Sampling temperature | 0.7 |
| --max-tokens | Max output tokens | 1024 |
| -f, --file | Prompts file path | None |
| -o, --output | Output format: table, json, csv, markdown | table |
| --full | Show full responses (no truncation) | Off |
| --no-parallel | Run models sequentially | Off |
| --save | Save results to JSON file | None |
| --list-models | List supported models | — |
| --version | Show version | — |
## Contributing

See CONTRIBUTING.md for development setup, how to add providers, and PR guidelines.

## License

MIT — see LICENSE for details.