⚡ PromptBench

A/B test prompts across LLM providers from your terminal.

CI · Python 3.9+ · License: MIT · PyPI

Compare how different LLMs respond to the same prompt — see latency, token usage, cost, and responses side by side in one command.

$ promptbench "Explain quantum computing in one sentence"

  ⚡ PROMPTBENCH RESULTS
  ────────────────────────────────────────────────────────────

  Prompt: Explain quantum computing in one sentence

  Model                               Latency     Tokens         Cost
  ─────────────────────────────────── ────────── ────────── ────────────
  claude-sonnet-4-20250514                 1.24s        142     $0.0006
  gemini-2.0-flash                    ⚡💰 312ms         98     $0.0000
  gpt-4o                                  845ms        127     $0.0010

  ┌─ claude-sonnet-4-20250514
  │ Quantum computing uses qubits that can exist in superpositions of
  │ 0 and 1 simultaneously, enabling parallel computation that can
  │ solve certain problems exponentially faster than classical computers.
  └───────────────────────────────────────────────────────────

  ┌─ gemini-2.0-flash
  │ Quantum computing harnesses quantum mechanical phenomena like
  │ superposition and entanglement to process information in ways
  │ impossible for traditional computers.
  └───────────────────────────────────────────────────────────

  ┌─ gpt-4o
  │ Quantum computing leverages the principles of quantum mechanics —
  │ superposition and entanglement — to perform computations that
  │ would be infeasible for classical computers.
  └───────────────────────────────────────────────────────────

  Total cost: $0.0016 · Avg latency: 799ms · 3 model(s)

Why PromptBench?

  • One command, multiple models — no switching between playgrounds
  • Side-by-side comparison — latency, tokens, cost, and full responses
  • Supports major providers — OpenAI, Anthropic, Google Gemini
  • Multiple output formats — terminal table, JSON, CSV, Markdown
  • Fast parallel execution — all models run concurrently by default
  • Batch mode — test multiple prompts from a file
  • Zero required dependencies — install only the providers you need
  • Pipe-friendly — works with stdin for scripting workflows
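The "fast parallel execution" point is the kind of fan-out that Python's `concurrent.futures` handles well. A minimal sketch of the idea (not PromptBench's actual implementation; `call_model` is a stand-in for a real provider call):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def call_model(name: str, prompt: str) -> dict:
    # Stand-in for a real provider API call; simulates a little latency.
    time.sleep(0.01)
    return {"model": name, "response": f"{name} answer to: {prompt}"}

def run_parallel(models: list[str], prompt: str) -> list[dict]:
    # Fan out one request per model; collect results in input order.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = [pool.submit(call_model, m, prompt) for m in models]
        return [f.result() for f in futures]

results = run_parallel(["gpt4o", "sonnet", "flash"], "Explain recursion")
```

Because every model call is network-bound, threads are enough here; total wall time is roughly the slowest model rather than the sum of all three.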

Install

pip install promptbench-cli

Install with provider SDKs:

# Individual providers
pip install "promptbench-cli[openai]"
pip install "promptbench-cli[anthropic]"
pip install "promptbench-cli[google]"

# All providers at once
pip install "promptbench-cli[all]"

Setup

Export your API keys:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="AI..."

You only need keys for the providers you want to use. PromptBench will warn you if a key is missing.
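The missing-key check amounts to a few lines of standard-library code. A sketch using the same environment variable names as above (the function and dict names here are illustrative, not PromptBench's API):

```python
import os

# Environment variable expected for each provider.
PROVIDER_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Google": "GOOGLE_API_KEY",
}

def missing_providers(env=os.environ) -> list[str]:
    # Return the providers whose API key is unset or empty.
    return [name for name, var in PROVIDER_KEYS.items() if not env.get(var)]
```

Passing `env` explicitly keeps the check testable without touching the real process environment.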

Usage

Basic comparison (default: GPT-4o, Claude Sonnet, Gemini Flash)

promptbench "Explain quantum computing in one sentence"

Pick specific models

promptbench "Write a haiku about coding" -m gpt4o sonnet flash

With a system prompt

promptbench "Summarize this text" -s "You are a concise technical writer" -m gpt4mini haiku

Batch prompts from a file

# prompts.txt — one prompt per line
promptbench -f prompts.txt -m gpt4o sonnet
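If you generate prompts files programmatically, the format is just one prompt per line. A rough sketch of such a loader (whether PromptBench itself skips blank lines is not documented here, so this version does it defensively):

```python
from pathlib import Path

def load_prompts(path: str) -> list[str]:
    # One prompt per line; strip whitespace and drop empty lines.
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```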

JSON output

promptbench "What is Python?" -o json

CSV output

promptbench "What is Python?" -o csv > results.csv

Save results to file

promptbench "Compare REST vs GraphQL" --save results.json

Pipe from stdin

echo "What is the meaning of life?" | promptbench -m gpt4o sonnet

List all supported models

promptbench --list-models

Supported Models

Alias                   Model                       Provider
──────────────────────  ──────────────────────────  ─────────
gpt4o / gpt4            gpt-4o                      OpenAI
gpt4mini                gpt-4o-mini                 OpenAI
gpt3.5                  gpt-3.5-turbo               OpenAI
sonnet / claude-sonnet  claude-sonnet-4-20250514    Anthropic
haiku / claude-haiku    claude-haiku-4-5-20251001   Anthropic
flash / gemini-flash    gemini-2.0-flash            Google
gemini-pro              gemini-1.5-pro              Google

You can also use full model names directly (e.g., gpt-4-turbo, gemini-1.5-flash).
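Alias resolution with a pass-through for full model names likely reduces to a dictionary lookup. A sketch built from the table above (not the package's actual code):

```python
# Alias → full model ID, taken from the Supported Models table.
ALIASES = {
    "gpt4o": "gpt-4o", "gpt4": "gpt-4o",
    "gpt4mini": "gpt-4o-mini",
    "gpt3.5": "gpt-3.5-turbo",
    "sonnet": "claude-sonnet-4-20250514", "claude-sonnet": "claude-sonnet-4-20250514",
    "haiku": "claude-haiku-4-5-20251001", "claude-haiku": "claude-haiku-4-5-20251001",
    "flash": "gemini-2.0-flash", "gemini-flash": "gemini-2.0-flash",
    "gemini-pro": "gemini-1.5-pro",
}

def resolve_model(name: str) -> str:
    # Known aliases map to full model IDs; anything else passes through unchanged.
    return ALIASES.get(name, name)
```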

Python Library Usage

from promptbench.runner import run_bench
from promptbench.display import display_comparison

run = run_bench(
    prompt="Explain recursion simply",
    models=["gpt4o", "sonnet", "flash"],
    temperature=0.5,
)

print(display_comparison(run))

# Access individual results
for result in run.results:
    print(f"{result.model}: {result.latency_ms:.0f}ms, {result.cost_usd:.6f} USD")
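The per-result records lend themselves to quick post-processing, such as picking the fastest and cheapest model. A sketch assuming the `model`, `latency_ms`, and `cost_usd` attributes shown above, with a plain dataclass standing in for PromptBench's own result type:

```python
from dataclasses import dataclass

@dataclass
class Result:
    # Stand-in for a PromptBench per-model result record.
    model: str
    latency_ms: float
    cost_usd: float

def fastest_and_cheapest(results: list[Result]) -> tuple[str, str]:
    # Winners by latency and by cost, respectively.
    fastest = min(results, key=lambda r: r.latency_ms)
    cheapest = min(results, key=lambda r: r.cost_usd)
    return fastest.model, cheapest.model

# Figures from the example run at the top of this page.
demo = [
    Result("claude-sonnet-4-20250514", 1240, 0.0006),
    Result("gemini-2.0-flash", 312, 0.0),
    Result("gpt-4o", 845, 0.0010),
]
```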

Configuration Flags

Flag               Description                                Default
─────────────────  ─────────────────────────────────────────  ──────────────────
-m, --models       Models to test                             gpt4o sonnet flash
-s, --system       System prompt                              None
-t, --temperature  Sampling temperature                       0.7
--max-tokens       Max output tokens                          1024
-f, --file         Prompts file path                          None
-o, --output       Output format: table, json, csv, markdown  table
--full             Show full responses (no truncation)        Off
--no-parallel      Run models sequentially                    Off
--save             Save results to JSON file                  None
--list-models      List supported models                      –
--version          Show version                               –

Contributing

See CONTRIBUTING.md for development setup, how to add providers, and PR guidelines.

License

MIT — see LICENSE for details.
