🏎️ bench-my-llm

Dead-simple LLM benchmarking CLI. Measure TTFT, TPS, latency, cost, and quality for any OpenAI-compatible API.

New here? Start with the Getting Started Guide.

PyPI version Python 3.10+ License: MIT CI

Stop guessing which model is faster. Measure it.

Point bench-my-llm at any OpenAI-compatible API and get latency, throughput, cost, and quality metrics in seconds. Compare models side by side. Get a beautiful terminal report. Ship with confidence.

✨ Features

  • 🔥 TTFT Measurement - Time to first token via streaming
  • ⚡ Tokens per Second - Real throughput numbers
  • 📊 p50 / p95 / p99 Latencies - Production-grade percentiles
  • 💰 Cost Estimation - Know what you're spending
  • 🎯 Quality Scoring - Compare responses against reference answers
  • 🏁 Model Comparison - Side-by-side with winner highlights
  • 📦 Built-in Prompt Suites - Reasoning, coding, creative, factual
  • 🔌 Any OpenAI-compatible API - OpenAI, Anthropic, Ollama, vLLM, Together, and more
  • 💾 Export to JSON - Pipe into CI, dashboards, or your own tools

🚀 Quick Start

pip install bench-my-llm
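
bench-my-llm reads credentials from the environment. The CI example below sets OPENAI_API_KEY, so exporting it locally should work the same way:

export OPENAI_API_KEY="sk-..."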

Single Model Benchmark

bench-my-llm run --model gpt-4o --suite reasoning
┌──────────────────────────────────────────────────────────┐
│  🏎️  Benchmark Report                                    │
│  bench-my-llm results for gpt-4o                         │
│  Suite: reasoning | Prompts: 5 | Cost: $0.0043           │
└──────────────────────────────────────────────────────────┘

          Latency Summary
┌────────┬────────────┬────────────────────┐
│ Metric │ TTFT (ms)  │ Total Latency (ms) │
├────────┼────────────┼────────────────────┤
│ p50    │ 234.1      │ 1,523.4            │
│ p95    │ 312.7      │ 2,187.9            │
│ p99    │ 348.2      │ 2,401.3            │
│ Mean   │ 251.3      │ 1,687.2            │
└────────┴────────────┴────────────────────┘

       Throughput & Quality
┌───────────────────┬─────────────┐
│ Metric            │ Value       │
├───────────────────┼─────────────┤
│ Mean TPS          │ 67.3 tok/s  │
│ Median TPS        │ 64.8 tok/s  │
│ Quality Score     │ 82%         │
│ Estimated Cost    │ $0.0043     │
└───────────────────┴─────────────┘

Model Comparison

bench-my-llm compare gpt-4o gpt-4o-mini --suite reasoning
┌──────────────────────────────────────────────────────────┐
│  🏁 Model Comparison                                     │
│  gpt-4o vs gpt-4o-mini                                   │
└──────────────────────────────────────────────────────────┘

              Head-to-Head
┌────────────────────────┬─────────┬─────────────┐
│ Metric                 │ gpt-4o  │ gpt-4o-mini │
├────────────────────────┼─────────┼─────────────┤
│ TTFT p50 (ms)          │ 234.1   │ 142.3 🏆    │
│ TTFT p95 (ms)          │ 312.7   │ 198.4 🏆    │
│ Total Latency p50 (ms) │ 1,523.4 │ 876.2 🏆    │
│ Mean TPS               │ 67.3 🏆 │ 54.1        │
│ Cost (USD)             │ $0.0043 │ $0.0008 🏆  │
│ Quality Score          │ 0.82 🏆 │ 0.71        │
└────────────────────────┴─────────┴─────────────┘

🏆 Winner: gpt-4o-mini (4/6 metrics)

📖 Usage

Custom Prompts

Pass your own prompts file (JSON array):

[
  {"text": "Explain quantum computing", "category": "factual", "reference": "...", "max_tokens": 256}
]
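
Then point a run at the file. This page doesn't document the flag for custom prompt files, so --prompts below is an assumption; verify with bench-my-llm run --help:

# --prompts is a hypothetical flag name; confirm with `bench-my-llm run --help`
bench-my-llm run --model gpt-4o --prompts my_prompts.json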

Prompt Suites

Suite      Description                       Prompts
reasoning  Logic, math, step-by-step         5
coding     Code generation and explanation   5
creative   Writing, storytelling, metaphors  5
factual    Knowledge recall, definitions     5
all        Everything combined               20

Export Results

bench-my-llm run --model gpt-4o --suite all --output results.json
bench-my-llm report results.json
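
The export schema isn't documented on this page, so the field path below is an assumption — open results.json once to confirm the real key names. With jq, a single metric can then feed a dashboard or CI check:

# .metrics.mean_tps is a hypothetical path; adjust to your results.json
jq '.metrics.mean_tps' results.json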

Local Models (Ollama)

bench-my-llm run --model llama3 --base-url http://localhost:11434/v1 --api-key ollama
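
The same two flags work for any other OpenAI-compatible server. For example, against a local vLLM instance (vLLM serves an OpenAI-compatible API on port 8000 by default; the model name here is illustrative and must match what the server is serving):

bench-my-llm run --model meta-llama/Llama-3.1-8B-Instruct --base-url http://localhost:8000/v1 --api-key dummy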

CI Integration

Add to your GitHub Actions workflow:

- name: Benchmark LLM
  run: |
    pip install bench-my-llm
    bench-my-llm run --model gpt-4o-mini --suite reasoning --output benchmark.json
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

- name: Upload results
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-results
    path: benchmark.json
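
To fail the build on a regression, a gate step can read the exported JSON. The field path and threshold here are assumptions — adjust them to match your benchmark.json:

- name: Gate on throughput
  run: |
    # Field path is an assumption about the export schema; inspect benchmark.json first.
    # jq -e exits non-zero when the expression is false, which fails the job.
    jq -e '.metrics.mean_tps >= 40' benchmark.json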

๐Ÿ› ๏ธ Development

git clone https://github.com/manasvardhan/bench-my-llm.git
cd bench-my-llm
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest

📄 License

MIT. See LICENSE.
