LLM Inference Benchmark CLI - measure TTFT, TPS, ITL, E2E latency for any OpenAI-compatible API
Project description
llm-benchmark-runner
LLM 推理性能测评 CLI 工具 -- 测量 TTFT / TPS / ITL / E2E 延迟,输出标准 JSON 可直接导入 Web 端 查看可视化图表。
适用于浏览器无法触达的场景(CORS 未配置、无头服务器 SSH 环境、CI/CD 集成等)。
安装
pip:
pip install llm-benchmark-runner
uv:
uv pip install llm-benchmark-runner
从源码(开发模式):
cd runner
pip install -e .
使用
# 基本用法
llm-benchmark --url http://localhost:11434 --model llama3.2
# 完整参数
llm-benchmark \
--url http://localhost:11434 \
--model llama3.2 \
--name "My Ollama" \
--prompt "Write a short essay about AI." \
--max-tokens 512 \
--repeat 10 \
--concurrency 1,2,4,8 \
--output results.json
# 也可以用 python -m 方式运行
python -m llm_benchmark_runner --url http://localhost:11434 --model llama3.2
参数说明
| 参数 | 默认值 | 说明 |
|---|---|---|
--url |
(必填) | 模型 API 的 Base URL |
--model |
(必填) | Model ID |
--name |
自动生成 | 端点显示名称 |
--api-key |
空 | API Key |
--prompt |
内置 | 测试 Prompt |
--max-tokens |
256 | 最大输出 token 数 |
--repeat |
5 | 每个并发级别重复次数 |
--concurrency |
1,2,4,8 | 并发级别(逗号分隔) |
--output |
自动命名 | 输出 JSON 文件路径 |
--version |
- | 显示版本号 |
输出格式
输出的 JSON 文件遵循 BenchmarkSession 标准格式,包含:
- 测评配置(prompt、maxTokens、repeatCount、concurrencyLevels)
- 每个端点的聚合指标(TTFT / TPS / ITL / E2E 的 mean/median/p95/p99/min/max/stdDev)
- 并发压测结果(各并发级别的吞吐量和延迟)
- 五维雷达评分(Speed / Responsiveness / Smoothness / Scalability / Stability)
- 原始请求数据(逐请求的 token 时间戳)
导入 Web 端
输出的 JSON 文件可直接导入 Web 端的 "历史记录 -> 导入 JSON" 查看可视化图表:
- 打开 Web 端(https://benchmark-for-llm.vercel.app)
- 切换到 "历史记录" Tab
- 点击 "导入" 按钮
- 选择 CLI 输出的 JSON 文件
支持的 API 格式
- OpenAI Chat Completions API(
/v1/chat/completions) - Ollama(兼容 OpenAI 格式 + 原生格式)
- LM Studio
- vLLM
- llama.cpp
- 任何支持 SSE streaming 的 OpenAI 兼容 API
开发
cd runner
uv build # 构建分发包
uv publish # 发布到 PyPI(需要配置 token)
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file llm_benchmark_runner-0.2.0.tar.gz.
File metadata
- Download URL: llm_benchmark_runner-0.2.0.tar.gz
- Upload date:
- Size: 12.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2cb708845711021ff50c248bd792c41a9766edc3a908e62cd6b0d54b0d860aa8
|
|
| MD5 |
60494a87927f6e685e3659a3d35e05e8
|
|
| BLAKE2b-256 |
083b57328cb25515cba18bcc65b9941a3923455942d49a00e76fa6c84d4a1b6c
|
Provenance
The following attestation bundles were made for llm_benchmark_runner-0.2.0.tar.gz:
Publisher:
publish-pypi.yml on kuhung/benchmark-for-LLM
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
llm_benchmark_runner-0.2.0.tar.gz -
Subject digest:
2cb708845711021ff50c248bd792c41a9766edc3a908e62cd6b0d54b0d860aa8 - Sigstore transparency entry: 1582383507
- Sigstore integration time:
-
Permalink:
kuhung/benchmark-for-LLM@82029dd1ae916efbaf0f84c1f425b7ae01c79a66 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/kuhung
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@82029dd1ae916efbaf0f84c1f425b7ae01c79a66 -
Trigger Event:
workflow_dispatch
-
Statement type:
File details
Details for the file llm_benchmark_runner-0.2.0-py3-none-any.whl.
File metadata
- Download URL: llm_benchmark_runner-0.2.0-py3-none-any.whl
- Upload date:
- Size: 9.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
405c410b2c97048905243e84dd0b11a330d9445ef710fb81d9480d9a4ff25bfe
|
|
| MD5 |
9ec94b72c8cab72adc933b9406461f8c
|
|
| BLAKE2b-256 |
022d2736f9d0c244b79ccf30e2740e25a7b1572037c8560a8b444a0cd94cd6de
|
Provenance
The following attestation bundles were made for llm_benchmark_runner-0.2.0-py3-none-any.whl:
Publisher:
publish-pypi.yml on kuhung/benchmark-for-LLM
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
llm_benchmark_runner-0.2.0-py3-none-any.whl -
Subject digest:
405c410b2c97048905243e84dd0b11a330d9445ef710fb81d9480d9a4ff25bfe - Sigstore transparency entry: 1582383625
- Sigstore integration time:
-
Permalink:
kuhung/benchmark-for-LLM@82029dd1ae916efbaf0f84c1f425b7ae01c79a66 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/kuhung
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@82029dd1ae916efbaf0f84c1f425b7ae01c79a66 -
Trigger Event:
workflow_dispatch
-
Statement type: