Async HTTP benchmarking utility with pluggable workloads and load models.
bench-maker
Async HTTP benchmarking with pluggable workload-types (protocols), workloads (datasets), load models, hooks, and optional periodic monitors.
+--------+   item   +---------------+   request   +-----------+   +---------+
|workload|--------->| workload-type |------------>| pre-hooks |-->| aiohttp |
|(dataset|          |  (protocol)   |             +-----------+   +---------+
| / log) |          | make_request  |                                |
+--------+          | make_sample   |             +------------+     v
    ^               +---------------+             | post-hooks |<----+
    |                                             +------------+
    +-- load model decides WHEN to fire ----+            v
                                            |       +----------+
               monitors run alongside ------+------>| metrics  |
               (Prometheus, NVML, ...)              |aggregator|
                                                    +----------+
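To make the data flow in the diagram concrete, here is a toy version of the pipeline built from plain functions. It is only a sketch of the flow, not benchmaker's API: every function name below is hypothetical (the names make_request and make_sample are borrowed from the diagram, but their real signatures may differ), fake_send stands in for aiohttp so nothing touches the network, and the position of make_sample after the post-hooks is an assumption.

```python
import asyncio


def make_request(item):
    # Workload-type step (hypothetical signature): turn one item into a request.
    return {"method": "GET", "url": item, "headers": {}}


def add_trace_header(request):
    # Example pre-hook: mutate the request before it is sent.
    request["headers"]["X-Trace"] = "demo"
    return request


async def fake_send(request):
    # Stand-in for the aiohttp call; always answers 200 without any I/O.
    await asyncio.sleep(0)
    return {"status": 200, "url": request["url"], "headers": request["headers"]}


def tag_ok(response):
    # Example post-hook: annotate the response after it arrives.
    response["ok"] = response["status"] < 400
    return response


def make_sample(response):
    # Reduce a response to the record a metrics aggregator would consume.
    return {"url": response["url"], "ok": response["ok"]}


async def run_pipeline(workload):
    # Sequential for clarity; a real runner lets the load model decide
    # when each request fires.
    samples = []
    for item in workload:
        request = add_trace_header(make_request(item))
        response = tag_ok(await fake_send(request))
        samples.append(make_sample(response))
    return samples


samples = asyncio.run(run_pipeline(["http://example.test/a", "http://example.test/b"]))
print(samples)
```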
Install
pip install -e .
pip install -e ".[dev]"  # for tests (quoted so shells like zsh don't glob the brackets)
This installs the benchmaker Python package and the bench-maker CLI.
30-second tour
import asyncio
from benchmaker import BenchConfig, BenchRunner, ConstantRPS, HttpWorkloadType

async def main():
    cfg = BenchConfig(
        workload_type=HttpWorkloadType(url="https://httpbin.org/get"),
        load=ConstantRPS(rps=50, duration_s=10),
    )
    result = await BenchRunner(cfg).run()
    print(result.summary)

asyncio.run(main())
Or via the CLI:
bench-maker quick --url https://httpbin.org/get --rate poisson:50 --duration 10s
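Note that the library snippet uses ConstantRPS while the CLI line uses a poisson:50 rate spec, so the two are not the same load. The sketch below shows the difference between the two arrival patterns using only the standard library. It assumes poisson:50 means Poisson arrivals with a mean of 50 requests per second (the Load models docs are the authority on the spec syntax); constant_arrivals and poisson_arrivals are hypothetical helpers, not part of benchmaker.

```python
import random


def constant_arrivals(rps: float, duration_s: float) -> list[float]:
    """Evenly spaced send times: one request every 1/rps seconds."""
    gap = 1.0 / rps
    return [i * gap for i in range(int(rps * duration_s))]


def poisson_arrivals(rps: float, duration_s: float, seed: int = 0) -> list[float]:
    """Send times whose gaps are exponentially distributed with mean 1/rps."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rps)
        if t >= duration_s:
            return times
        times.append(t)


const = constant_arrivals(rps=50, duration_s=10)
pois = poisson_arrivals(rps=50, duration_s=10)
print(f"constant: {len(const)} requests, poisson: {len(pois)} requests")
```

Both schedules average the same rate, but the Poisson one produces bursts and lulls, which tends to stress queueing behaviour more than an evenly spaced stream.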
Walkthrough: benchmarking an LLM endpoint with ShareGPT
A realistic LLM benchmark needs a real prompt distribution. ShareGPT V3 is a
common choice: multi-turn human/assistant conversations scraped from real
ChatGPT users. A cleaned, benchmark-ready copy is published at
researchcomputer/llmsys-bench (split="sharegpt"), with one row per conversation:
{"id": "...", "messages": [{"role": "user", "content": "..."},
                           {"role": "assistant", "content": "..."},
                           {"role": "user", "content": "..."}]}
messages is the only content field — it's everything a chat benchmark needs.
Each row is truncated to end on a user turn, so it's a valid generation
request: the server completes the final assistant reply given the prior
history. Short source conversations collapse to a single user turn (a plain
single-turn prompt); longer ones carry multi-turn context.
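The "ends on a user turn" rule can be sketched as a small function. This is an illustration of the rule as described above, not the code in tools/prepare_sharegpt.py; truncate_to_user_turn is a hypothetical name.

```python
def truncate_to_user_turn(messages: list[dict]) -> list[dict]:
    """Drop trailing non-user messages so the list ends on a user turn."""
    end = len(messages)
    while end > 0 and messages[end - 1]["role"] != "user":
        end -= 1
    return messages[:end]


conversation = [
    {"role": "user", "content": "What is a mutex?"},
    {"role": "assistant", "content": "A lock that ..."},
    {"role": "user", "content": "And a semaphore?"},
    {"role": "assistant", "content": "A counter that ..."},
]
# The final assistant reply is dropped; the server is asked to generate it.
print(truncate_to_user_turn(conversation))
```

A conversation with no user message at all comes back empty under this rule, so a caller would want to skip such rows.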
Load it directly from the Hub
Pull the published split and feed each row's messages list straight into the
chat workload-type. This needs the hf extra (pip install -e ".[hf]"):
import asyncio
from datasets import load_dataset
from benchmaker import (
    BenchConfig, BenchRunner, OpenAIChatWorkloadType,
    IterableWorkload, parse_rate_spec,
)

async def main():
    ds = load_dataset("researchcomputer/llmsys-bench", split="sharegpt")
    cfg = BenchConfig(
        workload_type=OpenAIChatWorkloadType(
            url="http://localhost:8000/v1/chat/completions",
            model="meta-llama/Llama-3.1-8B-Instruct",
            max_tokens=256,
        ),
        workload=IterableWorkload(row["messages"] for row in ds),
        load=parse_rate_spec("poisson:8", duration_s=60),
        timeout_s=600,
    )
    result = await BenchRunner(cfg).run()
    print(result.summary)

asyncio.run(main())
OpenAIChatWorkloadType receives the message list as-is, so single-turn rows
send one user message and multi-turn rows replay the full history before the
server generates the final assistant turn. Time to first token (TTFT), inter-token latency, and
tokens/sec are captured the same way in both cases. URL / model / API key can
also come from .env via OpenAIChatWorkloadType.from_env(...).
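For readers new to these streaming metrics, the sketch below derives them from the arrival times of one response's tokens. The definitions are common ones and are assumptions here, not a statement of how benchmaker computes its summary: TTFT is the delay from send to first token, inter-token latency is the mean gap between consecutive tokens, and tokens/sec is output tokens divided by end-to-end time. stream_metrics is a hypothetical helper.

```python
from statistics import mean


def stream_metrics(sent_at: float, token_times: list[float]) -> dict:
    """Summarize one streamed response from its token arrival timestamps."""
    if not token_times:
        raise ValueError("no tokens received")
    total_s = token_times[-1] - sent_at
    if total_s <= 0:
        raise ValueError("last token must arrive after the request was sent")
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    return {
        "ttft_s": token_times[0] - sent_at,
        "mean_itl_s": mean(gaps) if gaps else 0.0,  # undefined for one token
        "tokens_per_s": len(token_times) / total_s,
    }


# Request sent at t=0; four tokens arrive 0.5 s later, 100 ms apart.
print(stream_metrics(0.0, [0.5, 0.6, 0.7, 0.8]))
```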
Rebuild or customize it yourself
The published split is produced by tools/prepare_sharegpt.py, which downloads
the upstream JSON once into .local/ (gitignored) and converts it to the JSONL
shape above. Run it when you want a subset, different filtering, or a refresh:
# Defaults: .local/sharegpt_v3_raw.json -> .local/sharegpt_v3.jsonl
python tools/prepare_sharegpt.py
# A quick subset for smoke tests:
python tools/prepare_sharegpt.py --max-items 2000
The raw download is ~700 MB. Use --min-chars / --max-chars to drop empty or
pathologically long conversations (measured over total message content per
row). Point any workload at the local file with JsonlWorkload(path=..., field="messages"), or on the CLI:
bench-maker llm \
--url http://localhost:8000/v1/chat/completions \
--model meta-llama/Llama-3.1-8B-Instruct \
--prompts-jsonl .local/sharegpt_v3.jsonl \
--prompt-field messages \
--max-tokens 256 \
--rate poisson:8 --duration 60s \
--out-dir ./runs --label dataset=sharegpt
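The --min-chars / --max-chars filter mentioned above measures total message content per row. A minimal sketch of that idea over JSONL lines follows; it is not the script's actual code, total_chars and filter_rows are hypothetical names, and the default bounds are placeholders rather than the script's defaults.

```python
import json


def total_chars(row: dict, field: str = "messages") -> int:
    """Sum the content length of every message in one row."""
    return sum(len(m.get("content", "")) for m in row[field])


def filter_rows(lines, min_chars: int = 1, max_chars: int = 20_000):
    """Yield parsed rows whose total content length is within the bounds."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        row = json.loads(line)
        if min_chars <= total_chars(row) <= max_chars:
            yield row


demo = [
    '{"id": "empty", "messages": [{"role": "user", "content": ""}]}',
    '{"id": "ok", "messages": [{"role": "user", "content": "hello"}]}',
]
print([row["id"] for row in filter_rows(demo)])
```

With a real file, pass an open file handle as lines, for example the .local/sharegpt_v3.jsonl output from the commands above.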
To re-publish after regenerating, tools/upload_sharegpt_hf.py pushes the
JSONL back to the Hub (needs a write token).
Documentation
Full docs live in docs/:
- Quickstart
- Concepts — WorkloadType, Workload, LoadModel, Monitor
- Load models — rate-spec syntax, open vs closed loop
- Workloads & workload-types — built-ins and custom subclasses
- Hooks — pre/post request processing
- Monitors — vLLM /metrics, GPU telemetry, custom samplers
- Metrics & output — summary structure, JSONL dumps
- Correctness / accuracy eval — grade responses against references
- CLI & YAML reference
- ShareGPT benchmark — self-contained end-to-end walkthrough
Examples
Under examples/:
- simple_get.py — minimal library usage
- custom_hooks.py — request signing + response parsing
- llm_chat.py — OpenAI-compatible LLM endpoint with streaming
- vllm_with_monitor.py — LLM benchmark with concurrent vLLM /metrics scrape
- sandbox_exec.py — Flash Sandbox /exec latency benchmark
- sandbox_lifecycle.py — full create → exec → delete cold-start benchmark
- llm_eval.py — LLM benchmark + accuracy grading (exact/regex/judge)
- gsm8k_eval.py — GSM8K from HuggingFace + integer-match scorer
- config.yaml — generic HTTP YAML config
- config_llm.yaml — LLM YAML config with a Prometheus monitor
Helper scripts under tools/:
- prepare_sharegpt.py — fetch ShareGPT V3 and convert to a generic JSONL
- upload_sharegpt_hf.py — push the converted JSONL to the HF Hub (write token)
- start_local_llm.sh — example local SGLang launch command
Project layout
benchmaker/ # library code
entrypoints/ # CLI (bench-maker)
examples/ # runnable examples
tools/ # one-off helper scripts (dataset prep, etc.)
tests/ # pytest smoke tests
docs/ # reference docs
Run the tests
pytest -q