Async HTTP benchmarking utility with pluggable workloads and load models.

bench-maker

Async HTTP benchmarking with pluggable workload-types (protocols), workloads (datasets), load models, hooks, and optional periodic monitors.

+--------+   item   +---------------+   request   +-----------+   +---------+
|workload|--------->| workload-type |------------>| pre-hooks |-->| aiohttp |
|(dataset|          | (protocol)    |             +-----------+   +---------+
| / log) |          | make_request  |                                 |
+--------+          | make_sample   |              +------------+     v
   ^                +---------------+              | post-hooks |<----+
   |                                               +------------+
   +-- load model decides WHEN to fire ----+              v
                                           |        +------------+
              monitors run alongside ------+------->|  metrics   |
              (Prometheus, NVML, ...)               | aggregator |
                                                    +------------+
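The diagram can be read as a plain sequence of function calls. The sketch below illustrates that flow; it is not benchmaker's implementation. `run_pipeline`, `Metrics`, `fake_send`, and the dict-shaped requests and responses are hypothetical, and the transport is an offline stub rather than aiohttp. Requests fire back to back here, whereas in bench-maker the load model decides when each one fires.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List

# Hypothetical plain-dict shapes; benchmaker's real request/sample types differ.
Request = Dict[str, Any]
Response = Dict[str, Any]


@dataclass
class Metrics:
    latencies_s: List[float] = field(default_factory=list)
    responses: List[Response] = field(default_factory=list)
    errors: int = 0


async def run_pipeline(
    items: List[Any],
    make_request: Callable[[Any], Request],
    send: Callable[[Request], Awaitable[Response]],
    pre_hooks: List[Callable[[Request], Request]],
    post_hooks: List[Callable[[Response], Response]],
) -> Metrics:
    """Push each workload item through the stages in the diagram, one at a time."""
    metrics = Metrics()
    for item in items:
        request = make_request(item)            # workload-type: item -> request
        for hook in pre_hooks:                  # pre-hooks may rewrite the request
            request = hook(request)
        start = time.perf_counter()
        try:
            response = await send(request)      # transport (aiohttp in bench-maker)
        except Exception:
            metrics.errors += 1                 # count the failure, keep going
            continue
        metrics.latencies_s.append(time.perf_counter() - start)
        for hook in post_hooks:                 # post-hooks may parse the response
            response = hook(response)
        metrics.responses.append(response)      # the sample handed to the aggregator
    return metrics


async def fake_send(request: Request) -> Response:
    """Stand-in transport so the sketch runs without a network."""
    await asyncio.sleep(0)
    return {"status": 200, "url": request["url"], "headers": request.get("headers", {})}


if __name__ == "__main__":
    result = asyncio.run(run_pipeline(
        ["a", "b", "c"],
        make_request=lambda item: {"url": f"/items/{item}"},
        send=fake_send,
        pre_hooks=[lambda r: {**r, "headers": {"x-bench": "1"}}],
        post_hooks=[lambda r: {**r, "ok": r["status"] == 200}],
    ))
    print(len(result.latencies_s), result.errors)
```

Latency is measured around the transport call only, so slow post-hooks do not inflate it.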

Install

pip install -e .
pip install -e ".[dev]"   # for tests (quotes keep shells like zsh from globbing the brackets)

This installs the benchmaker Python package and the bench-maker CLI.

30-second tour

import asyncio
from benchmaker import BenchConfig, BenchRunner, ConstantRPS, HttpWorkloadType

async def main():
    cfg = BenchConfig(
        workload_type=HttpWorkloadType(url="https://httpbin.org/get"),
        load=ConstantRPS(rps=50, duration_s=10),
    )
    result = await BenchRunner(cfg).run()
    print(result.summary)

asyncio.run(main())

Or via the CLI, here with Poisson rather than constant arrivals at the same average rate:

bench-maker quick --url https://httpbin.org/get --rate poisson:50 --duration 10s
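Rate specs such as `poisson:50` describe an open-loop arrival process. A minimal sketch of what such a spec could expand to, assuming a `kind:rate` grammar with only `constant` and `poisson` kinds (only `poisson` appears in this README; `constant` is an assumed counterpart to ConstantRPS). `parse_rate` and `arrival_times` are hypothetical helpers, not the library's `parse_rate_spec`; the real syntax is documented under Load models in docs/.

```python
import random
from typing import List, Tuple


def parse_rate(spec: str) -> Tuple[str, float]:
    """Split a 'kind:rate' string such as 'poisson:50' into ('poisson', 50.0)."""
    kind, sep, rate = spec.partition(":")
    if not sep or kind not in ("constant", "poisson"):
        raise ValueError(f"unsupported rate spec: {spec!r}")
    rps = float(rate)  # raises ValueError for non-numeric rates
    if rps <= 0:
        raise ValueError("rate must be positive")
    return kind, rps


def arrival_times(spec: str, duration_s: float, seed: int = 0) -> List[float]:
    """Offsets in seconds from t=0 at which an open-loop generator would fire."""
    kind, rps = parse_rate(spec)
    if kind == "constant":
        # Evenly spaced: 0, 1/rps, 2/rps, ... strictly before duration_s.
        times, i = [], 0
        while i / rps < duration_s:
            times.append(i / rps)
            i += 1
        return times
    # Poisson process: exponential inter-arrival gaps with mean 1/rps.
    rng = random.Random(seed)
    times, t = [], rng.expovariate(rps)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rps)
    return times
```

Both kinds average the same rate, but Poisson gaps are exponentially distributed, so bursts of closely spaced requests occur naturally. Because arrivals do not wait for earlier responses, a slow server accumulates in-flight requests instead of silently lowering the offered load.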

Walkthrough: benchmarking an LLM endpoint with ShareGPT

A realistic LLM benchmark needs a realistic prompt distribution. ShareGPT V3 is a common choice: multi-turn human/assistant conversations that ChatGPT users shared publicly. A cleaned, benchmark-ready copy is published at researchcomputer/llmsys-bench (split="sharegpt"), with one row per conversation:

{"id": "...", "messages": [{"role": "user", "content": "..."},
                           {"role": "assistant", "content": "..."},
                           {"role": "user", "content": "..."}]}

messages is the only content field, and it carries everything a chat benchmark needs. Each row is truncated to end on a user turn, so it forms a valid generation request: the server generates the next assistant reply given the prior history. Short source conversations collapse to a single user turn (a plain single-turn prompt); longer ones keep their multi-turn context.
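One simple reading of the "end on a user turn" rule is to drop trailing non-user turns. The sketch below assumes that reading; `truncate_to_user_turn` is hypothetical, and tools/prepare_sharegpt.py may cut conversations differently (for example at an earlier user turn).

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def truncate_to_user_turn(messages: List[Message]) -> Optional[List[Message]]:
    """Drop trailing non-user turns so the conversation ends on a user message.

    Returns None when no user turn remains, i.e. the row cannot serve as a
    generation request.
    """
    end = len(messages)
    while end > 0 and messages[end - 1]["role"] != "user":
        end -= 1
    return list(messages[:end]) if end else None
```

A two-turn `[user, assistant]` exchange collapses to a single user prompt, while `[user, assistant, user, assistant]` keeps three turns of history.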

Load it directly from the Hub

Pull the published split and feed each row's messages list straight into the chat workload-type (pip install -e ".[hf]"):

import asyncio
from datasets import load_dataset
from benchmaker import (
    BenchConfig, BenchRunner, OpenAIChatWorkloadType,
    IterableWorkload, parse_rate_spec,
)

async def main():
    ds = load_dataset("researchcomputer/llmsys-bench", split="sharegpt")
    cfg = BenchConfig(
        workload_type=OpenAIChatWorkloadType(
            url="http://localhost:8000/v1/chat/completions",
            model="meta-llama/Llama-3.1-8B-Instruct",
            max_tokens=256,
        ),
        workload=IterableWorkload(row["messages"] for row in ds),
        load=parse_rate_spec("poisson:8", duration_s=60),
        timeout_s=600,
    )
    result = await BenchRunner(cfg).run()
    print(result.summary)

asyncio.run(main())

OpenAIChatWorkloadType receives the message list as-is, so single-turn rows send one user message and multi-turn rows replay the full history before the server generates the final assistant turn. TTFT, inter-token latency, and tokens/sec are captured the same way in both cases. URL / model / API key can also come from .env via OpenAIChatWorkloadType.from_env(...).
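To make the three streaming metrics concrete, here is a sketch that derives them from token arrival timestamps. It assumes one timestamp per streamed token and defines tokens/sec as output tokens divided by total request time. Servers often pack several tokens into one chunk, and benchmaker's exact definitions (for example, whether throughput excludes TTFT) are in the Metrics & output docs, not here. `stream_timing` and `StreamTiming` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamTiming:
    ttft_s: float        # request start -> first token
    mean_itl_s: float    # average gap between consecutive tokens
    tokens_per_s: float  # output tokens / (last token time - request start)


def stream_timing(start_s: float, token_times_s: List[float]) -> StreamTiming:
    """Compute TTFT, mean inter-token latency, and throughput for one request."""
    if not token_times_s:
        raise ValueError("no tokens received")
    ttft = token_times_s[0] - start_s
    gaps = [b - a for a, b in zip(token_times_s, token_times_s[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0  # one token: no gaps
    total = token_times_s[-1] - start_s
    tps = len(token_times_s) / total if total > 0 else float("inf")
    return StreamTiming(ttft, mean_itl, tps)
```

Multi-turn rows usually carry longer prompts, so they tend to raise TTFT (prefill) more than inter-token latency (decode).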

Rebuild or customize it yourself

The published split is produced by tools/prepare_sharegpt.py, which downloads the upstream JSON once into .local/ (gitignored) and converts it to the JSONL shape above. Run it when you want a subset, different filtering, or a refresh:

# Defaults: .local/sharegpt_v3_raw.json  ->  .local/sharegpt_v3.jsonl
python tools/prepare_sharegpt.py

# A quick subset for smoke tests:
python tools/prepare_sharegpt.py --max-items 2000
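The conversion performed by tools/prepare_sharegpt.py can be sketched as follows, assuming the upstream ShareGPT V3 JSON is a list of records shaped like `{"id": ..., "conversations": [{"from": "human" | "gpt", "value": ...}]}`. That layout, the role mapping, and skipping rows with any other speaker tag are assumptions of this sketch; `convert_row` and `write_jsonl` are hypothetical, and `max_items` only mirrors the `--max-items` flag.

```python
import io
import json
from typing import Iterable, Optional, TextIO

# Assumed upstream speaker tags -> OpenAI-style roles.
ROLE_MAP = {"human": "user", "gpt": "assistant"}


def convert_row(raw: dict) -> Optional[dict]:
    """Turn one upstream record into {"id", "messages"}, or None to skip it."""
    messages = []
    for turn in raw.get("conversations", []):
        role = ROLE_MAP.get(turn.get("from"))
        if role is None:
            return None  # unknown speaker tag: drop the whole conversation
        messages.append({"role": role, "content": turn.get("value", "")})
    # End on a user turn so the row is a valid generation request.
    while messages and messages[-1]["role"] != "user":
        messages.pop()
    if not messages:
        return None
    return {"id": raw.get("id", ""), "messages": messages}


def write_jsonl(raw_rows: Iterable[dict], out: TextIO,
                max_items: Optional[int] = None) -> int:
    """Write converted rows as JSON Lines; return how many were written."""
    written = 0
    for raw in raw_rows:
        if max_items is not None and written >= max_items:
            break
        row = convert_row(raw)
        if row is not None:
            out.write(json.dumps(row, ensure_ascii=False) + "\n")
            written += 1
    return written


if __name__ == "__main__":
    demo = [{"id": "x", "conversations": [
        {"from": "human", "value": "hi"}, {"from": "gpt", "value": "hello"}]}]
    buf = io.StringIO()
    write_jsonl(demo, buf)
    print(buf.getvalue(), end="")
```

For the real download you would `json.load` the raw file and pass an open output file as `out`.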

The raw download is ~700 MB. Use --min-chars / --max-chars to drop empty or pathologically long conversations (measured over total message content per row). Point any workload at the local file with JsonlWorkload(path=..., field="messages"), or on the CLI:

bench-maker llm \
    --url   http://localhost:8000/v1/chat/completions \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --prompts-jsonl .local/sharegpt_v3.jsonl \
    --prompt-field  messages \
    --max-tokens 256 \
    --rate poisson:8 --duration 60s \
    --out-dir ./runs --label dataset=sharegpt
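The `--min-chars` / `--max-chars` filter and the `field="messages"` lookup can be sketched together as one generator over JSONL lines. In the real tool the character filter runs at prepare time and JsonlWorkload only reads the field; folding both into `iter_jsonl_field` (a hypothetical helper) and treating both bounds as inclusive are assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator, List, Optional


def total_chars(messages: List[dict]) -> int:
    """Total characters across all message contents in one row."""
    return sum(len(m.get("content", "")) for m in messages)


def iter_jsonl_field(lines: Iterable[str], field: str = "messages",
                     min_chars: int = 0,
                     max_chars: Optional[int] = None) -> Iterator[list]:
    """Yield row[field] for each JSONL line whose content length is in bounds."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        value = json.loads(line)[field]
        n = total_chars(value)
        if n < min_chars or (max_chars is not None and n > max_chars):
            continue
        yield value
```

Because it accepts any iterable of lines (an open file works), the generator stays lazy and never loads the whole JSONL file into memory.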

To re-publish after regenerating, tools/upload_sharegpt_hf.py pushes the JSONL back to the Hub (needs a write token).

Documentation

Full docs live in docs/:

Quickstart
Concepts — WorkloadType, Workload, LoadModel, Monitor
Load models — rate-spec syntax, open vs closed loop
Workloads & workload-types — built-ins and custom subclasses
Hooks — pre/post request processing
Monitors — vLLM /metrics, GPU telemetry, custom samplers
Metrics & output — summary structure, JSONL dumps
Correctness / accuracy eval — grade responses against references
CLI & YAML reference
ShareGPT benchmark — self-contained end-to-end walkthrough
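Hooks are easiest to picture as functions that take a request or response and return a modified copy. The sketch below covers the two jobs custom_hooks.py is described as doing, request signing and response parsing, assuming dict-shaped requests and responses and an HMAC-SHA256 scheme. The header names `X-Timestamp` / `X-Signature`, the `timestamp.body` message layout, and the hook signatures are hypothetical, not benchmaker's hook API.

```python
import hashlib
import hmac
import json
import time
from typing import Any, Callable, Dict

Request = Dict[str, Any]
Response = Dict[str, Any]


def make_signing_hook(key: bytes,
                      clock: Callable[[], float] = time.time) -> Callable[[Request], Request]:
    """Pre-hook factory: sign 'timestamp.body' with HMAC-SHA256."""
    def sign(request: Request) -> Request:
        body = request.get("body", b"")
        ts = str(int(clock()))
        digest = hmac.new(key, ts.encode() + b"." + body, hashlib.sha256).hexdigest()
        headers = {**request.get("headers", {}),
                   "X-Timestamp": ts, "X-Signature": digest}
        return {**request, "headers": headers}  # return a copy, don't mutate
    return sign


def parse_json_hook(response: Response) -> Response:
    """Post-hook: attach the parsed JSON body (or None) and an ok flag."""
    try:
        parsed = json.loads(response.get("text", ""))
    except json.JSONDecodeError:
        parsed = None
    return {**response, "json": parsed, "ok": parsed is not None}
```

Injecting `clock` keeps the signature reproducible in tests. Pre-hooks run inside the request path, so expensive signing adds to measured client-side overhead.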

Examples

Under examples/:

simple_get.py — minimal library usage
custom_hooks.py — request signing + response parsing
llm_chat.py — OpenAI-compatible LLM endpoint with streaming
vllm_with_monitor.py — LLM benchmark with concurrent vLLM /metrics scrape
sandbox_exec.py — Flash Sandbox /exec latency benchmark
sandbox_lifecycle.py — full create → exec → delete cold-start benchmark
llm_eval.py — LLM benchmark + accuracy grading (exact/regex/judge)
gsm8k_eval.py — GSM8K from HuggingFace + integer-match scorer
config.yaml — generic HTTP YAML config
config_llm.yaml — LLM YAML config with a Prometheus monitor
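The exact, regex, and integer-match graders named above reduce to small string functions. A sketch, assuming GSM8K reference answers end in a `#### <number>` line as in the public dataset, and taking the last integer in the model's reply as its answer (a common but imperfect heuristic). The function names are hypothetical, and the LLM-judge grader is not sketched.

```python
import re
from typing import Optional

_INT = re.compile(r"-?\d[\d,]*")


def exact_match(response: str, reference: str) -> bool:
    """Exact string match after trimming surrounding whitespace."""
    return response.strip() == reference.strip()


def regex_match(response: str, pattern: str) -> bool:
    """True if the pattern occurs anywhere in the response."""
    return re.search(pattern, response) is not None


def last_integer(text: str) -> Optional[int]:
    """Last integer in the text, with thousands separators removed."""
    matches = _INT.findall(text)
    return int(matches[-1].replace(",", "")) if matches else None


def gsm8k_reference(answer: str) -> int:
    """GSM8K solutions end with '#### <answer>'; return that integer."""
    return int(answer.split("####")[-1].strip().replace(",", ""))


def integer_match(response: str, reference_answer: str) -> bool:
    """Compare the reply's last integer with the GSM8K reference answer."""
    got = last_integer(response)
    return got is not None and got == gsm8k_reference(reference_answer)
```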

Helper scripts under tools/:

prepare_sharegpt.py — fetch ShareGPT V3 and convert to a generic JSONL
upload_sharegpt_hf.py — push the converted JSONL to the HF Hub (write token)
start_local_llm.sh — example local SGLang launch command

Project layout

benchmaker/          # library code
entrypoints/         # CLI (bench-maker)
examples/            # runnable examples
tools/               # one-off helper scripts (dataset prep, etc.)
tests/               # pytest smoke tests
docs/                # reference docs

Run the tests

pytest -q

Project details

Release history

0.1.2

Jun 11, 2026

0.1.1

Jun 10, 2026

0.1.0 (this version)

May 28, 2026

Download files

Source Distribution

benchmaker-0.1.0.tar.gz (74.3 kB)

Uploaded May 28, 2026 Source

Built Distribution

benchmaker-0.1.0-py3-none-any.whl (63.9 kB)

Uploaded May 28, 2026 Python 3

File details

Details for the file benchmaker-0.1.0.tar.gz.

File metadata

Download URL: benchmaker-0.1.0.tar.gz
Upload date: May 28, 2026
Size: 74.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for benchmaker-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`84103f8b64ae7aa41d6618e3821d5743b5fa0efb8faaec99c1ca2046bd571b5d`
MD5	`b809f32a674fd4bb41f0490e8c4decc4`
BLAKE2b-256	`2b3524abeafa48acaab26b2b53f57c2fd84ca3c286a55c8a8a78ff1eff1c6b41`

File details

Details for the file benchmaker-0.1.0-py3-none-any.whl.

File metadata

Download URL: benchmaker-0.1.0-py3-none-any.whl
Upload date: May 28, 2026
Size: 63.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for benchmaker-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`87a24cffc9ff3f3a8fb798c7bf6df38f3bf6cc7c2018cce019e5c770a9a9f131`
MD5	`5de29f0983bb578601e5adff140938bf`
BLAKE2b-256	`346cf01a26577199241befc71bc93a4b33eac97cedd96757efbf7ecb4e6b96a6`
