Async HTTP benchmarking utility with pluggable workloads and load models.

bench-maker

Async HTTP benchmarking with pluggable workload-types (protocols), workloads (datasets), load models, hooks, and optional periodic monitors.

+--------+   item   +---------------+   request   +-----------+   +---------+
|workload|--------->| workload-type |------------>| pre-hooks |-->| aiohttp |
|(dataset|          | (protocol)    |             +-----------+   +---------+
| / log) |          | make_request  |                                 |
+--------+          | make_sample   |              +------------+     v
   ^                +---------------+              | post-hooks |<----+
   |                                               +------------+
   +-- load model decides WHEN to fire ----+              v
                                           |        +------------+
              monitors run alongside ------+------->| metrics    |
              (Prometheus, NVML, ...)               | aggregator |
                                                    +------------+
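
To make the data flow in the diagram concrete, here is a minimal offline sketch of one item travelling through the stages. Every name in it (make_request, sign, fake_send, make_sample, run_pipeline) is a hypothetical stand-in chosen for illustration, not the benchmaker API, and the HTTP call is replaced by a fake coroutine so the sketch runs without a network. It also sends requests one after another for simplicity, whereas a real load model would typically fire them concurrently on its own schedule.

```python
import asyncio


def make_request(item: str) -> dict:
    """Workload-type step: turn a workload item into a request description."""
    return {"method": "GET", "url": item, "headers": {}}


def sign(request: dict) -> dict:
    """Pre-hook step: adjust the outgoing request (here, add a demo header)."""
    request["headers"]["X-Demo"] = "1"
    return request


async def fake_send(request: dict) -> dict:
    """Stand-in for the HTTP client call so the sketch runs offline."""
    await asyncio.sleep(0)
    return {"status": 200, "url": request["url"]}


def make_sample(response: dict, latency_s: float) -> dict:
    """Turn a response plus its timing into one measurement sample."""
    return {"ok": response["status"] == 200, "latency_s": latency_s}


async def run_pipeline(items, interval_s: float = 0.0) -> list[dict]:
    """Push each item through request -> pre-hook -> send -> sample."""
    loop = asyncio.get_running_loop()
    samples = []
    for item in items:
        request = sign(make_request(item))
        start = loop.time()
        response = await fake_send(request)
        samples.append(make_sample(response, loop.time() - start))
        # Simplified "load model": wait a fixed interval before the next send.
        await asyncio.sleep(interval_s)
    return samples
```

Calling asyncio.run(run_pipeline(["http://a", "http://b"])) returns one sample per item; in the real tool those samples would feed the metrics aggregator.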

Install

pip install -e .
pip install -e ".[dev]"   # for tests; quotes keep shells like zsh from expanding the brackets

This installs the benchmaker Python package and the bench-maker CLI.

30-second tour

import asyncio
from benchmaker import BenchConfig, BenchRunner, ConstantRPS, HttpWorkloadType

async def main():
    cfg = BenchConfig(
        workload_type=HttpWorkloadType(url="https://httpbin.org/get"),
        load=ConstantRPS(rps=50, duration_s=10),
    )
    result = await BenchRunner(cfg).run()
    print(result.summary)

asyncio.run(main())

Or via the CLI (this example uses a Poisson arrival rate rather than a constant one):

bench-maker quick --url https://httpbin.org/get --rate poisson:50 --duration 10s
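
The two examples above use different load models: ConstantRPS in Python and poisson:50 on the CLI. The sketch below illustrates the difference, assuming that poisson:N means Poisson arrivals with a mean rate of N requests per second (the usual meaning, but not confirmed by this README). The function names are hypothetical and this is not how benchmaker schedules requests internally.

```python
import random


def constant_arrivals(rps: float, duration_s: float) -> list[float]:
    """Evenly spaced send times: one request every 1/rps seconds."""
    n = int(rps * duration_s)
    return [i / rps for i in range(n)]


def poisson_arrivals(rps: float, duration_s: float, seed: int = 0) -> list[float]:
    """Send times whose gaps are exponentially distributed with mean 1/rps."""
    rng = random.Random(seed)
    times: list[float] = []
    t = 0.0
    while True:
        # expovariate(lambd) draws a gap with mean 1/lambd seconds.
        t += rng.expovariate(rps)
        if t >= duration_s:
            return times
        times.append(t)
```

Both schedules average the same request rate, but the Poisson one produces bursts and lulls, which tends to expose queueing behaviour that an evenly spaced schedule can hide.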

Walkthrough: benchmarking an LLM endpoint with ShareGPT

A realistic LLM benchmark needs a realistic prompt distribution. ShareGPT V3 is a common choice: multi-turn human/assistant conversations collected from real ChatGPT users. A cleaned, benchmark-ready copy is published on the Hugging Face Hub at researchcomputer/llmsys-bench (split="sharegpt"), with one row per conversation:

{"id": "...", "messages": [{"role": "user", "content": "..."},
                           {"role": "assistant", "content": "..."},
                           {"role": "user", "content": "..."}]}

messages is the only content field, and it is everything a chat benchmark needs. Each row is truncated to end on a user turn, so it is a valid generation request: the server completes the final assistant reply given the prior history. Short source conversations collapse to a single user turn (a plain single-turn prompt); longer ones carry multi-turn context.
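
The truncation rule described above can be sketched in a few lines. This is an illustration of the idea, not the code used to build the published split; the function name truncate_to_user_turn is hypothetical.

```python
def truncate_to_user_turn(messages: list[dict]) -> list[dict]:
    """Drop trailing non-user messages so the last message is a user turn."""
    end = len(messages)
    while end > 0 and messages[end - 1]["role"] != "user":
        end -= 1
    return messages[:end]
```

A two-message conversation (user, assistant) collapses to its single user turn, while a conversation that already ends on a user message is returned unchanged. A conversation with no user message at all becomes an empty list and would need to be dropped.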

Load it directly from the Hub

Pull the published split and feed each row's messages list straight into the chat workload-type. This needs the hf extra (pip install -e ".[hf]"):

import asyncio
from datasets import load_dataset
from benchmaker import (
    BenchConfig, BenchRunner, OpenAIChatWorkloadType,
    IterableWorkload, parse_rate_spec,
)

async def main():
    ds = load_dataset("researchcomputer/llmsys-bench", split="sharegpt")
    cfg = BenchConfig(
        workload_type=OpenAIChatWorkloadType(
            url="http://localhost:8000/v1/chat/completions",
            model="meta-llama/Llama-3.1-8B-Instruct",
            max_tokens=256,
        ),
        workload=IterableWorkload(row["messages"] for row in ds),
        load=parse_rate_spec("poisson:8", duration_s=60),
        timeout_s=600,
    )
    result = await BenchRunner(cfg).run()
    print(result.summary)

asyncio.run(main())

OpenAIChatWorkloadType receives the message list as-is, so single-turn rows send one user message and multi-turn rows replay the full history before the server generates the final assistant turn. TTFT, inter-token latency, and tokens/sec are captured the same way in both cases. The URL, model, and API key can also be read from .env via OpenAIChatWorkloadType.from_env(...).
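
For readers unfamiliar with the streaming metrics named above, the following sketch shows one common way to derive them from timestamps. It assumes you have the time the request was sent and the arrival time of each streamed token (or chunk); the names StreamTimings and summarize_stream are hypothetical, and benchmaker's own definitions may differ, for example in whether tokens/sec includes the time to first token.

```python
from dataclasses import dataclass


@dataclass
class StreamTimings:
    ttft_s: float        # time to first token
    mean_itl_s: float    # mean inter-token latency
    tokens_per_s: float  # tokens divided by total request time


def summarize_stream(sent_at: float, token_times: list[float]) -> StreamTimings:
    """Derive TTFT, mean inter-token latency, and throughput from timestamps."""
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - sent_at
    # Gaps between consecutive tokens; empty when only one token arrived.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0
    total = token_times[-1] - sent_at
    tokens_per_s = len(token_times) / total if total > 0 else float("inf")
    return StreamTimings(ttft, mean_itl, tokens_per_s)
```

For a request sent at t=0 whose six tokens arrive at 0.5, 0.6, ..., 1.0 seconds, this gives a TTFT of 0.5 s, a mean inter-token latency of about 0.1 s, and 6 tokens/sec over the whole request.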

Rebuild or customize it yourself

The published split is produced by tools/prepare_sharegpt.py, which downloads the upstream JSON once into .local/ (gitignored) and converts it to the JSONL shape above. Run it when you want a subset, different filtering, or a refresh:

# Defaults: .local/sharegpt_v3_raw.json  ->  .local/sharegpt_v3.jsonl
python tools/prepare_sharegpt.py

# A quick subset for smoke tests:
python tools/prepare_sharegpt.py --max-items 2000

The raw download is ~700 MB. Use --min-chars / --max-chars to drop empty or pathologically long conversations (measured over total message content per row). Point any workload at the local file with JsonlWorkload(path=..., field="messages"), or on the CLI:

bench-maker llm \
    --url   http://localhost:8000/v1/chat/completions \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --prompts-jsonl .local/sharegpt_v3.jsonl \
    --prompt-field  messages \
    --max-tokens 256 \
    --rate poisson:8 --duration 60s \
    --out-dir ./runs --label dataset=sharegpt
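
The --min-chars / --max-chars filter mentioned above is measured over the total message content per row. A minimal sketch of that measurement follows, assuming the JSONL shape shown earlier; the function names, the default thresholds, and the inclusive bounds are assumptions for illustration and may not match tools/prepare_sharegpt.py.

```python
import json
from typing import Iterable, Iterator, Optional


def total_chars(row: dict) -> int:
    """Sum the content length of every message in one conversation row."""
    return sum(len(m.get("content", "")) for m in row["messages"])


def filter_rows(
    lines: Iterable[str],
    min_chars: int = 1,               # hypothetical default
    max_chars: Optional[int] = None,  # None means no upper bound
) -> Iterator[dict]:
    """Yield parsed JSONL rows whose total content length is within bounds."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        n = total_chars(row)
        if n < min_chars:
            continue
        if max_chars is not None and n > max_chars:
            continue
        yield row
```

Because the function accepts any iterable of lines, it can be applied directly to an open file handle, for example filter_rows(open(path, encoding="utf-8"), max_chars=8000).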

To re-publish after regenerating, tools/upload_sharegpt_hf.py pushes the JSONL back to the Hub (needs a write token).

Documentation

Full docs live in docs/.

Examples

Under examples/:

  • simple_get.py — minimal library usage
  • custom_hooks.py — request signing + response parsing
  • llm_chat.py — OpenAI-compatible LLM endpoint with streaming
  • vllm_with_monitor.py — LLM benchmark with concurrent vLLM /metrics scrape
  • sandbox_exec.py — Flash Sandbox /exec latency benchmark
  • sandbox_lifecycle.py — full create → exec → delete cold-start benchmark
  • llm_eval.py — LLM benchmark + accuracy grading (exact/regex/judge)
  • gsm8k_eval.py — GSM8K from HuggingFace + integer-match scorer
  • config.yaml — generic HTTP YAML config
  • config_llm.yaml — LLM YAML config with a Prometheus monitor

Helper scripts under tools/:

  • prepare_sharegpt.py — fetch ShareGPT V3 and convert to a generic JSONL
  • upload_sharegpt_hf.py — push the converted JSONL to the HF Hub (write token)
  • start_local_llm.sh — example local SGLang launch command

Project layout

benchmaker/          # library code
entrypoints/         # CLI (bench-maker)
examples/            # runnable examples
tools/               # one-off helper scripts (dataset prep, etc.)
tests/               # pytest smoke tests
docs/                # reference docs

Run the tests

pytest -q

Download files

Source Distribution

benchmaker-0.1.0.tar.gz (74.3 kB)

Uploaded Source

Built Distribution

benchmaker-0.1.0-py3-none-any.whl (63.9 kB)

Uploaded Python 3

File details

Details for the file benchmaker-0.1.0.tar.gz.

File metadata

  • Download URL: benchmaker-0.1.0.tar.gz
  • Upload date:
  • Size: 74.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for benchmaker-0.1.0.tar.gz
Algorithm Hash digest
SHA256 84103f8b64ae7aa41d6618e3821d5743b5fa0efb8faaec99c1ca2046bd571b5d
MD5 b809f32a674fd4bb41f0490e8c4decc4
BLAKE2b-256 2b3524abeafa48acaab26b2b53f57c2fd84ca3c286a55c8a8a78ff1eff1c6b41

File details

Details for the file benchmaker-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: benchmaker-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 63.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for benchmaker-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 87a24cffc9ff3f3a8fb798c7bf6df38f3bf6cc7c2018cce019e5c770a9a9f131
MD5 5de29f0983bb578601e5adff140938bf
BLAKE2b-256 346cf01a26577199241befc71bc93a4b33eac97cedd96757efbf7ecb4e6b96a6
