
LLM inference benchmarking toolkit

Project description

Tokenomics

Benchmarking suite for OpenAI-compatible inference servers. Measures throughput, latency, and steady-state performance.

[Figure: Example benchmark]

Install

pip install tokenomics

From source

git clone https://github.com/tugot17/tokenomics.git
cd tokenomics
uv venv --python 3.12 --seed && source .venv/bin/activate
uv pip install -e .

Completion Benchmark

Sends chat completion requests to any OpenAI-compatible server and records per-request and system-wide metrics.

Usage

# Burst mode — fires all requests at once
tokenomics completion \
  --dataset-config examples/dataset_configs/aime_simple.json \
  --scenario "N(100,50)/(50,0)" \
  --model your-model \
  --batch-sizes 1,2,4,8

# Sustained mode — maintains constant concurrency via semaphore
tokenomics completion \
  --dataset-config examples/dataset_configs/aime_simple.json \
  --scenario "N(100,50)/(50,0)" \
  --model your-model \
  --max-concurrency 1,2,4,8 \
  --num-prompts 128

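The two modes are mutually exclusive: use either --batch-sizes (burst) or --max-concurrency with --num-prompts (sustained). Burst mode measures peak throughput. Sustained mode produces numbers closer to production traffic, where a steady number of requests is always in flight.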
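The difference between the two modes can be illustrated with asyncio. This is a minimal sketch, not tokenomics' own code: fake_request, burst, and sustained are hypothetical names, and each request is simulated with asyncio.sleep instead of a call to a server. Burst mode launches every request at once. Sustained mode wraps each request in an asyncio.Semaphore, so at most max_concurrency requests are in flight and a new one starts as soon as another finishes.

```python
import asyncio


async def fake_request(i: int, state: dict) -> int:
    """Stand-in for one chat completion call (hypothetical, no network I/O)."""
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.01)  # pretend the server takes 10 ms to answer
    state["in_flight"] -= 1
    return i


async def burst(batch_size: int) -> tuple[list[int], int]:
    """Burst mode: fire every request at once and wait for all of them."""
    state = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(fake_request(i, state) for i in range(batch_size)))
    return list(results), state["peak"]


async def sustained(num_prompts: int, max_concurrency: int) -> tuple[list[int], int]:
    """Sustained mode: a semaphore caps how many requests are in flight."""
    state = {"in_flight": 0, "peak": 0}
    sem = asyncio.Semaphore(max_concurrency)

    async def limited(i: int) -> int:
        async with sem:  # waits until one of the max_concurrency slots is free
            return await fake_request(i, state)

    results = await asyncio.gather(*(limited(i) for i in range(num_prompts)))
    return list(results), state["peak"]


if __name__ == "__main__":
    print(asyncio.run(burst(8))[1])  # peak in-flight requests: 8
    print(asyncio.run(sustained(128, 4))[1])  # peak in-flight requests: 4
```

With 128 prompts and a limit of 4, the semaphore keeps exactly four requests running until the queue drains, which is why sustained mode approximates steady production load better than a single burst.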

Traffic Scenarios

Pattern                  Example              Description
D(in,out)                D(100,50)            Fixed input and output token counts
N(mu,sigma)/(mu,sigma)   N(100,50)/(50,0)     Normal distribution (input / output)
U(min,max)/(min,max)     U(50,150)/(20,80)    Uniform distribution (input / output)
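To make the notation concrete, here is a minimal sketch of how a scenario string could be turned into one (input_tokens, output_tokens) draw. sample_lengths is a hypothetical helper, not part of the tokenomics API; the rounding, the inclusive bounds for U, and the clamp to at least one token are assumptions of this sketch.

```python
import random
import re

_PAIR = r"\((\d+),\s*(\d+)\)"  # "(a,b)" with non-negative integers


def sample_lengths(scenario: str, rng: random.Random) -> tuple[int, int]:
    """Draw one (input_tokens, output_tokens) pair for a scenario string.

    Hypothetical helper following the D/N/U notation of the scenario table.
    Rounding and the clamp to >= 1 token are assumptions of this sketch.
    """
    m = re.fullmatch(r"D" + _PAIR, scenario)
    if m:
        return int(m[1]), int(m[2])
    m = re.fullmatch(r"([NU])" + _PAIR + "/" + _PAIR, scenario)
    if m is None:
        raise ValueError(f"unrecognised scenario: {scenario!r}")
    a_in, b_in, a_out, b_out = (int(g) for g in m.groups()[1:])
    if m[1] == "N":  # a = mean, b = standard deviation
        n_in, n_out = rng.gauss(a_in, b_in), rng.gauss(a_out, b_out)
    else:  # "U": a = min, b = max, both inclusive (assumed)
        n_in, n_out = rng.randint(a_in, b_in), rng.randint(a_out, b_out)
    return max(1, round(n_in)), max(1, round(n_out))


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_lengths("D(100,50)", rng))  # (100, 50)
    print(sample_lengths("N(100,50)/(50,0)", rng))  # output length is always 50
```

A sigma of 0, as in N(100,50)/(50,0), makes that side deterministic, which is a convenient way to vary prompt length while holding generation length fixed.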

Datasets

The benchmark concatenates random text snippets from a dataset until it reaches the input token count specified by the scenario. Snippets are picked with replacement, so even a small dataset can produce long prompts. If the target is smaller than a single snippet, you get one full snippet (no truncation).
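The prompt-building rule can be sketched as follows. build_prompt is a hypothetical name, and count_tokens stands in for the model's tokenizer (the usage example uses a whitespace word count, which is only an approximation). The loop always takes at least one snippet and never truncates the last one, so a prompt can overshoot the target by up to one snippet.

```python
import random
from typing import Callable, Sequence


def build_prompt(
    snippets: Sequence[str],
    target_tokens: int,
    count_tokens: Callable[[str], int],
    rng: random.Random,
    sep: str = "\n\n",
) -> str:
    """Concatenate randomly chosen snippets until target_tokens is reached.

    Sampling is with replacement and the last snippet is never truncated.
    Separator tokens are not counted (a simplification of this sketch).
    """
    pool = [s for s in snippets if count_tokens(s) > 0]
    if not pool:
        raise ValueError("dataset has no non-empty snippets")
    parts: list[str] = []
    total = 0
    while True:
        snippet = rng.choice(pool)  # with replacement: the same snippet may repeat
        parts.append(snippet)
        total += count_tokens(snippet)
        if total >= target_tokens:  # a tiny target still yields one full snippet
            return sep.join(parts)


if __name__ == "__main__":
    words = lambda s: len(s.split())  # stand-in for a real tokenizer
    rng = random.Random(0)
    print(build_prompt(["a b c", "d e"], 1, words, rng))  # one full snippet
```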

Dataset config format

A dataset config is a JSON file with a source section:

Local file (TXT, CSV, or JSON):

{
  "source": { "type": "file", "path": "../data/prompts.txt" },
  "prompt_column": "text"
}

File paths are resolved relative to the config file.

HuggingFace dataset:

{
  "source": {
    "type": "huggingface",
    "path": "squad",
    "huggingface_kwargs": { "split": "train" }
  },
  "prompt_column": "question"
}

AIME (built-in shortcut):

{
  "source": { "type": "aime" }
}

See examples/dataset_configs/ for more examples.
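The rule that file paths are resolved relative to the config file can be sketched like this. resolve_source_path and load_dataset_config are hypothetical names; tokenomics may resolve paths differently (for example with Path.resolve(), which also follows symlinks). This version normalizes the joined path lexically with os.path.normpath and leaves non-file sources untouched.

```python
import json
import os
from pathlib import Path


def resolve_source_path(config: dict, config_dir: Path) -> dict:
    """Return a copy of config whose file source path is anchored at config_dir."""
    config = json.loads(json.dumps(config))  # cheap deep copy of plain JSON data
    source = config["source"]
    if source.get("type") == "file" and not os.path.isabs(source["path"]):
        # Lexical normalisation: collapses "..", does not touch the filesystem.
        source["path"] = os.path.normpath(os.path.join(config_dir, source["path"]))
    return config


def load_dataset_config(config_path) -> dict:
    """Read a dataset config JSON file and resolve its source path (hypothetical)."""
    config_path = Path(config_path)
    with config_path.open() as f:
        config = json.load(f)
    return resolve_source_path(config, config_path.parent)
```

For a config stored in examples/dataset_configs/, the path "../data/prompts.txt" from the local-file example becomes examples/data/prompts.txt, independent of the directory the benchmark is launched from.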

Key Options

Flag                 Description
--dataset-config     Path to JSON dataset config (see examples/dataset_configs/)
--scenario           Traffic pattern (see Traffic Scenarios)
--model              Model name
--api-base           Server URL (default: http://localhost:8000/v1)
--batch-sizes        Burst mode sweep points (comma-separated batch sizes)
--max-concurrency    Sustained mode sweep points (comma-separated concurrency limits)
--num-prompts        Prompts per sweep point in sustained mode
--num-runs           Runs per sweep point (default: 3)
--max-tokens         Max output tokens (default: 4096)
--results-dir        Output directory (one JSON per sweep value)
--lora-strategy      How requests are distributed across LoRA adapters: single, uniform, zipf, mixed, all-unique
--lora-names         Comma-separated LoRA adapter names
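The README names the --lora-strategy values but does not define them. The sketch below shows plausible meanings for three of them, purely as assumptions: single sends every request to the first adapter, uniform picks an adapter uniformly at random, and zipf picks the adapter of rank k with probability proportional to 1 / k**s. assign_adapters is a hypothetical helper; mixed and all-unique are not modelled.

```python
import random
from collections import Counter


def assign_adapters(names: list[str], n: int, strategy: str,
                    rng: random.Random, zipf_s: float = 1.0) -> list[str]:
    """Pick a LoRA adapter for each of n requests (hypothetical semantics)."""
    if not names:
        raise ValueError("need at least one adapter name")
    if strategy == "single":  # assumed: every request hits the same adapter
        return [names[0]] * n
    if strategy == "uniform":  # assumed: each adapter equally likely
        return [rng.choice(names) for _ in range(n)]
    if strategy == "zipf":  # assumed: P(rank k) proportional to 1 / k**zipf_s
        weights = [1.0 / rank**zipf_s for rank in range(1, len(names) + 1)]
        return rng.choices(names, weights=weights, k=n)
    raise ValueError(f"strategy {strategy!r} is not modelled in this sketch")


if __name__ == "__main__":
    names = "a,b,c,d".split(",")  # as if passed via --lora-names a,b,c,d
    print(Counter(assign_adapters(names, 1000, "zipf", random.Random(0))))
```

A skewed distribution such as zipf concentrates traffic on a few hot adapters, which is useful for seeing how adapter caching on the server affects throughput.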

Metrics

Per-request:

  • TTFT: time to first token (prefill latency)
  • Decode throughput: output tokens/s per request
  • TPOT: time per output token
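One common way to derive these per-request numbers is sketched below, assuming the client records when the request was sent, when the first token arrived, and when the stream ended. RequestTiming and per_request_metrics are hypothetical names, and tokenomics may define TPOT differently. Here TPOT excludes the first token (which is attributed to TTFT), so decode throughput is simply 1 / TPOT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestTiming:
    sent: float          # time the request was sent (s)
    first_token: float   # time the first output token arrived (s)
    done: float          # time the last output token arrived (s)
    output_tokens: int   # number of generated tokens


def per_request_metrics(r: RequestTiming) -> dict[str, Optional[float]]:
    ttft = r.first_token - r.sent  # prefill latency, plus network and queueing time
    decode_time = r.done - r.first_token
    decode_tokens = r.output_tokens - 1  # the first token is counted in TTFT
    if decode_tokens <= 0 or decode_time <= 0:
        return {"ttft": ttft, "tpot": None, "decode_tok_s": None}
    return {
        "ttft": ttft,
        "tpot": decode_time / decode_tokens,          # seconds per output token
        "decode_tok_s": decode_tokens / decode_time,  # output tokens/s for this request
    }


if __name__ == "__main__":
    m = per_request_metrics(RequestTiming(sent=0.0, first_token=0.25, done=2.25, output_tokens=101))
    print(m["ttft"], m["decode_tok_s"])  # 0.25 50.0
```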

System-wide:

  • End-to-end output throughput: total_output_tokens / wall_time; includes ramp-up and drain
  • Steady-state output throughput: median tok/s across time buckets in which the batch is at least 80% full; this excludes ramp-up and drain and isolates decode performance at full load
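The two system-wide numbers can be sketched from per-token arrival timestamps. The 1 s bucket width, counting in-flight requests at each bucket's midpoint, and the names RequestTrace, end_to_end_throughput, and steady_state_throughput are assumptions of this sketch, not tokenomics internals. capacity is assumed to be the batch size in burst mode or the concurrency limit in sustained mode.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Optional


@dataclass
class RequestTrace:
    start: float  # request sent (s)
    end: float    # last token received (s); token_times must not exceed it
    token_times: list[float] = field(default_factory=list)  # arrival time of each output token


def end_to_end_throughput(traces: list[RequestTrace]) -> float:
    """total_output_tokens / wall_time, including ramp-up and drain."""
    total = sum(len(t.token_times) for t in traces)
    wall = max(t.end for t in traces) - min(t.start for t in traces)
    return total / wall


def steady_state_throughput(traces: list[RequestTrace], capacity: int,
                            bucket_s: float = 1.0, fill: float = 0.8) -> Optional[float]:
    """Median tok/s over buckets where at least fill * capacity requests are in flight."""
    t0 = min(t.start for t in traces)
    n_buckets = int((max(t.end for t in traces) - t0) // bucket_s) + 1
    tokens = [0] * n_buckets
    for tr in traces:
        for ts in tr.token_times:
            tokens[int((ts - t0) // bucket_s)] += 1
    rates = []
    for b in range(n_buckets):
        mid = t0 + (b + 0.5) * bucket_s  # sample occupancy at the bucket midpoint
        in_flight = sum(1 for tr in traces if tr.start <= mid < tr.end)
        if in_flight >= fill * capacity:
            rates.append(tokens[b] / bucket_s)
    return median(rates) if rates else None  # None: the batch never got full enough


if __name__ == "__main__":
    a = RequestTrace(0.0, 4.0, [0.5, 1.5, 2.5, 3.5])
    b = RequestTrace(0.0, 2.0, [0.5, 1.5])
    print(end_to_end_throughput([a, b]))       # 1.5
    print(steady_state_throughput([a, b], 2))  # 2.0
```

In this toy run the end-to-end figure is lower because only one request is still running during the last two seconds; the steady-state figure skips those drain buckets.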

Plotting

# Single benchmark
tokenomics plot-completion results_dir/ plot.png

# Compare multiple benchmarks
tokenomics plot-completion output.png results_dir1/ results_dir2/

Produces a 6-panel dashboard:

        Left                              Right
Row 1   TTFT                              Decode throughput per request
Row 2   End-to-end output throughput      Latency breakdown (prefill vs decode)
Row 3   Steady-state output throughput    Time-series token buckets

Embedding Benchmark

Measures embedding throughput under concurrent requests.

tokenomics embedding \
  --model Qwen/Qwen3-Embedding-4B \
  --sequence_lengths "200" \
  --batch_sizes "1,8,16,32,64,128,256,512" \
  --num_runs 3 \
  --results-dir embedding_results/

tokenomics plot-embedding embedding_results/ embedding_plot.png

[Figure: Embedding performance]



Download files


Source Distribution

tokenomics-0.5.2.tar.gz (3.0 MB)

Uploaded Source

Built Distribution


tokenomics-0.5.2-py3-none-any.whl (37.3 kB)

Uploaded Python 3

File details

Details for the file tokenomics-0.5.2.tar.gz.

File metadata

  • Download URL: tokenomics-0.5.2.tar.gz
  • Upload date:
  • Size: 3.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tokenomics-0.5.2.tar.gz
Algorithm Hash digest
SHA256 b9b9c61a0f63b4f28b92b80c744c4a922009c25d9da9bccc7243f38bde473006
MD5 465b7b6016b074fa289a67e48c23b239
BLAKE2b-256 43e01442a802fd22074dec5b47e67308c188a7b6d57c0c9576c6caaf0b4dcf3d


Provenance

The following attestation bundles were made for tokenomics-0.5.2.tar.gz:

Publisher: publish.yml on tugot17/tokenomics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tokenomics-0.5.2-py3-none-any.whl.

File metadata

  • Download URL: tokenomics-0.5.2-py3-none-any.whl
  • Upload date:
  • Size: 37.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tokenomics-0.5.2-py3-none-any.whl
Algorithm Hash digest
SHA256 6ae7a6afc6a6357a45f039b542a777af8257673f0dc7c8f29baa37334bf64851
MD5 6cbb058d9ad71c8185e4c8648aa8c896
BLAKE2b-256 c31926b1046c2183ec0aca31a17d529a685a896560505256bf17f400e6e4ce63


Provenance

The following attestation bundles were made for tokenomics-0.5.2-py3-none-any.whl:

Publisher: publish.yml on tugot17/tokenomics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
