LLM inference benchmarking toolkit

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Tokenomics

Benchmarking suite for OpenAI-compatible inference servers. Measures throughput, latency, and steady-state performance.

[Figure: example benchmark]

Install

uv venv --python 3.12 --seed && source .venv/bin/activate
uv pip install -e .

Completion Benchmark

Sends chat completion requests to any OpenAI-compatible server and records per-request and system-wide metrics.

Usage

# Burst mode — fires all requests at once
tokenomics completion \
  --dataset-config examples/dataset_configs/aime_simple.json \
  --scenario "N(100,50)/(50,0)" \
  --model your-model \
  --batch-sizes 1,2,4,8

# Sustained mode — maintains constant concurrency via semaphore
tokenomics completion \
  --dataset-config examples/dataset_configs/aime_simple.json \
  --scenario "N(100,50)/(50,0)" \
  --model your-model \
  --max-concurrency 1,2,4,8 \
  --num-prompts 128

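The two modes are mutually exclusive: pass either `--batch-sizes` (burst) or `--max-concurrency` (sustained), not both. Burst shows peak throughput for a batch that arrives all at once; sustained holds the number of in-flight requests constant, which is closer to production load.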
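The difference between the two modes can be illustrated with a small asyncio sketch. This is not the tokenomics implementation: `fake_request`, `run_burst`, and `run_sustained` are hypothetical names, and the request is simulated with a sleep instead of an HTTP call. It only shows how a semaphore caps the number of in-flight requests while a burst does not.

```python
import asyncio


async def fake_request(tracker: dict) -> None:
    """Stand-in for one chat completion call (assumption: no real server)."""
    tracker["in_flight"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["in_flight"])
    await asyncio.sleep(0.01)  # simulated server latency
    tracker["in_flight"] -= 1


async def run_burst(batch_size: int) -> int:
    """Fire batch_size requests at once; return the peak concurrency seen."""
    tracker = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(fake_request(tracker) for _ in range(batch_size)))
    return tracker["peak"]


async def run_sustained(num_prompts: int, max_concurrency: int) -> int:
    """Send num_prompts requests with at most max_concurrency in flight."""
    tracker = {"in_flight": 0, "peak": 0}
    sem = asyncio.Semaphore(max_concurrency)

    async def limited() -> None:
        async with sem:  # blocks while max_concurrency requests are running
            await fake_request(tracker)

    await asyncio.gather(*(limited() for _ in range(num_prompts)))
    return tracker["peak"]
```

In this sketch `asyncio.run(run_burst(8))` peaks at 8 concurrent requests, while `asyncio.run(run_sustained(128, 4))` never exceeds 4 even though it sends 128 requests in total.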

Traffic Scenarios

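Pattern	Example	Description
`D(in,out)`	`D(100,50)`	Fixed input and output token counts
`N(mu,sigma)/(mu,sigma)`	`N(100,50)/(50,0)`	Normally distributed token counts: input (mean, std dev) / output (mean, std dev)
`U(min,max)/(min,max)`	`U(50,150)/(20,80)`	Uniformly distributed token counts: input (min, max) / output (min, max)
`I(w,h)`	`I(512,512)`	Image input of the given width and height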
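To make the scenario notation concrete, here is a minimal sketch of how the text patterns above could be parsed and sampled. The functions `parse_scenario` and `sample_lengths` are hypothetical and not part of the tokenomics API. The sketch assumes the first pair describes input tokens and the second output tokens, clamps normal samples to at least 1 token, and leaves out the image pattern `I(w,h)`.

```python
import random
import re

_PAIR = r"\((\d+),(\d+)\)"  # matches "(a,b)" with integer a and b


def parse_scenario(spec: str) -> dict:
    """Parse 'D(in,out)', 'N(mu,sigma)/(mu,sigma)' or 'U(min,max)/(min,max)'."""
    m = re.fullmatch(r"D" + _PAIR, spec)
    if m:
        return {"kind": "D", "input": int(m[1]), "output": int(m[2])}
    m = re.fullmatch(r"([NU])" + _PAIR + "/" + _PAIR, spec)
    if m:
        a, b, c, d = (int(m[i]) for i in range(2, 6))
        return {"kind": m[1], "input": (a, b), "output": (c, d)}
    raise ValueError(f"unsupported scenario: {spec!r}")


def sample_lengths(scenario: dict, rng: random.Random) -> tuple[int, int]:
    """Draw one (input_tokens, output_tokens) pair for a parsed scenario."""
    kind = scenario["kind"]
    if kind == "D":
        return scenario["input"], scenario["output"]

    def draw(params: tuple[int, int]) -> int:
        if kind == "N":
            mu, sigma = params
            # Assumption: clamp so a sample is never zero or negative.
            return max(1, round(rng.gauss(mu, sigma)))
        lo, hi = params
        return rng.randint(lo, hi)  # inclusive on both ends

    return draw(scenario["input"]), draw(scenario["output"])
```

Under these assumptions, `N(100,50)/(50,0)` yields input lengths scattered around 100 tokens and an output length of exactly 50, because the output standard deviation is 0.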

Key Options

Flag	Description
`--dataset-config`	Path to JSON dataset config (see `examples/dataset_configs/`)
`--scenario`	Traffic pattern
`--model`	Model name
`--api-base`	Server URL (default: `http://localhost:8000/v1`)
`--batch-sizes`	Comma-separated batch sizes to sweep in burst mode
`--max-concurrency`	Comma-separated concurrency levels to sweep in sustained mode
`--num-prompts`	Number of prompts sent per sweep point in sustained mode
`--num-runs`	Runs per sweep point (default: 3)
`--max-tokens`	Max output tokens (default: 4096)
`--results-dir`	Output directory (one JSON per sweep value)
`--lora-strategy`	LoRA distribution: single, uniform, zipf, mixed, all-unique
`--lora-names`	Comma-separated LoRA adapter names

Metrics

Per-request:

TTFT — time to first token (prefill latency)
Decode throughput — output tokens/s per request
TPOT — time per output token
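As a rough illustration of the per-request metrics, the sketch below derives them from three timestamps and an output token count. The `RequestTiming` dataclass and `per_request_metrics` function are hypothetical. The sketch assumes decode time is measured from the first token to the last and that the first token is attributed to prefill. The exact definitions tokenomics uses may differ.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    sent_at: float         # seconds, when the request was sent
    first_token_at: float  # seconds, when the first token arrived
    finished_at: float     # seconds, when the last token arrived
    output_tokens: int     # total tokens generated


def per_request_metrics(t: RequestTiming) -> dict:
    """Compute TTFT, TPOT and decode throughput for one request."""
    ttft = t.first_token_at - t.sent_at
    decode_time = t.finished_at - t.first_token_at
    decoded = t.output_tokens - 1  # tokens produced after the first one
    if decoded <= 0 or decode_time <= 0:
        # Too few tokens to measure a decode phase.
        return {"ttft_s": ttft, "tpot_s": None, "decode_tok_per_s": None}
    return {
        "ttft_s": ttft,
        "tpot_s": decode_time / decoded,
        "decode_tok_per_s": decoded / decode_time,
    }
```

For a request sent at 0.0 s whose first token arrives at 0.5 s and whose 101st and last token arrives at 2.5 s, this gives a TTFT of 0.5 s, a TPOT of 0.02 s, and a decode throughput of 50 tokens/s.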

System-wide:

End-to-end output throughput — total_output_tokens / wall_time, includes ramp-up and drain
Steady-state output throughput — median tok/s across time buckets where the batch is >= 80% full, isolating true decode performance
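The two system-wide numbers can be sketched as follows. The function names and the bucket representation (tokens emitted and active request count per fixed-width time bucket) are assumptions made for illustration. Only the formulas, total tokens over wall time and the median over buckets that are at least 80% full, come from the definitions above.

```python
from statistics import median


def end_to_end_throughput(total_output_tokens: int, wall_time_s: float) -> float:
    """Output tokens per second over the whole run, ramp-up and drain included."""
    return total_output_tokens / wall_time_s


def steady_state_throughput(buckets, bucket_s, target_concurrency, min_fill=0.8):
    """Median tok/s over buckets whose active request count is >= min_fill of target.

    buckets: list of (tokens_emitted, active_requests), one entry per time bucket.
    Returns None when no bucket reaches the fill threshold.
    """
    threshold = min_fill * target_concurrency
    rates = [tokens / bucket_s for tokens, active in buckets if active >= threshold]
    return median(rates) if rates else None


# Example: five 1-second buckets at a target concurrency of 8.
buckets = [(40, 2), (100, 8), (110, 8), (105, 7), (30, 1)]
total_tokens = sum(tokens for tokens, _ in buckets)
e2e = end_to_end_throughput(total_tokens, wall_time_s=5.0)
steady = steady_state_throughput(buckets, bucket_s=1.0, target_concurrency=8)
```

In this example the ramp-up and drain buckets drag the end-to-end figure down to 77 tok/s, while the steady-state figure is 105 tok/s because only the three buckets with at least 6.4 active requests count.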

Plotting

# Single benchmark
tokenomics plot-completion results_dir/ plot.png

# Compare multiple benchmarks
tokenomics plot-completion output.png results_dir1/ results_dir2/

Produces a 6-panel dashboard:

	Left	Right
Row 1	TTFT	Decode throughput per request
Row 2	End-to-end output throughput	Latency breakdown (prefill vs decode)
Row 3	Steady-state output throughput	Time-series token buckets

Embedding Benchmark

Tests concurrent embedding throughput.

tokenomics embedding \
  --model Qwen/Qwen3-Embedding-4B \
  --sequence_lengths "200" \
  --batch_sizes "1,8,16,32,64,128,256,512" \
  --num_runs 3 \
  --results-dir embedding_results/

tokenomics plot-embedding embedding_results/ embedding_plot.png

[Figure: embedding performance]

Release history

Version	Released
0.6.1	Mar 30, 2026
0.6.0	Mar 27, 2026
0.5.4	Mar 26, 2026
0.5.3	Mar 25, 2026
0.5.2	Mar 24, 2026
0.5.1 (this version)	Mar 24, 2026

Download files

Source Distribution

tokenomics-0.5.1.tar.gz (3.0 MB)

Uploaded Mar 24, 2026 Source

Built Distribution

tokenomics-0.5.1-py3-none-any.whl (35.6 kB)

Uploaded Mar 24, 2026 Python 3

File details

Details for the file tokenomics-0.5.1.tar.gz.

File metadata

Download URL: tokenomics-0.5.1.tar.gz
Upload date: Mar 24, 2026
Size: 3.0 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tokenomics-0.5.1.tar.gz
Algorithm	Hash digest
SHA256	`1bc9083be017105d16ad26804e6d1c9ed01f6185fffff1cf6bade950b2b1ac9c`
MD5	`7f87e8c2458fb6759a13a9b7a415cafc`
BLAKE2b-256	`a03778f3a0de7253ab2a1ebc31d2b4718be081d4aa9b05ce9b69c08e3e00e083`
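A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The following is a minimal sketch: the helper names `sha256_of` and `verify` and the local file path are illustrative, and only the expected digest is taken from the table.

```python
import hashlib

# SHA256 digest published for tokenomics-0.5.1.tar.gz (from the table above).
EXPECTED_SHA256 = "1bc9083be017105d16ad26804e6d1c9ed01f6185fffff1cf6bade950b2b1ac9c"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file at path matches the expected digest."""
    return sha256_of(path) == expected.lower()
```

Calling `verify("tokenomics-0.5.1.tar.gz")` on the downloaded source distribution should return `True` if the file is intact. The path assumes the file sits in the current directory.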

Provenance

The following attestation bundles were made for tokenomics-0.5.1.tar.gz:

Publisher: publish.yml on tugot17/tokenomics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tokenomics-0.5.1.tar.gz
- Subject digest: 1bc9083be017105d16ad26804e6d1c9ed01f6185fffff1cf6bade950b2b1ac9c
- Sigstore transparency entry: 1175116402
- Sigstore integration time: Mar 24, 2026
Source repository:
- Permalink: tugot17/tokenomics@ec69723936aaf3a6d89f08fd3b696e1688a7ff74
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/tugot17
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ec69723936aaf3a6d89f08fd3b696e1688a7ff74
- Trigger Event: release

File details

Details for the file tokenomics-0.5.1-py3-none-any.whl.

File metadata

Download URL: tokenomics-0.5.1-py3-none-any.whl
Upload date: Mar 24, 2026
Size: 35.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tokenomics-0.5.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`869b927b4bf0f50e8a16c61394e917e1d211561eab230eaa7feee66b85e517a2`
MD5	`31aa17888d891c0add6e5073d6c15ea9`
BLAKE2b-256	`970b5f6d8ddb3864e9528884fd4762ec5bd9ddb838547079e6b62900a8d3bd4a`

Provenance

The following attestation bundles were made for tokenomics-0.5.1-py3-none-any.whl:

Publisher: publish.yml on tugot17/tokenomics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tokenomics-0.5.1-py3-none-any.whl
- Subject digest: 869b927b4bf0f50e8a16c61394e917e1d211561eab230eaa7feee66b85e517a2
- Sigstore transparency entry: 1175116524
- Sigstore integration time: Mar 24, 2026
Source repository:
- Permalink: tugot17/tokenomics@ec69723936aaf3a6d89f08fd3b696e1688a7ff74
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/tugot17
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ec69723936aaf3a6d89f08fd3b696e1688a7ff74
- Trigger Event: release

tokenomics 0.5.1
