
splleed

An LLM inference benchmarking harness with pluggable backends and a Python-first API.

Features

  • Python API: Write benchmarks as scripts, not config files
  • Pluggable backends: vLLM, TGI (more coming)
  • Comprehensive metrics: TTFT, ITL, TPOT, throughput, E2E latency
  • Statistical rigor: Multiple trials with confidence intervals
  • Flexible operation: Connect to existing servers or let splleed manage them

Installation

pip install splleed

For HuggingFace dataset support:

pip install splleed[hf]

Inference engines (vLLM, TGI) are not bundled; install them separately.
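For example, vLLM installs from PyPI, while TGI is typically deployed as a Docker container (the image tag, port mapping, and flags below follow the upstream docs and may change):

```shell
# vLLM backend (most models require a CUDA-capable GPU)
pip install vllm

# TGI is usually run via its official container image
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Llama-3.1-8B-Instruct
```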

Quick Start

import asyncio
from splleed import Benchmark, VLLMConfig, SamplingParams

async def main():
    results = await Benchmark(
        backend=VLLMConfig(model="Qwen/Qwen2.5-0.5B-Instruct"),
        prompts=[
            "What is the capital of France?",
            "Explain quantum computing briefly.",
        ],
        concurrency=[1, 2, 4],
        trials=3,
        sampling=SamplingParams(max_tokens=100),
    ).run()

    results.print()
    results.save("results.json")

if __name__ == "__main__":
    asyncio.run(main())

Connect vs Managed Mode

Managed mode - splleed starts and stops the server:

backend = VLLMConfig(model="Qwen/Qwen2.5-0.5B-Instruct")

Connect mode - use an existing server:

backend = VLLMConfig(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    endpoint="http://localhost:8000",
)

Using HuggingFace Datasets

import asyncio

from datasets import load_dataset
from splleed import Benchmark, VLLMConfig

async def main():
    ds = load_dataset("tatsu-lab/alpaca", split="train")
    ds = ds.shuffle(seed=42).select(range(100))
    prompts = list(ds["instruction"])

    results = await Benchmark(
        backend=VLLMConfig(model="Qwen/Qwen2.5-3B-Instruct"),
        prompts=prompts,
        concurrency=[1, 2, 4, 8],
        trials=3,
    ).run()

    results.print()

if __name__ == "__main__":
    asyncio.run(main())

Backend Configuration

vLLM

from splleed import VLLMConfig

backend = VLLMConfig(
    model="meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel=2,
    gpu_memory_utilization=0.9,
    quantization="awq",  # optional
    dtype="auto",
)

TGI

from splleed import TGIConfig

backend = TGIConfig(
    model="meta-llama/Llama-3.1-8B-Instruct",
    quantize="bitsandbytes-nf4",  # optional
)

Benchmark Modes

Latency Mode (default)

Sequential requests to measure per-request latency without interference:

Benchmark(..., mode="latency")

Throughput Mode

Concurrent requests to measure maximum throughput:

Benchmark(..., mode="throughput", concurrency=[1, 4, 8, 16])
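Conceptually, throughput mode issues requests concurrently up to a concurrency limit and divides total output tokens by wall-clock time. A minimal stdlib sketch with a stubbed backend (`fake_generate` is a stand-in, not splleed's actual runner):

```python
import asyncio
import time

async def fake_generate(prompt: str) -> int:
    """Stub backend: pretend to stream a response, return a token count."""
    await asyncio.sleep(0.01)  # stands in for model latency
    return 100  # tokens "generated"

async def measure_throughput(prompts, concurrency: int) -> float:
    """Run all prompts with at most `concurrency` in flight; return tokens/sec."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(prompt):
        async with sem:
            return await fake_generate(prompt)

    start = time.perf_counter()
    tokens = await asyncio.gather(*(bounded(p) for p in prompts))
    elapsed = time.perf_counter() - start
    return sum(tokens) / elapsed  # output tokens per second

tput = asyncio.run(measure_throughput(["hi"] * 8, concurrency=4))
```

Sweeping `concurrency` (as splleed does with `concurrency=[1, 4, 8, 16]`) shows where throughput saturates.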

Serve Mode

Simulate realistic traffic with controlled arrival patterns:

Benchmark(
    ...,
    mode="serve",
    arrival_rate=10.0,           # 10 requests/sec
    arrival_pattern="poisson",   # realistic traffic
    concurrency=[32],            # max concurrent requests
)

Arrival patterns:

  • poisson - exponential inter-arrival times (realistic web traffic)
  • gamma - configurable burstiness
  • constant - fixed interval between requests
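As a rough illustration of how the three patterns differ, inter-arrival gaps can be drawn from the corresponding stdlib distributions. This sketch is independent of splleed's internals (the `burstiness` knob here is illustrative, not a splleed parameter):

```python
import random

def inter_arrival_times(pattern: str, rate: float, n: int, burstiness: float = 2.0):
    """Generate n inter-arrival gaps (seconds) for a target average rate (req/s)."""
    mean_gap = 1.0 / rate
    if pattern == "constant":
        # Fixed interval between requests
        return [mean_gap] * n
    if pattern == "poisson":
        # Poisson process: exponential inter-arrival times with mean 1/rate
        return [random.expovariate(rate) for _ in range(n)]
    if pattern == "gamma":
        # Shape < 1 gives burstier traffic; the mean gap stays at 1/rate
        shape = 1.0 / burstiness
        return [random.gammavariate(shape, mean_gap / shape) for _ in range(n)]
    raise ValueError(f"unknown pattern: {pattern}")

random.seed(0)
gaps = inter_arrival_times("poisson", rate=10.0, n=10_000)
mean_gap = sum(gaps) / len(gaps)  # empirically close to 0.1 s at 10 req/s
```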

Benchmark Options

Benchmark(
    backend=...,
    prompts=["..."],

    # Benchmark settings
    mode="latency",          # "latency", "throughput", or "serve"
    concurrency=[1, 4, 8],   # concurrency levels to test
    warmup=2,                # warmup iterations
    runs=10,                 # requests per concurrency level
    trials=3,                # independent trials for CI
    confidence_level=0.95,   # confidence interval level

    # Serve mode only
    arrival_rate=10.0,       # requests per second
    arrival_pattern="poisson",  # "poisson", "gamma", "constant"

    # Sampling parameters
    sampling=SamplingParams(
        max_tokens=100,
        temperature=0.0,
        top_p=1.0,
    ),
)

Metrics

Metric       Description
TTFT         Time to first token
ITL          Inter-token latency
TPOT         Time per output token (mean ITL)
E2E          End-to-end request latency
Throughput   Tokens/sec
Goodput      % of requests meeting SLO

All latency metrics report mean, p50, p95, and p99. With multiple trials, results also include confidence intervals (95% by default, configurable via confidence_level).
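The aggregation above can be sketched with the stdlib: percentiles over per-request samples, and a confidence interval over per-trial means. The formulas are standard (nearest-rank percentiles, normal-approximation CI) and the numbers are illustrative, not necessarily splleed's exact implementation:

```python
import statistics

def percentile(values, p):
    """Nearest-rank percentile of a list of samples."""
    s = sorted(values)
    k = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
    return s[k]

# Per-request TTFT samples from one trial (illustrative numbers, seconds)
ttft = [0.08, 0.09, 0.10, 0.11, 0.12, 0.15, 0.18, 0.22, 0.30, 0.45]
summary = {
    "mean": statistics.mean(ttft),
    "p50": percentile(ttft, 50),
    "p95": percentile(ttft, 95),
    "p99": percentile(ttft, 99),
}

# 95% CI over per-trial means, using a normal approximation (z = 1.96)
trial_means = [0.14, 0.16, 0.15]
m = statistics.mean(trial_means)
half = 1.96 * statistics.stdev(trial_means) / len(trial_means) ** 0.5
ci = (m - half, m + half)
```

With only 3 trials, a t-based interval would be wider; the normal approximation keeps the sketch short.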

Output Formats

results.print()              # Rich table to console
results.save("out.json")     # JSON format
results.save("out.csv")      # CSV format

json_str = results.to_json()
csv_str = results.to_csv()

License

MIT
