splleed
LLM inference benchmarking harness with pluggable backends.
Features
- Pluggable backends: vLLM, TGI (more coming)
- Comprehensive metrics: TTFT, ITL, TPOT, throughput, E2E latency
- Multiple modes: throughput, latency, serve simulation
- Flexible operation: Connect to existing servers or let splleed manage them
Installation
# Clone the repo
git clone https://github.com/Bradley-Butcher/Splleed.git
cd Splleed
# With uv (recommended)
uv sync
uv run splleed --help
# Or with pip
pip install -e .
splleed --help
Inference engines (vLLM, TGI) are not bundled; install them separately as needed.
Quick Start
# Run a benchmark
splleed run examples/vllm.yaml
# Other commands
splleed validate config.yaml # Check config syntax
splleed backends # List available backends
splleed init -o config.yaml # Generate example config
Configuration
Connect Mode
Connect to an already-running server:
backend:
  type: vllm
  endpoint: http://localhost:8000
Managed Mode
Let splleed start and stop the server:
backend:
  type: vllm
  model: Qwen/Qwen2.5-0.5B-Instruct
  port: 8000
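The two modes differ only in which keys the backend section carries. A minimal Python sketch of that distinction follows. It assumes, without confirmation from the project, that the presence of `endpoint` selects connect mode and the presence of `model` selects managed mode; `backend_mode` is a hypothetical helper, not part of splleed.

```python
def backend_mode(backend: dict) -> str:
    """Classify a parsed `backend:` section as 'connect' or 'managed'.

    Assumption: `endpoint` means connect mode, `model` means managed mode.
    """
    if "endpoint" in backend:
        return "connect"
    if "model" in backend:
        return "managed"
    raise ValueError("backend needs either 'endpoint' or 'model'")


# The two backend sections shown above, as already-parsed dictionaries.
connect = {"type": "vllm", "endpoint": "http://localhost:8000"}
managed = {"type": "vllm", "model": "Qwen/Qwen2.5-0.5B-Instruct", "port": 8000}

print(backend_mode(connect))
print(backend_mode(managed))
```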
Full Example
backend:
  type: vllm
  model: meta-llama/Llama-3.1-8B-Instruct
  port: 8000
  gpu_memory_utilization: 0.9
dataset:
  type: inline
  prompts:
    - "What is the capital of France?"
    - "Explain quantum computing."
benchmark:
  mode: latency  # throughput, latency, or serve
  concurrency: [1, 4, 8]
  warmup: 2
  runs: 10
sampling:
  max_tokens: 100
  temperature: 0.0
output:
  format: json
See examples/ for more configurations.
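To make the `concurrency`, `warmup`, and `runs` settings concrete, here is a rough Python sketch of one way a sweep over concurrency levels could work. It is an illustration, not splleed's implementation: it assumes each level performs `warmup` discarded passes followed by `runs` measured passes over the prompts, and `fake_request`, `run_level`, and `sweep` are hypothetical names. The request itself is simulated with a short sleep.

```python
import asyncio
import time


async def fake_request(prompt: str) -> float:
    """Stand-in for one inference call; returns its latency in seconds."""
    start = time.perf_counter()
    await asyncio.sleep(0.001)  # placeholder for the real HTTP round trip
    return time.perf_counter() - start


async def run_level(prompts, concurrency, warmup, runs):
    """Run `warmup` unmeasured passes, then `runs` measured passes."""
    limit = asyncio.Semaphore(concurrency)

    async def bounded(prompt):
        async with limit:  # at most `concurrency` requests in flight
            return await fake_request(prompt)

    latencies = []
    for i in range(warmup + runs):
        results = await asyncio.gather(*(bounded(p) for p in prompts))
        if i >= warmup:  # keep only the measured passes
            latencies.extend(results)
    return latencies


async def sweep(prompts, levels, warmup, runs):
    """Collect latency samples for every concurrency level in turn."""
    out = {}
    for level in levels:
        out[level] = await run_level(prompts, level, warmup, runs)
    return out


prompts = ["What is the capital of France?", "Explain quantum computing."]
results = asyncio.run(sweep(prompts, [1, 4, 8], warmup=2, runs=10))
for level, latencies in results.items():
    print(level, len(latencies))
```

Running the levels one after another, rather than together, keeps the samples for each concurrency setting from interfering with one another.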
Metrics
| Metric | Description |
|---|---|
| TTFT | Time to first token |
| ITL | Inter-token latency |
| TPOT | Time per output token |
| E2E | End-to-end latency |
| Throughput | Tokens/sec |
All latency metrics include p50, p95, p99, and mean.
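As an illustration of how these metrics relate, the Python sketch below derives them from the arrival timestamps of one streamed response. The formulas follow common conventions (for example, TPOT as decode time divided by the number of tokens after the first) and are an assumption about, not a copy of, splleed's definitions; `request_metrics` and `summarize` are hypothetical helpers.

```python
import statistics


def request_metrics(sent_at: float, token_times: list[float]) -> dict:
    """Derive per-request metrics from token arrival timestamps (seconds)."""
    ttft = token_times[0] - sent_at   # time to first token
    e2e = token_times[-1] - sent_at   # end-to-end latency
    # Inter-token latency: gap between each pair of consecutive tokens.
    itl = [b - a for a, b in zip(token_times, token_times[1:])]
    # TPOT: decode time averaged over the tokens after the first.
    n_decode = len(token_times) - 1
    tpot = (e2e - ttft) / n_decode if n_decode > 0 else 0.0
    return {
        "ttft": ttft,
        "e2e": e2e,
        "itl": itl,
        "tpot": tpot,
        "throughput": len(token_times) / e2e,  # tokens/sec for this request
    }


def summarize(samples: list[float]) -> dict:
    """Report p50, p95, p99, and mean for a list of latency samples."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "mean": statistics.fmean(samples),
    }


# Request sent at t=0.0; four tokens arrive at the times below.
m = request_metrics(0.0, [0.2, 0.25, 0.3, 0.35])
print(m["ttft"], m["e2e"], round(m["tpot"], 3))
print(summarize([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Percentile conventions differ between tools, so the `inclusive` interpolation method here is only one reasonable choice.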
Backend Setup
For managed mode, splleed finds the engine executable via:
- Config: executable: /path/to/vllm
- Env var: SPLLEED_VLLM_PATH or SPLLEED_TGI_PATH
- System PATH
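The lookup above can be sketched in Python as a simple fallback chain. This is an illustration under two assumptions: that the sources are tried in the order listed, and that the binary on PATH is named after the backend type (which may not hold for every engine). `find_executable` is a hypothetical function, not splleed's API.

```python
import os
import shutil


def find_executable(backend_type, config_path=None):
    """Resolve an engine executable: config value, then env var, then PATH.

    Assumption: the three sources are tried in the order listed above.
    Returns None if nothing is found.
    """
    if config_path:
        return config_path
    env_value = os.environ.get(f"SPLLEED_{backend_type.upper()}_PATH")
    if env_value:
        return env_value
    # Assumption: the binary on PATH is named after the backend type.
    return shutil.which(backend_type)


# An explicit config value wins over everything else.
print(find_executable("vllm", config_path="/path/to/vllm"))
```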
Adding Backends
splleed new-backend my_engine
See src/splleed/backends/_template/ for the template.
License
MIT