splleed

LLM inference benchmarking harness with pluggable backends.

Features

  • Pluggable backends: vLLM, TGI (more coming)
  • Comprehensive metrics: TTFT, ITL, TPOT, throughput, E2E latency
  • Multiple modes: throughput, latency, serve simulation
  • Flexible operation: Connect to existing servers or let splleed manage them

Installation

# Clone the repo
git clone https://github.com/Bradley-Butcher/Splleed.git
cd Splleed

# With uv (recommended)
uv sync
uv run splleed --help

# Or with pip
pip install -e .
splleed --help

Inference engines (vLLM, TGI) are not bundled; install them separately as needed.

Quick Start

# Run a benchmark
splleed run examples/vllm.yaml

# Other commands
splleed validate config.yaml   # Check config syntax
splleed backends               # List available backends
splleed init -o config.yaml    # Generate example config

Configuration

Connect Mode

Connect to an already-running server:

backend:
  type: vllm
  endpoint: http://localhost:8000

Managed Mode

Let splleed start and stop the server:

backend:
  type: vllm
  model: Qwen/Qwen2.5-0.5B-Instruct
  port: 8000
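
The two backend snippets above differ only in their keys: connect mode carries an endpoint, managed mode carries a model (and a port). As a rough illustration, a tool could tell them apart like this. The helper name backend_mode and the rule "endpoint present means connect mode" are assumptions made for this sketch, not splleed's documented behavior.

```python
from typing import Any, Mapping


def backend_mode(backend: Mapping[str, Any]) -> str:
    """Classify a backend section as 'connect' or 'managed'.

    Hypothetical helper: the key-based rule below is an assumption,
    not taken from splleed's source.
    """
    if "endpoint" in backend:
        # An endpoint means a server is already running somewhere.
        return "connect"
    if "model" in backend:
        # A model without an endpoint means the server must be launched.
        return "managed"
    raise ValueError("backend section needs either 'endpoint' or 'model'")


connect_cfg = {"type": "vllm", "endpoint": "http://localhost:8000"}
managed_cfg = {"type": "vllm", "model": "Qwen/Qwen2.5-0.5B-Instruct", "port": 8000}

print(backend_mode(connect_cfg))
print(backend_mode(managed_cfg))
```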

Full Example

backend:
  type: vllm
  model: meta-llama/Llama-3.1-8B-Instruct
  port: 8000
  gpu_memory_utilization: 0.9

dataset:
  type: inline
  prompts:
    - "What is the capital of France?"
    - "Explain quantum computing."

benchmark:
  mode: latency        # throughput, latency, or serve
  concurrency: [1, 4, 8]
  warmup: 2
  runs: 10

sampling:
  max_tokens: 100
  temperature: 0.0

output:
  format: json

See examples/ for more configurations.
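
To make the benchmark section more concrete, here is a minimal sketch of what a concurrency sweep with warmup and measured runs might look like. It is not splleed's implementation. The names fake_request, run_level, and sweep are hypothetical, and the sketch assumes that one "run" is a single pass over all prompts and that warmup passes are discarded.

```python
import asyncio
import time
from typing import Dict, List, Sequence


async def fake_request(prompt: str) -> float:
    """Stand-in for one inference call; returns elapsed seconds."""
    start = time.perf_counter()
    await asyncio.sleep(0.001)  # pretend the server is generating tokens
    return time.perf_counter() - start


async def run_level(
    prompts: Sequence[str], concurrency: int, warmup: int, runs: int
) -> List[float]:
    """Measure request latencies at one concurrency level."""
    limit = asyncio.Semaphore(concurrency)  # caps in-flight requests

    async def bounded(prompt: str) -> float:
        async with limit:
            return await fake_request(prompt)

    async def one_pass() -> List[float]:
        return list(await asyncio.gather(*(bounded(p) for p in prompts)))

    for _ in range(warmup):
        await one_pass()  # warmup results are thrown away

    latencies: List[float] = []
    for _ in range(runs):
        latencies.extend(await one_pass())
    return latencies


async def sweep(
    prompts: Sequence[str], levels: Sequence[int], warmup: int, runs: int
) -> Dict[int, List[float]]:
    """Run every concurrency level in turn and collect its latencies."""
    results: Dict[int, List[float]] = {}
    for level in levels:
        results[level] = await run_level(prompts, level, warmup, runs)
    return results


prompts = ["What is the capital of France?", "Explain quantum computing."]
results = asyncio.run(sweep(prompts, levels=[1, 4, 8], warmup=2, runs=10))
for level, latencies in results.items():
    print(level, len(latencies))
```

With two prompts and runs set to 10, each concurrency level collects 20 measured latencies under these assumptions.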

Metrics

Metric      Description
TTFT        Time to first token
ITL         Inter-token latency
TPOT        Time per output token
E2E         End-to-end latency
Throughput  Tokens/sec
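
For readers new to these terms, the sketch below derives each metric from the timestamps of a single streamed request. The definitions are the commonly used ones (for example, TPOT as decode time divided by the number of tokens after the first); splleed's exact formulas are not stated here, so treat this as an assumption. RequestTiming and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTiming:
    sent_at: float            # when the request was sent (seconds)
    token_times: List[float]  # arrival time of each output token (seconds)


def ttft(t: RequestTiming) -> float:
    """Time to first token: first arrival minus send time."""
    return t.token_times[0] - t.sent_at


def e2e(t: RequestTiming) -> float:
    """End-to-end latency: last arrival minus send time."""
    return t.token_times[-1] - t.sent_at


def itl(t: RequestTiming) -> List[float]:
    """Inter-token latencies: gaps between consecutive tokens."""
    return [b - a for a, b in zip(t.token_times, t.token_times[1:])]


def tpot(t: RequestTiming) -> float:
    """Time per output token, excluding the first token (assumed convention)."""
    n = len(t.token_times)
    return (e2e(t) - ttft(t)) / (n - 1) if n > 1 else 0.0


def throughput(t: RequestTiming) -> float:
    """Output tokens per second for this single request."""
    return len(t.token_times) / e2e(t)


timing = RequestTiming(sent_at=0.0, token_times=[0.5, 0.75, 1.0, 1.25, 1.5])
print(ttft(timing), e2e(timing), itl(timing), tpot(timing), throughput(timing))
```

Note that this throughput is per request; an aggregate figure would divide total tokens across all requests by wall-clock time.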

All latency metrics include p50, p95, p99, and mean.
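
As an illustration of that summary, the following sketch reduces a list of latency samples to p50, p95, p99, and mean. It uses linear interpolation between ranks, which is an assumption; the interpolation method splleed uses is not specified here. The helpers percentile and summarize are hypothetical names.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one sample")
    pos = (len(ordered) - 1) * q / 100.0   # fractional rank
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Collapse latency samples into the four reported statistics."""
    return {
        "p50": percentile(values, 50),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
        "mean": mean(values),
    }


latencies_ms = [10.0, 20.0, 30.0, 40.0, 50.0]
print(summarize(latencies_ms))
```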

Backend Setup

For managed mode, splleed finds the engine executable via:

  1. Config: executable: /path/to/vllm
  2. Env var: SPLLEED_VLLM_PATH or SPLLEED_TGI_PATH
  3. System PATH
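
The lookup above can be sketched as a simple fallback chain. This is an illustration only: it assumes the three sources are tried in the listed order, and the PATH binary names in PATH_NAMES (vllm, and TGI's text-generation-launcher) are guesses at what gets searched, not confirmed splleed behavior. find_executable is a hypothetical name.

```python
import os
import shutil
from typing import Any, Mapping, Optional

# Assumed binary names to look for on PATH (not confirmed by splleed).
PATH_NAMES = {"vllm": "vllm", "tgi": "text-generation-launcher"}


def find_executable(
    backend_type: str,
    backend_cfg: Mapping[str, Any],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Resolve an engine executable: config, then env var, then PATH."""
    # 1. Explicit path in the config file.
    explicit = backend_cfg.get("executable")
    if explicit:
        return str(explicit)

    # 2. Environment variable, e.g. SPLLEED_VLLM_PATH or SPLLEED_TGI_PATH.
    from_env = env.get(f"SPLLEED_{backend_type.upper()}_PATH")
    if from_env:
        return from_env

    # 3. Fall back to searching the system PATH (None if not found).
    return shutil.which(PATH_NAMES.get(backend_type, backend_type))


print(find_executable("vllm", {"executable": "/path/to/vllm"}, env={}))
print(find_executable("tgi", {}, env={"SPLLEED_TGI_PATH": "/opt/tgi/launcher"}))
```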

Adding Backends

splleed new-backend my_engine

See src/splleed/backends/_template/ for the template.

License

MIT
