A lightweight, cross-platform latency and throughput profiler for LLMs
LLMeter is a pure-Python library for simple latency and throughput testing of large language models (LLMs). It's designed to be lightweight to install, straightforward to run standard tests with, and versatile to integrate - whether in notebooks, CI/CD, or other workflows.
🛠️ Installation
LLMeter requires Python 3.10 or later; please make sure your current version of Python is compatible.
To install the core metering functionality, install the minimal package with pip:
pip install llmeter
LLMeter also offers extra features that require additional dependencies. Currently these extras include:
- plotting: Add methods to generate charts to summarize the results
- openai: Enable testing endpoints offered by OpenAI
- litellm: Enable testing a range of different models through LiteLLM
- mlflow: Enable logging LLMeter experiments to MLflow
You can install one or more of these extra options using pip:
pip install 'llmeter[plotting,openai,litellm,mlflow]'
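For example, to add only the charting support:
pip install 'llmeter[plotting]'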
🚀 Quick-start
At a high level, you'll start by configuring an LLMeter "Endpoint" for whatever type of LLM you're connecting to:
# For example with Amazon Bedrock...
from llmeter.endpoints import BedrockConverse
endpoint = BedrockConverse(model_id="...")
# ...or OpenAI...
from llmeter.endpoints import OpenAIEndpoint
endpoint = OpenAIEndpoint(model_id="...", api_key="...")
# ...or via LiteLLM...
from llmeter.endpoints import LiteLLM
endpoint = LiteLLM("{provider}/{model_id}")
# ...and so on
You can then run the high-level "experiments" offered by LLMeter:
# Testing how throughput varies with concurrent request count:
from llmeter.experiments import LoadTest
load_test = LoadTest(
    endpoint=endpoint,
    payload={...},
    sequence_of_clients=[1, 5, 20, 50, 100, 500],
    output_path="local or S3 path",
)
load_test_results = await load_test.run()
load_test_results.plot_results()
Here payload can be a single dictionary, a list of dictionaries, or a path to a JSON Lines file containing one payload per line, as sketched below.
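For illustration, the three accepted payload forms might look like this (the messages-style request body is an assumption - the exact schema depends on your endpoint type):
# A single payload dictionary (request schema depends on the endpoint type;
# this messages-style body is illustrative only)
payload = {"messages": [{"role": "user", "content": "Hello!"}]}

# A list of payload dictionaries to sample across requests
payload = [
    {"messages": [{"role": "user", "content": "A short prompt"}]},
    {"messages": [{"role": "user", "content": "A much longer prompt..."}]},
]

# Or a path to a JSON Lines file with one payload per line
payload = "payloads.jsonl"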
Alternatively, you can use the low-level llmeter.runner.Runner class to run and analyze request batches - and build your own custom experiments.
from llmeter.runner import Runner
endpoint_test = Runner(
    endpoint,
    tokenizer=tokenizer,  # tokenizer used to estimate token counts (e.g. a Hugging Face tokenizer)
    output_path="local or S3 path",
)
result = await endpoint_test.run(
    payload={...},
    n_requests=3,
    clients=3,
)
print(result.stats)
Additional functionality like cost modelling and MLflow experiment tracking is enabled through llmeter.callbacks, and you can write your own callbacks to hook custom logic into LLMeter test runs, as sketched below.
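As a minimal sketch of a custom callback - assuming the Callback base class lives in llmeter.callbacks and exposes async after_invoke/after_run hooks (check the package source for the actual interface and hook names):
from llmeter.callbacks.base import Callback  # import path assumed

class ResponseCounter(Callback):
    """Toy callback that counts completed requests (illustrative only)."""

    def __init__(self):
        self.n_responses = 0

    async def after_invoke(self, response):
        # Assumed hook: called once per completed request
        self.n_responses += 1

    async def after_run(self, result):
        # Assumed hook: called once when the whole run finishes
        print(f"Run finished with {self.n_responses} responses")

A callback instance would then be passed into the test run (for example via a callbacks=[...] argument - again, an assumption to verify against the documentation).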
For more details, check out our selection of end-to-end code examples in the examples folder!
Analyze and compare results
You can analyze the results of a single run or a load test by generating interactive charts. You can find examples in the examples folder.
Load testing
You can generate a collection of standard charts to visualize the result of a load test:
# Load test results
from llmeter.experiments import LoadTestResult
load_test_result = LoadTestResult.load("local or S3 path", test_name="Test result")
figures = load_test_result.plot_results()
You can see how to compare two load tests in the Compare load test example.
Single Run visualizations
Metrics like time to first token (TTFT) and time per output token (TPOT) are best described as distributions. While summary statistics (median, 90th percentile, average, etc.) are a convenient way to compare distributions, visualizations provide deeper insight into endpoint behavior.
Boxplot
import plotly.graph_objects as go
from llmeter.plotting import boxplot_by_dimension
from llmeter.results import Result

result = Result.load("local or S3 path")
fig = go.Figure()
trace = boxplot_by_dimension(result=result, dimension="time_to_first_token")
fig.add_trace(trace)
fig.show()
Multiple traces can easily be combined into the same figure.
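For example, to compare time to first token across two runs in one figure (the result paths below are placeholders):
import plotly.graph_objects as go
from llmeter.plotting import boxplot_by_dimension
from llmeter.results import Result

fig = go.Figure()
for path in ("path/to/run-a", "path/to/run-b"):
    result = Result.load(path)
    fig.add_trace(boxplot_by_dimension(result=result, dimension="time_to_first_token"))
fig.show()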
Histograms
import plotly.graph_objects as go
from llmeter.plotting import histogram_by_dimension
from llmeter.results import Result

result = Result.load("local or S3 path")
fig = go.Figure()
trace = histogram_by_dimension(result=result, dimension="time_to_first_token", xbins={"size": 0.02})
fig.add_trace(trace)
fig.show()
Multiple traces can easily be combined into the same figure.
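For instance, two runs can be overlaid by adding both traces and switching Plotly to overlay mode (the result paths are placeholders):
import plotly.graph_objects as go
from llmeter.plotting import histogram_by_dimension
from llmeter.results import Result

fig = go.Figure()
for path in ("path/to/run-a", "path/to/run-b"):
    result = Result.load(path)
    fig.add_trace(
        histogram_by_dimension(result=result, dimension="time_to_first_token", xbins={"size": 0.02})
    )
fig.update_layout(barmode="overlay")  # draw the histograms on top of each other
fig.update_traces(opacity=0.6)  # keep both distributions visible
fig.show()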
Security
See CONTRIBUTING for more information.
License
This project is licensed under the Apache-2.0 License.