A lightweight, cross-platform latency and throughput profiler for LLMs
Project description
LLMeter is a pure-Python library for simple latency and throughput testing of large language models (LLMs). It's designed to be lightweight to install, straightforward for running standard tests, and versatile to integrate into notebooks, CI/CD pipelines, or other workflows.
🛠️ Installation
LLMeter requires python>=3.10; please make sure your current version of Python is compatible.
To install the basic metering functionality, install the minimal package with pip:
pip install llmeter
LLMeter also offers extra features that require additional dependencies. Currently these extras include:
- plotting: Add methods to generate charts and heatmaps to summarize the results
- openai: Enable testing endpoints offered by OpenAI
- litellm: Enable testing a range of different models through LiteLLM
You can install one or more of these extra options using pip:
pip install 'llmeter[plotting,openai,litellm]'
🚀 Quick-start
At a high level, you'll start by configuring an LLMeter "Endpoint" for whatever type of LLM you're connecting to:
# For example with Amazon Bedrock...
from llmeter.endpoints import BedrockConverse
endpoint = BedrockConverse(model_id="...")
# ...or OpenAI...
from llmeter.endpoints import OpenAIEndpoint
endpoint = OpenAIEndpoint(model_id="...", api_key="...")
# ...or via LiteLLM...
from llmeter.endpoints import LiteLLM
endpoint = LiteLLM("{provider}/{model_id}")
# ...and so on
You can then run the high-level "experiments" offered by LLMeter:
# For example a heatmap of latency by input & output token count:
from llmeter.experiments import LatencyHeatmap
latency_heatmap = LatencyHeatmap(
    endpoint=endpoint,
    clients=10,
    source_file="examples/MaryShelleyFrankenstein.txt",
    ...
)
heatmap_results = await latency_heatmap.run()
latency_heatmap.plot_heatmap()
# ...Or testing how throughput varies with concurrent request count:
from llmeter.experiments import LoadTest
sweep_test = LoadTest(
    endpoint=endpoint,
    payload={...},
    sequence_of_clients=[1, 5, 20, 50, 100, 500],
)
sweep_results = await sweep_test.run()
sweep_test.plot_sweep_results()
Alternatively, you can use the low-level llmeter.runner.Runner class to run and analyze request batches, and build your own custom experiments.
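As a rough sketch of what that might look like (the `Runner` constructor and `run()` arguments below are assumptions for illustration, not the confirmed API; see the examples folder for real usage):

```python
import asyncio


def build_payload(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble a hypothetical request payload for a batch run."""
    return {"prompt": prompt, "max_tokens": max_tokens}


async def profile_endpoint(endpoint):
    # Imported lazily so the helper above stays usable without llmeter installed
    from llmeter.runner import Runner

    runner = Runner(endpoint)  # hypothetical constructor signature
    # Hypothetical keyword arguments: a payload to send, how many requests
    # to issue in total, and how many concurrent clients to use
    results = await runner.run(
        payload=build_payload("Hello!"), n_requests=20, clients=5
    )
    return results


# e.g. results = asyncio.run(profile_endpoint(endpoint))
```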
For more details, check out our selection of end-to-end code examples in the examples folder!
Security
See CONTRIBUTING for more information.
License
This project is licensed under the Apache-2.0 License.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file llmeter-0.1.3.tar.gz.
File metadata
- Download URL: llmeter-0.1.3.tar.gz
- Upload date:
- Size: 27.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | b3ccba7d05804795e373de9052a0ee3c79ac4dae1dc5528216c9a1081a92fa5d
MD5 | 892d7cb34d523365ccd638b40c67e373
BLAKE2b-256 | 7d1fa23ff7748e962a6fae9258387be931673a2ee218009309d3a3fbf09a0240
File details
Details for the file llmeter-0.1.3-py3-none-any.whl.
File metadata
- Download URL: llmeter-0.1.3-py3-none-any.whl
- Upload date:
- Size: 34.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 153f93912eb31b7558c36bc245b3e5f4dd02f7e4e877955358aa027f04f38fe6
MD5 | c975dd32204d7bffdaf63292094efd95
BLAKE2b-256 | fbd3149329ebf0b49976fb453ea4ab4ffc3a40deab3fb2eba5f1f37c0eca9dd6