A lightweight, cross-platform latency and throughput profiler for LLMs

Project description

Measuring large language model latency and throughput

LLMeter is a pure-Python library for simple latency and throughput testing of large language models (LLMs). It's designed to be lightweight to install, straightforward for running standard tests, and versatile to integrate into notebooks, CI/CD pipelines, and other workflows.

🛠️ Installation

LLMeter requires python>=3.10; please make sure your current version of Python is compatible.
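
If in doubt, a quick way to confirm from within Python (an illustrative check, not part of LLMeter):

import sys

# LLMeter supports Python 3.10 and newer.
assert sys.version_info >= (3, 10), f"Python 3.10+ required, found {sys.version}"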

To install the basic metering functionality, install the minimal package with pip:

pip install llmeter

LLMeter also offers extra features that require additional dependencies. Currently these extras include:

  • plotting: Add methods to generate charts and heatmaps to summarize the results
  • openai: Enable testing endpoints offered by OpenAI
  • litellm: Enable testing a range of different models through LiteLLM

You can install one or more of these extras with pip (quote the argument, since some shells such as zsh treat square brackets specially):

pip install 'llmeter[plotting,openai,litellm]'

🚀 Quick-start

At a high level, you'll start by configuring an LLMeter "Endpoint" for whatever type of LLM you're connecting to:

# For example with Amazon Bedrock...
from llmeter.endpoints import BedrockConverse
endpoint = BedrockConverse(model_id="...")

# ...or OpenAI...
from llmeter.endpoints import OpenAIEndpoint
endpoint = OpenAIEndpoint(model_id="...", api_key="...")

# ...or via LiteLLM...
from llmeter.endpoints import LiteLLM
endpoint = LiteLLM("{provider}/{model_id}")

# ...and so on

You can then run the high-level "experiments" offered by LLMeter:

# For example a heatmap of latency by input & output token count:
from llmeter.experiments import LatencyHeatmap
latency_heatmap = LatencyHeatmap(
    endpoint=endpoint,
    clients=10,
    source_file="examples/MaryShelleyFrankenstein.txt",
    ...
)
heatmap_results = await latency_heatmap.run()
latency_heatmap.plot_heatmap()

# ...Or testing how throughput varies with concurrent request count:
from llmeter.experiments import LoadTest
sweep_test = LoadTest(
    endpoint=endpoint,
    payload={...},
    sequence_of_clients=[1, 5, 20, 50, 100, 500],
)
sweep_results = await sweep_test.run()
sweep_test.plot_sweep_results()

Alternatively, you can use the low-level llmeter.runner.Runner class to run and analyze request batches and build your own custom experiments.
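
For illustration, here's a minimal sketch of a custom batch run. The Runner constructor and run() parameters shown here (payload, n_requests, clients) are assumptions rather than confirmed API; check the examples folder for the authoritative usage:

# Hypothetical low-level batch run (parameter names are assumed):
from llmeter.runner import Runner

runner = Runner(endpoint=endpoint)
results = await runner.run(
    payload={...},   # same request payload format your endpoint expects
    n_requests=20,   # total requests in the batch (assumed parameter)
    clients=5,       # concurrent clients (assumed parameter)
)
print(results)       # aggregated latency/throughput statistics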

For more details, check out our selection of end-to-end code examples in the examples folder!

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llmeter-0.1.3.tar.gz (27.6 kB)

Built Distribution

llmeter-0.1.3-py3-none-any.whl (34.3 kB)

File details

Details for the file llmeter-0.1.3.tar.gz.

File metadata

  • Download URL: llmeter-0.1.3.tar.gz
  • Upload date:
  • Size: 27.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for llmeter-0.1.3.tar.gz:

  • SHA256: b3ccba7d05804795e373de9052a0ee3c79ac4dae1dc5528216c9a1081a92fa5d
  • MD5: 892d7cb34d523365ccd638b40c67e373
  • BLAKE2b-256: 7d1fa23ff7748e962a6fae9258387be931673a2ee218009309d3a3fbf09a0240

For more details on using hashes, see the pip documentation.
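
If you'd like to verify a download against the digests above, a quick check with Python's standard hashlib module looks like this (the file name and expected digest are taken from this page):

# Verify the source distribution against its published SHA256 digest
import hashlib

with open("llmeter-0.1.3.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

expected = "b3ccba7d05804795e373de9052a0ee3c79ac4dae1dc5528216c9a1081a92fa5d"
assert digest == expected, "SHA256 mismatch - do not install this file"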

File details

Details for the file llmeter-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: llmeter-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 34.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for llmeter-0.1.3-py3-none-any.whl:

  • SHA256: 153f93912eb31b7558c36bc245b3e5f4dd02f7e4e877955358aa027f04f38fe6
  • MD5: c975dd32204d7bffdaf63292094efd95
  • BLAKE2b-256: fbd3149329ebf0b49976fb453ea4ab4ffc3a40deab3fb2eba5f1f37c0eca9dd6

For more details on using hashes, see the pip documentation.
