LLM Inference Benchmarking Tool

EchoSwift: LLM Inference Benchmarking Tool by Infobell IT

EchoSwift is a powerful and flexible tool designed for benchmarking Large Language Model (LLM) inference. It allows users to measure and analyze the performance of LLM endpoints across various metrics, including token latency, throughput, and time to first token (TTFT).

Features

  • Benchmark LLM inference across multiple Inference Servers
  • Measure key performance metrics: latency, throughput, and TTFT
  • Support for varying input and output token lengths
  • Simulate concurrent users to test scalability
  • Easy-to-use CLI
  • Detailed logging and progress tracking

Performance metrics:

The following performance metrics are captured for each combination of input tokens, output tokens, and parallel users:

  • Latency (ms/token)
  • TTFT (ms)
  • Throughput (tokens/sec)
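
To make the units above concrete, here is a minimal sketch of how the three metrics could be derived from per-request timestamps. The `RequestTiming` structure and the exact formulas are assumptions for illustration; EchoSwift may define latency differently (for example, by excluding the time to first token).

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Timestamps in seconds for one streamed request (hypothetical structure)."""
    sent_at: float         # when the request was sent
    first_token_at: float  # when the first output token arrived
    finished_at: float     # when the last output token arrived
    output_tokens: int     # number of generated tokens


def ttft_ms(t: RequestTiming) -> float:
    """Time to first token, in milliseconds."""
    return (t.first_token_at - t.sent_at) * 1000.0


def latency_ms_per_token(t: RequestTiming) -> float:
    """Assumed definition: end-to-end time divided by generated tokens."""
    return (t.finished_at - t.sent_at) * 1000.0 / t.output_tokens


def throughput_tokens_per_sec(t: RequestTiming) -> float:
    """Generated tokens per second of end-to-end time."""
    return t.output_tokens / (t.finished_at - t.sent_at)


# Made-up timings: first token after 0.25 s, 256 tokens finished after 4 s.
sample = RequestTiming(sent_at=0.0, first_token_at=0.25, finished_at=4.0, output_tokens=256)
print(ttft_ms(sample))                    # 250.0
print(latency_ms_per_token(sample))       # 15.625
print(throughput_tokens_per_sec(sample))  # 64.0
```

Under this definition, latency and throughput for a single request are reciprocals of each other up to the unit change; aggregate throughput across concurrent users would sum tokens over all requests in the same time window.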

Supported Inference Servers

  • TGI
  • vLLM
  • Ollama
  • Llamacpp
  • NIMS

Installation

You can install EchoSwift using pip:

pip install echoswift

Alternatively, you can install from source:

git clone --branch akhil https://github.com/Infobellit-Solutions-Pvt-Ltd/EchoSwift.git
cd EchoSwift
pip install -e .

Usage

EchoSwift provides a simple CLI for running LLM inference benchmarks.

Below are the steps to run a sample test, assuming the generation endpoint is active.

1. Download the Dataset and create a default config.json

Before running a benchmark, you need to download and filter the dataset:

echoswift dataprep

This command downloads the filtered ShareGPT dataset from Hugging Face and creates a sample config.json.

2. Configure the Benchmark

Modify the config.json file in the project root directory. Here's an example configuration:

{
  "_comment": "EchoSwift Configuration",
  "out_dir": "test_results",
  "base_url": "http://10.216.178.15:8000/v1/completions",
  "provider": "vLLM",
  "model": "meta-llama/Meta-Llama-3-8B",
  "max_requests": 5,
  "user_counts": [3],
  "input_tokens": [32],
  "output_tokens": [256]
}

Adjust these parameters to match the LLM endpoint you are benchmarking.
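
Because user_counts, input_tokens, and output_tokens are lists, a single config can describe several test combinations. The sketch below is not part of EchoSwift; it is a small standalone helper that checks a config for the keys shown in the example and lists the combinations it describes. Treating every key as required, and the helper names `load_config` and `benchmark_matrix`, are assumptions made for illustration.

```python
import itertools
import json

# Keys taken from the example configuration; treating all as required is an assumption.
REQUIRED_KEYS = {
    "out_dir", "base_url", "provider", "model",
    "max_requests", "user_counts", "input_tokens", "output_tokens",
}


def load_config(text: str) -> dict:
    """Parse config JSON and fail early if an expected key is missing."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    return config


def benchmark_matrix(config: dict) -> list:
    """Return every (users, input_tokens, output_tokens) combination."""
    return list(itertools.product(
        config["user_counts"], config["input_tokens"], config["output_tokens"]
    ))


# Placeholder endpoint and values, chosen only to show a multi-combination run.
EXAMPLE = """
{
  "out_dir": "test_results",
  "base_url": "http://localhost:8000/v1/completions",
  "provider": "vLLM",
  "model": "meta-llama/Meta-Llama-3-8B",
  "max_requests": 5,
  "user_counts": [1, 3],
  "input_tokens": [32],
  "output_tokens": [128, 256]
}
"""

config = load_config(EXAMPLE)
for users, in_tok, out_tok in benchmark_matrix(config):
    print(f"users={users} input_tokens={in_tok} output_tokens={out_tok}")
```

With two user counts, one input length, and two output lengths, this prints four combinations; the example configuration above, with one value per list, describes a single combination.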

3. Run the Benchmark

To start the benchmark using the configuration from config.json:

echoswift start --config path/to/your/config.json

4. Plot the Results

echoswift plot --results-dir path/to/your/results_dir

Output

EchoSwift will create a results directory (or the directory specified in out_dir) containing:

  • CSV files with raw benchmark data
  • Averaged results for each combination of users, input tokens, and output tokens
  • Log files for each Locust run

Analyzing Results

After the benchmark completes, you can find CSV files in the output directory. These files contain information about latency, throughput, and TTFT for each test configuration.
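
As a starting point for your own analysis, the following sketch averages selected numeric columns of a CSV file using only the Python standard library. The column names in the demo data are made up for illustration; check the header row of the CSV files EchoSwift actually writes and substitute the real names.

```python
import csv
import io
from statistics import mean


def column_means(csv_file, columns):
    """Average the named numeric columns of a CSV file-like object."""
    values = {name: [] for name in columns}
    for row in csv.DictReader(csv_file):
        for name in columns:
            values[name].append(float(row[name]))
    return {name: mean(vals) for name, vals in values.items()}


# Synthetic rows with hypothetical column names, not real benchmark output.
demo = io.StringIO(
    "ttft_ms,latency_ms_per_token,throughput_tokens_per_sec\n"
    "200,16,60\n"
    "300,14,68\n"
)
print(column_means(demo, ["ttft_ms", "throughput_tokens_per_sec"]))
# {'ttft_ms': 250.0, 'throughput_tokens_per_sec': 64.0}
```

To use it on a real file, pass an open file handle instead of the in-memory demo, for example `with open(path, newline="") as f: column_means(f, [...])`.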

Citation

If you find our resource useful, please cite our paper:

EchoSwift: An Inference Benchmarking and Configuration Discovery Tool for Large Language Models (LLMs)

@inproceedings{Krishna2024,
  series = {ICPE '24},
  title = {EchoSwift: An Inference Benchmarking and Configuration Discovery Tool for Large Language Models (LLMs)},
  url = {https://dl.acm.org/doi/10.1145/3629527.3652273},
  DOI = {10.1145/3629527.3652273},
  booktitle = {Companion of the 15th ACM/SPEC International Conference on Performance Engineering},
  publisher = {ACM},
  author = {Krishna, Karthik and Bandili, Ramana},
  year = {2024},
  month = May,
  collection = {ICPE '24}
}
