LLM Inference Benchmarking Tool

EchoSwift: LLM Inference Benchmarking Tool by Infobell IT

EchoSwift is a powerful and flexible tool designed for benchmarking Large Language Model (LLM) inference. It allows users to measure and analyze the performance of LLM endpoints across various metrics, including token latency, throughput, and time to first token (TTFT).

Features

  • Benchmark LLM inference across multiple Inference Servers
  • Measure key performance metrics: latency, throughput, and TTFT
  • Support for varying input and output token lengths
  • Simulate concurrent users to test scalability
  • Easy-to-use command-line interface
  • Detailed logging and progress tracking

Supported Inference Servers

  • TGI
  • vLLM
  • Ollama
  • Llamacpp
  • NIMS

Performance Metrics

The following performance metrics are captured for each combination of input tokens, output tokens, and parallel users during a benchmark run:

  • Latency (ms/token)
  • TTFT (ms)
  • Throughput (tokens/sec)
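
To make the units concrete, here is a minimal sketch of how these three metrics could be derived for a single request from token arrival timestamps. The `RequestMetrics` class, the `compute_metrics` function, and the exact definitions used (mean time per token over the whole request, tokens divided by total time) are assumptions for illustration; EchoSwift's own calculation may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RequestMetrics:
    ttft_ms: float               # time to first token, in milliseconds
    latency_ms_per_token: float  # mean time per generated token, in milliseconds
    throughput_tok_per_s: float  # generated tokens per second


def compute_metrics(request_start: float, token_times: Sequence[float]) -> RequestMetrics:
    """Derive per-request metrics from timestamps given in seconds.

    Hypothetical helper: the definitions below are one common convention,
    not necessarily the one EchoSwift uses.
    """
    if not token_times:
        raise ValueError("at least one token timestamp is required")
    total_s = token_times[-1] - request_start
    if total_s <= 0:
        raise ValueError("last token must arrive after the request started")
    n_tokens = len(token_times)
    return RequestMetrics(
        ttft_ms=(token_times[0] - request_start) * 1000.0,
        latency_ms_per_token=total_s * 1000.0 / n_tokens,
        throughput_tok_per_s=n_tokens / total_s,
    )
```

For example, a request that starts at t = 0 s and receives four tokens at 0.5 s intervals has a TTFT of 500 ms, a latency of 500 ms/token, and a throughput of 2 tokens/sec under these definitions.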

Installation

You can install EchoSwift using pip:

pip install echoswift

Alternatively, you can install from source:

git clone --branch akhil https://github.com/Infobellit-Solutions-Pvt-Ltd/EchoSwift.git
cd EchoSwift
pip install -e .

Usage

EchoSwift provides a simple CLI for running LLM inference benchmarks.

Below are the steps to run a sample test, assuming the generation endpoint is active.

1. Download the Dataset and create a default config.json

Before running a benchmark, you need to download and filter the dataset:

echoswift dataprep

This command downloads the filtered ShareGPT dataset from Hugging Face and creates a sample config.json.

2. Configure the Benchmark

Modify the config.json file in the project root directory. Here's an example configuration:

{
  "_comment": "EchoSwift Configuration",
  "out_dir": "test_results",
  "base_url": "http://10.216.178.15:8000/v1/completions",
  "provider": "vLLM",
  "model": "meta-llama/Meta-Llama-3-8B",
  "max_requests": 5,
  "user_counts": [3],
  "input_tokens": [32],
  "output_tokens": [256]
}

Adjust these parameters to match the LLM endpoint you're benchmarking.
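
Before starting a long run, it can help to check the edited file and see how many test configurations it implies. The sketch below parses a config and lists every (users, input tokens, output tokens) combination. The `load_config` and `iter_runs` helpers are hypothetical, and treating every key as required and the list fields as a Cartesian product are assumptions based on the example above, not documented EchoSwift behavior.

```python
import itertools
import json
from typing import Iterator, Tuple

# Keys taken from the example configuration; whether EchoSwift
# requires all of them is an assumption.
REQUIRED_KEYS = (
    "out_dir", "base_url", "provider", "model",
    "max_requests", "user_counts", "input_tokens", "output_tokens",
)

EXAMPLE_CONFIG = """
{
  "out_dir": "test_results",
  "base_url": "http://10.216.178.15:8000/v1/completions",
  "provider": "vLLM",
  "model": "meta-llama/Meta-Llama-3-8B",
  "max_requests": 5,
  "user_counts": [3],
  "input_tokens": [32],
  "output_tokens": [256]
}
"""


def load_config(text: str) -> dict:
    """Parse a config from JSON text and check that the expected keys exist."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing config keys: {missing}")
    return config


def iter_runs(config: dict) -> Iterator[Tuple[int, int, int]]:
    """Yield (users, input_tokens, output_tokens) for every combination."""
    return itertools.product(
        config["user_counts"], config["input_tokens"], config["output_tokens"]
    )


if __name__ == "__main__":
    for users, n_in, n_out in iter_runs(load_config(EXAMPLE_CONFIG)):
        print(f"users={users} input_tokens={n_in} output_tokens={n_out}")
```

With the example configuration this yields a single combination; adding more values to any of the three lists multiplies the number of combinations accordingly.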

3. Run the Benchmark

To start the benchmark using the configuration from config.json:

echoswift start --config path/to/your/config.json

4. Plot the Results

To generate plots from a completed run, point the plot command at its results directory:

echoswift plot --results-dir path/to/your/results_dir
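
If you prefer to drive steps 3 and 4 from a script, the two commands can be chained with Python's standard `subprocess` module. This is a minimal sketch that uses only the subcommands and flags shown above; the `build_commands` and `run_pipeline` helper names are hypothetical. `echoswift dataprep` is left out on purpose, since it is a one-time step that is followed by manual editing of config.json.

```python
import subprocess
from typing import List


def build_commands(config_path: str, results_dir: str) -> List[List[str]]:
    """Return the benchmark and plot commands as argument lists."""
    return [
        ["echoswift", "start", "--config", config_path],
        ["echoswift", "plot", "--results-dir", results_dir],
    ]


def run_pipeline(config_path: str, results_dir: str) -> None:
    """Run the benchmark, then plot; stop at the first failing command."""
    for command in build_commands(config_path, results_dir):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Calling `run_pipeline("config.json", "test_results")` assumes that `echoswift` is on your PATH and that `test_results` matches the `out_dir` set in the config.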

Output

EchoSwift will create a results directory (or the directory specified in out_dir) containing:

  • CSV files with raw benchmark data
  • Averaged results for each combination of users, input tokens, and output tokens
  • Log files for each Locust run

Analyzing Results

After the benchmark completes, you can find CSV files in the output directory. These files contain information about latency, throughput, and TTFT for each test configuration.
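
For a quick look at the numbers without a plotting step, the CSV files can be summarized with the standard library. The sketch below averages every column whose values parse as numbers, so it does not depend on specific column names. The `average_numeric_columns` helper is hypothetical, and the column names in the usage note are made up for illustration; check the headers in your own output files.

```python
import csv
from collections import defaultdict
from typing import Dict, TextIO


def average_numeric_columns(handle: TextIO) -> Dict[str, float]:
    """Average each column of a CSV file, skipping non-numeric cells."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(handle):
        for name, value in row.items():
            try:
                number = float(value)
            except (TypeError, ValueError):
                continue  # text cell, missing cell, or overflow column
            sums[name] += number
            counts[name] += 1
    return {name: sums[name] / counts[name] for name in sums}
```

To use it on a file, open it with `open(path, newline="")` and pass the handle in. For a file with a hypothetical header `ttft_ms,throughput` and rows `100,10` and `300,30`, the result would be `{"ttft_ms": 200.0, "throughput": 20.0}`.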

Citation

If you find our resource useful, please cite our paper:

EchoSwift: An Inference Benchmarking and Configuration Discovery Tool for Large Language Models (LLMs)

@inproceedings{Krishna2024,
  series = {ICPE '24},
  title = {EchoSwift: An Inference Benchmarking and Configuration Discovery Tool for Large Language Models (LLMs)},
  url = {https://dl.acm.org/doi/10.1145/3629527.3652273},
  DOI = {10.1145/3629527.3652273},
  booktitle = {Companion of the 15th ACM/SPEC International Conference on Performance Engineering},
  publisher = {ACM},
  author = {Krishna, Karthik and Bandili, Ramana},
  year = {2024},
  month = May,
  collection = {ICPE '24}
}
