EchoSwift: LLM Inference Benchmarking Tool by Infobell IT
EchoSwift is a powerful and flexible tool designed for benchmarking Large Language Model (LLM) inference. It allows users to measure and analyze the performance of LLM endpoints across various metrics, including token latency, throughput, and time to first token (TTFT).
Features
- Benchmark LLM inference across multiple Inference Servers
- Measure key performance metrics: latency, throughput, and TTFT
- Support for varying input and output token lengths
- Simulate concurrent users to test scalability
- Easy-to-use CLI
- Detailed logging and progress tracking
Performance Metrics
The following metrics are captured for each combination of input tokens, output tokens, and parallel users during a benchmark run:
- Latency (ms/token)
- TTFT (ms)
- Throughput (tokens/sec)
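As a rough illustration of how these three metrics relate, the sketch below derives them from the timestamps of a single request. The `RequestTiming` class and its field names are hypothetical and not part of EchoSwift. The definitions used here (per-token latency as end-to-end time divided by output tokens, throughput as output tokens per second) are common conventions and may differ from EchoSwift's exact calculations.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Timestamps (in seconds) for one streamed request. Hypothetical structure."""
    sent_at: float          # when the request was sent
    first_token_at: float   # when the first output token arrived
    finished_at: float      # when the last output token arrived
    output_tokens: int      # number of tokens generated


def ttft_ms(t: RequestTiming) -> float:
    """Time to first token, in milliseconds."""
    return (t.first_token_at - t.sent_at) * 1000.0


def latency_ms_per_token(t: RequestTiming) -> float:
    """End-to-end time divided by the number of output tokens, in ms/token."""
    return (t.finished_at - t.sent_at) * 1000.0 / t.output_tokens


def throughput_tokens_per_sec(t: RequestTiming) -> float:
    """Output tokens generated per second of end-to-end time."""
    return t.output_tokens / (t.finished_at - t.sent_at)


# Made-up timings for a single request, used only to exercise the functions.
timing = RequestTiming(sent_at=0.0, first_token_at=0.25, finished_at=4.0, output_tokens=256)
print(f"TTFT: {ttft_ms(timing):.1f} ms")
print(f"Latency: {latency_ms_per_token(timing):.3f} ms/token")
print(f"Throughput: {throughput_tokens_per_sec(timing):.1f} tokens/sec")
```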
Supported Inference Servers
- TGI
- vLLM
- Ollama
- Llamacpp
- NIMS
Installation
You can install EchoSwift using pip:
pip install echoswift
Alternatively, you can install from source:
git clone --branch akhil https://github.com/Infobellit-Solutions-Pvt-Ltd/EchoSwift.git
cd EchoSwift
pip install -e .
Usage
EchoSwift provides a simple CLI for running LLM inference benchmarks.
Below are the steps to run a sample test, assuming the generation endpoint is active.
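If you want to confirm that the endpoint is active before starting, a small probe like the one below may help. This is a sketch and not part of EchoSwift: the helper names are hypothetical, and it assumes the endpoint accepts an OpenAI-style completions payload (`model`, `prompt`, `max_tokens`), which may not hold for every supported inference server.

```python
import json
import urllib.request


def build_probe_request(base_url: str, model: str) -> urllib.request.Request:
    """Build a tiny completion request (hypothetical helper, not part of EchoSwift)."""
    # Assumption: the endpoint accepts an OpenAI-style completions payload.
    payload = {"model": model, "prompt": "Hello", "max_tokens": 1}
    return urllib.request.Request(
        base_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def endpoint_is_active(base_url: str, model: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers the probe with HTTP 200."""
    try:
        request = build_probe_request(base_url, model)
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and socket timeouts are OSError subclasses
        return False
```

Calling `endpoint_is_active(...)` with the `base_url` and `model` you plan to benchmark returns `False` when the server is unreachable or responds with an HTTP error.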
1. Download the Dataset and create a default config.json
Before running a benchmark, you need to download and filter the dataset:
echoswift dataprep
This command downloads the filtered ShareGPT dataset from Hugging Face and creates a sample config.json.
2. Configure the Benchmark
Modify the config.json file in the project root directory. Here's an example configuration:
{
  "_comment": "EchoSwift Configuration",
  "out_dir": "test_results",
  "base_url": "http://10.216.178.15:8000/v1/completions",
  "provider": "vLLM",
  "model": "meta-llama/Meta-Llama-3-8B",
  "max_requests": 5,
  "user_counts": [3],
  "input_tokens": [32],
  "output_tokens": [256]
}
Adjust these parameters to match the LLM endpoint you're benchmarking.
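Because `user_counts`, `input_tokens`, and `output_tokens` are lists, a single configuration can describe several test combinations. The sketch below enumerates them for a configuration with more than one value per list. The `benchmark_matrix` helper is hypothetical, the values are illustrative, and the order in which EchoSwift actually runs the combinations is not specified here.

```python
import itertools
import json

# Illustrative values using the same field names as the example configuration.
config = json.loads("""
{
  "max_requests": 5,
  "user_counts": [1, 3],
  "input_tokens": [32],
  "output_tokens": [128, 256]
}
""")


def benchmark_matrix(cfg: dict) -> list[tuple[int, int, int]]:
    """List every (users, input_tokens, output_tokens) combination in the config."""
    return list(
        itertools.product(cfg["user_counts"], cfg["input_tokens"], cfg["output_tokens"])
    )


for users, in_tok, out_tok in benchmark_matrix(config):
    print(f"users={users} input_tokens={in_tok} output_tokens={out_tok}")
```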
3. Run the Benchmark
To start the benchmark using the configuration from config.json, run:
echoswift start --config path/to/your/config.json
4. Plot the Results
echoswift plot --results-dir path/to/your/results_dir
Output
EchoSwift will create a results directory (or the directory specified in out_dir) containing:
- CSV files with raw benchmark data
- Averaged results for each combination of users, input tokens, and output tokens
- Log files for each Locust run
Analyzing Results
After the benchmark completes, you can find CSV files in the output directory. These files contain information about latency, throughput, and TTFT for each test configuration.
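A quick way to get oriented in the output directory is to list each CSV file with its column names and row count. The sketch below does this with the standard library only. The `summarize_csv_files` helper is hypothetical, and the sample file name and columns in the demo are made up, since the exact layout of EchoSwift's CSV files is not described here.

```python
import csv
import tempfile
from pathlib import Path


def summarize_csv_files(results_dir: str) -> dict[str, tuple[list[str], int]]:
    """Map each CSV file (path relative to results_dir) to (column names, row count)."""
    summary = {}
    root = Path(results_dir)
    for path in sorted(root.rglob("*.csv")):
        with path.open(newline="") as f:
            reader = csv.reader(f)
            header = next(reader, [])      # first row is treated as the header
            rows = sum(1 for _ in reader)  # remaining rows are data
        summary[str(path.relative_to(root))] = (header, rows)
    return summary


# Demo on a temporary directory with a made-up CSV file.
with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name and columns, used only to exercise the helper.
    Path(tmp, "sample.csv").write_text("users,ttft_ms\n3,120.5\n3,98.2\n")
    demo = summarize_csv_files(tmp)

for name, (columns, row_count) in demo.items():
    print(f"{name}: columns={columns} rows={row_count}")
```

To inspect real results, pass the directory named in `out_dir` (for example `test_results`) to `summarize_csv_files`.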
Citation
If you find our resource useful, please cite our paper:
EchoSwift: An Inference Benchmarking and Configuration Discovery Tool for Large Language Models (LLMs)
@inproceedings{Krishna2024,
series = {ICPE '24},
title = {EchoSwift: An Inference Benchmarking and Configuration Discovery Tool for Large Language Models (LLMs)},
url = {https://dl.acm.org/doi/10.1145/3629527.3652273},
DOI = {10.1145/3629527.3652273},
booktitle = {Companion of the 15th ACM/SPEC International Conference on Performance Engineering},
publisher = {ACM},
author = {Krishna, Karthik and Bandili, Ramana},
year = {2024},
month = May,
collection = {ICPE '24}
}