LLM Inference Benchmarking Tool

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

ScaleBench_AI: LLM Inference Benchmarking Tool by Infobell IT

scalebench is a CLI-based tool designed to benchmark LLM (Large Language Model) inference endpoints. It helps evaluate performance using real-world prompts with configurable parameters and visualized results.

Features

Easy-to-use CLI interface
Benchmark LLM inference across multiple Inference Servers
Measures key performance metrics: latency, throughput, and TTFT (Time to First Token)
Support for varying input and output token lengths
Simulate concurrent users to test scalability
Determine the optimal number of concurrent users the server can handle while maintaining: TTFT < 2000 ms and Token latency < 200 ms
Detailed logging and progress tracking

Supported Inference Servers

TGI
vLLM
Ollama
Llamacpp
NIMS
SGLang

Performance metrics:

The performance metrics captured for varying input and output tokens and parallel users while running the benchmark includes

Latency (ms/token)
TTFT(ms)
Throughput(tokens/sec)

Installation

You can install scalebench using pip:

pip install scalebench

Alternatively, you can install from source:

git clone https://github.com/Infobellit-Solutions-Pvt-Ltd/ScaleBench_AI
cd scalebench
pip install -e .

Usage

scalebench provides a simple CLI interface for running LLM Inference benchmarks.

Below are the steps to run a sample test, assuming the generation endpoint is active.

1. Download the Dataset and create a default `config.json`

Before running a benchmark, you need to download and filter the dataset:

scalebench dataprep

This command will:

Download the filtered ShareGPT dataset from Huggingface
Create a default config.json file in your working directory

2. Configure the Benchmark

Edit the generated config.json file to match your LLM server configuration. Below is a sample:

{
    "_comment": "scalebench Configuration",
    "out_dir": "Results",
    "base_url": "http://localhost:8000/v1/completions",
    "tokenizer_path": "/path/to/tokenizer/",
    "inference_server": "vLLM",
    "model": "/model",
    "random_prompt": true,
    "max_requests": 1,
    "user_counts": [
        10
    ],
    "increment_user": [
        100
    ],
    "input_tokens": [
        32
    ],
    "output_tokens": [
        256
    ]
}

Note: Modify base_url, tokenizer_path, model, and other fields according to your LLM deployment.

Prompt Configuration Modes

scalebench supports two input modes depending on your test requirements:

1. Fixed Input Tokens

If you want to run the benchmark with a fixed number of input tokens:

Set "random_prompt": false
Define both input_tokens and output_tokens explicitly

2. Random Input Length

If you prefer using randomized prompts from the dataset:

Set "random_prompt": true
Provide only output_tokens — scalebench will choose random input lengths from the dataset

User Load Configuration (For `optimaluserrun`)

To perform optimal user benchmarking:

Use user_counts to set the initial number of concurrent users
Use increment_user to define how many users to add per step

Example:

"user_counts": [10],
"increment_user": [100]

In this case, the benchmark will start with 10 users and increase by 100 in each iteration until performance thresholds are hit.

Tokenizer Configuration

scalebench allows two ways to configure the tokenizer used for benchmarking:

Option 1: Use a Custom Tokenizer

Set the TOKENIZER environment variable to the path of your desired tokenizer.

Option 2: Use Default Fallback

If TOKENIZER is not set or is empty, scalebench falls back to a built-in default tokenizer:

This ensures the tool remains functional, but the fallback tokenizer may not align with your model's behavior. Use it only for testing or when no tokenizer is specified.

💡 Best Practice: Always specify the correct tokenizer that matches your LLM model for accurate benchmarking results.

Use these combinations as per your requirement to effectively benchmark your LLM endpoint.

3. Run the Benchmark

Option A: Standard Benchmarking

Use the start command to run a basic benchmark:

scalebench start --config path/to/config.json

Option B: Optimal User Load Benchmarking

To find the optimal number of concurrent users for your LLM endpoint:

scalebench optimaluserrun --config path/to/config.json

4. Plot the Results

Visualize the benchmark results using the built-in plotting tool:

scalebench plot --results-dir path/to/your/results_dir

Output

scalebench will create a results directory (or the directory specified in out_dir) containing:

CSV files with raw benchmark data
Averaged results for each combination of users, input tokens, and output tokens
Log files for each Locust run

Analyzing Results

After the benchmark completes, you can find CSV files in the output directory. These files contain information about latency, throughput, and TTFT for each test configuration.

Project details

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history Release notifications | RSS feed

0.1.4

Nov 6, 2025

0.1.3

Oct 29, 2025

This version

0.1.2

Oct 27, 2025

0.1.1

Oct 13, 2025

0.1.0

Oct 12, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scalebench-0.1.2.tar.gz (22.1 kB view details)

Uploaded Oct 27, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

scalebench-0.1.2-py3-none-any.whl (22.6 kB view details)

Uploaded Oct 27, 2025 Python 3

File details

Details for the file scalebench-0.1.2.tar.gz.

File metadata

Download URL: scalebench-0.1.2.tar.gz
Upload date: Oct 27, 2025
Size: 22.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for scalebench-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`b25dd34a05827d5dabb15c17e98be911a9d947b30d27187bc7e07d484a182b25`
MD5	`26323b42b85220c5de4b51cf27b48af2`
BLAKE2b-256	`d9a4bdcc145932291cbd061ecc84e48f0740587610f4d8b8ff5d62bc219f9b35`

See more details on using hashes here.

File details

Details for the file scalebench-0.1.2-py3-none-any.whl.

File metadata

Download URL: scalebench-0.1.2-py3-none-any.whl
Upload date: Oct 27, 2025
Size: 22.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for scalebench-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d858dbe6eae820bda698382eb3ce138fac45965bd376a5a0f0aa59e0069f7166`
MD5	`c6f765ae1e92bdb10140c6539da758d5`
BLAKE2b-256	`ca03996b78c7ca01d4d597d4c916e5f8356b010fb544eee61564617cbef1f3a6`

See more details on using hashes here.

scalebench 0.1.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

ScaleBench_AI: LLM Inference Benchmarking Tool by Infobell IT

Features

Supported Inference Servers

Performance metrics:

Installation

Usage

1. Download the Dataset and create a default config.json

2. Configure the Benchmark

Prompt Configuration Modes

1. Fixed Input Tokens

2. Random Input Length

User Load Configuration (For optimaluserrun)

Tokenizer Configuration

Option 1: Use a Custom Tokenizer

Option 2: Use Default Fallback

3. Run the Benchmark

4. Plot the Results

Output

Analyzing Results

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

1. Download the Dataset and create a default `config.json`

User Load Configuration (For `optimaluserrun`)