A lightweight, high-performance benchmarking tool for NVIDIA NIM LLMs
Project description
A lightweight, high-performance benchmarking tool for NVIDIA NIM LLMs.
Measure latency, throughput, and reliability with style.
🚀 Overview
nimbench is a surgical CLI tool designed to benchmark NVIDIA NIM (NVIDIA Inference Microservices) chat models. Powered by httpx for connection-pooled requests and rich for beautiful terminal presentation, it handles model discovery, intelligent filtering, and robust benchmarking, providing you with a clean, formatted performance report.
✨ Key Features
- 🔍 Auto-Discovery: Automatically finds and ranks all available models from your NVIDIA NIM endpoint.
- 📊 Precise Metrics: Measures Median, Min, Max latency and Tokens Per Second (TPS).
- ⏱️ Progress & ETA: Live interactive progress bar with percentage and estimated time remaining.
- 🌈 Rich Terminal UI: Beautiful, color-coded status tables and highlights using
rich. - 🔌 Connection Pooling: Uses
httpxto reuse TCP connections, minimizing handshake overhead for accurate latency comparisons. - 🛡️ Intelligent Retries: Automatically handles rate limits (429) by respecting
Retry-Afterheaders and applies temperature fallbacks when needed. - 📝 Failure Analysis: Detailed breakdown of failure reasons (Not Provisioned, Timeout, Unsupported, etc.).
- 💾 Skip Cache: Remembers failed models to speed up subsequent runs.
🔬 What it measures
nimbench measures wall-clock request time for a minimal POST /v1/chat/completions call. It is designed to evaluate request/response latency rather than long-form output quality.
Default Request Shape:
- Prompt:
Reply with one short word. - Max Tokens:
8 - Temperature:
0.0(with automatic fallback to0.1if rejected).
The CLI reports tokens per second for each model. It uses server-provided metrics when available, or derives an approximate rate from completion_tokens / wall_time.
🛠️ How it behaves
- Discovery: Fetches all models from
GET /v1/modelsand filters for likely chat-capable IDs. - Sequential Execution: Benchmarking is performed sequentially to preserve the 40 RPM (Requests Per Minute) cap.
- Intelligent Skipping: A local skip cache is maintained for models that are not provisioned, reject chat input, or repeatedly timeout.
- Cap Logic: The
--limitflag means "stop after N successful benchmarks", preventing your rate limit from being wasted on unavailable models.
📦 Installation
Requires Python 3.10+.
git clone https://github.com/your-username/nimbench.git
cd nimbench
pip install -e .
🚀 Quick Start
Benchmark the top 10 most likely chat models:
python3 -m nimbench --limit 10
Advanced Usage
# Benchmark everything (including non-chat) with 3 repeats each
python3 -m nimbench --all-models --repeats 3
# Filter for specific models using regex
python3 -m nimbench --pattern "llama|nemotron|mistral"
# Export results to JSON
python3 -m nimbench --limit 5 --json > results.json
⚙️ Configuration & Options
API Key Precedence
--api-keycommand-line argument.NVIDIA_API_KEYenvironment variable.- Interactive prompt.
Options
| Option | Description |
|---|---|
--api-key KEY |
NVIDIA API key |
--base-url URL |
API base URL (Default: https://integrate.api.nvidia.com/v1) |
--limit N |
Stop after N successful benchmarks |
--pattern REGEX |
Only consider model ids matching REGEX |
--timeout SECONDS |
Request timeout for each HTTP call |
--repeats N |
Requests per model |
--json |
Emit JSON instead of a text table |
--rpm N |
Request rate cap (Default: 40) |
--all-models |
Benchmark full catalog instead of chat-only default |
--refresh-cache |
Ignore the local skip cache for this run |
Environment Variables
NVIDIA_API_KEY: Your NVIDIA API key.NIMBENCH_CACHE_DIR: Set this to override the default local skip cache directory.
🧪 Testing
Run the comprehensive test suite:
python3 -m unittest discover tests
Built with 💚 for the LLM community.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file nimbench-0.1.0.tar.gz.
File metadata
- Download URL: nimbench-0.1.0.tar.gz
- Upload date:
- Size: 181.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
76ced5d45a3bf585cef90a98bab9a40110fe45df3a19c305ba5e4ff89285a127
|
|
| MD5 |
a4df68a00116fefd041a20077624ed50
|
|
| BLAKE2b-256 |
26e71aa17f7f64dc790b6279afcc261f4e7bc885e0109e4591843805db690d0a
|
Provenance
The following attestation bundles were made for nimbench-0.1.0.tar.gz:
Publisher:
publish.yml on youxufkhan/nimbench
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
nimbench-0.1.0.tar.gz -
Subject digest:
76ced5d45a3bf585cef90a98bab9a40110fe45df3a19c305ba5e4ff89285a127 - Sigstore transparency entry: 1972425266
- Sigstore integration time:
-
Permalink:
youxufkhan/nimbench@b162777af19de61c07784663d1f86a375338ddf8 -
Branch / Tag:
- Owner: https://github.com/youxufkhan
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@b162777af19de61c07784663d1f86a375338ddf8 -
Trigger Event:
release
-
Statement type:
File details
Details for the file nimbench-0.1.0-py3-none-any.whl.
File metadata
- Download URL: nimbench-0.1.0-py3-none-any.whl
- Upload date:
- Size: 176.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e58d4e40e701c8cab54f8f5033bb35f4549f7bd9797382485df2406610fb215a
|
|
| MD5 |
c0899792d8ddba6b61db01793f302c44
|
|
| BLAKE2b-256 |
2612ead95d0d16523e5ea868b7ba90a8d6028d5193eeb6482f8a9e768893dd0f
|
Provenance
The following attestation bundles were made for nimbench-0.1.0-py3-none-any.whl:
Publisher:
publish.yml on youxufkhan/nimbench
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
nimbench-0.1.0-py3-none-any.whl -
Subject digest:
e58d4e40e701c8cab54f8f5033bb35f4549f7bd9797382485df2406610fb215a - Sigstore transparency entry: 1972425386
- Sigstore integration time:
-
Permalink:
youxufkhan/nimbench@b162777af19de61c07784663d1f86a375338ddf8 -
Branch / Tag:
- Owner: https://github.com/youxufkhan
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@b162777af19de61c07784663d1f86a375338ddf8 -
Trigger Event:
release
-
Statement type: