nimbench

A lightweight, high-performance benchmarking tool for NVIDIA NIM LLMs

These details have not been verified by PyPI

Project links

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
Topic
- System :: Benchmark
- Utilities

Project description

nimbench cli

A lightweight, high-performance benchmarking tool for NVIDIA NIM LLMs.
Measure latency, throughput, and reliability with style.

🚀 Overview

nimbench is a surgical CLI tool designed to benchmark NVIDIA NIM (NVIDIA Inference Microservices) chat models. Powered by httpx for connection-pooled requests and rich for beautiful terminal presentation, it handles model discovery, intelligent filtering, and robust benchmarking, providing you with a clean, formatted performance report.

✨ Key Features

🔍 Auto-Discovery: Automatically finds and ranks all available models from your NVIDIA NIM endpoint.
📊 Precise Metrics: Measures Median, Min, Max latency and Tokens Per Second (TPS).
⏱️ Progress & ETA: Live interactive progress bar with percentage and estimated time remaining.
🌈 Rich Terminal UI: Beautiful, color-coded status tables and highlights using rich.
🔌 Connection Pooling: Uses httpx to reuse TCP connections, minimizing handshake overhead for accurate latency comparisons.
🛡️ Intelligent Retries: Automatically handles rate limits (429) by respecting Retry-After headers and applies temperature fallbacks when needed.
📝 Failure Analysis: Detailed breakdown of failure reasons (Not Provisioned, Timeout, Unsupported, etc.).
💾 Skip Cache: Remembers failed models to speed up subsequent runs.

🔬 What it measures

nimbench measures wall-clock request time for a minimal POST /v1/chat/completions call. It is designed to evaluate request/response latency rather than long-form output quality.

Default Request Shape:

Prompt: Reply with one short word.
Max Tokens: 8
Temperature: 0.0 (with automatic fallback to 0.1 if rejected).

The CLI reports tokens per second for each model. It uses server-provided metrics when available, or derives an approximate rate from completion_tokens / wall_time.

🛠️ How it behaves

Discovery: Fetches all models from GET /v1/models and filters for likely chat-capable IDs.
Sequential Execution: Benchmarking is performed sequentially to preserve the 40 RPM (Requests Per Minute) cap.
Intelligent Skipping: A local skip cache is maintained for models that are not provisioned, reject chat input, or repeatedly timeout.
Cap Logic: The --limit flag means "stop after N successful benchmarks", preventing your rate limit from being wasted on unavailable models.

📦 Installation

Requires Python 3.10+.

git clone https://github.com/your-username/nimbench.git
cd nimbench
pip install -e .

🚀 Quick Start

Benchmark the top 10 most likely chat models:

python3 -m nimbench --limit 10

Advanced Usage

# Benchmark everything (including non-chat) with 3 repeats each
python3 -m nimbench --all-models --repeats 3

# Filter for specific models using regex
python3 -m nimbench --pattern "llama|nemotron|mistral"

# Export results to JSON
python3 -m nimbench --limit 5 --json > results.json

⚙️ Configuration & Options

API Key Precedence

--api-key command-line argument.
NVIDIA_API_KEY environment variable.
Interactive prompt.

Options

Option	Description
`--api-key KEY`	NVIDIA API key
`--base-url URL`	API base URL (Default: `https://integrate.api.nvidia.com/v1`)
`--limit N`	Stop after N successful benchmarks
`--pattern REGEX`	Only consider model ids matching REGEX
`--timeout SECONDS`	Request timeout for each HTTP call
`--repeats N`	Requests per model
`--json`	Emit JSON instead of a text table
`--rpm N`	Request rate cap (Default: 40)
`--all-models`	Benchmark full catalog instead of chat-only default
`--refresh-cache`	Ignore the local skip cache for this run

Environment Variables

NVIDIA_API_KEY: Your NVIDIA API key.
NIMBENCH_CACHE_DIR: Set this to override the default local skip cache directory.

🧪 Testing

Run the comprehensive test suite:

python3 -m unittest discover tests

Built with 💚 for the LLM community.

Project details

These details have not been verified by PyPI

Project links

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
Topic
- System :: Benchmark
- Utilities

Release history Release notifications | RSS feed

This version

0.1.0

Jun 26, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nimbench-0.1.0.tar.gz (181.3 kB view details)

Uploaded Jun 26, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

nimbench-0.1.0-py3-none-any.whl (176.6 kB view details)

Uploaded Jun 26, 2026 Python 3

File details

Details for the file nimbench-0.1.0.tar.gz.

File metadata

Download URL: nimbench-0.1.0.tar.gz
Upload date: Jun 26, 2026
Size: 181.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nimbench-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`76ced5d45a3bf585cef90a98bab9a40110fe45df3a19c305ba5e4ff89285a127`
MD5	`a4df68a00116fefd041a20077624ed50`
BLAKE2b-256	`26e71aa17f7f64dc790b6279afcc261f4e7bc885e0109e4591843805db690d0a`

See more details on using hashes here.

Provenance

The following attestation bundles were made for nimbench-0.1.0.tar.gz:

Publisher: publish.yml on youxufkhan/nimbench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nimbench-0.1.0.tar.gz
- Subject digest: 76ced5d45a3bf585cef90a98bab9a40110fe45df3a19c305ba5e4ff89285a127
- Sigstore transparency entry: 1972425266
- Sigstore integration time: Jun 26, 2026
Source repository:
- Permalink: youxufkhan/nimbench@b162777af19de61c07784663d1f86a375338ddf8
- Branch / Tag:
- Owner: https://github.com/youxufkhan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b162777af19de61c07784663d1f86a375338ddf8
- Trigger Event: release

File details

Details for the file nimbench-0.1.0-py3-none-any.whl.

File metadata

Download URL: nimbench-0.1.0-py3-none-any.whl
Upload date: Jun 26, 2026
Size: 176.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nimbench-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e58d4e40e701c8cab54f8f5033bb35f4549f7bd9797382485df2406610fb215a`
MD5	`c0899792d8ddba6b61db01793f302c44`
BLAKE2b-256	`2612ead95d0d16523e5ea868b7ba90a8d6028d5193eeb6482f8a9e768893dd0f`

See more details on using hashes here.

Provenance

The following attestation bundles were made for nimbench-0.1.0-py3-none-any.whl:

Publisher: publish.yml on youxufkhan/nimbench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nimbench-0.1.0-py3-none-any.whl
- Subject digest: e58d4e40e701c8cab54f8f5033bb35f4549f7bd9797382485df2406610fb215a
- Sigstore transparency entry: 1972425386
- Sigstore integration time: Jun 26, 2026
Source repository:
- Permalink: youxufkhan/nimbench@b162777af19de61c07784663d1f86a375338ddf8
- Branch / Tag:
- Owner: https://github.com/youxufkhan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b162777af19de61c07784663d1f86a375338ddf8
- Trigger Event: release

nimbench 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

🚀 Overview

✨ Key Features

🔬 What it measures

🛠️ How it behaves

📦 Installation

🚀 Quick Start

Advanced Usage

⚙️ Configuration & Options

API Key Precedence

Options

Environment Variables

🧪 Testing

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance