AIPerf
| Design Proposals | Migrating from Genai-Perf | CLI Options
AIPerf is a comprehensive benchmarking tool for measuring the performance of generative AI models served by your preferred inference solution. It provides detailed metrics via a command line display as well as extensive benchmark performance reports.
AIPerf provides multiprocess support out of the box, with Kubernetes support coming soon, for a single scalable solution.
Features
- Scalable via multiprocess or Kubernetes (coming soon) support
- Modular design for easy user modification
- Several benchmarking modes:
  - concurrency
  - request-rate
  - request-rate with a maximum concurrency cap
  - trace replay
- Public dataset support
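The first three benchmarking modes differ in how requests are scheduled. As a rough illustration of the distinction (a simplified sketch using a stand-in request function, not AIPerf's actual implementation): concurrency mode keeps a fixed number of requests in flight at all times, while request-rate mode issues requests on a fixed interval, optionally capped by a maximum in-flight count.

```python
import asyncio

async def fake_request(i):
    # Stand-in for an inference call; sleeps instead of hitting a server.
    await asyncio.sleep(0.01)
    return i

async def concurrency_mode(total, concurrency):
    # Keep at most `concurrency` requests in flight until `total` complete.
    sem = asyncio.Semaphore(concurrency)
    async def worker(i):
        async with sem:
            return await fake_request(i)
    return await asyncio.gather(*(worker(i) for i in range(total)))

async def request_rate_mode(total, rate, max_concurrency=None):
    # Issue a new request every 1/rate seconds; optionally cap in-flight count.
    sem = asyncio.Semaphore(max_concurrency or total)
    async def worker(i):
        async with sem:
            return await fake_request(i)
    tasks = []
    for i in range(total):
        tasks.append(asyncio.create_task(worker(i)))
        await asyncio.sleep(1 / rate)
    return await asyncio.gather(*tasks)

results = asyncio.run(concurrency_mode(total=20, concurrency=5))
print(len(results))  # 20
```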
Supported APIs
- OpenAI chat completions
- OpenAI completions
- OpenAI embeddings
- OpenAI audio: request throughput and latency
- OpenAI images: request throughput and latency
- NIM rankings
Installation
```shell
pip install git+https://github.com/ai-dynamo/aiperf.git
```
Quick Start
Basic Usage
Run a simple benchmark against a model:
```shell
aiperf profile \
  --model your_model_name \
  --url http://localhost:8000 \
  --endpoint-type chat \
  --streaming
```
Example with Custom Configuration
```shell
aiperf profile \
  --model Qwen/Qwen3-0.6B \
  --url http://localhost:8000 \
  --endpoint-type chat \
  --concurrency 10 \
  --request-count 100 \
  --streaming
```
Example output:
```
NVIDIA AIPerf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ Metric                               ┃       avg ┃    min ┃    max ┃    p99 ┃    p90 ┃    p75 ┃   std ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ Time to First Token (ms)             │     18.26 │  11.22 │ 106.32 │  68.82 │  27.76 │  16.62 │ 12.07 │
│ Time to Second Token (ms)            │     11.40 │   0.02 │  85.91 │  34.54 │  12.59 │  11.65 │  7.01 │
│ Request Latency (ms)                 │    487.30 │ 267.07 │ 769.57 │ 715.99 │ 580.83 │ 536.17 │ 79.60 │
│ Inter Token Latency (ms)             │     11.23 │   8.80 │  13.17 │  12.48 │  11.73 │  11.37 │  0.45 │
│ Output Token Throughput Per User     │     89.23 │  75.93 │ 113.60 │ 102.28 │  90.91 │  90.29 │  3.70 │
│ (tokens/sec/user)                    │           │        │        │        │        │        │       │
│ Output Sequence Length (tokens)      │     42.83 │  24.00 │  65.00 │  64.00 │  52.00 │  47.00 │  7.21 │
│ Input Sequence Length (tokens)       │     10.00 │  10.00 │  10.00 │  10.00 │  10.00 │  10.00 │  0.00 │
│ Output Token Throughput (tokens/sec) │ 10,944.03 │    N/A │    N/A │    N/A │    N/A │    N/A │   N/A │
│ Request Throughput (requests/sec)    │    255.54 │    N/A │    N/A │    N/A │    N/A │    N/A │   N/A │
│ Request Count (requests)             │    711.00 │    N/A │    N/A │    N/A │    N/A │    N/A │   N/A │
└──────────────────────────────────────┴───────────┴────────┴────────┴────────┴────────┴────────┴───────┘
```
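The streaming metrics above are derived from per-token arrival timestamps. As a minimal sketch of the underlying definitions (generic formulas, not AIPerf's internal code), time to first token is the gap between the request start and the first token, and inter-token latency is the mean gap between consecutive tokens:

```python
def streaming_metrics(request_start, token_times):
    """Compute TTFT and mean inter-token latency (both in ms) from a
    request start time and per-token arrival timestamps (in seconds)."""
    ttft_ms = (token_times[0] - request_start) * 1000.0
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl_ms = (sum(gaps) / len(gaps)) * 1000.0 if gaps else 0.0
    return ttft_ms, itl_ms

# Example: request starts at t=0 s, tokens arrive at 0.018, 0.029, 0.040 s.
ttft, itl = streaming_metrics(0.0, [0.018, 0.029, 0.040])
print(round(ttft, 1), round(itl, 1))  # 18.0 11.0
```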
Known Issues
- When setting the output sequence length (OSL) via the `--output-tokens-mean` option, AIPerf currently cannot guarantee the requested OSL unless `--extra-inputs ignore_eos:true` is also set. Work is underway to remove this requirement.
File details
Details for the file aiperf-0.1.1-py3-none-any.whl.
- Download URL: aiperf-0.1.1-py3-none-any.whl
- Upload date:
- Size: 2.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.18
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `d1a54493ccef9f8ba5b00fbb5c66718c120d95129b5982e0aa5cb63e2604d2aa` |
| MD5 | `6d719133f3f08189ce769daadd03f615` |
| BLAKE2b-256 | `8aceea19a691ba92e51eed0b90fd7bf53c2ef97d8b8be458f46bc45f79c98541` |