AIPerf
| Design Proposals | Migrating from Genai-Perf | CLI Options
AIPerf is a comprehensive benchmarking tool for measuring the performance of generative AI models served by your preferred inference solution. It provides detailed metrics via a command line display as well as extensive benchmark performance reports.
AIPerf provides multiprocess support out of the box, with Kubernetes support coming soon, for a single scalable solution.
Features
- Scalable via multiprocess or Kubernetes (coming soon) support
- Modular design for easy user modification
- Several benchmarking modes:
  - concurrency
  - request-rate
  - request-rate with a maximum concurrency cap
  - trace replay
- Public dataset support
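The first three benchmarking modes differ in how requests are scheduled. As a rough illustration of the distinction (a simplified sketch using a stand-in request function, not AIPerf's actual implementation): concurrency mode keeps a fixed number of requests in flight at all times, while request-rate mode issues requests on a fixed interval, optionally capped by a maximum in-flight count.

```python
import asyncio

async def fake_request(i):
    # Stand-in for an inference call; sleeps instead of hitting a server.
    await asyncio.sleep(0.01)
    return i

async def concurrency_mode(total, concurrency):
    # Keep at most `concurrency` requests in flight until `total` complete.
    sem = asyncio.Semaphore(concurrency)
    async def worker(i):
        async with sem:
            return await fake_request(i)
    return await asyncio.gather(*(worker(i) for i in range(total)))

async def request_rate_mode(total, rate, max_concurrency=None):
    # Issue a new request every 1/rate seconds; optionally cap in-flight count.
    sem = asyncio.Semaphore(max_concurrency or total)
    async def worker(i):
        async with sem:
            return await fake_request(i)
    tasks = []
    for i in range(total):
        tasks.append(asyncio.create_task(worker(i)))
        await asyncio.sleep(1 / rate)
    return await asyncio.gather(*tasks)

results = asyncio.run(concurrency_mode(total=20, concurrency=5))
print(len(results))  # 20
```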
Supported APIs
- OpenAI chat completions
- OpenAI completions
- OpenAI embeddings
- OpenAI audio: request throughput and latency
- OpenAI images: request throughput and latency
- NIM rankings
Installation
```shell
pip install git+https://github.com/ai-dynamo/aiperf.git
```
Quick Start
Basic Usage
Run a simple benchmark against a model:
```shell
aiperf profile \
  --model your_model_name \
  --url http://localhost:8000 \
  --endpoint-type chat \
  --streaming
```
Example with Custom Configuration
```shell
aiperf profile \
  --model Qwen/Qwen3-0.6B \
  --url http://localhost:8000 \
  --endpoint-type chat \
  --concurrency 10 \
  --request-count 100 \
  --streaming
```
Example output:
```
NVIDIA AIPerf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ Metric                               ┃       avg ┃    min ┃    max ┃    p99 ┃    p90 ┃    p75 ┃   std ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ Time to First Token (ms)             │     18.26 │  11.22 │ 106.32 │  68.82 │  27.76 │  16.62 │ 12.07 │
│ Time to Second Token (ms)            │     11.40 │   0.02 │  85.91 │  34.54 │  12.59 │  11.65 │  7.01 │
│ Request Latency (ms)                 │    487.30 │ 267.07 │ 769.57 │ 715.99 │ 580.83 │ 536.17 │ 79.60 │
│ Inter Token Latency (ms)             │     11.23 │   8.80 │  13.17 │  12.48 │  11.73 │  11.37 │  0.45 │
│ Output Token Throughput Per User     │     89.23 │  75.93 │ 113.60 │ 102.28 │  90.91 │  90.29 │  3.70 │
│ (tokens/sec/user)                    │           │        │        │        │        │        │       │
│ Output Sequence Length (tokens)      │     42.83 │  24.00 │  65.00 │  64.00 │  52.00 │  47.00 │  7.21 │
│ Input Sequence Length (tokens)       │     10.00 │  10.00 │  10.00 │  10.00 │  10.00 │  10.00 │  0.00 │
│ Output Token Throughput (tokens/sec) │ 10,944.03 │    N/A │    N/A │    N/A │    N/A │    N/A │   N/A │
│ Request Throughput (requests/sec)    │    255.54 │    N/A │    N/A │    N/A │    N/A │    N/A │   N/A │
│ Request Count (requests)             │    711.00 │    N/A │    N/A │    N/A │    N/A │    N/A │   N/A │
└──────────────────────────────────────┴───────────┴────────┴────────┴────────┴────────┴────────┴───────┘
```
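The streaming metrics above are derived from per-token arrival timestamps. As a minimal sketch of the underlying definitions (generic formulas, not AIPerf's internal code), time to first token is the gap between the request start and the first token, and inter-token latency is the mean gap between consecutive tokens:

```python
def streaming_metrics(request_start, token_times):
    """Compute TTFT and mean inter-token latency (both in ms) from a
    request start time and per-token arrival timestamps (in seconds)."""
    ttft_ms = (token_times[0] - request_start) * 1000.0
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl_ms = (sum(gaps) / len(gaps)) * 1000.0 if gaps else 0.0
    return ttft_ms, itl_ms

# Example: request starts at t=0 s, tokens arrive at 0.018, 0.029, 0.040 s.
ttft, itl = streaming_metrics(0.0, [0.018, 0.029, 0.040])
print(round(ttft, 1), round(itl, 1))  # 18.0 11.0
```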
Known Issues
- When setting the output sequence length (OSL) via the `--output-tokens-mean` option, AIPerf currently cannot guarantee the requested OSL unless `--extra-inputs ignore_eos:true` is also set. Work is underway to remove this requirement.
File details
Details for the file aiperf-0.1.1-py3-none-any.whl.
- Download URL: aiperf-0.1.1-py3-none-any.whl
- Upload date:
- Size: 2.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.18
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `d1a54493ccef9f8ba5b00fbb5c66718c120d95129b5982e0aa5cb63e2604d2aa` |
| MD5 | `6d719133f3f08189ce769daadd03f615` |
| BLAKE2b-256 | `8aceea19a691ba92e51eed0b90fd7bf53c2ef97d8b8be458f46bc45f79c98541` |