Optimum-Benchmark is a unified multi-backend utility for benchmarking Transformers, Timm, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.

All benchmarks are wrong, some will cost you less than others.

Optimum-Benchmark 🏋️

Optimum-Benchmark is a unified multi-backend & multi-device utility for benchmarking Transformers, Diffusers, PEFT, TIMM and Optimum libraries, along with all their supported optimizations & quantization schemes, for inference & training, in distributed & non-distributed settings, in the most correct, efficient and scalable way possible.

News 📰

  • LlamaCpp backend for benchmarking llama-cpp-python bindings with all its supported devices 🚀
  • 🥳 PyPI package is now available for installation: pip install optimum-benchmark 🎉 check it out!
  • Model loading latency/memory/energy tracking for all backends in the inference scenario 🚀
  • numactl support for Process and Torchrun launchers to control the NUMA nodes on which the benchmark runs.
  • 4 minimal docker images (cpu, cuda, rocm, cuda-ort) in packages for testing, benchmarking and reproducibility 🐳
  • vLLM backend for benchmarking vLLM's inference engine 🚀
  • Hosting the codebase of the LLM-Perf Leaderboard 🥇
  • Py-TXI backend for benchmarking Py-TXI 🚀
  • Python API for running isolated and distributed benchmarks with Python scripts 🐍
  • Simpler CLI interface for running benchmarks (runs and sweeps) using Hydra 🧪

Motivations 🎯

  • HuggingFace hardware partners wanting to know how their hardware performs compared to other hardware on the same models.
  • HuggingFace ecosystem users wanting to know how their chosen model performs in terms of latency, throughput, memory usage, energy consumption, etc., compared to another model.
  • Benchmarking hardware & backend specific optimizations & quantization schemes that can be applied to models and improve their computational/memory/energy efficiency.

 

Note: Optimum-Benchmark is a work in progress and is not yet ready for production use, but we're working hard to make it so. Please keep an eye on the project and help us improve it and make it more useful for the community. We're looking forward to your feedback and contributions. 🚀

CI Status 🚦

Optimum-Benchmark is continuously and intensively tested on a variety of devices, backends, scenarios and launchers to ensure its stability with over 300 tests running on every PR (you can request more tests if you want to).

API 📈

API_CPU API_CUDA API_MISC API_ROCM

CLI 📈

CLI_CPU_LLAMA_CPP CLI_CPU_NEURAL_COMPRESSOR CLI_CPU_ONNXRUNTIME CLI_CPU_OPENVINO CLI_CPU_PYTORCH CLI_CPU_PY_TXI CLI_CUDA_ONNXRUNTIME CLI_CUDA_PYTORCH_MULTI_GPU CLI_CUDA_PYTORCH_SINGLE_GPU CLI_CUDA_PY_TXI CLI_CUDA_TENSORRT_LLM_SINGLE_GPU CLI_CUDA_TORCH_ORT_MULTI_GPU CLI_CUDA_TORCH_ORT_SINGLE_GPU CLI_CUDA_VLLM_SINGLE_GPU CLI_MISC CLI_ROCM_PYTORCH_MULTI_GPU CLI_ROCM_PYTORCH_SINGLE_GPU

Quickstart 🚀

Installation 📥

You can install the latest released version of optimum-benchmark on PyPI:

pip install optimum-benchmark

or you can install the latest version from the main branch on GitHub:

pip install git+https://github.com/huggingface/optimum-benchmark.git

or if you want to tinker with the code, you can clone the repository and install it in editable mode:

git clone https://github.com/huggingface/optimum-benchmark.git
cd optimum-benchmark
pip install -e .

Advanced install options

Depending on the backends you want to use, you can install optimum-benchmark with the following extras:

  • PyTorch (default): pip install optimum-benchmark
  • OpenVINO: pip install optimum-benchmark[openvino]
  • Torch-ORT: pip install optimum-benchmark[torch-ort]
  • OnnxRuntime: pip install optimum-benchmark[onnxruntime]
  • TensorRT-LLM: pip install optimum-benchmark[tensorrt-llm]
  • OnnxRuntime-GPU: pip install optimum-benchmark[onnxruntime-gpu]
  • Neural Compressor: pip install optimum-benchmark[neural-compressor]
  • Py-TXI: pip install optimum-benchmark[py-txi]
  • vLLM: pip install optimum-benchmark[vllm]

We also support the following extra dependencies:

  • autoawq
  • auto-gptq
  • sentence-transformers
  • bitsandbytes
  • codecarbon
  • flash-attn
  • deepspeed
  • diffusers
  • timm
  • peft

Running benchmarks using the Python API 🧪

You can run benchmarks from the Python API, using the Benchmark class and its launch method. It takes a BenchmarkConfig object as input, runs the benchmark in an isolated process and returns a BenchmarkReport object containing the benchmark results.

Here's an example of how to run an isolated benchmark using the pytorch backend, torchrun launcher and inference scenario with latency and memory tracking enabled.

from optimum_benchmark import Benchmark, BenchmarkConfig, TorchrunConfig, InferenceConfig, PyTorchConfig
from optimum_benchmark.logging_utils import setup_logging

setup_logging(level="INFO", handlers=["console"])

if __name__ == "__main__":
    launcher_config = TorchrunConfig(nproc_per_node=2)
    scenario_config = InferenceConfig(latency=True, memory=True)
    backend_config = PyTorchConfig(model="gpt2", device="cuda", device_ids="0,1", no_weights=True)
    benchmark_config = BenchmarkConfig(
        name="pytorch_gpt2",
        scenario=scenario_config,
        launcher=launcher_config,
        backend=backend_config,
    )
    benchmark_report = Benchmark.launch(benchmark_config)

    # log the benchmark in terminal
    benchmark_report.log() # or print(benchmark_report)

    # convert artifacts to a dictionary or dataframe
    benchmark_config.to_dict() # or benchmark_config.to_dataframe()

    # save artifacts to disk as json or csv files
    benchmark_report.save_csv("benchmark_report.csv") # or benchmark_report.save_json("benchmark_report.json")

    # push artifacts to the hub
    benchmark_config.push_to_hub("IlyasMoutawwakil/pytorch_gpt2") # or benchmark_report.push_to_hub("IlyasMoutawwakil/pytorch_gpt2")

    # or merge them into a single artifact
    benchmark = Benchmark(config=benchmark_config, report=benchmark_report)
    benchmark.save_json("benchmark.json") # or benchmark.save_csv("benchmark.csv")
    benchmark.push_to_hub("IlyasMoutawwakil/pytorch_gpt2")

    # load artifacts from the hub
    benchmark = Benchmark.from_hub("IlyasMoutawwakil/pytorch_gpt2")

    # or load them from disk
    benchmark = Benchmark.load_json("benchmark.json") # or Benchmark.load_csv("benchmark.csv")

If you're on VSCode, you can hover over the configuration classes to see the available parameters and their descriptions. You can also see the available parameters in the Features section below.

Running benchmarks using the Hydra CLI 🧪

You can also run a benchmark from the command line by specifying the configuration directory and the configuration name. Both arguments are mandatory for Hydra: --config-dir is the directory where the configuration files are stored, and --config-name is the name of the configuration file without its .yaml extension.

optimum-benchmark --config-dir examples/ --config-name pytorch_bert

This will run the benchmark using the configuration in examples/pytorch_bert.yaml and store the results in runs/pytorch_bert.

The resulting files are:

  • benchmark_config.json which contains the configuration used for the benchmark, including the backend, launcher, scenario and the environment in which the benchmark was run.
  • benchmark_report.json which contains a full report of the benchmark's results, like latency measurements, memory usage, energy consumption, etc.
  • benchmark.json contains both the report and the configuration in a single file.
  • benchmark.log contains the logs of the benchmark run.
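
Once a run has finished, the JSON artifacts listed above can be read back with the standard library. The following is a minimal sketch, assuming the files sit directly in the run directory (runs/pytorch_bert in the example above); the helper name load_run_artifacts is hypothetical, and nothing is assumed about the keys inside each file.

```python
import json
from pathlib import Path

# File names taken from the list above; the structure inside each file
# is deliberately not assumed.
ARTIFACT_NAMES = ("benchmark_config.json", "benchmark_report.json", "benchmark.json")


def load_run_artifacts(run_dir):
    """Return {file name: parsed JSON} for each artifact found in run_dir."""
    run_path = Path(run_dir)
    artifacts = {}
    for name in ARTIFACT_NAMES:
        path = run_path / name
        if path.is_file():  # silently skip artifacts that were not written
            with path.open(encoding="utf-8") as handle:
                artifacts[name] = json.load(handle)
    return artifacts


if __name__ == "__main__":
    for name, content in load_run_artifacts("runs/pytorch_bert").items():
        # Print only the top-level keys so the sketch stays structure-agnostic.
        keys = sorted(content) if isinstance(content, dict) else type(content).__name__
        print(name, keys)
```

Printing only the top-level keys is a quick way to discover what a given version of the tool actually writes before relying on any particular field.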

Advanced CLI options

Configuration overrides 🎛️

You can override any value of an existing configuration file directly from the command line. For example, to run the same benchmark with a different model and on a different device, you can use the following command:

optimum-benchmark --config-dir examples/ --config-name pytorch_bert backend.model=gpt2 backend.device=cuda
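
To make the effect of the dotted overrides concrete, here is a simplified sketch of how key=value pairs map onto a nested configuration. It is an illustration only, not Hydra's implementation: the helper apply_overrides is hypothetical, the base values are placeholders rather than the contents of examples/pytorch_bert.yaml, and values are kept as plain strings.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with `dotted.key=value` overrides applied."""
    result = copy.deepcopy(config)
    for override in overrides:
        dotted_key, _, value = override.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Walk down the nested dicts, creating levels that do not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = value  # values stay strings in this simplified sketch
    return result


if __name__ == "__main__":
    # Placeholder base configuration (hypothetical values).
    base = {"backend": {"model": "some-bert-model", "device": "cpu"}}
    merged = apply_overrides(base, ["backend.model=gpt2", "backend.device=cuda"])
    print(merged)  # {'backend': {'model': 'gpt2', 'device': 'cuda'}}
```

Each override only touches the key it names; everything else in the configuration file is left as written.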

Configuration sweeps 🧹

You can easily run configuration sweeps using the --multirun (-m) option. By default, configurations are executed serially, but other kinds of execution are supported through Hydra's launcher plugins (e.g. hydra/launcher=joblib).

optimum-benchmark --config-dir examples --config-name pytorch_bert -m backend.device=cpu,cuda
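
A sweep runs one benchmark per combination of the comma-separated values. The sketch below mimics that expansion with itertools.product so you can preview how many runs a sweep will trigger; the helper expand_sweep is hypothetical and only handles the simple key=a,b form, not Hydra's full sweep syntax.

```python
from itertools import product


def expand_sweep(overrides):
    """Expand comma-separated override values into one override list per run."""
    keys, choices = [], []
    for override in overrides:
        key, _, values = override.partition("=")
        keys.append(key)
        choices.append(values.split(","))
    # Cartesian product: every value of every key is combined with all others.
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*choices)
    ]


if __name__ == "__main__":
    sweep = ["backend.device=cpu,cuda", "scenario.input_shapes.sequence_length=128,256"]
    for run_overrides in expand_sweep(sweep):
        print(" ".join(run_overrides))
```

With two devices and two sequence lengths this yields four runs, so sweeping over several keys at once grows quickly.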

Configurations structure 📁

You can create custom and more complex configuration files following these examples. They are heavily commented to help you understand the structure of the configuration files.

Features 🎨

optimum-benchmark allows you to run benchmarks with minimal configuration. A benchmark is defined by three main components:

  • The launcher to use (e.g. process)
  • The scenario to follow (e.g. training)
  • The backend to run on (e.g. onnxruntime)

Launchers 🚀

  • Process launcher (launcher=process): launches the benchmark in an isolated process.
  • Torchrun launcher (launcher=torchrun): launches the benchmark in multiple processes using torch.distributed.
  • Inline launcher (launcher=inline): not recommended for benchmarking, only for debugging purposes.

General Launcher features 🧰
  • Assert GPU device isolation for NVIDIA & AMD (launcher.device_isolation=true). This feature makes sure that no processes other than the benchmark are running on the targeted GPU devices. Especially useful when running benchmarks on shared resources.

Scenarios 🏋

  • Training scenario (scenario=training) which benchmarks the model using the trainer class with a randomly generated dataset.
  • Inference scenario (scenario=inference) which benchmarks the model's inference method (forward/call/generate) with randomly generated inputs.

Inference scenario features 🧰
  • Memory tracking (scenario.memory=true)
  • Energy and efficiency tracking (scenario.energy=true)
  • Latency and throughput tracking (scenario.latency=true)
  • Warm up runs before inference (scenario.warmup_runs=20)
  • Inputs shapes control (e.g. scenario.input_shapes.sequence_length=128)
  • Forward, Call and Generate kwargs (e.g. for an LLM scenario.generate_kwargs.max_new_tokens=100, for a diffusion model scenario.call_kwargs.num_images_per_prompt=4)

See InferenceConfig for more information.
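
To illustrate why warm-up runs are kept out of the measurement, here is a generic timing sketch. It is not the tracker used by optimum-benchmark: the helpers measure_latency and summarize are hypothetical, and defining throughput as batch size divided by mean latency is an assumption made for this example.

```python
import time
from statistics import mean


def measure_latency(fn, warmup_runs=20, timed_runs=10):
    """Call `fn` untimed `warmup_runs` times, then time `timed_runs` calls."""
    for _ in range(warmup_runs):
        fn()  # warm-up: caches, lazy initialization, etc. are excluded
    latencies = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies, batch_size=1):
    """Mean latency in seconds and samples per second (assumed definition)."""
    mean_latency = mean(latencies)
    throughput = batch_size / mean_latency if mean_latency > 0 else float("inf")
    return {"mean_latency_s": mean_latency, "throughput_per_s": throughput}


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would call the model here.
    latencies = measure_latency(lambda: sum(range(10_000)))
    print(summarize(latencies, batch_size=1))
```

The first calls of a model are often slower than steady state, which is why they are discarded before timing starts.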

Training scenario features 🧰
  • Memory tracking (scenario.memory=true)
  • Energy and efficiency tracking (scenario.energy=true)
  • Latency and throughput tracking (scenario.latency=true)
  • Warm up steps before training (scenario.warmup_steps=20)
  • Dataset shapes control (e.g. scenario.dataset_shapes.sequence_length=128)
  • Training arguments control (e.g. scenario.training_args.per_device_train_batch_size=4)

See TrainingConfig for more information.
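
When driving the CLI from a script, it can be convenient to keep scenario settings in a nested dictionary and flatten them into dotted overrides. The sketch below does that for the training options listed above; the helper to_overrides is hypothetical, and lowercasing booleans simply follows the true/false spelling used in this README.

```python
def to_overrides(config, prefix=""):
    """Flatten a nested dict into a list of `dotted.key=value` strings."""
    overrides = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(to_overrides(value, dotted))  # recurse into sub-sections
        elif isinstance(value, bool):
            overrides.append(f"{dotted}={str(value).lower()}")  # True -> true
        else:
            overrides.append(f"{dotted}={value}")
    return overrides


if __name__ == "__main__":
    training = {
        "scenario": {
            "memory": True,
            "warmup_steps": 20,
            "dataset_shapes": {"sequence_length": 128},
            "training_args": {"per_device_train_batch_size": 4},
        }
    }
    # Append the result to an optimum-benchmark command line.
    print(" ".join(to_overrides(training)))
```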

Backends & Devices 📱

  • PyTorch backend for CPU (backend=pytorch, backend.device=cpu)
  • PyTorch backend for CUDA (backend=pytorch, backend.device=cuda, backend.device_ids=0,1)
  • PyTorch backend for Habana Gaudi Processor (backend=pytorch, backend.device=hpu, backend.device_ids=0,1)
  • OnnxRuntime backend for CPUExecutionProvider (backend=onnxruntime, backend.device=cpu)
  • OnnxRuntime backend for CUDAExecutionProvider (backend=onnxruntime, backend.device=cuda)
  • OnnxRuntime backend for ROCMExecutionProvider (backend=onnxruntime, backend.device=cuda, backend.provider=ROCMExecutionProvider)
  • OnnxRuntime backend for TensorrtExecutionProvider (backend=onnxruntime, backend.device=cuda, backend.provider=TensorrtExecutionProvider)
  • Py-TXI backend for CPU and GPU (backend=py-txi, backend.device=cpu or backend.device=cuda)
  • Neural Compressor backend for CPU (backend=neural-compressor, backend.device=cpu)
  • TensorRT-LLM backend for CUDA (backend=tensorrt-llm, backend.device=cuda)
  • Torch-ORT backend for CUDA (backend=torch-ort, backend.device=cuda)
  • OpenVINO backend for CPU (backend=openvino, backend.device=cpu)
  • OpenVINO backend for GPU (backend=openvino, backend.device=gpu)
  • vLLM backend for CUDA (backend=vllm, backend.device=cuda)
  • vLLM backend for ROCM (backend=vllm, backend.device=rocm)
  • vLLM backend for CPU (backend=vllm, backend.device=cpu)

General backend features 🧰
  • Device selection (backend.device=cuda), can be cpu, cuda, mps, etc.
  • Device ids selection (backend.device_ids=0,1), can be a list of device ids to run the benchmark on multiple devices.
  • Model selection (backend.model=gpt2), can be a model id from the HuggingFace model hub or an absolute path to a model folder.
  • "No weights" feature (backend.no_weights=true), to benchmark models without downloading their weights, using randomly initialized weights instead.

Backend specific features 🧰

For more information on the features of each backend, you can check their respective configuration files.

Contributing 🤝

Contributions are welcome! And we're happy to help you get started. Feel free to open an issue or a pull request. Things that we'd like to see:

  • More backends (TensorFlow, TFLite, JAX, etc).
  • More tests (for optimizations and quantization schemes).
  • More hardware support (Habana Gaudi Processor (HPU), Apple M series, etc).
  • Task evaluators for the most common tasks (would be great for output regression).

To get started, you can check the CONTRIBUTING.md file.
