A package for benchmarking the speed of different PyTorch conversion options

These details have not been verified by PyPI

Project links

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

alma

A Python library for benchmarking PyTorch model speed for different conversion options 🚀

alma aims to make it easy to benchmark models across different conversion options, e.g. eager, tracing, scripting, torch.compile, torch.export, ONNX, TensorRT, etc. The library is designed to be simple to use, with benchmarking provided via a single API call, and to be easily extensible, so that new conversion options can be added.

Beyond just benchmarking, alma is designed to be a one-stop-shop for all model conversion options, so that one can learn about the different conversion options, how to implement them, and how they affect model speed and performance.

Getting Started
- Installation
- Docker
Basic Usage
Examples
Conversion Options
Future Work
How to Contribute

Getting Started

Installation

alma is available as a Python package.

One can install the package from the Python Package Index (PyPI) by running:
```
pip install alma-torch
```
Alternatively, it can be installed from the root of this repository (the same level as this README) by running:
```
pip install -e .
```
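To confirm the install, one can query the installed distribution's version with the standard library's importlib.metadata. This is a minimal sketch; the helper name installed_version is hypothetical and not part of alma. Note that the distribution is named alma-torch on PyPI, while the import name used in the examples is alma.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "alma-torch") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "alma-torch is not installed")
```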

Docker

We recommend building the provided Dockerfile, which gives an easy installation of all the system dependencies and the alma pip package.

Build the Docker Image
```
bash scripts/build_docker.sh
```
Run the Docker Container
Create and start a container named alma:
```
bash scripts/run_docker.sh
```
Access the Running Container
Enter the container's shell:
```
docker exec -it alma bash
```
Mount Your Repository
By default, the run_docker.sh script mounts your /home directory to /home inside the container.
If your alma repository is in a different location, update the bind mount, for example:
```
-v /Users/myuser/alma:/home/alma
```

Basic usage

The core API is benchmark_model, which benchmarks the speed of a model for different conversion options. The usage is as follows:

```
import torch

from alma import benchmark_model
from alma.benchmark.log import display_all_results

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# Load the model and move it to the benchmarking device
model = ...
model = model.to(device)

# Load the data loader used in benchmarking
data_loader = ...

# Set the benchmarking configuration
config = {
    "batch_size": 128,
    "n_samples": 4096,
}

# Choose which conversions to benchmark (names from the Options Summary table)
conversions = ["EAGER", "EXPORT+EAGER"]

# Benchmark the model
results = benchmark_model(model, config, conversions, data_loader=data_loader)

# Print all results
display_all_results(results)
```

The results will look something like this, depending on one's model, data loader, and hardware:

```
EAGER results:
device: cuda:0
Total elapsed time: 0.4148 seconds
Total inference time (model only): 0.0436 seconds
Total samples: 5000
Throughput: 12054.50 samples/second

EXPORT+EAGER results:
device: cuda:0
Total elapsed time: 0.3906 seconds
Total inference time (model only): 0.0394 seconds
Total samples: 5000
Throughput: 12800.82 samples/second
```
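The reported throughput appears to be total samples divided by total elapsed time, not model-only time: 5000 / 0.3906 s ≈ 12800.82 samples/s for EXPORT+EAGER, and 5000 / 0.4148 s ≈ 12054.00 for EAGER, close to the printed 12054.50 once the four-decimal rounding of the elapsed time is taken into account. The sketch below is not alma's implementation; it is a stdlib-only illustration, under that assumption, of how elapsed time, model-only time and throughput could be measured. The names benchmark_forward and throughput and the dict keys are hypothetical, and a toy callable stands in for a PyTorch model. A real GPU benchmark would also need torch.cuda.synchronize() before reading the timer, since CUDA kernels launch asynchronously.

```python
import time
from typing import Callable, Iterable, Sequence


def throughput(total_samples: int, elapsed_seconds: float) -> float:
    """Samples per second over the whole benchmark run."""
    return total_samples / elapsed_seconds if elapsed_seconds > 0 else float("inf")


def benchmark_forward(
    forward: Callable[[Sequence[float]], object],
    batches: Iterable[Sequence[float]],
) -> dict:
    """Time a forward call over batches (illustrative sketch, not alma's code)."""
    total_samples = 0
    inference_time = 0.0
    start = time.perf_counter()
    # Fetching each batch from the iterable counts toward elapsed time only.
    for batch in batches:
        t0 = time.perf_counter()
        forward(batch)  # only this call counts toward "model only" time
        inference_time += time.perf_counter() - t0
        total_samples += len(batch)
    elapsed = time.perf_counter() - start
    return {
        "total_elapsed_time": elapsed,
        "total_inference_time": inference_time,
        "total_samples": total_samples,
        # Assumed: throughput is taken over the whole loop, not just model calls.
        "throughput": throughput(total_samples, elapsed),
    }


if __name__ == "__main__":
    # Cross-check against the EXPORT+EAGER example output.
    print(f"{throughput(5000, 0.3906):.2f}")  # 12800.82

    # Toy stand-in for a model: doubles every value in the batch.
    data = [float(i) for i in range(4096)]
    batches = (data[i : i + 128] for i in range(0, len(data), 128))
    print(benchmark_forward(lambda b: [2 * x for x in b], batches))
```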

Examples:

For extensive examples of how to use alma, as well as simple, clean examples of how to train a model and quantize it, see the MNIST example directory. These more advanced use cases include:

- Feeding in a single tensor rather than a data loader, in which case benchmark_model implicitly initialises an internal data loader from that tensor.
- Using argparse for easy control and experimentation, including selecting conversion methods by numerical index.
- Error handling. If a conversion method fails, alma fails gracefully for that method, and one can access the error message and traceback from the returned object.
- Debugging and logging. Many of the conversion methods log very verbosely, so we have opted to mostly silence those logs. To see them, use the setup_logging function and set the logging level to DEBUG.
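Two of these patterns, selecting conversions by numeric ID from the command line and recording per-conversion failures instead of aborting, can be sketched with the standard library alone. This is a hypothetical illustration, not the MNIST example's code or alma's returned object: the --conversions flag, the CONVERSIONS subset (taken from the Options Summary table), and the run_all helper with its "status"/"error"/"traceback" keys are all assumptions.

```python
import argparse
import traceback
from typing import Any, Callable, Dict, List, Optional

# Hypothetical subset of the ID -> name mapping in the Options Summary table.
CONVERSIONS = {0: "EAGER", 1: "EXPORT+EAGER", 21: "EXPORT+AOT_INDUCTOR", 32: "TENSORRT"}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Pick conversions by numeric ID.")
    parser.add_argument(
        "--conversions",
        type=int,
        nargs="+",
        default=[0],
        choices=sorted(CONVERSIONS),
        help="IDs of the conversion options to benchmark",
    )
    return parser.parse_args(argv)


def run_all(names: List[str], run_one: Callable[[str], Any]) -> Dict[str, dict]:
    """Run every conversion, recording failures instead of aborting."""
    results: Dict[str, dict] = {}
    for name in names:
        try:
            results[name] = {"status": "success", "result": run_one(name)}
        except Exception as exc:  # keep going; store the error for later inspection
            results[name] = {
                "status": "error",
                "error": str(exc),
                "traceback": traceback.format_exc(),
            }
    return results


if __name__ == "__main__":
    args = parse_args(["--conversions", "0", "32"])
    names = [CONVERSIONS[i] for i in args.conversions]

    def fake_run(name: str) -> str:
        # Simulated failure, e.g. a backend that is not installed.
        if name == "TENSORRT":
            raise RuntimeError("backend unavailable")
        return "ok"

    for name, outcome in run_all(names, fake_run).items():
        print(name, outcome["status"], outcome.get("error", ""))
```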

For a short working example on a simple Linear+ReLU, see the linear example.

Conversion Options

Naming conventions

The naming convention for conversion options is to use short but descriptive names, e.g. EAGER, EXPORT+EAGER, EXPORT+TENSORRT, etc. If a single conversion option uses multiple "techniques", their names are separated by a + sign in the order in which they are applied. Underscores (_) separate the words within each technique name for readability, e.g. EXPORT+AOT_INDUCTOR, where EXPORT and AOT_INDUCTOR are considered separate steps.
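Because of this convention, an option name can be split mechanically into its ordered steps. A minimal sketch; conversion_steps and technique_words are hypothetical helpers, not functions provided by alma.

```python
def conversion_steps(name: str) -> list[str]:
    """Split an option name into its techniques, in order of application."""
    return name.split("+")


def technique_words(technique: str) -> list[str]:
    """Split a single technique name into its words."""
    return technique.split("_")


if __name__ == "__main__":
    steps = conversion_steps("EXPORT+AI8WI8_STATIC_QUANTIZED+AOT_INDUCTOR")
    print(steps)  # ['EXPORT', 'AI8WI8_STATIC_QUANTIZED', 'AOT_INDUCTOR']
    print(technique_words(steps[-1]))  # ['AOT', 'INDUCTOR']
```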

Code

All conversion options are located in the src/alma/conversions/ directory. In this directory:

- The options/ subdirectory contains one Python file per conversion option (or per closely related family of options, e.g. torch.compile backends).
- The main selection logic for these options lives in select.py. This is just a glorified match-case statement that returns the forward call of the selected conversion option to the benchmarking loop. It is that simple!

At the risk of some code duplication, we have chosen to keep the conversion options separate, so that one can easily add new conversion options without having to modify the existing ones. It also makes it easier for the user to see what conversion options are available, and to understand what each conversion option does.

Options Summary

Below is a table summarizing the currently supported conversion options and their identifiers:

ID	Conversion Option
0	EAGER
1	EXPORT+EAGER
2	ONNX_CPU
3	ONNX_GPU
4	ONNX+DYNAMO_EXPORT
5	COMPILE_CUDAGRAPH
6	COMPILE_INDUCTOR_DEFAULT
7	COMPILE_INDUCTOR_REDUCE_OVERHEAD
8	COMPILE_INDUCTOR_MAX_AUTOTUNE
9	COMPILE_INDUCTOR_EAGER_FALLBACK
10	COMPILE_ONNXRT
11	COMPILE_OPENXLA
12	COMPILE_TVM
13	EXPORT+AI8WI8_FLOAT_QUANTIZED
14	EXPORT+AI8WI8_FLOAT_QUANTIZED+AOT_INDUCTOR
15	EXPORT+AI8WI8_FLOAT_QUANTIZED+RUN_DECOMPOSITION
16	EXPORT+AI8WI8_FLOAT_QUANTIZED+RUN_DECOMPOSITION+AOT_INDUCTOR
17	EXPORT+AI8WI8_STATIC_QUANTIZED
18	EXPORT+AI8WI8_STATIC_QUANTIZED+AOT_INDUCTOR
19	EXPORT+AI8WI8_STATIC_QUANTIZED+RUN_DECOMPOSITION
20	EXPORT+AI8WI8_STATIC_QUANTIZED+RUN_DECOMPOSITION+AOT_INDUCTOR
21	EXPORT+AOT_INDUCTOR
22	EXPORT+COMPILE_CUDAGRAPH
23	EXPORT+COMPILE_INDUCTOR_DEFAULT
24	EXPORT+COMPILE_INDUCTOR_REDUCE_OVERHE
25	EXPORT+COMPILE_INDUCTOR_MAX_AUTOTUNE
26	EXPORT+COMPILE_INDUCTOR_EAGER_FALLBACK
27	EXPORT+COMPILE_ONNXRT
28	EXPORT+COMPILE_OPENXLA
29	EXPORT+COMPILE_TVM
30	NATIVE_CONVERT_AI8WI8_STATIC_QUANTIZED
31	NATIVE_FAKE_QUANTIZED_AI8WI8_STATIC
32	TENSORRT

These conversion options are all hard-coded in src/alma/conversions/select.py, which is the source of truth.

Future work:

- Add more conversion options. This is a work in progress, and we are always looking for more conversion options.
- Multi-device benchmarking. Currently alma only supports single-device benchmarking, but ideally a model could be split across multiple devices.
- Integrating conversion options beyond PyTorch, e.g. HuggingFace, JAX, llama.cpp, etc.

How to contribute:

Contributions are welcome! If you have a new conversion option, feature, or other improvement that you would like to add so that the whole community can benefit, please open a pull request! We are always looking for new conversion options, and we are happy to help you get started with adding a new conversion option or feature!

See the CONTRIBUTING.md file for more detailed information on how to contribute.

Citation

```
@Misc{alma,
  title =        {Alma: One-stop-shop for PyTorch model speed benchmarking for all conversion types.},
  author =       {Oscar Savolainen and Saif Haq},
  howpublished = {\url{https://github.com/saifhaq/alma}},
  year =         {2024}
}
```


Release history

- 0.3.7: Jan 21, 2025
- 0.3.6: Jan 5, 2025
- 0.3.5: Jan 4, 2025
- 0.3.4: Jan 4, 2025
- 0.3.3: Jan 4, 2025
- 0.3.2: Jan 4, 2025
- 0.3.1: Jan 4, 2025
- 0.3.0: Dec 23, 2024
- 0.2.9: Dec 21, 2024
- 0.2.8: Dec 20, 2024
- 0.2.7: Dec 19, 2024
- 0.2.6: Dec 19, 2024
- 0.2.5: Dec 18, 2024
- 0.2.4: Dec 18, 2024
- 0.2.1: Dec 15, 2024
- 0.2.0: Dec 14, 2024
- 0.1.33: Dec 14, 2024
- 0.1.32: Dec 12, 2024
- 0.1.31: Dec 12, 2024
- 0.1.30 (this version): Dec 12, 2024
- 0.1.29: Dec 12, 2024
- 0.1.28: Dec 12, 2024
- 0.1.27: Dec 12, 2024
- 0.1.26: Dec 10, 2024
- 0.1.25: Dec 10, 2024
- 0.1.24: Dec 9, 2024
- 0.1.23: Dec 9, 2024
- 0.1.22: Dec 9, 2024
- 0.1.21: Dec 9, 2024
- 0.1.20: Dec 9, 2024
- 0.1.19: Dec 9, 2024
- 0.1.18: Dec 9, 2024
- 0.1.17: Dec 7, 2024
- 0.0.0: Dec 18, 2024

Download files


Source Distribution

alma_torch-0.1.30.tar.gz (33.2 kB)

Uploaded Dec 12, 2024 Source

Built Distribution


alma_torch-0.1.30-py3-none-any.whl (42.9 kB)

Uploaded Dec 12, 2024 Python 3

File details

Details for the file alma_torch-0.1.30.tar.gz.

File metadata

Download URL: alma_torch-0.1.30.tar.gz
Upload date: Dec 12, 2024
Size: 33.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for alma_torch-0.1.30.tar.gz
Algorithm	Hash digest
SHA256	`76ccf1f04ad9eea9eb17a3391fd70352a6328ebc1a4966450a38d96b789850e8`
MD5	`31a52d209ad19611f2ea21e444315b3d`
BLAKE2b-256	`e698c2a13fa21b42c22fad245aaa149dd75e24bf8e0f9f1de12bd08ce6ca9631`


File details

Details for the file alma_torch-0.1.30-py3-none-any.whl.

File metadata

Download URL: alma_torch-0.1.30-py3-none-any.whl
Upload date: Dec 12, 2024
Size: 42.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for alma_torch-0.1.30-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8eddb71b7736798e5e9a50b21f675acdf24e54d45700f8453cb2e5713a55eabd`
MD5	`5a0e0babd0a9a5bd8790125673c1a48c`
BLAKE2b-256	`234f1e0c6723980e1c88cbd9bfba1fa90aa8f0ec2839896b8d8d4c8622e03a2d`
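To check a downloaded file against the SHA256 digests listed above, one can hash it with Python's hashlib. A minimal sketch, assuming the files sit in the current directory; sha256_of, EXPECTED and verify are hypothetical helpers. Alternatively, pip can enforce digests itself in hash-checking mode, e.g. a requirements line such as `alma-torch==0.1.30 --hash=sha256:<digest>` installed with `--require-hashes`.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
EXPECTED = {
    "alma_torch-0.1.30.tar.gz": "76ccf1f04ad9eea9eb17a3391fd70352a6328ebc1a4966450a38d96b789850e8",
    "alma_torch-0.1.30-py3-none-any.whl": "8eddb71b7736798e5e9a50b21f675acdf24e54d45700f8453cb2e5713a55eabd",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True if the file's SHA256 matches the published digest (KeyError if unknown)."""
    return sha256_of(path) == EXPECTED[path.name]


if __name__ == "__main__":
    for name in EXPECTED:
        p = Path(name)
        if p.exists():
            print(name, "OK" if verify(p) else "MISMATCH")
```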


alma-torch 0.1.30
