Efficient GPU Kernels in Triton for Quantized Vision Transformers
QAttn
Welcome to the QAttn documentation! QAttn (pronounced like katana) is a Python-only framework with GPU kernels implemented in Triton for quantized vision transformers. It implements integer and mixed-precision kernels for the operations within vision transformers (currently matrix multiplication and attention), and supports both static and dynamic quantization.
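To make the distinction between static and dynamic quantization concrete, the following is a minimal plain-Python sketch of a symmetric int8 dot product. It is an illustration only and does not use QAttn's API or kernels: the helper names (`compute_scale`, `quantize`, `int_dot`) and the calibration range of 6.0 are assumptions made for this example. In the static case the activation scale is fixed ahead of time; in the dynamic case it is computed from the input at run time.

```python
# Illustrative sketch only: plain-Python int8 quantization, not QAttn's API.

def compute_scale(values, qmax=127):
    """Symmetric per-tensor scale that maps the largest magnitude to qmax."""
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, qmin=-128, qmax=127):
    """Round to the nearest integer step and clamp to the int8 range."""
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def int_dot(xq, wq, x_scale, w_scale):
    """Accumulate in integers, then rescale once back to floating point."""
    acc = sum(a * b for a, b in zip(xq, wq))
    return acc * x_scale * w_scale


weights = [0.5, -1.0, 0.25]
activations = [2.0, 1.0, -4.0]
reference = sum(a * w for a, w in zip(activations, weights))

# Weights are known in advance, so their scale is always computed offline.
w_scale = compute_scale(weights)
wq = quantize(weights, w_scale)

# Static: the activation scale comes from a calibration range chosen
# beforehand (6.0 is a hypothetical value for this example).
static_x_scale = 6.0 / 127
static_out = int_dot(quantize(activations, static_x_scale), wq,
                     static_x_scale, w_scale)

# Dynamic: the activation scale is derived from the actual input.
dynamic_x_scale = compute_scale(activations)
dynamic_out = int_dot(quantize(activations, dynamic_x_scale), wq,
                      dynamic_x_scale, w_scale)

print(reference, static_out, dynamic_out)
```

Static quantization avoids the run-time reduction needed to find the scale, while dynamic quantization adapts the scale to each input at the cost of that extra pass.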
Installation
To install the package, run
pip install qattn
or install from source to get the latest, bleeding-edge version.
pip install git+https://github.com/ibm/qattn.git
This package depends on Triton, which requires an NVIDIA GPU (preferably Ampere or newer), and is tested only on Linux.
To modify the source code, clone the repository locally and install it in editable mode.
git clone https://github.com/ibm/qattn.git
cd qattn
pip install -e .
Usage
The Examples section presents usage samples for static and dynamic quantization with QAttn. QAttn is designed to be compatible with PyTorch FX quantization, which dynamically replaces the floating-point modules in a model's graph with quantized ones. The downside of this approach is that control-flow statements cannot be captured in the graph.
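The control-flow limitation follows from how symbolic tracing works: the model is run once with placeholder values that record operations instead of computing them, so a branch on such a value has nothing concrete to test. The toy tracer below is a sketch of that idea in plain Python. It is not torch.fx or QAttn code, and the `Proxy`, `TraceError`, and `trace` names here are defined locally for illustration; torch.fx behaves similarly in that it raises an error when a traced value is used as a branch condition.

```python
# Toy symbolic tracer (illustrative only; not torch.fx or QAttn code).

class TraceError(Exception):
    """Raised when tracing needs a concrete value it does not have."""


class Proxy:
    """Stands in for a tensor and records the operations applied to it."""

    def __init__(self, name, graph):
        self.name = name
        self.graph = graph

    def _record(self, op, other):
        other_name = other.name if isinstance(other, Proxy) else repr(other)
        out = Proxy(f"v{len(self.graph)}", self.graph)
        self.graph.append((out.name, op, self.name, other_name))
        return out

    def __mul__(self, other):
        return self._record("mul", other)

    def __add__(self, other):
        return self._record("add", other)

    def __gt__(self, other):
        return self._record("gt", other)

    def __bool__(self):
        # The value is unknown at trace time, so a branch cannot be resolved.
        raise TraceError("proxy has no concrete value to branch on")


def trace(fn):
    """Run fn once on a proxy input and return the recorded operations."""
    graph = []
    fn(Proxy("x", graph))
    return graph


def straight_line(x):
    return x * 2 + 1


def data_dependent(x):
    if x > 0:  # the branch depends on the input's value
        return x * 2
    return x + 1


graph = trace(straight_line)  # every operation is captured

try:
    trace(data_dependent)
    branch_traced = True
except TraceError:
    branch_traced = False
```

In practice this means that models with input-dependent branches typically need to be restructured, or have the branching part excluded from tracing, before their modules can be swapped for quantized ones.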
Future direction
In the future, we will support the remaining basic Vision Transformer operations (GELU, LayerNorm, Add, etc.) to enable fully quantized models. Next, we will move to the PyTorch 2.0 TorchDynamo graph capture to enable integration with torch.compile.
Citation
If you use this project in your research paper or thesis, we would appreciate it if you used the following citation:
@InProceedings{Kluska_2024_CVPR,
author = {Kluska, Piotr and Castell\'o, Adri\'an and Scheidegger, Florian and Malossi, A. Cristiano I. and Quintana-Ort{\'\i}, Enrique S.},
title = {QAttn: Efficient GPU Kernels for Mixed-precision Vision Transformers},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2024},
pages = {3648-3657}
}
Acknowledgments
This work is conducted within the APROPOS project. This project has received funding from the European Union’s Horizon 2020 (H2020) Marie Sklodowska-Curie Innovative Training Networks H2020-MSCA-ITN-2020 call, under Grant Agreement No. 956090. Project link: https://apropos-project.eu/