
Efficient GPU Kernels in Triton for Quantized Vision Transformers


QAttn

Welcome to the QAttn documentation! QAttn (pronounced like "katana") is a Python-only framework with GPU kernels implemented in Triton for quantized vision transformers. It provides integer and mixed-precision kernels for vision transformer operations (currently matrix multiplication and attention) under both static and dynamic quantization.
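To make the static/dynamic distinction concrete, here is a minimal NumPy sketch of symmetric per-tensor int8 quantization followed by an integer matrix multiplication with int32 accumulation and a floating-point rescale. It only illustrates the arithmetic such kernels perform; it is not QAttn's Triton code, and the helper names (`scale_from_absmax`, `quantize`, `int8_matmul`) and the per-tensor absmax scaling scheme are assumptions chosen for illustration.

```python
import numpy as np

QMAX = 127  # symmetric signed int8 range [-127, 127] (assumed scheme)


def scale_from_absmax(absmax):
    """Per-tensor scale mapping the largest magnitude onto QMAX."""
    return max(float(absmax), 1e-8) / QMAX


def quantize(x, scale):
    """Round to the nearest integer step and clip to the int8 range."""
    return np.clip(np.round(x / scale), -QMAX, QMAX).astype(np.int8)


def int8_matmul(a_q, a_scale, b_q, b_scale):
    """Integer matmul with int32 accumulation, then rescale back to float."""
    acc = a_q.astype(np.int32) @ b_q.astype(np.int32)
    return acc.astype(np.float32) * (a_scale * b_scale)


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)
w_scale = scale_from_absmax(np.abs(w).max())
w_q = quantize(w, w_scale)  # weights are quantized once, ahead of time

# Static quantization: the activation scale is fixed from calibration batches.
calib = [rng.standard_normal((8, 64)).astype(np.float32) for _ in range(4)]
static_scale = scale_from_absmax(max(np.abs(c).max() for c in calib))

x = rng.standard_normal((8, 64)).astype(np.float32)
y_static = int8_matmul(quantize(x, static_scale), static_scale, w_q, w_scale)

# Dynamic quantization: the activation scale is computed from x at run time.
dyn_scale = scale_from_absmax(np.abs(x).max())
y_dynamic = int8_matmul(quantize(x, dyn_scale), dyn_scale, w_q, w_scale)

y_ref = x @ w  # floating-point reference
print("max abs error (static): ", np.abs(y_static - y_ref).max())
print("max abs error (dynamic):", np.abs(y_dynamic - y_ref).max())
```

In the static case the activation scale is fixed in advance, so inputs larger than anything seen during calibration are clipped; the dynamic case pays for an extra reduction over the input at run time but always fits the current tensor.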

Installation

To install the package, run

pip install qattn

Alternatively, install directly from the repository to get the latest (bleeding-edge) version:

pip install git+https://github.com/ibm/qattn.git

This package depends on Triton, which requires an NVIDIA GPU (preferably Ampere or newer), and it has been tested only on Linux.
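A small pre-flight check for the requirements above might look like the following sketch. It assumes PyTorch is installed for querying the GPU (`torch.cuda.get_device_capability()` returns the compute capability as a `(major, minor)` tuple) and treats compute capability 8.0, the first Ampere generation, as the threshold. `check_environment` and `query_capability` are hypothetical helpers, not part of QAttn.

```python
import platform
from typing import List, Optional, Tuple

AMPERE = (8, 0)  # compute capability of the first Ampere GPUs (e.g., A100)


def query_capability() -> Optional[Tuple[int, int]]:
    """Return the (major, minor) compute capability of GPU 0, or None if unavailable."""
    try:
        import torch  # assumption: PyTorch is installed alongside Triton
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


def check_environment(capability: Optional[Tuple[int, int]], system: str) -> List[str]:
    """Collect warnings about the GPU and OS requirements stated above."""
    warnings = []
    if capability is None:
        warnings.append("No CUDA-capable NVIDIA GPU detected; Triton GPU kernels need one.")
    elif tuple(capability) < AMPERE:
        major, minor = capability
        warnings.append(f"GPU compute capability {major}.{minor} is older than Ampere (8.0).")
    if system != "Linux":
        warnings.append(f"QAttn is only tested on Linux; detected {system}.")
    return warnings


if __name__ == "__main__":
    messages = check_environment(query_capability(), platform.system())
    for msg in messages or ["Environment looks OK."]:
        print(msg)
```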

To modify the source code, clone the repository locally and install it in editable mode:

git clone https://github.com/ibm/qattn.git
cd qattn
pip install -e .

Usage

The Examples section presents usage samples for static and dynamic quantization with QAttn. QAttn is designed to be compatible with PyTorch FX quantization: it dynamically replaces the floating-point modules in a model's graph with quantized ones. The downside is that FX graph capture cannot record control-flow statements (for example, data-dependent if branches) in the graph.
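The module-replacement idea and its control-flow limitation can be sketched with plain `torch.fx`. This is not QAttn's API: `FakeQuantLinear` and `swap_linear_modules` are hypothetical names, and the "quantized" module merely simulates dynamic int8 input quantization in floating point. The sketch traces a model, replaces every `nn.Linear` call site in the traced `GraphModule`, and then shows that a model with a data-dependent branch fails to trace.

```python
import torch
import torch.nn as nn
from torch.fx import symbolic_trace
from torch.fx.proxy import TraceError


class FakeQuantLinear(nn.Module):
    """Hypothetical stand-in for a quantized module (not part of QAttn)."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear

    def forward(self, x):
        # Simulate dynamic symmetric int8 quantization of the input.
        scale = x.abs().max().clamp(min=1e-8) / 127.0
        x_q = torch.clamp(torch.round(x / scale), -127, 127)
        return self.linear(x_q * scale)


def swap_linear_modules(model: nn.Module) -> torch.fx.GraphModule:
    """Trace `model` with FX and replace every nn.Linear call site."""
    gm = symbolic_trace(model)
    modules = dict(gm.named_modules())
    for node in gm.graph.nodes:
        if node.op == "call_module" and isinstance(modules[node.target], nn.Linear):
            parent_name, _, attr = node.target.rpartition(".")
            parent = gm.get_submodule(parent_name) if parent_name else gm
            setattr(parent, attr, FakeQuantLinear(modules[node.target]))
    return gm


class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(32, 16)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))


class GatedMLP(MLP):
    def forward(self, x):
        # Data-dependent branch: FX cannot record which path is taken.
        if x.mean() > 0:
            return self.fc2(self.act(self.fc1(x)))
        return x


torch.manual_seed(0)
model = MLP().eval()
x = torch.randn(4, 16)
q_model = swap_linear_modules(model)  # the original `model` is left unchanged
print(type(q_model.fc1).__name__)  # FakeQuantLinear
print("max abs difference:", (q_model(x) - model(x)).abs().max().item())

gated_error = None
try:
    swap_linear_modules(GatedMLP())
except TraceError as err:
    gated_error = err
    print("FX tracing failed:", err)
```

Swapping modules on the traced `GraphModule` rather than on the original model keeps the floating-point model available as a reference for accuracy comparisons.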

Future direction

In the future, we will support the remaining basic vision transformer operations (GELU, LayerNorm, Add, etc.) for fully quantized models. Next, we will move to PyTorch 2.0 TorchDynamo graph capture to enable integration with torch.compile.

Citation

If you use this project in your research paper or thesis, we would appreciate it if you cited it as follows:

@InProceedings{Kluska_2024_CVPR,
    author    = {Kluska, Piotr and Castell\'o, Adri\'an and Scheidegger, Florian and Malossi, A. Cristiano I. and Quintana-Ort{\'\i}, Enrique S.},
    title     = {QAttn: Efficient GPU Kernels for Mixed-precision Vision Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {3648-3657}
}

Acknowledgments

This work was conducted within the APROPOS project, which has received funding from the European Union’s Horizon 2020 (H2020) Marie Sklodowska-Curie Innovative Training Networks call H2020-MSCA-ITN-2020 under Grant Agreement no. 956090. Project link: https://apropos-project.eu/



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

qattn-0.1.1-py3-none-any.whl (35.4 kB)


File details

Details for the file qattn-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: qattn-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 35.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.13

File hashes

Hashes for qattn-0.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        0f3d48152182c8e51d4dc53ae6b0048df0b3bec023acd3cbe477c1ae4ec42332
MD5           7dee8c288931d82ffae9985310581322
BLAKE2b-256   77bc51e4176b1b6d67a221acc8a3245e481d8ba2d604f3fed6e3946e19851f8b
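To check a downloaded wheel against the SHA256 digest listed for it, a minimal sketch using only the Python standard library follows; `file_digest_hex` and `verify` are illustrative names, and the wheel path is an assumed download location. pip can enforce the same check in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from pathlib import Path

# SHA256 digest published for qattn-0.1.1-py3-none-any.whl.
EXPECTED_SHA256 = "0f3d48152182c8e51d4dc53ae6b0048df0b3bec023acd3cbe477c1ae4ec42332"


def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex, algorithm="sha256"):
    """Return True if the file's digest matches the expected hex string."""
    return file_digest_hex(path, algorithm) == expected_hex.strip().lower()


if __name__ == "__main__":
    wheel = Path("qattn-0.1.1-py3-none-any.whl")  # assumed download location
    if wheel.exists():
        print("SHA256 OK" if verify(wheel, EXPECTED_SHA256) else "SHA256 MISMATCH")
    else:
        print(f"{wheel} not found")
```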

