AIMET torch Package

Project description

AI Model Efficiency Toolkit (AIMET) for torch

[GitHub] [Docs] [Forums]

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models. Its features have been proven to improve the run-time performance of deep learning models, with lower compute and memory requirements and minimal impact on task accuracy. AIMET is designed to work with PyTorch, TensorFlow, and ONNX models.

Requirements

This package supports the following environment:

  • 64-bit Intel x86-compatible processor
  • Linux Ubuntu 22.04 LTS with Python 3.10, or Ubuntu 20.04 LTS with Python 3.8
  • torch 2.2.2

Why AIMET?

  • Supports advanced quantization techniques: Inference using integer runtimes is significantly faster than inference using floating-point runtimes. For example, models run 5x-15x faster on the Qualcomm Hexagon DSP than on the Qualcomm Kryo CPU. In addition, 8-bit precision models have a 4x smaller footprint than 32-bit precision models. However, maintaining model accuracy when quantizing ML models is often challenging. AIMET addresses this using novel techniques such as Data-Free Quantization, which provides state-of-the-art INT8 results on several popular models.
  • Supports advanced model compression techniques that enable models to run faster at inference time and require less memory
  • AIMET is designed to automate the optimization of neural networks, avoiding time-consuming and tedious manual tweaking. It also provides user-friendly APIs that let users make calls directly from their PyTorch, TensorFlow, or ONNX pipelines (see the sketch below).
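
As an illustration of the last point, here is a minimal sketch of applying AIMET's Cross-Layer Equalization from an ordinary PyTorch workflow. It assumes the aimet_torch 1.x API; the torchvision MobileNetV2 and the 224x224 input shape are stand-ins for your own model and input.

    import torch
    from torchvision.models import mobilenet_v2          # stand-in model for illustration

    from aimet_torch.cross_layer_equalization import equalize_model

    model = mobilenet_v2(pretrained=True).eval()

    # Equalize weight ranges across consecutive layers in place; the input shape
    # is only used to trace the model graph.
    equalize_model(model, input_shapes=(1, 3, 224, 224))

    # The model is still a regular torch.nn.Module and can go straight back into
    # an existing evaluation or export pipeline.
    with torch.no_grad():
        _ = model(torch.randn(1, 3, 224, 224))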

Please visit the AIMET GitHub Pages site for more details.

Supported Features

Quantization

  • Cross-Layer Equalization: Equalize weight tensors to reduce amplitude variation across channels
  • Bias Correction: Corrects shift in layer outputs introduced due to quantization
  • Adaptive Rounding: Learn the optimal rounding given unlabelled data
  • Quantization Simulation: Simulate on-target quantized inference accuracy (see the sketch after this list)
  • Quantization-aware Training: Use quantization simulation to train the model further to improve accuracy
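
The quantization simulation and QAT features above are typically driven through QuantizationSimModel. The sketch below assumes the aimet_torch 1.x API; model, calibration_loader, and evaluate are placeholders from your own pipeline.

    import torch
    from aimet_common.defs import QuantScheme
    from aimet_torch.quantsim import QuantizationSimModel

    dummy_input = torch.randn(1, 3, 224, 224)             # shape of your model's input

    # Wrap the trained FP32 model with fake-quantization ops (8-bit weights/activations).
    sim = QuantizationSimModel(model,
                               dummy_input=dummy_input,
                               quant_scheme=QuantScheme.post_training_tf_enhanced,
                               default_param_bw=8,
                               default_output_bw=8)

    def pass_calibration_data(sim_model, _args):
        # Run a few hundred unlabelled samples through the model so AIMET can
        # observe activation ranges; calibration_loader is your own data loader.
        sim_model.eval()
        with torch.no_grad():
            for images, _ in calibration_loader:
                sim_model(images)

    # Compute quantization encodings, then evaluate sim.model exactly like the
    # FP32 model to estimate on-target INT8 accuracy.
    sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                          forward_pass_callback_args=None)
    int8_accuracy = evaluate(sim.model)                   # your own eval function

    # Optionally export the model plus encodings for a target runtime.
    sim.export(path='./output', filename_prefix='model_int8', dummy_input=dummy_input)

For quantization-aware training, sim.model can simply be fine-tuned with the regular training loop, as shown later for the recurrent-model case.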

Model Compression

  • Spatial SVD: Tensor decomposition technique to split a large layer into two smaller ones
  • Channel Pruning: Removes redundant input channels from a layer and reconstructs layer weights
  • Per-layer compression-ratio selection: Automatically selects how much to compress each layer in the model (see the sketch after this list)
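
A sketch of the compression flow, assuming the aimet_torch 1.x ModelCompressor API. Spatial SVD is run in auto mode so that AIMET greedily selects a per-layer compression ratio; model and evaluate are placeholders from your own pipeline.

    from decimal import Decimal

    from aimet_common.defs import CompressionScheme, CostMetric, GreedySelectionParameters
    from aimet_torch.defs import SpatialSvdParameters
    from aimet_torch.compress import ModelCompressor

    def eval_callback(model, iterations, use_cuda):
        # Placeholder: return validation accuracy for `model`, using at most
        # `iterations` batches when iterations is not None.
        return evaluate(model, iterations, use_cuda)

    # Target a 50% compression ratio; AIMET picks a ratio per layer.
    greedy_params = GreedySelectionParameters(target_comp_ratio=Decimal(0.5),
                                              num_comp_ratio_candidates=10)
    auto_params = SpatialSvdParameters.AutoModeParams(greedy_select_params=greedy_params)
    svd_params = SpatialSvdParameters(mode=SpatialSvdParameters.Mode.auto,
                                      params=auto_params)

    compressed_model, stats = ModelCompressor.compress_model(
        model=model,
        eval_callback=eval_callback,
        eval_iterations=10,
        input_shape=(1, 3, 224, 224),
        compress_scheme=CompressionScheme.spatial_svd,
        cost_metric=CostMetric.mac,
        parameters=svd_params)

    print(stats)   # per-layer compression ratios and estimated accuracy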

Visualization

  • Weight ranges: Visually inspect whether a model is a good candidate for Cross-Layer Equalization, and the effect of applying the technique (see the sketch after this list)
  • Per-layer compression sensitivity: Get visual feedback on how sensitive each layer in the model is to compression
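
The weight-range inspection above is exposed through AIMET's visualization utilities. The sketch below assumes the aimet_torch 1.x visualize_model module, with a torchvision ResNet-18 as a stand-in model; function names may differ slightly across versions.

    from torchvision.models import resnet18               # stand-in model for illustration
    from aimet_torch import visualize_model

    model = resnet18(pretrained=True).eval()

    # Writes interactive plots of per-channel weight ranges to results_dir,
    # which helps judge whether Cross-Layer Equalization is likely to help.
    visualize_model.visualize_weight_ranges(model, results_dir='./visualization')
    visualize_model.visualize_relative_weight_ranges_to_identify_problematic_layers(
        model, results_dir='./visualization')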

What's New

Some recently added features include

  • Adaptive Rounding (AdaRound): Learn the optimal rounding given unlabelled data (see the sketch after this list)
  • Quantization-aware Training (QAT) for recurrent models (RNNs, LSTMs, and GRUs)
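
A sketch of the AdaRound flow, assuming the aimet_torch 1.x AdaRound API; model and calibration_loader are placeholders from your own pipeline.

    import torch
    from aimet_common.defs import QuantScheme
    from aimet_torch.adaround.adaround_weight import Adaround, AdaroundParameters

    dummy_input = torch.randn(1, 3, 224, 224)

    # AdaRound learns a per-weight rounding decision from a small amount of
    # unlabelled data instead of rounding to nearest.
    params = AdaroundParameters(data_loader=calibration_loader,   # your own loader
                                num_batches=4,
                                default_num_iterations=10000)

    ada_model = Adaround.apply_adaround(model, dummy_input, params,
                                        path='./output',
                                        filename_prefix='adaround',
                                        default_param_bw=8,
                                        default_quant_scheme=QuantScheme.post_training_tf_enhanced)

    # The returned model has AdaRounded weights; wrap it in QuantizationSimModel,
    # load the saved parameter encodings with sim.set_and_freeze_param_encodings(),
    # then compute activation encodings and evaluate as usual.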

Results

AIMET can quantize an existing 32-bit floating-point model to an 8-bit fixed-point model without sacrificing much accuracy and without model fine-tuning.

Data-Free Quantization (DFQ)

Applied to several popular networks, such as MobileNet-v2 and ResNet-50, the DFQ method results in less than 0.9% loss in accuracy down to 8-bit quantization, in a fully automated way and without any training data.

Model                  FP32      INT8 (simulated)
MobileNet v2 (top-1)   71.72 %   71.08 %
ResNet-50 (top-1)      76.05 %   75.45 %
DeepLab v3 (mIOU)      72.65 %   71.91 %

AdaRound (Adaptive Rounding)

ADAS Object Detect

For this example ADAS object-detection model, which was challenging to quantize to 8-bit precision, AdaRound recovers accuracy to within 1% of the FP32 baseline.

Configuration                                mAP (Mean Average Precision)
FP32                                         82.20 %
Nearest Rounding (INT8 weights, INT8 acts)   49.85 %
AdaRound (INT8 weights, INT8 acts)           81.21 %

DeepLabv3 Semantic Segmentation

For some models like the DeepLabv3 semantic segmentation model, AdaRound can even quantize the model weights to 4-bit precision without a significant drop in accuracy.

Configuration                                mIOU (Mean Intersection over Union)
FP32                                         72.94 %
Nearest Rounding (INT4 weights, INT8 acts)   6.09 %
AdaRound (INT4 weights, INT8 acts)           70.86 %

Quantization for Recurrent Models

AIMET supports quantization simulation and quantization-aware training (QAT) for recurrent models (RNN, LSTM, GRU). Using the QAT feature in AIMET, a DeepSpeech2 model with bi-directional LSTMs can be quantized to 8-bit precision with a minimal drop in accuracy; a sketch of this QAT flow follows the results table below.

DeepSpeech2 (bi-directional LSTMs)   Word Error Rate
FP32                                 9.92 %
INT8                                 10.22 %
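
A minimal sketch of this QAT flow, assuming the aimet_torch 1.x API; the LSTM-based model, train_loader, and criterion are placeholders for your own speech pipeline.

    import torch
    from aimet_common.defs import QuantScheme
    from aimet_torch.quantsim import QuantizationSimModel

    # `model` is a trained torch.nn.Module containing torch.nn.LSTM layers.
    dummy_input = torch.randn(1, 100, 161)               # (batch, time, features) placeholder

    sim = QuantizationSimModel(model,
                               dummy_input=dummy_input,
                               quant_scheme=QuantScheme.training_range_learning_with_tf_init,
                               default_param_bw=8,
                               default_output_bw=8)
    sim.compute_encodings(forward_pass_callback=lambda m, _: m(dummy_input),
                          forward_pass_callback_args=None)

    # Quantization-aware training: fine-tune the simulated-quantized model with
    # the usual training loop so the weights adapt to 8-bit quantization noise.
    optimizer = torch.optim.Adam(sim.model.parameters(), lr=1e-5)
    sim.model.train()
    for features, targets in train_loader:                # your own data loader
        optimizer.zero_grad()
        loss = criterion(sim.model(features), targets)    # your own loss function
        loss.backward()
        optimizer.step()

    sim.export(path='./output', filename_prefix='deepspeech2_int8', dummy_input=dummy_input)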

Model Compression

AIMET can also significantly compress models. For popular models such as ResNet-50 and ResNet-18, compression with spatial SVD plus channel pruning achieves a 50% MAC (multiply-accumulate) reduction while retaining accuracy within approximately 1% of the original uncompressed model.

Model               Uncompressed   50% Compressed
ResNet-18 (top-1)   69.76 %        68.56 %
ResNet-50 (top-1)   76.05 %        75.75 %

Contributions

Thanks for your interest in contributing to AIMET! Please read our Contributions Page for more information on contributing features or bug fixes. We look forward to your participation!

Team

AIMET aims to be a community-driven project maintained by Qualcomm Innovation Center, Inc.

License

AIMET is licensed under the BSD 3-clause "New" or "Revised" License. Check out the LICENSE for more details.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release.

Built Distributions

aimet_torch-1.34.0-cp310-cp310-manylinux_2_34_x86_64.whl (11.2 MB)

Uploaded CPython 3.10 manylinux: glibc 2.34+ x86-64

aimet_torch-1.34.0-cp38-cp38-manylinux_2_29_x86_64.whl (12.3 MB)

Uploaded CPython 3.8 manylinux: glibc 2.29+ x86-64

File details

Details for the file aimet_torch-1.34.0-cp310-cp310-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for aimet_torch-1.34.0-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 43a17bab8675dfcd81b3ed734cf1a7b3de4b4fb684604f7f4762aba55b673b2e
MD5 b1b5e140d194e43df7d7ba3d4f6f6dc0
BLAKE2b-256 b9e409d53aedaf24e6c64251d491300e264b763424969de73cc8dae2103f4ec9


File details

Details for the file aimet_torch-1.34.0-cp38-cp38-manylinux_2_29_x86_64.whl.

File metadata

File hashes

Hashes for aimet_torch-1.34.0-cp38-cp38-manylinux_2_29_x86_64.whl
Algorithm Hash digest
SHA256 09d4a89ae593aaa5455c4abb178c4049aa8b5b7e0fd3b01e5228e65664afcf80
MD5 073defa1734005965cbf385331989df2
BLAKE2b-256 57972c04270248a9c4502d707072b12fdd5cd5c3d3735f2f81a7b81cfef43408

