FMS Model Optimizer
Introduction
FMS Model Optimizer is a framework for developing reduced-precision neural network models. It supports quantization techniques such as quantization-aware training (QAT) and post-training quantization (PTQ), along with several other optimization techniques, on popular deep learning workloads.
Highlights
- Python API to enable model quantization: adding a few lines of code performs module-level and/or function-level operation replacement.
- Robust: verified for INT8 and INT4 quantization on important vision, speech, NLP, object detection, and LLM workloads.
- Flexible: options to analyze the network using PyTorch Dynamo and to apply best practices during quantization, such as clip_val initialization, layer-level precision settings, and optimizer parameter group settings.
- State-of-the-art INT and FP quantization techniques for weights and activations, such as SmoothQuant, SAWB+, and PACT+.
- Supports key compute-intensive operations such as Conv2d, Linear, LSTM, MM, and BMM.
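The operation replacement above ultimately swaps floating-point ops for fake-quantized ones. As a concept sketch only (this is not the FMS Model Optimizer API; the function name, bit width, and clip value below are illustrative assumptions), PACT-style symmetric fake quantization looks like:

```python
def fake_quantize(x, num_bits=8, clip_val=1.0):
    """Clip to [-clip_val, clip_val], snap onto the signed integer grid,
    then de-quantize back to float (symmetric, PACT-style clipping)."""
    qmax = 2 ** (num_bits - 1) - 1        # e.g. 127 for INT8
    scale = clip_val / qmax               # step size of the integer grid
    clipped = max(-clip_val, min(clip_val, x))
    return round(clipped / scale) * scale

# Values inside the clip range are rounded to the nearest grid point;
# values outside are saturated to +/- clip_val.
inside = fake_quantize(0.5)
saturated = fake_quantize(3.0)
```

During QAT, a trainable clip value (as in PACT+) lets the network learn the quantization range; during PTQ it is calibrated from data instead.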
Supported Models
| Model | GPTQ | FP8 | PTQ | QAT |
|---|---|---|---|---|
| Granite | :white_check_mark: | :white_check_mark: | :white_check_mark: | :black_square_button: |
| Llama | :white_check_mark: | :white_check_mark: | :white_check_mark: | :black_square_button: |
| Mixtral | :white_check_mark: | :white_check_mark: | :white_check_mark: | :black_square_button: |
| BERT/Roberta | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
Note: Direct QAT on LLMs is not recommended.
Getting Started
Requirements
- 🐧 Linux system with Nvidia GPU (V100/A100/H100)
- Python 3.10 to Python 3.12
- CUDA >=12
Optional packages, depending on the optimization functionality required:
- GPTQ, a popular weight-compression method for LLMs
- An external INT8 kernel, if you want to experiment with INT8 deployment in the QAT and PTQ examples
- FP8, a reduced-precision format similar to INT8, which requires:
  - an Nvidia GPU of the A100 family or newer
  - the llm-compressor package
- The compute-graph plotting function (mostly for troubleshooting purposes)
[!NOTE] The PyTorch version should be < 2.4 if you would like to experiment with deployment using an external INT8 kernel.
Installation
We recommend using a Python virtual environment with Python 3.10 or later. Here is how to set up a virtual environment using Python venv:
python3 -m venv fms_mo_venv
source fms_mo_venv/bin/activate
[!TIP] If you use pyenv, Conda Miniforge, or another tool for Python version management, create the virtual environment with that tool instead of venv. Otherwise, you may have issues with installed packages not being found, as they are linked to your Python version management tool rather than to venv.
There are two ways to install FMS Model Optimizer:
From Release
To install from release (PyPi package):
python3 -m venv fms_mo_venv
source fms_mo_venv/bin/activate
pip install fms-model-optimizer
From Source
To install from source (GitHub repository):
python3 -m venv fms_mo_venv
source fms_mo_venv/bin/activate
git clone https://github.com/foundation-model-stack/fms-model-optimizer
cd fms-model-optimizer
pip install -e .
Optional Dependencies
The following optional dependencies are available:
- `fp8`: the `llmcompressor` package, for FP8 quantization
- `gptq`: the `GPTQModel` package, for W4A16 quantization
- `mx`: the `microxcaling` package, for MX quantization
- `opt`: shortcut that installs `fp8`, `gptq`, and `mx`
- `aiu`: the `ibm-fms` package, for AIU model deployment
- `torchvision`: the `torch` package, for image recognition training and inference
- `triton`: the `triton` package, for matrix multiplication kernels
- `examples`: dependencies needed for the examples
- `visualize`: dependencies for visualizing models and performance data
- `test`: dependencies needed for unit testing
- `dev`: dependencies needed for development
To install an optional dependency, append a bracketed list of these names to the pip install commands above. The examples below install llm-compressor and torchvision with FMS Model Optimizer, for a release install and a source install respectively:
pip install fms-model-optimizer[fp8,torchvision]
pip install -e .[fp8,torchvision]
If you have already installed FMS Model Optimizer, then only the optional packages will be installed.
Try It Out!
To help you get up and running as quickly as possible with the FMS Model Optimizer framework, check out the following resources which demonstrate how to use the framework with different quantization techniques:
- Jupyter notebook tutorials (recommended starting point):
  - Quantization tutorial, which:
    - visualizes a random Gaussian tensor step by step through the quantization process
    - builds a quantizer and a quantized convolution module based on this process
- Python script examples
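The first tutorial's steps (quantize a random Gaussian tensor, then build a quantizer from that process) can be sketched in plain Python. The `SimpleQuantizer` class below is a hypothetical illustration of the idea, not the tutorial's actual code:

```python
import random

class SimpleQuantizer:
    """Per-tensor symmetric quantizer: pick a clip value from the tensor,
    quantize to num_bits signed integers, then de-quantize.
    Assumes the input contains at least one non-zero value."""
    def __init__(self, num_bits=4):
        self.qmax = 2 ** (num_bits - 1) - 1   # e.g. 7 for INT4

    def __call__(self, values):
        clip = max(abs(v) for v in values)    # max-based clip value
        scale = clip / self.qmax              # grid step size
        ints = [round(v / scale) for v in values]   # quantize to integers
        return [q * scale for q in ints]            # de-quantize to floats

# Quantize a random Gaussian tensor, as in the tutorial
random.seed(0)
tensor = [random.gauss(0.0, 1.0) for _ in range(8)]
quantized = SimpleQuantizer(num_bits=4)(tensor)
```

A quantized convolution module then simply applies such a quantizer to its weights (and optionally activations) before the convolution; the notebooks walk through that construction in PyTorch.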
Docs
Dive into the design document to get a better understanding of the framework motivation and concepts.
Contributing
Check out our contributing guide to learn how to contribute.