
Quantization Techniques


FMS Model Optimizer


Introduction

FMS Model Optimizer is a framework for developing reduced-precision neural network models. It supports quantization techniques such as quantization-aware training (QAT) and post-training quantization (PTQ), along with several other optimization techniques, for popular deep learning workloads.

Highlights

  • Python API to enable model quantization: adding a few lines of code performs module-level and/or function-level operation replacement (a conceptual sketch follows this list).
  • Robust: verified for INT 8-bit/4-bit quantization on important vision, speech, NLP, object detection, and LLM workloads.
  • Flexible: options to analyze the network using PyTorch Dynamo and to apply best practices during quantization, such as clip_val initialization, layer-level precision settings, and optimizer parameter group settings.
  • State-of-the-art INT and FP quantization techniques for weights and activations, such as SmoothQuant, SAWB+, and PACT+.
  • Supports key compute-intensive operations like Conv2d, Linear, LSTM, MM, and BMM.
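
To make "module-level operation replacement" concrete, here is a minimal, framework-agnostic sketch in plain PyTorch (not the FMS Model Optimizer API; FakeQuantLinear and replace_linear are hypothetical names) that swaps every nn.Linear in a model for a fake-quantized variant:

import torch
import torch.nn as nn
import torch.nn.functional as F

class FakeQuantLinear(nn.Linear):
    # Simulates symmetric INT8 weight quantization in the forward pass (illustration only).
    def forward(self, x):
        scale = self.weight.abs().max() / 127.0
        w_q = torch.clamp(torch.round(self.weight / scale), -127, 127) * scale
        return F.linear(x, w_q, self.bias)

def replace_linear(module):
    # Module-level replacement: recursively swap nn.Linear children for the quantized version.
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            q = FakeQuantLinear(child.in_features, child.out_features, bias=child.bias is not None)
            q.load_state_dict(child.state_dict())
            setattr(module, name, q)
        else:
            replace_linear(child)
    return module

model = replace_linear(nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4)))
print(model)

FMS Model Optimizer automates this kind of rewrite (and the corresponding function-level replacements) behind its Python API.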

Supported Models

| Model        | GPTQ | FP8 | PTQ | QAT |
|--------------|------|-----|-----|-----|
| Granite      | :white_check_mark: | :white_check_mark: | :white_check_mark: | :black_square_button: |
| Llama        | :white_check_mark: | :white_check_mark: | :white_check_mark: | :black_square_button: |
| Mixtral      | :white_check_mark: | :white_check_mark: | :white_check_mark: | :black_square_button: |
| BERT/RoBERTa | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |

Note: Direct QAT on LLMs is not recommended.

Getting Started

Requirements

  1. 🐧 Linux system with Nvidia GPU (V100/A100/H100)
  2. Python 3.10 to Python 3.12
  3. CUDA >=12

Optional packages based on optimization functionality required:

  • GPTQ is a popular compression method for LLMs: install the gptq optional dependency (see Optional Dependencies below).
  • If you want to experiment with INT8 deployment in the QAT and PTQ examples:
    • Nvidia GPU with compute capability >= 8.0 (A100 family or higher); a quick environment check follows this list.
    • Option 1:
      • Ninja
      • Clone the CUTLASS repository
      • PyTorch 2.3.1 (newer versions will cause issues for the custom CUDA kernel used in these examples)
    • Option 2:
      • Use the included Triton kernel. Note that this kernel is currently not faster than FP16.
  • FP8 is a reduced-precision format like INT8: install the fp8 optional dependency.
  • To enable the compute graph plotting function (mostly for troubleshooting purposes): see the visualize optional dependency below.
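
If you are unsure whether your environment meets the GPU and PyTorch constraints above, a quick check with PyTorch (assuming it is already installed) looks like this:

import torch

print("PyTorch:", torch.__version__)      # the external INT8 kernel examples expect < 2.4
print("CUDA:", torch.version.cuda)
if torch.cuda.is_available():
    cap = torch.cuda.get_device_capability()
    print("Compute capability:", cap, "(need >= (8, 0) for the INT8 deployment examples)")
else:
    print("No CUDA device visible")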

[!NOTE] PyTorch version should be < 2.4 if you would like to experiment with deployment using the external INT8 kernel.

Installation

We recommend using a Python virtual environment with Python 3.10 or newer. Here is how to set up a virtual environment using Python venv:

python3 -m venv fms_mo_venv
source fms_mo_venv/bin/activate

[!TIP] If you use pyenv, Conda Miniforge or other such tools for Python version management, create the virtual environment with that tool instead of venv. Otherwise, you may have issues with installed packages not being found as they are linked to your Python version management tool and not venv.

There are two ways to install FMS Model Optimizer:

From Release

To install from release (PyPI package):

python3 -m venv fms_mo_venv
source fms_mo_venv/bin/activate
pip install fms-model-optimizer

From Source

To install from source (GitHub repository):

python3 -m venv fms_mo_venv
source fms_mo_venv/bin/activate
git clone https://github.com/foundation-model-stack/fms-model-optimizer
cd fms-model-optimizer
pip install -e .

Optional Dependencies

The following optional dependencies are available:

  • fp8: llmcompressor package for fp8 quantization
  • gptq: GPTQModel package for W4A16 quantization
  • mx: microxcaling package for MX quantization
  • opt: Shortcut for fp8, gptq, and mx installs
  • aiu: ibm-fms package for AIU model deployment
  • torchvision: torchvision package for image recognition training and inference
  • triton: triton package for matrix multiplication kernels
  • examples: Dependencies needed for examples
  • visualize: Dependencies for visualizing models and performance data
  • test: Dependencies needed for unit testing
  • dev: Dependencies needed for development

To install an optional dependency, append a bracketed list of these names to the pip install commands above. The example below installs llmcompressor and torchvision alongside FMS Model Optimizer:

From release:

pip install fms-model-optimizer[fp8,torchvision]

From source:

pip install -e .[fp8,torchvision]

If you have already installed FMS Model Optimizer, then only the optional packages will be installed.
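
To confirm what actually landed in your environment, you can query the installed distributions (here llmcompressor and torchvision correspond to the fp8 and torchvision extras used in the example above):

from importlib.metadata import version, PackageNotFoundError

for pkg in ("fms-model-optimizer", "llmcompressor", "torchvision"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")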

Try It Out!

To help you get up and running quickly, check out the following resources, which demonstrate how to use FMS Model Optimizer with different quantization techniques:

  • Jupyter notebook tutorials (recommended starting point):
    • Quantization tutorial:
      • Visualizes a random Gaussian tensor step by step through the quantization process
      • Builds a quantizer and a quantized convolution module based on this process (a minimal standalone sketch follows this list)
  • Python script examples
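
As a taste of what the quantization tutorial walks through, here is a minimal, self-contained sketch (not the tutorial's code) that quantizes a random Gaussian tensor to INT8 and measures the round-trip error:

import torch

torch.manual_seed(0)
x = torch.randn(1024)                                   # random Gaussian tensor

# Symmetric per-tensor quantization to signed 8-bit integers
scale = x.abs().max() / 127.0
x_int8 = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
x_dequant = x_int8.float() * scale                      # dequantize back to float

print("max abs round-trip error:", (x - x_dequant).abs().max().item())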

Docs

Dive into the design document to get a better understanding of the framework's motivation and concepts.

Contributing

Check out our contributing guide to learn how to contribute.
