Project description

Neural Compressor

An open-source Python library supporting popular model compression techniques for ONNX

Neural Compressor provides popular model compression techniques inherited from Intel Neural Compressor, with a focus on ONNX model quantization (such as SmoothQuant and weight-only quantization) through ONNX Runtime. In particular, the tool offers the following key features, typical examples, and open collaborations:

Support a wide range of Intel hardware, such as Intel Xeon Scalable Processors and AI PCs
Validate popular LLMs such as Llama 2 and a broad set of models such as BERT-base and ResNet50 from popular model hubs such as Hugging Face and ONNX Model Zoo, by leveraging automatic accuracy-driven quantization strategies
Collaborate with software platforms such as Microsoft Olive, and with the open AI ecosystem, including Hugging Face, ONNX, and ONNX Runtime

Installation

Install from source

git clone https://github.com/onnx/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
pip install .

Note: Further installation methods can be found in the Installation Guide.

Getting Started

Setting up the environment:

pip install onnx-neural-compressor "onnxruntime>=1.17.0" onnx

After successfully installing these packages, try your first quantization program.

Note: Please install from source until the formal PyPI release is available.
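To confirm that the setup step worked, a small check like the one below can report which of the three distributions are installed and whether ONNX Runtime meets the `>=1.17.0` requirement. This is a minimal sketch using only the standard library; the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical and not part of Neural Compressor, and the version parsing is deliberately simplistic.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.17.0' into (1, 17, 0); stop at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.17' compares equal to '1.17.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(min_ort="1.17.0"):
    """Report the installed version (or None) of each required distribution."""
    report = {}
    # Distribution names are taken from the pip command above. Assumption:
    # ONNX Runtime is installed under the plain 'onnxruntime' name.
    for dist in ("onnx-neural-compressor", "onnxruntime", "onnx"):
        try:
            report[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = None
    ort = report["onnxruntime"]
    report["onnxruntime_ok"] = ort is not None and meets_minimum(ort, min_ort)
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

A `None` entry means the distribution was not found in the active environment, which usually indicates that `pip` installed into a different interpreter.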

Weight-Only Quantization (LLMs)

The following example demonstrates Weight-Only Quantization on LLMs. When multiple devices are available, the device is selected automatically for efficiency.

Run the example:

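from onnx_neural_compressor.quantization import matmul_nbits_quantizer

# `model` is the ONNX model to quantize; it must be defined before this point.
# Round-to-nearest (RTN) is the weight-only algorithm selected here.
algo_config = matmul_nbits_quantizer.RTNWeightOnlyQuantConfig()
quant = matmul_nbits_quantizer.MatMulNBitsQuantizer(
    model,
    n_bits=4,
    block_size=32,
    is_symmetric=True,
    algo_config=algo_config,
)
quant.process()
# The quantized model is available on the quantizer after process().
best_model = quant.model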
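The parameters `n_bits=4`, `block_size=32`, and `is_symmetric=True` in the example above describe block-wise, symmetric, round-to-nearest quantization. The arithmetic behind that idea can be sketched in plain Python as follows. This is an illustration only, not the library's implementation: the function names are hypothetical, and the signed code range of [-8, 7] with `scale = max|w| / 7` is one common convention that may differ from how ONNX Runtime actually packs and stores 4-bit weights.

```python
def rtn_quantize_block(weights, n_bits=4):
    """Quantize one block symmetrically with round-to-nearest.

    Returns (integer codes, scale). Assumed convention: signed codes in
    [-2^(n-1), 2^(n-1) - 1], with the largest magnitude mapped to qmax.
    """
    qmax = 2 ** (n_bits - 1) - 1
    qmin = -(2 ** (n_bits - 1))
    max_abs = max(abs(w) for w in weights)
    # An all-zero block needs no scaling; avoid dividing by zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def rtn_quantize(weights, n_bits=4, block_size=32):
    """Split a flat weight list into blocks, each with its own scale."""
    return [
        rtn_quantize_block(weights[start:start + block_size], n_bits)
        for start in range(0, len(weights), block_size)
    ]


def dequantize(blocks):
    """Reconstruct approximate weights from (codes, scale) pairs."""
    return [code * scale for codes, scale in blocks for code in codes]


if __name__ == "__main__":
    weights = [((i * 37) % 101 - 50) / 25.0 for i in range(70)]
    blocks = rtn_quantize(weights, n_bits=4, block_size=32)
    restored = dequantize(blocks)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{len(blocks)} blocks, worst absolute error {worst:.4f}")
```

Because every block carries its own scale, a smaller `block_size` generally lowers the reconstruction error at the cost of storing more scales.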

Static Quantization

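from onnx_neural_compressor.quantization import quantize, config
from onnx_neural_compressor import data_reader


class DataReader(data_reader.CalibrationDataReader):
    def __init__(self):
        self.encoded_list = []
        # append data into self.encoded_list

        self.iter_next = iter(self.encoded_list)

    def get_next(self):
        # Return the next calibration sample, or None when the data is exhausted.
        return next(self.iter_next, None)

    def rewind(self):
        # Restart iteration from the first calibration sample.
        self.iter_next = iter(self.encoded_list)


# `model` and `output_model_path` must be defined before this point.
# The instance gets its own name so it does not shadow the imported
# `data_reader` module.
calibration_reader = DataReader()
qconfig = config.StaticQuantConfig(calibration_data_reader=calibration_reader)
quantize(model, output_model_path, qconfig)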
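Static quantization relies on the calibration samples served by the data reader to choose fixed quantization parameters ahead of inference. As a rough illustration of that idea, the sketch below derives an 8-bit scale and zero point from the minimum and maximum values seen across calibration batches. It is a generic min-max scheme written for this README, with hypothetical function names; the calibration method that `StaticQuantConfig` actually applies is not specified here and may differ.

```python
def minmax_range(batches):
    """Return the (low, high) range observed across all calibration batches."""
    low = min(min(batch) for batch in batches)
    high = max(max(batch) for batch in batches)
    # Widen the range to include zero so that 0.0 maps to an exact code.
    return min(low, 0.0), max(high, 0.0)


def affine_params(low, high, qmin=0, qmax=255):
    """Compute (scale, zero_point) that map [low, high] onto [qmin, qmax]."""
    scale = (high - low) / (qmax - qmin)
    if scale == 0.0:
        # Degenerate range (all observed values are zero).
        return 1.0, qmin
    zero_point = round(qmin - low / scale)
    return scale, min(qmax, max(qmin, zero_point))


def quantize_value(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to its integer code, clamping out-of-range inputs."""
    return min(qmax, max(qmin, round(x / scale) + zero_point))


if __name__ == "__main__":
    calibration_batches = [[-128.0, 5.0], [3.0, 127.0]]
    low, high = minmax_range(calibration_batches)
    scale, zero_point = affine_params(low, high)
    print(scale, zero_point, quantize_value(0.0, scale, zero_point))
```

Values outside the calibrated range are clamped, which is why calibration data should be representative of the inputs the model sees in deployment.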

Documentation

Overview: Architecture, Workflow, Examples
Feature: Quantization, SmoothQuant, Weight-Only Quantization (INT8/INT4), Layer-Wise Quantization

Additional Content

Communication

GitHub Issues: mainly for bug reports, new feature requests, and questions.
Email: research ideas on model compression techniques are welcome by email for possible collaboration.

Project details


This version

1.0

Jul 31, 2024

Download files


Source Distribution

onnx_neural_compressor-1.0.tar.gz (95.3 kB)

Uploaded Jul 31, 2024 Source

Built Distribution

onnx_neural_compressor-1.0-py3-none-any.whl (143.0 kB)

Uploaded Jul 31, 2024 Python 3

File details

Details for the file onnx_neural_compressor-1.0.tar.gz.

File metadata

Download URL: onnx_neural_compressor-1.0.tar.gz
Upload date: Jul 31, 2024
Size: 95.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0.post20200518 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3

File hashes

Hashes for onnx_neural_compressor-1.0.tar.gz
Algorithm	Hash digest
SHA256	`7d04a517a36c1bb0e976b014dbf51bea7b6b747136a409ea0959351d4a8acce1`
MD5	`3e617eb35d800bfd56d3039ebc441c11`
BLAKE2b-256	`696a25cdb4307e361d54ca7c824e35fa325925134d5df91c50538fad846fe774`


File details

Details for the file onnx_neural_compressor-1.0-py3-none-any.whl.

File metadata

Download URL: onnx_neural_compressor-1.0-py3-none-any.whl
Upload date: Jul 31, 2024
Size: 143.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0.post20200518 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3

File hashes

Hashes for onnx_neural_compressor-1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6896dd9084e75812ce6d28bc794e8f5d7e2f8a394c23c81d5232624aa1db8722`
MD5	`d1990a350fce841ec2d4ad5ff79e54fd`
BLAKE2b-256	`9ffb750b57c3174bccb6b77518f5fa1f0cf308e38cbdf24209f62be31d8eceff`