
Repository of Neural Compressor ORT

Project description

Neural Compressor

An open-source Python library supporting popular model compression techniques for ONNX



Neural Compressor provides popular model compression techniques inherited from Intel Neural Compressor, focused on ONNX model quantization through ONNX Runtime, such as SmoothQuant and weight-only quantization. In particular, the tool offers the key features, typical examples, and open collaborations described below.

Installation

Install from source

git clone https://github.com/onnx/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
pip install .

Note: Further installation methods can be found under Installation Guide.

Getting Started

Setting up the environment:

pip install onnx-neural-compressor "onnxruntime>=1.17.0" onnx

After successfully installing these packages, try your first quantization program.

Note: please install from source until the formal PyPI release.

Weight-Only Quantization (LLMs)

The following example demonstrates weight-only quantization on LLMs; when multiple devices are available, a device is selected automatically for efficiency.

Run the example:

from onnx_neural_compressor.quantization import matmul_nbits_quantizer

# `model` is the ONNX model to quantize, loaded beforehand
# (an onnx.ModelProto, e.g. via onnx.load, or a model path).
algo_config = matmul_nbits_quantizer.RTNWeightOnlyQuantConfig()
quant = matmul_nbits_quantizer.MatMulNBitsQuantizer(
    model,
    n_bits=4,
    block_size=32,
    is_symmetric=True,
    algo_config=algo_config,
)
quant.process()
best_model = quant.model
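For intuition, the round-to-nearest (RTN) algorithm configured above can be sketched in plain numpy: split each weight tensor into blocks, scale each block symmetrically by its max absolute value, then round. This is only an illustrative approximation of the technique, not the library's actual kernel; the function name and the fake-quantized output are assumptions of the sketch.

```python
import numpy as np

def rtn_quantize(weight, n_bits=4, block_size=32):
    """Symmetric round-to-nearest (RTN) weight-only quantization, per block.

    weight: 1-D array whose length is a multiple of block_size.
    Returns the dequantized ("fake-quantized") weights for comparison.
    """
    qmax = 2 ** (n_bits - 1) - 1               # e.g. 7 for symmetric int4
    blocks = weight.reshape(-1, block_size)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)   # guard all-zero blocks
    q = np.clip(np.round(blocks / scale), -qmax - 1, qmax)
    return (q * scale).reshape(weight.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
w_deq = rtn_quantize(w)
# Per-block error is bounded by half the block's scale.
max_err = float(np.abs(w - w_deq).max())
```

Larger `block_size` values amortize metadata but share one scale across more weights, which is the accuracy/size trade-off the `block_size=32` setting above controls.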

Static Quantization

from onnx_neural_compressor.quantization import quantize, config
from onnx_neural_compressor import data_reader


class DataReader(data_reader.CalibrationDataReader):
    def __init__(self):
        self.encoded_list = []
        # append calibration samples (input-name -> tensor dicts) to self.encoded_list

        self.iter_next = iter(self.encoded_list)

    def get_next(self):
        return next(self.iter_next, None)

    def rewind(self):
        self.iter_next = iter(self.encoded_list)


# Use a distinct name so the instance does not shadow the imported
# `data_reader` module.
calib_data_reader = DataReader()
qconfig = config.StaticQuantConfig(calibration_data_reader=calib_data_reader)
# `model` is the input ONNX model (path or ModelProto);
# `output_model_path` is where the quantized model will be written.
quantize(model, output_model_path, qconfig)
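The calibration reader above follows a simple iterator contract: get_next() yields one input feed per call and returns None once the data is exhausted, and rewind() restarts iteration so calibration can make multiple passes. A self-contained sketch of that contract with toy in-memory data (the class name and sample contents here are illustrative, with no library dependency):

```python
class ToyDataReader:
    """Mimics the CalibrationDataReader contract with in-memory samples."""

    def __init__(self, samples):
        self.encoded_list = list(samples)   # one feed dict per sample
        self.iter_next = iter(self.encoded_list)

    def get_next(self):
        # Next feed dict, or None once the data is exhausted.
        return next(self.iter_next, None)

    def rewind(self):
        # Restart iteration from the first sample.
        self.iter_next = iter(self.encoded_list)


reader = ToyDataReader([{"input": [1.0]}, {"input": [2.0]}])
first_pass = [reader.get_next(), reader.get_next(), reader.get_next()]
# first_pass == [{'input': [1.0]}, {'input': [2.0]}, None]
reader.rewind()
second = reader.get_next()   # back to the first sample
```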

Documentation

Overview
  • Architecture
  • Workflow
  • Examples

Feature
  • Quantization
  • SmoothQuant
  • Weight-Only Quantization (INT8/INT4)
  • Layer-Wise Quantization
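SmoothQuant, listed among the features above, migrates quantization difficulty from activations to weights by folding a per-channel scale out of the activations and into the weights, which leaves the matrix product unchanged. A toy numpy sketch of that equivalence (the max-abs-ratio scale formula is the heuristic from the SmoothQuant paper, not necessarily this library's exact implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4))        # activations
X[:, 0] *= 50.0                        # channel 0 has outliers: hard to quantize
W = rng.standard_normal((4, 3))        # weights

alpha = 0.5                            # migration strength
s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)

X_smooth = X / s                       # per-channel scale folded out of X...
W_smooth = W * s[:, None]              # ...and into W

# The product is mathematically unchanged:
# np.allclose(X @ W, X_smooth @ W_smooth)  -> True
# but X_smooth's outlier channel is much flatter, so activations
# quantize with less error.
```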

Additional Content

Communication

  • GitHub Issues: mainly for bug reports, new feature requests, question asking, etc.
  • Email: we welcome research ideas on model compression techniques; reach out by email to collaborate.

Project details


Release history

This version

1.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

onnx_neural_compressor-1.0.tar.gz (95.3 kB)

Uploaded Source

Built Distribution

onnx_neural_compressor-1.0-py3-none-any.whl (143.0 kB)

Uploaded Python 3

File details

Details for the file onnx_neural_compressor-1.0.tar.gz.

File metadata

  • Download URL: onnx_neural_compressor-1.0.tar.gz
  • Upload date:
  • Size: 95.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0.post20200518 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3

File hashes

Hashes for onnx_neural_compressor-1.0.tar.gz

  • SHA256: 7d04a517a36c1bb0e976b014dbf51bea7b6b747136a409ea0959351d4a8acce1
  • MD5: 3e617eb35d800bfd56d3039ebc441c11
  • BLAKE2b-256: 696a25cdb4307e361d54ca7c824e35fa325925134d5df91c50538fad846fe774
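The digests above can be verified locally with Python's standard hashlib before installing. A minimal sketch (the helper names are illustrative; pass the downloaded file's path and the SHA256 listed above):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file and return its hex SHA256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected_hex):
    """True if the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected_hex.lower()
```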

See more details on using hashes here.

File details

Details for the file onnx_neural_compressor-1.0-py3-none-any.whl.

File metadata

  • Download URL: onnx_neural_compressor-1.0-py3-none-any.whl
  • Upload date:
  • Size: 143.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0.post20200518 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3

File hashes

Hashes for onnx_neural_compressor-1.0-py3-none-any.whl

  • SHA256: 6896dd9084e75812ce6d28bc794e8f5d7e2f8a394c23c81d5232624aa1db8722
  • MD5: d1990a350fce841ec2d4ad5ff79e54fd
  • BLAKE2b-256: 9ffb750b57c3174bccb6b77518f5fa1f0cf308e38cbdf24209f62be31d8eceff

