Repository of Neural Compressor ORT
Project description
Neural Compressor
An open-source Python library supporting popular model compression techniques for ONNX
Neural Compressor provides popular model compression techniques inherited from Intel Neural Compressor, with a focus on ONNX model quantization, such as SmoothQuant and weight-only quantization, through ONNX Runtime. In particular, the tool offers the following key features, typical examples, and open collaborations:
- Support a wide range of Intel hardware such as Intel Xeon Scalable Processors and AI PC
- Validate popular LLMs such as Llama 2 and broad models such as BERT-base and ResNet50 from popular model hubs such as Hugging Face and ONNX Model Zoo, by leveraging automatic accuracy-driven quantization strategies
- Collaborate with software platforms such as Microsoft Olive, and open AI ecosystems such as Hugging Face, ONNX, and ONNX Runtime
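The "accuracy-driven quantization strategies" mentioned in the list above can be pictured as a loop that tries candidate configurations until one stays within an accuracy budget. The sketch below is only an illustration of that idea under stated assumptions: `tune`, `quantize_fn`, `eval_fn`, `tolerable_loss`, and the toy accuracy numbers are hypothetical names and values, not part of this library's API.

```python
from typing import Callable, Iterable, Optional, Tuple


def tune(
    candidates: Iterable[dict],
    quantize_fn: Callable[[dict], object],
    eval_fn: Callable[[object], float],
    baseline_accuracy: float,
    tolerable_loss: float = 0.01,
) -> Optional[Tuple[dict, object, float]]:
    """Return the first (config, model, accuracy) whose relative accuracy
    drop versus the baseline is within tolerable_loss, or None."""
    for cfg in candidates:
        q_model = quantize_fn(cfg)   # hypothetical: produce a quantized model
        acc = eval_fn(q_model)       # hypothetical: measure its accuracy
        if (baseline_accuracy - acc) / baseline_accuracy <= tolerable_loss:
            return cfg, q_model, acc
    return None


# Toy stand-ins so the sketch runs end to end (all values are made up).
fake_accuracy = {"int4": 0.70, "int8": 0.795, "fp32": 0.80}
candidates = [{"dtype": "int4"}, {"dtype": "int8"}]

result = tune(
    candidates,
    quantize_fn=lambda cfg: cfg["dtype"],   # the "model" is just a label here
    eval_fn=lambda m: fake_accuracy[m],
    baseline_accuracy=fake_accuracy["fp32"],
)
print(result)
```

In this toy run the more aggressive candidate is rejected because its accuracy drop exceeds the budget, and the next candidate is accepted. Ordering candidates from most to least aggressive is one possible design choice; it is not claimed to be what the library does.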
Installation
Install from source
git clone https://github.com/onnx/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
pip install .
Note: Further installation methods can be found under Installation Guide.
Getting Started
Setting up the environment:
pip install onnx-neural-compressor "onnxruntime>=1.17.0" onnx
After successfully installing these packages, try your first quantization program.
Note: Please install from source until the formal PyPI release is available.
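To confirm that the environment meets the `onnxruntime>=1.17.0` requirement stated above, a quick check along the following lines may help. This is a minimal sketch that uses only the Python standard library; the helper names `installed_version`, `version_tuple`, and `meets_minimum` are illustrative, and the version parsing is deliberately simple (it compares leading numeric components only).

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.17.0' into (1, 17, 0); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(dist_name: str, minimum: str) -> bool:
    """True if the package is installed and at least the given version."""
    found = installed_version(dist_name)
    return found is not None and version_tuple(found) >= version_tuple(minimum)


# Package names and the 1.17.0 floor come from the pip command above.
for name, minimum in [("onnxruntime", "1.17.0"), ("onnx", "0"), ("onnx-neural-compressor", "0")]:
    status = "ok" if meets_minimum(name, minimum) else "missing or too old"
    print(f"{name}: {installed_version(name)} ({status})")
```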
Weight-Only Quantization (LLMs)
The following example demonstrates weight-only quantization on LLMs. When multiple devices are available, the device is selected automatically for efficiency.
Run the example:
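from onnx_neural_compressor.quantization import matmul_nbits_quantizer

# `model` is the ONNX model to quantize; define it before this point.
algo_config = matmul_nbits_quantizer.RTNWeightOnlyQuantConfig()
quant = matmul_nbits_quantizer.MatMulNBitsQuantizer(
    model,
    n_bits=4,           # quantize weights to 4 bits
    block_size=32,      # number of weights that share one scale
    is_symmetric=True,  # symmetric quantization range
    algo_config=algo_config,
)
quant.process()
best_model = quant.model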
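To give a feel for what `n_bits=4`, `block_size=32`, and `is_symmetric=True` mean, here is a minimal pure-Python sketch of block-wise, symmetric, round-to-nearest (RTN) quantization. It is a conceptual illustration only, not the library's implementation: the function names are hypothetical, and the way ONNX Runtime actually packs and stores 4-bit weights may differ.

```python
from typing import List, Tuple

Block = Tuple[List[int], float]  # (integer codes, scale) for one block


def rtn_quantize_block(weights: List[float], n_bits: int = 4) -> Block:
    """Symmetric round-to-nearest quantization of one block of weights."""
    qmax = 2 ** (n_bits - 1) - 1   # 7 for 4 bits
    qmin = -(2 ** (n_bits - 1))    # -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def rtn_quantize(weights: List[float], n_bits: int = 4, block_size: int = 32) -> List[Block]:
    """Split the weights into blocks; each block gets its own scale."""
    return [
        rtn_quantize_block(weights[start:start + block_size], n_bits)
        for start in range(0, len(weights), block_size)
    ]


def dequantize(blocks: List[Block]) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [code * scale for codes, scale in blocks for code in codes]


# Made-up weights: 64 values, so two blocks of 32.
weights = [((i * 37) % 64 - 32) / 16.0 for i in range(64)]
blocks = rtn_quantize(weights, n_bits=4, block_size=32)
restored = dequantize(blocks)
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(f"blocks: {len(blocks)}, worst absolute error: {worst:.4f}")
```

A smaller `block_size` means more scales are stored but each scale fits its block more tightly, which usually lowers the rounding error; that trade-off is the main reason the parameter exists.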
Static Quantization
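from onnx_neural_compressor.quantization import quantize, config
from onnx_neural_compressor import data_reader


class DataReader(data_reader.CalibrationDataReader):
    def __init__(self):
        self.encoded_list = []
        # append data into self.encoded_list

        self.iter_next = iter(self.encoded_list)

    def get_next(self):
        # Return the next calibration sample, or None when exhausted.
        return next(self.iter_next, None)

    def rewind(self):
        # Restart iteration from the first calibration sample.
        self.iter_next = iter(self.encoded_list)


# `model` and `output_model_path` must be defined before this point.
# The instance gets its own name so it does not shadow the `data_reader` module.
calibration_reader = DataReader()
qconfig = config.StaticQuantConfig(calibration_data_reader=calibration_reader)
quantize(model, output_model_path, qconfig)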
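The `get_next`/`rewind` pair above is what lets a calibration pass walk through the samples and start over when needed. The sketch below shows, under simplifying assumptions, how such a reader might be consumed to derive quantization parameters: `ListDataReader` is a stand-in with the same two methods (so the example runs without the library), and min/max range collection with uint8 parameters is one common calibration approach, not necessarily the one the library applies. Real calibration observes tensors inside the model; here the raw sample values stand in for them.

```python
from typing import Dict, List, Optional, Tuple

Sample = Dict[str, List[float]]


class ListDataReader:
    """Stand-in reader exposing the same get_next/rewind shape as above."""

    def __init__(self, samples: List[Sample]):
        self.samples = samples
        self.iter_next = iter(self.samples)

    def get_next(self) -> Optional[Sample]:
        return next(self.iter_next, None)

    def rewind(self) -> None:
        self.iter_next = iter(self.samples)


def collect_ranges(reader: ListDataReader) -> Dict[str, Tuple[float, float]]:
    """Track the (min, max) seen for each named tensor across all samples."""
    ranges: Dict[str, Tuple[float, float]] = {}
    reader.rewind()
    while True:
        sample = reader.get_next()
        if sample is None:
            break
        for name, values in sample.items():
            lo, hi = min(values), max(values)
            if name in ranges:
                lo, hi = min(lo, ranges[name][0]), max(hi, ranges[name][1])
            ranges[name] = (lo, hi)
    return ranges


def uint8_params(lo: float, hi: float) -> Tuple[float, int]:
    """Asymmetric uint8 scale and zero point for a range widened to include 0.0."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    return scale, zero_point


# Made-up calibration samples for a single input named "input".
reader = ListDataReader([{"input": [-1.0, 0.5]}, {"input": [0.25, 3.0]}])
ranges = collect_ranges(reader)
print(ranges)  # {'input': (-1.0, 3.0)}
print(uint8_params(*ranges["input"]))
```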
Documentation
Section | Topics
---|---
Overview | Architecture, Workflow, Examples
Feature | Quantization, SmoothQuant, Weight-Only Quantization (INT8/INT4), Layer-Wise Quantization
Additional Content
Communication
- GitHub Issues: mainly for bug reports, new feature requests, and questions.
- Email: research ideas on model compression techniques and proposals for collaboration are welcome by email.
File details
Details for the file onnx_neural_compressor-1.0.tar.gz.
File metadata
- Download URL: onnx_neural_compressor-1.0.tar.gz
- Upload date:
- Size: 95.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0.post20200518 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7d04a517a36c1bb0e976b014dbf51bea7b6b747136a409ea0959351d4a8acce1
MD5 | 3e617eb35d800bfd56d3039ebc441c11
BLAKE2b-256 | 696a25cdb4307e361d54ca7c824e35fa325925134d5df91c50538fad846fe774
File details
Details for the file onnx_neural_compressor-1.0-py3-none-any.whl.
File metadata
- Download URL: onnx_neural_compressor-1.0-py3-none-any.whl
- Upload date:
- Size: 143.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0.post20200518 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6896dd9084e75812ce6d28bc794e8f5d7e2f8a394c23c81d5232624aa1db8722
MD5 | d1990a350fce841ec2d4ad5ff79e54fd
BLAKE2b-256 | 9ffb750b57c3174bccb6b77518f5fa1f0cf308e38cbdf24209f62be31d8eceff