Repository of Intel® Neural Compressor

Project description

Introduction to Intel® Neural Compressor

Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) is an open-source Python library that runs on Intel CPUs and GPUs. It provides unified interfaces across multiple deep learning frameworks for popular network compression techniques such as quantization, pruning, and knowledge distillation. The tool supports automatic accuracy-driven tuning strategies that help users quickly find the best quantized model. It also implements several weight pruning algorithms that generate a pruned model with a predefined sparsity goal, and it supports knowledge distillation from a teacher model to a student model.

Note

GPU support is under development.

Visit the Intel® Neural Compressor online document website at: https://intel.github.io/neural-compressor.

Infrastructure

Intel® Neural Compressor features an architecture and workflow designed to increase performance and speed up deployment across infrastructures.

Architecture

[Figure: Intel® Neural Compressor architecture diagram]

Workflow

[Figure: Intel® Neural Compressor workflow diagram]

Supported Frameworks

Supported deep learning frameworks are:

Note: Intel Optimized TensorFlow 2.5.0 requires setting the environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0 before running Neural Compressor quantization or deploying the quantized model.

Note: Starting with the official TensorFlow 2.6.0, oneDNN support has been upstreamed. Download the official TensorFlow 2.6.0 binary for the CPU device and set the environment variable TF_ENABLE_ONEDNN_OPTS=1 before running the quantization process or deploying the quantized model.
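The two notes above can be applied from Python, as long as the variables are set before TensorFlow is imported. The following is a minimal sketch, not part of Neural Compressor: the helper name tf_env_for and its intel_optimized flag are hypothetical, and the version rules simply restate the two notes.

```python
import os


def tf_env_for(version: str, intel_optimized: bool) -> dict:
    """Return the environment variables the notes above call for.

    Hypothetical helper: `version` is a dotted TensorFlow version string and
    `intel_optimized` says whether the Intel Optimized build is in use.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    if intel_optimized and (major, minor) == (2, 5):
        return {"TF_ENABLE_MKL_NATIVE_FORMAT": "0"}
    if not intel_optimized and (major, minor) >= (2, 6):
        return {"TF_ENABLE_ONEDNN_OPTS": "1"}
    return {}


# Set the variables first; import tensorflow only after this line.
os.environ.update(tf_env_for("2.6.0", intel_optimized=False))
```

Setting the variables in the shell before launching Python works equally well; the point is only that they must be in place before the framework loads.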

Installation

Select the installation based on your operating system.

Linux Installation

You can install Neural Compressor using one of three options: install just the library from binary or from source, or get the Intel-optimized frameworks together with the library by installing the Intel® oneAPI AI Analytics Toolkit.

Prerequisites

The following prerequisites and requirements must be satisfied for a successful installation:

  • Python version: 3.7, 3.8, or 3.9

  • C++ compiler: 7.2.1 or above

  • CMake: 3.12 or above
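The prerequisites above can be checked before installing. This is a rough sketch using only the standard library; the helper names python_supported and version_at_least are made up for illustration, and the minimum versions are the ones listed above.

```python
import sys

# Python versions listed in the prerequisites.
SUPPORTED_PYTHONS = {(3, 7), (3, 8), (3, 9)}


def python_supported(version_info=sys.version_info) -> bool:
    """True if the (major, minor) pair is one of the supported versions."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHONS


def version_at_least(found: str, minimum: str) -> bool:
    """Compare dotted numeric versions such as '7.2.1' or '3.12'."""
    def to_tuple(text):
        return tuple(int(part) for part in text.split("."))
    return to_tuple(found) >= to_tuple(minimum)


print("Python OK:", python_supported())
# The compiler and CMake versions would come from your own toolchain output,
# for example version_at_least("3.16.3", "3.12") for CMake.
```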

Common build issues

Issue 1: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Solution: Reinstall pycocotools with "pip install pycocotools --no-cache-dir".

Issue 2: ImportError: libGL.so.1: cannot open shared object file: No such file or directory

Solution: Install OpenCV with your system package manager (apt install or yum install).

Option 1 Install from binary

# install stable version from pip
pip install neural-compressor

# install nightly version from pip
pip install -i https://test.pypi.org/simple/ neural-compressor

# install stable version from conda
conda install neural-compressor -c conda-forge -c intel 

Option 2 Install from source

git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
git submodule sync
git submodule update --init --recursive
pip install -r requirements.txt
python setup.py install

Option 3 Install from AI Kit

The Intel® Neural Compressor library is released as part of the Intel® oneAPI AI Analytics Toolkit (AI Kit). The AI Kit provides a consolidated package of Intel's latest deep learning and machine optimizations all in one place for ease of development. Along with Neural Compressor, the AI Kit includes Intel-optimized versions of deep learning frameworks (such as TensorFlow and PyTorch) and high-performing Python libraries to streamline end-to-end data science and AI workflows on Intel architectures.

The AI Kit is distributed through many common channels, including from Intel's website, YUM, APT, Anaconda, and more. Select and download the AI Kit distribution package that's best suited for you and follow the Get Started Guide for post-installation instructions.

Download AI Kit
AI Kit Get Started Guide

Windows Installation

Prerequisites

The following prerequisites and requirements must be satisfied for a successful installation:

  • Python version: 3.7, 3.8, or 3.9

  • Download and install anaconda.

  • Create a virtual environment named nc in anaconda:

    # Here we install python 3.7 for instance. You can also choose python 3.8 or 3.9.
    conda create -n nc python=3.7
    conda activate nc
    

Installation options

Option 1 Install from binary

# install stable version from pip
pip install neural-compressor

# install nightly version from pip
pip install -i https://test.pypi.org/simple/ neural-compressor

# install from conda
conda install pycocotools -c esri   
conda install neural-compressor -c conda-forge -c intel

Option 2 Install from source

git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
git submodule sync
git submodule update --init --recursive
pip install -r requirements.txt
python setup.py install
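After any of the installation options, a quick check confirms that the package is visible to the active Python environment. This sketch assumes the import name is neural_compressor (matching the distribution file names) and uses only the standard library, so it does not depend on any Neural Compressor API.

```python
import importlib.util


def is_installed(module_name: str = "neural_compressor") -> bool:
    """Return True if the module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


print("neural_compressor installed:", is_installed())
```

If this prints False inside the conda environment created above, the install most likely went into a different environment.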

Documentation

Get Started

  • APIs explains Intel® Neural Compressor's API.
  • GUI provides a web-based UI service to make quantization easier.
  • Transform introduces how to utilize Neural Compressor's built-in data processing and how to develop a custom data processing method.
  • Dataset introduces how to utilize Neural Compressor's built-in dataset and how to develop a custom dataset.
  • Metric introduces how to utilize Neural Compressor's built-in metrics and how to develop a custom metric.
  • Tutorial provides comprehensive instructions on how to utilize Neural Compressor's features with examples.
  • Examples are provided to demonstrate the usage of Neural Compressor in different frameworks: TensorFlow, PyTorch, MXNet, and ONNX Runtime.
  • Intel oneAPI AI Analytics Toolkit Get Started Guide explains the AI Kit components, installation and configuration guides, and instructions for building and running sample apps.
  • AI and Analytics Samples includes code samples for Intel oneAPI libraries.

Deep Dive

  • Quantization refers to processes that enable inference and training by performing computations in low-precision data types, such as fixed-point integers. Neural Compressor supports Post-Training Quantization (PTQ) with different quantization capabilities and Quantization-Aware Training (QAT). Note that Dynamic Quantization currently has limited support.
  • Pruning provides a common method for introducing sparsity in weights and activations.
  • Knowledge Distillation provides a common method for distilling knowledge from a teacher model to a student model.
  • Distributed Training introduces how to leverage Horovod to do multi-node training in Intel® Neural Compressor to speed up the training time.
  • Benchmarking introduces how to utilize the benchmark interface of Neural Compressor.
  • Mixed Precision introduces how to enable mixed precision, including BF16, INT8, and FP32, on Intel platforms during tuning.
  • Graph Optimization introduces how to enable graph optimization for FP32 and auto-mixed precision.
  • Model Conversion introduces how to convert a TensorFlow QAT model to a quantized model running on Intel platforms.
  • TensorBoard provides tensor histograms and execution graphs for tuning debugging purposes.
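To make the Quantization entry above concrete, here is a minimal pure-Python sketch of symmetric per-tensor INT8 quantization. It illustrates the general idea of computing in a low-precision integer type; it is not Neural Compressor's implementation, and the function names are chosen for this example only.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# Each restored value differs from the original by at most half a step.
for original, approx in zip(weights, restored):
    assert abs(original - approx) <= scale / 2 + 1e-9
```

The accuracy difference between INT8 and FP32 models comes from exactly this rounding error, which is why tuning against an accuracy criterion matters.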

Advanced Topics

  • Execution Engine is a bare-metal solution for domain-specific NLP models, provided as a reference for customers.
  • Adaptor is the interface between components and the framework. The method for developing an adaptor extension is introduced with ONNX Runtime as an example.
  • Strategy automatically optimizes low-precision recipes for deep learning models to achieve optimal product objectives, such as inference performance and memory usage, within the expected accuracy criteria. The method for developing a new strategy is introduced.
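The idea behind an accuracy-driven strategy can be sketched as a simple search loop: try candidate low-precision recipes in order of preference and stop at the first one that meets the accuracy criterion. The code below is a conceptual sketch only. The function tune, the candidate names, and the toy accuracy numbers are all invented for illustration and do not reflect Neural Compressor's actual strategy classes.

```python
def tune(candidates, evaluate, baseline_accuracy, max_relative_loss=0.01):
    """Return the first candidate whose relative accuracy loss is acceptable."""
    for config in candidates:
        accuracy = evaluate(config)
        relative_loss = (baseline_accuracy - accuracy) / baseline_accuracy
        if relative_loss <= max_relative_loss:
            return config, accuracy
    return None, None


# Toy stand-in for a real evaluation run (made-up accuracies).
toy_accuracy = {
    "int8_all_ops": 0.700,
    "int8_except_last_layer": 0.757,
    "fp32": 0.760,
}

# Candidates are ordered from most to least aggressive.
best, best_accuracy = tune(
    candidates=["int8_all_ops", "int8_except_last_layer", "fp32"],
    evaluate=toy_accuracy.get,
    baseline_accuracy=0.760,
)
print(best, best_accuracy)
```

A real strategy differs mainly in how it orders and generates candidates; the stopping rule against an accuracy criterion is the part this sketch is meant to show.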

Publications

For the full publication list, please refer to here.

System Requirements

Intel® Neural Compressor supports systems based on Intel 64 architecture or compatible processors and is specially optimized for the following CPUs:

  • Intel Xeon Scalable processor (formerly Skylake, Cascade Lake, Cooper Lake, and Ice Lake)
  • Future Intel Xeon Scalable processor (code name Sapphire Rapids)

Intel® Neural Compressor requires installing the Intel-optimized framework version for the supported DL framework you use: TensorFlow, PyTorch, MXNet, or ONNX Runtime.

Note: Intel Neural Compressor supports Intel-optimized and official frameworks for some TensorFlow versions. Refer to Supported Frameworks for specifics.

Validated Hardware/Software Environment

  • Platform: Cascade Lake, Cooper Lake, Skylake, Ice Lake
  • OS: CentOS 8.3, Ubuntu 18.04
  • Python: 3.7, 3.8, 3.9
  • Framework versions:
      TensorFlow: 2.8.0, 2.7.0, 2.6.2, 1.15.0UP3
      PyTorch: 1.10.0+cpu, 1.9.0+cpu, 1.8.0+cpu, IPEX
      MXNet: 1.8.0, 1.7.0, 1.6.0
      ONNX Runtime: 1.10.0, 1.9.0, 1.8.0

Validated Models

Intel® Neural Compressor provides numerous examples that show small accuracy loss alongside the best performance gain. A full list of quantized models on various frameworks is available in the Model List.

Validated MLPerf Models

Model | Framework | Support | Example
ResNet50 v1.5 | TensorFlow | Yes | Link
ResNet50 v1.5 | PyTorch | Yes | Link
DLRM | PyTorch | Yes | Link
BERT-large | TensorFlow | Yes | Link
BERT-large | PyTorch | Yes | Link
SSD-ResNet34 | TensorFlow | Yes | Link
SSD-ResNet34 | PyTorch | Yes | Link
RNN-T | PyTorch | Yes | Link
3D-UNet | TensorFlow | WIP |
3D-UNet | PyTorch | Yes | Link

Validated Quantized Models

Performance is throughput in samples/sec measured on ICX8380 (1s4c10ins1bs). Acc Ratio is (INT8-FP32)/FP32 and Performance Ratio is INT8/FP32.

Framework | Version | Model | INT8 Accuracy | FP32 Accuracy | Acc Ratio | INT8 Throughput | FP32 Throughput | Performance Ratio
tensorflow | 2.6.0 | resnet50v1.0 | 74.11% | 74.27% | -0.22% | 1287.00 | 495.29 | 2.60x
tensorflow | 2.6.0 | resnet50v1.5 | 76.82% | 76.46% | 0.47% | 1218.03 | 420.34 | 2.90x
tensorflow | 2.6.0 | resnet101 | 77.50% | 76.45% | 1.37% | 849.62 | 345.54 | 2.46x
tensorflow | 2.6.0 | inception_v1 | 70.48% | 69.74% | 1.06% | 2202.64 | 1058.20 | 2.08x
tensorflow | 2.6.0 | inception_v2 | 74.36% | 73.97% | 0.53% | 1751.31 | 827.81 | 2.11x
tensorflow | 2.6.0 | inception_v3 | 77.28% | 76.75% | 0.69% | 868.06 | 384.17 | 2.26x
tensorflow | 2.6.0 | inception_v4 | 80.40% | 80.27% | 0.16% | 569.48 | 197.28 | 2.89x
tensorflow | 2.6.0 | inception_resnet_v2 | 80.44% | 80.40% | 0.05% | 269.03 | 137.25 | 1.96x
tensorflow | 2.6.0 | mobilenetv1 | 71.79% | 70.96% | 1.17% | 3831.42 | 1189.06 | 3.22x
tensorflow | 2.6.0 | mobilenetv2 | 71.79% | 71.76% | 0.04% | 2570.69 | 1237.62 | 2.07x
tensorflow | 2.6.0 | ssd_resnet50_v1 | 37.86% | 38.00% | -0.37% | 65.52 | 24.01 | 2.73x
tensorflow | 2.6.0 | ssd_mobilenet_v1 | 22.97% | 23.13% | -0.69% | 842.46 | 404.04 | 2.08x
tensorflow | 2.6.0 | ssd_resnet34 | 21.69% | 22.09% | -1.81% | 41.23 | 10.75 | 3.83x

Framework | Version | Model | INT8 Accuracy | FP32 Accuracy | Acc Ratio | INT8 Throughput | FP32 Throughput | Performance Ratio
pytorch | 1.9.0+cpu | resnet18 | 69.59% | 69.76% | -0.24% | 692.04 | 363.64 | 1.90x
pytorch | 1.9.0+cpu | resnet50 | 76.00% | 76.13% | -0.17% | 453.10 | 186.67 | 2.43x
pytorch | 1.9.0+cpu | resnext101_32x8d | 79.02% | 79.31% | -0.36% | 196.27 | 70.08 | 2.80x
pytorch | 1.9.0+cpu | bert_base_mrpc | 88.12% | 88.73% | -0.69% | 199.32 | 107.34 | 1.86x
pytorch | 1.9.0+cpu | bert_base_cola | 59.06% | 58.84% | 0.37% | 198.53 | 105.29 | 1.89x
pytorch | 1.9.0+cpu | bert_base_sts-b | 88.72% | 89.27% | -0.62% | 203.29 | 107.03 | 1.90x
pytorch | 1.9.0+cpu | bert_base_sst-2 | 91.74% | 91.86% | -0.13% | 197.86 | 105.31 | 1.88x
pytorch | 1.9.0+cpu | bert_base_rte | 70.40% | 69.68% | 1.04% | 192.90 | 107.25 | 1.80x
pytorch | 1.9.0+cpu | bert_large_mrpc | 87.66% | 88.33% | -0.75% | 94.08 | 33.84 | 2.78x
pytorch | 1.9.0+cpu | bert_large_squad | 92.69 | 93.05 | -0.38% | 20.93 | 11.18 | 1.87x
pytorch | 1.9.0+cpu | bert_large_qnli | 91.12% | 91.82% | -0.76% | 93.75 | 33.73 | 2.78x
pytorch | 1.9.0+cpu | bert_large_rte | 72.20% | 72.56% | -0.50% | 52.80 | 33.62 | 1.57x
pytorch | 1.9.0+cpu | bert_large_cola | 62.07% | 62.57% | -0.80% | 94.97 | 33.77 | 2.81x
pytorch | 1.9.0+cpu | inception_v3 | 69.48% | 69.54% | -0.09% | 418.59 | 207.77 | 2.01x
pytorch | 1.9.0+cpu | peleenet | 71.61% | 72.08% | -0.66% | 461.47 | 359.58 | 1.28x
pytorch | 1.9.0+cpu | yolo_v3 | 24.50% | 24.54% | -0.17% | 98.11 | 37.50 | 2.62x
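The two ratio columns follow directly from the formulas in the table header, (INT8-FP32)/FP32 for accuracy and INT8/FP32 for performance. As a worked check, this short sketch recomputes them for the tensorflow resnet50v1.0 row; the function names are chosen for this example.

```python
def accuracy_ratio(int8_accuracy, fp32_accuracy):
    """Relative accuracy change: (INT8 - FP32) / FP32."""
    return (int8_accuracy - fp32_accuracy) / fp32_accuracy


def performance_ratio(int8_throughput, fp32_throughput):
    """Throughput speedup: INT8 / FP32."""
    return int8_throughput / fp32_throughput


# Values from the tensorflow 2.6.0 resnet50v1.0 row.
acc = accuracy_ratio(74.11, 74.27)
perf = performance_ratio(1287.00, 495.29)
print(f"{acc:.2%} {perf:.2f}x")  # prints "-0.22% 2.60x"
```

A negative accuracy ratio means the INT8 model is slightly less accurate than FP32, while a positive one means it scored slightly higher on the validation set.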

Validated Pruning Models

Gradient sensitivity pruning with 20% sparsity, and ONNX dynamic quantization applied on the pruned model:

Tasks | FWK | Model | fp32 baseline | Pruned accuracy% | Pruned drop% | Pruned perf gain (sample/s) | Pruned + quantized accuracy% | Pruned + quantized drop% | Pruned + quantized perf gain (sample/s)
SST-2 | pytorch | bert-base | accuracy = 92.32 | accuracy = 91.97 | -0.38 | 1.30x | accuracy = 92.20 | -0.13 | 1.86x
QQP | pytorch | bert-base | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [89.97, 86.54] | [-1.24, -1.71] | 1.32x | [accuracy, f1] = [89.75, 86.60] | [-1.48, -1.65] | 1.81x

Pattern Lock on 70% Unstructured Sparsity and on 50% 1:2 Structured Sparsity:

Tasks | FWK | Model | fp32 baseline | 70% unstructured accuracy% | 70% unstructured drop% | 50% 1:2 structured accuracy% | 50% 1:2 structured drop%
MNLI | pytorch | bert-base | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51, -1.80] | [m, mm] = [83.20, 84.11] | [-1.62, -0.80]
SST-2 | pytorch | bert-base | accuracy = 92.32 | accuracy = 91.51 | -0.88 | accuracy = 92.20 | -0.13
QQP | pytorch | bert-base | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68, -1.12] | [accuracy, f1] = [90.92, 87.78] | [-0.20, -0.31]
QNLI | pytorch | bert-base | accuracy = 91.54 | accuracy = 90.39 | -1.26 | accuracy = 90.87 | -0.73
QnA | pytorch | bert-base | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61, -1.54] | [em, f1] = [78.03, 86.50] | [-1.65, -0.69]

Framework | Model | fp32 baseline | Compression | dataset | acc(drop)%
Pytorch | resnet18 | 69.76 | 30% sparsity on magnitude | ImageNet | 69.47(-0.42)
Pytorch | resnet18 | 69.76 | 30% sparsity on gradient sensitivity | ImageNet | 68.85(-1.30)
Pytorch | resnet50 | 76.13 | 30% sparsity on magnitude | ImageNet | 76.11(-0.03)
Pytorch | resnet50 | 76.13 | 30% sparsity on magnitude and post training quantization | ImageNet | 76.01(-0.16)
Pytorch | resnet50 | 76.13 | 30% sparsity on magnitude and quantization aware training | ImageNet | 75.90(-0.30)
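The "30% sparsity on magnitude" entries refer to magnitude pruning, where the weights with the smallest absolute values are set to zero until the sparsity goal is reached. The sketch below shows the idea on a plain Python list. It is a simplified illustration with made-up weights, not the pruning algorithm as implemented in Neural Compressor.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    n_prune = round(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for index in order[:n_prune]:
        pruned[index] = 0.0
    return pruned


# Made-up example weights.
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.3, 0.6, 0.08, -1.1]
pruned = magnitude_prune(weights, sparsity=0.3)
print(pruned)
```

With ten weights and a 30% goal, the three smallest-magnitude entries (0.01, -0.05, 0.08) become zero and the rest are left unchanged.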

Validated Knowledge Distillation Examples

Example Name | Dataset | Student (Accuracy) | Teacher (Accuracy) | Student With Distillation (Accuracy Improvement)
ResNet example | ImageNet | ResNet18 (0.6739) | ResNet50 (0.7399) | 0.6845 (0.0106)
BlendCnn example | MRPC | BlendCnn (0.7034) | BERT-Base (0.8382) | 0.7034 (0)
BiLSTM example | SST-2 | BiLSTM (0.7913) | RoBERTa-Base (0.9404) | 0.8085 (0.0172)
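In the examples above, the student is trained to match the teacher's output distribution as well as the ground-truth labels. A common formulation, shown here as a pure-Python sketch, blends a cross-entropy term with a temperature-softened KL divergence term. This is a generic textbook loss offered as an assumption about the general technique; the temperature and alpha defaults are illustrative and are not taken from the Neural Compressor examples.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    hard = -math.log(softmax(student_logits)[label])
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(teacher_probs, student_probs))
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


teacher = [2.0, 0.5, -1.0]
close_student = [1.8, 0.6, -0.9]
far_student = [-1.0, 0.5, 2.0]
print(distillation_loss(close_student, teacher, label=0))
print(distillation_loss(far_student, teacher, label=0))
```

A student whose logits are close to the teacher's gets a lower loss than one that disagrees with it, which is the signal that drives the accuracy improvement reported in the last column.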

Validated Engine Examples

Performance is throughput in samples/sec measured on ICX8380 with seq_len128, in two configurations: 1s4c10ins1bs and 2s4c20ins64bs. Acc Ratio is (INT8-FP32)/FP32 and Performance Ratio is INT8/FP32.

Model | INT8 Accuracy | FP32 Accuracy | Acc Ratio | INT8 (1s4c10ins1bs) | FP32 (1s4c10ins1bs) | Performance Ratio (1s4c10ins1bs) | INT8 (2s4c20ins64bs) | FP32 (2s4c20ins64bs) | Performance Ratio (2s4c20ins64bs)
bert_large_squad | 90.74 | 90.87 | -0.14% | 44.9 | 12.33 | 3.64x | 362.21 | 88.38 | 4.10x
distilbert_base_uncased_sst2 | 90.14% | 90.25% | -0.12% | 1003.01 | 283.69 | 3.54x | 2104.26 | 606.58 | 3.47x
minilm_l6_h384_uncased_sst2 | 89.33% | 90.14% | -0.90% | 2739.73 | 999 | 2.74x | 5389.98 | 2333.14 | 2.31x
roberta_base_mrpc | 89.46% | 88.97% | 0.55% | 506.07 | 142.13 | 3.56x | 1167.09 | 311.5 | 3.75x
bert_base_nli_mean_tokens_stsb | 89.27% | 89.55% | -0.31% | 503.52 | 140.98 | 3.57x | 1096.46 | 332.54 | 3.30x
bert_base_sparse_mrpc | 70.34% | 70.59% | -0.35% | 506.59 | 142.33 | 3.56x | 1133.04 | 339.96 | 3.33x
distilroberta_base_wnli | 56.34% | 56.34% | 0.00% | 1026.69 | 290.7 | 3.53x | 2309.9 | 620.81 | 3.72x
paraphrase_xlm_r_multilingual_v1_stsb | 86.72% | 87.23% | -0.58% | 509.68 | 142.73 | 3.57x | 1169.45 | 311.59 | 3.75x
distilbert_base_uncased_mrpc | 84.07% | 84.07% | 0.00% | 1002 | 280.27 | 3.58x | 2107.96 | 606.95 | 3.47x
finbert_financial_phrasebank | 82.74% | 82.80% | -0.07% | 919.12 | 272.48 | 3.37x | 1101.13 | 331.88 | 3.32x
distilbert_base_uncased_emotion | 93.85% | 94.20% | -0.37% | 1003.01 | 283.53 | 3.54x | 2103.22 | 607.08 | 3.46x

Additional Content

Hiring

We are hiring. Please send your resume to inc.maintainers@intel.com if you are interested in model compression techniques.
