
An NNIE quantization-aware training tool on PyTorch.

This is a quantization-aware training package for the Neural Network Inference Engine (NNIE) on PyTorch. It uses the HiSilicon quantization library to quantize a module's weights and input data, storing the quantized values in fake fp32 format. To train a model that is more friendly to NNIE, simply import nnieqat and replace the default torch.nn modules with the corresponding quantized ones. A conceptual sketch of fake quantization follows.
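For intuition, here is a minimal sketch of the fake-quantization idea: values are snapped to an 8-bit grid but stored as fp32 tensors, so the usual fp32 training pipeline keeps working. This is a generic uniform quantizer for illustration only; the actual HiSilicon algorithm used by nnieqat differs.

    import torch

    def fake_quant(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
        # Illustrative uniform fake quantization (not HiSilicon's scheme):
        # round to a symmetric grid of 2**num_bits levels, return fp32 values.
        qmax = 2 ** (num_bits - 1) - 1                 # 127 for 8 bits
        scale = x.abs().max().clamp(min=1e-8) / qmax   # per-tensor scale
        return (x / scale).round().clamp(-qmax - 1, qmax) * scale

    w = torch.randn(3, 3)
    print(fake_quant(w))  # same shape and dtype as w, only 256 distinct levels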

Table of Contents

  1. Installation
  2. Usage
  3. Code Examples
  4. Results
  5. Todo
  6. Reference


Installation

  • Supported Platforms: Linux
  • Accelerators and GPUs: NVIDIA GPUs via CUDA driver 10.
  • Dependencies:
    • python >= 3.5, < 4
    • llvmlite >= 0.31.0
    • pytorch >= 1.0
    • numba >= 0.42.0
    • numpy >= 1.18.1
  • Install nnieqat from PyPI: $ pip install nnieqat (a quick sanity check follows)
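After installing (assuming a Linux machine with a supported NVIDIA GPU), importing the package is enough to confirm the install:

    $ pip install nnieqat
    $ python -c "import nnieqat"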


Usage

  • Replace the default modules with the NNIE quantization optimized ones, including:

    • torch.nn.modules.conv -> nnieqat.modules.conv
    • torch.nn.modules.linear -> nnieqat.modules.linear
    • torch.nn.modules.pooling -> nnieqat.modules.pooling
    from nnieqat.modules import convert_layers

    model = convert_layers(model)
    print(model)  # Quantized layers have "Quantized" prefix.
  • Freeze batch normalization (BN) after a few epochs of training

    from nnieqat.gpu.quantize import freeze_bn

    if epoch > 2:
        model.apply(freeze_bn)  # assumed usage: applied per-module after warm-up epochs
  • Unquantize weights before updating them

    from nnieqat.gpu.quantize import unquant_weight
    model.apply(unquant_weight)  # before optimizer.step(); assumed per-module usage
  • Dump a weight-quantized model (a combined end-to-end sketch follows this list)

    from nnieqat.gpu.quantize import quant_weight, unquant_weight
    model.apply(quant_weight)  # quantize weights in place before saving (assumed usage)
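Putting the pieces together, the following is a hedged sketch of how these calls might sit in an ordinary training loop. The loop structure, the dummy model and data, and the exact call sites are assumptions for illustration; only the imports and the convert_layers usage come from the steps above.

    import torch
    import torch.nn as nn
    from nnieqat.modules import convert_layers
    from nnieqat.gpu.quantize import freeze_bn, quant_weight, unquant_weight

    # Placeholder network; substitute your own model and data loader.
    model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU(),
                          nn.Flatten(), nn.Linear(8 * 30 * 30, 10))
    model = convert_layers(model).cuda()  # swap in quantized layers; nnieqat quantizes on the GPU
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(5):
        if epoch > 2:
            model.apply(freeze_bn)                  # freeze BN statistics after warm-up
        images = torch.randn(4, 3, 32, 32, device="cuda")   # dummy batch
        targets = torch.randint(0, 10, (4,), device="cuda")
        loss = criterion(model(images), targets)
        optimizer.zero_grad()
        loss.backward()
        model.apply(unquant_weight)                 # restore fp32 weights so the update is full precision
        optimizer.step()

    # Dump a weight-quantized checkpoint (assumed procedure):
    model.apply(quant_weight)
    torch.save(model.state_dict(), "quant_model.pth")
    model.apply(unquant_weight)                     # back to fp32 to continue training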

Code Examples


  • ImageNet

    python test/ /data/imgnet/ --arch squeezenet1_1  --lr 0.001 --pretrained --epoch 10   # nnie_lr_e-3_ft
    python /data/imgnet/ --arch squeezenet1_1  --lr 0.0001 --pretrained --epoch 10  # lr_e-4_ft
    python test/ /data/imgnet/ --arch squeezenet1_1  --lr 0.0001 --pretrained --epoch 10  # nnie_lr_e-4_ft

Results

  Finetune results (ImageNet top-1 accuracy):

    model            trt_fp32   trt_int8   nnie
    torchvision      0.56992    0.56424    0.56026
    nnie_lr_e-3_ft   0.56600    0.56328    0.56612
    lr_e-4_ft        0.57884    0.57502    0.57542
    nnie_lr_e-4_ft   0.57834    0.57524    0.57730


Todo

  • Multiple GPU training support.
  • Other platforms and accelerators support.
  • Generate quantized models directly.


Reference

  • HiSVP Quantization Library User Guide (HiSVP 量化库使用指南)

  • Quantizing deep convolutional networks for efficient inference: A whitepaper

  • 8-bit Inference with TensorRT

  • Distilling the Knowledge in a Neural Network
