Skip to main content

Quantization package for KerasV3

Project description

qkeras

Installation

pip install qkeras-v3

Introduction

QKeras is a quantization extension to Keras that provides drop-in replacement for some of the Keras layers, especially the ones that creates parameters and activation layers, and perform arithmetic operations, so that we can quickly create a deep quantized version of Keras network.

According to Tensorflow documentation, Keras is a high-level API to build and train deep learning models. It's used for fast prototyping, advanced research, and production, with three key advantages:

  • User friendly

Keras has a simple, consistent interface optimized for common use cases. It provides clear and actionable feedback for user errors.

  • Modular and composable

Keras models are made by connecting configurable building blocks together, with few restrictions.

  • Easy to extend

Write custom building blocks to express new ideas for research. Create new layers, loss functions, and develop state-of-the-art models.

QKeras is being designed to extend the functionality of Keras using Keras' design principle, i.e. being user friendly, modular and extensible, adding to it being "minimally intrusive" of Keras native functionality.

In order to successfully quantize a model, users need to replace variable creating layers (Dense, Conv2D, etc) by their counterparts (QDense, QConv2D, etc), and any layers that perform math operations need to be quantized afterwards.

Publications

  • Claudionor N. Coelho Jr, Aki Kuusela, Shan Li, Hao Zhuang, Jennifer Ngadiuba, Thea Klaeboe Aarrestad, Vladimir Loncar, Maurizio Pierini, Adrian Alan Pol, Sioni Summers, "Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors", Nature Machine Intelligence (2021), https://www.nature.com/articles/s42256-021-00356-5

  • Claudionor N. Coelho Jr., Aki Kuusela, Hao Zhuang, Thea Aarrestad, Vladimir Loncar, Jennifer Ngadiuba, Maurizio Pierini, Sioni Summers, "Ultra Low-latency, Low-area Inference Accelerators using Heterogeneous Deep Quantization with QKeras and hls4ml", http://arxiv.org/abs/2006.10159v1

  • Erwei Wang, James J. Davis, Daniele Moro, Piotr Zielinski, Claudionor Coelho, Satrajit Chatterjee, Peter Y. K. Cheung, George A. Constantinides, "Enabling Binary Neural Network Training on the Edge", https://arxiv.org/abs/2102.04270

Layers Implemented in QKeras

  • QDense

  • QConv1D

  • QConv2D

  • QDepthwiseConv2D

  • QSeparableConv1D (depthwise + pointwise convolution, without quantizing the activation values after the depthwise step)

  • QSeparableConv2D (depthwise + pointwise convolution, without quantizing the activation values after the depthwise step)

  • QMobileNetSeparableConv2D (extended from MobileNet SeparableConv2D implementation, quantizes the activation values after the depthwise step)

  • QConv2DTranspose

  • QActivation

  • QAdaptiveActivation

  • QAveragePooling2D (in fact, an AveragePooling2D stacked with a QActivation layer for quantization of the result)

  • QBatchNormalization (is still in its experimental stage, as we have not seen the need to use this yet due to the normalization and regularization effects of stochastic activation functions.)

  • QOctaveConv2D

  • QSimpleRNN, QSimpleRNNCell

  • QLSTM, QLSTMCell

  • QGRU, QGRUCell

  • QBidirectional

It is worth noting that not all functionality is safe at this time to be used with other high-level operations, such as with layer wrappers. For example, Bidirectional layer wrappers are used with RNNs. If this is required, we encourage users to use quantization functions invoked as strings instead of the actual functions as a way through this, but we may change that implementation in the future.

A first attempt to create a safe mechanism in QKeras is the adoption of QActivation is a wrap-up that provides an encapsulation around the activation functions so that we can save and restore the network architecture, and duplicate them using Keras interface, but this interface has not been fully tested yet.

Activation Layers Implemented in QKeras

  • smooth_sigmoid(x)

  • hard_sigmoid(x)

  • binary_sigmoid(x)

  • binary_tanh(x)

  • smooth_tanh(x)

  • hard_tanh(x)

  • quantized_bits(bits=8, integer=0, symmetric=0, keep_negative=1)(x)

  • bernoulli(alpha=1.0)(x)

  • stochastic_ternary(alpha=1.0, threshold=0.33)(x)

  • ternary(alpha=1.0, threshold=0.33)(x)

  • stochastic_binary(alpha=1.0)(x)

  • binary(alpha=1.0)(x)

  • quantized_relu(bits=8, integer=0, use_sigmoid=0, negative_slope=0.0)(x)

  • quantized_ulaw(bits=8, integer=0, symmetric=0, u=255.0)(x)

  • quantized_tanh(bits=8, integer=0, symmetric=0)(x)

  • quantized_po2(bits=8, max_value=-1)(x)

  • quantized_relu_po2(bits=8, max_value=-1)(x)

The stochastic_* functions, bernoulli as well as quantized_relu and quantized_tanh rely on stochastic versions of the activation functions. They draw a random number with uniform distribution from _hard_sigmoid of the input x, and result is based on the expected value of the activation function. Please refer to the papers if you want to understand the underlying theory, or the documentation in qkeras/qlayers.py.

The parameters "bits" specify the number of bits for the quantization, and "integer" specifies how many bits of "bits" are to the left of the decimal point. Finally, our experience in training networks with QSeparableConv2D, both quantized_bits and quantized_tanh that generates values between [-1, 1), required symmetric versions of the range in order to properly converge and eliminate the bias.

Every time we use a quantization for weights and bias that can generate numbers outside the range [-1.0, 1.0], we need to adjust the *_range to the number. For example, if we have a quantized_bits(bits=6, integer=2) in a weight of a layer, we need to set the weight range to 2**2, which is equivalent to Catapult HLS ac_fixed<6, 3, true>. Similarly, for quantization functions that accept an alpha parameter, we need to specify a range of alpha, and for po2 type of quantizers, we need to specify the range of max_value.

Example

Suppose you have the following network.

An example of a very simple network is given below in Keras.

from keras.layers import *

x = x_in = Input(shape)
x = Conv2D(18, (3, 3), name="first_conv2d")(x)
x = Activation("relu")(x)
x = SeparableConv2D(32, (3, 3))(x)
x = Activation("relu")(x)
x = Flatten()(x)
x = Dense(NB_CLASSES)(x)
x = Activation("softmax")(x)

You can easily quantize this network as follows:

from keras.layers import *
from qkeras import *

x = x_in = Input(shape)
x = QConv2D(18, (3, 3),
        kernel_quantizer="stochastic_ternary",
        bias_quantizer="ternary", name="first_conv2d")(x)
x = QActivation("quantized_relu(3)")(x)
x = QSeparableConv2D(32, (3, 3),
        depthwise_quantizer=quantized_bits(4, 0, 1),
        pointwise_quantizer=quantized_bits(3, 0, 1),
        bias_quantizer=quantized_bits(3),
        depthwise_activation=quantized_tanh(6, 2, 1))(x)
x = QActivation("quantized_relu(3)")(x)
x = Flatten()(x)
x = QDense(NB_CLASSES,
        kernel_quantizer=quantized_bits(3),
        bias_quantizer=quantized_bits(3))(x)
x = QActivation("quantized_bits(20, 5)")(x)
x = Activation("softmax")(x)

The last QActivation is advisable if you want to compare results later on. Please find more cases under the directory examples.

QTools

The purpose of QTools is to assist hardware implementation of the quantized model and model energy consumption estimation. QTools has two functions: data type map generation and energy consumption estimation.

  • Data Type Map Generation: QTools automatically generate the data type map for weights, bias, multiplier, adder, etc. of each layer. The data type map includes operation type, variable size, quantizer type and bits, etc. Input of the QTools is:
  1. a given quantized model;
  2. a list of input quantizers for the model. Output of QTools json file that list the data type map of each layer (stored in qtools_instance._output_dict) Output methods include: qtools_stats_to_json, which is to output the data type map to a json file; qtools_stats_print which is to print out the data type map.
  • Energy Consumption Estimation: Another function of QTools is to estimate the model energy consumption in Pico Joules (pJ). It provides a tool for QKeras users to quickly estimate energy consumption for memory access and MAC operations in a quantized model derived from QKeras, especially when comparing power consumption of two models running on the same device.

As with any high-level model, it should be used with caution when attempting to estimate the absolute energy consumption of a model for a given technology, or when attempting to compare different technologies.

This tool also provides a measure for model tuning which needs to consider both accuracy and model energy consumption. The energy cost provided by this tool can be integrated into a total loss function which combines energy cost and accuracy.

  • Energy Model: The best work referenced by the literature on energy consumption was first computed by Horowitz M.: “1.1 computing’s energy problem ( and what we can do about it)”; IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014

In this work, the author attempted to estimate the energy consumption for accelerators, and for 45 nm process, the data points he presented has since been used whenever someone wants to compare accelerator performance. QTools energy consumption on a 45nm process is based on the data published in this work.

  • Examples: Example of how to generate data type map can be found in qkeras/qtools/ examples/example_generate_json.py. Example of how to generate energy consumption estimation can be found in qkeras/qtools/examples/example_get_energy.py

Linting

We use Ruff or linting (Pyflakes, pycodestyle, pyupgrade, isort, Pylint rules). The config lives in pyproject.toml, so you can run it from the project root:

ruff check --fix .

Related Work

QKeras has been implemented based on the work of "B.Moons et al. - Minimum Energy Quantized Neural Networks", Asilomar Conference on Signals, Systems and Computers, 2017 and "Zhou, S. et al. - DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients," but the framework should be easily extensible. The original code from QNN can be found below.

https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow

QKeras extends QNN by providing a richer set of layers (including SeparableConv2D, DepthwiseConv2D, ternary and stochastic ternary quantizations), besides some functions to aid the estimation for the accumulators and conversion between non-quantized to quantized networks. Finally, our main goal is easy of use, so we attempt to make QKeras layers a true drop-in replacement for Keras, so that users can easily exchange non-quantized layers by quantized ones.

Acknowledgements

Portions of QKeras were derived from QNN.

https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow

Copyright (c) 2017, Bert Moons where it applies

Fork notice

This repository is a hard fork of the original QKeras project. The upstream project appears unmaintained, so this fork is independently maintained and not affiliated with the original authors or their organizations. The PyPI distribution is published as qkeras-v3 and the import namespace is qkeras. We aim to keep reasonable compatibility while updating dependencies (Keras/TF) and fixing issues; some breaking changes are documented in the CHANGELOG. Licensed under Apache-2.0. See LICENSE and NOTICE for attribution and details of modifications.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

qkeras_v3-1.1.1.tar.gz (226.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

qkeras_v3-1.1.1-py3-none-any.whl (317.4 kB view details)

Uploaded Python 3

File details

Details for the file qkeras_v3-1.1.1.tar.gz.

File metadata

  • Download URL: qkeras_v3-1.1.1.tar.gz
  • Upload date:
  • Size: 226.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for qkeras_v3-1.1.1.tar.gz
Algorithm Hash digest
SHA256 f252e46ec5cbe0a43f735f0b289a22adf70c7d1b04845d982bb043c19ea363c9
MD5 26487dd2519ee0f93178cd4f6295bef2
BLAKE2b-256 f0546eaec0dc5c0fcda3eed5e0a1f9c12b627bfce53905aca3fb779a710aaf9d

See more details on using hashes here.

Provenance

The following attestation bundles were made for qkeras_v3-1.1.1.tar.gz:

Publisher: python-publish.yml on fastmachinelearning/qkerasV3

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file qkeras_v3-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: qkeras_v3-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 317.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for qkeras_v3-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6a99a8ba422c7df1e2a318014e8299d7d29a36d00ac403c8698907b82534a720
MD5 3532288d83e32e4e5f0413b0aa3ff7ce
BLAKE2b-256 48deb56753703b56d32a971893bcf370ee4b54d5681a249a30e0c7bf1cb070df

See more details on using hashes here.

Provenance

The following attestation bundles were made for qkeras_v3-1.1.1-py3-none-any.whl:

Publisher: python-publish.yml on fastmachinelearning/qkerasV3

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page