Quantization package for KerasV3

Verified details (verified by PyPI)
- Maintainer: makoeppel

Unverified details
- License: OSI Approved :: Apache Software License
- Operating System: OS Independent
- Programming Language: Python :: 3

Project description

QKerasV3: Quantized Deep Learning for Keras 3

Installation

```shell
pip install qkeras-v3
```

Introduction

QKeras is a quantization extension to Keras that provides drop-in replacements for some of the Keras layers, especially those that create parameters, activation layers, and layers that perform arithmetic operations, so that we can quickly create a deep quantized version of a Keras network.

According to the TensorFlow documentation, Keras is a high-level API to build and train deep learning models. It's used for fast prototyping, advanced research, and production, with three key advantages:

User friendly

Keras has a simple, consistent interface optimized for common use cases. It provides clear and actionable feedback for user errors.

Modular and composable

Keras models are made by connecting configurable building blocks together, with few restrictions.

Easy to extend

Write custom building blocks to express new ideas for research. Create new layers, loss functions, and develop state-of-the-art models.

QKeras is designed to extend the functionality of Keras following Keras' design principles, i.e. being user friendly, modular and extensible, while also being "minimally intrusive" of native Keras functionality.

To quantize a model, users need to replace variable-creating layers (Dense, Conv2D, etc.) with their quantized counterparts (QDense, QConv2D, etc.), and any layers that perform math operations need to be quantized afterwards.

Layers Implemented in QKerasV3

The following matrix tracks multi-backend framework support for quantization-aware training (QAT) layers in qkerasV3.

| Layer Name | TensorFlow | JAX | PyTorch | Implementation Notes & Constraints |
|---|---|---|---|---|
| `QDense` | ✅ | ✅ | ✅ | |
| `QConv1D` | ✅ | ✅ | ✅ | |
| `QConv2D` | ✅ | ✅ | ✅ | |
| `QDepthwiseConv2D` | ✅ | ✅ | ✅ | |
| `QSeparableConv1D` | ✅ | ✅ | ✅ | |
| `QSeparableConv2D` | ✅ | ✅ | ✅ | |
| `QMobileNetSeparableConv2D` | ✅ | ✅ | ✅ | MobileNet-specific; explicitly quantizes activation values immediately after the depthwise step. TODO: needs a test. |
| `QConv2DTranspose` | ✅ | ✅ | ✅ | |
| `QActivation` | ✅ | ✅ | ✅ | |
| `QAdaptiveActivation` | ✅ | ✅ | ✅ | |
| `QAveragePooling2D` | ✅ | ✅ | ⚠️ | Combines `AveragePooling2D` with a `QActivation` layer. PyTorch lacks native asymmetric padding (`padding="same"`) for all shapes. |
| `QBatchNormalization` / `QConv2DBatchnorm` | ⚠️ | ⚠️ | ⚠️ | Experimental Stage: Stochastic activation functions often offset its regularization needs. JAX/Torch rely on Keras 3 epoch variable updates. |
| `QOctaveConv2D` | ✅ | ⚠️ | ⚠️ | Multi-frequency feature extraction relies on complex tensor splitting and slicing across backends. TODO: needs a test. |
| `QSimpleRNN` / `QSimpleRNNCell` | ✅ | ✅ | ✅ | |
| `QLSTM` / `QLSTMCell` | ✅ | ✅ | ✅ | |
| `QGRU` / `QGRUCell` | ✅ | ✅ | ✅ | |
| `QBidirectional` | ✅ | ✅ | ✅ | |

Legend:

- ✅ Supported: tested and works natively on the backend via Keras 3.
- ⚠️ Partial / experimental / conditional: works, but has structural constraints or layout edge cases, or relies on features that are still being tested.

Note that not all functionality is currently safe to use with other high-level operations such as layer wrappers (for example, the Bidirectional wrapper used with RNNs). If you need this, we encourage you to pass quantization functions as strings rather than as function objects as a workaround; this implementation may change in the future.
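A plausible reason string specs are easier for wrappers to handle is that a string such as "quantized_bits(4, 0, 1)" is plain data: it can be copied into a cloned layer's config and serialized to JSON, then turned back into a quantizer call. The parser below is a hypothetical sketch of that last step using Python's `ast` module; `parse_quantizer_spec` is an illustrative name, and this is not how QKeras resolves quantizer strings internally.

```python
import ast


def parse_quantizer_spec(spec):
    """Split a spec such as "quantized_bits(4, 0, 1)" into (name, args, kwargs).

    Hypothetical helper for illustration only: it accepts literal arguments
    only, so nothing in the string is executed.
    """
    node = ast.parse(spec, mode="eval").body
    if isinstance(node, ast.Name):  # bare name, e.g. "ternary"
        return node.id, [], {}
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        raise ValueError(f"unsupported quantizer spec: {spec!r}")
    # literal_eval accepts AST nodes and rejects anything that is not a literal.
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {}
    for kw in node.keywords:
        if kw.arg is None:  # reject **kwargs unpacking
            raise ValueError(f"unsupported quantizer spec: {spec!r}")
        kwargs[kw.arg] = ast.literal_eval(kw.value)
    return node.func.id, args, kwargs
```

For example, `parse_quantizer_spec("quantized_relu(3)")` returns `("quantized_relu", [3], {})`, which a wrapper could store and rebuild without holding a reference to a live quantizer object.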

A first attempt at a safe mechanism in QKeras is QActivation, a wrapper that encapsulates the activation functions so that the network architecture can be saved, restored, and duplicated through the Keras interface. This interface has not been fully tested yet.

Activation Layers Implemented in QKerasV3

The following matrix tracks multi-backend framework support for quantization activation functions in qkerasV3.

| Activation Function | TensorFlow | JAX | PyTorch |
|---|---|---|---|
| `smooth_sigmoid(x)` | ✅ | ✅ | ✅ |
| `hard_sigmoid(x)` | ✅ | ✅ | ✅ |
| `binary_sigmoid(x)` | ✅ | ✅ | ✅ |
| `binary_tanh(x)` | ✅ | ✅ | ✅ |
| `smooth_tanh(x)` | ✅ | ✅ | ✅ |
| `hard_tanh(x)` | ✅ | ✅ | ✅ |
| `quantized_bits(bits=8, integer=0, symmetric=0, keep_negative=1)(x)` | ✅ | ✅ | ✅ |
| `bernoulli(alpha=1.0)(x)` | ✅ | ✅ | ✅ |
| `stochastic_ternary(alpha=1.0, threshold=0.33)(x)` | ✅ | ✅ | ✅ |
| `ternary(alpha=1.0, threshold=0.33)(x)` | ✅ | ✅ | ✅ |
| `stochastic_binary(alpha=1.0)(x)` | ✅ | ✅ | ✅ |
| `binary(alpha=1.0)(x)` | ✅ | ✅ | ✅ |
| `quantized_relu(bits=8, integer=0, use_sigmoid=0, negative_slope=0.0)(x)` | ✅ | ✅ | ✅ |
| `quantized_ulaw(bits=8, integer=0, symmetric=0, u=255.0)(x)` | ✅ | ✅ | ✅ |
| `quantized_tanh(bits=8, integer=0, symmetric=0)(x)` | ✅ | ✅ | ✅ |
| `quantized_po2(bits=8, max_value=-1)(x)` | ✅ | ✅ | ✅ |
| `quantized_relu_po2(bits=8, max_value=-1)(x)` | ✅ | ✅ | ✅ |


The stochastic_* functions and bernoulli, as well as quantized_relu and quantized_tanh, rely on stochastic versions of the activation functions. They draw a uniformly distributed random number and use it together with _hard_sigmoid of the input x to pick the output, so that the result is based on the expected value of the activation function. Please refer to the papers if you want to understand the underlying theory, or to the documentation in qkeras/qlayers.py.
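A minimal sketch of the stochastic idea for stochastic_binary and bernoulli, assuming `_hard_sigmoid(x) = clip(0.5 * x + 0.5, 0, 1)`: the input sets the probability of emitting the high value, so the average output tracks the activation. These are scalar, pure-Python stand-ins with hypothetical names (`stochastic_binary_sample`, `bernoulli_sample`), not the QKeras quantizers, which operate on tensors and also define gradients for training.

```python
import random


def hard_sigmoid(x):
    """Piecewise-linear sigmoid, 0.5 * x + 0.5 clipped to [0, 1] (assumed form)."""
    return min(max(0.5 * x + 0.5, 0.0), 1.0)


def stochastic_binary_sample(x, alpha=1.0, rng=random):
    """Return +alpha with probability hard_sigmoid(x), otherwise -alpha."""
    return alpha if rng.random() < hard_sigmoid(x) else -alpha


def bernoulli_sample(x, rng=random):
    """Return 1.0 with probability hard_sigmoid(x), otherwise 0.0."""
    return 1.0 if rng.random() < hard_sigmoid(x) else 0.0
```

With `alpha = 1`, the expected output of `stochastic_binary_sample(x)` is `2 * hard_sigmoid(x) - 1`, which equals `clip(x, -1, 1)`: averaging many samples at `x = 0.5` gives roughly 0.5, even though each sample is either +1 or -1.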

The parameter "bits" specifies the total number of bits used for quantization, and "integer" specifies how many of those bits are to the left of the binary point. Finally, in our experience training networks with QSeparableConv2D, both quantized_bits and quantized_tanh, which generate values in [-1, 1), required symmetric versions of the range to converge properly and eliminate the bias.
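To make the roles of bits, integer and symmetric concrete, here is a minimal sketch of the rounding arithmetic, assuming a signed grid where one bit is the sign and `2**integer` bounds the magnitude. It is not the QKeras source: it omits alpha scaling, stochastic rounding and the straight-through gradient used during training, and `quantize_fixed_point` is a hypothetical name.

```python
def quantize_fixed_point(x, bits=8, integer=0, symmetric=False, keep_negative=True):
    """Round x to the nearest point of a fixed-point grid, then clip to its range."""
    unsigned_bits = bits - (1 if keep_negative else 0)  # bits left after the sign bit
    levels = 2 ** unsigned_bits                         # number of non-negative codes
    step = 2.0 ** integer / levels                      # value of one least-significant bit
    if keep_negative:
        # A symmetric range drops the single extra negative code.
        lowest = -levels + (1 if symmetric else 0)
    else:
        lowest = 0                                      # unsigned, ReLU-like grid
    # Python's round() rounds exact halves to the nearest even integer.
    code = min(max(round(x / step), lowest), levels - 1)
    return code * step
```

With `bits=4, integer=0`, the default grid spans [-1.0, 0.875] in steps of 0.125; `symmetric=True` drops the lone -1.0 code, so the range becomes [-0.875, 0.875], balanced around zero. With `keep_negative=False, bits=3`, the same arithmetic gives steps of 1/8 from 0 to 0.875, which should correspond to the `quantized_relu(3)` used in the Example.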

Whenever a weight or bias quantizer can generate numbers outside the range [-1.0, 1.0], we need to adjust the corresponding *_range parameter accordingly. For example, if a layer's weights use quantized_bits(bits=6, integer=2), we need to set the weight range to 2**2, which is equivalent to Catapult HLS ac_fixed<6, 3, true>. Similarly, for quantization functions that accept an alpha parameter, we need to specify a range for alpha, and for po2 quantizers, we need to specify the range of max_value.
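The correspondence between quantized_bits(bits=6, integer=2), a weight range of 2**2 and ac_fixed<6, 3, true> can be checked with a small helper. This is a sketch under two assumptions: the sign bit is not counted in `integer` (so ac_fixed's integer width is `integer + 1` for signed values), and the grid is unscaled (no alpha). The helper name `fixed_point_format` is hypothetical.

```python
def fixed_point_format(bits, integer, keep_negative=True):
    """Describe the grid of a quantized_bits-style quantizer and its ac_fixed twin."""
    sign = 1 if keep_negative else 0
    frac_bits = bits - integer - sign          # bits to the right of the binary point
    step = 2.0 ** -frac_bits
    return {
        "step": step,
        "min": -(2.0 ** integer) if keep_negative else 0.0,
        "max": 2.0 ** integer - step,
        "range": 2 ** integer,                 # magnitude bound to use as *_range
        # ac_fixed<W, I, S>: I counts the sign bit when S is true.
        "ac_fixed": f"ac_fixed<{bits}, {integer + sign}, {'true' if keep_negative else 'false'}>",
    }
```

`fixed_point_format(6, 2)` reports a step of 0.125 and a range of [-4.0, 3.875], whose magnitude is bounded by 2**2 = 4. Applied to quantized_bits(20, 5), the last QActivation in the Example, it gives ac_fixed<20, 6, true>.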

Example

Suppose you have the following very simple network in Keras:

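```python
from keras import Model
from keras.layers import Activation, Conv2D, Dense, Flatten, Input, SeparableConv2D

# Placeholder values for illustration; substitute your own input shape and class count.
shape = (28, 28, 1)
NB_CLASSES = 10

x = x_in = Input(shape)
x = Conv2D(18, (3, 3), name="first_conv2d")(x)
x = Activation("relu")(x)
x = SeparableConv2D(32, (3, 3))(x)
x = Activation("relu")(x)
x = Flatten()(x)
x = Dense(NB_CLASSES)(x)
x = Activation("softmax")(x)

model = Model(inputs=x_in, outputs=x)
```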

You can easily quantize this network as follows:

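```python
from keras import Model
from keras.layers import Activation, Flatten, Input
from qkeras import *  # QConv2D, QSeparableConv2D, QDense, QActivation and the quantizers

# Placeholder values for illustration; substitute your own input shape and class count.
shape = (28, 28, 1)
NB_CLASSES = 10

# Quantizers can be passed either as strings or as quantizer objects.
x = x_in = Input(shape)
x = QConv2D(18, (3, 3),
        kernel_quantizer="stochastic_ternary",
        bias_quantizer="ternary", name="first_conv2d")(x)
x = QActivation("quantized_relu(3)")(x)
x = QSeparableConv2D(32, (3, 3),
        depthwise_quantizer=quantized_bits(4, 0, 1),
        pointwise_quantizer=quantized_bits(3, 0, 1),
        bias_quantizer=quantized_bits(3),
        depthwise_activation=quantized_tanh(6, 2, 1))(x)
x = QActivation("quantized_relu(3)")(x)
x = Flatten()(x)
x = QDense(NB_CLASSES,
        kernel_quantizer=quantized_bits(3),
        bias_quantizer=quantized_bits(3))(x)
x = QActivation("quantized_bits(20, 5)")(x)
x = Activation("softmax")(x)

model = Model(inputs=x_in, outputs=x)
```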

The last QActivation is advisable if you want to compare results later on. More examples can be found in the examples directory.

QTools

The purpose of QTools is to assist with the hardware implementation of quantized models and with the estimation of model energy consumption. QTools has two functions: data type map generation and energy consumption estimation.

Data Type Map Generation: QTools automatically generates the data type map for the weights, bias, multiplier, adder, etc. of each layer. The data type map includes the operation type, variable size, quantizer type and bits, etc. The inputs to QTools are:

- a given quantized model;
- a list of input quantizers for the model.

The output of QTools is a JSON file that lists the data type map of each layer (stored in qtools_instance._output_dict). Output methods include qtools_stats_to_json, which writes the data type map to a JSON file, and qtools_stats_print, which prints the data type map.

Energy Consumption Estimation: Another function of QTools is to estimate the model energy consumption in picojoules (pJ). It gives QKeras users a quick way to estimate the energy consumed by memory accesses and MAC operations in a quantized model derived from QKeras, especially when comparing the power consumption of two models running on the same device.

As with any high-level model, it should be used with caution when attempting to estimate the absolute energy consumption of a model for a given technology, or when attempting to compare different technologies.

This tool also provides a measure for model tuning which needs to consider both accuracy and model energy consumption. The energy cost provided by this tool can be integrated into a total loss function which combines energy cost and accuracy.
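As a hedged illustration of such a combined objective, the sketch below adds an energy penalty, normalized by a reference model's energy (for example the floating-point baseline), to a task loss. The linear form and the name `energy_aware_loss` are assumptions; QKeras does not necessarily combine the two terms this way. Because the energy estimate is computed from the model's structure rather than through gradients, a score like this is most naturally used to compare candidate models, for example during a quantization search.

```python
def energy_aware_loss(task_loss, energy_pj, reference_energy_pj, energy_weight=0.1):
    """Combine a task loss with a normalized energy penalty (hypothetical form).

    Dividing by a reference energy makes the penalty unitless, so energy_weight
    directly trades accuracy against relative energy savings.
    """
    if reference_energy_pj <= 0:
        raise ValueError("reference_energy_pj must be positive")
    return task_loss + energy_weight * (energy_pj / reference_energy_pj)
```

A model that uses twice the reference energy with a task loss of 0.5 scores about 0.7 with `energy_weight=0.1`; setting `energy_weight=0` recovers the plain task loss.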

Energy Model: The most widely referenced work on energy consumption is by M. Horowitz: “1.1 Computing’s energy problem (and what we can do about it)”, IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014.

In this work, the author estimated the energy consumption of accelerators, and the data points he presented for a 45 nm process have since been used whenever someone wants to compare accelerator performance. QTools' energy consumption estimates for a 45 nm process are based on the data published in this work.
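QTools' estimator is more detailed than this, but the core arithmetic can be sketched as counting multiply-accumulate (MAC) operations and multiplying by a per-operation energy. The table below uses the 45 nm per-operation figures as commonly quoted from Horowitz's ISSCC 2014 paper; the function names are hypothetical, memory traffic is ignored, and none of this is the QTools API.

```python
# Approximate per-operation energies in picojoules for a 45 nm process, as
# commonly quoted from Horowitz (ISSCC 2014). QTools' own tables may differ.
ENERGY_PJ = {
    "int8": {"mult": 0.2, "add": 0.03},
    "int32": {"mult": 3.1, "add": 0.1},
    "fp16": {"mult": 1.1, "add": 0.4},
    "fp32": {"mult": 3.7, "add": 0.9},
}


def conv2d_macs(in_h, in_w, in_c, filters, k_h, k_w, stride=1):
    """Multiply-accumulate count of a Conv2D layer with "valid" padding."""
    out_h = (in_h - k_h) // stride + 1
    out_w = (in_w - k_w) // stride + 1
    return out_h * out_w * filters * k_h * k_w * in_c


def mac_energy_pj(macs, dtype):
    """Arithmetic energy only: one multiply and one add per MAC, no memory traffic."""
    costs = ENERGY_PJ[dtype]
    return macs * (costs["mult"] + costs["add"])
```

For the first_conv2d layer of the Example on a hypothetical 28x28x1 input, `conv2d_macs(28, 28, 1, 18, 3, 3)` gives 109,512 MACs, and under these figures the fp32 arithmetic costs about 20 times more energy than int8 (4.6 pJ versus 0.23 pJ per MAC).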

Examples: An example of how to generate the data type map can be found in qkeras/qtools/examples/example_generate_json.py. An example of how to generate an energy consumption estimate can be found in qkeras/qtools/examples/example_get_energy.py.

Unsupported Keras 3 Layers & Activations

The following features exist in core Keras 3 but currently do not have a quantized equivalent wrapper or implementation in qkerasV3.

MultiHeadAttention / GroupQueryAttention (Layer)
ConvLSTM1D / ConvLSTM2D / ConvLSTM3D (Layer)
LayerNormalization / GroupNormalization / RMSNormalization (Layer)
PReLU / ELU / LeakyReLU (Layer)
AlphaDropout / GaussianNoise / GaussianDropout (Layer)
mish(x) (Activation)
swish(x) / gelu(x) (Activation)
exponential(x) (Activation)
silu(x) (Activation)

Linting

We use Ruff for linting (Pyflakes, pycodestyle, pyupgrade, isort and Pylint rules). The configuration lives in pyproject.toml, so you can run it from the project root:

```shell
ruff check --fix .
```

Related Work

QKeras has been implemented based on the work of B. Moons et al., "Minimum Energy Quantized Neural Networks", Asilomar Conference on Signals, Systems and Computers, 2017, and S. Zhou et al., "DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients", but the framework should be easily extensible. The original code from QNN can be found below.

https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow

QKeras extends QNN by providing a richer set of layers (including SeparableConv2D, DepthwiseConv2D, ternary and stochastic ternary quantizations), as well as functions that aid in accumulator estimation and in converting non-quantized networks to quantized ones. Finally, our main goal is ease of use, so we attempt to make QKeras layers a true drop-in replacement for Keras layers, so that users can easily exchange non-quantized layers for quantized ones.

Publications

Claudionor N. Coelho Jr, Aki Kuusela, Shan Li, Hao Zhuang, Jennifer Ngadiuba, Thea Klaeboe Aarrestad, Vladimir Loncar, Maurizio Pierini, Adrian Alan Pol, Sioni Summers, "Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors", Nature Machine Intelligence (2021), https://www.nature.com/articles/s42256-021-00356-5
Claudionor N. Coelho Jr., Aki Kuusela, Hao Zhuang, Thea Aarrestad, Vladimir Loncar, Jennifer Ngadiuba, Maurizio Pierini, Sioni Summers, "Ultra Low-latency, Low-area Inference Accelerators using Heterogeneous Deep Quantization with QKeras and hls4ml", http://arxiv.org/abs/2006.10159v1
Erwei Wang, James J. Davis, Daniele Moro, Piotr Zielinski, Claudionor Coelho, Satrajit Chatterjee, Peter Y. K. Cheung, George A. Constantinides, "Enabling Binary Neural Network Training on the Edge", https://arxiv.org/abs/2102.04270

Acknowledgements

Portions of QKeras were derived from QNN.

https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow

Fork notice

This repository is a hard fork of the original QKeras project. The upstream project appears unmaintained, so this fork is independently maintained and not affiliated with the original authors or their organizations. The PyPI distribution is published as qkeras-v3 and the import namespace is qkeras. We aim to keep reasonable compatibility while updating dependencies (Keras/TF) and fixing issues; some breaking changes are documented in the CHANGELOG. Licensed under Apache-2.0. See LICENSE and NOTICE for attribution and details of modifications.

Release history

- 1.2.1 (this version): Jul 1, 2026
- 1.2.0: Jun 23, 2026
- 1.1.1: May 13, 2026
- 1.1.0: May 13, 2026

Download files

- Source distribution: qkeras_v3-1.2.1.tar.gz (228.3 kB), uploaded Jul 1, 2026
- Built distribution: qkeras_v3-1.2.1-py3-none-any.whl (319.3 kB), uploaded Jul 1, 2026, Python 3

File details

Details for the file qkeras_v3-1.2.1.tar.gz.

File metadata

Download URL: qkeras_v3-1.2.1.tar.gz
Upload date: Jul 1, 2026
Size: 228.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for qkeras_v3-1.2.1.tar.gz
Algorithm	Hash digest
SHA256	`bcf170eb41cf2dfc393fe366e4497a98063c0e914184cae2009eb1bf75898366`
MD5	`169de14fa67d3df1dfa665dbbcb651ce`
BLAKE2b-256	`c3bee480de5eed986a1eaaad5fefe2677c6ae754d2989b8cf9836beb57366ae8`


Provenance

The following attestation bundles were made for qkeras_v3-1.2.1.tar.gz:

Publisher: python-publish.yml on fastmachinelearning/qkerasV3

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: qkeras_v3-1.2.1.tar.gz
- Subject digest: bcf170eb41cf2dfc393fe366e4497a98063c0e914184cae2009eb1bf75898366
- Sigstore transparency entry: 2036389309
- Sigstore integration time: Jul 1, 2026
Source repository:
- Permalink: fastmachinelearning/qkerasV3@3fe3bdc231532ba8c72f568431ec3660ad8fbafe
- Branch / Tag: refs/tags/v1.2.1
- Owner: https://github.com/fastmachinelearning
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@3fe3bdc231532ba8c72f568431ec3660ad8fbafe
- Trigger Event: release

File details

Details for the file qkeras_v3-1.2.1-py3-none-any.whl.

File metadata

Download URL: qkeras_v3-1.2.1-py3-none-any.whl
Upload date: Jul 1, 2026
Size: 319.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for qkeras_v3-1.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`74ddf72c5b2786869ccb291df83db931f4a4091ca0b8b61628ce70e350040a97`
MD5	`801f965405d7e813b5b48179db465be0`
BLAKE2b-256	`687f66d99b550069dd60e8984ffa4df89eed59f97b89e4e0d2d8af60699a72ed`


Provenance

The following attestation bundles were made for qkeras_v3-1.2.1-py3-none-any.whl:

Publisher: python-publish.yml on fastmachinelearning/qkerasV3

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: qkeras_v3-1.2.1-py3-none-any.whl
- Subject digest: 74ddf72c5b2786869ccb291df83db931f4a4091ca0b8b61628ce70e350040a97
- Sigstore transparency entry: 2036389628
- Sigstore integration time: Jul 1, 2026
Source repository:
- Permalink: fastmachinelearning/qkerasV3@3fe3bdc231532ba8c72f568431ec3660ad8fbafe
- Branch / Tag: refs/tags/v1.2.1
- Owner: https://github.com/fastmachinelearning
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@3fe3bdc231532ba8c72f568431ec3660ad8fbafe
- Trigger Event: release
