
quantize-rs Python API

Python bindings for quantize-rs, a neural network quantization toolkit for ONNX models.

Installation

pip install quantization-rs

Build from source (requires Rust toolchain and maturin):

pip install maturin
maturin develop --release --features python

API reference

quantize(input_path, output_path, bits=8, per_channel=False)

Weight-based quantization. Loads the model, quantizes all weight tensors, and saves the result in ONNX QDQ format.

Parameters:

  • input_path (str, required): path to the input ONNX model
  • output_path (str, required): path to save the quantized model
  • bits (int, default 8): bit width, 4 or 8
  • per_channel (bool, default False): use per-channel quantization (separate scale/zero-point per output channel)

Example:

import quantize_rs

quantize_rs.quantize("model.onnx", "model_int8.onnx", bits=8)
quantize_rs.quantize("model.onnx", "model_int4.onnx", bits=4, per_channel=True)
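To build intuition for what the bits and per_channel options control, here is an illustrative NumPy sketch of symmetric quantization. This is not quantize-rs internals, just the standard scheme: a scale maps the largest absolute weight onto the integer range, and per-channel mode computes one scale per output channel so small-magnitude channels keep their precision.

```python
import numpy as np

def quantize_symmetric(w, bits=8, per_channel=False):
    """Illustrative symmetric quantization (not the library's exact algorithm)."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8-bit, 7 for 4-bit
    if per_channel:
        # one scale per output channel (axis 0), reducing over all other axes
        amax = np.abs(w).reshape(w.shape[0], -1).max(axis=1)
        scale = (amax / qmax).reshape(-1, *([1] * (w.ndim - 1)))
    else:
        scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

w = np.array([[0.5, -1.27], [0.02, 0.04]], dtype=np.float32)
q, scale = quantize_symmetric(w, per_channel=True)
w_hat = q.astype(np.float32) * scale  # dequantized; per-channel keeps the small row accurate
```

With per_channel=False the second row would share the first row's large scale and round almost to zero; per-channel scales avoid that.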

quantize_with_calibration(input_path, output_path, ...)

Activation-calibrated quantization. Runs inference on calibration samples to determine quantization ranges per layer, then quantizes using those ranges.

Parameters:

  • input_path (str, required): path to the input ONNX model
  • output_path (str, required): path to save the quantized model
  • calibration_data (str or None, default None): path to a .npy file (shape [N, ...]), or None to use random samples
  • bits (int, default 8): bit width, 4 or 8
  • per_channel (bool, default False): per-channel quantization
  • method (str, default "minmax"): calibration method (see below)
  • num_samples (int, default 100): number of random samples when calibration_data is None
  • sample_shape (list[int] or None, default None): shape of random samples; auto-detected from the model if None

Calibration methods:

  • "minmax": uses the observed min/max of activations
  • "percentile": clips at the 99.9th percentile to reduce sensitivity to outliers
  • "entropy": selects the range minimizing the KL divergence between the original and quantized distributions
  • "mse": selects the range minimizing the mean squared error
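A small NumPy sketch (illustrative, not the library's internals) shows why percentile clipping matters: a single outlier activation stretches the minmax range, which wastes most of the integer grid, while a percentile cut ignores it.

```python
import numpy as np

rng = np.random.default_rng(0)
acts = rng.normal(size=100_000).astype(np.float32)  # stand-in activations
acts[0] = 50.0                                      # a single extreme outlier

minmax_hi = acts.max()               # minmax range is stretched all the way to 50
pct_hi = np.percentile(acts, 99.9)   # percentile range stays near the bulk (~3)
```

With minmax, nearly all real activations would land in a tiny sliver of the int8 range; the percentile range keeps resolution where the data actually lives.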

Example:

import quantize_rs

# With real calibration data
quantize_rs.quantize_with_calibration(
    "resnet18.onnx",
    "resnet18_int8.onnx",
    calibration_data="calibration_samples.npy",
    method="minmax"
)

# With random samples (auto-detects input shape from model)
quantize_rs.quantize_with_calibration(
    "resnet18.onnx",
    "resnet18_int8.onnx",
    num_samples=100,
    sample_shape=[3, 224, 224],
    method="percentile"
)

model_info(input_path)

Returns metadata about an ONNX model.

Parameters:

  • input_path (str, required): path to the ONNX model

Returns: ModelInfo object with the following fields:

  • name (str): graph name
  • version (int): model version
  • num_nodes (int): number of computation nodes
  • inputs (list[str]): input tensor names
  • outputs (list[str]): output tensor names

Example:

info = quantize_rs.model_info("model.onnx")
print(f"Name: {info.name}")
print(f"Nodes: {info.num_nodes}")
print(f"Inputs: {info.inputs}")
print(f"Outputs: {info.outputs}")

Preparing calibration data

For best results, use 50-200 representative samples from your validation or training set:

import numpy as np

# Collect preprocessed samples
samples = []
for img in validation_dataset[:100]:
    preprocessed = preprocess(img)  # your preprocessing pipeline
    samples.append(preprocessed)

# Save as .npy (shape: [num_samples, channels, height, width])
calibration_data = np.stack(samples)
np.save("calibration_samples.npy", calibration_data)

# Use during quantization
quantize_rs.quantize_with_calibration(
    "model.onnx",
    "model_int8.onnx",
    calibration_data="calibration_samples.npy",
    method="minmax"
)

If you do not have calibration data, the function generates random samples. This is adequate for testing but will produce less accurate quantization than real data.

ONNX Runtime integration

Quantized models use the standard DequantizeLinear operator and load directly in ONNX Runtime:

import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model_int8.onnx")
input_name = session.get_inputs()[0].name

# dummy input; replace shape and dtype with your model's actual input
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: x})

Limitations

  • ONNX format only. Export PyTorch/TensorFlow models to ONNX before quantizing.
  • Requires ONNX opset >= 13 (automatically upgraded if needed).
  • INT4 values are stored as INT8 bytes in the ONNX file (DequantizeLinear requires INT8 input in opsets < 21).
  • All weight tensors are quantized. Per-layer selection is not yet supported.

License

MIT
