# Qwantize

Optimal quantization methods for block-scaled FP4 formats (NVFP4, MXFP4).
## Formats

- **NVFP4** — FP4 E2M1 with FP8 E4M3 scales (block sizes 16, 32)
- **MXFP4** — FP4 E2M1 with UE8M0 (power-of-2) scales (block sizes 16, 32)
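For context, FP4 E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit) can represent only eight magnitudes. A quick sketch enumerating the positive grid (not part of the package API; the exponent bias of 1 is the standard E2M1 convention):

```python
def e2m1_values():
    """Enumerate the non-negative FP4 E2M1 grid: 2 exponent bits, 1 mantissa bit."""
    vals = []
    for e in range(4):          # 2 exponent bits
        for m in range(2):      # 1 mantissa bit
            if e == 0:          # exponent 0 encodes subnormals: 0.m * 2^0
                vals.append(m * 0.5)
            else:               # normals: 1.m * 2^(e - 1), bias = 1
                vals.append((1 + m * 0.5) * 2 ** (e - 1))
    return vals

print(e2m1_values())  # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

The largest representable magnitude, 6.0, is the `Q_MAX` that the scale-selection heuristics below divide by.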
## Methods

Each format supports multiple scale-selection strategies:

| Method | Description |
|---|---|
| Naive | Standard heuristic: `s = snap(amax / Q_MAX)` |
| SSE-Optimal | Bounded search minimizing the sum of squared quantization errors |
| Hessian-Optimal | Bounded search minimizing the Hessian-weighted error `r^T H r`, using activations |

All methods have both pure-PyTorch (reference) and Triton (GPU-accelerated) implementations.
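As a toy illustration of the difference between the naive and SSE-optimal strategies (a standalone sketch, independent of the package's actual implementation; the candidate grid and search range are assumptions):

```python
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
Q_MAX = 6.0

def quantize_block(block, s):
    """Round each value to the nearest signed E2M1 grid point at scale s."""
    out = []
    for w in block:
        q = min(E2M1_GRID, key=lambda g: abs(abs(w) / s - g))
        out.append((q if w >= 0 else -q) * s)
    return out

def sse(block, s):
    return sum((w - wq) ** 2 for w, wq in zip(block, quantize_block(block, s)))

block = [0.11, -0.42, 0.95, 0.30, -0.07, 0.58, -0.81, 0.24]

# Naive: pick the scale so that amax maps exactly to Q_MAX
s_naive = max(abs(w) for w in block) / Q_MAX

# SSE-optimal: bounded search over scale candidates around the naive scale
candidates = [s_naive * (0.75 + i / 200) for i in range(101)]  # 0.75x .. 1.25x
s_opt = min(candidates, key=lambda s: sse(block, s))

# The search includes the naive scale, so it can only do as well or better
assert sse(block, s_opt) <= sse(block, s_naive)
```

The real implementations additionally snap candidate scales to the format's scale grid (E4M3 for NVFP4, powers of two for MXFP4) and run the search per block on the GPU.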
## Install

```shell
pip install qwantize
```

Requires PyTorch (>= 2.0) and Triton (>= 3.0).
## Usage

```python
from qwantize import nvfp4_naive, nvfp4_optimal, nvfp4_dequantize, compute_metrics

# Reshape W to (..., block_size), where block_size is 16 or 32
W_blocked = W.reshape(M, K // 32, 32)

# Quantize: returns (scales, quants)
scales, quants = nvfp4_optimal(W_blocked, dim=-1)

# Dequantize
W_dq = nvfp4_dequantize(scales, quants, dim=-1)

# Or get the dequantized output directly
scales, quants, W_dq = nvfp4_optimal(W_blocked, dim=-1, return_dequant=True)

# Compute metrics: ||Q(W) - W|| / ||W|| and ||X W_q^T - X W^T|| / ||X W^T||
metrics = compute_metrics(W, W_dq.reshape(M, K), X)
```
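The two relative-error metrics in the comment above can be written out directly. A NumPy sketch for reference (not the package's `compute_metrics`; the perturbed `W_dq` stands in for a dequantized weight):

```python
import numpy as np

def relative_errors(W, W_dq, X):
    """Weight error ||Q(W) - W|| / ||W|| and output error ||X W_q^T - X W^T|| / ||X W^T||."""
    weight_err = np.linalg.norm(W_dq - W) / np.linalg.norm(W)
    Y, Y_q = X @ W.T, X @ W_dq.T
    output_err = np.linalg.norm(Y_q - Y) / np.linalg.norm(Y)
    return weight_err, output_err

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 32))
X = rng.standard_normal((4, 32))
W_dq = W + 0.01 * rng.standard_normal(W.shape)  # stand-in for a dequantized weight
w_err, o_err = relative_errors(W, W_dq, X)
```

The output error weights each row of `W` by how strongly the activations actually excite it, which is why the Hessian-aware method can win on output error while losing on plain weight error (as in the benchmarks below).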
Triton-accelerated versions:

```python
from qwantize import nvfp4_optimal_triton, nvfp4_optimal_hessian_triton

scales, quants, W_dq = nvfp4_optimal_triton(W_blocked, dim=-1, return_dequant=True)

# Hessian-aware (requires activations X)
scales, quants, W_dq = nvfp4_optimal_hessian_triton(W_blocked, dim=-1, return_dequant=True, X=X)
```
## Benchmarks

Benchmarked on the `down_proj` weight of the first decoder layer of Qwen3-4B, with activations from WikiText-2 (`max_seq_len=512`, `num_samples=2048`):

```shell
python bench/full_bench.py
```
### NVFP4 (block size 16)
| Method | Weight Error | Output Error | Triton Speedup |
|---|---|---|---|
| Naive | 10.05% | 6.89% | 1.7x |
| SSE-Optimal | 8.74% | 6.04% | 7.0x |
| H-Optimal | 9.35% | 5.31% | 1.8x |
### MXFP4 (block size 16)
| Method | Weight Error | Output Error | Triton Speedup |
|---|---|---|---|
| Naive | 11.77% | 8.48% | 1.7x |
| SSE-Optimal | 11.02% | 7.67% | 33x |
| H-Optimal | 11.10% | 7.62% | — |
## Documentation

Full documentation: [qwantize.readthedocs.io](https://qwantize.readthedocs.io)

To build the docs locally:

```shell
pip install -r docs/requirements.txt
cd docs && make html
```
## Contact

- Author: Ayoub Ghriss, research@ayghri.me
## File details

### qwantize-0.1.1.tar.gz (source distribution)

- Size: 15.1 kB
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.2 CPython/3.13.11 Linux/6.12.74-gentoo-x86_64

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ef1de56020bc042e9c1a4fd25df046870584f3d5ab20e3e71fdfdc7ff035f712` |
| MD5 | `729ec6673a7873478188c71f38716ea8` |
| BLAKE2b-256 | `b6cbad91163ed7bbfe313c994779b246a2ae7dbfc66ccd63c7016bdfb7fadf6c` |
### qwantize-0.1.1-py3-none-any.whl (built distribution, Python 3)

- Size: 21.7 kB
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.2 CPython/3.13.11 Linux/6.12.74-gentoo-x86_64

| Algorithm | Hash digest |
|---|---|
| SHA256 | `1e9676a5a1322b1fbdc70990598a65d242ce27050c16324cf82c2389adac27c6` |
| MD5 | `36c848e556f40aa2a4f7c7100ad430b0` |
| BLAKE2b-256 | `7bf9a9af2ca40f3260250ff0c16cc7015d5ef770a6ec0ba14e6d2fd04225fd19` |