Optimal block-scaled FP4 quants (NVFP4, MXFP4)

Qwantize

Optimal quantization methods for block-scaled formats.

Formats

  • NVFP4 — FP4 E2M1 with FP8 E4M3 scales (block sizes 16, 32)
  • MXFP4 — FP4 E2M1 with UE8M0 (power-of-2) scales (block sizes 16, 32)
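Both formats store elements as FP4 E2M1, which has only eight non-negative magnitudes. The sketch below is an illustration, not Qwantize's code: it lists that grid and rounds a value to it. The helper names `round_e2m1` and `quantize_block` are hypothetical, and ties are broken toward the smaller magnitude, which may differ from the rounding mode a real kernel uses.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
Q_MAX = E2M1_GRID[-1]  # largest representable magnitude


def round_e2m1(x: float) -> float:
    """Round x to the nearest E2M1 value, saturating at +/-Q_MAX.

    Assumption: ties go to the smaller magnitude (first match in the grid).
    """
    mag = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return math.copysign(mag, x)


def quantize_block(block, scale):
    """Quantize and dequantize one block that shares a single scale."""
    return [scale * round_e2m1(v / scale) for v in block]


if __name__ == "__main__":
    print(quantize_block([1.0, -3.0, 0.3], scale=0.5))
```

Because the grid is so coarse, the choice of the shared block scale determines most of the quantization error, which is what the scale selection methods address.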

Methods

Each format supports multiple scale selection strategies:

Method           Description
Naive            Standard heuristic: s = snap(amax / Q_MAX)
SSE-Optimal      Bounded search minimizing sum of squared quantization error
Hessian-Optimal  Bounded search minimizing Hessian-weighted error r^T H r using activations

All methods have both pure-PyTorch (reference) and Triton (GPU-accelerated) implementations.
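The difference between the naive and SSE-optimal strategies can be illustrated for power-of-2 (MXFP4-style) scales. This is a minimal pure-Python sketch, not Qwantize's implementation: the function names, the round-up choice inside `naive_scale`, and the `radius` search window are assumptions made for illustration, and the UE8M0 exponent range is not enforced.

```python
import math

E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
Q_MAX = 6.0


def round_e2m1(x):
    """Nearest E2M1 value with saturation (ties go to the smaller magnitude)."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return math.copysign(mag, x)


def sse(block, scale):
    """Sum of squared errors after quantizing the block with this scale."""
    return sum((v - scale * round_e2m1(v / scale)) ** 2 for v in block)


def naive_scale(block):
    """s = snap(amax / Q_MAX), snapping up to a power of 2 (assumption)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0  # any scale represents an all-zero block exactly
    return 2.0 ** math.ceil(math.log2(amax / Q_MAX))


def sse_optimal_scale(block, radius=2):
    """Bounded search: try exponents near the naive one, keep the lowest SSE."""
    base = round(math.log2(naive_scale(block)))
    candidates = [2.0 ** (base + d) for d in range(-radius, radius + 1)]
    return min(candidates, key=lambda s: sse(block, s))


if __name__ == "__main__":
    block = [0.1, -0.2, 0.05, 1.3]
    for name, s in [("naive", naive_scale(block)),
                    ("sse-optimal", sse_optimal_scale(block))]:
        print(f"{name}: scale={s}, sse={sse(block, s):.6f}")
```

Since the naive scale is always one of the candidates, the searched scale can never have a higher SSE than the naive one on the same block. A smaller scale trades clipping of the largest element for finer resolution on the rest, which is why the search can win.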

Install

pip install -e .

Requires PyTorch and Triton (for GPU kernels).

Usage

from qwantize import nvfp4_naive, nvfp4_optimal, nvfp4_dequantize, compute_metrics

# W is a 2-D weight of shape (M, K). The quantizers expect the blocked
# dimension to have length block_size (16 or 32), so split K into blocks.
W_blocked = W.reshape(M, K // 32, 32)

# Quantize: returns (scales, quants)
scales, quants = nvfp4_optimal(W_blocked, dim=-1)

# Dequantize
W_dq = nvfp4_dequantize(scales, quants, dim=-1)

# Or get dequantized output directly
scales, quants, W_dq = nvfp4_optimal(W_blocked, dim=-1, return_dequant=True)

# Compute metrics using activations X: relative weight error ||Q(W)-W||/||W||
# and relative output error ||XW_q^T - XW^T||/||XW^T||
metrics = compute_metrics(W, W_dq.reshape(M, K), X)
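The two ratios in the comment above can be written out directly. The sketch below is not the library's `compute_metrics`; `relative_errors` and its helpers are hypothetical names, and the Frobenius norm is an assumption, since the source does not say which norm is used.

```python
import math


def frobenius(A):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in A for v in row))


def sub(A, B):
    """Elementwise A - B."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def matmul_t(X, W):
    """X @ W^T for X of shape (N, K) and W of shape (M, K)."""
    return [[sum(x * w for x, w in zip(xr, wr)) for wr in W] for xr in X]


def relative_errors(W, W_q, X):
    """Return (||W_q - W|| / ||W||, ||X W_q^T - X W^T|| / ||X W^T||)."""
    weight_err = frobenius(sub(W_q, W)) / frobenius(W)
    Y, Y_q = matmul_t(X, W), matmul_t(X, W_q)
    output_err = frobenius(sub(Y_q, Y)) / frobenius(Y)
    return weight_err, output_err


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    W_q = [[1.0, 2.0], [3.0, 4.5]]   # one perturbed entry
    X = [[1.0, 0.0], [0.0, 2.0]]     # second feature is twice as active
    print(relative_errors(W, W_q, X))
```

The output error weights each weight perturbation by how strongly the corresponding input feature is activated, so the two metrics can rank quantizers differently.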

Triton-accelerated versions:

from qwantize import nvfp4_optimal_triton, nvfp4_optimal_hessian_triton

scales, quants, W_dq = nvfp4_optimal_triton(W_blocked, dim=-1, return_dequant=True)

# Hessian-aware (requires activations X)
scales, quants, W_dq = nvfp4_optimal_hessian_triton(W_blocked, dim=-1, return_dequant=True, X=X)
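To see why the Hessian-aware variant needs activations, consider the objective r^T H r for a residual r = W_q - W of one weight row. The sketch below assumes H = X^T X, a common choice that the source does not confirm, and uses hypothetical helper names. Under that assumption, r^T H r equals the squared norm of the output perturbation X r.

```python
def matvec(A, v):
    """Matrix-vector product for A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def gram(X):
    """H = X^T X for activations X of shape (N, K). Assumed form of H."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)]
            for i in range(k)]


def hessian_weighted_error(r, H):
    """Quadratic form r^T H r."""
    return sum(r_i * hr_i for r_i, hr_i in zip(r, matvec(H, r)))


X = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]          # two activation samples, three features
r = [0.5, -0.25, 0.125]         # quantization residual of one weight row

via_hessian = hessian_weighted_error(r, gram(X))
via_outputs = sum(y * y for y in matvec(X, r))   # ||X r||^2

if __name__ == "__main__":
    print(via_hessian, via_outputs)
```

Minimizing this quantity targets the layer's output error rather than the weight error.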

Benchmarks

Benchmarked on the down_proj weight of the first decoder layer from Qwen3-4B, with activations from WikiText-2 (max_seq_len=512, num_samples=2048).

python bench/full_bench.py

NVFP4 (block size 16)

Method       Weight Error  Output Error  Triton Speedup
Naive        10.05%        6.89%         1.7x
SSE-Optimal  8.74%         6.04%         7.0x
H-Optimal    9.35%         5.31%         1.8x

MXFP4 (block size 16)

Method       Weight Error  Output Error  Triton Speedup
Naive        11.77%        8.48%         1.7x
SSE-Optimal  11.02%        7.67%         33x
H-Optimal    11.10%        7.62%
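The error columns are easier to compare as relative reductions against the Naive baseline. The short script below does only that arithmetic on the figures reported above. It adds no new measurements, and `relative_reduction` is a hypothetical helper.

```python
# (weight error %, output error %) copied from the benchmark tables
results = {
    "NVFP4": {"Naive": (10.05, 6.89),
              "SSE-Optimal": (8.74, 6.04),
              "H-Optimal": (9.35, 5.31)},
    "MXFP4": {"Naive": (11.77, 8.48),
              "SSE-Optimal": (11.02, 7.67),
              "H-Optimal": (11.10, 7.62)},
}


def relative_reduction(baseline, value):
    """Fraction by which `value` improves on `baseline`."""
    return (baseline - value) / baseline


reductions = {}
for fmt, rows in results.items():
    base_w, base_o = rows["Naive"]
    for method, (w, o) in rows.items():
        if method == "Naive":
            continue
        reductions[(fmt, method)] = (relative_reduction(base_w, w),
                                     relative_reduction(base_o, o))

if __name__ == "__main__":
    for (fmt, method), (dw, do) in reductions.items():
        print(f"{fmt} {method}: weight error -{dw:.1%}, output error -{do:.1%}")
```

In both tables SSE-Optimal has the lowest weight error and H-Optimal the lowest output error, which matches the objective each method minimizes.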

Documentation

Full documentation: qwantize.readthedocs.io

Build locally:

pip install -r docs/requirements.txt
cd docs && make html
