Multithreaded SIMD int8 and int4 quantization kernels.
pi-quant: Prime Intellect Fast Quantization Library
Overview
Fast, multithreaded CPU quantization kernels with various rounding modes, running more than 2× faster than PyTorch’s built-in quantization routines on all tested hardware. The kernels are optimized with SIMD intrinsics for different CPU architectures, including AMD64 (SSE4.2, AVX2, AVX512F) and ARM64 (Neon). The fastest kernel the host supports is selected at runtime via CPU feature detection.
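The runtime selection described above can be pictured as a simple preference-ordered lookup. The following is a minimal sketch, not pi-quant's actual dispatch code: the function `select_kernel`, the feature-name strings, and the `"scalar"` fallback are all hypothetical, and the detected feature set is passed in by hand rather than queried from the CPU.

```python
# Hypothetical preference order, widest SIMD first. The names mirror the
# instruction sets listed in the overview; they are not library identifiers.
KERNEL_PREFERENCE = ["avx512f", "avx2", "sse4.2", "neon"]


def select_kernel(cpu_features, fallback="scalar"):
    """Return the first preferred kernel whose instruction set is available.

    `cpu_features` is a set of feature names that a real implementation would
    obtain from CPU detection (e.g. CPUID on AMD64). The scalar fallback is an
    assumption made for this sketch.
    """
    for name in KERNEL_PREFERENCE:
        if name in cpu_features:
            return name
    return fallback


if __name__ == "__main__":
    print(select_kernel({"sse4.2", "avx2"}))  # avx2
    print(select_kernel({"neon"}))            # neon
    print(select_kernel(set()))               # scalar
```

Selecting once at startup keeps the per-call cost at a single indirect call, which is the usual reason for this design.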
What is Quantization?
Quantization is the process of mapping continuous values into a finite, discrete set of values. In machine learning and signal processing, it is commonly used to reduce the precision of numerical data, lowering memory usage and improving computational efficiency while maintaining acceptable accuracy.
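As a concrete illustration, the commonly used affine scheme maps a float `x` to an integer `q = clamp(round(x / scale) + zero_point)` and recovers an approximation with `(q - zero_point) * scale`. The sketch below is plain Python written for clarity, assuming this standard formula and a uint8 target; it is not pi-quant's implementation and the function names are illustrative only.

```python
def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Affine quantization of floats to integers in [qmin, qmax].

    Uses Python's built-in round(), which rounds ties to the nearest even
    integer.
    """
    out = []
    for x in values:
        q = round(x / scale) + zero_point
        out.append(max(qmin, min(qmax, q)))  # clamp to the target range
    return out


def dequantize(qvalues, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [(q - zero_point) * scale for q in qvalues]


if __name__ == "__main__":
    scale, zero_point = 1 / 128, 128  # covers roughly [-1, 1) with uint8
    q = quantize([-1.0, 0.0, 0.5, 1.0], scale, zero_point)
    print(q)                                 # [0, 128, 192, 255]
    print(dequantize(q, scale, zero_point))  # [-1.0, 0.0, 0.5, 0.9921875]
```

The last value shows the precision loss: `1.0` would map to 256, which is clamped to 255 and dequantizes to `0.9921875`.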
Features
✅ Parallel De/Quantization: Efficiently quantizes and de-quantizes data using multiple threads.
✅ Rich Datatype Support: Provides f32, f64 ↔ (u)int8/16/32/64.
✅ Modern Python API: Use the library from Python with PyTorch, NumPy, or standalone.
✅ Architecture-Specific Optimizations: The kernels are optimized with SIMD intrinsics for different CPU architectures, including AMD64 (SSE4.2, AVX2, AVX512F) and ARM64 (Neon).
✅ Thread Pool: Reuses threads for minimal overhead.
✅ Flexible Rounding Modes: Supports both nearest and stochastic rounding modes.
✅ C99 API: Provides a C99 API for C projects or foreign language bindings (see quant.h).
✅ Store Operators: Supports multiple store modes (SET, ADD) during dequantization — useful for ring-reduction operations.
✅ Quantization Parameters: Efficient SIMD-parallel computation of quantization scale and zero point from input data.
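Three of the features above (quantization parameters, stochastic rounding, and store operators) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's code: the min/max formula for scale and zero point is the common convention and may differ from what pi-quant computes, and all function names and the `mode` string argument are hypothetical.

```python
import math
import random


def compute_quant_params(values, qmin=0, qmax=255):
    """Derive scale and zero point from the data range (min/max convention)."""
    lo = min(min(values), 0.0)  # include 0.0 so zero is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:            # all-zero input: any positive scale works
        scale = 1.0
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


def stochastic_round(x, rng):
    """Round up with probability equal to the fractional part of x."""
    floor = math.floor(x)
    return floor + (1 if rng.random() < (x - floor) else 0)


def quantize_stochastic(values, scale, zero_point, rng, qmin=0, qmax=255):
    """Affine quantization using stochastic instead of nearest rounding."""
    return [
        max(qmin, min(qmax, stochastic_round(x / scale, rng) + zero_point))
        for x in values
    ]


def dequantize_into(dst, qvalues, scale, zero_point, mode="SET"):
    """Dequantize into dst, overwriting (SET) or accumulating (ADD)."""
    for i, q in enumerate(qvalues):
        value = (q - zero_point) * scale
        if mode == "ADD":
            dst[i] += value
        else:
            dst[i] = value


if __name__ == "__main__":
    rng = random.Random(0)
    data = [-1.0, 0.0, 3.0]
    scale, zero_point = compute_quant_params(data)
    q = quantize_stochastic(data, scale, zero_point, rng)
    acc = [10.0, 10.0, 10.0]
    dequantize_into(acc, q, scale, zero_point, mode="ADD")
    print(scale, zero_point, q, acc)
```

Stochastic rounding is unbiased in expectation, so rounding errors tend to cancel when many quantized tensors are summed. The ADD mode fits the same use case: in a ring reduction, each received chunk can be dequantized directly into the running sum without a temporary buffer.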
Benchmarks
The benchmarks were run on a variety of hardware. We benchmark against PyTorch’s torch.quantize_per_tensor and torch.ao.quantization.fx._decomposed.quantize_per_tensor. Each benchmark quantizes float32 to uint8 across 1000 runs. The number of elements and other details are in the benchmark code.
Benchmark 1 (AMD EPYC 9654, 360 vCPUs)
1000 runs with numel 27264000
CPU: AMD EPYC 9654 96-Core Processor, Runtime: AVX512-F
Memory: 1485 GB
Linux: 6.8.0-57-generic
Torch FX Quant refers to torch.ao.quantization.fx._decomposed.quantize_per_tensor, Torch Builtin Quant to torch.quantize_per_tensor, and Fast Quant to pi-quant’s piquant.quantize_torch.
Benchmark 2 (AMD EPYC 7742, 128 vCPUs)
1000 runs with numel 27264000
CPU: AMD EPYC 7742 64-Core Processor, Runtime: AVX2
Memory: 528 GB
Linux: 6.8.0-1023-nvidia
Benchmark 3 (Apple M3 Pro)
1000 runs with numel 27264000
CPU: Apple M3 Pro, Runtime: Neon
Memory: 18 GB
macOS: 15.4 (24E248)
Download files
Built Distributions
File details
Details for the file pypiquant-0.4.3-cp313-cp313-musllinux_1_2_x86_64.whl.
File metadata
- Download URL: pypiquant-0.4.3-cp313-cp313-musllinux_1_2_x86_64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.13, musllinux: musl 1.2+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5651e27d27ed39688fe7d9b507ac42d159db7cb7594407dc22ac7b4c2f27254d |
| MD5 | c3275f38a92c238f2a8451269c4c3233 |
| BLAKE2b-256 | c514492c29b0fcdb9efd8714ca42b99f2de105745b13122310613d0e3b848005 |
File details
Details for the file pypiquant-0.4.3-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: pypiquant-0.4.3-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 173.3 kB
- Tags: CPython 3.13, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dc91933180042716eee23051ead773e701dc5b90ccdd54b7f1bd3f16beac5841 |
| MD5 | 2c877b503a4e0a905be44f09878e6764 |
| BLAKE2b-256 | 0a592f519580e5a969b85639d8d8b6327374f92d3fc45eee5e95c94d7b72c10d |
File details
Details for the file pypiquant-0.4.3-cp312-cp312-musllinux_1_2_x86_64.whl.
File metadata
- Download URL: pypiquant-0.4.3-cp312-cp312-musllinux_1_2_x86_64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.12, musllinux: musl 1.2+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 05b0576bc52c169a20406e553a1d06bfc4f25a48f83fb5559311be1d146eca6e |
| MD5 | 046d2bedd50c1505a95365d4a03d0194 |
| BLAKE2b-256 | b9e9a798a21d6fa1b1fd3c6b370d7d5eaabdeef064e5f72f1bcdbcd7b02ffde8 |
File details
Details for the file pypiquant-0.4.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: pypiquant-0.4.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 173.3 kB
- Tags: CPython 3.12, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2eb4a750bd804294a971c16bc394f9a6211ec845a7a7b72767cc733b03a37908 |
| MD5 | 2f8bbc52c661aca96bddc5a1d97ec160 |
| BLAKE2b-256 | 71f1afcb12cfe45b3c652b274b52758d91a3efda1154e3f9be789cb2b4b1e696 |
File details
Details for the file pypiquant-0.4.3-cp311-cp311-musllinux_1_2_x86_64.whl.
File metadata
- Download URL: pypiquant-0.4.3-cp311-cp311-musllinux_1_2_x86_64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.11, musllinux: musl 1.2+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 01bbe672761259e8af01ac69e9e40103bcae7dd546b3fde471135174279324dc |
| MD5 | f61d9570fe81c82d9c2c764722937f29 |
| BLAKE2b-256 | 03372e600cd7ced7dd16e2bbcc2892fc9fa6d48f6d74111baa6a55900a9b155a |
File details
Details for the file pypiquant-0.4.3-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: pypiquant-0.4.3-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 173.3 kB
- Tags: CPython 3.11, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 12c51ce3cb2aeec9b3bcb2a970e639c953d4db1dcd7a5aae02ff05cd2ca21a73 |
| MD5 | a6b174f87df191a9dc7d00a59ef61204 |
| BLAKE2b-256 | 8caaa3905cbca68f17d2d2ab5792aa983baa25874d9ccf8d0080d09384e44725 |
File details
Details for the file pypiquant-0.4.3-cp310-cp310-musllinux_1_2_x86_64.whl.
File metadata
- Download URL: pypiquant-0.4.3-cp310-cp310-musllinux_1_2_x86_64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.10, musllinux: musl 1.2+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ae080ffd352686de0ee23a7e45dd127d862d647496a18a9ba32f5c35d78519a1 |
| MD5 | d01f395f8d81cc9880ecfbf85925c98d |
| BLAKE2b-256 | 2c2e096bbb76a015311ff962f86145bd0b6622089bcb485597207ddc701e5be2 |
File details
Details for the file pypiquant-0.4.3-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: pypiquant-0.4.3-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 173.3 kB
- Tags: CPython 3.10, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 078824d7bab235397ec73f75dc8cb9e89b289e79d195c7744c7840159ec295f0 |
| MD5 | f12c62c2ffde3b789cef03b982025ef6 |
| BLAKE2b-256 | da3f042f4e5bb7794d08afc9491b470699f962fb83d2885dca0e09e962b64a4c |
File details
Details for the file pypiquant-0.4.3-cp39-cp39-musllinux_1_2_x86_64.whl.
File metadata
- Download URL: pypiquant-0.4.3-cp39-cp39-musllinux_1_2_x86_64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.9, musllinux: musl 1.2+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7bf8edf43ef08000ce571e5897fef8f9c316b04e20f11306ff7e20146d58b0b8 |
| MD5 | 41e4452cccb1ae1b0e404cf350b54c0e |
| BLAKE2b-256 | 3b297a3cc12ff07d106e0445bc0871789dc9f514b2f6a52b886c2a43491ebc85 |
File details
Details for the file pypiquant-0.4.3-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: pypiquant-0.4.3-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 173.3 kB
- Tags: CPython 3.9, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2b1a8181b12d5ca40ee9a4165965a06b7dd34b5c153821b812c9af132c2a1cac |
| MD5 | 9a186ced2244958a37588ae4000595a9 |
| BLAKE2b-256 | 39f8a32b610c4186b6446b35113bcb9ea985d305a9282df117ea2c0eef057849 |
File details
Details for the file pypiquant-0.4.3-cp38-cp38-musllinux_1_2_x86_64.whl.
File metadata
- Download URL: pypiquant-0.4.3-cp38-cp38-musllinux_1_2_x86_64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.8, musllinux: musl 1.2+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b8ccc4d486127ad2783a0158e8a3d383da5124d82ecf40f59309e350b6e367cf |
| MD5 | 707730a041168386087e6ffb1c5faa07 |
| BLAKE2b-256 | c84a91197c349c5afca7958026d906047382a0629be53a1ddf7f812edb051bdc |
File details
Details for the file pypiquant-0.4.3-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: pypiquant-0.4.3-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 173.3 kB
- Tags: CPython 3.8, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7b3fbad9fbd00041c8797723a776c664eb3feb996e68cc3d56593fa90f4681f8 |
| MD5 | 7a4daa4d7cbe2fcab1ccad86b09851f3 |
| BLAKE2b-256 | f36c26086de121fe157ea1e8861689b9815db037c7cb799d2bf844b94cc97570 |