Multithreaded SIMD int8 and int4 quantization kernels.

pi-quant: Prime Intellect Fast Quantization Library

Overview

Fast, multithreaded CPU quantization kernels with various rounding modes, outperforming PyTorch’s built-in quantization routines by more than 2 times on all tested hardware. The kernels are optimized with SIMD intrinsics for different CPU architectures, including AMD64 (SSE4.2, AVX2, AVX512F) and ARM64 (Neon). The best-suited kernel is selected at runtime via CPU feature detection.

What is Quantization?

Quantization is the process of mapping continuous values into a finite, discrete set of values. In machine learning and signal processing, it is commonly used to reduce the precision of numerical data, lowering memory usage and improving computational efficiency while maintaining acceptable accuracy.
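To make the mapping concrete, here is a minimal pure-Python sketch of affine quantization from floats to uint8 and back. The function names, the default uint8 range, and the scale/zero-point convention are assumptions chosen for illustration. They are not pi-quant's API, and the library's kernels may use a different convention.

```python
def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map floats to integers in [qmin, qmax] (illustrative helper, not pi-quant's API)."""
    out = []
    for v in values:
        # Python's round() rounds exact halves to the nearest even integer.
        q = round(v / scale) + zero_point
        # Clamp so that out-of-range inputs saturate instead of wrapping around.
        out.append(min(qmax, max(qmin, q)))
    return out


def dequantize(quantized, scale, zero_point):
    """Approximate inverse of quantize(): map integers back to floats."""
    return [(q - zero_point) * scale for q in quantized]


values = [-1.0, 0.0, 0.3, 1.0, 100.0]
q = quantize(values, scale=0.5, zero_point=128)
restored = dequantize(q, scale=0.5, zero_point=128)
```

With a scale of 0.5, the input 0.3 comes back as 0.5 and the out-of-range input 100.0 saturates at 255, which shows the two sources of error: rounding and clamping.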

Features

✅ Parallel Quantization and Dequantization: Efficiently quantizes and dequantizes data using multiple threads.

✅ Rich Datatype Support: Provides f32, f64 ↔ (u)int8/16/32/64.

✅ Modern Python API: Use the library from Python with PyTorch, numpy or standalone.

✅ Architecture-Specific Optimizations: The kernels are optimized with SIMD intrinsics for different CPU architectures, including AMD64 (SSE4.2, AVX2, AVX512F) and ARM64 (Neon).

✅ Thread Pool: Reuses threads for minimal overhead.

✅ Flexible Rounding Modes: Supports both nearest and stochastic rounding modes.

✅ C99 API: Provides a C99 API for C projects or foreign language bindings (see quant.h).

✅ Store Operators: Supports multiple store modes (SET, ADD) during dequantization — useful for ring-reduction operations.

✅ Quantization Parameters: Efficient SIMD-parallel computation of quantization scale and zero point from input data.
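Three of the features above (rounding modes, store operators, and quantization parameters) can be sketched in scalar Python. This is only an illustration under stated assumptions: every function name below is hypothetical, the min/max-based parameter formula is one common convention rather than a confirmed description of pi-quant's kernels, and the real library computes these with SIMD and multiple threads.

```python
import math
import random


def compute_params(values, qmin=0, qmax=255):
    """Derive scale and zero point from the data range (assumed min/max convention)."""
    lo = min(min(values), 0.0)  # extend the range so 0.0 is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:  # all-zero input: avoid division by zero later
        scale = 1.0
    zero_point = qmin - round(lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


def round_nearest(x):
    """Round to the nearest integer, with halves rounded up."""
    return math.floor(x + 0.5)


def round_stochastic(x, rng):
    """Round up with probability equal to the fractional part, so the result is unbiased on average."""
    lower = math.floor(x)
    return lower + (1 if rng.random() < (x - lower) else 0)


def dequantize_into(out, quantized, scale, zero_point, mode="set"):
    """Write dequantized values into out; mode 'add' accumulates instead of overwriting."""
    for i, q in enumerate(quantized):
        value = (q - zero_point) * scale
        if mode == "add":
            out[i] += value
        else:
            out[i] = value


scale, zero_point = compute_params([-51.0, 204.0])
rng = random.Random(0)
samples = [round_stochastic(2.25, rng) for _ in range(10000)]
mean = sum(samples) / len(samples)  # close to 2.25, unlike round_nearest(2.25) == 2

acc = [1.0, 1.0]
dequantize_into(acc, [51, 53], scale, zero_point, mode="add")
```

The accumulate mode is what makes a reduction step cheap: a received quantized chunk can be dequantized directly into a running sum without a temporary buffer.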

Benchmarks

The benchmarks were run on a variety of hardware. We benchmark against PyTorch’s torch.quantize_per_tensor and torch.ao.quantization.fx._decomposed.quantize_per_tensor. Each benchmark quantizes float32 to uint8 across 1000 runs. The number of elements and other details can be found in the benchmark code.

Benchmark 1 (AMD EPYC 9654, 360 vCPUs)

1000 runs with numel 27264000
CPU: AMD EPYC 9654 96-Core Processor, Runtime: AVX512-F
Memory: 1485 GB
Linux: 6.8.0-57-generic

Torch FX Quant refers to torch.ao.quantization.fx._decomposed.quantize_per_tensor, Torch Builtin Quant to torch.quantize_per_tensor, and Fast Quant to pi-quant’s piquant.quantize_torch.

Benchmark 2 (AMD EPYC 7742, 128 vCPUs)

1000 runs with numel 27264000
CPU: AMD EPYC 7742 64-Core Processor, Runtime: AVX2
Memory: 528 GB
Linux: 6.8.0-1023-nvidia

Benchmark 3 (Apple M3 Pro)

1000 runs with numel 27264000
CPU: Apple M3 Pro, Runtime: Neon
Memory: 18 GB
OSX: 15.4 (24E248)
