
Multithreaded SIMD int8 and int4 quantization kernels.


pi-quant: Prime Intellect Fast Quantization Library


Overview

Fast, multithreaded CPU quantization kernels with various rounding modes, outperforming PyTorch’s built-in quantization routines by more than 2x on all tested hardware. The kernels are optimized with SIMD intrinsics for different CPU architectures, including AMD64 (SSE4.2, AVX2, AVX512F) and ARM64 (Neon). The fastest kernel supported by the host CPU is selected at runtime via CPU feature detection.

What is Quantization?

Quantization is the process of mapping continuous values into a finite, discrete set of values. In machine learning and signal processing, it is commonly used to reduce the precision of numerical data, lowering memory usage and improving computational efficiency while maintaining acceptable accuracy.
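To make the definition concrete, here is a minimal pure-Python sketch of affine (scale and zero point) quantization from floats to uint8 and back. It illustrates the general technique only: the function names `compute_params`, `quantize`, and `dequantize` are hypothetical and are not pi-quant's API, and the library's actual parameter computation may differ.

```python
from typing import List, Tuple


def compute_params(values: List[float], qmin: int = 0, qmax: int = 255) -> Tuple[float, int]:
    """Derive an affine scale and zero point from the data range (illustrative only)."""
    lo = min(min(values), 0.0)  # include 0.0 so that zero is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(values: List[float], scale: float, zero_point: int,
             qmin: int = 0, qmax: int = 255) -> List[int]:
    """Map each float to the nearest integer level, clamped to [qmin, qmax]."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integer levels back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


data = [-32.0, 0.0, 10.3, 95.5]
scale, zero_point = compute_params(data)
q = quantize(data, scale, zero_point)
restored = dequantize(q, scale, zero_point)

print(scale, zero_point)  # 0.5 64
print(q)                  # [0, 64, 85, 255]
print(restored)           # [-32.0, 0.0, 10.5, 95.5]
```

Each value now occupies one byte instead of four, and the round-trip error (here 0.2 for the input 10.3) is bounded by half the scale as long as the input lies inside the observed range.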

Features

✅ Parallel De/Quantization: Efficiently quantizes and de-quantizes data using multiple threads.

✅ Rich Datatype Support: Provides f32, f64 ↔ (u)int8/16/32/64.

✅ Modern Python API: Use the library from Python with PyTorch, NumPy, or standalone.

✅ Architecture-Specific Optimizations: The kernels are optimized with SIMD intrinsics for different CPU architectures, including AMD64 (SSE4.2, AVX2, AVX512F) and ARM64 (Neon).

✅ Thread Pool: Reuses threads for minimal overhead.

✅ Flexible Rounding Modes: Supports both nearest and stochastic rounding modes.

✅ C99 API: Provides a C99 API for C projects or foreign language bindings (see quant.h).

✅ Store Operators: Supports multiple store modes (SET, ADD) during dequantization, which is useful for ring-reduction operations.

✅ Quantization Parameters: Efficient SIMD-parallel computation of quantization scale and zero point from input data.
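Two of the features above, rounding modes and store operators, can be sketched in a few lines of scalar Python. This is a conceptual illustration under assumptions: `round_nearest`, `round_stochastic`, and `dequantize_into` are hypothetical names rather than pi-quant functions, the tie-breaking rule shown for nearest rounding is a choice made for this sketch, and the real kernels are SIMD-vectorized and multithreaded.

```python
import math
import random
from typing import List


def round_nearest(x: float) -> int:
    """Round to the nearest integer, with ties going away from zero."""
    return math.floor(x + 0.5) if x >= 0 else -math.floor(-x + 0.5)


def round_stochastic(x: float, rng: random.Random) -> int:
    """Round up with probability equal to the fractional part, else round down."""
    base = math.floor(x)
    return base + (1 if rng.random() < (x - base) else 0)


def dequantize_into(out: List[float], q: List[int], scale: float,
                    zero_point: int, mode: str = "SET") -> None:
    """Dequantize q into out, overwriting (SET) or accumulating (ADD)."""
    if mode not in ("SET", "ADD"):
        raise ValueError(f"unknown store mode: {mode}")
    for i, level in enumerate(q):
        value = (level - zero_point) * scale
        if mode == "SET":
            out[i] = value
        else:
            out[i] += value


# Stochastic rounding is unbiased: the mean of many draws approaches the input.
rng = random.Random(0)
samples = [round_stochastic(2.25, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)

# ADD accumulates a second dequantized chunk onto an existing buffer.
acc = [0.0, 0.0, 0.0]
dequantize_into(acc, [64, 66, 68], scale=0.5, zero_point=64, mode="SET")
after_set = list(acc)
dequantize_into(acc, [66, 66, 66], scale=0.5, zero_point=64, mode="ADD")
after_add = list(acc)
```

Nearest rounding always maps 2.25 to 2, while stochastic rounding returns 2 or 3 such that the expected value is 2.25, which avoids systematic bias when many quantized values are summed. The ADD mode fuses dequantization with accumulation, which is the pattern a ring reduction needs when each step adds a received quantized chunk to a local buffer.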

Benchmarks


The benchmarks were run on a variety of hardware. We benchmark against PyTorch’s torch.quantize_per_tensor and torch.ao.quantization.fx._decomposed.quantize_per_tensor. Each benchmark quantized float32 to uint8 across 1000 runs. The number of elements and other details can be seen in the benchmark code.

Benchmark 1 (AMD EPYC 9654, 360 vCPUs)

1000 runs with numel 27264000
CPU: AMD EPYC 9654 96-Core Processor, Runtime: AVX512-F
Memory: 1485 GB
Linux: 6.8.0-57-generic

bench1.png

Torch FX Quant refers to torch.ao.quantization.fx._decomposed.quantize_per_tensor, Torch Builtin Quant to torch.quantize_per_tensor, and Fast Quant to pi-quant’s piquant.quantize_torch.

Benchmark 2 (AMD EPYC 7742, 128 vCPUs)

1000 runs with numel 27264000
CPU: AMD EPYC 7742 64-Core Processor, Runtime: AVX2
Memory: 528 GB
Linux: 6.8.0-1023-nvidia
bench2.png

Benchmark 3 (Apple M3 Pro)

1000 runs with numel 27264000
CPU: Apple M3 Pro, Runtime: Neon
Memory: 18 GB
OSX: 15.4 (24E248)
bench3.png
