Multithreaded SIMD int8 and int4 quantization kernels.

Project description

pi-quant: Prime Intellect Fast Quantization Library

Overview

Fast, multithreaded CPU quantization kernels with various rounding modes, outperforming PyTorch’s built-in quantization routines by more than 2x on all tested hardware. The kernels are optimized with SIMD intrinsics for different CPU architectures, including AMD64 (SSE4.2, AVX2, AVX512F) and ARM64 (Neon). The fastest kernel supported by the host CPU is selected at runtime through CPU feature detection.
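
The runtime kernel selection mentioned above can be pictured as a priority list of kernels, each tagged with the CPU feature it requires. The sketch below is only an illustration in Python: the `KERNELS` table, the kernel names, the feature strings, the scalar `quant_generic` fallback and `select_kernel` are all hypothetical and not part of pi-quant, which presumably performs this detection in native code.

```python
from typing import Iterable, List, Tuple

# Hypothetical priority table: (required CPU feature, kernel name), widest SIMD first.
# The feature names mirror the instruction sets listed above; "" means no requirement.
KERNELS: List[Tuple[str, str]] = [
    ("avx512f", "quant_avx512f"),
    ("avx2", "quant_avx2"),
    ("sse4_2", "quant_sse42"),
    ("neon", "quant_neon"),
    ("", "quant_generic"),  # assumed scalar fallback
]


def select_kernel(cpu_features: Iterable[str]) -> str:
    """Return the first kernel in priority order whose feature is available."""
    available = {f.lower() for f in cpu_features}
    for required, kernel in KERNELS:
        if required == "" or required in available:
            return kernel
    raise RuntimeError("no kernel available")  # not reached: the fallback always matches
```

Here the caller passes in the detected feature set, for example `select_kernel(["sse4_2", "avx2"])`, so the selection logic stays separate from however the features are actually probed.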

What is Quantization?

Quantization is the process of mapping continuous values into a finite, discrete set of values. In machine learning and signal processing, it is commonly used to reduce the precision of numerical data, lowering memory usage and improving computational efficiency while maintaining acceptable accuracy.
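
As a concrete illustration of this mapping, here is a minimal sketch of the widely used affine (scale and zero point) scheme in plain Python. It is an assumption that pi-quant follows this exact formula; the function names `quantize` and `dequantize` are hypothetical and are not the library's API.

```python
from typing import List


def quantize(xs: List[float], scale: float, zero_point: int,
             qmin: int = 0, qmax: int = 255) -> List[int]:
    """Map floats to integers: q = clamp(round(x / scale) + zero_point, qmin, qmax)."""
    out = []
    for x in xs:
        q = round(x / scale) + zero_point  # Python's round() rounds halves to even
        out.append(max(qmin, min(qmax, q)))
    return out


def dequantize(qs: List[int], scale: float, zero_point: int) -> List[float]:
    """Approximate inverse: x ≈ (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in qs]


q = quantize([-1.0, 0.0, 0.3, 100.0], scale=0.5, zero_point=128)
x = dequantize(q, scale=0.5, zero_point=128)
```

With these inputs, `0.3` comes back as `0.5` (an error of at most half the scale), while `100.0` saturates at the top of the uint8 range and cannot be recovered.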

Features

✅ Parallel Quantization and Dequantization: Quantizes and dequantizes data across multiple threads.

✅ Rich Datatype Support: Provides f32, f64 ↔ (u)int8/16/32/64.

✅ Modern Python API: Use the library from Python with PyTorch, NumPy, or standalone.

✅ Architecture-Specific Optimizations: The kernels are optimized with SIMD intrinsics for different CPU architectures, including AMD64 (SSE4.2, AVX2, AVX512F) and ARM64 (Neon).

✅ Thread Pool: Reuses threads for minimal overhead.

✅ Flexible Rounding Modes: Supports both nearest and stochastic rounding modes.

✅ C99 API: Provides a C99 API for C projects or foreign language bindings (see quant.h).

✅ Store Operators: Supports multiple store modes (SET, ADD) during dequantization, which is useful for ring-reduction operations.

✅ Quantization Parameters: Efficient SIMD-parallel computation of quantization scale and zero point from input data.
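
Three of the features above (quantization parameters, stochastic rounding and store operators) can be sketched in scalar Python. This is a hedged illustration, not pi-quant's implementation: it assumes a min/max range scheme for the parameters, and every function name here (`compute_params`, `stochastic_round`, `quantize_stochastic`, `dequantize_into`) is hypothetical.

```python
import math
import random
from typing import List, Tuple


def compute_params(xs: List[float], qmin: int = 0, qmax: int = 255) -> Tuple[float, int]:
    """Derive scale and zero point from the data range (assumed min/max scheme)."""
    lo, hi = min(min(xs), 0.0), max(max(xs), 0.0)  # widen the range so 0.0 is representable
    scale = (hi - lo) / (qmax - qmin) or 1.0       # fall back to 1.0 if the range is empty
    zero_point = qmin - round(lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


def stochastic_round(v: float, rng: random.Random) -> int:
    """Round up with probability equal to the fractional part, so the mean equals v."""
    floor = math.floor(v)
    return floor + (1 if rng.random() < v - floor else 0)


def quantize_stochastic(xs: List[float], scale: float, zero_point: int,
                        rng: random.Random, qmin: int = 0, qmax: int = 255) -> List[int]:
    """Affine quantization using stochastic instead of nearest rounding."""
    return [max(qmin, min(qmax, stochastic_round(x / scale, rng) + zero_point))
            for x in xs]


def dequantize_into(out: List[float], qs: List[int], scale: float,
                    zero_point: int, mode: str = "SET") -> None:
    """Overwrite (SET) or accumulate (ADD) dequantized values in an existing buffer."""
    for i, q in enumerate(qs):
        v = (q - zero_point) * scale
        out[i] = v if mode == "SET" else out[i] + v
```

The ADD mode is what makes the accumulation step of a reduction cheap: a received quantized chunk can be dequantized and summed into the local buffer in one pass, for example `dequantize_into(acc, chunk, scale, zero_point, mode="ADD")`.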

Benchmarks

The benchmarks were run on a variety of hardware. We benchmark against PyTorch’s torch.quantize_per_tensor and torch.ao.quantization.fx._decomposed.quantize_per_tensor. Each benchmark quantizes float32 to uint8 over 1000 runs. The number of elements and other details are in the benchmark code.

Benchmark 1 (AMD EPYC 9654, 360 vCPUs)

1000 runs with numel 27264000
CPU: AMD EPYC 9654 96-Core Processor, Runtime: AVX512-F
Memory: 1485 GB
Linux: 6.8.0-57-generic

bench1.png

Torch FX Quant refers to torch.ao.quantization.fx._decomposed.quantize_per_tensor, Torch Builtin Quant to torch.quantize_per_tensor, and Fast Quant to pi-quant’s piquant.quantize_torch.

Benchmark 2 (AMD EPYC 7742, 128 vCPUs)

1000 runs with numel 27264000
CPU: AMD EPYC 7742 64-Core Processor, Runtime: AVX2
Memory: 528 GB
Linux: 6.8.0-1023-nvidia
bench2.png

Benchmark 3 (Apple M3 Pro)

1000 runs with numel 27264000
CPU: Apple M3 Pro, Runtime: Neon
Memory: 18 GB
OSX: 15.4 (24E248)
bench3.png

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

pypiquant-0.3.3-cp313-cp313-musllinux_1_2_x86_64.whl (1.1 MB)

CPython 3.13, musllinux: musl 1.2+ x86-64

pypiquant-0.3.3-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (105.3 kB)

CPython 3.13, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

pypiquant-0.3.3-cp312-cp312-musllinux_1_2_x86_64.whl (1.1 MB)

CPython 3.12, musllinux: musl 1.2+ x86-64

pypiquant-0.3.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (105.3 kB)

CPython 3.12, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

pypiquant-0.3.3-cp311-cp311-musllinux_1_2_x86_64.whl (1.1 MB)

CPython 3.11, musllinux: musl 1.2+ x86-64

pypiquant-0.3.3-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (105.3 kB)

CPython 3.11, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

pypiquant-0.3.3-cp310-cp310-musllinux_1_2_x86_64.whl (1.1 MB)

CPython 3.10, musllinux: musl 1.2+ x86-64

pypiquant-0.3.3-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (105.3 kB)

CPython 3.10, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

pypiquant-0.3.3-cp39-cp39-musllinux_1_2_x86_64.whl (1.1 MB)

CPython 3.9, musllinux: musl 1.2+ x86-64

pypiquant-0.3.3-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (105.3 kB)

CPython 3.9, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

pypiquant-0.3.3-cp38-cp38-musllinux_1_2_x86_64.whl (1.1 MB)

CPython 3.8, musllinux: musl 1.2+ x86-64

pypiquant-0.3.3-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (105.3 kB)

CPython 3.8, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

File details

Details for the file pypiquant-0.3.3-cp313-cp313-musllinux_1_2_x86_64.whl.

File hashes

Hashes for pypiquant-0.3.3-cp313-cp313-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 eae721bc103ab90be90a941790f8b5ba87315f2ae4ea235254bb05dd5bcf91d8
MD5 9052b9196b039c2f3120f9a837260427
BLAKE2b-256 bdd125fcadab2539c9a332c66a45c161564b272feff68d30acee22b199b12fa8

File details

Details for the file pypiquant-0.3.3-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for pypiquant-0.3.3-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 ff047ae1a1eb02109a1cbea01e24567ef342718110d3ce5a822e87f4725054e8
MD5 ceb70e18839d7652d747a4a195587468
BLAKE2b-256 31d76bb812f6a490a25a0e13214b4157fd69727d77fed1d17ac2ba6e65f0cd83

File details

Details for the file pypiquant-0.3.3-cp312-cp312-musllinux_1_2_x86_64.whl.

File hashes

Hashes for pypiquant-0.3.3-cp312-cp312-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 60905c3db216132a970552dc22c2c73ce9842bf84cc3266976515d654f48fdd1
MD5 112885a10fd62245a76ccff20356ee4c
BLAKE2b-256 7020861a0dacd9a49a6c16abda91ecb5d8fc2f20a738820f72196ad25ab01a90

File details

Details for the file pypiquant-0.3.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for pypiquant-0.3.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 c7d1380cd0fbde25036b4c8273f9ee937033c9f7e677e75cc1a4f7cbe6357d0e
MD5 94a72293594982387574f6ac80adbb3b
BLAKE2b-256 2a85262e197bfb70364fa46c62f140d91aab9e308313818005634974b2d9db62

File details

Details for the file pypiquant-0.3.3-cp311-cp311-musllinux_1_2_x86_64.whl.

File hashes

Hashes for pypiquant-0.3.3-cp311-cp311-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 de58e36a6ea6db28215f933d8295c71a0099203577c859446ecb6665799064c2
MD5 820afb4ad0a2132daf4dea1437b90f25
BLAKE2b-256 b68894811d807a4eb0075b35c3aa9342e2d01a48a45569117d85f02287627e38

File details

Details for the file pypiquant-0.3.3-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for pypiquant-0.3.3-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 5f0baebf6c05fabd2cd13bfa1151d2b93e6288a39489f0edf1359daae49948b4
MD5 723db3ed0aaae7aa605cb297af361b44
BLAKE2b-256 fb4a90edfdeabbe30634d79ffd55b3385a582d95637b56e1b9912a72a957e00a

File details

Details for the file pypiquant-0.3.3-cp310-cp310-musllinux_1_2_x86_64.whl.

File hashes

Hashes for pypiquant-0.3.3-cp310-cp310-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 4980d7d519d6c38ba9c92a9b64a4f3b7108dc7615339491993100808daa438e5
MD5 07ecd20495195f904d894e6e7ebcbd4a
BLAKE2b-256 c13cbbb0032c4778d443f891b34029bc8fe421b79faa31df48d1d255e9c09df2

File details

Details for the file pypiquant-0.3.3-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for pypiquant-0.3.3-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 4337d6f65473fe67beb49a58f9ee6ab8ada70daceb2db0207af55bd0669ade9f
MD5 79598f0eb1d8b6cca70f529183863ad7
BLAKE2b-256 649a6deda14420db46054fd4c9a0871f5b616bf4e9d12c946be05e00ee34d4dd

File details

Details for the file pypiquant-0.3.3-cp39-cp39-musllinux_1_2_x86_64.whl.

File hashes

Hashes for pypiquant-0.3.3-cp39-cp39-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 bf5e5442eacf741722b445832720a9282ffd283ff9f07014d4b8fdafed7fa21c
MD5 d8883d783d7ec3a3b37f29f9ba8e5ffd
BLAKE2b-256 cee24ab7a6d623dd0a2a9afb3158fd2ca1a35ac2445f6dc5444f4c25d600ffd8

File details

Details for the file pypiquant-0.3.3-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for pypiquant-0.3.3-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 bd5a0401a86dabea92ee0ac4b290656b1af92c8f905aa264922d486986115254
MD5 2a4546f1797cb0a4ae15c9937bf8632d
BLAKE2b-256 57ce948b8bd400ed56c7b4d90973e508a3f34ab327ddfcc5b35ad3e22cbedfc9

File details

Details for the file pypiquant-0.3.3-cp38-cp38-musllinux_1_2_x86_64.whl.

File hashes

Hashes for pypiquant-0.3.3-cp38-cp38-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 99f446a5c74040dcd49365b7f1428812b20adb4d215b390ac18bff82728ecefc
MD5 9f7cb6a38238bf0823256cc63b5fbf69
BLAKE2b-256 35596adbf8f1abae8fb67c16c4f9217999b46dc73ce0bf393a33cb831f6d9401

File details

Details for the file pypiquant-0.3.3-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for pypiquant-0.3.3-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 697e0d4a6bcd6fbb78d7bf393007dcfb813f98a9ff5fa0255e82b84c0a3ea054
MD5 b31263661ddae5bd8f8bd474471fcb78
BLAKE2b-256 97dd8ff16c77d124842af0cfbcd8044d0002590b24c538d02411bd88cde9aecf
