Auto Round Kernel binary package
What is AutoRound Kernel?
AutoRound Kernel is a low-bit acceleration library for Intel platforms.
The kernels are optimized for the following CPUs:
- Intel Xeon Scalable processors (formerly Sapphire Rapids and Emerald Rapids)
- Intel Xeon 6 processors (formerly Sierra Forest and Granite Rapids)
The kernels are optimized for the following GPUs:
- Intel Arc B-Series Graphics and Intel Arc Pro B-Series Graphics (formerly Battlemage)
Key Features
AutoRound Kernel provides weight-only linear computation for LLM inference. The supported weight-only quantization configurations are listed in the tables below:
CPU
| Weight dtype | Compute dtype | Scale dtype | Algorithm[1] |
|---|---|---|---|
| INT8 | INT8[2] / BF16 / FP32 | BF16 / FP32 | sym / asym |
| INT4 | INT8 / BF16 / FP32 | BF16 / FP32 | sym / asym |
| INT3 | INT8 / BF16 / FP32 | BF16 / FP32 | sym / asym |
| INT2 | INT8 / BF16 / FP32 | BF16 / FP32 | sym / asym |
| INT5 | INT8 / BF16 / FP32 | BF16 / FP32 | sym / asym |
| INT6 | INT8 / BF16 / FP32 | BF16 / FP32 | sym / asym |
| INT7 | INT8 / BF16 / FP32 | BF16 / FP32 | sym / asym |
| INT1 | INT8 / BF16 / FP32 | BF16 / FP32 | sym / asym |
| FP8 (E4M3, E5M2) | BF16 / FP32 | FP32 / FP8 (E8M0) | NA |
| FP4 (E2M1) | BF16 / FP32 | BF16 / FP32 | NA |
XPU
| Weight dtype | Compute dtype | Scale dtype | Algorithm |
|---|---|---|---|
| INT8 | INT8 / FP16 | FP16 | sym |
| INT4 | INT8 / FP16 | FP16 | sym |
| FP8 (E4M3, E5M2) | FP16 | FP16 / FP8 (E8M0) | NA |
[1]: Quantization algorithms for integer types: symmetric or asymmetric.
[2]: Includes dynamic activation quantization; results are dequantized to floating-point formats.
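The difference between the symmetric and asymmetric algorithms in footnote [1] can be sketched in plain Python. This is a generic illustration of group-wise integer quantization, not AutoRound Kernel's implementation: the function names are hypothetical, and extending the asymmetric range to include zero is one common convention assumed here.

```python
from typing import List, Tuple


def quantize_sym(group: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric: signed codes centered on zero, one scale, no zero point."""
    qmax = 2 ** (bits - 1) - 1      # 7 for INT4
    qmin = -(2 ** (bits - 1))       # -8 for INT4
    max_abs = max(abs(w) for w in group)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in group]
    return codes, scale


def dequantize_sym(codes: List[int], scale: float) -> List[float]:
    return [q * scale for q in codes]


def quantize_asym(group: List[float], bits: int) -> Tuple[List[int], float, int]:
    """Asymmetric: unsigned codes plus an integer zero point."""
    qmax = 2 ** bits - 1            # 15 for INT4
    # Assumption: the range is widened to contain 0.0 so the zero point
    # always fits in the unsigned code range.
    lo = min(min(group), 0.0)
    hi = max(max(group), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = round(-lo / scale)
    codes = [min(qmax, max(0, round(w / scale) + zero)) for w in group]
    return codes, scale, zero


def dequantize_asym(codes: List[int], scale: float, zero: int) -> List[float]:
    return [(q - zero) * scale for q in codes]


if __name__ == "__main__":
    group = [0.0, 0.2, 0.5, 1.5]    # a skewed, all-positive weight group
    codes_s, scale_s = quantize_sym(group, bits=4)
    codes_a, scale_a, zero_a = quantize_asym(group, bits=4)
    print("sym :", codes_s, dequantize_sym(codes_s, scale_s))
    print("asym:", codes_a, dequantize_asym(codes_a, scale_a, zero_a))
```

For a skewed group like the one above, the asymmetric variant spends all of its codes on the occupied range, while the symmetric variant leaves the negative half unused. The cost is storing and applying a zero point per group.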
Installation
1. Install via pip
pip install auto-round-lib
2. Install from Source
python setup.py bdist_wheel && pip install dist/*
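After either installation route, a quick check like the following can confirm that the package is visible to the current interpreter. It is a minimal sketch using only the standard library; the distribution name `auto-round-lib` and the module name `auto_round_kernel` are taken from this page, while the helper names are hypothetical.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(dist_name: str = "auto-round-lib") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_available(module_name: str = "auto_round_kernel") -> bool:
    """Report whether a top-level module can be located without importing it."""
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("auto-round-lib version:", installed_version())
    print("auto_round_kernel importable:", module_available())
```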
Validated Hardware Environment
CPU based on Intel 64 architecture or compatible processors:
- Intel Xeon 6 processors (Granite Rapids)
GPU built on Intel's Xe architecture:
- Intel Arc B-Series Graphics (Battlemage)
Resources
QuantLinear API
AutoRound Kernel (ARK) exposes a unified weight-only linear interface through QuantLinear, QuantLinearGPTQ, QuantLinearAWQ, and QuantLinearFP8. Refer to QLinear for more integration details.
The expected lifecycle is: create the module, load quantized tensors from the checkpoint, call post_init() once to repack weights into the ARK-friendly layout, and then call forward() during inference.
Minimal usage:
from auto_round_kernel.qlinear import QuantLinear
qlinear = QuantLinear(
    bits=4,
    group_size=128,
    sym=True,
    in_features=in_features,
    out_features=out_features,
    bias=bias is not None,
    weight_dtype=weight_dtype,
)
# Load qweight, qzeros, scales, and bias from checkpoint.
qlinear.post_init()
# Run inference
y = qlinear(x)
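The loading step in the middle of that lifecycle depends on how the checkpoint is organized. As a rough sketch, assuming a flat checkpoint mapping whose keys look like `<layer_prefix>.<name>` (a GPTQ-style naming assumption, not something this page specifies), one layer's tensors could be selected as follows. `select_layer_tensors` is a hypothetical helper and not part of `auto_round_kernel`.

```python
from typing import Any, Dict, Iterable


def select_layer_tensors(
    checkpoint: Dict[str, Any],
    layer_prefix: str,
    names: Iterable[str] = ("qweight", "qzeros", "scales", "bias"),
) -> Dict[str, Any]:
    """Pick one layer's quantized tensors out of a flat checkpoint mapping.

    Keys are assumed to look like "<layer_prefix>.<name>". Entries that are
    missing (for example, a layer without bias) are skipped.
    """
    selected: Dict[str, Any] = {}
    for name in names:
        key = f"{layer_prefix}.{name}"
        if key in checkpoint:
            selected[name] = checkpoint[key]
    return selected
```

If QuantLinear behaves like a `torch.nn.Module` with matching buffer names (an assumption to verify against the QLinear source), the result could be passed to `qlinear.load_state_dict(..., strict=False)` before the single `post_init()` call.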
A Weight-Only Example
A runnable end-to-end example is available in test_weightonly.py. It demonstrates how to prepare quantized weights and scales, call repack_quantized_weight to build ARK-packed weights, verify correctness with unpack_weight, and run woqgemm on CPU and XPU.
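The pack, unpack, and compare pattern that the example uses for verification can be illustrated generically. The sketch below packs unsigned 4-bit codes two per byte and checks the round trip. It does not reproduce the layout produced by repack_quantized_weight, which is hardware specific and not described here, and the function names are hypothetical.

```python
from typing import List


def pack_int4(codes: List[int]) -> bytes:
    """Pack unsigned 4-bit codes (0..15) two per byte, low nibble first."""
    if len(codes) % 2:
        raise ValueError("expected an even number of 4-bit codes")
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("codes must fit in 4 bits")
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_int4(packed: bytes) -> List[int]:
    """Inverse of pack_int4: split every byte into its low and high nibble."""
    codes: List[int] = []
    for byte in packed:
        codes.append(byte & 0x0F)
        codes.append(byte >> 4)
    return codes


if __name__ == "__main__":
    original = [1, 15, 0, 7]
    packed = pack_int4(original)
    # Round-trip check: unpacking must reproduce the original codes exactly.
    assert unpack_int4(packed) == original
    print(len(original), "codes stored in", len(packed), "bytes")
```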
Download files
Built Distributions
File details
Details for the file auto_round_lib-0.13.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: auto_round_lib-0.13.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 17.1 MB
- Tags: CPython 3.14, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fc8362673bd003e81762f098854794820ae3cfa44938521d0fc454f17025fe32 |
| MD5 | 72ddd6b28cc3a8dbd892fc45546a0337 |
| BLAKE2b-256 | 7c5fbfe90351744a0a15c7810947f8b47eabff1bd837d73466ab4c8db65bf3ed |
File details
Details for the file auto_round_lib-0.13.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: auto_round_lib-0.13.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 17.1 MB
- Tags: CPython 3.13, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b75948fc8ab9c62f270970f1e14b3c42b064066ed1c46d232c8d78a99bfc6ad4 |
| MD5 | 16b4b1e9597912dbd402cc97855612e4 |
| BLAKE2b-256 | 583907d08a1a8a45c85344d411d1891167a381683c7733a1aedcef2ef34125fc |
File details
Details for the file auto_round_lib-0.13.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: auto_round_lib-0.13.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 17.1 MB
- Tags: CPython 3.12, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1f926f81de8b34b04019f48eae1205a9d8542baed0b2ea76024cc98d324fd12d |
| MD5 | 12a7a6c7045680b08e685a99fec8becb |
| BLAKE2b-256 | 2291cd2cc8cba2b15d547f5bec5a931266c0023019d8781c34c1f1a3f955ab1c |
File details
Details for the file auto_round_lib-0.13.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: auto_round_lib-0.13.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 17.1 MB
- Tags: CPython 3.11, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 100ace67ca49c9957d66f4ffb8b3b7251fb62c5a587abfa7caa9bea5ab239b08 |
| MD5 | 73ba3b4ffc9c05316f0bfd2a8e40fed5 |
| BLAKE2b-256 | ec32eb1eeff2081992bdfa2db105384e2e94ada5d2082711fe7a94d1dfdb2a3b |
File details
Details for the file auto_round_lib-0.13.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.
File metadata
- Download URL: auto_round_lib-0.13.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
- Upload date:
- Size: 17.1 MB
- Tags: CPython 3.10, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 29e2ada89ce258a9aada9a26861a8fb3e87362f32fc81c1512e07dfba74ac35d |
| MD5 | ac13fbdd4bc0110da38ca77cc996edc2 |
| BLAKE2b-256 | 1958e16ed936f7b60239e15d7e63c651be58369f81f04204ec4f788dbe247c15 |