Auto Round Kernel binary package

What is AutoRound Kernel?

AutoRound Kernel is a low-bit acceleration library for Intel platforms.

The kernels are optimized for the following CPUs:

  • Intel Xeon Scalable processors (formerly Sapphire Rapids and Emerald Rapids)
  • Intel Xeon 6 processors (formerly Sierra Forest and Granite Rapids)

The kernels are optimized for the following GPUs:

  • Intel Arc B-Series Graphics and Intel Arc Pro B-Series Graphics (formerly Battlemage)

Key Features

AutoRound Kernel provides weight-only quantized linear kernels for LLM inference. The supported weight-only quantization configurations are listed in the tables below:

CPU

Weight dtype        Compute dtype            Scale dtype          Algorithm[1]
INT8                INT8[2] / BF16 / FP32    BF16 / FP32          sym / asym
INT4                INT8 / BF16 / FP32       BF16 / FP32          sym / asym
INT3                INT8 / BF16 / FP32       BF16 / FP32          sym / asym
INT2                INT8 / BF16 / FP32       BF16 / FP32          sym / asym
INT5                INT8 / BF16 / FP32       BF16 / FP32          sym / asym
INT6                INT8 / BF16 / FP32       BF16 / FP32          sym / asym
INT7                INT8 / BF16 / FP32       BF16 / FP32          sym / asym
INT1                INT8 / BF16 / FP32       BF16 / FP32          sym / asym
FP8 (E4M3, E5M2)    BF16 / FP32              FP32 / FP8 (E8M0)    NA
FP4 (E2M1)          BF16 / FP32              BF16 / FP32          NA

XPU

Weight dtype        Compute dtype            Scale dtype          Algorithm
INT8                INT8 / FP16              FP16                 sym
INT4                INT8 / FP16              FP16                 sym
FP8 (E4M3, E5M2)    FP16                     FP16 / FP8 (E8M0)    NA

[1]: Quantization algorithms for integer types: symmetric or asymmetric.
[2]: Includes dynamic activation quantization; results are dequantized to floating-point formats.

Installation

1. Install via pip

pip install auto-round-lib

2. Install from Source

python setup.py bdist_wheel && pip install dist/*.whl
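After either install path, one quick sanity check is to ask Python's packaging metadata whether the distribution is present. The sketch below uses only the standard library; the helper name installed_version is made up for this example. Note that the distribution name (auto-round-lib) differs from the import name used later in this document (auto_round_kernel).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "auto-round-lib") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"auto-round-lib: {found or 'not installed'}")
```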

Validated Hardware Environment

CPUs based on the Intel 64 architecture or compatible processors:

  • Intel Xeon 6 processors (Granite Rapids)

GPUs built on Intel's Xe architecture:

  • Intel Arc B-Series Graphics (Battlemage)

Resources

QuantLinear API

AutoRound Kernel (ARK) exposes a unified weight-only linear interface through QuantLinear, QuantLinearGPTQ, QuantLinearAWQ, and QuantLinearFP8. Refer to QLinear for more integration details.

The expected lifecycle is: create the module, load quantized tensors from the checkpoint, call post_init() once to repack weights into the ARK-friendly layout, and then call forward() during inference.

Minimal usage:

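from auto_round_kernel.qlinear import QuantLinear

# in_features, out_features, bias, and weight_dtype are supplied by the caller,
# typically taken from the layer that this module replaces.
qlinear = QuantLinear(
    bits=4,
    group_size=128,
    sym=True,
    in_features=in_features,
    out_features=out_features,
    bias=bias is not None,
    weight_dtype=weight_dtype,
)
# Load qweight, qzeros, scales, and bias from the checkpoint into the module here.

# Repack the loaded tensors into the ARK layout. Call this once, after loading.
qlinear.post_init()

# Run inference; x is the input activation tensor.
y = qlinear(x)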
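The loading step is left as a comment in the usage above. As a rough sketch of what it can involve, the helper below picks one layer's tensors out of a flat checkpoint mapping. Everything about it is an assumption for illustration: the helper name, the "<prefix>.<name>" key layout, and the choice to treat qweight, qzeros, and scales as required and bias as optional. Check your checkpoint format and the QLinear integration details before relying on any of these.

```python
from typing import Any, Dict, Mapping

# Tensor names taken from the usage comment above; which ones are
# mandatory is an assumption made for this sketch.
REQUIRED = ("qweight", "qzeros", "scales")
OPTIONAL = ("bias",)


def collect_layer_tensors(state_dict: Mapping[str, Any], prefix: str) -> Dict[str, Any]:
    """Gather one layer's quantized tensors from a flat checkpoint mapping.

    Assumes keys look like "<prefix>.<tensor name>" (hypothetical layout).
    """
    tensors: Dict[str, Any] = {}
    for name in REQUIRED:
        key = f"{prefix}.{name}"
        if key not in state_dict:
            raise KeyError(f"checkpoint is missing {key!r}")
        tensors[name] = state_dict[key]
    for name in OPTIONAL:
        key = f"{prefix}.{name}"
        if key in state_dict:
            tensors[name] = state_dict[key]
    return tensors
```

Failing early on a missing tensor keeps the error close to the checkpoint problem instead of surfacing later inside post_init() or forward().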

A Weight-Only Example

A runnable end-to-end example is available in test_weightonly.py. It demonstrates how to prepare quantized weights and scales, call repack_quantized_weight to build ARK-packed weights, verify correctness with unpack_weight, and run woqgemm on CPU and XPU.
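The pack-then-unpack check described above follows a common round-trip pattern. The sketch below shows that pattern for 4-bit values in pure Python. It does not reproduce ARK's packed layout, which is not described here; the function names and the low-nibble-first byte layout are assumptions chosen only to keep the example small.

```python
from typing import List


def pack_int4(values: List[int]) -> bytes:
    """Pack unsigned 4-bit values (0..15) two per byte, low nibble first."""
    if len(values) % 2 != 0:
        raise ValueError("need an even number of values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("values must fit in 4 bits")
    return bytes(values[i] | (values[i + 1] << 4) for i in range(0, len(values), 2))


def unpack_int4(packed: bytes) -> List[int]:
    """Inverse of pack_int4: split each byte back into two 4-bit values."""
    out: List[int] = []
    for byte in packed:
        out.append(byte & 0x0F)   # low nibble was stored first
        out.append(byte >> 4)     # high nibble was stored second
    return out


if __name__ == "__main__":
    original = [0, 15, 7, 8, 3, 12]
    packed = pack_int4(original)
    # Round-trip check: unpacking must reproduce the input exactly.
    assert unpack_int4(packed) == original
    print(len(original), "values packed into", len(packed), "bytes")
```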

Replace torch SDPA and run lm-eval

ARK also exposes an XPU SDPA kernel through ARK.sdpa(...). If you want to replace torch.nn.functional.scaled_dot_product_attention globally for evaluation without editing model code, use the helper launcher in tools/lm_eval_with_ark_sdpa.py.

Example:

cd /path/to/auto_round_extension/ark
PYTHONPATH=$PWD python tools/lm_eval_with_ark_sdpa.py \
  --model hf \
  --model_args pretrained=/path/to/model,trust_remote_code=True,dtype=bfloat16 \
  --tasks hellaswag,piqa,winogrande \
  --device xpu:0 \
  --batch_size 1

Notes:

  • The patch only routes calls to ARK on XPU when the inputs match ARK kernel constraints; otherwise it falls back to the original torch SDPA.
  • Supported Q/K/V dtypes are FP16 and BF16.
  • Supported head dimensions are 64, 96, 128, and 192.
  • dropout_p must be 0.0 for the ARK path.
  • Additive masks are supported when they can be normalized to [B, 1, Sq, Skv]; boolean masks fall back to torch.
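The constraints in these notes can be read as a single eligibility check that decides between the ARK path and the torch fallback. The sketch below illustrates that decision only and is not the launcher's code. The function names, the use of plain strings for devices and dtypes, and the reading of "can be normalized to [B, 1, Sq, Skv]" as "broadcastable to that shape with a head dimension of 1" are all assumptions.

```python
from typing import Optional, Sequence

ARK_DTYPES = {"float16", "bfloat16"}      # FP16 and BF16
ARK_HEAD_DIMS = {64, 96, 128, 192}


def mask_is_normalizable(mask_shape: Sequence[int], batch: int,
                         q_len: int, kv_len: int) -> bool:
    """Assumed rule: the mask broadcasts to [B, 1, Sq, Skv]."""
    if len(mask_shape) > 4:
        return False
    padded = (1,) * (4 - len(mask_shape)) + tuple(mask_shape)
    target = (batch, 1, q_len, kv_len)
    return all(m in (1, t) for m, t in zip(padded, target))


def ark_sdpa_eligible(device: str, dtype: str, head_dim: int,
                      dropout_p: float = 0.0,
                      mask_dtype: Optional[str] = None,
                      mask_shape: Optional[Sequence[int]] = None,
                      batch: int = 1, q_len: int = 1, kv_len: int = 1) -> bool:
    """Return True if a call could take the ARK path, else fall back to torch."""
    if not device.startswith("xpu"):
        return False
    if dtype not in ARK_DTYPES or head_dim not in ARK_HEAD_DIMS:
        return False
    if dropout_p != 0.0:
        return False
    if mask_dtype is None:
        return True                       # no mask at all
    if mask_dtype == "bool" or mask_shape is None:
        return False                      # boolean masks fall back to torch
    return mask_is_normalizable(mask_shape, batch, q_len, kv_len)


if __name__ == "__main__":
    print(ark_sdpa_eligible("xpu:0", "bfloat16", 128))
    print(ark_sdpa_eligible("xpu:0", "bfloat16", 80))
```

Keeping the check separate from the kernel call means a wrapper can fall back to the original torch function whenever the check returns False.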
