auto-round-lib

Auto Round Kernel binary package

Project description

What is AutoRound Kernel?

AutoRound Kernel (ARK) is a low-bit acceleration library for Intel platforms.

The kernels are optimized for the following CPUs:

Intel Xeon Scalable processors (formerly Sapphire Rapids and Emerald Rapids)
Intel Xeon 6 processors (formerly Sierra Forest and Granite Rapids)

The kernels are optimized for the following GPUs:

Intel Arc B-Series Graphics and Intel Arc Pro B-Series Graphics (formerly Battlemage)

Key Features

AutoRound Kernel provides weight-only quantized linear computation for LLM inference. The supported weight-only quantization configurations are listed in the tables below:

CPU

Weight dtype	Compute dtype	Scale dtype	Algorithm^[1]
INT8	INT8^[2] / BF16 / FP32	BF16 / FP32	sym / asym
INT4	INT8 / BF16 / FP32	BF16 / FP32	sym / asym
INT3	INT8 / BF16 / FP32	BF16 / FP32	sym / asym
INT2	INT8 / BF16 / FP32	BF16 / FP32	sym / asym
INT5	INT8 / BF16 / FP32	BF16 / FP32	sym / asym
INT6	INT8 / BF16 / FP32	BF16 / FP32	sym / asym
INT7	INT8 / BF16 / FP32	BF16 / FP32	sym / asym
INT1	INT8 / BF16 / FP32	BF16 / FP32	sym / asym
FP8 (E4M3, E5M2)	BF16 / FP32	FP32 / FP8 (E8M0)	NA
FP4 (E2M1)	BF16 / FP32	BF16 / FP32	NA

XPU

Weight dtype	Compute dtype	Scale dtype	Algorithm
INT8	INT8 / FP16	FP16	sym
INT4	INT8 / FP16	FP16	sym
FP8 (E4M3, E5M2)	FP16	FP16 / FP8 (E8M0)	NA

^[1]: Quantization algorithms for integer types: symmetric or asymmetric.
^[2]: Includes dynamic activation quantization; results are dequantized to floating-point formats.
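
To make the sym / asym distinction in the Algorithm column concrete, here is a minimal pure-Python sketch of quantizing one group of weights both ways. It illustrates the general technique only; the function names, the rounding mode, and the clamping choices are assumptions and are not taken from ARK.

```python
def quantize_group(values, bits=4, sym=True):
    """Quantize one group of floats; returns (codes, scale, zero_point)."""
    if sym:
        # Symmetric: signed codes centred on zero, no zero point needed.
        qmax = 2 ** (bits - 1) - 1
        qmin = -qmax - 1
        max_abs = max(abs(v) for v in values)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        zero = 0
    else:
        # Asymmetric: unsigned codes plus an integer zero point.
        qmin, qmax = 0, 2 ** bits - 1
        lo = min(min(values), 0.0)  # keep 0.0 exactly representable
        hi = max(max(values), 0.0)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        zero = round(-lo / scale)
    codes = [min(qmax, max(qmin, round(v / scale) + zero)) for v in values]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate float values."""
    return [(c - zero) * scale for c in codes]
```

For a group such as `[-1.0, -0.5, 0.0, 0.25, 3.0]`, the symmetric variant spends half of its code range on negative values that barely occur, while the asymmetric variant shifts the range with the zero point at the cost of storing it.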

Installation

1. Install via pip

pip install auto-round-lib

2. Install from Source

python setup.py bdist_wheel && pip install dist/*.whl
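
After either installation route, a quick check like the following can confirm that the distribution is visible to the current interpreter. This is a sketch using only the standard library; the distribution name comes from the pip command above and the module name `auto_round_kernel` from the usage example in this README, while the helper names are hypothetical.

```python
from importlib import metadata
from importlib.util import find_spec


def installed_version(dist_name: str = "auto-round-lib"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_available(module_name: str = "auto_round_kernel") -> bool:
    """Report whether the top-level module can be located without importing it."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    print("auto-round-lib version:", installed_version())
    print("auto_round_kernel importable:", module_available())
```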

Validated Hardware Environment

CPU based on Intel 64 architecture or compatible processors:

Intel Xeon 6 processors (Granite Rapids)

GPU built on Intel's Xe architecture:

Intel Arc B-Series Graphics (Battlemage)

Resources

QuantLinear API

ARK exposes a unified weight-only linear interface through QuantLinear, QuantLinearGPTQ, QuantLinearAWQ, and QuantLinearFP8. See QLinear for integration details.

The expected lifecycle is: create the module, load quantized tensors from the checkpoint, call post_init() once to repack weights into the ARK-friendly layout, and then call forward() during inference.

Minimal usage:

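from auto_round_kernel.qlinear import QuantLinear

# in_features, out_features, bias, weight_dtype, and x are supplied by the caller,
# typically taken from the linear layer being replaced.
qlinear = QuantLinear(
    bits=4,  # weight bit width
    group_size=128,  # quantization group size
    sym=True,  # symmetric quantization
    in_features=in_features,
    out_features=out_features,
    bias=bias is not None,
    weight_dtype=weight_dtype,
)
# Load qweight, qzeros, scales, and bias from the checkpoint.
# Then call post_init() once to repack the weights into the ARK-friendly layout.
qlinear.post_init()

# Run inference.
y = qlinear(x)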

A Weight-Only Example

A runnable end-to-end example is available in test_weightonly.py. It demonstrates how to prepare quantized weights and scales, call repack_quantized_weight to build ARK-packed weights, verify correctness with unpack_weight, and run woqgemm on CPU and XPU.
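
The pack, verify, and multiply flow described above can be illustrated without ARK. The sketch below packs unsigned 4-bit codes two per byte, unpacks them to confirm the round trip, and computes a reference dot product by dequantizing first. The nibble layout and all function names are assumptions for illustration; ARK's packed layout and the signatures of repack_quantized_weight, unpack_weight, and woqgemm are not shown in this README and are not reproduced here.

```python
def pack_int4(codes):
    """Pack unsigned 4-bit codes (0..15) two per byte, low nibble first."""
    if len(codes) % 2:
        raise ValueError("need an even number of codes")
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("codes must be in 0..15")
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_int4(packed):
    """Inverse of pack_int4: recover the list of 4-bit codes."""
    codes = []
    for byte in packed:
        codes.append(byte & 0xF)
        codes.append(byte >> 4)
    return codes


def reference_dot(x, packed, scale, zero):
    """Dequantize one packed weight column and dot it with activations x."""
    weights = [(c - zero) * scale for c in unpack_int4(packed)]
    if len(weights) != len(x):
        raise ValueError("activation and weight lengths differ")
    return sum(a * w for a, w in zip(x, weights))


codes = [0, 15, 7, 8]
packed = pack_int4(codes)
assert unpack_int4(packed) == codes  # round-trip check, as in the test
result = reference_dot([1.0, 1.0, 1.0, 1.0], packed, scale=0.5, zero=8)
```

A slow reference of this kind is mainly useful as a correctness baseline for comparing against a fused kernel's output.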

Replace torch SDPA and run lm-eval

ARK also exposes an XPU SDPA kernel through ARK.sdpa(...). If you want to replace torch.nn.functional.scaled_dot_product_attention globally for evaluation without editing model code, use the helper launcher in tools/lm_eval_with_ark_sdpa.py.

Example:

cd /path/to/auto_round_extension/ark
PYTHONPATH=$PWD python tools/lm_eval_with_ark_sdpa.py \
  --model hf \
  --model_args pretrained=/path/to/model,trust_remote_code=True,dtype=bfloat16 \
  --tasks hellaswag,piqa,winogrande \
  --device xpu:0 \
  --batch_size 1

Notes:

The patch only routes calls to ARK on XPU when the inputs match ARK kernel constraints; otherwise it falls back to the original torch SDPA.
Supported Q/K/V dtypes are FP16 and BF16.
Supported head dimensions are 64, 96, 128, and 192.
dropout_p must be 0.0 for the ARK path.
Additive masks are supported when they can be normalized to [B, 1, Sq, Skv]; boolean masks fall back to torch.
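
The constraints in the notes above amount to a dispatch predicate. The following sketch restates them as plain Python so the fallback conditions are easy to see. The `SdpaCall` record and both functions are hypothetical, and treating "can be normalized to [B, 1, Sq, Skv]" as "broadcastable to that shape" is an assumption, since the README does not define the normalization rule.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SUPPORTED_DTYPES = frozenset({"float16", "bfloat16"})
SUPPORTED_HEAD_DIMS = frozenset({64, 96, 128, 192})


@dataclass(frozen=True)
class SdpaCall:
    """Hypothetical summary of one scaled_dot_product_attention call."""
    device_type: str  # e.g. "xpu" or "cpu"
    dtype: str  # shared dtype of Q, K, and V
    batch: int
    q_len: int
    kv_len: int
    head_dim: int
    dropout_p: float = 0.0
    mask_is_bool: bool = False
    mask_shape: Optional[Tuple[int, ...]] = None  # None means no mask


def mask_fits(call: SdpaCall) -> bool:
    """Accept no mask, or an additive mask broadcastable to [B, 1, Sq, Skv]."""
    if call.mask_shape is None:
        return True
    if call.mask_is_bool:
        return False  # boolean masks fall back to torch
    target = (call.batch, 1, call.q_len, call.kv_len)
    if len(call.mask_shape) > len(target):
        return False
    padded = (1,) * (len(target) - len(call.mask_shape)) + tuple(call.mask_shape)
    return all(dim == want or dim == 1 for dim, want in zip(padded, target))


def should_use_ark(call: SdpaCall) -> bool:
    """True when every documented constraint holds; otherwise use torch SDPA."""
    return (
        call.device_type == "xpu"
        and call.dtype in SUPPORTED_DTYPES
        and call.head_dim in SUPPORTED_HEAD_DIMS
        and call.dropout_p == 0.0
        and mask_fits(call)
    )
```

Keeping the check as a pure function of call metadata makes the fallback path testable without an XPU device.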

Release history

This version

0.13.1

May 15, 2026

0.13.0

May 13, 2026

0.10.3.1

Feb 10, 2026

0.10.3.0

Feb 6, 2026

0.10.2.1

Feb 10, 2026

0.10.2.0

Feb 6, 2026

0.10.1.1

Feb 10, 2026

0.10.1.0

Feb 6, 2026

0.9.6

Jan 23, 2026

0.9.5

Jan 13, 2026

0.9.4

Dec 31, 2025

0.9.3

Dec 25, 2025

0.9.2

Dec 4, 2025

0.9.1

Nov 26, 2025

0.9.0

Nov 14, 2025

0.8.0

Oct 23, 2025

0.7.1

Sep 23, 2025

0.7.0

Sep 10, 2025

0.6.0

Jul 23, 2025

0.5.1

Apr 23, 2025

0.5.0

Apr 22, 2025

0.4.7

Apr 1, 2025

0.4.6

Feb 24, 2025

0.4.5

Jan 27, 2025

0.4.4

Jan 9, 2025

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

auto_round_lib-0.13.1-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (17.2 MB)

Uploaded May 15, 2026. CPython 3.14, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

auto_round_lib-0.13.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (17.2 MB)

Uploaded May 15, 2026. CPython 3.13, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

auto_round_lib-0.13.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (17.2 MB)

Uploaded May 15, 2026. CPython 3.12, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

auto_round_lib-0.13.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (17.2 MB)

Uploaded May 15, 2026. CPython 3.11, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

auto_round_lib-0.13.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (17.2 MB)

Uploaded May 15, 2026. CPython 3.10, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
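
Each wheel name above encodes the interpreter, ABI, and platform it targets. As a small illustration, the sketch below splits a wheel file name into those tags and builds the CPython tag for a given interpreter version, which helps when picking the right file by hand. It assumes the standard wheel naming scheme of name, version, optional build tag, Python tag, ABI tag, and platform tag; the helper names are hypothetical.

```python
import sys


def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of components")
    return {
        "name": name,
        "version": version,
        "build": build,
        # Compressed tag sets are joined with "." inside one component.
        "python_tags": python_tag.split("."),
        "abi_tags": abi_tag.split("."),
        "platform_tags": platform_tag.split("."),
    }


def cpython_tag(version_info=sys.version_info) -> str:
    """Return the CPython tag, such as 'cp312', for an interpreter version."""
    return f"cp{version_info[0]}{version_info[1]}"
```

In practice pip performs this selection automatically, and the third-party `packaging` library provides `packaging.utils.parse_wheel_filename` for stricter parsing.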

File details

Details for the file auto_round_lib-0.13.1-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

Download URL: auto_round_lib-0.13.1-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Upload date: May 15, 2026
Size: 17.2 MB
Tags: CPython 3.14, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for auto_round_lib-0.13.1-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`d8ebdb2ea3073097c10d466edec8187403a8f417fc7ae7159eb849923d5bd232`
MD5	`a056d521073d0ec7e66b0d7f09ce355c`
BLAKE2b-256	`75991def81307236c48227ab9c20dafbc84a9edbdf4ffde04ef8cd07ebb746ba`

File details

Details for the file auto_round_lib-0.13.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

Download URL: auto_round_lib-0.13.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Upload date: May 15, 2026
Size: 17.2 MB
Tags: CPython 3.13, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for auto_round_lib-0.13.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`1033f8ff058c5309075201297f2a8e099f5b6c3af0197a61e4cbb1d42dc62249`
MD5	`be0579c1b67d0bb4752c64c22aef9f3b`
BLAKE2b-256	`51ac96dcf8b5d79b1e9a26109a908a3ac632c5eed0bf6005890124ea5525fa89`

File details

Details for the file auto_round_lib-0.13.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

Download URL: auto_round_lib-0.13.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Upload date: May 15, 2026
Size: 17.2 MB
Tags: CPython 3.12, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for auto_round_lib-0.13.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`21b0375f84b7505426b64b161d00716e3ae91ae7ac7b69d04d6318f93a6c198f`
MD5	`2f42affc1981e62d733ea33da24eed93`
BLAKE2b-256	`b21d6aae4a2bf25e0a2742c0a24007764b0fe91fbfdee19b7f60364019f36044`

File details

Details for the file auto_round_lib-0.13.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

Download URL: auto_round_lib-0.13.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Upload date: May 15, 2026
Size: 17.2 MB
Tags: CPython 3.11, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for auto_round_lib-0.13.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`09b798394b768a7b56543204afa7c29f7a6e02ab67c20be93b6ad4b49620bd5f`
MD5	`aa8e63db935b76077829d585be590bfa`
BLAKE2b-256	`e7810289eaf0b714b15442e2e023adffcd9da41ed00ad6e3b6093cb4c70435c4`

File details

Details for the file auto_round_lib-0.13.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

Download URL: auto_round_lib-0.13.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Upload date: May 15, 2026
Size: 17.2 MB
Tags: CPython 3.10, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for auto_round_lib-0.13.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`e3694bf8cfe77a48164a959c1b6da5977b374012a692a4182df7b57f758197bc`
MD5	`e44f2339350aff4730a2193ab06d70a3`
BLAKE2b-256	`2ceb37842a28c117cff6d22effb4d62e24f3d8c5b4d788573e337732879b346f`

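
The SHA256 digests listed above can be used to check a downloaded wheel before installing it. A minimal sketch with the standard library follows; the helper names are hypothetical, and the expected digest should be copied from the entry that matches the downloaded file.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex digest."""
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

Reading in chunks keeps memory use flat for the 17.2 MB wheels, and normalizing the expected value tolerates stray whitespace or uppercase hex from copy and paste.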
