Auto Round Kernel binary package

What is AutoRound Kernel?

AutoRound Kernel (ARK) is a low-bit acceleration library for Intel platforms.

The kernels are optimized for the following CPUs:

  • Intel Xeon Scalable processors (formerly Sapphire Rapids and Emerald Rapids)
  • Intel Xeon 6 processors (formerly Sierra Forest and Granite Rapids)

The kernels are optimized for the following GPUs:

  • Intel Arc B-Series Graphics and Intel Arc Pro B-Series Graphics (formerly Battlemage)

Key Features

AutoRound Kernel provides weight-only quantized linear computation for LLM inference. The supported weight-only-quantization configurations are listed in the tables below:

CPU

Weight dtype     | Compute dtype          | Scale dtype       | Algorithm[1]
INT8             | INT8[2] / BF16 / FP32  | BF16 / FP32       | sym / asym
INT4             | INT8 / BF16 / FP32     | BF16 / FP32       | sym / asym
INT3             | INT8 / BF16 / FP32     | BF16 / FP32       | sym / asym
INT2             | INT8 / BF16 / FP32     | BF16 / FP32       | sym / asym
INT5             | INT8 / BF16 / FP32     | BF16 / FP32       | sym / asym
INT6             | INT8 / BF16 / FP32     | BF16 / FP32       | sym / asym
INT7             | INT8 / BF16 / FP32     | BF16 / FP32       | sym / asym
INT1             | INT8 / BF16 / FP32     | BF16 / FP32       | sym / asym
FP8 (E4M3, E5M2) | BF16 / FP32            | FP32 / FP8 (E8M0) | NA
FP4 (E2M1)       | BF16 / FP32            | BF16 / FP32       | NA

XPU

Weight dtype     | Compute dtype | Scale dtype       | Algorithm
INT8             | INT8 / FP16   | FP16              | sym
INT4             | INT8 / FP16   | FP16              | sym
FP8 (E4M3, E5M2) | FP16          | FP16 / FP8 (E8M0) | NA

[1]: Quantization algorithms for integer types: symmetric or asymmetric.
[2]: Includes dynamic activation quantization; results are dequantized to floating-point formats.
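
To make the sym / asym column concrete, the following pure-Python sketch quantizes one group of weights both ways. It is an illustration of the general technique only, not ARK's implementation: the function names quantize_group and dequantize_group, the rounding mode, and the choice of signed codes for sym and unsigned codes plus a zero point for asym are all assumptions made for this example.

```python
def quantize_group(values, bits=4, sym=True):
    """Quantize one group of floats; return (codes, scale, zero_point).

    Hypothetical helper for illustration, not part of ARK.
    """
    if sym:
        # Symmetric: signed codes centred on zero, no zero point needed.
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        amax = max(abs(v) for v in values)
        scale = amax / qmax if amax > 0 else 1.0
        zero_point = 0
    else:
        # Asymmetric: unsigned codes; the zero point shifts the range.
        # The range is widened to include 0.0 so the zero point stays valid.
        qmin, qmax = 0, 2 ** bits - 1
        lo, hi = min(min(values), 0.0), max(max(values), 0.0)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        zero_point = round(-lo / scale)
    codes = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate float values."""
    return [(c - zero_point) * scale for c in codes]


if __name__ == "__main__":
    group = [-1.0, -0.4, 0.0, 0.35, 0.8]
    for sym in (True, False):
        codes, scale, zero_point = quantize_group(group, bits=4, sym=sym)
        print("sym" if sym else "asym", codes, scale, zero_point)
        print("  dequantized:", dequantize_group(codes, scale, zero_point))
```

In this sketch the symmetric path stores only a scale per group, while the asymmetric path also stores a zero point, which is the usual trade-off between the two schemes.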

Installation

1. Install via pip

pip install auto-round-lib

2. Install from Source

python setup.py bdist_wheel && pip install dist/*.whl
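
After either install path, a quick check can confirm that the package is visible to the current interpreter. The sketch below assumes the importable module is named auto_round_kernel, as in the QuantLinear import under "Minimal usage"; the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(module_name="auto_round_kernel"):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed():
        print("auto_round_kernel is available")
    else:
        print("auto_round_kernel was not found; check the active environment")
```

Using find_spec avoids loading the native kernels just to test for their presence.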

Validated Hardware Environment

CPUs based on Intel 64 architecture or compatible processors:

  • Intel Xeon 6 processors (Granite Rapids)

GPUs built on Intel's Xe architecture:

  • Intel Arc B-Series Graphics (Battlemage)

Resources

QuantLinear API

ARK exposes a unified weight-only linear interface through QuantLinear, QuantLinearGPTQ, QuantLinearAWQ, and QuantLinearFP8. Refer to QLinear for more integration details.

The expected lifecycle is: create the module, load quantized tensors from the checkpoint, call post_init() once to repack weights into the ARK-friendly layout, and then call forward() during inference.

Minimal usage:

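from auto_round_kernel.qlinear import QuantLinear

# in_features, out_features, bias, weight_dtype, and x are placeholders
# supplied by the surrounding model code.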
qlinear = QuantLinear(
    bits=4,
    group_size=128,
    sym=True,
    in_features=in_features,
    out_features=out_features,
    bias=bias is not None,
    weight_dtype=weight_dtype,
)
# Load qweight, qzeros, scales, and bias from checkpoint.
qlinear.post_init()

# Run inference
y = qlinear(x)
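
For intuition about what a weight-only linear layer computes, here is a slow pure-Python reference: dequantize the integer weights group by group, then apply an ordinary linear layer. This is a sketch under stated assumptions, not ARK's kernel or its packed layout: the names dequantize_weight and woq_linear_reference, the row-major [out_features][in_features] weight layout, and the per-group scales and zero points along the input dimension are all choices made for this example.

```python
def dequantize_weight(qweight, scales, zeros, group_size):
    """Turn integer codes into floats: w = (q - zero) * scale, per group.

    qweight: [out_features][in_features] integer codes.
    scales, zeros: [out_features][in_features // group_size].
    """
    return [
        [
            (q - zeros[o][i // group_size]) * scales[o][i // group_size]
            for i, q in enumerate(row)
        ]
        for o, row in enumerate(qweight)
    ]


def woq_linear_reference(x, qweight, scales, zeros, group_size, bias=None):
    """Compute y = x @ W^T + bias with W dequantized on the fly."""
    weight = dequantize_weight(qweight, scales, zeros, group_size)
    y = []
    for x_row in x:
        out = [sum(a * w for a, w in zip(x_row, w_row)) for w_row in weight]
        if bias is not None:
            out = [v + b for v, b in zip(out, bias)]
        y.append(out)
    return y


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0, 4.0]]            # one token, in_features = 4
    qweight = [[1, -2, 3, 0], [0, 1, -1, 2]]  # out_features = 2
    scales = [[0.5, 0.25], [1.0, 2.0]]    # one scale per group of 2 inputs
    zeros = [[0, 0], [0, 0]]              # symmetric case: zero points are 0
    print(woq_linear_reference(x, qweight, scales, zeros, 2, bias=[0.25, -1.0]))
```

A reference like this is mainly useful for checking the numerical output of an optimized kernel on small shapes.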

A Weight-Only Example

A runnable end-to-end example is available in test_weightonly.py. It demonstrates how to prepare quantized weights and scales, call repack_quantized_weight to build ARK-packed weights, verify correctness with unpack_weight, and run woqgemm on CPU and XPU.
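
The pack-then-verify step described above can be illustrated with a generic round trip. The sketch below does not call repack_quantized_weight or unpack_weight and does not reproduce ARK's packed layout, which this page does not specify. The helpers pack_int4 and unpack_int4 are hypothetical and use a simple assumed layout: two unsigned 4-bit codes per byte, low nibble first.

```python
def pack_int4(codes):
    """Pack unsigned 4-bit codes (0..15) two per byte, low nibble first."""
    if len(codes) % 2 != 0:
        raise ValueError("need an even number of 4-bit codes")
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("codes must be in the range 0..15")
    return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))


def unpack_int4(packed):
    """Inverse of pack_int4: recover the list of 4-bit codes."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)  # low nibble was stored first
        codes.append(byte >> 4)    # high nibble second
    return codes


if __name__ == "__main__":
    codes = [0, 15, 7, 8, 3, 12]
    packed = pack_int4(codes)
    # Round-trip check: unpacking must reproduce the original codes.
    assert unpack_int4(packed) == codes
    print(len(codes), "codes packed into", len(packed), "bytes")
```

The same pattern, pack, unpack, compare, applies regardless of the actual layout a kernel library uses.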

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • auto_round_lib-0.13.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (17.1 MB): CPython 3.14, manylinux glibc 2.27+ and 2.28+, x86-64
  • auto_round_lib-0.13.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (17.1 MB): CPython 3.13, manylinux glibc 2.27+ and 2.28+, x86-64
  • auto_round_lib-0.13.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (17.1 MB): CPython 3.12, manylinux glibc 2.27+ and 2.28+, x86-64
  • auto_round_lib-0.13.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (17.1 MB): CPython 3.11, manylinux glibc 2.27+ and 2.28+, x86-64
  • auto_round_lib-0.13.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (17.1 MB): CPython 3.10, manylinux glibc 2.27+ and 2.28+, x86-64
