vllm-xpu-kernels

A high-throughput and memory-efficient inference and serving engine for LLMs

These details have been verified by PyPI

Maintainers

1pikachu IntelAIFW rajeevsi WenjunLiu wliao2

About

This repo is designed as a vLLM plugin which provides custom kernels for Intel GPU (known as XPU in PyTorch).

Getting started

Currently we use PyTorch 2.10 and oneAPI 2025.3.
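A quick way to confirm that an environment matches the PyTorch release named above is to compare the major.minor part of the version string. This is a minimal sketch: the helper name matches_release is hypothetical, and treating "2.10" as an exact major.minor requirement is an assumption, since the text only says which versions are currently used.

```python
def matches_release(version: str, expected: str = "2.10") -> bool:
    """Return True if `version` belongs to the `expected` major.minor release."""
    # Drop a local build suffix such as "+xpu", then compare major.minor only.
    # Comparing components avoids the prefix trap where "2.1" looks like "2.10".
    public = version.split("+", 1)[0]
    return public.split(".")[:2] == expected.split(".")


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(torch.__version__, matches_release(torch.__version__))
```

Comparing version components rather than string prefixes matters here because "2.1.0" and "2.10.0" share a prefix but are different releases.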

How it works

python3 setup.py build - builds a _C.abi3.so under the build directory.
python3 setup.py install - copies that .so into the vllm_xpu_kernels folder.
python3 setup.py develop - performs a local install. With a develop build the package is used from the local source tree; with install it goes into the system or virtual-env library path.

On the vLLM side, vllm_xpu_kernels._C is imported at startup, which should register all custom ops so they can be used directly.
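The import-time registration described above can be sketched as follows. The module name vllm_xpu_kernels._C comes from the text; the helper name load_kernel_extension and the fallback behaviour (returning False instead of raising) are assumptions for illustration, not how vLLM itself handles the import.

```python
import importlib


def load_kernel_extension(module_name: str = "vllm_xpu_kernels._C") -> bool:
    """Import the extension module for its side effect of registering ops.

    Returns True when the import succeeds and False when the module
    (or one of its dependencies) cannot be imported.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        print(f"could not import {module_name}: {exc}")
        return False
    return True


if __name__ == "__main__":
    print("kernels loaded:", load_kernel_extension())
```

Nothing is called on the imported module: the import alone is what triggers registration, so a caller only needs to know whether it succeeded.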

Prepare

Install the oneAPI 2025.3 Deep Learning Essentials dependency.

Create a new virtual environment, then install the build and torch dependencies:

pip install -r requirements.txt

Build & Install

Build development installation to current directory:

pip install --extra-index-url=https://download.pytorch.org/whl/xpu -e . -v
# or, for a faster build, use --no-build-isolation (build dependencies must already be installed)
pip install --no-build-isolation -e . -v

Or install into the system (or active virtual environment) directory:

pip install --extra-index-url=https://download.pytorch.org/whl/xpu .
# or, for a faster build, use --no-build-isolation (build dependencies must already be installed)
pip install --no-build-isolation .

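Or build a wheel. pip wheel writes the .whl to the current directory by default; add -w dist to place it in the dist folder:

pip wheel --extra-index-url=https://download.pytorch.org/whl/xpu .
# or, for a faster build, use --no-build-isolation (build dependencies must already be installed)
pip wheel --no-build-isolation .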

Incremental build

python3 -m build --wheel --no-isolation
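After one of the install commands in this section, it can be useful to confirm that the distribution is visible to the interpreter. The sketch below uses the standard importlib.metadata module; the helper name installed_version is hypothetical, and the distribution name vllm-xpu-kernels is taken from the project title.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "vllm-xpu-kernels") -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version is not None else "vllm-xpu-kernels is not installed")
```

This only checks package metadata. It does not load the compiled extension, so a successful result does not by itself prove the kernels import cleanly.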

How to use in vLLM

To install and test vLLM with these kernels, refer to the temporary branch https://github.com/jikunshang/vllm/tree/xpu_kernel, which replaces the IPEX rms_norm kernel with the one from vllm-xpu-kernels.

Why Static Linking DNNL Instead of Shared Linking?

We chose to statically link oneDNN (DNNL) rather than using it as a shared library for the following reasons:

1. Version Compatibility

Static linking ensures our application always uses the exact version of DNNL it was built with. With shared libraries, there's a risk that system-installed versions might be incompatible or introduce subtle bugs due to API/ABI changes.

2. Performance Consistency

By linking statically, we avoid potential performance variability introduced by different builds or configurations of DNNL that might be present on the host system.

3. Avoiding Runtime Errors

Using shared libraries requires correct paths and environment setup (LD_LIBRARY_PATH on Linux). Static linking avoids issues where DNNL cannot be found or loaded at runtime.

4. Aligning with PyTorch

One key reason to use static linking is to maintain consistency with the PyTorch ecosystem. PyTorch itself statically links libraries like DNNL to ensure deterministic and reliable behavior across different environments.
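Reason 3 can be made concrete with a small sketch of the kind of lookup a shared-library build would depend on. It uses ctypes.util.find_library from the Python standard library; the library name "dnnl" and the helper name locate_shared_library are assumptions for illustration, and the code is not part of this project.

```python
import ctypes.util
from typing import Optional


def locate_shared_library(name: str = "dnnl") -> Optional[str]:
    """Ask the host for a shared library by its short name.

    Returns the library file name or path if the system can resolve it,
    otherwise None. A statically linked extension never needs this lookup.
    """
    return ctypes.util.find_library(name)


if __name__ == "__main__":
    found = locate_shared_library()
    print(found if found else "no shared DNNL library found on this host")
```

Whether this lookup succeeds depends on the host configuration, which is exactly the variability that static linking removes.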

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Release history

0.1.3.1 (this version) - Mar 6, 2026
0.0.1 - Feb 24, 2026

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

vllm_xpu_kernels-0.1.3.1-cp38-abi3-manylinux_2_28_x86_64.whl (56.3 MB)

Uploaded Mar 6, 2026. CPython 3.8+, manylinux: glibc 2.28+ x86-64

File details

Details for the file vllm_xpu_kernels-0.1.3.1-cp38-abi3-manylinux_2_28_x86_64.whl.

File metadata

Download URL: vllm_xpu_kernels-0.1.3.1-cp38-abi3-manylinux_2_28_x86_64.whl
Upload date: Mar 6, 2026
Size: 56.3 MB
Tags: CPython 3.8+, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.0
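The Tags line above mirrors what is encoded in the wheel file name. As a rough sketch, the name can be split into its components following the standard wheel naming layout (distribution, version, optional build tag, Python tag, ABI tag, platform tag). The helper parse_wheel_name and the WheelName type are hypothetical names used only for this example.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel file name layout: {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


if __name__ == "__main__":
    print(parse_wheel_name(
        "vllm_xpu_kernels-0.1.3.1-cp38-abi3-manylinux_2_28_x86_64.whl"
    ))
```

For this wheel, cp38 with abi3 corresponds to the "CPython 3.8+" tag, and manylinux_2_28_x86_64 to "glibc 2.28+ x86-64".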

File hashes

Hashes for vllm_xpu_kernels-0.1.3.1-cp38-abi3-manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`ddbc1b5597fd68127d043692a204c2ba62991fbd5cc812c2f2170a581801fdba`
MD5	`217874e86161e28916be05717daae647`
BLAKE2b-256	`9ddd8c71c06385930c9a1bfb8b4d4834bd98209fc72a48dd03e14ec808b1ee59`
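A downloaded wheel can be checked against the SHA256 digest listed above. The following is a minimal sketch using hashlib from the standard library; the function names sha256_of and matches_published_hash are hypothetical, and the wheel is assumed to sit in the current directory under its published file name.

```python
import hashlib

# SHA256 digest copied from the hash table above.
EXPECTED_SHA256 = "ddbc1b5597fd68127d043692a204c2ba62991fbd5cc812c2f2170a581801fdba"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's SHA256 digest with the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    wheel = "vllm_xpu_kernels-0.1.3.1-cp38-abi3-manylinux_2_28_x86_64.whl"
    try:
        print("hash matches:", matches_published_hash(wheel))
    except FileNotFoundError:
        print(f"{wheel} not found in the current directory")
```

Reading in chunks keeps memory use flat for a 56.3 MB file.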

