PACE platform plugin for vLLM CPU inference on AMD EPYC processors.

pace-vllm

vLLM platform plugin for AMD PACE. Once installed, vLLM auto-discovers it and routes its CPU worker through PACE's kernels and KV cache, with no changes to your vLLM scripts. Check out the GitHub repository for more information.

The PACE vLLM plugin brings PACE's CPU optimizations to vLLM with no application code changes, retaining ~95% of standalone PACE efficiency and delivering ~1.3x the performance of native vLLM 0.21 on 5th Gen AMD EPYC processors. More details and technical results here.

What it does

pace-vllm registers PACE as a vLLM CPU platform via the vllm.platform_plugins entry point. The plugin replaces vLLM's stock CPU worker, attention backend, KV cache, and Linear/RMSNorm layers with PACE equivalents; in compile mode it also installs a post-grad pattern matcher that fuses gated/ungated MLP blocks into a single libxsmm call.

Highlights

Drop-in plugin - no changes to your vLLM serve script; the vllm.platform_plugins entry point is discovered automatically.
SlabPool KV cache - one slab per attention layer, owned by PACE, with sliding-window and sink-attention support.
Fused MLP pass - gated SwiGLU/GeGLU and ungated fc1->act->fc2 MLPs (silu / gelu-tanh / gelu-exact / relu) are rewritten into a single pace::libxsmm_fused_mlp call under compile mode.
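For reference, the two MLP shapes named in the fused MLP highlight compute the following. This is a plain-Python sketch of the unfused math only, assuming bias-free linear layers and row-major weights; it is not PACE's kernel, and the function names are illustrative rather than part of any pace-vllm API.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]  # row-major, one row per output feature


def matvec(w: Matrix, x: Vector) -> list[float]:
    """Bias-free linear layer: one dot product per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def silu(v: float) -> float:
    return v / (1.0 + math.exp(-v))


def relu(v: float) -> float:
    return max(0.0, v)


def gated_mlp(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix,
              act: Callable[[float], float] = silu) -> list[float]:
    """down(act(gate(x)) * up(x)); this is SwiGLU when act is silu."""
    gate = [act(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


def ungated_mlp(x: Vector, w_fc1: Matrix, w_fc2: Matrix,
                act: Callable[[float], float] = relu) -> list[float]:
    """fc2(act(fc1(x))): the fc1->act->fc2 shape."""
    hidden = [act(h) for h in matvec(w_fc1, x)]
    return matvec(w_fc2, hidden)
```

Unfused, each variant is several separate operations (two or three matrix products plus elementwise steps); the pass described above rewrites that sequence into a single call.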

Requirements

Linux x86_64 with AVX512F + AVX512_BF16 (AMD Zen 4 / 4th Gen EPYC or newer)
Python 3.10 – 3.13
vLLM 0.21.x (CPU build)
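The CPU and Python requirements can be checked before installing. The sketch below assumes the flag spellings Linux reports in /proc/cpuinfo (`avx512f`, `avx512_bf16`); the helper names are illustrative and not part of pace-vllm.

```python
import sys
from pathlib import Path

REQUIRED_FLAGS = ("avx512f", "avx512_bf16")  # /proc/cpuinfo spellings


def missing_cpu_flags(cpuinfo_text: str, required=REQUIRED_FLAGS) -> list[str]:
    """Return the required flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in required if flag not in present]
    return list(required)  # no 'flags' line found: treat everything as missing


def python_supported(version=sys.version_info) -> bool:
    """True for Python 3.10 through 3.13, the range listed above."""
    return (3, 10) <= tuple(version[:2]) <= (3, 13)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print("missing CPU flags:", missing_cpu_flags(text) or "none")
    print("Python version supported:", python_supported())
```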

Install

# 1. vLLM CPU build (pace-vllm is a plugin; it no-ops without vllm).
pip install https://github.com/vllm-project/vllm/releases/download/v0.21.0/vllm-0.21.0+cpu-cp38-abi3-manylinux_2_34_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cpu

# 2. pace-vllm
pip install pace-vllm
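After both steps, a quick way to confirm that the two distributions landed in the same environment is to query their installed versions. This is a small sketch using only the standard library; `installed_version` is an illustrative helper, not something pace-vllm provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for dist in ("vllm", "pace-vllm"):
        print(f"{dist}: {installed_version(dist) or 'not installed'}")
```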

Quick example

CLI (vLLM auto-discovers the plugin):

vllm serve meta-llama/Llama-3.1-8B
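Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below assumes the server is listening on vLLM's default address (localhost, port 8000) and that the model is served under the name passed to `vllm serve`; adjust `BASE_URL` if you started it with different options. The helper names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: vLLM's default host and port


def build_completion_request(
    prompt: str,
    model: str = "meta-llama/Llama-3.1-8B",
    max_tokens: int = 8,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a POST request for the /v1/completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the request to a running server and return the first completion."""
    with urllib.request.urlopen(build_completion_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

Calling `complete("The capital of France is")` requires the server from the command above to be up; building the request does not.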

Python:

from vllm import LLM, SamplingParams


def main() -> None:
    llm = LLM(model="meta-llama/Llama-3.1-8B", dtype="bfloat16")
    out = llm.generate(["The capital of France is"], SamplingParams(max_tokens=8))
    print(out[0].outputs[0].text)


if __name__ == "__main__":
    main()

The if __name__ == "__main__": guard is required: vLLM v1's engine spawns a subprocess for the worker, and without the guard the subprocess re-imports the script and recursively spawns until the OS refuses.

Support

We welcome feedback, suggestions, and bug reports. Please file an issue on the PACE GitHub page.

License

pace-vllm is licensed under the MIT License. See the LICENSE file for details. Third-party notices are in NOTICE.

Project details

This version

1.2.0

Jun 16, 2026

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

pace_vllm-1.2.0-py3-none-manylinux_2_34_x86_64.whl (22.1 MB view details)

Uploaded Jun 16, 2026; Python 3; manylinux: glibc 2.34+ x86-64

File details

Details for the file pace_vllm-1.2.0-py3-none-manylinux_2_34_x86_64.whl.

File metadata

Download URL: pace_vllm-1.2.0-py3-none-manylinux_2_34_x86_64.whl
Upload date: Jun 16, 2026
Size: 22.1 MB
Tags: Python 3, manylinux: glibc 2.34+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for pace_vllm-1.2.0-py3-none-manylinux_2_34_x86_64.whl
Algorithm	Hash digest
SHA256	`4181c4c8bb8ce98a0daaf9838f0e4a96ffd33c343870dbc289457675b4561e9f`
MD5	`01f8c8dfc35be5cdc84995f8c8dddbfb`
BLAKE2b-256	`8bd98b37d9b378d73af75875c4d41afaa993665a523124ff6420e14a93adc128`
