PACE (Platform Aware Compute Engine): high-performance LLM inference on AMD CPUs.

AMD PACE

High-performance LLM inference on AMD EPYC CPUs. PACE is a PyTorch C++ extension with custom AVX512 kernels, slab/paged KV cache, fused operators, and a production-ready serving stack. Check out the GitHub repository for more information.

PACE achieves 1.6x higher autoregressive and 3.2x higher speculative-decoding throughput compared to vLLM on 5th Gen AMD EPYC processors. More details and technical results here.

Highlights

SlabPool attention - CPU-native KV cache and attention backend with O(1) slab allocation, L2-aware block sizing, and a unified dispatcher that picks the optimal kernel path per sequence (GQA decode, multi-token decode, tiled prefill) within one OMP dispatch. Continuous batching, sliding-window, and sink attention go through a single entry point.
Inference server - pace-server provides a router/engine serving stack with continuous batching, multi-instance NUMA-aware execution, and built-in metrics. The launcher partitions CPU cores across engine instances and binds memory to the local NUMA node.
Paged attention - vLLM-style paged KV cache on CPU, fully integrated with PACE's serving stack and all supported models.
Fused AVX512 kernels - fused Add+RMSNorm, Add+LayerNorm, RoPE, QKV projections, and a fused MLP kernel (via TPP/libXSMM). Default for all supported models.
Broad model support - Llama (up to 3.3), Qwen2/2.5, Phi3/4, Gemma 3, GPT-J, OPT, and GPT-OSS, all running in BF16 under one operator and backend framework. Adding a new architecture is a single-file effort.
Speculative decoding (PARD) - built-in parallel-draft speculation, up to 5x throughput over standard autoregressive decoding.
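
The O(1) slab allocation mentioned for SlabPool attention can be pictured with a toy free-list allocator. This is only a minimal sketch of the general technique, not PACE's implementation: the class name `SlabPool`, its methods, and the idea of tracking an owning sequence id per slab are all hypothetical and chosen for illustration.

```python
class SlabPool:
    """Toy fixed-size slab allocator backed by a free list (illustrative only)."""

    def __init__(self, num_slabs: int) -> None:
        # Stack of free slab indices. list.pop() and list.append() at the
        # tail are O(1), so both alloc and free take constant time.
        self._free = list(range(num_slabs - 1, -1, -1))
        # Hypothetical bookkeeping: slab index -> id of the owning sequence.
        self._owner: dict[int, int] = {}

    def alloc(self, seq_id: int) -> int:
        """Hand out one free slab to a sequence, or raise if none is left."""
        if not self._free:
            raise MemoryError("slab pool exhausted")
        slab = self._free.pop()
        self._owner[slab] = seq_id
        return slab

    def free(self, slab: int) -> None:
        """Return a slab to the pool so a later alloc can reuse it."""
        if slab not in self._owner:
            raise ValueError(f"slab {slab} is not allocated")
        del self._owner[slab]
        self._free.append(slab)

    @property
    def available(self) -> int:
        return len(self._free)


if __name__ == "__main__":
    pool = SlabPool(num_slabs=4)
    first = pool.alloc(seq_id=0)
    second = pool.alloc(seq_id=1)
    pool.free(first)                 # sequence 0 finished; its slab is reusable
    reused = pool.alloc(seq_id=2)    # gets the slab that was just freed
    print(first, second, reused, pool.available)
```

Because every slab has the same size, allocation never searches for a fitting region; it only pops an index, which is why a free list keeps the cost constant regardless of how many sequences are in flight.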

Requirements

Linux x86_64 with AVX512F + AVX512_BF16 (AMD Zen4 or newer)
Python 3.10 – 3.13
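
Before installing, it can help to confirm that the host meets these requirements. The sketch below is not part of PACE; it assumes a Linux host where `/proc/cpuinfo` exposes the features under the kernel's flag names `avx512f` and `avx512_bf16`, and the helper names are made up for this example.

```python
import sys
from pathlib import Path

# Linux kernel flag names for AVX512F and AVX512_BF16 in /proc/cpuinfo.
REQUIRED_FLAGS = {"avx512f", "avx512_bf16"}


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str) -> set[str]:
    """Return the required flags that the CPU does not report."""
    return REQUIRED_FLAGS - parse_cpu_flags(cpuinfo_text)


def python_supported(version_info=sys.version_info) -> bool:
    """True if the interpreter is within the documented 3.10 to 3.13 range."""
    return (3, 10) <= tuple(version_info[:2]) <= (3, 13)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if not cpuinfo.exists():
        print("No /proc/cpuinfo found; this check only works on Linux.")
    else:
        missing = missing_flags(cpuinfo.read_text())
        print("CPU flags OK" if not missing else f"Missing CPU flags: {sorted(missing)}")
    print("Python version OK" if python_supported() else "Python version outside 3.10 to 3.13")
```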

Install

# 1. CPU PyTorch (the +cpu build is not on PyPI; needs PyTorch's index).
pip install --extra-index-url https://download.pytorch.org/whl/cpu torch==2.12.0+cpu

# 2. amd-pace
pip install amd-pace

Quick example

Inference server (router + engine, OpenAI-compatible endpoint):

pace-server --server_model meta-llama/Llama-3.1-8B --kv_cache_type SLAB_POOL --serve_type continuous_prefill_first
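
Once the server is running, an OpenAI-compatible endpoint can usually be queried with a plain HTTP POST. The sketch below uses only the Python standard library. The base URL `http://localhost:8000`, the `/v1/completions` path, and the payload fields follow the common OpenAI completions convention and are assumptions here; check the PACE documentation for the actual host, port, and supported routes.

```python
import json
import urllib.request

# Assumption: address where pace-server listens. Adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_completion_request(
    prompt: str,
    model: str = "meta-llama/Llama-3.1-8B",
    max_tokens: int = 64,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a POST request in the OpenAI completions format (assumed route)."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the first completion's text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # OpenAI-style responses carry generated text under choices[0].text.
    return body["choices"][0]["text"]
```

With a server listening at the assumed address, `complete("The capital of France is")` would return the generated continuation as a string.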

For offline programmatic generation (the pace.llm.LLMModel API needs a tokenizer and an OperatorConfig that picks a backend per op), see the runnable scripts at examples/ -- pace_llm_basic.py is the smallest starting point.

Support

We welcome feedback, suggestions, and bug reports. To share any of these, please file an issue on the PACE GitHub page.

License

AMD PACE is licensed under the MIT License. See the LICENSE file for details. Third-party notices are in NOTICE.

Project details

This version

1.2.0

Jun 16, 2026

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

amd_pace-1.2.0-py3-none-manylinux_2_34_x86_64.whl (22.3 MB)

Uploaded Jun 16, 2026. Tags: Python 3, manylinux: glibc 2.34+ x86-64

File details

Details for the file amd_pace-1.2.0-py3-none-manylinux_2_34_x86_64.whl.

File metadata

Download URL: amd_pace-1.2.0-py3-none-manylinux_2_34_x86_64.whl
Upload date: Jun 16, 2026
Size: 22.3 MB
Tags: Python 3, manylinux: glibc 2.34+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for amd_pace-1.2.0-py3-none-manylinux_2_34_x86_64.whl
Algorithm	Hash digest
SHA256	`62ebbf0882deb9724cdfc4a35ce8db3f09cf82942e4adeb62f1345b5b0f137bf`
MD5	`c74c956a46a7e9b3914c7790d526a5c0`
BLAKE2b-256	`1ea5187d1edec9dfdf31137dae67513dd3b3d6cf809425cc21ed90ee7652abe1`
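
A downloaded wheel can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `verify` are made up for this example, and the wheel is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for amd_pace-1.2.0-py3-none-manylinux_2_34_x86_64.whl.
EXPECTED_SHA256 = "62ebbf0882deb9724cdfc4a35ce8db3f09cf82942e4adeb62f1345b5b0f137bf"
WHEEL_NAME = "amd_pace-1.2.0-py3-none-manylinux_2_34_x86_64.whl"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large wheel is never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    wheel = Path(WHEEL_NAME)
    if wheel.exists():
        print("hash OK" if verify(wheel) else "hash MISMATCH, do not install")
    else:
        print(f"{WHEEL_NAME} not found in the current directory")
```

pip can enforce the same check at install time through its hash-checking mode, which takes `--hash=sha256:...` entries in a requirements file.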

amd-pace 1.2.0
