AIBrix KV Cache Offloading Framework for Cross-Engine KV Reuse

The AIBrix KV cache offloading framework provides several features commonly needed for cross-engine KV reuse:

Tensor Parallelism Aware Management: When an inference engine (e.g., vLLM) uses tensor parallelism, each participating engine instance fetches KV tensors from the cache backend independently. On a cache miss, before proceeding with prefill computation, the participants must reconcile the potentially different numbers of KV tensors they fetched from the external KV cache service so that all of them share a consistent view.
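
The alignment step can be sketched as follows. This is a minimal illustration, not the framework's API: `align_hit_counts` and `truncate_to_aligned` are hypothetical helpers, and the sketch assumes that each rank fetched a contiguous prefix of KV blocks and that the ranks agree on the minimum count. In a real deployment the per-rank counts would be exchanged with a collective operation rather than passed in as a list.

```python
from typing import List, Sequence


def align_hit_counts(per_rank_hits: Sequence[int]) -> int:
    """Return the number of leading KV blocks that every TP rank can reuse.

    Assumption: each rank holds a contiguous prefix of blocks, so the
    largest prefix available on all ranks is the minimum hit count.
    """
    if not per_rank_hits:
        raise ValueError("need at least one rank")
    return min(per_rank_hits)


def truncate_to_aligned(blocks: List[str], aligned: int) -> List[str]:
    """Drop blocks beyond the aligned prefix; they are recomputed in prefill."""
    return blocks[:aligned]


# Four ranks fetched different numbers of blocks for the same request.
hits = [6, 4, 6, 5]
aligned = align_hit_counts(hits)
print(aligned)  # 4
```

Every rank then keeps only its first `aligned` blocks and prefill recomputes the rest, so all ranks start from the same token position.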

Embedded Cache w/ CPU Memory: To meet performance requirements, it's common to have a small CPU memory-based cache embedded in the engine to avoid frequently accessing remote cache backends.
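
A minimal sketch of this read-through pattern is shown below. It is illustrative only: `EmbeddedCache` and `remote_get` are hypothetical names, plain `bytes` stand in for KV tensors, and LRU is assumed as the local replacement rule.

```python
from collections import OrderedDict
from typing import Callable, Optional


class EmbeddedCache:
    """Small in-process cache placed in front of a slower remote backend."""

    def __init__(
        self, capacity: int, remote_get: Callable[[str], Optional[bytes]]
    ) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._remote_get = remote_get
        self._local: "OrderedDict[str, bytes]" = OrderedDict()
        self.remote_calls = 0  # counts how often the remote backend was hit

    def get(self, key: str) -> Optional[bytes]:
        if key in self._local:
            self._local.move_to_end(key)  # refresh recency on a local hit
            return self._local[key]
        self.remote_calls += 1
        value = self._remote_get(key)
        if value is not None:
            self._local[key] = value
            if len(self._local) > self._capacity:
                self._local.popitem(last=False)  # drop least recently used
        return value


# A dict stands in for the remote cache service.
remote = {"prefix-a": b"kv-a", "prefix-b": b"kv-b"}
cache = EmbeddedCache(capacity=1, remote_get=remote.get)
cache.get("prefix-a")  # miss: served by the remote backend
cache.get("prefix-a")  # hit: served from local CPU memory
print(cache.remote_calls)  # 1
```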

Selective KV Cache Offloading: Enables fine-grained control over offloading strategies, which is crucial for optimizing performance across diverse deployment environments:

Many cloud providers and companies deploy lower-end GPU instances without high-speed interconnects like RDMA, suited to 7B/8B models running on 24/32 GiB GPU cards. In these setups, the GPUs within one instance (typically 8-16 GPUs) share a single VPC NIC, which leads to significant network bandwidth contention. Selective KV cache offloading (e.g., offloading only the KV tensors that the eviction policy identifies as hot, rather than all KV tensors) mitigates this by reducing unnecessary data transfers and conserving limited network bandwidth.
Even in high-performance environments with RDMA-equipped GPUs, selective KV cache offloading can improve efficiency by limiting the PCIe bandwidth consumed by remote data movement. RDMA enables low-latency, high-bandwidth communication, but remote data access still incurs higher latency than local memory access. Selective KV offloading reduces the frequency of remote data transfers, which preserves PCIe bandwidth and keeps local memory access the preferred data path.

To achieve selective KV cache offloading, the framework introduces an eviction policy layer that can be extended and customized with advanced offloading strategies to determine which KV tensors should be offloaded. Within this layer, multiple callbacks support different offloading modes: offloading all KV tensors, only hot KV tensors, or only cold KV tensors, where "hot" and "cold" are defined by the eviction policy in use. The framework provides built-in support for LRU, FIFO, and S3FIFO eviction policies.
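
The callback idea can be illustrated with a toy LRU policy. This is a sketch under stated assumptions, not the framework's eviction policy interface: the class name `LRUPolicy`, the callback names `on_put`, `on_hot`, and `on_evict`, and the rule that an entry becomes "hot" after `hot_threshold` reads are all hypothetical choices made for the example.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Optional

# Callback signature used throughout this sketch: (key, value) -> None.
OffloadCallback = Callable[[str, bytes], None]


class LRUPolicy:
    """Toy LRU policy that reports hot and cold entries through callbacks."""

    def __init__(
        self,
        capacity: int,
        on_put: Optional[OffloadCallback] = None,    # offload everything
        on_hot: Optional[OffloadCallback] = None,    # offload only hot entries
        on_evict: Optional[OffloadCallback] = None,  # offload only cold entries
        hot_threshold: int = 2,
    ) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._on_put = on_put
        self._on_hot = on_hot
        self._on_evict = on_evict
        self._hot_threshold = hot_threshold
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._hits: Dict[str, int] = {}

    def put(self, key: str, value: bytes) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)  # newest entry is most recently used
        self._hits.setdefault(key, 0)
        if self._on_put:
            self._on_put(key, value)
        while len(self._entries) > self._capacity:
            cold_key, cold_value = self._entries.popitem(last=False)
            del self._hits[cold_key]
            if self._on_evict:
                self._on_evict(cold_key, cold_value)

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        self._hits[key] += 1
        # Fire exactly once, when the entry reaches the threshold.
        if self._hits[key] == self._hot_threshold and self._on_hot:
            self._on_hot(key, self._entries[key])
        return self._entries[key]


# Usage: offload only cold (evicted) entries.
offloaded: List[str] = []
policy = LRUPolicy(capacity=2, on_evict=lambda k, v: offloaded.append(k))
policy.put("a", b"kv-a")
policy.put("b", b"kv-b")
policy.get("a")           # "a" becomes most recently used
policy.put("c", b"kv-c")  # evicts "b", the least recently used entry
print(offloaded)          # ['b']
```

Switching modes only changes which callback is registered; the policy itself decides when an entry counts as hot or cold.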

Quick Start

Installation

The AIBrix KV cache offloading framework can be installed with pip.

pip install aibrix-kvcache
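
To confirm that the installation succeeded, the installed distribution version can be queried from Python. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "aibrix-kvcache") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version())
```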

Contributing

We welcome contributions from the community! Check out our contributing guidelines to see how you can make a difference.

Build from source

# This may take several minutes
pip install -e .

Lint, Format and Type Check

Before contributing your code, please run the following commands to ensure that it passes the tests and linting checks.

# install dependencies
pip install -r requirements/build.txt -r requirements/dev.txt -r requirements/core.txt

# linting, formatting and type checking
bash ./scripts/format.sh

License

AI Runtime is licensed under the Apache License.

Release history

Version	Release date	Notes
0.6.0	Mar 3, 2026	
0.6.0rc5	Mar 3, 2026	pre-release
0.6.0rc4	Mar 3, 2026	pre-release
0.6.0rc3	Mar 3, 2026	pre-release (this version)
0.6.0rc2	Mar 2, 2026	pre-release
0.6.0rc1	Feb 26, 2026	pre-release
0.5.0	Nov 9, 2025	
0.5.0rc3	Nov 8, 2025	pre-release
0.5.0rc2	Nov 1, 2025	pre-release
0.5.0rc1	Oct 25, 2025	pre-release
0.4.1	Aug 19, 2025	
0.4.0	Aug 5, 2025	
0.4.0rc4	Aug 2, 2025	pre-release
0.4.0rc3	Aug 2, 2025	pre-release
0.4.0rc2	Aug 1, 2025	pre-release
0.3.0	May 21, 2025	
0.3.0rc2.post1	May 21, 2025	pre-release
0.3.0rc2	May 21, 2025	pre-release
0.1.0.post2	May 13, 2025	
0.1.0.post1	May 13, 2025	
0.1.0	May 12, 2025	

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

aibrix_kvcache-0.6.0rc3-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.9 MB)
Uploaded Mar 3, 2026. Tags: CPython 3.12, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64

aibrix_kvcache-0.6.0rc3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.9 MB)
Uploaded Mar 3, 2026. Tags: CPython 3.11, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64

File details

Details for the file aibrix_kvcache-0.6.0rc3-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File metadata

Download URL: aibrix_kvcache-0.6.0rc3-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Upload date: Mar 3, 2026
Size: 3.9 MB
Tags: CPython 3.12, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for aibrix_kvcache-0.6.0rc3-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`7fa3e35ecb96fb9c582285ed08ac8da3f6b1aa492e26aaac35bda55a31b30f37`
MD5	`0d61ab146cbe283e4f04c3de89c2cfcd`
BLAKE2b-256	`1fc6a874f57bb2aea121fae69e81acaf4260ece5304d01b87a82406451cfc8ae`
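
A downloaded wheel can be checked against the SHA256 digest listed above before it is installed. The following is a minimal sketch using only the standard library; `sha256_of` and `verify_wheel` are hypothetical helper names, and the local file path in the usage comment is an assumption about where the wheel was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Usage (assumes the cp312 wheel was downloaded to the current directory):
# ok = verify_wheel(
#     Path("aibrix_kvcache-0.6.0rc3-cp312-cp312-"
#          "manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl"),
#     "7fa3e35ecb96fb9c582285ed08ac8da3f6b1aa492e26aaac35bda55a31b30f37",
# )
```

pip can also enforce digests itself when a requirements file pins them with `--hash` entries.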

File details

Details for the file aibrix_kvcache-0.6.0rc3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File metadata

Download URL: aibrix_kvcache-0.6.0rc3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Upload date: Mar 3, 2026
Size: 3.9 MB
Tags: CPython 3.11, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for aibrix_kvcache-0.6.0rc3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`5f9564115bb8586ac7f87da9572781858875ad70f74942132a624f4220e17125`
MD5	`adab48e265850b39b65731d1a7d0327e`
BLAKE2b-256	`601ce2daf63859fc67f0332fdce52581b804b134cbf99618e6c26f8a8f562f7d`
