LMCache storage plugin that targets the MemKV context memory store; gives vLLM a MemKV-backed prefix cache via lmcache_connector.
memkv-lmcache
LMCache StoragePluginInterface backend that persists KV chunks in a
remote MemKV cluster. Loaded as a vendor plugin via LMCache's
storage_plugins dynamic loader — no patches to LMCache's tree.
vLLM gets a MemKV-backed prefix KV-state path for free through
vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py —
no separate vLLM connector package is required.
Build
```sh
cd lmcache-plugin
pip install maturin
maturin develop --release   # local dev install
# or
maturin build --release     # wheel under target/wheels/
pip install target/wheels/memkv_lmcache-*.whl
```
The wheel bundles a native PyO3 extension built from the same
memkv-client crate the NIXL plugin and the sglang plugin use, so
RDMA/TCP transport selection works the same way across all three.
Configure the MemKV connection
The plugin reads the standard MemKV config chain — MEMKV_CONFIG
yaml first, then MEMKV_* env vars:
```sh
export MEMKV_SERVERS="10.0.0.10:9900,10.0.0.11:9900"
export MEMKV_RDMA_DEVICES="mlx5_0,mlx5_1"
export MEMKV_AUTH_KEY="<64-hex>"
# optional:
# export MEMKV_TRANSPORT=auto
# export MEMKV_CONFIG=/etc/memkv.yaml
```
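As an illustration of what the env-var half of the config chain has to carry, here is a minimal sketch that parses and validates the variables shown above. It is a hypothetical helper (`load_memkv_env`, `MemKVEnvConfig` are not part of the package); it ignores `MEMKV_CONFIG` yaml and the yaml-vs-env precedence, and it assumes `<64-hex>` means exactly 64 hexadecimal characters and that `auto` is the transport default.

```python
import os
import re
from dataclasses import dataclass, field
from typing import List, Mapping, Optional, Tuple

# Assumption: "<64-hex>" means exactly 64 hexadecimal characters.
_HEX64 = re.compile(r"[0-9a-fA-F]{64}")


@dataclass
class MemKVEnvConfig:
    # Hypothetical container; the real client's config type may differ.
    servers: List[Tuple[str, int]]
    rdma_devices: List[str] = field(default_factory=list)
    auth_key: Optional[str] = None
    transport: str = "auto"  # assumed default


def _split(value: str) -> List[str]:
    # Comma-separated list, tolerating whitespace and empty entries.
    return [part.strip() for part in value.split(",") if part.strip()]


def load_memkv_env(env: Mapping[str, str] = os.environ) -> MemKVEnvConfig:
    servers = []
    for entry in _split(env.get("MEMKV_SERVERS", "")):
        host, sep, port = entry.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {entry!r}")
        servers.append((host, int(port)))
    if not servers:
        raise ValueError("MEMKV_SERVERS is empty")
    key = env.get("MEMKV_AUTH_KEY")
    if key is not None and not _HEX64.fullmatch(key):
        raise ValueError("MEMKV_AUTH_KEY must be 64 hex characters")
    return MemKVEnvConfig(
        servers=servers,
        rdma_devices=_split(env.get("MEMKV_RDMA_DEVICES", "")),
        auth_key=key,
        transport=env.get("MEMKV_TRANSPORT", "auto"),
    )
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test; validating the key length up front turns a malformed key into an immediate startup error rather than a failed handshake.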
Configure LMCache
Add the plugin to your LMCache yaml. `max_local_cpu_size` must be
greater than 0 because the plugin uses LocalCPUBackend's allocator to
stage retrieved tensors:
```yaml
chunk_size: 64
local_cpu: true
max_local_cpu_size: 5
storage_plugins: memkv
extra_config:
  storage_plugin.memkv.module_path: memkv_lmcache.backend
  storage_plugin.memkv.class_name: MemKVStorageBackend
```
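The two `storage_plugin.memkv.*` keys are all a dynamic loader needs to find the backend class. The sketch below shows that resolution step with `importlib`; `resolve_storage_plugin` is a hypothetical name and this is not LMCache's actual loader code, only an illustration of how `module_path` and `class_name` combine.

```python
import importlib
from typing import Any, Mapping


def resolve_storage_plugin(name: str, extra_config: Mapping[str, Any]) -> type:
    """Hypothetical resolver for storage_plugin.<name>.{module_path,class_name}.

    Illustrative only; LMCache's real loader may validate or instantiate
    differently.
    """
    prefix = f"storage_plugin.{name}."
    try:
        module_path = extra_config[prefix + "module_path"]
        class_name = extra_config[prefix + "class_name"]
    except KeyError as exc:
        raise KeyError(f"missing {exc.args[0]} in extra_config") from None
    module = importlib.import_module(module_path)  # e.g. memkv_lmcache.backend
    return getattr(module, class_name)              # e.g. MemKVStorageBackend
```

With the yaml above, `resolve_storage_plugin("memkv", extra_config)` would import `memkv_lmcache.backend` and return `MemKVStorageBackend`, which is why the wheel only needs to be importable; no change to LMCache's tree is involved.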
Launch vLLM with LMCache + MemKV
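```sh
export LMCACHE_CONFIG_FILE=/etc/lmcache.yaml
# Assign on its own line: a VAR=value prefix on the vllm command itself
# is not visible to the "$KV_TRANSFER_CONFIG" expansion in that command.
KV_TRANSFER_CONFIG='{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'
vllm serve meta-llama/Llama-3-8B \
  --tensor-parallel-size 1 \
  --kv-transfer-config "$KV_TRANSFER_CONFIG"
```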
LMCache's connector picks the storage_plugins entry up at startup and routes prefix KV reads/writes through MemKVStorageBackend.
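When vLLM is launched from a Python harness instead of a shell, building the argument list directly avoids shell quoting and expansion entirely. This is a sketch under the assumption that you want to reuse the exact flags from the shell command above; `build_vllm_launch` is a hypothetical helper, and the result would be passed to something like `subprocess.run(argv, env=env, check=True)`.

```python
import json
import os
from typing import Dict, List, Tuple


def build_vllm_launch(model: str, lmcache_config: str,
                      tensor_parallel_size: int = 1) -> Tuple[List[str], Dict[str, str]]:
    # json.dumps emits the kv-transfer JSON, so no shell quoting is involved.
    kv_transfer = json.dumps(
        {"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both"}
    )
    argv = [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--kv-transfer-config", kv_transfer,
    ]
    # LMCACHE_CONFIG_FILE must reach the child process's environment.
    env = dict(os.environ, LMCACHE_CONFIG_FILE=lmcache_config)
    return argv, env
```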
What's implemented
| Method | Status |
|---|---|
| `contains` | yes (in-process `_meta` check: a key only counts as present when its shape/dtype metadata is on file, so a fresh process reports a miss for server-resident bytes it cannot reconstruct); `batched_contains` falls through to `StoragePluginInterface`'s default loop |
| `exists_in_put_tasks` | yes (in-process tracking set) |
| `batched_submit_put_task` | yes (synchronous; returns `None`) |
| `get_blocking` | yes (requires a prior put in this process; see Caveats) |
| `remove` | yes |
| `pin` / `unpin` | yes (presence check only; the wire layer has no per-client retention) |
| `get_allocator_backend` | yes (delegates to `LocalCPUBackend`) |
| `close` | yes |
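The presence rules in the table above hinge on one thing: the per-process `_meta` dict, not the server, decides whether a key is "there". The toy model below makes that concrete. It is a hypothetical stand-in (`ToyMetaBackend`, `ChunkMeta`, and a plain dict playing the MemKV cluster), not the real `MemKVStorageBackend` and not the `StoragePluginInterface` signatures.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class ChunkMeta:
    shape: Tuple[int, ...]
    dtype: str
    fmt: str


class ToyMetaBackend:
    """Toy model of the contains/get/pin rules; not the real backend.

    `remote` stands in for the MemKV cluster (bytes only, shared across
    "processes"); `_meta` is this process's shape/dtype/fmt dict.
    """

    def __init__(self, remote: Dict[str, bytes]):
        self.remote = remote
        self._meta: Dict[str, ChunkMeta] = {}
        self._put_tasks: Set[str] = set()

    def submit_put(self, key: str, data: bytes, meta: ChunkMeta) -> None:
        # Synchronous: the write is done before returning (returns None).
        self._put_tasks.add(key)
        try:
            self.remote[key] = data
            self._meta[key] = meta
        finally:
            self._put_tasks.discard(key)

    def exists_in_put_tasks(self, key: str) -> bool:
        return key in self._put_tasks

    def contains(self, key: str) -> bool:
        # In-process check only: server-resident bytes without local
        # metadata cannot be turned back into a tensor, so they are a miss.
        return key in self._meta

    def get_blocking(self, key: str) -> Optional[Tuple[bytes, ChunkMeta]]:
        meta = self._meta.get(key)
        data = self.remote.get(key)  # None if the server evicted it
        if meta is None or data is None:
            return None
        return data, meta

    def pin(self, key: str) -> bool:
        # No server-side retention exists; pinning is a presence check.
        return self.contains(key)

    unpin = pin

    def remove(self, key: str) -> bool:
        self._meta.pop(key, None)
        return self.remote.pop(key, None) is not None
```

Constructing a second `ToyMetaBackend` over the same `remote` dict mimics a restarted engine: the bytes are still there, but `contains` reports a miss, which is exactly the cold-start behavior described under Caveats.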
Caveats
- Cross-restart warm cache is MVP-restricted. Each engine process
  keeps shape/dtype/fmt in an in-memory dict so `get_blocking` knows
  what `MemoryObj` to allocate. The wire bytes survive in MemKV across
  restarts; the local metadata does not. A fresh process therefore
  starts cold even when MemKV holds the chunks. This matches LMCache's
  `LocalDiskBackend` behavior. Encoding the shape header on the wire is
  a follow-up.
- Key length cap. MemKV's protocol caps keys at 512 bytes; keys whose
  `CacheEngineKey.to_string()` form is longer than 480 bytes collapse
  to a `memkv-h2:<blake2b-256>` digest.
- Pin/unpin are local-only. MemKV has no per-client retention policy,
  so the methods are presence checks against the local meta dict.
  Server-side eviction is owned by the MemKV cluster.
- Reads ride the server-driven chunked BatchRead. `get_blocking` uses
  `batch_get_into`, which streams the full value through the server's
  bounce buffers into per-thread staging with strict
  full-length-or-miss semantics. The single-key zero-copy `get_into`
  (client-driven RDMA READ) remains available but is not the LMCache
  default: under sustained burst load it saturated the per-connection
  RC send CQ tail and tripped vLLM's SPMD broadcast timeout, and it
  faults the value resident server-side.
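Two of the caveats above are small, self-contained rules: long keys are replaced by a fixed-size digest, and a read either fills the whole staging region or counts as a miss. The sketch below states both in Python. The thresholds and the `memkv-h2:` prefix come from the text; the hex encoding of the 32-byte BLAKE2b digest and the helper names `to_wire_key` / `copy_full_or_miss` are assumptions, and the real read path lives in the Rust client, not in Python.

```python
import hashlib
from typing import Optional

MEMKV_MAX_KEY_BYTES = 512   # protocol cap quoted above
HASH_THRESHOLD_BYTES = 480  # longer key strings are replaced by a digest


def to_wire_key(key_string: str) -> str:
    """Map a CacheEngineKey.to_string() result to a key MemKV accepts."""
    raw = key_string.encode("utf-8")
    if len(raw) <= HASH_THRESHOLD_BYTES:
        return key_string
    # Assumption: hex-encoded BLAKE2b-256; the real encoding may differ.
    digest = hashlib.blake2b(raw, digest_size=32).hexdigest()
    return "memkv-h2:" + digest  # 9 + 64 = 73 bytes, well under the cap


def copy_full_or_miss(value: Optional[bytes], staging: bytearray,
                      expected_len: int) -> bool:
    """Full-length-or-miss: absent, short, or oversized values are misses."""
    if value is None or len(value) != expected_len:
        return False
    if len(staging) < expected_len:
        return False
    staging[:expected_len] = value
    return True
```

Hashing instead of truncating keeps distinct long keys distinct (up to BLAKE2b collisions), and refusing partial copies means a caller never hands vLLM a half-filled KV chunk.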
Layout
```
lmcache-plugin/
├── Cargo.toml                  # cdylib + pyo3 + memkv-client
├── pyproject.toml              # maturin
├── src/lib.rs                  # PyO3 wrapper around memkv-client::Engine
└── python/memkv_lmcache/
    ├── __init__.py             # re-exports Client
    └── backend.py              # MemKVStorageBackend(StoragePluginInterface)
```
Download files
Built Distributions
File details
Details for the file memkv_lmcache-1.0.0-cp38-abi3-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: memkv_lmcache-1.0.0-cp38-abi3-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 1.5 MB
- Tags: CPython 3.8+, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d467369f085fdb1cbf4a6ddfc927b9a4fefe25b4deb75137c51644d9e878d5f0 |
| MD5 | 706a69141f0c076d2916f563e5031246 |
| BLAKE2b-256 | bd525f0b2337951211ebb98c01bad707090000aadddba95d2b3d4dc40ad788ff |
File details
Details for the file memkv_lmcache-1.0.0-cp38-abi3-manylinux_2_28_aarch64.whl.
File metadata
- Download URL: memkv_lmcache-1.0.0-cp38-abi3-manylinux_2_28_aarch64.whl
- Upload date:
- Size: 1.4 MB
- Tags: CPython 3.8+, manylinux: glibc 2.28+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6600007ca0fa05648cfbb21ef23276bb0fbd31c60812b20a6353a984726f8938 |
| MD5 | 9fc0a2a6292d51513e5d6fd6b076c3ff |
| BLAKE2b-256 | 6f17da2e99f83cc73f95abf69d6a849af1ef251eb59a1a13b9403d81cc58122d |
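To check a downloaded wheel against the published SHA256 digests in the file details above, a streaming hash is enough. This is a minimal sketch; `sha256_file` and `verify_wheel` are hypothetical helper names, and the digest table simply copies the two SHA256 values listed on this page.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
PUBLISHED_SHA256 = {
    "memkv_lmcache-1.0.0-cp38-abi3-manylinux_2_28_x86_64.whl":
        "d467369f085fdb1cbf4a6ddfc927b9a4fefe25b4deb75137c51644d9e878d5f0",
    "memkv_lmcache-1.0.0-cp38-abi3-manylinux_2_28_aarch64.whl":
        "6600007ca0fa05648cfbb21ef23276bb0fbd31c60812b20a6353a984726f8938",
}


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    # Stream in 1 MiB chunks so large files never sit fully in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def verify_wheel(path: Path) -> bool:
    expected = PUBLISHED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_file(path) == expected
```

pip's hash-checking mode performs the same comparison at install time when a requirement line carries `--hash=sha256:<digest>`.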