
Dynamo KVBM


The Dynamo KVBM is a distributed KV-cache block management system designed for scalable LLM inference. It cleanly separates memory management from inference runtimes (vLLM, TensorRT-LLM, and SGLang), enabling GPU↔CPU↔Disk/Remote tiering, asynchronous block offload/onboard, and efficient block reuse.

[Figure: layered architecture view of the Dynamo KV Block Manager]

Feature Highlights

  • Distributed KV-Cache Management: Unified GPU↔CPU↔Disk↔Remote tiering for scalable LLM inference.
  • Async Offload & Reuse: Seamlessly move KV blocks between memory tiers using GDS-accelerated transfers powered by NIXL, without recomputation.
  • Runtime-Agnostic: Works out-of-the-box with vLLM, TensorRT-LLM, and SGLang via lightweight connectors.
  • Memory-Safe & Modular: RAII lifecycle and pluggable design for reliability, portability, and backend extensibility.

Build and Installation

The pip wheel is built through a Docker build process:

# Build the Docker image with KVBM enabled (from the dynamo repo root)
./container/build.sh --framework none --enable-kvbm --tag local-kvbm

Once built, you can either:

Option 1: Run and use the container directly

./container/run.sh --framework none -it

Option 2: Extract the wheel file to your local filesystem

# Create a temporary container from the built image
docker create --name temp-kvbm-container local-kvbm:latest

# Copy the KVBM wheel to your current directory
docker cp temp-kvbm-container:/opt/dynamo/wheelhouse/ ./dynamo_wheelhouse

# Clean up the temporary container
docker rm temp-kvbm-container

# Install the wheel locally
pip install ./dynamo_wheelhouse/kvbm*.whl

Note: the default pip wheel is not currently compatible with CUDA 13.

Integrations

Environment Variables

  • DYN_KVBM_CPU_CACHE_GB (required): CPU pinned-memory cache size in GB.
  • DYN_KVBM_DISK_CACHE_GB (optional): SSD disk/storage-system cache size in GB.
  • DYN_KVBM_DISK_CACHE_DIR (default: /tmp/): Disk cache directory.
  • DYN_KVBM_DISK_ZEROFILL_FALLBACK (default: false): Enable zero-fill when fallocate() is unsupported (e.g., Lustre).
  • DYN_KVBM_DISK_DISABLE_O_DIRECT (default: false): Disable O_DIRECT for disk I/O (debug/compatibility).
  • DYN_KVBM_LEADER_WORKER_INIT_TIMEOUT_SECS (default: 120): Timeout in seconds for the KVBM leader and workers to synchronize and allocate the required memory and storage; increase this value when allocating large amounts of memory or storage.
  • DYN_KVBM_METRICS (default: false): Enable the metrics endpoint.
  • DYN_KVBM_METRICS_PORT (default: 6880): Metrics endpoint port.
  • DYN_KVBM_DISABLE_DISK_OFFLOAD_FILTER (default: false): Disable the disk offload filter (removes SSD lifespan protection).
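As a sketch of how a launcher script might set these variables before starting a serving process (the sizes and paths below are illustrative examples, not recommendations):

```python
import os

# Illustrative KVBM settings; values here are examples, not recommendations.
kvbm_env = {
    "DYN_KVBM_CPU_CACHE_GB": "100",                      # required: pinned host-memory cache
    "DYN_KVBM_DISK_CACHE_GB": "500",                     # optional: SSD-backed cache tier
    "DYN_KVBM_DISK_CACHE_DIR": "/mnt/storage/kvbm_cache",
    "DYN_KVBM_METRICS": "true",
    "DYN_KVBM_METRICS_PORT": "6880",
}
os.environ.update(kvbm_env)
```

A serving process launched afterwards (e.g. via subprocess) inherits these settings from the environment.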

Disk Storage Configuration

Why special configuration may be needed:

Some filesystems (e.g., Lustre, certain network filesystems) don't support fallocate(), which KVBM uses for fast disk space allocation. Additionally, KVBM uses O_DIRECT I/O for GPU DirectStorage (GDS) performance, which requires strict 4096-byte alignment.
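The 4096-byte alignment requirement means sizes and offsets must be rounded up to the I/O block boundary. A minimal sketch of that arithmetic (the helper name is ours, not KVBM's):

```python
O_DIRECT_ALIGNMENT = 4096  # required I/O alignment for O_DIRECT / GDS

def align_up(size: int, alignment: int = O_DIRECT_ALIGNMENT) -> int:
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment

# e.g. a 10 KiB KV block occupies 12 KiB of aligned disk space
```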

Setup for filesystems without fallocate() support:

export DYN_KVBM_DISK_CACHE_DIR=/mnt/storage/kvbm_cache
export DYN_KVBM_DISK_ZEROFILL_FALLBACK=true  # Enables zero-fill fallback when fallocate() unsupported

What happens:

  • Without ZEROFILL_FALLBACK=true: Disk cache allocation may fail with "Operation not supported"
  • With ZEROFILL_FALLBACK=true: KVBM writes zeros using page-aligned buffers compatible with O_DIRECT requirements
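A rough illustration of the zero-fill idea: preallocate a file by writing page-sized zero blocks instead of calling fallocate(). This sketch uses plain buffered I/O for portability; KVBM's actual path uses O_DIRECT with page-aligned buffers.

```python
import os
import tempfile

BLOCK = 4096  # page-sized write unit, matching O_DIRECT alignment requirements

def zerofill_preallocate(path: str, size_bytes: int) -> None:
    """Preallocate a file by writing zeros in aligned chunks."""
    zeros = bytes(BLOCK)
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(BLOCK, remaining)
            f.write(zeros[:n])
            remaining -= n

path = os.path.join(tempfile.mkdtemp(), "kvbm_cache.bin")
zerofill_preallocate(path, 1 << 20)  # preallocate 1 MiB of zeros
```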

Troubleshooting: If you encounter "write all error" or EINVAL (errno 22), try disabling O_DIRECT: export DYN_KVBM_DISK_DISABLE_O_DIRECT=true

vLLM

DYN_KVBM_CPU_CACHE_GB=100 vllm serve \
  --kv-transfer-config '{"kv_connector":"DynamoConnector","kv_role":"kv_both","kv_connector_module_path":"kvbm.vllm_integration.connector"}' \
  Qwen/Qwen3-8B

For more detailed integration with Dynamo, disaggregated serving support, and benchmarking, see the vllm-setup guide.
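The --kv-transfer-config value is a JSON object, so building it programmatically avoids shell-quoting mistakes. A small sketch, using only the keys from the command above:

```python
import json

# Connector settings taken from the vllm serve example above
kv_transfer_config = {
    "kv_connector": "DynamoConnector",
    "kv_role": "kv_both",
    "kv_connector_module_path": "kvbm.vllm_integration.connector",
}
arg = json.dumps(kv_transfer_config)
# pass as: vllm serve --kv-transfer-config "$arg" Qwen/Qwen3-8B
```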

TensorRT-LLM

cat >/tmp/kvbm_llm_api_config.yaml <<EOF
cuda_graph_config: null
kv_cache_config:
  enable_partial_reuse: false
  free_gpu_memory_fraction: 0.80
kv_connector_config:
  connector_module: kvbm.trtllm_integration.connector
  connector_scheduler_class: DynamoKVBMConnectorLeader
  connector_worker_class: DynamoKVBMConnectorWorker
EOF

DYN_KVBM_CPU_CACHE_GB=100 trtllm-serve Qwen/Qwen3-8B \
  --host localhost --port 8000 \
  --backend pytorch \
  --extra_llm_api_options /tmp/kvbm_llm_api_config.yaml

For more detailed integration with Dynamo and benchmarking, see the trtllm-setup guide.
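Because JSON is a subset of YAML, the extra-options file above can also be generated programmatically rather than hand-written. A sketch whose keys mirror the YAML shown earlier (the output filename is our choice):

```python
import json

# Mirrors the kvbm_llm_api_config.yaml example above; JSON is valid YAML
llm_api_options = {
    "cuda_graph_config": None,
    "kv_cache_config": {
        "enable_partial_reuse": False,
        "free_gpu_memory_fraction": 0.80,
    },
    "kv_connector_config": {
        "connector_module": "kvbm.trtllm_integration.connector",
        "connector_scheduler_class": "DynamoKVBMConnectorLeader",
        "connector_worker_class": "DynamoKVBMConnectorWorker",
    },
}

with open("/tmp/kvbm_llm_api_config.json", "w") as f:
    json.dump(llm_api_options, f, indent=2)
```

The resulting file can be passed to --extra_llm_api_options in place of the hand-written YAML.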



Built Distributions

  • kvbm-0.8.1-cp310-abi3-manylinux_2_28_x86_64.whl (10.9 MB): CPython 3.10+, manylinux glibc 2.28+, x86-64
  • kvbm-0.8.1-cp310-abi3-manylinux_2_28_aarch64.whl (10.0 MB): CPython 3.10+, manylinux glibc 2.28+, ARM64

No source distribution files are available for this release.

File hashes

kvbm-0.8.1-cp310-abi3-manylinux_2_28_x86_64.whl
  SHA256      97621dcc85ac16203fa8e0db5845208d2bf5787fc669c45825aec243d4d3ed28
  MD5         125c26fd86793c557cdb0654c0af08a1
  BLAKE2b-256 9f9b9f249c8b9527fab87956f6e70483e447aac0342fb004b3186dae1b0ee07f

kvbm-0.8.1-cp310-abi3-manylinux_2_28_aarch64.whl
  SHA256      35b439f550e99d8374623f63871ae4e208d4687e7e801a7ccba05cdb7dcb4546
  MD5         30c45438e099d46766bb310d25312929
  BLAKE2b-256 558081e9577eca30d651d2d2456e079cc08d2f9c6e4eab068584c9b7b98caae6
