Dynamo KVBM

The Dynamo KVBM is a distributed KV-cache block management system designed for scalable LLM inference. It cleanly separates memory management from inference runtimes (vLLM, TensorRT-LLM, and SGLang), enabling GPU↔CPU↔Disk/Remote tiering, asynchronous block offload/onboard, and efficient block reuse.

Figure: Layered architecture view of the Dynamo KV Block Manager.

Feature Highlights

  • Distributed KV-Cache Management: Unified GPU↔CPU↔Disk↔Remote tiering for scalable LLM inference.
  • Async Offload & Reuse: Seamlessly move KV blocks between memory tiers using GDS-accelerated transfers powered by NIXL, without recomputation.
  • Runtime-Agnostic: Works out-of-the-box with vLLM, TensorRT-LLM, and SGLang via lightweight connectors.
  • Memory-Safe & Modular: RAII lifecycle and pluggable design for reliability, portability, and backend extensibility.

Installation

pip install kvbm

See the support matrix for version compatibility.
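
Before consulting the support matrix, it can help to confirm which kvbm version is actually installed. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is illustrative and not part of KVBM.

```python
from importlib import metadata


def installed_version(dist_name="kvbm"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(f"kvbm version: {version}" if version else "kvbm is not installed")
```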

Build from Source

The pip wheel is built through a Docker build process:

# Render and build the Docker image with KVBM enabled (from the dynamo repo root)
python container/render.py --framework dynamo --target runtime --output-short-filename
docker build --build-arg ENABLE_KVBM="true" -t local-kvbm:latest -f container/rendered.Dockerfile .

Once built, you can either:

Option 1: Run and use the container directly

./container/run.sh --framework none -it

Option 2: Extract the wheel file to your local filesystem

# Create a temporary container from the built image
docker create --name temp-kvbm-container local-kvbm:latest

# Copy the KVBM wheel to your current directory
docker cp temp-kvbm-container:/opt/dynamo/wheelhouse/ ./dynamo_wheelhouse

# Clean up the temporary container
docker rm temp-kvbm-container

# Install the wheel locally
pip install ./dynamo_wheelhouse/kvbm*.whl

Note: the default pip wheel is not currently compatible with CUDA 13.

Integrations

Environment Variables

| Variable | Description | Default |
|---|---|---|
| DYN_KVBM_CPU_CACHE_GB | CPU pinned memory cache size (GB) | required |
| DYN_KVBM_DISK_CACHE_GB | SSD Disk/Storage system cache size (GB) | optional |
| DYN_KVBM_DISK_CACHE_DIR | Disk cache directory | /tmp/ |
| DYN_KVBM_DISK_ZEROFILL_FALLBACK | Enable zero-fill when fallocate() unsupported (e.g., Lustre) | false |
| DYN_KVBM_DISK_DISABLE_O_DIRECT | Disable O_DIRECT for disk I/O (debug/compatibility) | false |
| DYN_KVBM_LEADER_WORKER_INIT_TIMEOUT_SECS | Timeout (in seconds) for the KVBM leader and worker to synchronize and allocate the required memory and storage. Increase this value if allocating large amounts of memory or storage. | 120 |
| DYN_KVBM_METRICS | Enable metrics endpoint | false |
| DYN_KVBM_METRICS_PORT | Metrics port | 6880 |
| DYN_KVBM_DISABLE_DISK_OFFLOAD_FILTER | Disable disk offload filtering to remove SSD lifespan protection | false |
| DYN_KVBM_HOST_OFFLOAD_PREFIX_MIN_PRIORITY | Minimum priority (0-100) for CPU offload with contiguous (prefix) semantics: offloading stops at the first block below threshold, and all subsequent blocks are also skipped. Used for priority-based filtering. | 0 (no filtering) |
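
The prefix semantics of DYN_KVBM_HOST_OFFLOAD_PREFIX_MIN_PRIORITY can be illustrated with a small sketch. This is not KVBM's implementation: the function name `blocks_to_offload` and the representation of a request as a plain list of per-block integer priorities are assumptions made for illustration only.

```python
from itertools import takewhile


def blocks_to_offload(priorities, min_priority=0):
    """Return indices of the leading blocks whose priority meets the threshold.

    Offloading stops at the first block below `min_priority`; every later block
    is skipped as well, even if its own priority would have passed.
    """
    if not 0 <= min_priority <= 100:
        raise ValueError("min_priority must be in the range 0-100")
    prefix = takewhile(lambda item: item[1] >= min_priority, enumerate(priorities))
    return [index for index, _ in prefix]


if __name__ == "__main__":
    # Hypothetical priorities for four consecutive blocks of one sequence.
    print(blocks_to_offload([90, 80, 40, 95], min_priority=50))  # [0, 1]
    print(blocks_to_offload([90, 80, 40, 95]))  # [0, 1, 2, 3]
```

With a threshold of 50, the fourth block is skipped despite its priority of 95, because the third block already broke the contiguous prefix. With the default of 0, nothing is filtered.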

Disk Storage Configuration

Why special configuration may be needed:

Some filesystems (e.g., Lustre, certain network filesystems) don't support fallocate(), which KVBM uses for fast disk space allocation. Additionally, KVBM uses O_DIRECT I/O for GPUDirect Storage (GDS) performance, which requires strict 4096-byte alignment.
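
As a rough illustration of the alignment requirement, the sketch below rounds a size or offset up to the next 4096-byte boundary. The helper names are hypothetical and only show the arithmetic; KVBM performs its own alignment internally.

```python
ALIGNMENT = 4096  # alignment required for O_DIRECT I/O, as stated above


def align_up(value, alignment=ALIGNMENT):
    """Round `value` up to the nearest multiple of `alignment`."""
    return (value + alignment - 1) // alignment * alignment


def is_aligned(value, alignment=ALIGNMENT):
    """Return True if `value` is a multiple of `alignment`."""
    return value % alignment == 0


if __name__ == "__main__":
    for size in (0, 1, 4096, 4097):
        print(size, "->", align_up(size), is_aligned(size))
```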

Setup for filesystems without fallocate() support:

export DYN_KVBM_DISK_CACHE_DIR=/mnt/storage/kvbm_cache
export DYN_KVBM_DISK_ZEROFILL_FALLBACK=true  # Enables zero-fill fallback when fallocate() unsupported

What happens:

  • Without ZEROFILL_FALLBACK=true: Disk cache allocation may fail with "Operation not supported"
  • With ZEROFILL_FALLBACK=true: KVBM writes zeros using page-aligned buffers compatible with O_DIRECT requirements
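
The two outcomes above can be sketched as follows. This is an illustrative approximation, not KVBM's code: `preallocate` and its `fallocate` parameter are hypothetical, the latter being a hook used here to simulate a filesystem that rejects the call. The sketch also writes through the page cache with an ordinary buffer, whereas an O_DIRECT path would additionally need a page-aligned buffer.

```python
import errno
import os
import tempfile

CHUNK = 1 << 20  # write zeros 1 MiB at a time (a multiple of 4096)


def preallocate(path, size, zerofill_fallback=False, fallocate=None):
    """Reserve `size` bytes for `path`; return "fallocate" or "zerofill"."""
    fallocate = fallocate or getattr(os, "posix_fallocate", None)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            if fallocate is None:
                raise OSError(errno.EOPNOTSUPP, "fallocate unavailable")
            fallocate(fd, 0, size)
            return "fallocate"
        except OSError as exc:
            unsupported = exc.errno in (errno.EOPNOTSUPP, errno.ENOSYS)
            if not (unsupported and zerofill_fallback):
                raise  # corresponds to the "Operation not supported" failure
        # Fallback: materialize the file by writing zeros from offset 0.
        zeros = bytes(CHUNK)
        remaining = size
        while remaining > 0:
            remaining -= os.write(fd, zeros[:min(CHUNK, remaining)])
        return "zerofill"
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "kvbm_cache.bin")
        print(preallocate(target, 8 * 4096, zerofill_fallback=True))
        print(os.path.getsize(target))
```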

Troubleshooting: If you encounter "write all error" or EINVAL (errno 22), try disabling O_DIRECT: export DYN_KVBM_DISK_DISABLE_O_DIRECT=true

vLLM

DYN_KVBM_CPU_CACHE_GB=100 vllm serve \
  --kv-transfer-config '{"kv_connector":"DynamoConnector","kv_role":"kv_both","kv_connector_module_path":"kvbm.vllm_integration.connector"}' \
  Qwen/Qwen3-8B

For more detailed integration with Dynamo, disaggregated serving support, and benchmarking, see vllm-setup.
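
When launching vLLM from a script, building the --kv-transfer-config value programmatically avoids shell-quoting mistakes in the inline JSON. The sketch below uses only the keys and values shown in the command above; the helper names `kv_transfer_config` and `vllm_command` are illustrative.

```python
import json
import shlex


def kv_transfer_config():
    """Serialize the connector settings from the vLLM command above as compact JSON."""
    return json.dumps(
        {
            "kv_connector": "DynamoConnector",
            "kv_role": "kv_both",
            "kv_connector_module_path": "kvbm.vllm_integration.connector",
        },
        separators=(",", ":"),
    )


def vllm_command(model="Qwen/Qwen3-8B"):
    """Return a shell-safe command line equivalent to the example above."""
    argv = ["vllm", "serve", "--kv-transfer-config", kv_transfer_config(), model]
    return shlex.join(argv)


if __name__ == "__main__":
    # DYN_KVBM_CPU_CACHE_GB still has to be set in the environment before running.
    print(vllm_command())
```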

TensorRT-LLM

cat >/tmp/kvbm_llm_api_config.yaml <<EOF
cuda_graph_config: null
kv_cache_config:
  enable_partial_reuse: false
  free_gpu_memory_fraction: 0.80
kv_connector_config:
  connector_module: kvbm.trtllm_integration.connector
  connector_scheduler_class: DynamoKVBMConnectorLeader
  connector_worker_class: DynamoKVBMConnectorWorker
EOF

DYN_KVBM_CPU_CACHE_GB=100 trtllm-serve Qwen/Qwen3-8B \
  --host localhost --port 8000 \
  --backend pytorch \
  --extra_llm_api_options /tmp/kvbm_llm_api_config.yaml

For more detailed integration with Dynamo and benchmarking, see trtllm-setup.

📚 Docs
