Dynamo KVBM

The Dynamo KVBM is a distributed KV-cache block management system designed for scalable LLM inference. It cleanly separates memory management from inference runtimes (vLLM, TensorRT-LLM, and SGLang), enabling GPU↔CPU↔Disk/Remote tiering, asynchronous block offload/onboard, and efficient block reuse.

[Figure: block diagram showing a layered architecture view of the Dynamo KV Block Manager.]

Feature Highlights

Distributed KV-Cache Management: Unified GPU↔CPU↔Disk↔Remote tiering for scalable LLM inference.
Async Offload & Reuse: Seamlessly move KV blocks between memory tiers using GDS-accelerated transfers powered by NIXL, without recomputation.
Runtime-Agnostic: Works out-of-the-box with vLLM, TensorRT-LLM, and SGLang via lightweight connectors.
Memory-Safe & Modular: RAII lifecycle and pluggable design for reliability, portability, and backend extensibility.
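To make the tiering idea concrete, here is a minimal toy sketch of offload and onboard between two tiers. It is a conceptual illustration only, not KVBM's implementation: the class name `TieredBlockCache`, the LRU eviction policy, and the use of plain dictionaries in place of GPU and pinned CPU memory are all assumptions made for the example.

```python
from collections import OrderedDict


class TieredBlockCache:
    """Toy two-tier block cache: a small "gpu" tier backed by a larger "cpu" tier."""

    def __init__(self, gpu_capacity, cpu_capacity):
        self.gpu_capacity = gpu_capacity
        self.cpu_capacity = cpu_capacity
        self.gpu = OrderedDict()  # block hash -> block data, least recently used first
        self.cpu = OrderedDict()

    def put(self, block_hash, data):
        """Store a block in the gpu tier, offloading the oldest blocks if it is full."""
        self.gpu[block_hash] = data
        self.gpu.move_to_end(block_hash)
        while len(self.gpu) > self.gpu_capacity:
            old_hash, old_data = self.gpu.popitem(last=False)
            self._offload(old_hash, old_data)

    def _offload(self, block_hash, data):
        """Move a block to the cpu tier instead of discarding it."""
        self.cpu[block_hash] = data
        self.cpu.move_to_end(block_hash)
        while len(self.cpu) > self.cpu_capacity:
            # Dropped here: a real system might push to disk or remote storage.
            self.cpu.popitem(last=False)

    def get(self, block_hash):
        """Return (data, tier) on a hit, or None on a miss."""
        if block_hash in self.gpu:
            self.gpu.move_to_end(block_hash)
            return self.gpu[block_hash], "gpu"
        if block_hash in self.cpu:
            data = self.cpu.pop(block_hash)
            self.put(block_hash, data)  # onboard: bring the block back to the gpu tier
            return data, "cpu"
        return None
```

A block found in the lower tier is reused rather than recomputed, which is the benefit the feature list describes; the real system additionally handles asynchronous transfers and further tiers.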

Installation

pip install kvbm

See the support matrix for version compatibility questions.
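Before consulting the support matrix, it can help to confirm which version is actually installed. The following is a small sketch using only the Python standard library; the helper name `installed_version` is hypothetical, and the distribution name `kvbm` is taken from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="kvbm"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"kvbm {found}" if found else "kvbm is not installed in this environment")
```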

Build from Source

The pip wheel is built through a Docker build process:

# Render and build the Docker image with KVBM enabled (from the dynamo repo root)
python container/render.py --framework dynamo --target runtime --output-short-filename
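# Tag the image so the extraction steps below can refer to it as local-kvbm:latest
docker build --build-arg ENABLE_KVBM="true" -t local-kvbm:latest -f container/rendered.Dockerfile .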

Once built, you can either:

Option 1: Run and use the container directly

./container/run.sh --framework none -it

Option 2: Extract the wheel file to your local filesystem

# Create a temporary container from the built image
docker create --name temp-kvbm-container local-kvbm:latest

# Copy the KVBM wheel to your current directory
docker cp temp-kvbm-container:/opt/dynamo/wheelhouse/ ./dynamo_wheelhouse

# Clean up the temporary container
docker rm temp-kvbm-container

# Install the wheel locally
pip install ./dynamo_wheelhouse/kvbm*.whl

Note: the default pip wheel is not currently compatible with CUDA 13.

Integrations

Environment Variables

Variable	Description	Default
`DYN_KVBM_CPU_CACHE_GB`	CPU pinned memory cache size (GB)	required
`DYN_KVBM_DISK_CACHE_GB`	SSD Disk/Storage system cache size (GB)	optional
`DYN_KVBM_DISK_CACHE_DIR`	Disk cache directory	`/tmp/`
`DYN_KVBM_DISK_ZEROFILL_FALLBACK`	Enable zero-fill when `fallocate()` unsupported (e.g., Lustre)	`false`
`DYN_KVBM_DISK_DISABLE_O_DIRECT`	Disable O_DIRECT for disk I/O (debug/compatibility)	`false`
`DYN_KVBM_LEADER_WORKER_INIT_TIMEOUT_SECS`	Timeout (in seconds) for the KVBM leader and worker to synchronize and allocate the required memory and storage. Increase this value if allocating large amounts of memory or storage.	`120`
`DYN_KVBM_METRICS`	Enable metrics endpoint	`false`
`DYN_KVBM_METRICS_PORT`	Metrics port	`6880`
`DYN_KVBM_DISABLE_DISK_OFFLOAD_FILTER`	Disable disk offload filtering to remove SSD lifespan protection	`false`
`DYN_KVBM_HOST_OFFLOAD_PREFIX_MIN_PRIORITY`	Minimum priority (0-100) for CPU offload with contiguous (prefix) semantics: offloading stops at the first block below threshold, and all subsequent blocks are also skipped. Used for priority-based filtering.	`0` (no filtering)
`DYN_KVBM_NCCL_MLA_MODE`	Enable NCCL replicated mode for MLA (Multi-head Latent Attention) models (e.g., DeepSeek). When set to `true`, rank 0 loads KV blocks from G2/G3 storage and broadcasts them to all GPUs via NCCL instead of each GPU loading independently. Requires MPI; the optional `nccl` feature is needed for optimal behavior.	`false`
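The contiguous (prefix) semantics of `DYN_KVBM_HOST_OFFLOAD_PREFIX_MIN_PRIORITY` can be sketched as follows. This is an illustration of the rule as described in the table, not KVBM's code: the function name `select_prefix_for_offload` is hypothetical, and how per-block priorities are assigned is not covered by this page, so the input list is an assumption.

```python
def select_prefix_for_offload(priorities, min_priority=0):
    """Return the indices of blocks that would be offloaded under prefix semantics.

    Blocks are scanned in order. Scanning stops at the first block whose
    priority is below min_priority, and every later block is skipped too,
    even if its own priority would pass the threshold.
    """
    selected = []
    for index, priority in enumerate(priorities):
        if priority < min_priority:
            break
        selected.append(index)
    return selected


# Hypothetical per-block priorities in sequence order.
print(select_prefix_for_offload([80, 60, 30, 90], min_priority=50))  # [0, 1]
print(select_prefix_for_offload([80, 60, 30, 90], min_priority=0))   # [0, 1, 2, 3]
```

Note that the fourth block (priority 90) is skipped in the first call because it follows a block below the threshold; with the default of `0`, nothing is filtered.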

Disk Storage Configuration

Why special configuration may be needed:

Some filesystems (e.g., Lustre, certain network filesystems) don't support fallocate(), which KVBM uses for fast disk space allocation. Additionally, KVBM uses O_DIRECT I/O for GPUDirect Storage (GDS) performance, which requires strict 4096-byte alignment.

Setup for filesystems without fallocate() support:

export DYN_KVBM_DISK_CACHE_DIR=/mnt/storage/kvbm_cache
export DYN_KVBM_DISK_ZEROFILL_FALLBACK=true  # Enables zero-fill fallback when fallocate() unsupported

What happens:

Without ZEROFILL_FALLBACK=true: Disk cache allocation may fail with "Operation not supported"
With ZEROFILL_FALLBACK=true: KVBM writes zeros using page-aligned buffers compatible with O_DIRECT requirements

Troubleshooting: If you encounter "write all error" or EINVAL (errno 22), try disabling O_DIRECT: export DYN_KVBM_DISK_DISABLE_O_DIRECT=true
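The allocate-then-fall-back behavior described above can be sketched in Python. This is a simplified illustration under stated assumptions, not KVBM's implementation: the names `align_up` and `preallocate` are hypothetical, the sketch uses ordinary buffered writes rather than O_DIRECT, and `os.posix_fallocate` may be emulated by the C library on some filesystems, so it may succeed where a raw fallocate() call would fail.

```python
import os

ALIGNMENT = 4096  # alignment required for O_DIRECT, as stated above


def align_up(size, alignment=ALIGNMENT):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def preallocate(path, size, zerofill_fallback=False):
    """Reserve align_up(size) bytes at path and return the reserved size.

    The file is created or truncated. If fast allocation fails and
    zerofill_fallback is False, the error propagates, mirroring the
    "Operation not supported" failure; otherwise zeros are written in
    ALIGNMENT-sized chunks.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    total = align_up(size)
    with open(path, "wb") as f:
        try:
            if not hasattr(os, "posix_fallocate"):
                raise OSError("posix_fallocate is unavailable on this platform")
            os.posix_fallocate(f.fileno(), 0, total)
        except OSError:
            if not zerofill_fallback:
                raise
            zeros = bytes(ALIGNMENT)
            for _ in range(total // ALIGNMENT):
                f.write(zeros)
    return total
```

For example, `preallocate("/mnt/storage/kvbm_cache/probe.bin", 10000, zerofill_fallback=True)` would reserve 12288 bytes, the next multiple of 4096. Real O_DIRECT I/O also requires the in-memory buffers themselves to be aligned, which this sketch does not attempt.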

vLLM

DYN_KVBM_CPU_CACHE_GB=100 vllm serve \
  --kv-transfer-config '{"kv_connector":"DynamoConnector","kv_role":"kv_both","kv_connector_module_path":"kvbm.vllm_integration.connector"}' \
  Qwen/Qwen3-8B

For more detailed integration with Dynamo, disaggregated serving support, and benchmarking, see vllm-setup.
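When launching from a script, building the `--kv-transfer-config` JSON programmatically avoids shell-quoting mistakes. The sketch below reproduces the command shown above using only the standard library; the helper names `vllm_serve_command` and `kvbm_env` are hypothetical, while the connector values are copied from the example.

```python
import json
import os
import shlex


def vllm_serve_command(model="Qwen/Qwen3-8B"):
    """Build the argument list for the vLLM launch shown above."""
    kv_transfer_config = {
        "kv_connector": "DynamoConnector",
        "kv_role": "kv_both",
        "kv_connector_module_path": "kvbm.vllm_integration.connector",
    }
    return [
        "vllm", "serve",
        "--kv-transfer-config", json.dumps(kv_transfer_config),
        model,
    ]


def kvbm_env(cpu_cache_gb=100):
    """Copy the current environment and set the required CPU cache size."""
    env = dict(os.environ)
    env["DYN_KVBM_CPU_CACHE_GB"] = str(cpu_cache_gb)
    return env


if __name__ == "__main__":
    # Print the equivalent shell command; to launch, pass the list and env
    # to subprocess.run(vllm_serve_command(), env=kvbm_env(), check=True).
    print(shlex.join(vllm_serve_command()))
```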

TensorRT-LLM

cat >/tmp/kvbm_llm_api_config.yaml <<EOF
cuda_graph_config: null
kv_cache_config:
  enable_partial_reuse: false
  free_gpu_memory_fraction: 0.80
kv_connector_config:
  connector_module: kvbm.trtllm_integration.connector
  connector_scheduler_class: DynamoKVBMConnectorLeader
  connector_worker_class: DynamoKVBMConnectorWorker
EOF

DYN_KVBM_CPU_CACHE_GB=100 trtllm-serve Qwen/Qwen3-8B \
  --host localhost --port 8000 \
  --backend pytorch \
  --extra_llm_api_options /tmp/kvbm_llm_api_config.yaml

For more detailed integration with Dynamo and benchmarking, see trtllm-setup.
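Once the server is running, a quick request can confirm it responds. The sketch below assumes the server started above is reachable at `localhost:8000` and exposes the OpenAI-compatible `/v1/completions` route; the helper names and the `max_tokens` value are illustrative choices, not part of KVBM.

```python
import json
import urllib.request


def build_completion_request(prompt, model="Qwen/Qwen3-8B",
                             base_url="http://localhost:8000", max_tokens=32):
    """Build (but do not send) a POST request for a text completion."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    print(complete("Explain KV-cache offloading in one sentence."))
```

Sending the same long prompt twice is one simple way to exercise block reuse, although whether cached blocks are actually reused depends on the configuration.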

📚 Docs

Release history

Version	Release date
1.1.1 (this version)	May 9, 2026
1.1.0	May 4, 2026
1.0.2	Apr 22, 2026
1.0.1	Mar 16, 2026
1.0.0	Mar 13, 2026
0.9.1	Mar 4, 2026
0.9.0	Feb 12, 2026
0.8.1	Jan 23, 2026
0.8.0	Jan 15, 2026
0.7.1	Dec 15, 2025
0.7.0.post1	Dec 6, 2025
0.7.0	Nov 26, 2025

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

kvbm-1.1.1-cp310-abi3-manylinux_2_28_x86_64.whl (11.1 MB)
Uploaded May 9, 2026. CPython 3.10+, manylinux: glibc 2.28+, x86-64

kvbm-1.1.1-cp310-abi3-manylinux_2_28_aarch64.whl (9.7 MB)
Uploaded May 9, 2026. CPython 3.10+, manylinux: glibc 2.28+, ARM64

File details

Details for the file kvbm-1.1.1-cp310-abi3-manylinux_2_28_x86_64.whl.

File metadata

Download URL: kvbm-1.1.1-cp310-abi3-manylinux_2_28_x86_64.whl
Upload date: May 9, 2026
Size: 11.1 MB
Tags: CPython 3.10+, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for kvbm-1.1.1-cp310-abi3-manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`51946d8a197911beeaca185d9e3e4fe95dcc7140cd3f5d33250513f34ca2b297`
MD5	`16d2d08562a46c53fded115971ecf621`
BLAKE2b-256	`00ab0d645197d0b8f3ab3b533c5965bd8b7b067a306b49fe5de58aa311d828de`

File details

Details for the file kvbm-1.1.1-cp310-abi3-manylinux_2_28_aarch64.whl.

File metadata

Download URL: kvbm-1.1.1-cp310-abi3-manylinux_2_28_aarch64.whl
Upload date: May 9, 2026
Size: 9.7 MB
Tags: CPython 3.10+, manylinux: glibc 2.28+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for kvbm-1.1.1-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm	Hash digest
SHA256	`f30bf0ddd6fe4feee2a0c9913058933b63227e4bf09db7c1cc38c09a4e39a9a6`
MD5	`24b7a8323220bb1b57a3acbddb1c31ed`
BLAKE2b-256	`59b0db2aecb3beb4a21440414270f52cd4113918cdf9e5b2f15c2d899f8003cb`
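A downloaded wheel can be checked against the SHA256 digests listed above. The following is a minimal sketch using the standard library; the helper names `sha256_of` and `verify_wheel` are hypothetical, and the digests are copied from the tables on this page.

```python
import hashlib
import os

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "kvbm-1.1.1-cp310-abi3-manylinux_2_28_x86_64.whl":
        "51946d8a197911beeaca185d9e3e4fe95dcc7140cd3f5d33250513f34ca2b297",
    "kvbm-1.1.1-cp310-abi3-manylinux_2_28_aarch64.whl":
        "f30bf0ddd6fe4feee2a0c9913058933b63227e4bf09db7c1cc38c09a4e39a9a6",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path):
    """Return True if the file's digest matches the one listed for its name.

    Raises KeyError if the file name is not one of the wheels listed above.
    """
    expected = EXPECTED_SHA256[os.path.basename(path)]
    return sha256_of(path) == expected
```

For example, `verify_wheel("./kvbm-1.1.1-cp310-abi3-manylinux_2_28_x86_64.whl")` returns `True` only for an unmodified copy of that file.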
