Out-of-tree vLLM KVConnector for SemBlend semantic KV donor discovery
SemBlend vLLM Connector
vLLM KVConnector for SemBlend-backed semantic KV donor discovery.
This repo is the open-source adapter layer between vLLM and SemBlend.
SemBlend is a semantic KV reuse research library. It exists to evaluate when
similar prompts may safely reuse or blend previously computed KV state. This
connector exposes that work through vLLM's KVConnectorBase_V1
lifecycle.
Status
Experimental.
Default behavior is discovery-only:
- exact vLLM prefix caching remains authoritative;
- semantic lookup runs only after exact prefix coverage is insufficient;
- the connector records donor hits, misses, and rejection reasons;
- it returns (0, False) from get_num_new_matched_tokens() unless a future materialization mode can prove that the KV can be loaded safely;
- normal vLLM execution continues on every provider error or unsupported case.
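The discovery-only contract listed above can be sketched as follows. This is a minimal illustration, not the connector's actual code: `DiscoveryOnlySketch`, `DonorCandidate`, the `lookup` callable, and the counter names are hypothetical, and the method takes a plain token list instead of a vLLM request object. It also assumes that "exact prefix coverage is sufficient" means the exact cache already covers the whole prompt. Only the `(0, False)` return value, the fallback on provider errors, and the `min_prompt_tokens` / `min_similarity` keys come from this README.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class DonorCandidate:
    """Hypothetical result of a semantic lookup."""
    donor_id: str
    similarity: float


@dataclass
class DiscoveryOnlySketch:
    """Hypothetical stand-in for the connector's discovery-only path."""
    lookup: Callable[[Sequence[int]], Optional[DonorCandidate]]
    min_prompt_tokens: int = 256
    min_similarity: float = 0.70
    stats: Counter = field(default_factory=Counter)

    def get_num_new_matched_tokens(self, token_ids, num_computed_tokens):
        # Telemetry only: whatever happens below, report no external tokens.
        try:
            self._discover(token_ids, num_computed_tokens)
        except Exception:
            # A provider failure must never fail inference.
            self.stats["provider_error"] += 1
        return 0, False

    def _discover(self, token_ids, num_computed_tokens):
        if len(token_ids) < self.min_prompt_tokens:
            self.stats["rejected:prompt_too_short"] += 1
            return
        if num_computed_tokens >= len(token_ids):
            # Assumption: exact prefix caching already covers the prompt.
            self.stats["skipped:exact_prefix_sufficient"] += 1
            return
        donor = self.lookup(token_ids)
        if donor is None:
            self.stats["miss"] += 1
        elif donor.similarity < self.min_similarity:
            self.stats["rejected:low_similarity"] += 1
        else:
            self.stats["hit"] += 1
```

Because the return statement sits outside the `try` block, a hit, a miss, a rejection, and a provider exception all produce the same answer to the scheduler; only the counters differ.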
Install
From PyPI:
pip install "semblend-vllm-connector[semblend]"
Development:
pip install -e ".[semblend,dev]"
Run local checks:
make check
vLLM Configuration
Discovery-only mode:
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --enable-prefix-caching \
  --kv-transfer-config '{
    "kv_connector": "SemBlendVllmConnector",
    "kv_connector_module_path": "semblend_vllm_connector.connector",
    "kv_role": "kv_both",
    "kv_load_failure_policy": "recompute",
    "kv_connector_extra_config": {
      "mode": "discovery_only",
      "provider": "local",
      "min_prompt_tokens": 256,
      "min_similarity": 0.70
    }
  }'
SemBlend provider mode:
{
  "kv_connector": "SemBlendVllmConnector",
  "kv_connector_module_path": "semblend_vllm_connector.connector",
  "kv_role": "kv_both",
  "kv_load_failure_policy": "recompute",
  "kv_connector_extra_config": {
    "mode": "discovery_only",
    "provider": "semblend",
    "min_prompt_tokens": 256,
    "min_similarity": 0.70,
    "min_reuse_ratio": 0.50,
    "embedder_type": "minilm",
    "model_id": "meta-llama/Llama-3.1-8B-Instruct"
  }
}
Equivalent JSON examples live in examples/.
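To illustrate how the connector-specific settings might be read out of the transfer config, here is a small sketch. `ExtraConfigSketch` and `parse_extra_config` are hypothetical names, not the API of config.py. The `discovery_only` default follows the Status section; the other defaults simply reuse the example values above, and the 0 to 1 range check on `min_similarity` is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtraConfigSketch:
    """Hypothetical typed view of kv_connector_extra_config."""
    mode: str = "discovery_only"
    provider: str = "local"
    min_prompt_tokens: int = 256
    min_similarity: float = 0.70


def parse_extra_config(kv_transfer_config_json: str) -> ExtraConfigSketch:
    """Parse the JSON passed to --kv-transfer-config into a typed config."""
    raw = json.loads(kv_transfer_config_json)
    extra = raw.get("kv_connector_extra_config", {})
    defaults = ExtraConfigSketch()
    cfg = ExtraConfigSketch(
        mode=str(extra.get("mode", defaults.mode)),
        provider=str(extra.get("provider", defaults.provider)),
        min_prompt_tokens=int(
            extra.get("min_prompt_tokens", defaults.min_prompt_tokens)
        ),
        min_similarity=float(
            extra.get("min_similarity", defaults.min_similarity)
        ),
    )
    # Assumed validation rules; the real connector may differ.
    if cfg.min_prompt_tokens < 0:
        raise ValueError("min_prompt_tokens must be non-negative")
    if not 0.0 <= cfg.min_similarity <= 1.0:
        raise ValueError("min_similarity must be within [0, 1]")
    return cfg
```

Rejecting out-of-range values at startup keeps a typo in the config from silently disabling or over-enabling semantic lookup.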
Modes
| Mode | Positive matched tokens? | Purpose |
|---|---|---|
| discovery_only | No | Safe telemetry and workload qualification. |
| exact_prefix | Only with engine-valid exact block refs | Future safe materialization path. |
| request_only_experimental | Yes, block-aligned prefix only | Isolated validation mode; run with vLLM prefix caching disabled. |
| segmented_experimental | Not enabled in this repo yet | Requires segmented/sparse execution support. |
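The mode table can be read as a small decision function. The sketch below is an interpretation under stated assumptions, not the repository's code: `Mode` and `reportable_tokens` are hypothetical names, and rounding down to a whole number of blocks is assumed for both `exact_prefix` and `request_only_experimental`.

```python
from enum import Enum


class Mode(str, Enum):
    DISCOVERY_ONLY = "discovery_only"
    EXACT_PREFIX = "exact_prefix"
    REQUEST_ONLY_EXPERIMENTAL = "request_only_experimental"
    SEGMENTED_EXPERIMENTAL = "segmented_experimental"


def reportable_tokens(
    mode: Mode,
    candidate_tokens: int,
    block_size: int,
    has_valid_exact_block_refs: bool = False,
) -> int:
    """Return how many matched tokens a mode may report (0 means none)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Only whole KV blocks can be reported, so round down to a block boundary.
    aligned = (candidate_tokens // block_size) * block_size
    if mode is Mode.EXACT_PREFIX and has_valid_exact_block_refs:
        return aligned
    if mode is Mode.REQUEST_ONLY_EXPERIMENTAL:
        return aligned
    # discovery_only and segmented_experimental never report positive tokens.
    return 0
```

For example, with a block size of 16, a 100-token candidate in `request_only_experimental` rounds down to 96 tokens, while the same candidate in `discovery_only` reports 0.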
Safety Rules
The connector must not:
- weaken exact prefix-cache semantics;
- report semantic hits as computed tokens unless KV can actually be loaded;
- publish non-identical semantic donor KV into vLLM's exact prefix cache;
- cross model, tokenizer, adapter, or cache-salt namespaces;
- fail inference because semantic lookup failed.
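The namespace rule above can be expressed as a strict equality check. In this sketch, `CacheNamespace` and `donor_allowed` are hypothetical names and the field set is taken directly from the rule (model, tokenizer, adapter, cache salt); the actual extraction logic in namespace.py may carry more or different fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CacheNamespace:
    """Hypothetical identity under which KV state may be shared."""
    model_id: str
    tokenizer_id: str
    adapter_id: Optional[str] = None
    cache_salt: Optional[str] = None


def donor_allowed(request_ns: CacheNamespace, donor_ns: CacheNamespace) -> bool:
    """A donor is eligible only if every namespace field matches exactly."""
    # Dataclass equality compares all fields, so any mismatch rejects the donor.
    return request_ns == donor_ns
```

Using a frozen dataclass also makes the namespace hashable, so it could serve as a dictionary key when donors are indexed per namespace.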
Repository Layout
src/semblend_vllm_connector/
  connector.py        vLLM KVConnectorBase_V1 implementation
  config.py           config/env parsing
  provider.py         provider protocol + local deterministic provider
  providers/
    semblend.py       lazy SemBlendPipeline adapter
  types.py            shared dataclasses/enums
  namespace.py        vLLM request namespace extraction
docs/
  ARCHITECTURE.md     detailed architecture and rollout plan
  SEMBLEND_PROVIDER.md
  VLLM_CONNECTOR_CONTRACT.md
examples/
  discovery_kv_transfer_config.json
  semblend_discovery_kv_transfer_config.json
Open Source Posture
This project follows the dynamic connector pattern used by mature vLLM KV cache
projects: vLLM loads the connector from a Python module path, connector-specific
settings live in kv_connector_extra_config, and unsafe materialization cases
fail closed to normal vLLM prefill.
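Loading a class from a module path, as the dynamic connector pattern requires, generally comes down to an import followed by an attribute lookup. The sketch below shows that general technique with the standard library only; `load_connector_class` is a hypothetical helper and is not vLLM's actual loader.

```python
import importlib


def load_connector_class(module_path: str, class_name: str) -> type:
    """Import module_path and return the class named class_name from it."""
    module = importlib.import_module(module_path)
    cls = getattr(module, class_name)
    if not isinstance(cls, type):
        raise TypeError(f"{module_path}.{class_name} is not a class")
    return cls


# Demonstrated with a standard-library class so the example runs anywhere.
# With this package installed, the equivalent call would use the values of
# "kv_connector_module_path" and "kv_connector" from the config above.
ordered_dict_cls = load_connector_class("collections", "OrderedDict")
```

Keeping the connector out of tree this way means it can be versioned and released independently of vLLM itself.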
See:
File details
Details for the file semblend_vllm_connector-0.1.0.tar.gz.
File metadata
- Download URL: semblend_vllm_connector-0.1.0.tar.gz
- Upload date:
- Size: 25.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2fbc69d401ab84442fc6c655bfb603e477af825e0bcf92b201aeb055e6030877 |
| MD5 | 13618d67d34d814adee848e356407a70 |
| BLAKE2b-256 | 3e6a7ca343a869a4556c26b460b38087dee0d161df4f25068b1900e609fc3ad9 |
File details
Details for the file semblend_vllm_connector-0.1.0-py3-none-any.whl.
File metadata
- Download URL: semblend_vllm_connector-0.1.0-py3-none-any.whl
- Upload date:
- Size: 20.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3d5950056df8044a1a1fe7ec53570c87033aaa78d1360e8e6c2173eceadeb536 |
| MD5 | 23e1d1984b9abcc98b00522b6ada675e |
| BLAKE2b-256 | 6c28491ef7af4e3699a6e39bb9d1f28548e084e08f26ab9cb6c94b60f9c22af9 |