Out-of-tree vLLM KVConnector for SemBlend semantic KV donor discovery

SemBlend vLLM Connector

vLLM KVConnector for SemBlend-backed semantic KV donor discovery.

This repo is the open-source adapter layer between vLLM and SemBlend.

SemBlend is a semantic KV reuse research library. It exists to evaluate when similar prompts may safely reuse or blend previously computed KV state. This connector exposes that work through vLLM's KVConnectorBase_V1 lifecycle.

Status

Experimental.

Default behavior is discovery-only:

  • exact vLLM prefix caching remains authoritative;
  • semantic lookup runs only after exact prefix coverage is insufficient;
  • the connector records donor hits, misses, and rejection reasons;
  • it returns (0, False) from get_num_new_matched_tokens() unless a future materialization mode can prove that the KV can be loaded safely;
  • normal vLLM execution continues on every provider error or unsupported case.
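The discovery-only flow above can be sketched as a small decision function. This is an illustration under stated assumptions, not the connector's code: `Telemetry`, `discovery_only_matched_tokens`, the `lookup` callable, and the rejection-reason strings are hypothetical names. Only the `(0, False)` return value, the configured thresholds, and the fall-back-on-error behavior come from this README.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Telemetry:
    """Hypothetical counters for donor hits, misses, and rejection reasons."""
    hits: int = 0
    misses: int = 0
    rejections: dict = field(default_factory=dict)

    def reject(self, reason: str) -> None:
        self.rejections[reason] = self.rejections.get(reason, 0) + 1


def discovery_only_matched_tokens(
    prompt_len: int,
    exact_prefix_sufficient: bool,
    lookup: Callable[[], Optional[float]],
    telemetry: Telemetry,
    min_prompt_tokens: int = 256,
    min_similarity: float = 0.70,
) -> tuple:
    """Record what a semantic lookup finds, but never claim loadable tokens."""
    no_external_tokens = (0, False)
    if prompt_len < min_prompt_tokens:
        telemetry.reject("prompt_too_short")
        return no_external_tokens
    if exact_prefix_sufficient:
        # Exact prefix caching stays authoritative; skip the semantic lookup.
        return no_external_tokens
    try:
        similarity = lookup()  # assumed: best donor similarity, or None
    except Exception:
        # Provider errors never fail inference; vLLM proceeds normally.
        telemetry.reject("provider_error")
        return no_external_tokens
    if similarity is None:
        telemetry.misses += 1
    elif similarity < min_similarity:
        telemetry.reject("below_min_similarity")
    else:
        telemetry.hits += 1
    return no_external_tokens
```

Every branch returns the same tuple on purpose: in this mode the only observable effect of a lookup is the telemetry it leaves behind.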

Install

From PyPI:

pip install "semblend-vllm-connector[semblend]"

Development:

pip install -e ".[semblend,dev]"

Run local checks:

make check

vLLM Configuration

Discovery-only mode:

vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --enable-prefix-caching \
  --kv-transfer-config '{
    "kv_connector": "SemBlendVllmConnector",
    "kv_connector_module_path": "semblend_vllm_connector.connector",
    "kv_role": "kv_both",
    "kv_load_failure_policy": "recompute",
    "kv_connector_extra_config": {
      "mode": "discovery_only",
      "provider": "local",
      "min_prompt_tokens": 256,
      "min_similarity": 0.70
    }
  }'

SemBlend provider mode (the JSON value passed to --kv-transfer-config):

{
  "kv_connector": "SemBlendVllmConnector",
  "kv_connector_module_path": "semblend_vllm_connector.connector",
  "kv_role": "kv_both",
  "kv_load_failure_policy": "recompute",
  "kv_connector_extra_config": {
    "mode": "discovery_only",
    "provider": "semblend",
    "min_prompt_tokens": 256,
    "min_similarity": 0.70,
    "min_reuse_ratio": 0.50,
    "embedder_type": "minilm",
    "model_id": "meta-llama/Llama-3.1-8B-Instruct"
  }
}

Equivalent JSON examples live in examples/.
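When the configuration is generated by a script, hand-quoting JSON in a shell is easy to get wrong. The sketch below assembles the same payload with the standard library and renders a shell-safe command line. The helper names `build_kv_transfer_config` and `serve_command` are hypothetical and not part of this package; the keys and values are copied from the examples above.

```python
import json
import shlex


def build_kv_transfer_config(provider: str = "local", **extra) -> dict:
    """Assemble the --kv-transfer-config payload used in the examples above."""
    extra_config = {
        "mode": "discovery_only",
        "provider": provider,
        "min_prompt_tokens": 256,
        "min_similarity": 0.70,
    }
    extra_config.update(extra)  # e.g. min_reuse_ratio, embedder_type, model_id
    return {
        "kv_connector": "SemBlendVllmConnector",
        "kv_connector_module_path": "semblend_vllm_connector.connector",
        "kv_role": "kv_both",
        "kv_load_failure_policy": "recompute",
        "kv_connector_extra_config": extra_config,
    }


def serve_command(model: str, config: dict) -> str:
    """Render a `vllm serve` command line with the JSON safely quoted."""
    args = [
        "vllm", "serve", model,
        "--enable-prefix-caching",
        "--kv-transfer-config", json.dumps(config),
    ]
    return shlex.join(args)


if __name__ == "__main__":
    model = "meta-llama/Llama-3.1-8B-Instruct"
    cfg = build_kv_transfer_config(
        provider="semblend",
        min_reuse_ratio=0.50,
        embedder_type="minilm",
        model_id=model,
    )
    print(serve_command(model, cfg))
```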

Modes

Mode                       Positive matched tokens?                 Purpose
discovery_only             No                                       Safe telemetry and workload qualification.
exact_prefix               Only with engine-valid exact block refs  Future safe materialization path.
request_only_experimental  Yes, block-aligned prefix only           Isolated validation mode; run with vLLM prefix caching disabled.
segmented_experimental     Not enabled in this repo yet             Requires segmented/sparse execution support.
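One way to read the table is as a gate that each mode applies before reporting any positive matched-token count. The following is a minimal sketch under that reading. The `Mode` enum, `may_report_matched_tokens`, and `block_aligned_prefix_len` are hypothetical names, and the real types in types.py may look different.

```python
from enum import Enum


class Mode(str, Enum):
    DISCOVERY_ONLY = "discovery_only"
    EXACT_PREFIX = "exact_prefix"
    REQUEST_ONLY_EXPERIMENTAL = "request_only_experimental"
    SEGMENTED_EXPERIMENTAL = "segmented_experimental"


def block_aligned_prefix_len(num_tokens: int, block_size: int) -> int:
    """Round a candidate prefix length down to a whole number of KV blocks."""
    return (num_tokens // block_size) * block_size


def may_report_matched_tokens(
    mode: Mode,
    *,
    has_valid_exact_block_refs: bool = False,
    aligned_prefix_len: int = 0,
) -> bool:
    """Return True only when the table allows a positive matched-token count."""
    if mode is Mode.DISCOVERY_ONLY:
        return False
    if mode is Mode.EXACT_PREFIX:
        return has_valid_exact_block_refs
    if mode is Mode.REQUEST_ONLY_EXPERIMENTAL:
        return aligned_prefix_len > 0
    # segmented_experimental is not enabled, so it never reports tokens.
    return False
```

For example, with an assumed block size of 16, a 100-token candidate prefix aligns down to 96 tokens, and only that aligned length would be eligible in request_only_experimental.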

Safety Rules

The connector must not:

  • weaken exact prefix-cache semantics;
  • report semantic hits as computed tokens unless KV can actually be loaded;
  • publish non-identical semantic donor KV into vLLM's exact prefix cache;
  • cross model, tokenizer, adapter, or cache-salt namespaces;
  • fail inference because semantic lookup failed.
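The namespace rule above lends itself to a short illustration. The sketch assumes a donor is eligible only when its model, tokenizer, adapter, and cache salt all equal the request's. `KVNamespace` and `eligible_donors` are hypothetical names, and the actual extraction logic in namespace.py may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class KVNamespace:
    """Hypothetical identity under which KV state may be shared."""
    model: str
    tokenizer: str
    adapter: Optional[str] = None
    cache_salt: Optional[str] = None


def eligible_donors(
    request_ns: KVNamespace,
    candidates: Iterable[Tuple[str, KVNamespace]],
) -> List[str]:
    """Keep only donor IDs whose namespace equals the request's in every field."""
    return [donor_id for donor_id, ns in candidates if ns == request_ns]
```

Using a frozen dataclass makes the comparison field-by-field and keeps the namespace hashable, so it could also serve as a dictionary key for per-namespace donor indexes.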

Repository Layout

src/semblend_vllm_connector/
  connector.py        vLLM KVConnectorBase_V1 implementation
  config.py           config/env parsing
  provider.py         provider protocol + local deterministic provider
  providers/
    semblend.py       lazy SemBlendPipeline adapter
  types.py            shared dataclasses/enums
  namespace.py        vLLM request namespace extraction

docs/
  ARCHITECTURE.md     detailed architecture and rollout plan
  SEMBLEND_PROVIDER.md
  VLLM_CONNECTOR_CONTRACT.md

examples/
  discovery_kv_transfer_config.json
  semblend_discovery_kv_transfer_config.json

Open Source Posture

This project follows the dynamic connector pattern used by mature vLLM KV cache projects: vLLM loads the connector from a Python module path, connector-specific settings live in kv_connector_extra_config, and unsafe materialization cases fail closed to normal vLLM prefill.
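The dynamic connector pattern amounts to resolving a class from a module path and a class name at runtime. A minimal sketch of that idea, using only the standard library, follows. `load_class` is a hypothetical helper written for illustration and is not vLLM's actual loader.

```python
import importlib


def load_class(module_path: str, class_name: str) -> type:
    """Import `module_path` and return the class named `class_name` from it."""
    module = importlib.import_module(module_path)
    cls = getattr(module, class_name)
    if not isinstance(cls, type):
        raise TypeError(f"{module_path}.{class_name} is not a class")
    return cls


if __name__ == "__main__":
    # Demonstrated with a standard-library class so the sketch runs anywhere.
    # With this package installed, the analogous call would use
    # "semblend_vllm_connector.connector" and "SemBlendVllmConnector".
    ordered_dict_cls = load_class("collections", "OrderedDict")
    print(ordered_dict_cls.__name__)
```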
