Apohara ContextForge plugin for vLLM V1 — multi-agent KV-cache coordination, JCR Safety Gate (INV-15), RotateKV INT4 hooks, on AMD Instinct MI300X.

apohara-vllm-plugin

Multi-agent KV-cache coordination as a vLLM V1 plugin. Install it alongside vLLM and it registers itself through the vllm.general_plugins entry-point group, with no patching and no fork.

pip install apohara-vllm-plugin

The plugin has four jobs inside vLLM:

  1. Anchor-aware KV-block routing via SimHash LSH lookup against the ContextForge registry (cross-agent block reuse).
  2. RotateKV pre-RoPE INT4 quantization hooks (INVARIANT 10: pre-RoPE only).
  3. JCR Safety Gate (INV-15) enforcement — judge / critic agents with JCR risk > 0.7 are forced into dense prefill, bypassing the shared cache. See arXiv:2601.08343.
  4. Honest metrics — every flag in the hook's return dict reflects state (what actually ran), not intent (what the config asked for).
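The SimHash LSH lookup in item 1 can be illustrated with a small, generic sketch. This is not the ContextForge implementation: the names simhash and SimHashIndex, the 3-token shingles, the 64-bit signature width and the 4-band layout are all assumptions chosen for illustration. With 4 bands, any two signatures that differ in at most 3 bits must agree exactly on at least one band, so a query within that distance always finds its candidate.

```python
import hashlib
from collections import defaultdict
from typing import Optional, Sequence


def simhash(tokens: Sequence[int], bits: int = 64, ngram: int = 3) -> int:
    """Return a `bits`-wide SimHash signature over token n-grams (illustrative)."""
    counts = [0] * bits
    n = max(1, min(ngram, len(tokens)))
    for i in range(len(tokens) - n + 1):
        shingle = ",".join(map(str, tokens[i:i + n])).encode()
        digest = hashlib.blake2b(shingle, digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        # Each shingle votes +1 or -1 on every bit position.
        for b in range(bits):
            counts[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(bits) if counts[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


class SimHashIndex:
    """Hypothetical banded LSH index: signatures sharing any band are candidates."""

    def __init__(self, bits: int = 64, bands: int = 4) -> None:
        self.bands = bands
        self.width = bits // bands
        self.buckets = defaultdict(set)   # (band index, band value) -> block ids
        self.signatures = {}              # block id -> full signature

    def _keys(self, sig: int):
        mask = (1 << self.width) - 1
        return [(i, (sig >> (i * self.width)) & mask) for i in range(self.bands)]

    def add(self, block_id: str, sig: int) -> None:
        self.signatures[block_id] = sig
        for key in self._keys(sig):
            self.buckets[key].add(block_id)

    def query(self, sig: int, max_distance: int = 3) -> Optional[str]:
        """Return the closest registered block within max_distance, else None."""
        candidates = set()
        for key in self._keys(sig):
            candidates |= self.buckets.get(key, set())
        best = min(candidates, key=lambda c: hamming(sig, self.signatures[c]),
                   default=None)
        if best is None or hamming(sig, self.signatures[best]) > max_distance:
            return None
        return best


index = SimHashIndex()
prompt_sig = simhash([101, 7592, 2088, 2003, 1037, 3231, 102])
index.add("block-0", prompt_sig)
print(index.query(prompt_sig))  # the identical signature matches "block-0"
```

Banding trades recall for speed: only signatures that collide in at least one bucket are compared bit by bit, so the lookup avoids scanning the whole registry.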

This is the thin published shim over the in-tree implementation at apohara_context_forge.serving.romy_plugin.

Quick usage

Inside vLLM (automatic)

vLLM walks the vllm.general_plugins entry-point group at worker startup, so no code changes are needed:

pip install vllm apohara-vllm-plugin
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen3-235B-A22B

You should see this line in the vLLM startup log:

ROMY plugin initialised: worker=… deps={…}

Cross-worker KV reuse is not handled by this plugin. It is wired separately and is config-driven via --kv-transfer-config (LMCache). See LMCACHE.md.

Manually (for tests / inspection)

from apohara_vllm_plugin import register

plugin = register()
assert plugin.is_initialized()
print(plugin.get_stats())

The plugin is constructible without vLLM installed.

Wiring real ContextForge dependencies

By default the plugin runs as a no-op telemetry surface (every flag in the metadata dict reports False / None honestly). Inject the real subsystems through vLLMRomyPlugin(...):

from apohara_vllm_plugin import vLLMRomyPlugin, ROMYConfig
from apohara_context_forge.quantization.rotate_kv import (
    RotateKVConfig, RotateKVQuantizer,
)
from apohara_context_forge.dedup.lsh_engine import LSHTokenMatcher
from apohara_context_forge.safety.jcr_gate import JCRSafetyGate
from apohara_context_forge.metrics.collector import MetricsCollector

plugin = vLLMRomyPlugin(
    ROMYConfig(),
    quantizer=RotateKVQuantizer(RotateKVConfig()),
    lsh_matcher=LSHTokenMatcher(),
    jcr_gate=JCRSafetyGate(),
    metrics=MetricsCollector(),
)
plugin.initialize("worker_0", vllm_config={})

pre_attention_hook and post_attention_hook are unit-tested, importable utilities for inspecting reuse and quantization decisions. They are not wired into the vLLM runtime, because no vLLM platform attention-hook API exists to attach them to. The runtime cross-worker KV path is config-driven via --kv-transfer-config (LMCache).

Honest semantics

V6.1+ flags in the pre-attention hook's return dict:

Flag                     True iff
quantization_attempted   enable_quantization=True and a quantizer was wired
quantization_applied     a quantizer was wired and it actually executed without raising
quantized (alias)        same as quantization_applied; kept for back-compat
pre_rope                 always True (INV-10: this hook never operates on post-RoPE tensors)
anchor_match             None if no LSH matcher wired; else lookup descriptor
jcr_dense                JCR Safety Gate fired INV-15 for this call

Returning True when nothing happened is the pattern we're explicitly fixing in V6.1 — see the project root AUDIT.md.
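The distinction between intent and state can be sketched as a small function that builds the flag dict. This is an illustration of the semantics listed above, not the plugin's code: build_flags, its parameters and the zero-argument callables standing in for the quantizer and matcher are hypothetical, and treating "gate fired" as "a risk score was supplied and exceeds 0.7" is an assumption based on the threshold stated in the feature list.

```python
from typing import Any, Callable, Optional

JCR_DENSE_THRESHOLD = 0.7  # from the feature list: JCR risk > 0.7 forces dense prefill


def build_flags(
    enable_quantization: bool,
    quantizer: Optional[Callable[[], None]] = None,   # hypothetical stand-in
    matcher: Optional[Callable[[], Any]] = None,      # hypothetical stand-in
    jcr_risk: Optional[float] = None,                 # None means no gate wired
) -> dict:
    """Build a flag dict that reports what ran, not what was requested."""
    attempted = enable_quantization and quantizer is not None
    applied = False
    if attempted:
        try:
            quantizer()
            applied = True          # set only after the call returns cleanly
        except Exception:
            applied = False         # a failed attempt is reported as not applied
    return {
        "quantization_attempted": attempted,
        "quantization_applied": applied,
        "quantized": applied,       # back-compat alias
        "pre_rope": True,           # INV-10: never post-RoPE
        "anchor_match": matcher() if matcher is not None else None,
        "jcr_dense": jcr_risk is not None and jcr_risk > JCR_DENSE_THRESHOLD,
    }


def failing_quantizer() -> None:
    raise RuntimeError("kernel unavailable")


flags = build_flags(enable_quantization=True, quantizer=failing_quantizer)
print(flags["quantization_attempted"], flags["quantization_applied"])  # True False
```

The key design point is that quantization_applied is assigned after the quantizer returns, so an exception or a missing dependency can never leave it True.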

Citation

If this plugin or the underlying mechanisms help your work, please cite:

@misc{contextforge,
  author    = {Suarez, Pablo M.},
  title     = {{ContextForge: A Unified KV-Cache Coordination Layer
                for Multi-Agent LLM Pipelines on AMD Instinct MI300X}},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.20114594},
  url       = {https://doi.org/10.5281/zenodo.20114594}
}

License

Apache-2.0.
