Apohara ContextForge plugin for vLLM V1 — multi-agent KV-cache coordination, JCR Safety Gate (INV-15), RotateKV INT4 hooks, on AMD Instinct MI300X.
apohara-vllm-plugin
Multi-agent KV-cache coordination as a vLLM V1
plugin. Drop it next to vLLM and it self-registers through the
vllm.general_plugins entry-point group: no patching, no fork.
```bash
pip install apohara-vllm-plugin
```
The plugin's job inside vLLM is:
- Anchor-aware KV-block routing via SimHash LSH lookup against the ContextForge registry (cross-agent block reuse).
- RotateKV pre-RoPE INT4 quantization hooks (INVARIANT 10: pre-RoPE only).
- JCR Safety Gate (INV-15) enforcement: judge / critic agents with JCR risk > 0.7 are forced into dense prefill, bypassing the shared cache. See arXiv:2601.08343.
- Honest metrics: every flag in the hook's return dict reflects state (what actually ran), not intent (what the config asked for).
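The internals of the ContextForge registry and `LSHTokenMatcher` are not documented here. As orientation for the first bullet, the sketch below shows generic SimHash LSH over token IDs: each block's tokens are shingled, hashed, and folded into a 64-bit fingerprint; fingerprints are split into bands so that a lookup only compares candidates sharing at least one band, ranked by Hamming distance. The names `simhash` and `SimHashIndex`, the shingle size, band count, and Hamming threshold are all hypothetical choices, not the plugin's API.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def _hash64(shingle: tuple[int, ...]) -> int:
    # Deterministic 64-bit hash of a token shingle (BLAKE2b, 8-byte digest).
    digest = hashlib.blake2b(repr(shingle).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(tokens: list[int], shingle: int = 4) -> int:
    """64-bit SimHash fingerprint of a token sequence."""
    counts = [0] * 64
    for i in range(max(1, len(tokens) - shingle + 1)):
        h = _hash64(tuple(tokens[i:i + shingle]))
        for b in range(64):
            counts[b] += 1 if (h >> b) & 1 else -1
    # Bit b of the fingerprint is set when most shingles voted for it.
    return sum(1 << b for b in range(64) if counts[b] > 0)


class SimHashIndex:
    """Hypothetical banded LSH index mapping block IDs to fingerprints."""

    def __init__(self, bands: int = 4, max_hamming: int = 8) -> None:
        assert 64 % bands == 0
        self.bands = bands
        self.width = 64 // bands
        self.max_hamming = max_hamming
        self.buckets: defaultdict[tuple[int, int], set[str]] = defaultdict(set)
        self.fingerprints: dict[str, int] = {}

    def _band_keys(self, fp: int) -> list[tuple[int, int]]:
        mask = (1 << self.width) - 1
        return [(i, (fp >> (i * self.width)) & mask) for i in range(self.bands)]

    def add(self, block_id: str, tokens: list[int]) -> None:
        fp = simhash(tokens)
        self.fingerprints[block_id] = fp
        for key in self._band_keys(fp):
            self.buckets[key].add(block_id)

    def lookup(self, tokens: list[int]) -> tuple[str, int] | None:
        """Return (block_id, hamming_distance) of the closest match, or None."""
        fp = simhash(tokens)
        candidates: set[str] = set()
        for key in self._band_keys(fp):
            candidates |= self.buckets.get(key, set())
        best = None
        for block_id in candidates:
            dist = bin(fp ^ self.fingerprints[block_id]).count("1")
            if dist <= self.max_hamming and (best is None or dist < best[1]):
                best = (block_id, dist)
        return best
```

An identical token sequence always maps back to its own block at distance 0; similar sequences tend to produce fingerprints with a small Hamming distance, which is what makes cross-agent block reuse possible without exact prefix matches.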
This is the thin published shim over the in-tree implementation at
apohara_context_forge.serving.romy_plugin.
Quick usage
Inside vLLM (automatic)
vLLM walks vllm.general_plugins at worker startup. No code change:
```bash
pip install vllm apohara-vllm-plugin
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen3-235B-A22B
```
You should see in the vLLM startup log:
```text
ROMY plugin initialised: worker=… deps={…}
```
Cross-worker KV reuse is wired separately, config-driven via
--kv-transfer-config (LMCache) — not by this plugin. See
LMCACHE.md.
Manually (for tests / inspection)
```python
from apohara_vllm_plugin import register

plugin = register()
assert plugin.is_initialized()
print(plugin.get_stats())
```
The plugin is constructible without vLLM installed.
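How apohara_vllm_plugin avoids a hard vLLM dependency is not shown in this README. One common pattern is to probe for an optional package before importing it; the sketch below illustrates that pattern with a hypothetical `optional_import` helper and makes no claim about the plugin's actual code.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:  # raised when a parent package is missing
        return None
    return importlib.import_module(name) if spec is not None else None
```

A caller would then write `vllm = optional_import("vllm")` and skip runtime-only wiring when the result is `None`, keeping construction and unit tests independent of vLLM.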
Wiring real ContextForge dependencies
By default the plugin runs as a no-op telemetry surface (every flag in
the metadata dict reports False / None honestly). Inject the real
subsystems through vLLMRomyPlugin(...):
```python
from apohara_vllm_plugin import vLLMRomyPlugin, ROMYConfig
from apohara_context_forge.quantization.rotate_kv import (
    RotateKVConfig, RotateKVQuantizer,
)
from apohara_context_forge.dedup.lsh_engine import LSHTokenMatcher
from apohara_context_forge.safety.jcr_gate import JCRSafetyGate
from apohara_context_forge.metrics.collector import MetricsCollector

plugin = vLLMRomyPlugin(
    ROMYConfig(),
    quantizer=RotateKVQuantizer(RotateKVConfig()),
    lsh_matcher=LSHTokenMatcher(),
    jcr_gate=JCRSafetyGate(),
    metrics=MetricsCollector(),
)
plugin.initialize("worker_0", vllm_config={})
```
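`RotateKVQuantizer`'s algorithm is not described in this README. For intuition about "pre-RoPE INT4 with a rotation", the sketch below applies a generic orthonormal Walsh-Hadamard rotation to a key vector (which spreads outlier channels across all dimensions) and then does symmetric per-vector INT4 quantization. This is a swapped-in textbook technique, not RotateKV's actual method; every function name is hypothetical, and RoPE itself is deliberately not applied, matching the pre-RoPE-only rule (INVARIANT 10).

```python
import math
import random


def fwht(vec: list[float]) -> list[float]:
    """Orthonormal fast Walsh-Hadamard transform (its own inverse)."""
    n = len(vec)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    out = list(vec)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    scale = 1 / math.sqrt(n)  # normalisation makes H orthonormal, so H @ H = I
    return [x * scale for x in out]


def quantize_int4(vec: list[float]) -> tuple[list[int], float]:
    """Symmetric INT4: integers in [-8, 7] plus one float scale per vector."""
    amax = max(abs(x) for x in vec)
    scale = amax / 7 if amax > 0 else 1.0
    return [max(-8, min(7, round(x / scale))) for x in vec], scale


def rotate_quantize_pre_rope(key: list[float]) -> tuple[list[int], float]:
    """Rotate a key vector that has NOT had RoPE applied, then quantize it."""
    return quantize_int4(fwht(key))


def restore(q: list[int], scale: float) -> list[float]:
    """Dequantize, then undo the rotation (fwht is self-inverse)."""
    return fwht([v * scale for v in q])


random.seed(0)
key = [random.gauss(0.0, 1.0) for _ in range(64)]
key[5] = 25.0  # a single outlier channel
q, scale = rotate_quantize_pre_rope(key)
approx = restore(q, scale)
```

Because the rotation is orthonormal, the reconstruction error in the original space has the same L2 norm as the rounding error in the rotated space, which is bounded by `sqrt(n) * scale / 2`.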
pre_attention_hook / post_attention_hook are unit-tested, importable
utilities for inspecting reuse and quantization decisions; they are NOT
wired into the vLLM runtime (vLLM exposes no platform attention-hook API
for them to attach to). The runtime cross-worker KV path is config-driven
via --kv-transfer-config (LMCache).
Honest semantics
V6.1+ flags in the pre-attention hook's return dict:
| Flag | Meaning |
|---|---|
| quantization_attempted | True iff enable_quantization=True and a quantizer was wired |
| quantization_applied | True iff a quantizer was wired and it actually executed without raising |
| quantized (alias) | Same as quantization_applied; kept for back-compat |
| pre_rope | Always True (INV-10: this hook never operates on post-RoPE tensors) |
| anchor_match | None if no LSH matcher is wired; otherwise the lookup descriptor |
| jcr_dense | True iff the JCR Safety Gate fired INV-15 for this call |
Returning True when nothing happened is the pattern V6.1 explicitly
fixes; see AUDIT.md at the project root.
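The table's rules can be made concrete with a small sketch of how such a metadata dict might be assembled. `build_hook_metadata` and its parameters (`lsh_lookup`, `agent_role`, `jcr_risk`) are hypothetical stand-ins, not the plugin's hook signature; only the flag names, the judge/critic roles, and the `> 0.7` threshold come from this README. The sketch does not model how the JCR gate interacts with the anchor lookup.

```python
from typing import Any, Callable, Optional

JCR_RISK_THRESHOLD = 0.7               # README: JCR risk > 0.7 forces dense prefill
JCR_GATED_ROLES = {"judge", "critic"}  # roles named in the README


def build_hook_metadata(
    kv: Any,
    *,
    enable_quantization: bool,
    quantizer: Optional[Callable[[Any], Any]] = None,
    lsh_lookup: Optional[Callable[[Any], Any]] = None,
    agent_role: str = "worker",
    jcr_risk: float = 0.0,
) -> dict:
    # "Attempted" needs both the config request AND a wired quantizer.
    attempted = enable_quantization and quantizer is not None
    applied = False
    if attempted:
        try:
            quantizer(kv)
            applied = True   # set only after the quantizer returned normally
        except Exception:
            applied = False  # attempted, but nothing was applied
    return {
        "quantization_attempted": attempted,
        "quantization_applied": applied,
        "quantized": applied,  # back-compat alias
        "pre_rope": True,      # INV-10: never called on post-RoPE tensors
        "anchor_match": lsh_lookup(kv) if lsh_lookup is not None else None,
        "jcr_dense": agent_role in JCR_GATED_ROLES and jcr_risk > JCR_RISK_THRESHOLD,
    }
```

Note the strict inequality: a judge at exactly 0.7 risk keeps using the shared cache, while 0.71 is forced into dense prefill.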
Citation
If this plugin or the underlying mechanisms help your work, please cite:
```bibtex
@misc{contextforge,
  author    = {Suarez, Pablo M.},
  title     = {{ContextForge: A Unified KV-Cache Coordination Layer
               for Multi-Agent LLM Pipelines on AMD Instinct MI300X}},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.20114594},
  url       = {https://doi.org/10.5281/zenodo.20114594}
}
```
License
Apache-2.0.
Download files
File details
Details for the file apohara_vllm_plugin-0.1.0.tar.gz.
File metadata
- Download URL: apohara_vllm_plugin-0.1.0.tar.gz
- Upload date:
- Size: 10.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aaa7502faeb7667e5444e4ff8dfcf497feb30a14ed6189cce57a5f0a55a587c5 |
| MD5 | 24e1be15a8b9f54b616cc0c8dae28645 |
| BLAKE2b-256 | 15c67c6bb8e1d50ed48af224d337d389850b78e34b25c5379568f74f841611ea |
Provenance
The following attestation bundles were made for apohara_vllm_plugin-0.1.0.tar.gz:
Publisher: release-plugin.yml on SuarezPM/Apohara_Context_Forge
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: apohara_vllm_plugin-0.1.0.tar.gz
- Subject digest: aaa7502faeb7667e5444e4ff8dfcf497feb30a14ed6189cce57a5f0a55a587c5
- Sigstore transparency entry: 1706517836
- Sigstore integration time:
- Permalink: SuarezPM/Apohara_Context_Forge@4134f827307d551d54360c233feee844114e323f
- Branch / Tag: refs/tags/vllm-plugin-v0.1.0
- Owner: https://github.com/SuarezPM
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-plugin.yml@4134f827307d551d54360c233feee844114e323f
- Trigger Event: push
File details
Details for the file apohara_vllm_plugin-0.1.0-py3-none-any.whl.
File metadata
- Download URL: apohara_vllm_plugin-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 134cca68e4feb5ced7c511aaacbe6a979ff26ac874cc9f008df00c169b9830b9 |
| MD5 | 12ec5ecc8a8618e63ff701253ce3354e |
| BLAKE2b-256 | c2cff455857ae2c60776c7a3359483847cffef05eaba7a376027718a157a83f7 |
Provenance
The following attestation bundles were made for apohara_vllm_plugin-0.1.0-py3-none-any.whl:
Publisher: release-plugin.yml on SuarezPM/Apohara_Context_Forge
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: apohara_vllm_plugin-0.1.0-py3-none-any.whl
- Subject digest: 134cca68e4feb5ced7c511aaacbe6a979ff26ac874cc9f008df00c169b9830b9
- Sigstore transparency entry: 1706518047
- Sigstore integration time:
- Permalink: SuarezPM/Apohara_Context_Forge@4134f827307d551d54360c233feee844114e323f
- Branch / Tag: refs/tags/vllm-plugin-v0.1.0
- Owner: https://github.com/SuarezPM
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-plugin.yml@4134f827307d551d54360c233feee844114e323f
- Trigger Event: push