Apohara ContextForge plugin for vLLM V1 — multi-agent KV-cache coordination, JCR Safety Gate (INV-15), RotateKV INT4 hooks, on AMD Instinct MI300X.

apohara-vllm-plugin

Multi-agent KV-cache coordination as a vLLM V1 plugin. Drop it next to vLLM and it self-registers through the vllm.general_plugins entry-point group: no patching, no fork.

pip install apohara-vllm-plugin

Inside vLLM, the plugin provides:

- Anchor-aware KV-block routing via SimHash LSH lookup against the ContextForge registry (cross-agent block reuse).
- RotateKV pre-RoPE INT4 quantization hooks (INVARIANT 10: pre-RoPE only).
- JCR Safety Gate (INV-15) enforcement: judge / critic agents with JCR risk > 0.7 are forced into dense prefill, bypassing the shared cache. See arXiv:2601.08343.
- Honest metrics: every flag in the hook's return dict reflects state (what actually ran), not intent (what the config asked for).
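
To make the SimHash LSH lookup in the first item concrete, here is a minimal sketch of the general technique using only the standard library. It is not the ContextForge implementation: `simhash`, `band_keys`, `ToyRegistry`, the 64-bit fingerprint width, the 4-band split, and the `max_distance` cutoff are all assumptions chosen for illustration.

```python
import hashlib


def simhash(tokens, bits=64):
    """Compute a SimHash fingerprint over a token sequence (illustrative only)."""
    counts = [0] * bits
    for tok in tokens:
        # Hash each token to a stable 64-bit integer.
        digest = hashlib.blake2b(str(tok).encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # Bit i is set when more tokens voted 1 than 0 at that position.
    return sum(1 << i for i, c in enumerate(counts) if c > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def band_keys(fingerprint, bands=4, bits=64):
    """Split a fingerprint into bands; sharing any band makes two items LSH candidates."""
    width = bits // bands
    mask = (1 << width) - 1
    return [(b, (fingerprint >> (b * width)) & mask) for b in range(bands)]


class ToyRegistry:
    """Hypothetical stand-in for a block registry keyed by SimHash bands."""

    def __init__(self):
        self._buckets = {}

    def add(self, block_id, tokens):
        fp = simhash(tokens)
        for key in band_keys(fp):
            self._buckets.setdefault(key, []).append((block_id, fp))

    def lookup(self, tokens, max_distance=8):
        """Return (block_id, hamming_distance) of the closest candidate, or None."""
        fp = simhash(tokens)
        best = None
        for key in band_keys(fp):
            for block_id, other in self._buckets.get(key, []):
                d = hamming(fp, other)
                if d <= max_distance and (best is None or d < best[1]):
                    best = (block_id, d)
        return best
```

The banding step is what makes the lookup sub-linear: only blocks that share at least one band with the query are compared by Hamming distance, so near-duplicate prefixes from different agents can be found without scanning the whole registry.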

This is the thin published shim over the in-tree implementation at apohara_context_forge.serving.romy_plugin.

Quick usage

Inside vLLM (automatic)

vLLM loads the vllm.general_plugins entry-point group at worker startup, so no code change is needed:

pip install vllm apohara-vllm-plugin
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen3-235B-A22B

You should see in the vLLM startup log:

ROMY plugin initialised: worker=… deps={…}

Cross-worker KV reuse is wired separately, config-driven via --kv-transfer-config (LMCache) — not by this plugin. See LMCACHE.md.

Manually (for tests / inspection)

from apohara_vllm_plugin import register

plugin = register()
assert plugin.is_initialized()
print(plugin.get_stats())

The plugin is constructible without vLLM installed.

Wiring real ContextForge dependencies

By default the plugin runs as a no-op telemetry surface (every flag in the metadata dict reports False / None honestly). Inject the real subsystems through vLLMRomyPlugin(...):

from apohara_vllm_plugin import vLLMRomyPlugin, ROMYConfig
from apohara_context_forge.quantization.rotate_kv import (
    RotateKVConfig, RotateKVQuantizer,
)
from apohara_context_forge.dedup.lsh_engine import LSHTokenMatcher
from apohara_context_forge.safety.jcr_gate import JCRSafetyGate
from apohara_context_forge.metrics.collector import MetricsCollector

plugin = vLLMRomyPlugin(
    ROMYConfig(),
    quantizer=RotateKVQuantizer(RotateKVConfig()),
    lsh_matcher=LSHTokenMatcher(),
    jcr_gate=JCRSafetyGate(),
    metrics=MetricsCollector(),
)
plugin.initialize("worker_0", vllm_config={})

pre_attention_hook and post_attention_hook are unit-tested, importable utilities for inspecting reuse and quantization decisions. They are NOT wired into the vLLM runtime, because no vLLM platform attention-hook API exists. The runtime cross-worker KV path is config-driven via --kv-transfer-config (LMCache).

Honest semantics

V6.1+ flags in the pre-attention hook's return dict:

Flag	Value / condition
`quantization_attempted`	`enable_quantization=True` and a quantizer was wired
`quantization_applied`	a quantizer was wired and it actually executed without raising
`quantized` (alias)	same as `quantization_applied` — kept for back-compat
`pre_rope`	always `True` — INV-10: this hook never operates on post-RoPE tensors
`anchor_match`	`None` if no LSH matcher wired; else lookup descriptor
`jcr_dense`	`True` iff JCR Safety Gate fired INV-15 for this call

Returning True when nothing happened is the pattern we're explicitly fixing in V6.1 — see the project root AUDIT.md.
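
The "state, not intent" rule in the table can be sketched as follows. This is an illustration of the semantics only, not the plugin's code: `build_hook_metadata`, its parameters, and the way the JCR risk score is passed in are assumptions; the 0.7 threshold comes from the JCR Safety Gate description above.

```python
from typing import Any, Callable, Optional

JCR_RISK_THRESHOLD = 0.7  # risk strictly above this forces dense prefill (INV-15)


def build_hook_metadata(
    enable_quantization: bool,
    quantizer: Optional[Callable[[Any], Any]] = None,
    lsh_lookup: Optional[Callable[[Any], Any]] = None,
    jcr_risk: Optional[float] = None,
    tensor: Any = None,
) -> dict:
    """Hypothetical helper: report what actually ran, never what was requested."""
    # "Attempted" needs both the config switch and a wired quantizer.
    attempted = enable_quantization and quantizer is not None

    applied = False
    if attempted:
        try:
            quantizer(tensor)
            applied = True  # set only after the call returned without raising
        except Exception:
            applied = False

    return {
        "quantization_attempted": attempted,
        "quantization_applied": applied,
        "quantized": applied,  # back-compat alias
        "pre_rope": True,  # INV-10: never operates on post-RoPE tensors
        "anchor_match": lsh_lookup(tensor) if lsh_lookup is not None else None,
        "jcr_dense": jcr_risk is not None and jcr_risk > JCR_RISK_THRESHOLD,
    }
```

With `enable_quantization=True` but no quantizer wired, both quantization flags come back `False`, which is the case the table is designed to report honestly.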

Citation

If this plugin or the underlying mechanisms help your work, please cite:

@misc{contextforge,
  author    = {Suarez, Pablo M.},
  title     = {{ContextForge: A Unified KV-Cache Coordination Layer
                for Multi-Agent LLM Pipelines on AMD Instinct MI300X}},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.20114594},
  url       = {https://doi.org/10.5281/zenodo.20114594}
}

License

Apache-2.0.

Release history

This version: 0.1.0 (Jun 3, 2026)

Download files

Source Distribution

apohara_vllm_plugin-0.1.0.tar.gz (10.8 kB view details)

Uploaded Jun 3, 2026 Source

Built Distribution

apohara_vllm_plugin-0.1.0-py3-none-any.whl (9.6 kB view details)

Uploaded Jun 3, 2026 Python 3

File details

Details for the file apohara_vllm_plugin-0.1.0.tar.gz.

File metadata

Download URL: apohara_vllm_plugin-0.1.0.tar.gz
Upload date: Jun 3, 2026
Size: 10.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for apohara_vllm_plugin-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`aaa7502faeb7667e5444e4ff8dfcf497feb30a14ed6189cce57a5f0a55a587c5`
MD5	`24e1be15a8b9f54b616cc0c8dae28645`
BLAKE2b-256	`15c67c6bb8e1d50ed48af224d337d389850b78e34b25c5379568f74f841611ea`
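
A downloaded file can be checked against the SHA256 digest in the table with the standard library. The sketch below assumes the sdist was saved to the current directory under its published file name; `sha256_of` and `matches_published` are helper names made up for this example.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest published for apohara_vllm_plugin-0.1.0.tar.gz (from the table above).
EXPECTED_SHA256 = "aaa7502faeb7667e5444e4ff8dfcf497feb30a14ed6189cce57a5f0a55a587c5"


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA256 and return the hex digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare the file's digest with the expected hex string, ignoring case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


# Assumed local path; the check is skipped when the file is not present.
sdist = Path("apohara_vllm_plugin-0.1.0.tar.gz")
if sdist.exists():
    print("sha256 ok:", matches_published(sdist, EXPECTED_SHA256))
```

The same helpers work for the wheel by substituting its file name and published digest.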

Provenance

The following attestation bundles were made for apohara_vllm_plugin-0.1.0.tar.gz:

Publisher: release-plugin.yml on SuarezPM/Apohara_Context_Forge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: apohara_vllm_plugin-0.1.0.tar.gz
- Subject digest: aaa7502faeb7667e5444e4ff8dfcf497feb30a14ed6189cce57a5f0a55a587c5
- Sigstore transparency entry: 1706517836
- Sigstore integration time: Jun 3, 2026
Source repository:
- Permalink: SuarezPM/Apohara_Context_Forge@4134f827307d551d54360c233feee844114e323f
- Branch / Tag: refs/tags/vllm-plugin-v0.1.0
- Owner: https://github.com/SuarezPM
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-plugin.yml@4134f827307d551d54360c233feee844114e323f
- Trigger Event: push

File details

Details for the file apohara_vllm_plugin-0.1.0-py3-none-any.whl.

File metadata

Download URL: apohara_vllm_plugin-0.1.0-py3-none-any.whl
Upload date: Jun 3, 2026
Size: 9.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for apohara_vllm_plugin-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`134cca68e4feb5ced7c511aaacbe6a979ff26ac874cc9f008df00c169b9830b9`
MD5	`12ec5ecc8a8618e63ff701253ce3354e`
BLAKE2b-256	`c2cff455857ae2c60776c7a3359483847cffef05eaba7a376027718a157a83f7`

Provenance

The following attestation bundles were made for apohara_vllm_plugin-0.1.0-py3-none-any.whl:

Publisher: release-plugin.yml on SuarezPM/Apohara_Context_Forge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: apohara_vllm_plugin-0.1.0-py3-none-any.whl
- Subject digest: 134cca68e4feb5ced7c511aaacbe6a979ff26ac874cc9f008df00c169b9830b9
- Sigstore transparency entry: 1706518047
- Sigstore integration time: Jun 3, 2026
Source repository:
- Permalink: SuarezPM/Apohara_Context_Forge@4134f827307d551d54360c233feee844114e323f
- Branch / Tag: refs/tags/vllm-plugin-v0.1.0
- Owner: https://github.com/SuarezPM
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-plugin.yml@4134f827307d551d54360c233feee844114e323f
- Trigger Event: push
