Last released Jun 1, 2026
Retrieval-preserving hierarchical KV cache compression for long-context LLM inference