Last released Jun 16, 2026
Attention-aware KV cache quantization for LLM inference (KVQuant++ extensions)