Last released Oct 29, 2024
LMCache: prefill your long contexts only once
lmcache_vllm: LMCache's wrapper for vLLM
Last released Sep 20, 2024
GPU-based arithmetic coding for LLM KV compression