Last released Jun 13, 2026
Optimized CUDA graph-enabled kernels and an attention backend for vLLM, SGLang, and more, based on TurboQuant near-lossless KV cache compression. SOTA performance with Gemma 4, Qwen 3.6, and other modern LLMs.
Last released Apr 14, 2026
Python client for the ARBI API