4 projects
turboquant
TurboQuant KV cache compression for LLM inference: an open-source, pip-installable implementation for HuggingFace models.
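As a rough illustration of what KV cache quantization involves (this is a generic sketch, not turboquant's actual API; the function names, shapes, and per-head absmax scaling scheme here are assumptions):

```python
import numpy as np

def quantize_kv(kv, bits=8):
    """Quantize a KV cache tensor to signed integers with per-head absmax scaling.

    kv: float32 array of shape (heads, seq_len, head_dim).
    Hypothetical helper for illustration only, not turboquant's API.
    """
    qmax = 2 ** (bits - 1) - 1
    # One scale per head: max absolute value mapped to qmax.
    scale = np.abs(kv).max(axis=(1, 2), keepdims=True) / qmax
    codes = np.round(kv / scale).astype(np.int8)
    return codes, scale

def dequantize_kv(codes, scale):
    # Reconstruct approximate float values from int8 codes.
    return codes.astype(np.float32) * scale

rng = np.random.default_rng(1)
k = rng.standard_normal((4, 128, 64)).astype(np.float32)  # fake key cache
codes, scale = quantize_kv(k)
k_hat = dequantize_kv(codes, scale)
max_err = np.abs(k - k_hat).max()
```

Storing int8 codes instead of float32 values gives a 4x memory reduction per cached key/value tensor, at the cost of a small, bounded reconstruction error.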
turboquant-vectors
Compress and protect embeddings with TurboQuant. Zero-loss privacy via an orthogonal rotation, plus 8x compression; no training needed.
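A minimal sketch of the general rotate-then-quantize idea behind this tagline (hypothetical names and a 4x int8 scheme for clarity; the project's actual API, bit width, and rotation construction are not shown here):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(d, rng):
    # QR decomposition of a Gaussian matrix gives a random orthogonal matrix.
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Fix column signs so the distribution is uniform over rotations.
    return q * np.sign(np.diag(r))

def compress(embeddings, rotation):
    # Rotate (the rotation acts like a key: without it, the codes are
    # scrambled), then quantize each vector to int8 with per-row scaling.
    rotated = embeddings @ rotation
    scale = np.abs(rotated).max(axis=1, keepdims=True) / 127.0
    codes = np.round(rotated / scale).astype(np.int8)
    return codes, scale

def decompress(codes, scale, rotation):
    # Undo quantization, then invert the rotation (transpose = inverse).
    return (codes.astype(np.float32) * scale) @ rotation.T

d = 64
x = rng.standard_normal((8, d)).astype(np.float32)
R = random_orthogonal(d, rng)
codes, scale = compress(x, R)
x_hat = decompress(codes, scale, R)
max_err = np.abs(x - x_hat).max()
```

No training is needed because the rotation is drawn at random rather than learned; only the holder of `R` can map the quantized codes back to meaningful embeddings.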
quantsim-bench
Which quantization should I use? One command benchmarks every quant level on YOUR GPU.
kvcache-bench
Benchmark every KV cache compression method on your GPU. One command, real numbers.