tqkit
Unified toolkit for benchmarking and integrating TurboQuant+ KV-cache compression across LLM inference engines.
What this is
tqkit is a single CLI and Python package that talks to every inference engine that ships TurboQuant+ KV-cache compression:
- llama.cpp (TheTom/llama.cpp@feature/turboquant-kv-cache)
- vLLM (CUDA) (TheTom/vllm@feature/turboquant-kv-cache)
- vLLM (AMD ROCm) (TheTom/vllm@feature/turboquant-amd)
- MLX-Swift (TheTom/mlx@feature/turboquant-plus)
- vllm-swift plugin
You bring the inference engine. tqkit autodetects what's installed, runs the canonical benchmark, and prints a reproducible KV-savings table.
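As a rough illustration of what autodetection can look like, the sketch below probes for an executable on `PATH` and for installed Python distributions. It is not tqkit's actual implementation: the probe tables, the `detect_backends` name, and the choice of `llama-server` as the llama.cpp marker are assumptions made for this example, and Swift-based backends would not be visible to a Python-side check like this.

```python
import shutil
from importlib import metadata

# Hypothetical probe tables (assumptions for this sketch, not tqkit internals).
# llama.cpp is detected through one of its binaries on PATH.
EXECUTABLE_PROBES = {"llama.cpp": "llama-server"}
# Python-based engines are detected through their installed distributions.
PACKAGE_PROBES = {"vllm": "vllm", "mlx": "mlx"}


def detect_backends():
    """Return {backend name: executable path or package version} for what is installed."""
    found = {}
    for name, executable in EXECUTABLE_PROBES.items():
        path = shutil.which(executable)  # None when the binary is not on PATH
        if path is not None:
            found[name] = path
    for name, distribution in PACKAGE_PROBES.items():
        try:
            found[name] = metadata.version(distribution)
        except metadata.PackageNotFoundError:
            pass  # distribution not installed; skip it
    return found


if __name__ == "__main__":
    for backend, detail in detect_backends().items():
        print(f"{backend}: {detail}")
```

Note that a check like this only shows that an engine is present. Whether the installed build actually comes from a TurboQuant+ branch would need a separate, engine-specific test.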
Why this exists
KV cache is the dominant memory cost at long context. TurboQuant+ shrinks it ~70% with negligible accuracy loss, and the savings replicate across engines and hardware vendors. tqkit is the tool for reproducing that result and the install path for each engine.
A 14B model's KV cache at 1M tokens in FP16 is ~200 GB. With TurboQuant+ asymmetric quantization, it's ~56 GB (about 72% smaller), small enough to fit on a single MI300X.
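The figures above can be sanity-checked with the standard KV-cache size formula: two tensors (K and V) per layer, each holding `kv_heads * head_dim` values per token. The README does not say which 14B model it has in mind, so the shape below (48 layers, 8 KV heads, head dimension 128) is an assumed grouped-query-attention configuration chosen for illustration. The average bit width is simply what the quoted 200 GB to 56 GB ratio implies, not a description of the TurboQuant+ format.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bits_per_value):
    """Size of the KV cache in bytes for one sequence of `tokens` tokens."""
    # K and V each store kv_heads * head_dim values per token per layer.
    values = 2 * layers * kv_heads * head_dim * tokens
    return values * bits_per_value / 8


# Assumed model shape (illustrative; not stated in the README).
LAYERS, KV_HEADS, HEAD_DIM = 48, 8, 128
TOKENS = 1_000_000

fp16_bytes = kv_cache_bytes(TOKENS, LAYERS, KV_HEADS, HEAD_DIM, bits_per_value=16)

# Average bits per value implied by the README's ~200 GB -> ~56 GB figures.
implied_bits = 16 * 56 / 200  # about 4.48 bits
compressed_bytes = kv_cache_bytes(TOKENS, LAYERS, KV_HEADS, HEAD_DIM, implied_bits)

print(f"FP16 KV cache:       {fp16_bytes / 1e9:.1f} GB")
print(f"Compressed KV cache: {compressed_bytes / 1e9:.1f} GB")
print(f"Reduction:           {1 - compressed_bytes / fp16_bytes:.0%}")
```

Under these assumptions the FP16 cache comes out just under 200 GB, consistent with the rounded figure quoted above, and the implied average of roughly 4.5 bits per value accounts for the ~72% reduction.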
Install
```bash
pip install tqkit
```
Usage
```bash
tq backends              # autodetect installed engines + versions
tq bench                 # run canonical KV-savings benchmark
tq report                # print the most recent KV-cache layout report
tq integrate <backend>   # print install + serve recipe for one engine
```
Status
v0.1.0 — alpha. Backend detection + version reporting work. Canonical bench runner and per-engine bridges land in v0.2.0.
License
Apache 2.0.
File details
Details for the file tqkit-0.1.0.tar.gz.
File metadata
- Download URL: tqkit-0.1.0.tar.gz
- Upload date:
- Size: 8.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 424005d72296ceb3b6236173b8bb9797cfceee1d3847093411f9ad383cc02a0f |
| MD5 | 6e1a7cd1e18f6d7b3bbca5c359ef65ca |
| BLAKE2b-256 | cffc9181d4a10d00a4bfb741d404d0b16f6be96840f13064f2d199faba626430 |
File details
Details for the file tqkit-0.1.0-py3-none-any.whl.
File metadata
- Download URL: tqkit-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d898df99c0d62e8f8142299448b8bb1e4aaa7e464bdf03ea0c0191fb95474e2f |
| MD5 | e21166235231c5360b1eedb36d843ea5 |
| BLAKE2b-256 | 8834cc016b5efdf71e604fc4f2d886bc2734eb1a85780c79a26bfe039ab54e1f |
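If you download either file manually, you can compare it against the SHA256 digests listed above. The sketch below uses only the Python standard library. The helper names `sha256_of` and `matches` are made up for this example, and the local path in the usage comment is an assumption about where the file was saved.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True when the file's SHA256 equals `expected_hex` (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (assumes the sdist was saved to the current directory):
# matches(
#     "tqkit-0.1.0.tar.gz",
#     "424005d72296ceb3b6236173b8bb9797cfceee1d3847093411f9ad383cc02a0f",
# )
```

The file is read in chunks so that memory use stays constant regardless of file size.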