Unified toolkit for benchmarking and integrating TurboQuant+ KV-cache compression across inference engines (llama.cpp, vLLM, MLX).

tqkit

Unified toolkit for benchmarking and integrating TurboQuant+ KV-cache compression across LLM inference engines.

What this is

tqkit is a single CLI and Python package that talks to every inference engine that ships TurboQuant+ KV-cache compression: llama.cpp, vLLM, and MLX.

You bring the inference engine. tqkit autodetects what's installed, runs the canonical benchmark, and prints a reproducible KV-savings table.
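
As an illustration of what backend autodetection can look like, here is a minimal sketch. It is not tqkit's actual implementation: the function name `detect_backends`, the distribution names `vllm` and `mlx`, and the `llama-server` executable name are assumptions chosen for the example.

```python
import shutil
from importlib import metadata

# Assumed mapping from engine name to installed Python distribution name.
PYTHON_BACKENDS = {"vllm": "vllm", "mlx": "mlx"}
# Assumed mapping from engine name to an executable expected on PATH.
CLI_BACKENDS = {"llama.cpp": "llama-server"}


def detect_backends():
    """Return {engine: version string, executable path, or None if absent}."""
    found = {}
    for name, dist in PYTHON_BACKENDS.items():
        try:
            # Reads the installed version without importing the package.
            found[name] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            found[name] = None
    for name, exe in CLI_BACKENDS.items():
        # shutil.which returns the full path, or None if not on PATH.
        found[name] = shutil.which(exe)
    return found


if __name__ == "__main__":
    for name, info in detect_backends().items():
        print(f"{name:10s} {info or 'not found'}")
```

Reading package metadata instead of importing each engine keeps detection fast and avoids initializing GPU runtimes just to report a version.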

Why this exists

KV cache is the dominant memory cost at long context. TurboQuant+ shrinks it ~70% with negligible accuracy loss. The savings replicate across engines and hardware vendors. tqkit is the proof, the tool, and the install path.

A 14B model's KV cache at 1M tokens in FP16 is ~200 GB. With TurboQuant+ asymmetric quantization, it's ~56 GB — small enough to fit on a single MI300X.
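
The ~200 GB figure can be sanity-checked with the standard KV-cache size formula (2 tensors × layers × KV heads × head dimension × bytes per element × tokens). The sketch below assumes a hypothetical 14B-class configuration (48 layers, 8 KV heads, head dimension 128); these values are illustrative and not taken from tqkit. The compressed size simply applies the 56/200 ratio quoted above rather than modelling TurboQuant+ itself.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2.0):
    """Size of the KV cache in bytes for one sequence.

    The leading factor of 2 accounts for the separate K and V tensors.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


# Assumed 14B-class model shape (hypothetical, for illustration only).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 48, 8, 128
N_TOKENS = 1_000_000

fp16_gb = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_TOKENS) / 1e9

# Ratio implied by the figures quoted in the text (56 GB out of 200 GB).
COMPRESSION_RATIO = 56 / 200
compressed_gb = fp16_gb * COMPRESSION_RATIO

print(f"FP16 KV cache:       {fp16_gb:.1f} GB")
print(f"Compressed KV cache: {compressed_gb:.1f} GB")
```

Under these assumptions the FP16 cache comes to about 197 GB, in line with the ~200 GB quoted above; a model with more layers or KV heads would scale linearly.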

Install

pip install tqkit

Usage

tq backends                # autodetect installed engines + versions
tq bench                   # run canonical KV-savings benchmark
tq report                  # print the most recent KV-cache layout report
tq integrate <backend>     # print install + serve recipe for one engine

Status

v0.1.0 — alpha. Backend detection + version reporting work. Canonical bench runner and per-engine bridges land in v0.2.0.

License

Apache 2.0.

Download files

Source Distribution

tqkit-0.1.0.tar.gz (8.4 kB)

Uploaded Source

Built Distribution

tqkit-0.1.0-py3-none-any.whl (9.3 kB)

Uploaded Python 3

File details

Details for the file tqkit-0.1.0.tar.gz.

File metadata

  • Download URL: tqkit-0.1.0.tar.gz
  • Upload date:
  • Size: 8.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for tqkit-0.1.0.tar.gz
Algorithm Hash digest
SHA256 424005d72296ceb3b6236173b8bb9797cfceee1d3847093411f9ad383cc02a0f
MD5 6e1a7cd1e18f6d7b3bbca5c359ef65ca
BLAKE2b-256 cffc9181d4a10d00a4bfb741d404d0b16f6be96840f13064f2d199faba626430


File details

Details for the file tqkit-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tqkit-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 9.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for tqkit-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d898df99c0d62e8f8142299448b8bb1e4aaa7e464bdf03ea0c0191fb95474e2f
MD5 e21166235231c5360b1eedb36d843ea5
BLAKE2b-256 8834cc016b5efdf71e604fc4f2d886bc2734eb1a85780c79a26bfe039ab54e1f
