
fraQtl runtime — drop-in KV cache compression + INT3-resident weight loading for HuggingFace transformers.


fraQtl

5x KV cache compression. +0.002 PPL. 7 models, 3B–70B. One line of code.

Runtime KV-cache compression via the Attention Importance Kernel. Protect the directions that matter. Quantize the rest. Drop-in, no retraining, production-ready.

Results (verified, 7 models)

| Model | Params | Arch | ΔPPL (k=16) | ΔPPL (k=32) |
|---|---|---|---|---|
| Mistral 7B | 7B | GQA-8 | +0.019 | +0.007 |
| Llama 3.2 3B | 3B | GQA-3 | +0.043 | +0.011 |
| Llama-2-7B | 7B | MHA-32 | +0.022 | +0.007 |
| Qwen 2.5 3B | 3B | GQA-2 | +0.034 | +0.010 |
| Llama 3.1 8B | 8B | GQA-8 | +0.034 | +0.025 |
| Llama-2-13B | 13B | MHA-40 | +0.019 | +0.005 |
| Llama 3.1 70B | 70B | GQA-8 | +0.079 | +0.019 |

All numbers are perplexity deltas measured at runtime on the live KV cache, using a split prefill/eval methodology and the same configuration for every model.

vs Competition (Llama-2-7B)

| Method | ΔPPL | Compression |
|---|---|---|
| fraQtl k=32 | +0.007 | 5x |
| fraQtl k=16 | +0.022 | 5x |
| KVQuant 2-bit | +0.27 | ~5x |
| KIVI K2V2 | +1.00 | ~5x |

Memory at Scale

| Context | KV Cache (FP16) | fraQtl (5x) | Savings |
|---|---|---|---|
| 4K | 2.1 GB | 430 MB | 1.7 GB |
| 32K | 17 GB | 3.4 GB | 14 GB |
| 128K | 69 GB | 14 GB | 55 GB |
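The FP16 column above follows from straightforward arithmetic. A minimal sketch, assuming a Llama-2-7B-like geometry (MHA: 32 layers, 32 KV heads, head dim 128) — the function and parameter names here are illustrative, not part of the fraqtl API:

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128, bytes_per_elem=2):
    """FP16 KV-cache footprint: one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

for ctx in (4096, 32768, 131072):
    fp16 = kv_cache_bytes(ctx)
    print(f"{ctx // 1024}K context: FP16 {fp16 / 1e9:.1f} GB -> 5x compressed {fp16 / 5 / 1e9:.1f} GB")
```

At 4K context this gives 2 * 32 * 32 * 128 * 4096 * 2 bytes ≈ 2.1 GB, matching the first table row; the longer contexts scale linearly.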

Install

pip install git+https://github.com/samuelsalfati/fraqtl.git

Quick Start

import fraqtl

# Authenticate (get token at fraqtl.ai)
fraqtl.login("sk_fraqtl_...")

# Compress
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype="float16",
    device_map="auto",
)

# calib_seqs: a small batch of tokenized calibration sequences
# (used for the one calibration forward pass)
model = fraqtl.aipress_kv(model, calib_seqs)
# That's it. Serve normally.

CLI

fraqtl compress --model mistralai/Mistral-7B-v0.1 --k 16 --eval
fraqtl analyze --model mistralai/Mistral-7B-v0.1

How It Works

  1. Eigenbasis — compute the Attention Importance Kernel (V^T alpha^T alpha V) from one forward pass
  2. Protect — top-k eigendirections at full precision
  3. Sacrifice — remaining directions at INT3
  4. Zero overhead — W_O fusion absorbs rotation into weights
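Steps 1–3 can be sketched in a few lines of NumPy. This is a toy illustration of the general recipe under stated assumptions, not fraQtl's implementation: `alpha` stands for a head's attention-weight matrix, `V` for its cached value states, and the INT3 step uses a simple per-direction symmetric scale:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, k = 256, 128, 16                # cached tokens, head dim, protected directions

V = rng.standard_normal((T, d))       # cached value states for one head
alpha = rng.random((T, T))            # attention weights (rows sum to 1)
alpha /= alpha.sum(axis=1, keepdims=True)

# 1. Eigenbasis: Attention Importance Kernel K = V^T alpha^T alpha V
K = V.T @ alpha.T @ alpha @ V
eigvals, Q = np.linalg.eigh(K)        # eigh returns ascending order
Q = Q[:, ::-1]                        # reorder: most important direction first

Vr = V @ Q                            # rotate the cache into the eigenbasis

# 2. Protect: top-k eigendirections stay full precision
protected = Vr[:, :k]

# 3. Sacrifice: remaining directions quantized to symmetric INT3 ([-4, 3])
rest = Vr[:, k:]
scale = np.abs(rest).max(axis=0, keepdims=True) / 3.0 + 1e-12
q = np.clip(np.round(rest / scale), -4, 3).astype(np.int8)

# Dequantize, rotate back, and measure reconstruction error
V_hat = np.concatenate([protected, q * scale], axis=1) @ Q.T
rel_err = np.linalg.norm(V - V_hat) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Because Q is orthogonal, the rotation itself is lossless; error comes only from the INT3 directions, which is why step 4 can fold Q into W_O at no runtime cost.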

Paper

"The Right Basis, Not the Right Subspace: Downstream-Optimal Quantization for KV-Cache Compression"

Samuel Salfati, Cornell University

Patent

Patent pending (filed April 6, 2026).

License

Proprietary. Early access available at fraqtl.ai.

Download files


Source Distributions

No source distribution files available for this release.

Built Distribution


fraqtl_runtime-0.1.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (857.3 kB)

Uploaded: CPython 3.11, manylinux: glibc 2.17+, x86-64

File details

Details for the file fraqtl_runtime-0.1.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | df98b4b4fe6361d0c29a1159e26cf6488c68365e17ccad23354bf2e568dd2f65 |
| MD5 | 934e30160268c9c4130b25d162658b57 |
| BLAKE2b-256 | c8a7d9a393cf6385fe9af13fdeba6a39adbed8d7ca3332741a0de31436c32e81 |

