fraQtl runtime — drop-in KV cache compression + INT3-resident weight loading for HuggingFace transformers.
fraQtl
5x KV cache compression. As little as +0.005 PPL. 7 models, 3B–70B. One line of code.
Runtime KV-cache compression via the Attention Importance Kernel. Protect the directions that matter. Quantize the rest. Drop-in, no retraining, production-ready.
Results (verified, 7 models)
| Model | Params | Arch | k=16 | k=32 |
|---|---|---|---|---|
| Mistral 7B | 7B | GQA-8 | +0.019 | +0.007 |
| Llama 3.2 3B | 3B | GQA-3 | +0.043 | +0.011 |
| Llama-2-7B | 7B | MHA-32 | +0.022 | +0.007 |
| Qwen 2.5 3B | 3B | GQA-2 | +0.034 | +0.010 |
| Llama 3.1 8B | 8B | GQA-8 | +0.034 | +0.025 |
| Llama-2-13B | 13B | MHA-40 | +0.019 | +0.005 |
| Llama 3.1 70B | 70B | GQA-8 | +0.079 | +0.019 |
All numbers measured at runtime on the live KV cache, using a split prefill/eval methodology and the same configuration for every model.
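A minimal sketch of the split prefill/eval idea, under my assumptions (plain numpy, synthetic logits; the actual fraqtl evaluation harness is not public): the first `prefill_len` tokens only populate the (compressed) KV cache, and perplexity is scored on the remaining tokens.

```python
import numpy as np

def split_eval_ppl(logits, targets, prefill_len):
    """Perplexity over the eval split only: tokens [prefill_len:] are scored,
    earlier tokens just populate the KV cache during prefill."""
    logits = logits[prefill_len:]
    targets = targets[prefill_len:]
    # numerically stable log-softmax over the vocab axis
    z = logits - logits.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    nll = -logp[np.arange(len(targets)), targets].mean()
    return float(np.exp(nll))

# Sanity check: uniform logits over a vocab of 256 give PPL ~= 256.
rng = np.random.default_rng(0)
logits = np.zeros((128, 256))
targets = rng.integers(0, 256, size=128)
print(split_eval_ppl(logits, targets, prefill_len=64))
```

The PPL deltas in the table are then simply `split_eval_ppl(compressed) - split_eval_ppl(fp16)` on the same sequences.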
vs Competition (Llama-2-7B)
| Method | PPL Delta | Compression |
|---|---|---|
| fraQtl k=32 | +0.007 | 5x |
| fraQtl k=16 | +0.022 | 5x |
| KVQuant 2-bit | +0.27 | ~5x |
| KIVI K2V2 | +1.00 | ~5x |
Memory at Scale
| Context | KV Cache (FP16) | fraQtl 5x | Savings |
|---|---|---|---|
| 4K | 2.1 GB | 430 MB | 1.7 GB |
| 32K | 17 GB | 3.4 GB | 14 GB |
| 128K | 69 GB | 14 GB | 55 GB |
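The FP16 column is consistent with Llama-2-7B geometry, which I am assuming here (32 layers, 32 KV heads of head dimension 128, 2 bytes per element); other architectures scale accordingly:

```python
def kv_cache_bytes(ctx_len, n_layers=32, n_kv_heads=32, head_dim=128,
                   bytes_per_elem=2):
    """FP16 KV cache size: K and V tensors, per layer, per head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

for ctx in (4096, 32768, 131072):
    fp16 = kv_cache_bytes(ctx)
    print(f"{ctx:>6} tokens: {fp16 / 1e9:5.1f} GB fp16, "
          f"{fp16 / 5 / 1e9:5.1f} GB at 5x")
```

For example, 4K context gives 2 x 32 x 32 x 128 x 4096 x 2 bytes = 2.1 GB, matching the first row; GQA models with fewer KV heads start from a proportionally smaller baseline.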
Install

```shell
pip install git+https://github.com/samuelsalfati/fraqtl.git
```
Quick Start

```python
import fraqtl
from transformers import AutoModelForCausalLM

# Authenticate (get a token at fraqtl.ai)
fraqtl.login("sk_fraqtl_...")

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype="float16",
    device_map="auto",
)

# calib_seqs: a small batch of tokenized calibration sequences,
# used for the one-pass importance estimate
model = fraqtl.aipress_kv(model, calib_seqs)

# That's it. Serve normally.
```
CLI

```shell
fraqtl compress --model mistralai/Mistral-7B-v0.1 --k 16 --eval
fraqtl analyze --model mistralai/Mistral-7B-v0.1
```
How It Works
- Eigenbasis — compute the Attention Importance Kernel (V^T alpha^T alpha V) from one forward pass
- Protect — top-k eigendirections at full precision
- Sacrifice — remaining directions at INT3
- Zero overhead — W_O fusion absorbs rotation into weights
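The four steps above can be sketched with numpy. This is a toy illustration under my own assumptions (random attention weights and value states, symmetric round-to-nearest INT3; the shipped kernel and the exact W_O fusion are not public):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, k = 64, 32, 8                      # tokens, head dim, protected directions

V = rng.standard_normal((T, d))          # value states for one head
alpha = rng.random((T, T))               # attention weights, rows sum to 1
alpha /= alpha.sum(axis=1, keepdims=True)

# 1. Eigenbasis: attention importance kernel M = V^T a^T a V from one pass
M = V.T @ alpha.T @ alpha @ V
w, Q = np.linalg.eigh(M)                 # symmetric eigendecomposition
Q = Q[:, ::-1]                           # sort directions by importance, descending

# 2./3. Protect top-k directions at full precision, quantize the rest
C = V @ Q                                # rotate the cache into the eigenbasis
protected, rest = C[:, :k], C[:, k:]
scale = np.abs(rest).max() / 3           # symmetric low-bit grid, levels -3..3
rest_q = np.clip(np.round(rest / scale), -3, 3) * scale
C_hat = np.concatenate([protected, rest_q], axis=1)

# 4. Zero overhead: Q can be absorbed into W_O downstream, so attention
# reads the compressed cache directly: alpha @ (C_hat @ Q.T) ~= alpha @ V
err = np.linalg.norm(alpha @ (C_hat @ Q.T) - alpha @ V)
ref = np.linalg.norm(alpha @ V)
print(f"relative output error after compression: {err / ref:.3f}")
```

The protected columns survive the round trip exactly; all reconstruction error comes from the sacrificed INT3 directions, which is the point of choosing the basis from the attention-weighted kernel rather than from V alone.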
Paper
"The Right Basis, Not the Right Subspace: Downstream-Optimal Quantization for KV-Cache Compression"
Samuel Salfati, Cornell University
Patent
Patent pending (filed April 6, 2026).
License
Proprietary. Early access available at fraqtl.ai.
File details
Details for the file fraqtl_runtime-0.1.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.
- Size: 857.3 kB
- Tags: CPython 3.11, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | df98b4b4fe6361d0c29a1159e26cf6488c68365e17ccad23354bf2e568dd2f65 |
| MD5 | 934e30160268c9c4130b25d162658b57 |
| BLAKE2b-256 | c8a7d9a393cf6385fe9af13fdeba6a39adbed8d7ca3332741a0de31436c32e81 |