Norm-Aware KVQuant: Precision Where It Counts

Norm-Aware KV Cache Quantization

Installation

To install the package from PyPI, run the following command:

pip install kvq

Usage

  1. Initialization

    1.1. Creating a KVQ object using a configuration object:

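    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from kvq import KVQConfig, KVQ
    
    model_id = "meta-llama/Llama-3.1-8B-Instruct"
    
    # Load the tokenizer and model used by the generation call below
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    ).to(device)
    
    config = KVQConfig(
        budget=4,
        model=model_id,
        residual_length=32,
        group_size={"k": 64, "v": 64},  # Group size for keys and values
        axis={"k": 0, "v": 0},  # Axis along which to quantize
    )
    
    kv_cache = KVQ(config)
    
    text = "What is the meaning of life?"
    
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    
    # Pass the quantized cache to generate() so keys and values are stored in it
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        past_key_values=kv_cache,
        use_cache=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))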
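
To give a feel for what a bit width and a group size typically control in KV cache quantization, here is a minimal plain-Python sketch of group-wise min-max quantization. It is not kvq's implementation: the function names are hypothetical, and it assumes that `group_size` means the number of consecutive values that share one scale and offset, which the package may interpret differently.

```python
# Illustrative sketch only; NOT the kvq implementation.
# All function names below are hypothetical.

def quantize_group(values, bits):
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1              # e.g. 15 for 4 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Recover approximate floats from integer codes."""
    return [c * scale + lo for c in codes]


def quantize_dequantize(values, bits, group_size):
    """Quantize a flat list group by group and return the reconstruction."""
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        codes, scale, lo = quantize_group(group, bits)
        out.extend(dequantize_group(codes, scale, lo))
    return out


def max_abs_error(original, approx):
    return max(abs(a - b) for a, b in zip(original, approx))


row = [0.1 * i for i in range(-8, 8)]     # 16 sample values
approx_4bit = quantize_dequantize(row, bits=4, group_size=8)
approx_2bit = quantize_dequantize(row, bits=2, group_size=8)

print("4-bit max error:", max_abs_error(row, approx_4bit))
print("2-bit max error:", max_abs_error(row, approx_2bit))
```

Each group stores its own scale and offset, so smaller groups track local ranges more closely at the cost of more metadata, and more bits reduce the reconstruction error within each group.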

GitHub Repository

The source code is hosted on GitHub:

https://github.com/mohsenhariri/spectral-kv

Feel free to open issues, suggest improvements, or submit pull requests!

Citation

If you find our work useful or interesting, please consider citing our paper:

@article{hariri2025quantize,
title     = {Quantize What Counts: Bit Allocation Insights Informed by Spectral Gaps in Keys and Values},
author    = {Hariri, Mohsen and Luo, Alan and Nemati, Mohammadreza and Nguyen, Lam and Zhong, Shaochen and Wang, Qifan and Hu, Xia and Han, Xiaotian and Chaudhary, Vipin},
journal   = {arXiv preprint arXiv:2502.15075},
year      = {2025},
url       = {https://arxiv.org/abs/2502.15075v2},
}
