
Norm-Aware KVQuant: Precision Where It Counts


Norm-Aware KV Cache Quantization

Installation

To install the package from PyPI, run the following command:

pip install kvq

Usage

  1. Initialization

    1.1. Creating a KVQ object using a configuration object:

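    import torch
    from kvq import KVQ, KVQCacheConfig
    
    # `model` is assumed to be an already-loaded Hugging Face causal LM
    # (e.g. from AutoModelForCausalLM.from_pretrained); the cache is
    # placed on the same device as the model.
    config = KVQCacheConfig(
        nbits_k=4,            # bit width for keys
        nbits_v=2,            # bit width for values
        axis_key=0,
        axis_value=0,
        q_group_size=64,      # entries per quantization group
        residual_length=128,  # most recent tokens kept unquantized
        compute_dtype=torch.bfloat16,
        backend="quanto",
        device=model.device,
    )
    kvq = KVQ(config)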

    1.2. Creating a KVQ object directly from a dictionary:

    import torch
    from kvq import KVQ
    
    # Same settings as in 1.1; `model` is again assumed to be loaded.
    kvq_dict = {
        "nbits_k": 4,
        "nbits_v": 2,
        "axis_key": 0,
        "axis_value": 0,
        "q_group_size": 64,
        "residual_length": 128,
        "compute_dtype": torch.bfloat16,
        "backend": "quanto",
        "device": model.device,
    }
    kvq = KVQ(kvq_dict)

  2. Using KVQ during text generation with a transformer model

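    # Assume 'model' is a Hugging Face transformer (e.g. Llama, Mistral, ...)
    # that supports caching past key-value states, and 'tokenizer' is its tokenizer.
    prompt = "Explain KV cache quantization in one paragraph."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        use_cache=True,
        past_key_values=kvq,  # the quantized KV cache created in step 1
    )
    # generate() returns token IDs; decode them to get readable text
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))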
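
To make the roles of `nbits_k`, `nbits_v`, `q_group_size`, and `residual_length` concrete, here is a minimal pure-Python toy, not the kvq implementation. It assumes these parameters behave as in Hugging Face's `QuantizedCacheConfig`: entries are split into groups of `q_group_size`, each group is quantized with its own min-max scale and zero point, and the most recent `residual_length` tokens stay in full precision. `ToyQuantizedCache`, `quantize_group`, and `dequantize_group` are hypothetical names; grouping here runs per token along the feature dimension, and the oldest token is evicted one at a time, which the real backend may handle differently (for example in chunks).

```python
from typing import List, Tuple

Group = Tuple[List[int], float, float]  # (integer codes, scale, zero point)


def quantize_group(xs: List[float], nbits: int) -> Group:
    """Asymmetric min-max quantization of one group to codes in [0, 2**nbits - 1]."""
    levels = (1 << nbits) - 1
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid dividing by zero for constant groups
    codes = [round((x - lo) / scale) for x in xs]
    return codes, scale, lo


def dequantize_group(group: Group) -> List[float]:
    codes, scale, zero = group
    return [c * scale + zero for c in codes]


class ToyQuantizedCache:
    """Hypothetical single-head cache for illustration only (not kvq's code)."""

    def __init__(self, nbits_k: int, nbits_v: int, q_group_size: int, residual_length: int):
        self.nbits_k, self.nbits_v = nbits_k, nbits_v
        self.q_group_size = q_group_size
        self.residual_length = residual_length
        self.quantized: List[Tuple[List[Group], List[Group]]] = []  # older tokens
        self.residual: List[Tuple[List[float], List[float]]] = []   # recent tokens, exact

    def _quantize(self, vec: List[float], nbits: int) -> List[Group]:
        g = self.q_group_size
        return [quantize_group(vec[i:i + g], nbits) for i in range(0, len(vec), g)]

    def append(self, key: List[float], value: List[float]) -> None:
        self.residual.append((key, value))
        # Once the full-precision window overflows, quantize its oldest token.
        if len(self.residual) > self.residual_length:
            old_k, old_v = self.residual.pop(0)
            self.quantized.append((self._quantize(old_k, self.nbits_k),
                                   self._quantize(old_v, self.nbits_v)))

    def keys(self) -> List[List[float]]:
        """Dequantized older keys followed by the exact recent keys."""
        old = [[x for grp in k for x in dequantize_group(grp)] for k, _ in self.quantized]
        return old + [k for k, _ in self.residual]

    def values(self) -> List[List[float]]:
        """Dequantized older values followed by the exact recent values."""
        old = [[x for grp in v for x in dequantize_group(grp)] for _, v in self.quantized]
        return old + [v for _, v in self.residual]


cache = ToyQuantizedCache(nbits_k=4, nbits_v=2, q_group_size=4, residual_length=2)
for t in range(5):
    vec = [0.1 * (t + i) for i in range(8)]  # toy 8-dim key/value for token t
    cache.append(vec, vec)
print(len(cache.quantized), len(cache.residual))  # 3 2
```

In this toy, a 4-bit group has a step size of range/15 while a 2-bit group has range/3, so the rounding error on keys is bounded far more tightly than on values; that asymmetry is what setting `nbits_k` above `nbits_v` expresses.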
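
A back-of-the-envelope estimate of what the configuration above saves, as a sketch under stated assumptions: the model dimensions (32 layers, 8 key/value heads, head dimension 128, a 4096-token context) are hypothetical; each quantization group is assumed to store one scale and one zero point in the 2-byte `compute_dtype`; and the `residual_length` most recent tokens are assumed to stay in bfloat16. The helper names are hypothetical, and the actual packing used by the `quanto` backend may differ, so treat the result as an order-of-magnitude figure.

```python
def kv_cache_bytes_full(seq_len: int, num_layers: int, num_kv_heads: int,
                        head_dim: int, full_bytes: int = 2) -> int:
    """Keys plus values stored entirely in a 2-byte dtype such as bfloat16."""
    entries_per_token = num_layers * num_kv_heads * head_dim
    return 2 * seq_len * entries_per_token * full_bytes


def kv_cache_bytes_quantized(seq_len: int, num_layers: int, num_kv_heads: int,
                             head_dim: int, nbits_k: int, nbits_v: int,
                             q_group_size: int, residual_length: int,
                             full_bytes: int = 2, meta_bytes: int = 2) -> int:
    """Quantized older tokens plus a full-precision residual window (assumed layout)."""
    entries_per_token = num_layers * num_kv_heads * head_dim
    n_residual = min(seq_len, residual_length)
    n_quantized = seq_len - n_residual
    groups_per_token = entries_per_token // q_group_size

    def quantized_part(nbits: int) -> int:
        payload = n_quantized * entries_per_token * nbits // 8
        metadata = n_quantized * groups_per_token * 2 * meta_bytes  # scale + zero point
        return payload + metadata

    residual = 2 * n_residual * entries_per_token * full_bytes  # keys and values
    return quantized_part(nbits_k) + quantized_part(nbits_v) + residual


dims = dict(seq_len=4096, num_layers=32, num_kv_heads=8, head_dim=128)  # hypothetical model
full = kv_cache_bytes_full(**dims)
quant = kv_cache_bytes_quantized(**dims, nbits_k=4, nbits_v=2,
                                 q_group_size=64, residual_length=128)
print(f"{full / 2**20:.1f} MiB -> {quant / 2**20:.1f} MiB")  # 512.0 MiB -> 124.5 MiB
```

Under these assumptions the cache shrinks by roughly 4x; the per-group metadata and the full-precision residual window together account for about a quarter of what remains.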

GitHub Repository

The source code is hosted on GitHub:

https://github.com/mohsenhariri/kvq

Feel free to open issues, suggest improvements, or submit pull requests!

Citation

If you find our work useful or interesting, please consider citing our paper:

@article{hariri2025quantize,
title     = {Quantize What Counts: Bit Allocation Insights Informed by Spectral Gaps in Keys and Values},
author    = {Hariri, Mohsen and Luo, Alan and Nemati, Mohammadreza and Nguyen, Lam and Zhong, Shaochen and Wang, Qifan and Hu, Xia and Han, Xiaotian and Chaudhary, Vipin},
journal   = {arXiv preprint arXiv:2502.15075},
year      = {2025},
url       = {https://arxiv.org/abs/2502.15075v2},
}



Download files

Download the file for your platform.

Source Distribution

kvq-0.0.5.tar.gz (21.4 kB)

Uploaded Source

Built Distribution


kvq-0.0.5-py3-none-any.whl (20.2 kB)

Uploaded Python 3

File details

Details for the file kvq-0.0.5.tar.gz.

File metadata

  • Download URL: kvq-0.0.5.tar.gz
  • Upload date:
  • Size: 21.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for kvq-0.0.5.tar.gz
Algorithm Hash digest
SHA256 66761e4bc9f1b4764f8dde0fad5a2dee6c6ff972f3156e32db27f06e7423cd7c
MD5 eeb7cf54489d6b4a5a580bafaa3f6ca4
BLAKE2b-256 38de1c07ff3bf88d923a29a8c5fe7c587520e212599c78cb02dbeed4919385dd


File details

Details for the file kvq-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: kvq-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 20.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for kvq-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 134d2bf46842eb23ba10b0fbf30aa86d3bd590b1f915d1711938202464ed7cfd
MD5 4b78d6e3c2436222d2a0d336a1096f80
BLAKE2b-256 866cf78de6ee6b5558b527a0442e75c1da3c17d8549ed87a6460df1e9ee5485f
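
The digests above let you check a downloaded file before installing it. A minimal standard-library sketch using `hashlib`, assuming the files were saved under their original names in the current directory; `sha256_of`, `verify`, and `EXPECTED` are hypothetical names, and the SHA256 values are copied from the tables above.

```python
import hashlib
from pathlib import Path

EXPECTED = {
    "kvq-0.0.5.tar.gz": "66761e4bc9f1b4764f8dde0fad5a2dee6c6ff972f3156e32db27f06e7423cd7c",
    "kvq-0.0.5-py3-none-any.whl": "134d2bf46842eb23ba10b0fbf30aa86d3bd590b1f915d1711938202464ed7cfd",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_sha256: str) -> bool:
    """Compare case-insensitively, since hexdigest() is lowercase."""
    return sha256_of(path) == expected_sha256.lower()


if __name__ == "__main__":
    for name, digest in EXPECTED.items():
        if Path(name).exists():  # only check files that were actually downloaded
            print(name, "OK" if verify(name, digest) else "MISMATCH")
```

pip can enforce the same check at install time in hash-checking mode: list `kvq==0.0.5 --hash=sha256:<digest>` in a requirements file and install it with `--require-hashes`.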

