The inference kernels for LeanQuant models.

Project description

LeanQuant

Overview

This package provides efficient inference kernels for running non-uniformly quantized LeanQuant models on CUDA-enabled GPUs. LeanQuant is a scalable and accurate quantization algorithm that compresses large language models by 4-8x while maintaining competitive performance.

Installation

Ensure your GPU and driver support CUDA 11 or CUDA 12. You can check with the command nvidia-smi | grep CUDA; the version it reports is the highest CUDA version your installed driver supports.

To install:

# For CUDA 11.x (the quotes keep shells such as zsh from expanding the brackets)
pip install "leanquant[cuda11]"

# For CUDA 12.x
pip install "leanquant[cuda12]"
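
If you want to script the choice between the two extras, the check above can be sketched in Python. This is a minimal sketch, not part of the leanquant package: the function name pick_leanquant_extra is hypothetical, and it assumes the nvidia-smi header contains a field of the form "CUDA Version: X.Y". The sample string below is illustrative, not captured from a real machine.

```python
import re
from typing import Optional


def pick_leanquant_extra(smi_output: str) -> Optional[str]:
    """Map the 'CUDA Version: X.Y' field of nvidia-smi output to a pip target.

    Hypothetical helper for illustration; it is not shipped with leanquant.
    """
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None  # no CUDA version found in the text
    major = int(match.group(1))
    if major == 11:
        return "leanquant[cuda11]"
    if major == 12:
        return "leanquant[cuda12]"
    return None  # this README only lists extras for CUDA 11 and CUDA 12


# Illustrative header line in the format printed by nvidia-smi.
sample = "| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2 |"
print(pick_leanquant_extra(sample))  # prints: leanquant[cuda12]
```

To use it on a real machine, pass it the text captured from running nvidia-smi, then hand the returned string to pip install.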

Models

Quantized LeanQuant models are available for download on our HuggingFace page: huggingface.co/LeanQuant

Technical Details

LeanQuant introduces a loss-error-aware grid approach to quantization that, in the benchmarks reported in our paper, outperforms traditional methods. Our technique:

- Achieves high compression ratios: reduces model size by 4-8x while maintaining competitive performance
- Preserves model quality: maintains performance comparable to full-precision models across challenging benchmarks
- Optimizes GPU execution: provides custom CUDA kernels designed specifically for the non-uniform quantization format

The algorithm strategically allocates quantization precision based on parameter sensitivity, ensuring computational resources are focused where they matter most.
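
To make "non-uniform quantization" concrete, here is a generic sketch of grid-based (codebook) quantization in plain Python. It is not LeanQuant's storage format, its CUDA kernels, or its method for learning the loss-error-aware grid. The 16-entry grid, the weights, and all function names are made up for illustration. It shows only the general idea: each weight is stored as a 4-bit index into a small table of unevenly spaced values, and dequantization is a table lookup.

```python
from typing import List, Sequence


def quantize(weights: Sequence[float], grid: Sequence[float]) -> List[int]:
    """Replace each weight with the index of its nearest grid point."""
    return [min(range(len(grid)), key=lambda i: abs(grid[i] - w)) for w in weights]


def pack_4bit(codes: Sequence[int]) -> bytes:
    """Pack two 4-bit codes per byte, low nibble first."""
    padded = list(codes)
    if len(padded) % 2:
        padded.append(0)  # pad to an even number of codes
    return bytes(padded[i] | (padded[i + 1] << 4) for i in range(0, len(padded), 2))


def dequantize(packed: bytes, grid: Sequence[float], count: int) -> List[float]:
    """Unpack 4-bit codes and look each one up in the grid."""
    out: List[float] = []
    for byte in packed:
        out.append(grid[byte & 0x0F])  # low nibble
        out.append(grid[byte >> 4])    # high nibble
    return out[:count]  # drop the padding entry, if any


# Hypothetical non-uniform grid: 16 levels, spaced more densely near zero.
grid = [-1.0, -0.6, -0.4, -0.28, -0.2, -0.13, -0.08, -0.03,
        0.03, 0.08, 0.13, 0.2, 0.28, 0.4, 0.6, 1.0]
weights = [0.05, -0.31, 0.9, -0.02, 0.17]

codes = quantize(weights, grid)
packed = pack_4bit(codes)
restored = dequantize(packed, grid, len(weights))

print(codes)        # [8, 3, 15, 7, 11]
print(restored)     # [0.03, -0.28, 1.0, -0.03, 0.2]
print(len(packed))  # 3
```

Here five weights occupy 3 bytes of codes. Assuming a 16-bit baseline (an assumption; this README does not state the baseline precision), the same weights would take 10 bytes, so 4-bit codes give roughly 4x compression before counting the storage for the grid itself.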

For a full explanation of our methodology and benchmark results, see our research paper, linked in the Citation section below.

Citation

If you find LeanQuant useful in your research or applications, please consider citing our work:

@inproceedings{
    zhang2025leanquant,
    title={LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid},
    author={Tianyi Zhang and Anshumali Shrivastava},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=ISqx8giekS}
}

Project details

Release history

0.1.1 (this version), Feb 24, 2025

0.1.0 (yanked), Feb 24, 2025. Reason this release was yanked: Polish README

Download files

Download the file for your platform.

Source Distribution

leanquant-0.1.1.tar.gz (5.3 kB)

Uploaded Feb 24, 2025 Source

Built Distribution

leanquant-0.1.1-py3-none-any.whl (5.6 kB)

Uploaded Feb 24, 2025 Python 3

File details

Details for the file leanquant-0.1.1.tar.gz.

File metadata

Download URL: leanquant-0.1.1.tar.gz
Upload date: Feb 24, 2025
Size: 5.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for leanquant-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`5b269964faa392adffdd07f07528debecbe43ba27d9b64f0ae149a6450068c35`
MD5	`a04c2040ee4121ce3462c66d88b5bea3`
BLAKE2b-256	`e70eee91cad8fff5cdc9c543f17eba09c80a374ebcdf5836662dfbeb737d4f57`
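
The published digests can be used to check a downloaded file. The following is a minimal sketch using Python's standard hashlib module. The function names are illustrative, and the local file path is an assumption about where you saved the download. The expected SHA256 value is the one listed in the table above for leanquant-0.1.1.tar.gz.

```python
import hashlib

# SHA256 digest listed above for leanquant-0.1.1.tar.gz.
EXPECTED_SHA256 = "5b269964faa392adffdd07f07528debecbe43ba27d9b64f0ae149a6450068c35"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's SHA256 digest with the expected hex string."""
    return sha256_of(path) == expected.lower()
```

After downloading, calling matches_published_hash("leanquant-0.1.1.tar.gz") should return True if the file is intact. For the wheel, pass its own SHA256 digest as the expected argument.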

File details

Details for the file leanquant-0.1.1-py3-none-any.whl.

File metadata

Download URL: leanquant-0.1.1-py3-none-any.whl
Upload date: Feb 24, 2025
Size: 5.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for leanquant-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ba123232c54910ae7985cbf3d6109133292d42fca7b72710535454760a961b61`
MD5	`4f44fbe76dfef27b1b8b286b94d8a414`
BLAKE2b-256	`83212839ba4a349c27a971ecae978986949a0180f06c1ceec806df012f4e3e13`
