The inference kernels for LeanQuant models.
LeanQuant
Overview
This package provides efficient inference kernels for running non-uniformly quantized LeanQuant models on CUDA-enabled GPUs. LeanQuant is a scalable and accurate quantization algorithm that compresses large language models by 4-8x while maintaining competitive performance.
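As a rough illustration of where a 4-8x figure can come from, the sketch below assumes a 16-bit baseline and quantized codes of 4, 3, or 2 bits per weight. Those bit widths, the 7-billion parameter count, and the helper name `weight_gigabytes` are assumptions made for the arithmetic, not values taken from the package. The small overhead of storing the quantization grids is ignored.

```python
def weight_gigabytes(num_params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10**9 bytes), ignoring grid overhead."""
    return num_params * bits_per_weight / 8 / 1e9


num_params = 7_000_000_000  # hypothetical 7B-parameter model (assumption)
baseline = weight_gigabytes(num_params, 16)  # assumed 16-bit baseline

for bits in (4, 3, 2):  # assumed quantized bit widths
    size = weight_gigabytes(num_params, bits)
    print(f"{bits}-bit: {size:.2f} GB, {baseline / size:.1f}x smaller than 16-bit")
```

Under these assumptions, 4-bit codes give the 4x end of the range and 2-bit codes give the 8x end.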
Installation
Ensure your GPU and driver support CUDA 11 or CUDA 12. You can check with the command nvidia-smi | grep CUDA, which prints the highest CUDA version your installed driver supports.
To install:
# For CUDA 11.x (quotes keep shells such as zsh from expanding the brackets)
pip install "leanquant[cuda11]"
# For CUDA 12.x
pip install "leanquant[cuda12]"
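The choice between the two extras can be sketched in a few lines of Python. This is an illustration only: the helper name `pick_extra` is hypothetical, the sample header line is a typical example of `nvidia-smi` output rather than output captured for this package, and the function handles only the two extras named above.

```python
import re
from typing import Optional


def pick_extra(nvidia_smi_output: str) -> Optional[str]:
    """Return the pip requirement matching the reported CUDA major version.

    Returns None if no CUDA version is found or if the major version is
    neither 11 nor 12 (the only extras listed above).
    """
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        return None
    major = int(match.group(1))
    if major in (11, 12):
        return f"leanquant[cuda{major}]"
    return None


# Illustrative header line in the style printed by nvidia-smi (assumption).
sample = "| NVIDIA-SMI 535.104.05    Driver Version: 535.104.05    CUDA Version: 12.2  |"
print(pick_extra(sample))
```

On a real machine, the text passed to `pick_extra` could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.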
Models
Quantized LeanQuant models are available for download from our Hugging Face page: huggingface.co/LeanQuant
Technical Details
LeanQuant introduces a loss-error-aware grid approach to quantization, designed to be more accurate than traditional methods. Our technique:
- Achieves high compression ratios: Reduces model size by 4-8x while maintaining competitive performance
- Preserves model quality: Maintains performance comparable to full-precision models across challenging benchmarks
- Optimizes GPU execution: Features custom CUDA kernels specifically designed for the non-uniform quantization format
The algorithm allocates quantization precision according to parameter sensitivity, so that the weights with the greatest effect on the loss are represented most accurately.
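To make the idea of a non-uniform quantization format concrete, here is a minimal pure-Python sketch, not the package's kernels and not the LeanQuant grid-learning algorithm. Each weight is stored as a small integer code, and dequantization is a table lookup into a grid of floating-point values. The function names, the toy weights, and the hand-picked non-uniform grid are all assumptions for illustration. A real loss-error-aware grid would be learned rather than chosen by hand.

```python
from typing import List, Sequence


def quantize_to_grid(weights: Sequence[float], grid: Sequence[float]) -> List[int]:
    """Map each weight to the index of its nearest grid point."""
    return [min(range(len(grid)), key=lambda i: abs(w - grid[i])) for w in weights]


def dequantize(codes: Sequence[int], grid: Sequence[float]) -> List[float]:
    """Recover approximate weights by looking each code up in the grid."""
    return [grid[c] for c in codes]


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy weights: most values cluster near zero, with two outliers (assumption).
weights = [-0.9, -0.05, -0.02, 0.0, 0.01, 0.03, 0.04, 1.0]

# 2-bit uniform grid: four evenly spaced points between min and max.
lo, hi = min(weights), max(weights)
uniform_grid = [lo + i * (hi - lo) / 3 for i in range(4)]

# 2-bit non-uniform grid, hand-picked to sit where the weights are (assumption).
nonuniform_grid = [-0.9, -0.03, 0.03, 1.0]

for name, grid in (("uniform", uniform_grid), ("non-uniform", nonuniform_grid)):
    codes = quantize_to_grid(weights, grid)
    restored = dequantize(codes, grid)
    print(f"{name}: codes={codes}, squared error={squared_error(weights, restored):.4f}")
```

Both grids use the same 2 bits per weight. The non-uniform grid only changes which four values those codes stand for, which is why a lookup-based format needs kernels that differ from those for uniform quantization.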
For a full explanation of our methodology and benchmark results, please refer to our research paper (https://openreview.net/forum?id=ISqx8giekS).
Citation
If you find LeanQuant useful in your research or applications, please consider citing our work:
@inproceedings{
zhang2025leanquant,
title={LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid},
author={Tianyi Zhang and Anshumali Shrivastava},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=ISqx8giekS}
}
File details
Details for the file leanquant-0.1.1.tar.gz.
File metadata
- Download URL: leanquant-0.1.1.tar.gz
- Upload date:
- Size: 5.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5b269964faa392adffdd07f07528debecbe43ba27d9b64f0ae149a6450068c35 |
| MD5 | a04c2040ee4121ce3462c66d88b5bea3 |
| BLAKE2b-256 | e70eee91cad8fff5cdc9c543f17eba09c80a374ebcdf5836662dfbeb737d4f57 |
File details
Details for the file leanquant-0.1.1-py3-none-any.whl.
File metadata
- Download URL: leanquant-0.1.1-py3-none-any.whl
- Upload date:
- Size: 5.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba123232c54910ae7985cbf3d6109133292d42fca7b72710535454760a961b61 |
| MD5 | 4f44fbe76dfef27b1b8b286b94d8a414 |
| BLAKE2b-256 | 83212839ba4a349c27a971ecae978986949a0180f06c1ceec806df012f4e3e13 |