Uni-Quant: CUDA-accelerated quantization/dequantization for TensorFlow models
Uni-Quant
Small library to quantize/dequantize TensorFlow models using PyTorch CUDA kernels.
Requirements
- Python: 3.13.13 (other versions have not been tested)
- CUDA Toolkit: >=12.8
- Python Dependencies: All required packages are listed in requirements.txt
Installing Dependencies
pip install -r requirements.txt
Installation from pip
pip install uni-quant-cuda
Usage
Importing Functions
from uniquant import quantize, dequantize, dequantize_save
Main Functions
quantize(model_path, quant_directory="", quant_name="", pack_size=32, quant_size=4, overwrite=False)
Quantizes a TensorFlow or XGBoost model.
Arguments:
- model_path (str): Path to the model to quantize (with extension)
- quant_directory (str): Directory path to save the quantized model
- quant_name (str): Filename for the quantized model
- pack_size (int): Number of weights in one quantization batch (must be divisible by 2)
- quant_size (int): Number of bits per weight (available: 4 or 8)
- overwrite (bool): Whether to overwrite an existing file
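To make pack_size and quant_size concrete, here is a minimal pure-Python sketch of block-wise quantization. It is not the library's CUDA implementation, and the .uniq file layout is not described in this document. The symmetric scaling, the two-codes-per-byte packing, and the helper names quantize_block, pack_nibbles, and quantize_weights are all assumptions made for illustration.

```python
def quantize_block(weights, quant_size=4):
    """Map one block of floats to unsigned codes and a per-block scale (assumed scheme)."""
    offset = 1 << (quant_size - 1)      # code that represents 0.0: 8 for 4-bit, 128 for 8-bit
    max_code = (1 << quant_size) - 1    # largest code: 15 for 4-bit, 255 for 8-bit
    peak = max(abs(w) for w in weights)
    scale = peak / (offset - 1) if peak > 0 else 1.0
    # Round to the nearest level, shift to unsigned, and clamp into range.
    codes = [min(max_code, max(0, round(w / scale) + offset)) for w in weights]
    return codes, scale


def pack_nibbles(codes):
    """Pack pairs of 4-bit codes into single bytes; needs an even number of codes."""
    if len(codes) % 2:
        raise ValueError("need an even number of 4-bit codes")
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


def quantize_weights(weights, pack_size=32, quant_size=4):
    """Split weights into blocks of pack_size and quantize each block on its own."""
    if pack_size % 2:
        raise ValueError("pack_size must be divisible by 2")
    if quant_size not in (4, 8):
        raise ValueError("quant_size must be 4 or 8")
    blocks = []
    for start in range(0, len(weights), pack_size):
        block = weights[start:start + pack_size]
        codes, scale = quantize_block(block, quant_size)
        if quant_size == 4:
            if len(codes) % 2:
                codes.append(1 << (quant_size - 1))  # pad a short final block with the zero code
            payload = pack_nibbles(codes)
        else:
            payload = bytes(codes)
        blocks.append((payload, scale, len(block)))
    return blocks


weights = [(i - 32) / 10 for i in range(64)]
for payload, scale, count in quantize_weights(weights, pack_size=32, quant_size=4):
    print(count, "weights ->", len(payload), "bytes, scale", scale)
```

In this sketch a block of 32 weights at 4 bits occupies 16 bytes plus one scale. Packing two codes per byte is also one possible reason for requiring an even pack_size, though the document does not state the actual motivation.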
dequantize(quant_path, literal=False, balanced=True)
Dequantizes a model and returns it.
Arguments:
- quant_path (str): Path to the .uniq file to dequantize
- literal (bool): Whether weights should be unscaled
- balanced (bool): Whether weights should be balanced around 0
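The literal and balanced flags are described only briefly, so the following pure-Python sketch shows one plausible reading of them, not the library's confirmed behavior. It assumes that 4-bit codes are stored two per byte, that balanced subtracts the midpoint code so values are centered on 0, and that literal skips the multiplication by the per-block scale. The helper names unpack_nibbles and dequantize_block are hypothetical.

```python
def unpack_nibbles(payload):
    """Split each byte into its high and low 4-bit codes (assumed storage order)."""
    codes = []
    for byte in payload:
        codes.append(byte >> 4)
        codes.append(byte & 0x0F)
    return codes


def dequantize_block(codes, scale, quant_size=4, literal=False, balanced=True):
    """Turn unsigned codes back into weights under the assumptions stated above."""
    # Assumed meaning of balanced: shift codes so the midpoint code maps to 0.
    offset = (1 << (quant_size - 1)) if balanced else 0
    shifted = [c - offset for c in codes]
    # Assumed meaning of literal: return the integer levels without applying the scale.
    if literal:
        return shifted
    return [s * scale for s in shifted]


codes = unpack_nibbles(bytes([0x08, 0xF1]))
print(dequantize_block(codes, scale=0.5))                 # scaled, centered on 0
print(dequantize_block(codes, scale=0.5, literal=True))   # integer levels only
```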
dequantize_save(quant_path, model_directory="", model_name="", overwrite=False)
Dequantizes a model, saves it, and returns it.
Arguments:
- quant_path (str): Path to the .uniq file to dequantize
- model_directory (str): Directory path to save the dequantized model
- model_name (str): Filename for the dequantized model
- overwrite (bool): Whether to overwrite an existing file
Notes
- This package compiles CUDA kernels at runtime using torch.utils.cpp_extension.load_inline.
- Compiling the kernels requires a compatible CUDA toolkit on the target machine (tested with >=12.8).
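Because the kernels are compiled on first use, it can help to confirm that a suitable toolkit is visible before calling the library. The sketch below is an optional preflight check, not part of uniquant. It assumes nvcc is on PATH and that its version banner contains a "release X.Y" string; the function names parse_nvcc_release and cuda_toolkit_ok are hypothetical.

```python
import re
import shutil
import subprocess

MIN_CUDA = (12, 8)  # minimum toolkit version stated in the requirements


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def cuda_toolkit_ok(minimum=MIN_CUDA):
    """Return True if nvcc is found on PATH and reports at least `minimum`."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return False
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    version = parse_nvcc_release(result.stdout)
    return version is not None and version >= minimum


if not cuda_toolkit_ok():
    print("CUDA toolkit >= 12.8 not found on PATH; kernel compilation may fail.")
```

A toolkit installed outside PATH would be reported as missing by this check even if PyTorch can still locate it by other means, so treat a negative result as a hint and not as proof.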
File details
Details for the file uni_quant_cuda-0.2.8.tar.gz.
File metadata
- Download URL: uni_quant_cuda-0.2.8.tar.gz
- Upload date:
- Size: 17.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 46dd5f06928c41d120797bb264180ccd7817bc6b4d4a47ce85465f0a7195fef5 |
| MD5 | 32a0bb831ce7ed7968f7a61b690b27f8 |
| BLAKE2b-256 | 4f4227f775c28fd82e634ab1703f9d9b32f54a19a726cedc30c601679d5dfcba |
File details
Details for the file uni_quant_cuda-0.2.8-py3-none-any.whl.
File metadata
- Download URL: uni_quant_cuda-0.2.8-py3-none-any.whl
- Upload date:
- Size: 13.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fe1ebd81e4a1ec2cf3f7fd27c5bd0896e07342f469a705787f51fcb567fbc147 |
| MD5 | dce1e4ef32a5058193856e59fce7d43b |
| BLAKE2b-256 | 47fc7f6d16dca50f72172e49a501e65222c83d207fc485d77d700d59a38d1b04 |