GPU inference for losslessly compressed (DFloat11) large language models
DFloat11: Lossless LLM Compression for Efficient GPU Inference
DFloat11 is a lossless compression framework that reduces the size of Large Language Models (LLMs) by approximately 30% while preserving bit-for-bit identical outputs to the original model. It enables efficient GPU inference on resource-constrained hardware without sacrificing accuracy.
📦 Installation
Requires a CUDA-compatible GPU and an existing PyTorch installation.
pip install dfloat11[cuda12]
# or if you have CUDA version 11:
# pip install dfloat11[cuda11]
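If you are unsure which extra matches your setup, the following minimal sketch reads the CUDA version that the installed PyTorch build was compiled against and prints the matching install command. The helper names `detect_cuda_version` and `pick_dfloat11_extra` are hypothetical and not part of the dfloat11 package; only the `cuda11` and `cuda12` extras shown above are assumed to exist.

```python
from typing import Optional


def detect_cuda_version() -> Optional[str]:
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch  # imported lazily so the helper also runs without PyTorch
    except ImportError:
        return None
    # torch.version.cuda is a string such as "12.1", or None for CPU-only builds.
    return torch.version.cuda


def pick_dfloat11_extra(cuda_version: str) -> str:
    """Map a CUDA version string to the pip requirement documented above."""
    major = cuda_version.split(".")[0]
    if major not in ("11", "12"):
        # Only the cuda11 and cuda12 extras are documented in this README.
        raise ValueError(f"No documented dfloat11 extra for CUDA {cuda_version}")
    return f"dfloat11[cuda{major}]"


if __name__ == "__main__":
    version = detect_cuda_version()
    if version is None:
        print("No CUDA-enabled PyTorch build found; install one first.")
    else:
        try:
            # Quoted so that shells such as zsh do not expand the brackets.
            print(f'pip install "{pick_dfloat11_extra(version)}"')
        except ValueError as err:
            print(err)
```

Quoting the requirement, as in `pip install "dfloat11[cuda12]"`, avoids errors in shells such as zsh that treat square brackets as glob patterns.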
🔧 Key Features
- 📉 Significant size reduction: Compresses LLM weights by ~30%, losslessly.
- ✅ Zero loss in accuracy: Produces bit-for-bit identical outputs to the original BFloat16 model.
- 🧩 Easy to use: Integrates with the HuggingFace framework.
- ⚡ High throughput: Enables up to 38.8× faster generation compared to CPU offloading alternatives.
- 🧠 Supports longer inputs: Extends maximum context length by up to 13.17× under the same GPU memory budget.
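To make the size reduction concrete, here is a rough back-of-the-envelope sketch of the weight memory before and after compression. It assumes 2 bytes per BFloat16 parameter and a flat compressed-to-original ratio of 0.70 (taken from the approximate 30% figure above); the actual ratio varies by model, and the parameter counts are illustrative only. The estimate covers weights alone, not activations or the KV cache.

```python
from typing import Tuple

BYTES_PER_BF16 = 2      # a BFloat16 value occupies 16 bits
ASSUMED_RATIO = 0.70    # assumption: compressed size is ~70% of the original


def estimate_weight_gb(num_params: float, ratio: float = ASSUMED_RATIO) -> Tuple[float, float]:
    """Return (original_gb, compressed_gb) for the model weights only."""
    original_gb = num_params * BYTES_PER_BF16 / 1e9
    return original_gb, original_gb * ratio


if __name__ == "__main__":
    # Illustrative parameter counts, not measurements of specific models.
    for label, num_params in [("8B", 8e9), ("70B", 70e9)]:
        original, compressed = estimate_weight_gb(num_params)
        saved = original - compressed
        print(f"{label}: {original:.1f} GB -> {compressed:.1f} GB (frees ~{saved:.1f} GB)")
```

Under these assumptions an 8-billion-parameter model drops from about 16 GB to about 11.2 GB of weights. Under a fixed GPU memory budget, the freed memory is what becomes available for longer inputs.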
🔗 Links
👉 Explore pre-compressed DFloat11 models ready to use on HuggingFace: https://huggingface.co/DFloat11
📂 Official Code Repository: https://github.com/LeanModels/DFloat11
🧪 Quickstart
from dfloat11 import DFloat11ModelForCausalLM

model = DFloat11ModelForCausalLM.from_pretrained(
    "<huggingface-model-name>",
    "<path-to-dfloat11-model>",
    device_map='auto',
)
# The model is ready to use like a regular HuggingFace model.
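As a hedged illustration of what using the model "like a regular HuggingFace model" can look like, the sketch below runs greedy text generation. It assumes the loaded object exposes the standard `transformers` `generate` method and `device` attribute, and that a tokenizer is available under the same HuggingFace model name; `generate_text` is a hypothetical helper, not part of dfloat11.

```python
def generate_text(model, tokenizer_name: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a greedy completion for `prompt` with an already loaded model."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    # Tokenize and move the input tensors to the device holding the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding (do_sample=False) keeps the output deterministic.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage, with `model` loaded as in the Quickstart:
# print(generate_text(model, "<huggingface-model-name>", "Hello, my name is"))
```

Greedy decoding is chosen here so that runs are repeatable, which makes it easier to compare outputs against the original BFloat16 model.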
📚 Citation
If you found our work useful or interesting, please consider citing our paper:
@misc{zhang2025dfloat11,
  title         = {70\% Size, 100\% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float},
  author        = {Tianyi Zhang and Yang Sui and Shaochen Zhong and Vipin Chaudhary and Xia Hu and Anshumali Shrivastava},
  year          = {2025},
  eprint        = {2504.11651},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2504.11651}
}
Download files
File details
Details for the file dfloat11-0.1.0.tar.gz.
File metadata
- Download URL: dfloat11-0.1.0.tar.gz
- Upload date:
- Size: 10.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 59e9f1ccd838f6c66b913f357d180b124d596ce606df2cd47f00768b0bf80ab1 |
| MD5 | b33e0cb784923a899d6bb2c26ecba1a9 |
| BLAKE2b-256 | fc6b9bde2b7785c3a57dafed8299b8f0a39a8be3e3c771f032c2d99cbde2342b |
File details
Details for the file dfloat11-0.1.0-py3-none-any.whl.
File metadata
- Download URL: dfloat11-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7651dd71927196b04c1abc40269baf576e62dfc9a245d82378ba40634fb53b35 |
| MD5 | 385321c36c78c8f4f019b15680c264f1 |
| BLAKE2b-256 | a98a0aa45f2534a0297788616a331d4f90be8a0249817f2c5b98671e4698f148 |