GPU inference for losslessly compressed (DFloat11) large language models

DFloat11: Lossless LLM Compression for Efficient GPU Inference

DFloat11 is a lossless compression framework that reduces the size of Large Language Models (LLMs) by approximately 30% while preserving bit-for-bit identical outputs to the original model. It enables efficient GPU inference on resource-constrained hardware without sacrificing accuracy.
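To give some intuition for why a lossless reduction of roughly 30% is plausible, the sketch below estimates how many bits the exponent field of BFloat16 weights actually carries. It is not the DFloat11 implementation: the weights are synthetic Gaussian samples (an assumption, with a hypothetical standard deviation of 0.02), float32 values are truncated to BFloat16 for simplicity, and the helper names are made up for this illustration. The paper cited below describes entropy coding of the exponent bits; this only measures their Shannon entropy.

```python
import math
import random
import struct
from collections import Counter


def bfloat16_exponent(value: float) -> int:
    """Return the 8-bit exponent field of `value` viewed as BFloat16."""
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    (bits32,) = struct.unpack("<I", struct.pack("<f", value))
    # BFloat16 is the top 16 bits of float32 (truncation, used here for simplicity).
    bits16 = bits32 >> 16
    # Layout: 1 sign bit | 8 exponent bits | 7 mantissa bits.
    return (bits16 >> 7) & 0xFF


def shannon_entropy_bits(symbols) -> float:
    """Shannon entropy, in bits per symbol, of an iterable of symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def estimate_bits_per_weight(num_weights: int = 100_000, std: float = 0.02, seed: int = 0) -> float:
    """Estimate bits per weight if only the exponent were entropy coded."""
    rng = random.Random(seed)
    # Synthetic stand-in for model weights (assumption: zero-mean Gaussian).
    exponents = [bfloat16_exponent(rng.gauss(0.0, std)) for _ in range(num_weights)]
    # Sign (1 bit) and mantissa (7 bits) are kept as-is; the exponent costs its entropy.
    return 1 + 7 + shannon_entropy_bits(exponents)


bits = estimate_bits_per_weight()
print(f"estimated bits per weight: {bits:.2f} (BFloat16 uses 16)")
print(f"estimated size relative to BFloat16: {bits / 16:.1%}")
```

On real model weights the exact figure will differ, but the same effect (an exponent field that uses far fewer than its 8 allotted bits of information) is what makes lossless compression possible.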

📦 Installation

Requires a CUDA-compatible GPU and an existing PyTorch installation.

# quote the extras so shells such as zsh do not expand the brackets
pip install "dfloat11[cuda12]"
# or, if you have CUDA version 11:
# pip install "dfloat11[cuda11]"

🔧 Key Features

  • 📉 Significant size reduction: Compresses LLM weights by ~30%, losslessly.
  • ✅ Zero loss in accuracy: Produces bit-for-bit identical outputs to the original BFloat16 model.
  • 🧩 Easy to use: Integrates seamlessly with the HuggingFace framework.
  • ⚡ High throughput: Enables up to 38.8× faster generation compared to CPU offloading alternatives.
  • 🧠 Supports longer inputs: Extends maximum context length by up to 13.17× under the same GPU memory budget.
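As a rough illustration of what a ~30% reduction means for a GPU memory budget, the arithmetic below compares weight storage for a hypothetical 8-billion-parameter model. The parameter count is an assumption chosen for the example, the 0.70 ratio is simply the approximate figure quoted above, and the helper name is made up. Activations, the KV cache, and framework overhead are ignored.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Memory needed to store the weights, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 8e9    # hypothetical model size, not taken from the source
SIZE_RATIO = 0.70   # approximate DFloat11 size relative to BFloat16 (~30% smaller)

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # BFloat16 stores 16 bits per weight
df11_gb = bf16_gb * SIZE_RATIO
freed_gb = bf16_gb - df11_gb

print(f"BFloat16 weights: {bf16_gb:.1f} GB")
print(f"DFloat11 weights: {df11_gb:.1f} GB (approx.)")
print(f"Memory freed:     {freed_gb:.1f} GB (approx.)")
```

The freed memory is what can go toward longer inputs or larger batches under the same budget; the actual gain depends on the model and workload.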

🔗 Links

👉 Explore pre-compressed DFloat11 models ready to use on HuggingFace: https://huggingface.co/DFloat11

📂 Official Code Repository: https://github.com/LeanModels/DFloat11

🧪 Quickstart

from dfloat11 import DFloat11ModelForCausalLM

model = DFloat11ModelForCausalLM.from_pretrained(
    "<huggingface-model-name>",
    "<path-to-dfloat11-model>",
    device_map='auto',
)

# model is ready to use like a regular huggingface model
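Since the loaded model is meant to behave like a regular HuggingFace model, text generation might look like the sketch below. This is an assumption-laden example rather than documented usage: `generate_text` is a hypothetical helper, it assumes the `transformers` package is installed, that the tokenizer can be loaded from the same HuggingFace model name, and that the standard `generate` method and `device` attribute work on the DFloat11 model as they do on other causal language models.

```python
def generate_text(model_name: str, dfloat11_path: str, prompt: str,
                  max_new_tokens: int = 64) -> str:
    """Load a DFloat11 model and generate a continuation of `prompt`."""
    # Imported here so that defining the helper does not require a GPU.
    from dfloat11 import DFloat11ModelForCausalLM
    from transformers import AutoTokenizer

    # Same call as in the Quickstart above.
    model = DFloat11ModelForCausalLM.from_pretrained(
        model_name,
        dfloat11_path,
        device_map="auto",
    )
    # Assumption: the tokenizer comes from the original HuggingFace model.
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Move the tokenized prompt to the device that holds the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

It would be called with the same placeholders as the Quickstart, for example `generate_text("<huggingface-model-name>", "<path-to-dfloat11-model>", "Hello")`.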

📚 Citation

If you found our work useful or interesting, please consider citing our paper:

@misc{zhang2025dfloat11,
  title        = {70\% Size, 100\% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float},
  author       = {Tianyi Zhang and Yang Sui and Shaochen Zhong and Vipin Chaudhary and Xia Hu and Anshumali Shrivastava},
  year         = {2025},
  eprint       = {2504.11651},
  archivePrefix= {arXiv},
  primaryClass = {cs.LG},
  url          = {https://arxiv.org/abs/2504.11651}
}
