An efficient implementation of the paper "The Era of 1-bit LLMs"

BitMat: Improving Ternary Matrix Multiplication with Triton

Currently supported models

  • Mistral
  • Llama
  • Gemma

0️⃣1️⃣ Introduction

BitMat is a Python package that speeds up matrix multiplication with custom kernels written in Triton. It follows the principles outlined in the "1bit-LLM Era" paper, using packed int8 data to improve computational efficiency in deep learning and numerical computing tasks.

🎛 Features

Custom Triton Kernels: Utilize highly optimized kernels for matrix multiplication, tailored for performance and efficiency.

Packed int8 Operations: During inference the model uses packed int8 data to reduce memory usage and improve computational efficiency.

Ease of Integration: BitMat is designed to be easily integrated into existing PyTorch/transformers workflows, providing a seamless user experience.
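
To make the packed int8 idea above more concrete, here is a minimal pure-Python sketch of one way ternary values can be stored four per byte. The 2-bit layout and the function names pack_ternary and unpack_ternary are assumptions made for illustration; BitMat's Triton kernels may use a different layout.

```python
def pack_ternary(values):
    """Pack ternary values (-1, 0, 1) four per byte, two bits each.

    Hypothetical layout for illustration, not BitMat's actual format.
    """
    if any(v not in (-1, 0, 1) for v in values):
        raise ValueError("values must be ternary: -1, 0, or 1")
    codes = [v + 1 for v in values]       # map -1/0/1 to the codes 0/1/2
    codes += [1] * (-len(codes) % 4)      # pad to a multiple of 4 with the code for 0
    packed = bytearray()
    for i in range(0, len(codes), 4):
        a, b, c, d = codes[i:i + 4]
        # The first value goes in the two lowest bits, the fourth in the two highest.
        packed.append(a | (b << 2) | (c << 4) | (d << 6))
    return bytes(packed)


def unpack_ternary(packed, count):
    """Recover the first `count` ternary values from the packed bytes."""
    values = []
    for byte in packed:
        for shift in (0, 2, 4, 6):
            values.append(((byte >> shift) & 0b11) - 1)
    return values[:count]


weights = [1, -1, 0, 0, 1, 1, -1]
packed = pack_ternary(weights)
restored = unpack_ternary(packed, len(weights))
```

With this layout, seven ternary weights fit in two bytes instead of seven, which is where the memory saving during inference comes from.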

💾 Installation

pip install bitmat-tl

Only Linux is supported at the moment. Installing on Windows is possible but untested.

🏁 Quick Start

High-level API (transformers-compatible)

from transformers import AutoModelForCausalLM
from bitmat import convert_hf_model

# Initialize your model from an available hf model
model = AutoModelForCausalLM.from_pretrained("some-repo/some-model")
# Convert the model to use BitLinear layers
model = convert_hf_model(model)
# Save the converted model
model.save_pretrained('some_local_folder')

Loading the converted 1.58Bit Model

To use a converted 1.58Bit model, such as the customized version of Mistral in this example, load it through the Auto class. The example below loads the model from a local directory:

from bitmat import Auto158ModelForCausalLM

# Replace 'path_to_your_model' with the actual path to your model's directory
model = Auto158ModelForCausalLM.from_pretrained('path_to_your_model')

Once loaded, the model operates in two distinct modes:

  • Evaluation Mode: By default, the model employs quantized weights, optimizing performance for inference tasks. Activate this mode using model.eval().

  • Training Mode: Calling model.train() switches the model to full-precision weights, which training and fine-tuning need for accurate gradient calculations and effective model updates.

This API is fully compatible with the Hugging Face ecosystem.
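
As a rough picture of how full-precision weights relate to the quantized ones used in evaluation mode, the sketch below applies the absmean scheme described in the "1bit-LLM Era" paper: divide the weights by their mean absolute value, round, and clip to -1, 0, or 1. The function name absmean_quantize and the eps default are assumptions for illustration; whether BitMat follows this exact procedure internally is not confirmed here.

```python
def absmean_quantize(weights, eps=1e-5):
    """Quantize a 2-D list of floats to ternary values with absmean scaling.

    Illustrative sketch only; returns (ternary_weights, scale).
    """
    flat = [abs(w) for row in weights for w in row]
    scale = sum(flat) / len(flat)                     # mean absolute value of the weights
    ternary = [
        # round() rounds exact halves to the nearest even integer in Python
        [max(-1, min(1, round(w / (scale + eps)))) for w in row]
        for row in weights
    ]
    return ternary, scale


full_precision = [[0.9, -0.05, -1.2],
                  [0.4, 0.0, -0.45]]
ternary, scale = absmean_quantize(full_precision)
```

Multiplying the ternary values by the returned scale gives a coarse approximation of the original weights, which is why the full-precision copy is kept for training.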

Low-level API

import torch
from bitmat import BitLinear

layer = BitLinear(in_features=1024, out_features=512, bias=True, eps=1e-5)
# You can use the layer as a normal torch.nn.Linear layer
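
For intuition about what a linear layer with ternary weights computes, here is a pure-Python reference sketch. It is not BitMat's kernel: the function ternary_linear is hypothetical, and it leaves out any activation quantization or normalization a real BitLinear layer may apply.

```python
def ternary_linear(x, ternary_weight, scale, bias=None):
    """Reference forward pass: y = scale * (W_t @ x) + bias, with W_t in {-1, 0, 1}.

    Hypothetical helper for illustration, not part of the bitmat package.
    """
    out = []
    for j, row in enumerate(ternary_weight):
        acc = 0.0
        for w, xi in zip(row, x):
            # Ternary weights need no multiplications: add, subtract, or skip.
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
        y = scale * acc
        if bias is not None:
            y += bias[j]
        out.append(y)
    return out


x = [2.0, 3.0, 4.0]
w_t = [[1, 0, -1],
       [-1, 1, 1]]
y = ternary_linear(x, w_t, scale=0.5, bias=[1.0, 0.0])
```

Because every weight is -1, 0, or 1, the inner loop reduces to additions and subtractions, which is the property a custom ternary matmul can exploit.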

📊 Results

The graphs show that the custom matmul for ternary matrices performs better at higher precision. This may be due to optimizations within the GPU.

[Figure: Graph #1 (left), 16-bit precision; Graph #2 (right), 32-bit precision]

🫱🏼‍🫲🏽 Contributing

We welcome contributions from the community, whether it's adding new features, improving documentation, or reporting bugs. Please refer to our contribution guidelines before making a pull request.

📜 License

BitMat is open-sourced under the Apache-2.0 license.

Citation

If you use BitMat in your research, please cite it using the following BibTeX entry:

@article{bitmat2024,
  title={BitMat: Improving Matrix Multiplication with Custom Triton Kernels},
  author={AstraMind AI},
  journal={https://github.com/astramind-ai/BitMat},
  year={2024}
}

Support

For questions, issues, or support regarding BitMat, please open an issue on our GitHub repository.

Acknowledgments

Special thanks to the Triton community and the authors of the "1bit-LLM Era" paper for their groundbreaking work and inspiration.

Thanks also to the developers of BitDelta and Unsloth, since part of the code is based on their work.
