8-bit optimizers and matrix multiplication routines.

Project description

bitsandbytes

bitsandbytes is a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and quantization functions.

TL;DR

Requirements: Python >=3.8, a Linux distribution (Ubuntu, etc.), and CUDA > 10.0. LLM.int8() requires Turing or Ampere GPUs.

Installation: pip install bitsandbytes

Using 8-bit optimizer:

  1. Comment out optimizer: #torch.optim.Adam(....)
  2. Add 8-bit optimizer of your choice bnb.optim.Adam8bit(....) (arguments stay the same)
  3. Replace embedding layer if necessary: torch.nn.Embedding(..) -> bnb.nn.Embedding(..)

Using 8-bit Inference:

  1. Comment out torch.nn.Linear: #linear = torch.nn.Linear(...)
  2. Add bnb 8-bit linear light module: linear = bnb.nn.Linear8bitLt(...) (base arguments stay the same)
  3. There are two modes:
    • Mixed 8-bit training with 16-bit main weights. Pass the argument has_fp16_weights=True (default)
    • Int8 inference. Pass the argument has_fp16_weights=False
  4. To use the full LLM.int8() method, use the threshold=k argument. We recommend k=6.0.
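import torch
import bitsandbytes as bnb

# LLM.int8(): dim1, dim2 are the input and output feature sizes of the layer
linear = bnb.nn.Linear8bitLt(dim1, dim2, bias=True, has_fp16_weights=False, threshold=6.0)
# inputs need to be fp16
out = linear(x.to(torch.float16))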

Features

  • 8-bit Matrix multiplication with mixed precision decomposition
  • LLM.int8() inference
  • 8-bit Optimizers: Adam, AdamW, RMSProp, LARS, LAMB (saves 75% memory)
  • Stable Embedding Layer: Improved stability through better initialization and normalization
  • 8-bit quantization: Quantile, Linear, and Dynamic quantization
  • Fast quantile estimation: Up to 100x faster than other algorithms
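
The 75% memory saving quoted for the 8-bit optimizers follows from simple arithmetic: each optimizer state value shrinks from 32 bits to 8 bits. The sketch below works this through for an Adam-style optimizer, which keeps two state values per parameter. The helper name optimizer_state_bytes and the 1B-parameter model are illustrative assumptions, and the small overhead of quantization constants is ignored.

```python
def optimizer_state_bytes(n_params, bits, states_per_param=2):
    """Approximate optimizer-state memory in bytes.

    Assumes an Adam-style optimizer with two state values per parameter
    and ignores the small overhead of quantization constants.
    """
    return n_params * states_per_param * bits // 8


n_params = 1_000_000_000  # hypothetical 1B-parameter model
full = optimizer_state_bytes(n_params, bits=32)
small = optimizer_state_bytes(n_params, bits=8)
saving = 1 - small / full
print(f"32-bit: {full / 1e9:.0f} GB, 8-bit: {small / 1e9:.0f} GB, saving: {saving:.0%}")
```

The ratio does not depend on the number of state values per parameter, so the same saving applies to optimizers that keep a single state.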

Requirements & Installation

Requirements: anaconda, cudatoolkit, pytorch

Hardware requirements:

  • LLM.int8(): NVIDIA Turing (RTX 20xx; T4) or Ampere GPU (RTX 30xx; A4-A100); that is, a GPU from 2018 or newer.
  • 8-bit optimizers and quantization: NVIDIA Kepler GPU or newer (>=GTX 78X).

Supported CUDA versions: 10.2 - 12.0

The bitsandbytes library is currently only supported on Linux distributions. Windows is not supported at the moment.

The requirements are best met by installing PyTorch via anaconda. You can install PyTorch by following the "Get Started" instructions on the official website.

To install run:

pip install bitsandbytes
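
Before installing, it can help to check the environment against the requirements listed above (Python >=3.8, Linux, CUDA 10.2 - 12.0). The following is a minimal sketch using only the standard library. The helper names cuda_supported and check_environment are hypothetical and not part of bitsandbytes, and the CUDA version has to be supplied by the caller.

```python
import platform
import sys


def cuda_supported(version, low=(10, 2), high=(12, 0)):
    """Return True if a 'major.minor' CUDA version string lies within [low, high]."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return low <= (major, minor) <= high


def check_environment(cuda_version=None):
    """Collect human-readable problems; an empty list means none were found."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python >= 3.8 is required")
    if platform.system() != "Linux":
        problems.append("only Linux distributions are supported")
    if cuda_version is not None and not cuda_supported(cuda_version):
        problems.append(f"CUDA {cuda_version} is outside the supported range 10.2 - 12.0")
    return problems


if __name__ == "__main__":
    for problem in check_environment(cuda_version="11.7"):
        print("warning:", problem)
```

If PyTorch is already installed, torch.version.cuda is one possible source for the version string.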

Using bitsandbytes

Using Int8 Matrix Multiplication

For straight Int8 matrix multiplication with mixed precision decomposition you can use bnb.matmul(...). To enable mixed precision decomposition, use the threshold parameter:

bnb.matmul(..., threshold=6.0)

For instructions on how to use LLM.int8() inference layers in your own code, see the TL;DR above or, for extended instructions, see this blog post.
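
To make the role of the threshold parameter concrete, here is a conceptual pure-Python sketch of mixed precision decomposition. Feature dimensions whose input magnitude reaches the threshold are multiplied in full precision, and the remaining dimensions go through an int8 path with per-row and per-column absmax scaling. This is not the bitsandbytes CUDA implementation. The function names, the list-of-rows layout, and the exact outlier test (>=) are assumptions made for illustration.

```python
def absmax_quantize(vec):
    """Scale a vector into the int8 range [-127, 127]; return (ints, scale)."""
    scale = max(abs(v) for v in vec) / 127.0 or 1.0  # avoid a zero scale
    return [round(v / scale) for v in vec], scale


def matmul_decomposed(x, w, threshold=6.0):
    """Multiply x (n x k) by w (k x m), both given as lists of rows."""
    k, m = len(w), len(w[0])
    # A feature dimension is an outlier if any input value reaches the threshold.
    outlier = [any(abs(row[j]) >= threshold for row in x) for j in range(k)]
    regular = [j for j in range(k) if not outlier[j]]
    # Quantize each column of w once, restricted to the regular dimensions.
    w_cols = [absmax_quantize([w[j][c] for j in regular]) for c in range(m)] if regular else []
    out = []
    for row in x:
        xq, xs = absmax_quantize([row[j] for j in regular]) if regular else ([], 1.0)
        out_row = []
        for c in range(m):
            acc = 0.0
            if regular:
                wq, ws = w_cols[c]
                # Integer dot product, then dequantize with both scales.
                acc += sum(a * b for a, b in zip(xq, wq)) * xs * ws
            # Outlier dimensions stay in full precision.
            acc += sum(row[j] * w[j][c] for j in range(k) if outlier[j])
            out_row.append(acc)
        out.append(out_row)
    return out


x = [[0.5, -1.0, 8.0], [0.25, 0.75, -0.5]]  # the third feature contains an outlier
w = [[0.1, -0.2], [0.3, 0.4], [0.5, -0.6]]
exact = [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]
approx = matmul_decomposed(x, w, threshold=6.0)
print(exact)
print(approx)
```

Keeping the outlier column out of the int8 path means its large value does not inflate the quantization scale of the remaining features, which is the motivation for the decomposition.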

Using the 8-bit Optimizers

With bitsandbytes, 8-bit optimizers can be used by changing a single line of code in your codebase. For NLP models we also recommend using the StableEmbedding layer (see below), which improves results and helps with stable 8-bit optimization. To get started with 8-bit optimizers, it is sufficient to replace your old optimizer with the 8-bit optimizer in the following way:

import bitsandbytes as bnb

# adam = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.995)) # comment out old optimizer
adam = bnb.optim.Adam8bit(model.parameters(), lr=0.001, betas=(0.9, 0.995)) # add bnb optimizer
adam = bnb.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.995), optim_bits=8) # equivalent


torch.nn.Embedding(...) ->  bnb.nn.StableEmbedding(...) # recommended for NLP models

Note that by default all parameter tensors with fewer than 4096 elements are kept at 32-bit even if you initialize those parameters with 8-bit optimizers. This is done because such small tensors do not save much memory and often contain highly variable parameters (biases) or parameters that require high precision (batch norm, layer norm). You can change this behavior like so:

# parameter tensors with less than 16384 values are optimized in 32-bit
# it is recommended to use multiples of 4096
adam = bnb.optim.Adam8bit(model.parameters(), min_8bit_size=16384)
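
The size rule described above can be illustrated without a GPU. The sketch below applies the stated rule (tensors with fewer elements than min_8bit_size keep 32-bit state) to a few hypothetical parameter shapes. The helper state_bits and the shapes are assumptions for illustration, not part of the bitsandbytes API.

```python
from math import prod


def state_bits(shape, min_8bit_size=4096):
    """Return 32 for small tensors and 8 otherwise, following the rule above."""
    return 32 if prod(shape) < min_8bit_size else 8


# Hypothetical parameter shapes of a small transformer layer.
shapes = {
    "embedding.weight": (50000, 768),
    "layer_norm.weight": (768,),
    "ffn.bias": (3072,),
    "proj.weight": (128, 64),
}

for name, shape in shapes.items():
    default_bits = state_bits(shape)
    raised_bits = state_bits(shape, min_8bit_size=16384)
    print(f"{name}: default={default_bits}-bit, min_8bit_size=16384 -> {raised_bits}-bit")
```

With the default setting only the two small vectors stay at 32-bit; raising min_8bit_size to 16384 also moves the 128 x 64 projection (8192 elements) back to 32-bit.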

Change Bits and other Hyperparameters for Individual Parameters

If you want to optimize some unstable parameters with 32-bit Adam and others with 8-bit Adam, you can use the GlobalOptimManager. With this, you can also configure specific hyperparameters for particular layers, such as embedding layers. To do that, two things are needed: (1) register the parameters while they are still on the CPU, and (2) override the config with the new desired hyperparameters (anytime, anywhere). See our guide for more details.
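
The two steps can be sketched as a small helper. This is a minimal example, assuming a PyTorch model that is still on the CPU and a CUDA-enabled bitsandbytes install. The function name build_mixed_precision_adam and its arguments are hypothetical. bitsandbytes is imported inside the function so that the file can be loaded on machines without it.

```python
def build_mixed_precision_adam(model, params_32bit, lr=1e-3):
    """Return (model, optimizer) with 8-bit Adam state, except for params_32bit.

    model: a torch.nn.Module whose parameters are still on the CPU.
    params_32bit: the parameters that should keep 32-bit optimizer state.
    """
    import bitsandbytes as bnb  # lazy import: requires a CUDA-enabled install

    manager = bnb.optim.GlobalOptimManager.get_instance()
    # (1) Register the parameters while they are still on the CPU.
    manager.register_parameters(model.parameters())

    model = model.cuda()
    optimizer = bnb.optim.Adam(model.parameters(), lr=lr, optim_bits=8)

    # (2) Override the config for the selected parameters.
    for param in params_32bit:
        manager.override_config(param, "optim_bits", 32)
    return model, optimizer
```

A call could look like build_mixed_precision_adam(model, [model.fc1.weight]), where fc1 stands for whichever layer is unstable in your model.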

Fairseq Users

To use the Stable Embedding Layer, override the respective build_embedding(...) function of your model. Make sure to also use the --no-scale-embedding flag to disable scaling of the word embedding layer (now replaced with layer norm). You can use the optimizers by replacing the optimizer in the respective file (adam.py etc.).

Release and Feature History

For upcoming features and changes and full history see Patch Notes.

Errors

  1. RuntimeError: CUDA error: no kernel image is available for execution on the device. Solution
  2. _fatbinwrap.. Solution

Compile from source

To compile from source, please follow the compile_from_source.md instructions.

License

The majority of bitsandbytes is licensed under MIT; however, portions of the project are available under separate license terms: PyTorch is licensed under the BSD license.

We thank Fabio Cannizzo for his work on FastBinarySearch which we use for CPU quantization.

How to cite us

If you found this library and LLM.int8() useful, please consider citing our work:

@article{dettmers2022llmint8,
  title={LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale},
  author={Dettmers, Tim and Lewis, Mike and Belkada, Younes and Zettlemoyer, Luke},
  journal={arXiv preprint arXiv:2208.07339},
  year={2022}
}

For 8-bit optimizers or quantization routines, please consider citing the following work:

@article{dettmers2022optimizers,
  title={8-bit Optimizers via Block-wise Quantization},
  author={Dettmers, Tim and Lewis, Mike and Shleifer, Sam and Zettlemoyer, Luke},
  journal={9th International Conference on Learning Representations, ICLR},
  year={2022}
}
