bitnet - PyTorch
BitNet
Implementation of the paper "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch.
BitLinear = tensor -> LayerNorm -> binarize -> absmax quantization -> dequantize
"The implementation of the BitNet architecture is quite simple, requiring only the replacement of linear projections (i.e., nn.Linear in PyTorch) in the Transformer." -- BitNet is really easy to implement: just swap out the nn.Linear layers for BitLinear modules!
NEWS
- A BitNet Transformer has been trained with the train.py script on enwik8, a small 1 GB dataset of Wikipedia text: HERE IS THE LINK
Appreciation
- Dimitry and Nullonix, for analysis, code review, and revision
- Vyom, for providing a 4080 to train on!
Installation
pip install bitnet
Usage
BitLinear
- Example of the BitLinear layer, which is the main innovation of the paper:
import torch
from bitnet import BitLinear
# Input
x = torch.randn(10, 512)
# BitLinear layer
layer = BitLinear(512, 400)
# Output
y = layer(x)
print(y)
BitNetTransformer
- Fully implemented Transformer, as described in the diagram, with multi-head attention (MHA) and BitFeedForward blocks
- Can be utilized not just for text but also for images, and potentially even video or audio processing
- Complete with residuals and skip connections for gradient flow
import torch
from bitnet import BitNetTransformer
bitnet = BitNetTransformer(
    num_tokens=20000,
    dim=512,
    depth=6,
    dim_head=64,
    heads=8,
    ff_mult=4,
)
tokens = torch.randint(0, 20000, (1, 512))
logits = bitnet(tokens)
print(logits.shape)
BitFeedForward
- Feedforward block as shown in the diagram, built with BitLinear and a GELU:
- Linear -> GELU -> Linear
- You can add dropouts, layernorms, or other layers for a better FFN (see the sketch after the example below)
import torch
from bitnet.bitffn import BitFeedForward
# Random input
x = torch.randn(10, 512)
# FFN
ff = BitFeedForward(512)
# Apply FFN
y = ff(x)
print(y.shape)
# torch.Size([10, 512])
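As a rough sketch of the suggestion above, the FFN can be composed with standard torch layers; the pre-norm and dropout here are illustrative additions, not built-in BitFeedForward options:

import torch
from torch import nn
from bitnet.bitffn import BitFeedForward

# Hypothetical composition: pre-norm, the bit FFN, then dropout.
ffn = nn.Sequential(
    nn.LayerNorm(512),
    BitFeedForward(512),
    nn.Dropout(0.1),
)

x = torch.randn(10, 512)
print(ffn(x).shape)  # torch.Size([10, 512])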
Inference
from bitnet import BitNetInference
bitnet = BitNetInference()
bitnet.load_model('../model_checkpoint.pth')  # download the model checkpoint first
output_str = bitnet.generate("The dog jumped over the ", 512)
print(output_str)
License
MIT
Citation
@misc{2310.11453,
  Author = {Hongyu Wang and Shuming Ma and Li Dong and Shaohan Huang and Huaijie Wang and Lingxiao Ma and Fan Yang and Ruiping Wang and Yi Wu and Furu Wei},
  Title = {BitNet: Scaling 1-bit Transformers for Large Language Models},
  Year = {2023},
  Eprint = {arXiv:2310.11453},
}
Todo
- Double-check the BitLinear implementation and make sure it works exactly as in the paper
- Implement a training script for BitNetTransformer
- Train on enwik8; copy and paste code and data from Lucidrains' repos
- Benchmark performance
- Look into the straight-through estimator (STE) for backprop through the non-differentiable binarization (see the sketch after this list)
- Implement BitFeedForward
- Clean up codebase
- Add unit tests for each module
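On the straight-through estimator item above, a minimal sketch of the idea, assuming the usual identity-gradient formulation (names here are illustrative, not the package's code): sign() has zero gradient almost everywhere, so the backward pass simply copies the incoming gradient through.

import torch

class SignSTE(torch.autograd.Function):
    # Forward: hard binarization with sign().
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    # Backward: pretend sign() were the identity and pass the gradient through.
    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

x = torch.randn(4, requires_grad=True)
SignSTE.apply(x).sum().backward()
print(x.grad)  # all ones: gradients flow despite the non-differentiable sign()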