LongNet - Pytorch

Project description

LongNet: Scaling Transformers to 1,000,000,000 Tokens

This is an open source implementation of the paper LongNet: Scaling Transformers to 1,000,000,000 Tokens by Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, and Furu Wei. LongNet is a Transformer variant designed to scale sequence length to more than 1 billion tokens without sacrificing performance on shorter sequences.

News 📰

This implementation of LongNet is brought to you by Agora. We're an all-new open source AI research organization with 1,500+ AI researchers, all striving to advance humanity! Join us to help contribute to LongNet and/or receive fast support in the Agora Discord!
Execute tasks and help accelerate AI research with the project board.

Installation

pip install LongNet

Usage

Once you have installed LongNet, you can use the DilatedAttention class as follows:

import timeit
import torch
from longnet.attention import DilatedAttention


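# model config
d_model = 512
num_heads = 8
dilation_rate = 2
segment_size = 64

device = "cuda:0"
dtype = torch.float16

# input data
# Note: batch_size=32 with seq_len=10,000,000 needs about 330 GB for the float16
# input tensor alone, so start small and scale up. To be safe, keep seq_len a
# multiple of segment_size.
batch_size = 1
seq_len = 8192

# create model and data (cast the model to the same dtype as the input)
model = DilatedAttention(d_model, num_heads, dilation_rate, segment_size).to(device=device, dtype=dtype)
x = torch.randn((batch_size, seq_len, d_model), device=device, dtype=dtype)

# test forward pass
with torch.no_grad():
    output = model(x)
    print(f"Output shape: {output.shape}")  # expected: (batch_size, seq_len, d_model)

# benchmark model
num_runs = 1000
with torch.no_grad():
    model(x)  # warm-up run
    torch.cuda.synchronize()
    start_time = timeit.default_timer()
    for _ in range(num_runs):
        model(x)
    torch.cuda.synchronize()  # CUDA kernels run asynchronously; wait before stopping the clock
    elapsed_time = timeit.default_timer() - start_time

print(f"Average forward pass time: {elapsed_time / num_runs:.6f} seconds")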

Introduction

Scaling sequence length has become a critical bottleneck in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, which restricts the maximum sequence length. In the paper, the authors introduce LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens without sacrificing performance on shorter sequences. Specifically, they propose dilated attention, which expands the attentive field exponentially as the distance grows.
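To make the idea concrete, here is a minimal single-head sketch of dilated attention in plain PyTorch. It is written from the paper's description, not taken from this package. The sequence is split into segments of length w, every r-th token in each segment is kept, dense attention runs inside each sparsified segment, and several (w, r) branches are mixed with weights proportional to their softmax denominators. The function names dilated_attention and mixed_dilated_attention are hypothetical. Causal masking and the per-head offsets used in the paper are omitted, and at least one branch must use r = 1 so that every position receives an output.

```python
import torch


def dilated_attention(q, k, v, w, r):
    """Hypothetical helper: single-head, non-causal dilated attention for one
    (segment length w, dilation r) branch. q, k, v have shape (n, d) with
    n % w == 0 and w % r == 0.

    Returns the output (n, d) and the per-position log-sum-exp of the logits;
    positions skipped by the dilation get zeros and -inf respectively.
    """
    n, d = q.shape
    assert n % w == 0 and w % r == 0
    # Split into n // w segments and keep every r-th row inside each segment.
    qs, ks, vs = (t.reshape(n // w, w, d)[:, ::r] for t in (q, k, v))
    # Dense attention within each sparsified segment: (n // w, w // r, w // r).
    logits = qs @ ks.transpose(-1, -2) / d**0.5
    lse = torch.logsumexp(logits, dim=-1)
    out_sparse = torch.softmax(logits, dim=-1) @ vs
    # Scatter the results back to their original positions.
    out = torch.zeros(n // w, w, d, dtype=q.dtype, device=q.device)
    out[:, ::r] = out_sparse
    lse_full = torch.full((n // w, w), float("-inf"), dtype=q.dtype, device=q.device)
    lse_full[:, ::r] = lse
    return out.reshape(n, d), lse_full.reshape(n)


def mixed_dilated_attention(q, k, v, configs):
    """Hypothetical helper: mix (w, r) branches, weighting each by its share of
    the softmax denominator, exp(lse_i) / sum_j exp(lse_j). Assumes one branch
    uses r = 1 so every position has at least one finite weight."""
    outs, lses = zip(*(dilated_attention(q, k, v, w, r) for w, r in configs))
    weights = torch.softmax(torch.stack(lses), dim=0).unsqueeze(-1)  # (branches, n, 1)
    return (weights * torch.stack(outs)).sum(dim=0)


torch.manual_seed(0)
n, d = 16, 8
q, k, v = (torch.randn(n, d) for _ in range(3))
# Hypothetical branch schedule: segment length and dilation grow together.
out = mixed_dilated_attention(q, k, v, [(4, 1), (8, 2), (16, 4)])
print(out.shape)
```

With a single branch (w = n, r = 1) the mixture reduces to ordinary softmax attention, which makes a convenient sanity check for any dilated attention implementation.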

Features

LongNet has significant advantages:

It has linear computational complexity and a logarithmic dependency between any two tokens.
It can serve as a distributed trainer for extremely long sequences.
Its dilated attention is a drop-in replacement for standard attention and can be seamlessly integrated with existing Transformer-based optimizations.
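The linear-complexity claim can be checked by simple counting. A (w, r) branch splits N tokens into N / w segments and keeps w / r tokens per segment, so it scores (N / w)(w / r)^2 = N·w / r^2 query-key pairs, which is linear in N for fixed w and r. The sketch below assumes a hypothetical geometric schedule w_i = 2048·2^i, r_i = 2^i for i = 0..4 (not necessarily the paper's or this package's settings) and compares the count with dense attention. It does not illustrate the logarithmic-dependency claim.

```python
def vanilla_pairs(n: int) -> int:
    """Query-key pairs scored by dense attention over n tokens."""
    return n * n


def dilated_pairs(n: int, configs: list[tuple[int, int]]) -> int:
    """Query-key pairs scored by dilated attention, summed over (w, r) branches.

    Each branch has n // w segments with w // r kept tokens each, and every
    kept query attends to every kept key in its own segment.
    """
    return sum((n // w) * (w // r) ** 2 for w, r in configs)


# Hypothetical geometric schedule: segment length and dilation both double.
configs = [(2048 * 2**i, 2**i) for i in range(5)]

for n in (65536, 131072, 262144):
    ratio = vanilla_pairs(n) / dilated_pairs(n, configs)
    print(f"N={n:>7}: dilated={dilated_pairs(n, configs):,} pairs, {ratio:.1f}x fewer than dense")
```

Doubling N doubles the dilated count but quadruples the dense count, so the gap between the two widens linearly with N.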

Experimental results demonstrate that LongNet yields strong performance on both long-sequence modeling and general language tasks. The work opens up new possibilities for modeling very long sequences, e.g., treating a whole corpus or even the entire Internet as a sequence.

Documentation

Click here for the model documentation

Training the Model

We're still working on matching the model configuration in the paper as closely as possible. There are two ways to train: one uses Accelerate, and the other uses from LongNet import Train.

Method 1

Install by cloning the Git repository.
Initialize your parameters with: accelerate config
Then launch training with: accelerate launch LongNet/training.py

Method 2

Install with pip (pip install LongNet), then run:

from LongNet import Train

Train()

Roadmap

Recreate the sparsification mechanism
Recreate the gathering mechanism
Implement FlashAttention2.0
Implement Distributed Setup
Create the all-gather operation whose backward pass becomes a reduce-scatter operation
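The last roadmap item relies on a standard identity. If the forward pass all-gathers per-device key/value chunks, the gradient flowing back to each chunk is the sum over devices of that chunk's slice of their local gradients, which is exactly a reduce-scatter. The sketch below checks this in a single process by simulating the devices as list entries (no torch.distributed). The function name all_gather_grad_check and the logsumexp loss are illustrative assumptions. A real multi-GPU version would typically wrap collectives such as torch.distributed.all_gather and torch.distributed.reduce_scatter in a custom torch.autograd.Function.

```python
import torch


def all_gather_grad_check(num_devices=4, chunk=3, d=2, seed=0):
    """Hypothetical helper: simulate an all-gather forward pass on one process
    and compare autograd's gradients with a manual reduce-scatter."""
    torch.manual_seed(seed)
    # Each simulated device owns one chunk of the keys.
    chunks = [torch.randn(chunk, d, requires_grad=True) for _ in range(num_devices)]
    queries = [torch.randn(d) for _ in range(num_devices)]

    # Forward "all-gather": every device builds its own copy of the full key tensor.
    gathered = [torch.cat(chunks, dim=0) for _ in range(num_devices)]
    # Each device computes a local, attention-like loss from its own copy.
    losses = [torch.logsumexp(g @ q, dim=0) for g, q in zip(gathered, queries)]

    # Per-device gradients with respect to each full-size gathered copy.
    local_grads = torch.autograd.grad(losses, gathered, retain_graph=True)
    # Reduce-scatter: sum across devices, then hand device i only its own slice.
    reduce_scattered = torch.stack(local_grads).sum(dim=0).split(chunk, dim=0)

    # Reference: let autograd differentiate through torch.cat directly.
    torch.stack(losses).sum().backward()
    return [c.grad for c in chunks], list(reduce_scattered)


autograd_grads, manual_grads = all_gather_grad_check()
for a, m in zip(autograd_grads, manual_grads):
    assert torch.allclose(a, m, atol=1e-6)
print("all-gather backward matches reduce-scatter of local gradients")
```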

Citation

@article{ding2023longnet,
  title={LongNet: Scaling Transformers to 1,000,000,000 Tokens},
  author={Ding, Jiayu and Ma, Shuming and Dong, Li and Zhang, Xingxing and Huang, Shaohan and Wang, Wenhui and Wei, Furu},
  journal={arXiv preprint arXiv:2307.02486},
  year={2023}
}

Release history

0.5.7 (Jan 7, 2024)
0.5.6 (Dec 20, 2023)
0.5.5 (Dec 20, 2023)
0.5.3 (Nov 17, 2023)
0.5.0 (Aug 30, 2023), this version
0.4.9 (Aug 16, 2023)
0.4.8 (Aug 10, 2023)
0.4.3 (Aug 10, 2023)
0.4.2 (Aug 2, 2023)
0.4.1 (Jul 17, 2023)
0.4.0 (Jul 14, 2023)
0.3.9 (Jul 14, 2023)
0.3.8 (Jul 14, 2023)
0.3.7 (Jul 12, 2023)
0.3.6 (Jul 12, 2023)
0.3.5 (Jul 12, 2023)
0.3.4 (Jul 12, 2023)
0.3.2 (Jul 12, 2023)
0.3.1 (Jul 12, 2023)
0.3.0 (Jul 12, 2023)
0.2.9 (Jul 12, 2023)
0.2.7 (Jul 12, 2023)
0.2.5 (Jul 10, 2023)
0.2.4 (Jul 10, 2023)
0.2.3 (Jul 10, 2023)
0.2.2 (Jul 10, 2023)
0.2.1 (Jul 10, 2023)
0.2.0 (Jul 10, 2023)
0.1.9 (Jul 10, 2023)
0.1.8 (Jul 10, 2023)
0.1.7 (Jul 10, 2023)
0.1.6 (Jul 10, 2023)
0.1.5 (Jul 10, 2023)
0.1.3 (Jul 7, 2023)
0.1.1 (Jul 7, 2023)
0.0.9 (Jul 7, 2023)
0.0.8 (Jul 7, 2023)
0.0.7 (Jul 7, 2023)
0.0.6 (Jul 7, 2023)
0.0.4 (Jul 7, 2023)
0.0.3 (Jul 7, 2023)
0.0.2 (Jul 7, 2023)
0.0.1 (Jul 6, 2023)

Download files

Source distribution: longnet-0.5.0.tar.gz (47.7 kB), uploaded Aug 30, 2023

Built distribution: longnet-0.5.0-py3-none-any.whl (57.9 kB, Python 3), uploaded Aug 30, 2023

Hashes for longnet-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`9c5e485abe58f7e9dbbca67055a40b879050a48d8ed2b6d3d9910aabece030e4`
MD5	`0fe44e2b2c0eb745e39dc0bbffa3b44a`
BLAKE2b-256	`0e6def5c88383b24a3b6a7cac7717b33ebb916b43456067c41e569768a83163b`

Hashes for longnet-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d8da6eea273365411d58da8e955a12b7d08ec29510a2ca23e3a01525911ea743`
MD5	`c9fea7a757b7eb0b722a87e00f1cea94`
BLAKE2b-256	`17d6f142a9e22452bde39814529fd1097efe44ab6139f2872378b638d7fdf9b8`