
LongNet - Pytorch

Project description


LongNet: Scaling Transformers to 1,000,000,000 Tokens


This is an open source implementation of the paper LongNet: Scaling Transformers to 1,000,000,000 Tokens by Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, and Furu Wei. LongNet is a Transformer variant designed to scale sequence length to more than 1 billion tokens without sacrificing performance on shorter sequences.

Installation

pip install longnet

Usage

Once you have installed LongNet, you can use the DilatedAttention class as follows:

import torch
from long_net import DilatedAttention


# model config
dim = 512
heads = 8
dilation_rate = 2
segment_size = 64

# input data
batch_size = 32
seq_len = 8192


# create model and data
model = DilatedAttention(dim, heads, dilation_rate, segment_size, qk_norm=True)
x = torch.randn((batch_size, seq_len, dim))

output = model(x)
print(output)
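If the layer behaves like a standard attention block, the output should have the same shape as the input, i.e. (batch_size, seq_len, dim), so it can be stacked or swapped in wherever ordinary attention is used.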

LongNetTransformer

A fully ready-to-train transformer model built from dilated attention blocks, with layernorm and SwiGLU feedforwards, and a parallel transformer block layout.

import torch
from long_net.model import LongNetTransformer

longnet = LongNetTransformer(
    num_tokens=20000,
    dim=512,
    depth=6,
    dim_head=64,
    heads=8,
    ff_mult=4,
)

tokens = torch.randint(0, 20000, (1, 512))
logits = longnet(tokens)
print(logits)
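Assuming the model ends with a standard language-modeling head, logits should come back with shape (1, 512, 20000), i.e. one score per vocabulary token at each position.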

Train

  • To run a simple training run on the enwiki8 dataset, git clone the repository, install the dependencies with pip install -r requirements.txt, and then run python3 train.py

LongNet Summarized

Scaling sequence length has become a critical bottleneck in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, which limits the maximum sequence length. In the paper, the authors introduce LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens without sacrificing performance on shorter sequences. Specifically, they propose dilated attention, which expands the attentive field exponentially as the distance grows.
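As a rough illustration of that dilation pattern (a minimal sketch under assumptions, not the library's actual internals; dilated_indices is a hypothetical helper), each segment of length segment_size keeps only every dilation_rate-th position before attention is computed within it:

import torch

def dilated_indices(seq_len, segment_size, dilation_rate):
    # collect, per segment, the positions that survive dilation
    kept = []
    for start in range(0, seq_len, segment_size):
        end = min(start + segment_size, seq_len)
        kept.append(torch.arange(start, end, dilation_rate))
    return torch.cat(kept)

# with seq_len=16, segment_size=8, dilation_rate=2 each segment keeps half of
# its positions, so per-segment attention cost drops from w^2 to (w/r)^2
print(dilated_indices(16, 8, 2))  # tensor([ 0,  2,  4,  6,  8, 10, 12, 14])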

Features

LongNet has significant advantages:

  1. It has linear computational complexity and a logarithmic dependency between any two tokens (see the back-of-the-envelope sketch after this list).
  2. It can serve as a distributed trainer for extremely long sequences.
  3. Its dilated attention is a drop-in replacement for standard attention and can be seamlessly integrated with existing Transformer-based optimizations.
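To make point 1 concrete, here is a back-of-the-envelope sketch. It assumes a simplified cost model (attention cost proportional to the number of query-key pairs scored) and made-up segment settings; the numbers are not taken from the paper:

def dense_pairs(n):
    # dense self-attention scores every pair of positions
    return n * n

def dilated_pairs(n, w, r):
    # one (w/r) x (w/r) attention per segment, with n/w segments
    return (n // w) * (w // r) ** 2

n, w, r = 1_000_000, 2048, 4
print(dense_pairs(n))          # 1,000,000,000,000 pairs
print(dilated_pairs(n, w, r))  # ~128,000,000 pairs; linear in n for fixed w, r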

Experimental results demonstrate that LongNet yields strong performance on both long-sequence modeling and general language tasks. Their work opens up new possibilities for modeling very long sequences, e.g., treating a whole corpus or even the entire Internet as a sequence.

Citation

@inproceedings{ding2023longnet,
  title={LongNet: Scaling Transformers to 1,000,000,000 Tokens},
  author={Ding, Jiayu and Ma, Shuming and Dong, Li and Zhang, Xingxing and Huang, Shaohan and Wang, Wenhui and Wei, Furu},
  booktitle={Proceedings of the 10th International Conference on Learning Representations},
  year={2023}
}

Todo

  • Fix the ParallelTransformer Block's forward pass with dilated attn
  • Train on enwiki8 and test
  • Create multihead iteration
