nGPT
nGPT (normalized GPT) - Pytorch
Quick implementation of nGPT, which learns entirely on the hypersphere, from NvidiaAI. The open question is whether there is any loss of expressivity they swept under the rug, but I'll take it in good faith.
This type of network should also be studied in the context of continual learning and loss of plasticity.
Adaptation to vision transformers is here
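To make "learning entirely on the hypersphere" concrete, here is a minimal, dependency-free sketch of the normalized residual step described in the nGPT paper, h <- Norm(h + alpha * (Norm(block_out) - h)). It is only an illustration, not this package's code: the helper names `l2norm` and `hypersphere_update`, the toy vectors, and the fixed `alpha` values are all made up for the example (in the paper the step sizes are learned).

```python
import math

def l2norm(v, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / max(n, eps) for x in v]

def hypersphere_update(h, block_out, alpha):
    """One normalized residual step: move h toward the normalized
    block output, then project back onto the unit sphere."""
    target = l2norm(block_out)
    moved = [hi + a * (ti - hi) for hi, ti, a in zip(h, target, alpha)]
    return l2norm(moved)

h = l2norm([3.0, 4.0, 0.0])   # hidden state, already on the unit sphere
block_out = [0.0, 2.0, 2.0]   # hypothetical attention or feedforward output
alpha = [0.5, 0.5, 0.5]       # hypothetical per-dimension step sizes

h_next = hypersphere_update(h, block_out, alpha)
norm_next = math.sqrt(sum(x * x for x in h_next))  # stays at unit length
```

With `alpha` at zero the hidden state is returned unchanged, and with `alpha` at one it jumps straight to the normalized block output, so `alpha` acts as an interpolation weight between the two points on the sphere.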
Install
$ pip install nGPT-pytorch
Usage
import torch
from nGPT_pytorch import nGPT

model = nGPT(
    num_tokens = 256,
    dim = 512,
    depth = 4,
    attn_norm_qk = True
)

x = torch.randint(0, 256, (2, 2048))

loss = model(x, return_loss = True)
loss.backward()

logits = model(x) # (2, 2048, 256)
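Judging by its name and the Cosine Normalization citation, `attn_norm_qk = True` presumably normalizes queries and keys before they are compared. That reading is an assumption, not something confirmed from the library internals. The sketch below only shows the general idea in plain Python: a raw dot product grows with vector length, while cosine similarity depends on direction alone and stays in [-1, 1]. The helper names and toy vectors are hypothetical.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b, eps=1e-12):
    """Dot product of the two vectors after scaling each to unit length."""
    na = math.sqrt(dot(a, a))
    nb = math.sqrt(dot(b, b))
    return dot(a, b) / max(na * nb, eps)

q = [1.0, 2.0, 2.0]
k = [10.0, 20.0, 20.0]   # same direction as q, ten times longer

raw_score = dot(q, k)                 # scales with the length of k
cos_score = cosine_similarity(q, k)   # unaffected by the length of k
```

Because cosine scores are bounded, normalized attention generally needs an extra scale or temperature before the softmax so that the attention distribution can still become sharp.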
Test
Enwik8
$ python train.py
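Enwik8 is a byte-level benchmark, which lines up with `num_tokens = 256` in the usage example. The following is a rough sketch of what byte-level tokenization looks like, assuming `train.py` treats each byte as one token (not verified here). The helper names are hypothetical, and the bits-per-byte conversion assumes the returned loss is a cross entropy measured in nats.

```python
import math

def encode_bytes(text):
    """Turn a string into a list of byte values, each in the range 0..255."""
    return list(text.encode("utf-8"))

def decode_bytes(token_ids):
    """Turn byte values back into text, replacing invalid UTF-8 sequences."""
    return bytes(token_ids).decode("utf-8", errors="replace")

def nats_to_bits_per_byte(loss_nats):
    """Convert a per-token cross entropy in nats to bits per byte."""
    return loss_nats / math.log(2)

tokens = encode_bytes("nGPT")
text = decode_bytes(tokens)
```

Token ids produced this way can be fed to a model with a vocabulary of 256 without any learned tokenizer.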
Citations
@inproceedings{Loshchilov2024nGPTNT,
    title   = {nGPT: Normalized Transformer with Representation Learning on the Hypersphere},
    author  = {Ilya Loshchilov and Cheng-Ping Hsieh and Simeng Sun and Boris Ginsburg},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:273026160}
}

@article{Luo2017CosineNU,
    title   = {Cosine Normalization: Using Cosine Similarity Instead of Dot Product in Neural Networks},
    author  = {Chunjie Luo and Jianfeng Zhan and Lei Wang and Qiang Yang},
    journal = {ArXiv},
    year    = {2017},
    volume  = {abs/1702.05870},
    url     = {https://api.semanticscholar.org/CorpusID:1505432}
}