
Plug-in-and-Play Toolbox for Stabilizing Transformer Training

Project description


Admin-Torch

Transformers Training **Stabilized**

What's New? · Key Idea · How To Use · Docs · Examples · Citation · License

Here, we provide a plug-in-and-play implementation of Admin, which stabilizes previously-diverged Transformer training and achieves better performance, without introducing additional hyper-parameters. The design of Admin is half-precision friendly and can be reparameterized into the original Transformer.


What's New?

Beyond the original Admin implementation:

  1. admin-torch removes the profiling stage and is plug-in-and-play.
  2. admin-torch's implementation is more robust (see below).

Comparison with the DeepNet init and the original Admin init on WMT'17: training curves for Original Admin, DeepNet, and admin-torch, each under a regular batch size (8x4096) and a huge batch size (128x4096). [figures not shown]

More details can be found in our example.

Key Idea

What complicates Transformer training?

For a Transformer f with input x and randomly initialized weights w, we describe its stability (output_change_scale) as

E[ ||f(x, w*) − f(x, w)||₂² ],

where w* denotes the weights after a single parameter update.

In our study, we show that an original N-layer Transformer's output_change_scale is O(N), which destabilizes its training. Admin stabilizes Transformer training by regulating this scale to O(log N) or O(1).
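
To make the mechanism concrete, the sketch below (in LaTeX, notation ours, following the EMNLP'20 paper) contrasts the original Post-LN residual branch with Admin's rescaled shortcut; the ω_i initialization shown is the paper's profiling-based form, which admin-torch replaces with a built-in default.

% Original Post-LN residual branch i of an N-layer Transformer:
\[
  x_i = \mathrm{LN}\!\left(x_{i-1} + f_i(x_{i-1})\right)
\]
% Admin rescales the shortcut with a per-branch vector \omega_i:
\[
  x_i = \mathrm{LN}\!\left(\omega_i \odot x_{i-1} + f_i(x_{i-1})\right),
  \qquad
  \omega_i = \sqrt{\textstyle\sum_{j < i} \mathrm{Var}\!\left[f_j(x_{j-1})\right]}
\]
% Each branch then contributes a bounded share of the output change, keeping
% output_change_scale at O(\log N) or O(1); after training, \omega_i can be
% absorbed into neighboring parameters, recovering the original architecture.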

More details can be found in our paper.

How to use?

install

pip install admin-torch

import

import admin_torch

enjoy

def __init__(self, ...):
    ...
+   self.residual = admin_torch.as_module(self, self.number_of_sub_layers)
    ...

def forward(self, x):
    ...
-   x = x + f(x)
+   x = self.residual(x, f(x))
    x = self.LN(x)
    ...
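
Putting the pieces together, here is a minimal sketch of a Post-LN encoder layer wired up this way. It is illustrative only: the class name, dimensions, and num_sub_layers value are assumptions, and the admin_torch.as_module call simply mirrors the snippet above.

import torch
import torch.nn as nn
import admin_torch

class EncoderLayer(nn.Module):
    # Illustrative Post-LN encoder layer; names and sizes are assumptions,
    # only the residual wiring follows the snippet above.
    def __init__(self, d_model=512, nhead=8, d_ffn=2048, num_sub_layers=12):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ffn), nn.ReLU(), nn.Linear(d_ffn, d_model)
        )
        # One Admin residual per sub-layer; num_sub_layers counts every residual
        # branch in the whole stack (e.g. 2 per layer x 6 layers).
        self.attn_residual = admin_torch.as_module(self, num_sub_layers)
        self.ffn_residual = admin_torch.as_module(self, num_sub_layers)
        self.attn_ln = nn.LayerNorm(d_model)
        self.ffn_ln = nn.LayerNorm(d_model)

    def forward(self, x):
        # x = LN(residual(x, f(x))) replaces the usual x = LN(x + f(x)).
        attn_out, _ = self.self_attn(x, x, x)
        x = self.attn_ln(self.attn_residual(x, attn_out))
        x = self.ffn_ln(self.ffn_residual(x, self.ffn(x)))
        return x

x = torch.randn(2, 16, 512)        # (batch, sequence, d_model)
print(EncoderLayer()(x).shape)     # expected: torch.Size([2, 16, 512])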

An elaborated example can be found in our docs, and a real working example can be found at LiyuanLucasLiu/fairseq (a training recipe is available in our example).

Citation

Please cite the following papers if you find our model useful. Thanks!

Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, and Jiawei Han (2020). Understanding the Difficulty of Training Transformers. Proc. 2020 Conf. on Empirical Methods in Natural Language Processing (EMNLP'20).

@inproceedings{liu2020admin,
  title={Understanding the Difficulty of Training Transformers},
  author = {Liu, Liyuan and Liu, Xiaodong and Gao, Jianfeng and Chen, Weizhu and Han, Jiawei},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)},
  year={2020}
}

Xiaodong Liu, Kevin Duh, Liyuan Liu, and Jianfeng Gao (2020). Very Deep Transformers for Neural Machine Translation. arXiv preprint arXiv:2008.07772.

@article{liu_deep_2020,
 author = {Liu, Xiaodong and Duh, Kevin and Liu, Liyuan and Gao, Jianfeng},
 journal = {arXiv preprint arXiv:2008.07772},
 title = {Very Deep Transformers for Neural Machine Translation},
 year = {2020}
}



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

admin_torch-0.1.0.tar.gz (1.2 MB)

Uploaded Source

Built Distribution

admin_torch-0.1.0-py3-none-any.whl (6.5 kB)

Uploaded Python 3

File details

Details for the file admin_torch-0.1.0.tar.gz.

File metadata

  • Download URL: admin_torch-0.1.0.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.5

File hashes

Hashes for admin_torch-0.1.0.tar.gz

  • SHA256: fc7fce3fafed83d719e0a0594a0201bee54e53807d6f9f0715391bfb89803db4
  • MD5: 10ac9b8de35b6f3a0f71457bf05b215b
  • BLAKE2b-256: e56f9b420533c0f9f09536d88f17fe5b79e046c225b414d6f9b872de17978189

See more details on using hashes here.

File details

Details for the file admin_torch-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: admin_torch-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.5

File hashes

Hashes for admin_torch-0.1.0-py3-none-any.whl

  • SHA256: fd5696ff43a699b97bee2f58b476653dcf393c0320f43d9a22eee9a6a71ddc70
  • MD5: cde297e48c2df8115ddba75ce12ffb29
  • BLAKE2b-256: 8ff8a637a2448682e641efb852821084906ff07e6548d75c2ef322f35c342763

See more details on using hashes here.
