GLU
An easy-to-use library for GLU (Gated Linear Units) and GLU variants in TensorFlow. This repository allows you to easily make use of the following activation functions:
- GLU introduced in the paper Language Modeling with Gated Convolutional Networks [1]
- Bilinear, introduced in the paper Language Modeling with Gated Convolutional Networks [1] and attributed to Mnih et al. [2]
- ReGLU introduced in the paper GLU Variants Improve Transformer [3]
- GEGLU introduced in the paper GLU Variants Improve Transformer [3]
- SwiGLU introduced in the paper GLU Variants Improve Transformer [3]
- SeGLU
Gated Linear Units consist of the component-wise product of two linear projections, one of which is first passed through a sigmoid function. Variations on GLU are possible by using different nonlinear (or even linear) functions in place of the sigmoid. In the GLU Variants Improve Transformer paper [3], the new variants produce better perplexities in a fine-tuning scenario for the de-noising objective used in pre-training, as well as better results on many downstream language-understanding tasks. Furthermore, they have no apparent computational drawbacks.
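The computation above can be sketched in a few lines of NumPy, independent of this library. This is a minimal illustration of the gating idea, not the library's implementation; the projection shapes and the Swish gate with beta = 1 are assumptions for the example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # Swish / SiLU with beta = 1
    return x * sigmoid(x)

def glu(x, W, V, b, c):
    # GLU(x) = sigmoid(xW + b) * (xV + c): one projection gates the other
    return sigmoid(x @ W + b) * (x @ V + c)

def swiglu(x, W, V, b, c):
    # SwiGLU swaps the sigmoid gate for Swish; ReGLU/GEGLU swap in ReLU/GELU
    return swish(x @ W + b) * (x @ V + c)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))          # batch of 2, feature dim 8
W, V = rng.standard_normal((8, 4)), rng.standard_normal((8, 4))
b, c = np.zeros(4), np.zeros(4)
print(glu(x, W, V, b, c).shape)          # (2, 4)
```

Both projections see the same input; only the gate nonlinearity distinguishes the variants, which is why they share the same cost profile.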
Installation
Run the following to install:
pip install glu-tf
Developing glu-tf
To install glu-tf, along with the tools you need to develop and test it, run the following in your virtualenv:
git clone https://github.com/Rishit-dagli/GLU.git
# or clone your own fork
cd GLU
pip install -e .[dev]
Usage
In this section, I show a minimal example of using the SwiGLU activation function, but you can use the other activations in a similar manner:
import tensorflow as tf
from glu_tf import SwiGLU
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(units=10))
model.add(SwiGLU(bias=False, dim=-1, name='swiglu'))
Want to Contribute 🙋‍♂️?
Awesome! If you want to contribute to this project, you're always welcome! See Contributing Guidelines. You can also take a look at open issues for getting more information about current or upcoming tasks.
Want to discuss? 💬
Have any questions, doubts or want to present your opinions, views? You're always welcome. You can start discussions.
References
[1] Dauphin, Yann N., et al. ‘Language Modeling with Gated Convolutional Networks’. ArXiv:1612.08083 [Cs], Sept. 2017. arXiv.org, http://arxiv.org/abs/1612.08083.
[2] Mnih, A., and Hinton, G. 2007. Three new graphical models for statistical language modelling. In Proceedings of the 24th international conference on Machine learning (pp. 641–648).
[3] Shazeer, Noam. ‘GLU Variants Improve Transformer’. ArXiv:2002.05202 [Cs, Stat], Feb. 2020. arXiv.org, http://arxiv.org/abs/2002.05202.