Project description

Adam Layer-wise LR Decay

ELECTRA, published by researchers at Stanford University and Google Brain, uses layer-wise learning-rate (LR) decay with the Adam optimizer during fine-tuning to reduce catastrophic forgetting of the pre-trained model.

This repository implements layer-wise LR decay for Adam on top of the new optimizer API introduced in TensorFlow 2.11.
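The mechanism can be sketched in plain Python, without TensorFlow: an ordinary Adam update in which each variable's step size is multiplied by (1 - decay_rate). This is a minimal, hypothetical sketch and not this package's implementation. The helper names lr_multiplier and adam_step are invented for illustration, and the sketch assumes (as the ELECTRA-style keys further down suggest) that a key applies to every variable whose name contains it, with the first matching key winning.

```python
import math

# Hypothetical sketch, NOT the adam_lr_decay implementation.
# Assumption: a key applies to every variable whose name contains it,
# and the first matching key wins.


def lr_multiplier(var_name, decay_rates):
    """Return (1 - decay_rate) for the first key found in var_name, else 1.0."""
    for key, rate in decay_rates.items():
        if key in var_name:
            return 1.0 - rate
    return 1.0


def adam_step(params, grads, state, decay_rates, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-7):
    """Apply one Adam update to a dict of scalar parameters, in place."""
    state["t"] += 1
    t = state["t"]
    for name, g in grads.items():
        # Standard Adam moment estimates with bias correction.
        m = beta1 * state["m"].get(name, 0.0) + (1 - beta1) * g
        v = beta2 * state["v"].get(name, 0.0) + (1 - beta2) * g * g
        state["m"][name], state["v"][name] = m, v
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Layer-wise decay: only this variable's step size is rescaled.
        scaled_lr = lr * lr_multiplier(name, decay_rates)
        params[name] -= scaled_lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


if __name__ == "__main__":
    params = {"hidden_dense/kernel": 0.5, "output/kernel": 0.5}
    grads = {"hidden_dense/kernel": 1.0, "output/kernel": 1.0}
    state = {"t": 0, "m": {}, "v": {}}
    adam_step(params, grads, state, {"hidden_dense": 0.1, "output": 0.0})
    print(params)
```

With equal gradients, hidden_dense/kernel moves about 0.9 times as far as output/kernel after one step. Because Adam normalizes each gradient by its own running magnitude, scaling the learning rate (rather than the gradient) is what gives a layer a predictably smaller step.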

Usage

Installation:

$ pip install adam-lr-decay  # does not install TensorFlow; TensorFlow>=2.11 must be installed separately

For CPU:

$ pip install "adam-lr-decay[cpu]"  # also installs tensorflow-cpu>=2.11

For GPU:

$ pip install "adam-lr-decay[gpu]"  # also installs tensorflow>=2.11
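Because the base package leaves TensorFlow out, it can help to confirm that a suitable TensorFlow is present before running the examples. The following is a small, hypothetical check: the helper names are invented for this sketch, only the tensorflow and tensorflow-cpu distributions are looked for, and it relies solely on importlib.metadata from the Python standard library.

```python
import re
from importlib import metadata

# Hypothetical helper, not part of adam-lr-decay.
MIN_TF = (2, 11)  # the new optimizer API requires TensorFlow 2.11+
TF_DISTRIBUTIONS = ("tensorflow", "tensorflow-cpu")


def parse_major_minor(version):
    """Turn a version string such as '2.15.0rc1' into (2, 15)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def installed_tensorflow():
    """Return (distribution, version) for the first TensorFlow found, else None."""
    for dist in TF_DISTRIBUTIONS:
        try:
            return dist, metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    found = installed_tensorflow()
    if found is None:
        print("TensorFlow not found: install adam-lr-decay[cpu] or adam-lr-decay[gpu]")
    else:
        dist, version = found
        ok = parse_major_minor(version) >= MIN_TF
        print(f"{dist} {version}: {'ok' if ok else 'too old, need >= 2.11'}")
```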

from tensorflow.keras import layers, models
from adam_lr_decay import AdamLRDecay

# ... prepare training data

# model definition
model = models.Sequential([
    layers.Dense(3, input_shape=(2,), name='hidden_dense'),
    layers.Dense(1, name='output')
])

# optimizer definition with layerwise lr decay
adam = AdamLRDecay(learning_rate=1e-3)
adam.apply_layerwise_lr_decay(var_name_dicts={
    'hidden_dense': 0.1,
    'output': 0.
})
# each key names a layer; the variables of that layer are updated
# with the learning rate lr * (1. - decay_rate)

# compile the model (a loss is required before calling model.fit)
model.compile(optimizer=adam, loss='mse')

# ... training loop
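To make the rule in the comment concrete, the sketch below turns a var_name_dicts-style mapping into the learning rate each layer actually receives. The function effective_learning_rates is hypothetical and not part of adam-lr-decay; rejecting rates outside [0, 1] is a design choice of this sketch (such values would give a negative or larger-than-base learning rate), not documented library behavior.

```python
# Hypothetical helper, not part of adam-lr-decay.
def effective_learning_rates(base_lr, decay_rates):
    """Map each key to base_lr * (1 - decay_rate)."""
    lrs = {}
    for key, rate in decay_rates.items():
        # 0.0 keeps the full learning rate; 1.0 gives zero (frozen variables).
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"decay rate for {key!r} must be in [0, 1], got {rate}")
        lrs[key] = base_lr * (1.0 - rate)
    return lrs


print(effective_learning_rates(1e-3, {"hidden_dense": 0.1, "output": 0.0}))
```

For the configuration above, hidden_dense trains at roughly 9e-4 while output keeps the full 1e-3.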

The official ELECTRA repository computes these per-layer rates in code. An adapted version is as follows:

import collections
from adam_lr_decay import AdamLRDecay

def _get_layer_lrs(layer_decay, n_layers):
    """Return {variable-name key: decay_rate}; layers closer to the input decay more."""
    key_to_depths = collections.OrderedDict({
        '/embeddings/': 0,
        '/embeddings_project/': 0,
        'task_specific/': n_layers + 2,
    })
    for layer in range(n_layers):
        key_to_depths['encoder/layer_' + str(layer) + '/'] = layer + 1
    return {
        key: 1. - (layer_decay ** (n_layers + 2 - depth))
        for key, depth in key_to_depths.items()
    }

# ... ELECTRA model definition

adam = AdamLRDecay(learning_rate=1e-3)
adam.apply_layerwise_lr_decay(var_name_dicts=_get_layer_lrs(0.9, 8))

# ... custom training loop
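Written out, with γ = layer_decay, L = n_layers and d_k the depth assigned to key k, the function returns the decay rate r_k below. Combined with the rule lr * (1. - decay_rate) from the usage example, each key's learning rate η_k shrinks geometrically with its distance from the task-specific head:

```latex
r_k = 1 - \gamma^{\,L + 2 - d_k},
\qquad
\eta_k = \eta\,(1 - r_k) = \eta\,\gamma^{\,L + 2 - d_k}
```

So the task-specific head (d_k = L + 2) keeps the base rate η, encoder layer i (d_k = i + 1) trains at η γ^(L + 1 - i), and the embeddings (d_k = 0) at η γ^(L + 2); with γ = 0.9 and L = 8 that is about 0.349 η for the embeddings.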

With layer_decay=0.9 and n_layers=8, the generated decay rates should look like this. A rate of 0.0 means no decay (the full learning rate), and 1.0 means a learning rate of zero (the variables are effectively non-trainable).

{
  "/embeddings/": 0.6513215599,
  "/embeddings_project/": 0.6513215599, 
  "task_specific/": 0.0, 
  "encoder/layer_0/": 0.6125795109999999, 
  "encoder/layer_1/": 0.5695327899999999, 
  "encoder/layer_2/": 0.5217030999999999, 
  "encoder/layer_3/": 0.46855899999999995, 
  "encoder/layer_4/": 0.40950999999999993, 
  "encoder/layer_5/": 0.3439, 
  "encoder/layer_6/": 0.2709999999999999, 
  "encoder/layer_7/": 0.18999999999999995
}

Citation

@article{clark2020electra,
  title={{ELECTRA}: Pre-training text encoders as discriminators rather than generators},
  author={Clark, Kevin and Luong, Minh-Thang and Le, Quoc V and Manning, Christopher D},
  journal={arXiv preprint arXiv:2003.10555},
  year={2020}
}

Release history

0.0.8 (this version): Oct 18, 2023
0.0.7: Aug 10, 2023
0.0.6: May 26, 2023
0.0.5: Mar 29, 2023
0.0.4: Mar 29, 2023
0.0.1: Mar 29, 2023

Download files

Source Distribution

adam_lr_decay-0.0.8.tar.gz (4.7 kB), uploaded Oct 18, 2023

Built Distribution

adam_lr_decay-0.0.8-py3-none-any.whl (5.3 kB, Python 3), uploaded Oct 18, 2023

Hashes for adam_lr_decay-0.0.8.tar.gz
Algorithm	Hash digest
SHA256	`d55f718f8466d0a98a3f1160acb4dd2d354a154dab18014cabf7901946614b0b`
MD5	`b1cd4170ac71e266f428a6608b13eb73`
BLAKE2b-256	`7897a904efa1cb4532785a8478125b678cba7da57ea569c39ced722ea8f45429`

Hashes for adam_lr_decay-0.0.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ee36199771c7577ed2058450d469b3e8ac3032f21ce8e90ac87a09a271e90dc7`
MD5	`aa5d52fad938ab83acc9f6654af9ee17`
BLAKE2b-256	`c6d1441ecd9a895e98593cabd4d47fc354adaf7dc60fa231fd93458304b0edcb`