Adam Layer-wise LR Decay

ELECTRA, published by researchers at Stanford University and Google Brain, applies layer-wise learning rate (LR) decay to the Adam optimizer to prevent catastrophic forgetting of the pre-trained model.

This repo contains an implementation of layer-wise LR decay for Adam, built on the new Optimizer API of TensorFlow 2.11.

Usage

Installation:

$ pip install adam-lr-decay  # this method does not install tensorflow

For CPU:

$ pip install adam-lr-decay[cpu]  # this method installs tensorflow-cpu>=2.11

For GPU:

$ pip install adam-lr-decay[gpu]  # this method installs tensorflow>=2.11

Example:

from tensorflow.keras import layers, models
from adam_lr_decay import AdamLRDecay

# ... prepare training data

# model definition
model = models.Sequential([
    layers.Dense(3, input_shape=(2,), name='hidden_dense'),
    layers.Dense(1, name='output')
])

# optimizer definition with layerwise lr decay
adam = AdamLRDecay(learning_rate=1e-3)
adam.apply_layerwise_lr_decay(var_name_dicts={
    'hidden_dense': 0.1,
    'output': 0.
})
# each key names a layer and each value is its decay rate;
# the layer trains with a learning rate of lr * (1. - decay_rate)

# compile the model
model.compile(optimizer=adam)

# ... training loop
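
The comment in the example above gives the effective learning rate as lr * (1. - decay_rate). The following is a minimal sketch of that arithmetic in plain Python. It does not call the library: `effective_learning_rates` is a hypothetical helper introduced only for illustration, and it assumes the formula from the comment is applied as written.

```python
def effective_learning_rates(base_lr, decay_rates):
    """Return base_lr * (1 - decay_rate) for every layer key.

    Hypothetical helper, not part of adam-lr-decay; it only reproduces
    the formula stated in the usage example.
    """
    return {key: base_lr * (1.0 - rate) for key, rate in decay_rates.items()}


# Same configuration as the usage example: lr=1e-3, two named layers.
rates = effective_learning_rates(1e-3, {'hidden_dense': 0.1, 'output': 0.0})

for name, lr in rates.items():
    print(f'{name}: {lr:.6f}')
```

With this configuration, 'hidden_dense' trains at 90% of the base learning rate, while 'output' keeps the full base rate because its decay rate is 0.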

The official ELECTRA repo defines the decay rates in code. An adapted version follows:

import collections
from adam_lr_decay import AdamLRDecay

def _get_layer_lrs(layer_decay, n_layers):
    key_to_depths = collections.OrderedDict({
        '/embeddings/': 0,
        '/embeddings_project/': 0,
        'task_specific/': n_layers + 2,
    })
    for layer in range(n_layers):
        key_to_depths['encoder/layer_' + str(layer) + '/'] = layer + 1
    return {
        key: 1. - (layer_decay ** (n_layers + 2 - depth))
        for key, depth in key_to_depths.items()
    }

# ... ELECTRA model definition

adam = AdamLRDecay(learning_rate=1e-3)
adam.apply_layerwise_lr_decay(var_name_dicts=_get_layer_lrs(0.9, 8))

# ... custom training loop

The generated decay rates should look like this. A value of 0.0 means no decay, and 1.0 means a zero learning rate (the layer is effectively non-trainable).

{
  "/embeddings/": 0.6513215599,
  "/embeddings_project/": 0.6513215599, 
  "task_specific/": 0.0, 
  "encoder/layer_0/": 0.6125795109999999, 
  "encoder/layer_1/": 0.5695327899999999, 
  "encoder/layer_2/": 0.5217030999999999, 
  "encoder/layer_3/": 0.46855899999999995, 
  "encoder/layer_4/": 0.40950999999999993, 
  "encoder/layer_5/": 0.3439, 
  "encoder/layer_6/": 0.2709999999999999, 
  "encoder/layer_7/": 0.18999999999999995
}
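
These values can be checked by hand. The sketch below recomputes a few of them with the same formula as `_get_layer_lrs`, using layer_decay=0.9 and n_layers=8. The helper name `lr_multiplier` and the `depths` dictionary are assumptions made for illustration and are not part of the package.

```python
import math

LAYER_DECAY = 0.9
N_LAYERS = 8


def lr_multiplier(depth, layer_decay=LAYER_DECAY, n_layers=N_LAYERS):
    """Fraction of the base learning rate kept at a given depth.

    Hypothetical helper: the decay rate in the table is 1 - this value.
    """
    return layer_decay ** (n_layers + 2 - depth)


# Depths follow _get_layer_lrs: embeddings at 0, encoder layer i at i + 1,
# and the task-specific head at n_layers + 2.
depths = {
    '/embeddings/': 0,
    'encoder/layer_0/': 1,
    'encoder/layer_7/': 8,
    'task_specific/': N_LAYERS + 2,
}

multipliers = {key: lr_multiplier(depth) for key, depth in depths.items()}
decay_rates = {key: 1.0 - m for key, m in multipliers.items()}

for key in depths:
    print(f'{key}: multiplier={multipliers[key]:.10f}, decay={decay_rates[key]:.10f}')
```

Each step down from the task-specific head multiplies the kept learning rate by layer_decay, so the embeddings, which sit ten steps below the head, keep 0.9 ** 10 of the base rate.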

Citation

@article{clark2020electra,
  title={Electra: Pre-training text encoders as discriminators rather than generators},
  author={Clark, Kevin and Luong, Minh-Thang and Le, Quoc V and Manning, Christopher D},
  journal={arXiv preprint arXiv:2003.10555},
  year={2020}
}
