Simple stochastic weight averaging callback for Keras.

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Keras SWA - Stochastic Weight Averaging

This is an implementation of SWA for Keras and TF-Keras. It currently implements only the constant learning rate scheduler; the cyclic learning rate described in the paper will come soon.

Introduction

Stochastic weight averaging (SWA) is built upon the same principle as snapshot ensembling and fast geometric ensembling. The idea is that averaging selected stages of training can lead to better models. Whereas the former two methods sample and ensemble models, SWA instead averages weights. This has been shown to give comparable improvements contained in a single model.
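The averaging step can be illustrated with a minimal sketch. It assumes an equal-weight running average over per-epoch weight snapshots and uses flat lists of floats as a stand-in for real weight tensors. The function name `update_swa_weights` and the snapshot values are hypothetical and are not part of this package's API.

```python
def update_swa_weights(swa_weights, new_weights, n_averaged):
    """Fold one more weight snapshot into an equal-weight running average.

    swa_weights and new_weights are flat lists of floats (a stand-in for a
    model's weight tensors); n_averaged is the number of snapshots already
    included in swa_weights.
    """
    return [
        (avg * n_averaged + w) / (n_averaged + 1)
        for avg, w in zip(swa_weights, new_weights)
    ]


# Hypothetical weight snapshots taken at the end of three epochs.
snapshots = [
    [1.0, 4.0],
    [2.0, 2.0],
    [3.0, 0.0],
]

# Start from the first snapshot, then fold in the remaining ones.
swa_weights = list(snapshots[0])
for n_averaged, weights in enumerate(snapshots[1:], start=1):
    swa_weights = update_swa_weights(swa_weights, weights, n_averaged)

print(swa_weights)
```

A running average keeps only one extra copy of the weights in memory, regardless of how many epochs are averaged.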

Paper

Title: Averaging Weights Leads to Wider Optima and Better Generalization
Link: https://arxiv.org/abs/1803.05407
Authors: Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson
Repo: https://github.com/timgaripov/swa (PyTorch)

Installation

pip install keras-swa

Batch Normalization

For models with batch normalization, the last epoch is a forward pass only, i.e. the learning rate is set to zero. This is because batch normalization uses the running mean and variance of its preceding layer's output to normalize activations. SWA invalidates these statistics by abruptly changing the weights at the end of training. The last epoch is therefore used to reset and recalculate the batch normalization statistics for the updated weights.
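Why the statistics need to be recalculated can be shown with a toy sketch in plain Python. It assumes a one-weight "layer" followed by a normalization step. All names and values are hypothetical, and this is not how the callback itself is implemented.

```python
from math import sqrt
from statistics import fmean, pvariance

EPSILON = 1e-5  # small constant for numerical stability


def layer_output(weight, inputs):
    """Toy one-weight 'layer': scale every input by the weight."""
    return [weight * x for x in inputs]


def normalize(activations, mean, variance):
    """Normalize activations with the given mean and variance."""
    return [(a - mean) / sqrt(variance + EPSILON) for a in activations]


inputs = [1.0, 2.0, 3.0, 4.0]
last_weight = 2.0  # hypothetical weight at the end of ordinary training
swa_weight = 1.0   # hypothetical averaged weight swapped in by SWA

# Statistics gathered while the layer still used last_weight.
stale_activations = layer_output(last_weight, inputs)
stale_mean = fmean(stale_activations)
stale_var = pvariance(stale_activations)

# Activations after the weights are replaced by the average.
swa_activations = layer_output(swa_weight, inputs)

# The stale statistics no longer centre the activations ...
stale_result = normalize(swa_activations, stale_mean, stale_var)
# ... while statistics recomputed in an extra forward pass do.
fresh_result = normalize(
    swa_activations, fmean(swa_activations), pvariance(swa_activations)
)

print(round(fmean(stale_result), 3))
print(round(fmean(fresh_result), 3))
```

With the stale statistics the normalized output is shifted well away from zero mean, which is the mismatch the extra forward-pass epoch is meant to remove.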

SWA

Keras callback object for SWA.

Arguments

start_epoch - Starting epoch for SWA.

lr_schedule - Learning rate scheduler (optional), 'constant' for the non-cyclic scheduler from the paper.

swa_lr - Minimum learning rate for scheduler.

batch_size - Batch size (Keras API only, automatic in TF-Keras).

verbose - Verbosity mode, 0 or 1.
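To make the roles of `start_epoch` and `swa_lr` concrete, here is a sketch of a non-cyclic schedule in the spirit of the one described in the paper: hold the initial learning rate, decay linearly, then hold `swa_lr`. The breakpoints at 50% and 90% of `start_epoch`, the function name `constant_swa_lr`, and the `initial_lr` parameter are assumptions made for illustration and may not match what the callback does internally.

```python
def constant_swa_lr(epoch, start_epoch, initial_lr, swa_lr):
    """Hold initial_lr, decay linearly, then hold swa_lr.

    The 0.5 and 0.9 breakpoints are assumed for illustration only.
    """
    progress = epoch / start_epoch
    if progress <= 0.5:
        return initial_lr
    if progress <= 0.9:
        # Linear interpolation from initial_lr down towards swa_lr.
        fraction = (progress - 0.5) / 0.4
        return initial_lr - (initial_lr - swa_lr) * fraction
    return swa_lr


# Same settings as the example: 100 epochs, SWA from epoch 75,
# optimizer learning rate 0.1, swa_lr 0.01.
schedule = [constant_swa_lr(epoch, 75, 0.1, 0.01) for epoch in range(100)]

print(schedule[0], schedule[50], schedule[99])
```

Under these assumptions the learning rate has already settled at `swa_lr` by the time averaging starts, so every averaged snapshot is produced with the same step size.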

Example

For Keras

from sklearn.datasets import make_blobs
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

from swa.keras import SWA
 
# make dataset
X, y = make_blobs(n_samples=1000, 
                  centers=3, 
                  n_features=2, 
                  cluster_std=2, 
                  random_state=2)

y = to_categorical(y)

# build model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu'))
model.add(Dense(3, activation='softmax'))

model.compile(loss='categorical_crossentropy', 
              optimizer=SGD(learning_rate=0.1))

epochs = 100
start_epoch = 75

# define swa callback
swa = SWA(start_epoch=start_epoch, 
          lr_schedule='constant', 
          swa_lr=0.01, 
          verbose=1)

# train
model.fit(X, y, epochs=epochs, verbose=1, callbacks=[swa])

Or for Keras in TensorFlow

from sklearn.datasets import make_blobs
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

from swa.tfkeras import SWA

# make dataset
X, y = make_blobs(n_samples=1000, 
                  centers=3, 
                  n_features=2, 
                  cluster_std=2, 
                  random_state=2)

y = to_categorical(y)

# build model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu'))
model.add(Dense(3, activation='softmax'))

model.compile(loss='categorical_crossentropy', 
              optimizer=SGD(learning_rate=0.1))

epochs = 100
start_epoch = 75

# define swa callback
swa = SWA(start_epoch=start_epoch, 
          lr_schedule='constant', 
          swa_lr=0.01, 
          verbose=1)

# train
model.fit(X, y, epochs=epochs, verbose=1, callbacks=[swa])

Output

Epoch 1/100
1000/1000 [==============================] - 1s 703us/step - loss: 0.7518
Epoch 2/100
1000/1000 [==============================] - 0s 47us/step - loss: 0.5997
...
Epoch 74/100
1000/1000 [==============================] - 0s 31us/step - loss: 0.3913
Epoch 75/100
Epoch 00075: starting stochastic weight averaging
1000/1000 [==============================] - 0s 202us/step - loss: 0.3907
Epoch 76/100
1000/1000 [==============================] - 0s 47us/step - loss: 0.3911
...
Epoch 99/100
1000/1000 [==============================] - 0s 31us/step - loss: 0.3910
Epoch 100/100
1000/1000 [==============================] - 0s 47us/step - loss: 0.3905

Epoch 00100: final model weights set to stochastic weight average

Release history

0.1.7

Sep 28, 2021

0.1.6

May 2, 2021

0.1.5

Feb 24, 2020

0.1.4

Jan 16, 2020

0.1.3

Dec 14, 2019

0.1.2

Oct 22, 2019

0.1.1

Oct 19, 2019

0.1.0

Oct 18, 2019

This version

0.0.6

Oct 18, 2019

0.0.5

Oct 14, 2019

0.0.4

Oct 14, 2019

0.0.3

Oct 7, 2019

0.0.2

Oct 3, 2019

0.0.1

Oct 2, 2019

Download files

Source Distribution

keras-swa-0.0.6.tar.gz (3.1 kB)

Uploaded Oct 18, 2019 Source

Hashes for keras-swa-0.0.6.tar.gz
Algorithm	Hash digest
SHA256	`b679df949073ca7e64c6cbea9660b92485a0b6e60b0262acf927116b297538b5`
MD5	`fabe0ad5b6f0a793815066b24d7941f3`
BLAKE2b-256	`dbf42aabf7096c60c58f3ff6bb086ea359a474c6bd96c741dbaeb9866f477fcf`