
TorchClippedOptimizers

torch-clip is a library that improves optimization methods by clipping off heavy-tailed gradients. This makes it possible to increase the accuracy and speed of convergence when training neural networks on a number of tasks.

Installation

You can install our library using pip:
pip install torch-clip

Requirements:

numpy~=1.20.0
torch~=1.11.0+cu113
matplotlib~=3.4.3
tqdm~=4.62.3
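
After installation you can check that the package imports correctly. This is only a trivial sanity check; it relies on nothing beyond the torch_clip.optimizers module used in the examples below:

# Quick sanity check that the installation worked.
from torch_clip.optimizers import ClippedSGD
print(ClippedSGD)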

What do you need it for?

In the last few years it has been observed, for various neural network training setups (for example, BERT on CoLA), that in the presence of "large" stochastic gradients it is advantageous to apply special clipping (normalization) to the batched gradient. Since essentially all of modern machine learning ultimately boils down to stochastic optimization problems, the question of exactly how to "clip" large values of batched gradients plays a key role in developing effective numerical training methods for a large class of models. This repository implements optimizers for the PyTorch library with different clipping methods.
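
To give an intuition, here is a hand-written sketch of plain norm-based clipping (not the library's internal code): when the norm of a stochastic gradient exceeds a chosen level, the gradient is rescaled so that its norm equals that level, which suppresses rare heavy-tailed spikes. PyTorch's built-in torch.nn.utils.clip_grad_norm_ implements the same basic idea.

import torch

def clip_gradient_(parameters, clipping_level):
    # Illustrative only: rescale all gradients in place if their combined
    # L2 norm exceeds clipping_level.
    grads = [p.grad for p in parameters if p.grad is not None]
    if not grads:
        return
    total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    if total_norm > clipping_level:
        for g in grads:
            g.mul_(clipping_level / (total_norm + 1e-12))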

Our repository

The source code and research results are available at https://github.com/EugGolovanov/TorchClippedOptimizers

Use example

You can use our optimizers in the same way as the standard optimizers from the PyTorch library:

from torch_clip.optimizers import ClippedSGD

optimizer = ClippedSGD(model.parameters(), lr=5e-2, momentum=0.9,
                       clipping_type="layer_wise", clipping_level=1)

criterion = my_loss_function
for epoch in range(EPOCHS):
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
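
Both examples assume that model, my_loss_function, train_loader and EPOCHS are already defined. A minimal, purely illustrative setup (not part of torch-clip) might look like this:

# Purely illustrative definitions of the objects used in the examples;
# none of this is part of torch-clip.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

EPOCHS = 5
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
my_loss_function = nn.CrossEntropyLoss()

# Random toy data: 256 samples with 20 features and 2 classes.
X = torch.randn(256, 20)
y = torch.randint(0, 2, (256,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)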


Use example (with restarts)

from torch_clip.optimizers import ClippedSGD
from torch_clip.restarter import Restarter
from torch_clip.optimizers_collector import OptimizerProperties, ModelProperties, RestartProperties

criterion = my_loss_function
model = my_model_object

optimizer_props = OptimizerProperties(ClippedSGD, lr=5e-2, momentum=0.9,
                                      clipping_type="layer_wise", clipping_level=1)
restarter = Restarter(optimizer_properties=optimizer_props, first_restart_steps_cnt=50,
                      restart_coeff=1.25, max_steps_cnt=2000)
optimizer = optimizer_props.optimizer_class(model.parameters(), **optimizer_props.optimizer_kwargs)

for epoch in range(EPOCHS):
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        restarter.add_coords(model.parameters())
        optimizer = restarter.make_restart(model, optimizer)
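
Note that make_restart returns the optimizer to be used on the following steps, so its return value is assigned back to optimizer; presumably the restarter re-creates the optimizer from optimizer_props whenever a restart is triggered.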
