PyTorch implementation of L2L execution algorithm
L2L execution algorithm PyTorch
PyTorch implementation of L2L execution algorithm from paper Training Large Neural Networks with Constant Memory using a New Execution Algorithm
🚀 Example
You need to define a torch model in which all layers are held in a single ModuleList.
See the examples folder.
Basic usage
from typing import Optional

import torch
from torch import nn, optim


class M(nn.Module):
    def __init__(self, depth: int, dim: int, hidden_dim: Optional[int] = None):
        super().__init__()
        hidden_dim = hidden_dim or dim
        # All layers live in one ModuleList so the L2L wrapper can walk them in order
        self.layers = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(dim, hidden_dim),
                    nn.BatchNorm1d(hidden_dim),
                    nn.LeakyReLU(),
                )
            ]
            + [
                nn.Sequential(
                    nn.Linear(hidden_dim, hidden_dim),
                    nn.BatchNorm1d(hidden_dim),
                    nn.LeakyReLU(),
                )
                for _ in range(depth)
            ]
            + [nn.Linear(hidden_dim, dim), nn.Sigmoid()]
        )

    def forward(self, batch: torch.Tensor) -> torch.Tensor:
        x = batch
        for layer in self.layers:
            x = layer(x)
        return x


model = M(depth=5, dim=40).train()  # on CPU
Then, you can use the L2L wrapper over this model.
from layer_to_layer_pytorch.l2l import Layer2Layer

l2l_model = Layer2Layer(
    model,
    layers_attr="layers",  # attribute with ModuleList
    microbatch_size=100,  # size of a microbatch in a minibatch :) from original paper
    verbose=False,  # enable tqdm
)
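The role of `microbatch_size` is easier to see with a toy version of the execution order described in the paper: layers are processed one at a time, and each layer sees every microbatch of the minibatch before the next layer starts. The sketch below is only an illustration in plain Python. The names `split_into_microbatches` and `l2l_forward_order` are hypothetical and are not part of this library, and plain callables stand in for torch modules.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[float], float]  # hypothetical stand-in for an nn.Module


def split_into_microbatches(batch: Sequence[float], microbatch_size: int) -> List[List[float]]:
    """Cut a minibatch into consecutive chunks of at most microbatch_size items."""
    return [list(batch[i:i + microbatch_size]) for i in range(0, len(batch), microbatch_size)]


def l2l_forward_order(
    layers: Sequence[Layer], batch: Sequence[float], microbatch_size: int
) -> Tuple[List[float], List[Tuple[int, int]]]:
    """Apply layers one at a time; the outer loop is over layers, the inner over microbatches."""
    microbatches = split_into_microbatches(batch, microbatch_size)
    trace = []  # records (layer index, microbatch index) in execution order
    for layer_idx, layer in enumerate(layers):
        # In an L2L-style setup, this is the point where one layer would be
        # placed on the accelerator while the rest of the model stays on the CPU.
        for mb_idx, microbatch in enumerate(microbatches):
            microbatches[mb_idx] = [layer(x) for x in microbatch]
            trace.append((layer_idx, mb_idx))
    outputs = [x for microbatch in microbatches for x in microbatch]
    return outputs, trace


toy_layers = [lambda x: x + 1, lambda x: x * 2]
outputs, trace = l2l_forward_order(toy_layers, [0, 1, 2, 3], microbatch_size=2)
print(outputs)  # [2, 4, 6, 8]
print(trace)    # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

With the values used in this README (1,000 samples and `microbatch_size=100`), the same splitting would give 10 microbatches per layer.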
Then train it almost like a regular torch model:
from tqdm.auto import tqdm, trange

x = torch.rand(1_000, 40)  # on CPU
y = torch.rand(1_000, 40)  # on CPU

losses = []
criterion = nn.MSELoss()
optimizer = optim.AdamW(l2l_model.main_params)  # optimizer works with the main model on CPU

for i in trange(2000):
    l2l_model.zero_grad()
    _ = l2l_model.forward(x)

    loss_value: float = l2l_model.compute_loss(y, criterion)

    if i % 50 == 0:
        tqdm.write(f"[{i}] loss = {loss_value}")
        losses.append(loss_value)

    l2l_model.backward()
    optimizer.step()
    l2l_model.update_main_model_params()  # Sync params with CPU
FP-16 usage
Mixed-precision training can be enabled through the init parameters:
from layer_to_layer_pytorch.l2l import Layer2Layer

l2l_model = Layer2Layer(
    model,
    layers_attr="layers",
    microbatch_size=100,
    # fp-16
    mixed_precision=True,
    loss_scale=128.0,
)
And then train the same way 😉
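As background on why a `loss_scale` value is useful with FP16, the sketch below shows the general idea of loss scaling using only the standard library. It does not show how this library applies the scale internally, which the README does not describe. The helper `to_fp16` and the gradient value `tiny_grad` are hypothetical and chosen for illustration.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


LOSS_SCALE = 128.0  # same value as loss_scale in the snippet above
tiny_grad = 1e-8    # hypothetical gradient, too small to be represented in fp16

# Stored directly in fp16, the gradient underflows to zero and the update is lost.
unscaled = to_fp16(tiny_grad)

# Scaling the loss multiplies every gradient by LOSS_SCALE, which lifts it into
# the fp16 range; dividing by the scale afterwards in full precision recovers
# an approximation of the original value.
scaled_fp16 = to_fp16(tiny_grad * LOSS_SCALE)
recovered = scaled_fp16 / LOSS_SCALE

print(unscaled)       # 0.0
print(recovered > 0)  # True
```

The recovered value is only approximate because the scaled gradient is still rounded to FP16 precision.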
Installation
pip install layer-to-layer-pytorch
or install with Poetry
poetry add layer-to-layer-pytorch
📈 Releases
You can see the list of available releases on the GitHub Releases page.
We follow the Semantic Versioning specification.
🛡 License
This project is licensed under the terms of the MIT license. See LICENSE for more details.
📃 Citation
This library
@misc{layer-to-layer-pytorch,
author = {Roman Tezikov},
title = {PyTorch implementation of L2L execution algorithm},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/TezRomacH/layer-to-layer-pytorch}}
}
Original paper
@article{Pudipeddi2020TrainingLN,
title={Training Large Neural Networks with Constant Memory using a New Execution Algorithm},
author={Bharadwaj Pudipeddi and Maral Mesmakhosroshahi and J. Xi and S. Bharadwaj},
journal={ArXiv},
year={2020},
volume={abs/2002.05645}
}
Credits
This project was generated with python-package-template.