
TrueGrad

PyTorch interface for TrueGrad-AdamW

Getting Started

Installation

python3 -m pip install truegrad
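To confirm the install exposes the modules used throughout the examples below, a quick sanity check (a minimal sketch, assuming a standard installation):

import truegrad.nn
import truegrad.optim
import truegrad.utils

print("truegrad imported successfully")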

Examples

Patch Custom Models

The easiest way to integrate TrueGrad into existing models is to patch them using truegrad.utils.patch_model(). patch_model() goes through all torch.nn.Module's in a PyTorch model and converts their torch.nn.Parameter's to truegrad.nn.TrueGradParameter's. A TrueGradParameter acts largely the same as a torch.nn.Parameter, but adds the required operations to the model's backward pass.

Patching an existing model looks like this:

import transformers
from truegrad.utils import patch_model
from truegrad.optim import TGAdamW

model = transformers.BertModel.from_pretrained("google/bert_uncased_L-2_H-128_A-2")  # any existing model
tokenizer = transformers.BertTokenizer.from_pretrained("google/bert_uncased_L-2_H-128_A-2")

patch_model(model)  # replace torch.nn.Parameter with truegrad.nn.TrueGradParameter
optim = TGAdamW(model.parameters())  # truegrad.optim.TGAdamW instead of torch.optim.AdamW

# training loop as normal
for sample in ["Hello", "World", "!"]:
    out = model(**tokenizer([sample], return_tensors="pt"))
    out[0].mean().backward()
    optim.step()
    optim.zero_grad()  # clear accumulated gradients before the next sample

nn

Patching existing PyTorch computation graphs on the fly can add unnecessary memory and computation. That's why a pre-patched alternative to torch.nn with hand-crafted gradients exists alongside the truegrad.utils module. Compared to truegrad.utils.patch_model(), truegrad.nn offers higher speed and lower memory usage, although it may require code changes and doesn't support all models. You cannot (currently) use truegrad.nn together with truegrad.utils, as both use different ways to arrive at the same value.

import torch
from truegrad import nn
from truegrad.optim import TGAdamW

# define model by mixing truegrad.nn and torch.nn
model = torch.nn.Sequential(nn.Linear(1, 10),
                            nn.LayerNorm([1, 10]),
                            torch.nn.ReLU(),
                            nn.Linear(10, 1))
optim = TGAdamW(model.parameters())  # truegrad.optim.TGAdamW instead of torch.optim.AdamW

# training loop as normal
while True:
    input = torch.randn((16, 1))
    model(input).mean().backward()
    optim.step()
    optim.zero_grad()  # clear accumulated gradients before the next iteration

Partial TrueGrad

Unfortunately, it's not always sensible to apply TrueGrad, as some backward passes are too slow to run twice. In such cases, it can be an option to use TGAdamW only on specific subsections of the model. To do so, you can either check which parameters are of type truegrad.nn.TrueGradParameter when using truegrad.utils.patch_model(), or check which parameters belong to a module listed in truegrad.nn.modules. For example, the code from the nn section above could be extended in the following way:

import torch
from truegrad import nn
from truegrad.optim import TGAdamW

model = torch.nn.Sequential(nn.Linear(1, 10),  # Weights coming from truegrad.nn 
                            nn.LayerNorm([1, 10]),
                            torch.nn.ReLU(),
                            torch.nn.Linear(10, 1))  # Weights coming from torch.nn

truegrad_parameters = []
normal_parameters = []


def get_parameters(mod: torch.nn.Module):
    if isinstance(mod, nn.modules):
        truegrad_parameters.extend(list(mod.parameters(recurse=False)))
    else:
        # you could do truegrad.utils.patch_model(mod, recurse=False) here!
        normal_parameters.extend(list(mod.parameters(recurse=False)))


model = model.apply(get_parameters)

optim0 = TGAdamW(truegrad_parameters)
optim1 = torch.optim.AdamW(normal_parameters)

while True:
    input = torch.randn((16, 1))
    model(input).mean().backward()
    optim0.step()  # update both parameter sets separately
    optim1.step()
    optim0.zero_grad()  # clear accumulated gradients before the next iteration
    optim1.zero_grad()
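The example above uses the truegrad.nn.modules check. A minimal sketch of the other approach mentioned above, splitting parameters by type after truegrad.utils.patch_model(), could look as follows; it assumes TrueGradParameter can be imported from truegrad.nn as the class path above suggests and reuses the BERT model from the patching example. Treat it as an illustration rather than a definitive recipe:

import torch
import transformers
from truegrad.nn import TrueGradParameter
from truegrad.optim import TGAdamW
from truegrad.utils import patch_model

model = transformers.BertModel.from_pretrained("google/bert_uncased_L-2_H-128_A-2")
patch_model(model)  # converts torch.nn.Parameter to truegrad.nn.TrueGradParameter

# split parameters by type instead of by module class
truegrad_parameters = [p for p in model.parameters() if isinstance(p, TrueGradParameter)]
normal_parameters = [p for p in model.parameters() if not isinstance(p, TrueGradParameter)]

optim0 = TGAdamW(truegrad_parameters)
optim1 = torch.optim.AdamW(normal_parameters)
# train as in the loop above, stepping and zeroing both optimizers separately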
