Debug PyTorch code using PySnooper.
TorchSnooper
Do you want to look at the shape, dtype, and other properties of every step of your model, but are tired of manually writing prints?
Are you bothered by errors like RuntimeError: Expected object of scalar type Double but got scalar type Float, and want to quickly figure out the problem?
TorchSnooper is a PySnooper extension that helps you debug these errors.
To use TorchSnooper, use it just as you would use PySnooper: replace pysnooper.snoop with torchsnooper.snoop in your code.
To install:
pip install torchsnooper
Example 1
We're writing a simple function:
def myfunc(mask, x):
    y = torch.zeros(6)
    y.masked_scatter_(mask, x)
    return y
and use it as follows:
mask = torch.tensor([0, 1, 0, 1, 1, 0], device='cuda')
source = torch.tensor([1.0, 2.0, 3.0], device='cuda')
y = myfunc(mask, source)
The above code seems to be correct, but unfortunately, we are getting the following error:
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'mask'
What is the problem? Let's snoop it! Decorate our function with torchsnooper.snoop():
import torch
import torchsnooper

@torchsnooper.snoop()
def myfunc(mask, x):
    y = torch.zeros(6)
    y.masked_scatter_(mask, x)
    return y

mask = torch.tensor([0, 1, 0, 1, 1, 0], device='cuda')
source = torch.tensor([1.0, 2.0, 3.0], device='cuda')
y = myfunc(mask, source)
Run our script, and we will see:
Starting var:.. mask = tensor<(6,), int64, cuda:0>
Starting var:.. x = tensor<(3,), float32, cuda:0>
21:41:42.941668 call 5 def myfunc(mask, x):
21:41:42.941834 line 6 y = torch.zeros(6)
New var:....... y = tensor<(6,), float32, cpu>
21:41:42.943443 line 7 y.masked_scatter_(mask, x)
21:41:42.944404 exception 7 y.masked_scatter_(mask, x)
Paying attention to the devices of the tensors, we notice
New var:....... y = tensor<(6,), float32, cpu>
It is now clear that the problem is that y is a tensor on the CPU: we forgot to specify the device in y = torch.zeros(6). Changing it to y = torch.zeros(6, device='cuda') solves this problem.
But when we run the script again, we get another error:
RuntimeError: Expected object of scalar type Byte but got scalar type Long for argument #2 'mask'
Look at the trace above again. Paying attention to the dtype of the variables, we notice
Starting var:.. mask = tensor<(6,), int64, cuda:0>
The problem is that we didn't make the mask in the input a byte tensor. Changing the line to
mask = torch.tensor([0, 1, 0, 1, 1, 0], device='cuda', dtype=torch.uint8)
solves the problem.
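The two fixes from this example can be combined into one corrected function. The sketch below makes two assumptions that are not in the original: it allocates y on mask.device instead of hard-coding 'cuda', so it also runs on a CPU-only machine, and it converts the mask to torch.bool, which more recent PyTorch releases expect in place of the uint8 mask used above.

```python
import torch

def myfunc(mask, x):
    # Assumption: allocate y on the mask's device rather than hard-coding 'cuda'.
    y = torch.zeros(6, device=mask.device)
    # Assumption: use a boolean mask, which recent PyTorch releases expect
    # (the original text uses dtype=torch.uint8).
    y.masked_scatter_(mask.to(torch.bool), x)
    return y

# Fall back to the CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
mask = torch.tensor([0, 1, 0, 1, 1, 0], device=device)
source = torch.tensor([1.0, 2.0, 3.0], device=device)
y = myfunc(mask, source)
print(y)
```

The values of source are copied, in order, into the positions of y where the mask is set. All other positions keep their initial zero.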
Example 2
We are building a linear model
class Model(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(2, 1)

    def forward(self, x):
        return self.layer(x)
and we want to fit y = x1 + 2 * x2 + 3, so we create a dataset:
x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = torch.tensor([3.0, 5.0, 4.0, 6.0])
We train our model on this dataset using the SGD optimizer:
model = Model()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(10):
    optimizer.zero_grad()
    pred = model(x)
    squared_diff = (y - pred) ** 2
    loss = squared_diff.mean()
    print(loss.item())
    loss.backward()
    optimizer.step()
But unfortunately, the loss does not go down to a low enough number.
What's wrong? Let's snoop it! Put the training loop inside snoop:
with torchsnooper.snoop():
    for _ in range(100):
        optimizer.zero_grad()
        pred = model(x)
        squared_diff = (y - pred) ** 2
        loss = squared_diff.mean()
        print(loss.item())
        loss.backward()
        optimizer.step()
Part of the trace looks like:
New var:....... x = tensor<(4, 2), float32, cpu>
New var:....... y = tensor<(4,), float32, cpu>
New var:....... model = Model( (layer): Linear(in_features=2, out_features=1, bias=True))
New var:....... optimizer = SGD (Parameter Group 0 dampening: 0 lr: 0....omentum: 0 nesterov: False weight_decay: 0)
22:27:01.024233 line 21 for _ in range(100):
New var:....... _ = 0
22:27:01.024439 line 22 optimizer.zero_grad()
22:27:01.024574 line 23 pred = model(x)
New var:....... pred = tensor<(4, 1), float32, cpu, grad>
22:27:01.026442 line 24 squared_diff = (y - pred) ** 2
New var:....... squared_diff = tensor<(4, 4), float32, cpu, grad>
22:27:01.027369 line 25 loss = squared_diff.mean()
New var:....... loss = tensor<(), float32, cpu, grad>
22:27:01.027616 line 26 print(loss.item())
22:27:01.027793 line 27 loss.backward()
22:27:01.050189 line 28 optimizer.step()
We notice that y has shape (4,), but pred has shape (4, 1). As a result, squared_diff has shape (4, 4) due to broadcasting!
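The broadcasting effect can be reproduced in isolation. The sketch below uses stand-in values for pred (they are illustrative, not model outputs) and shows that subtracting a (4, 1) tensor from a (4,) tensor pairs every prediction with every target.

```python
import torch

y = torch.tensor([3.0, 5.0, 4.0, 6.0])              # shape (4,)
pred = torch.tensor([[0.0], [1.0], [2.0], [3.0]])   # shape (4, 1), illustrative values

# y is treated as shape (1, 4) and pred as (4, 1), so the result is (4, 4):
# diff[i, j] == y[j] - pred[i, 0]
diff = y - pred
print(diff.shape)
print(diff[1, 2].item())  # y[2] - pred[1, 0]
```

Taking the mean of this matrix averages 16 mismatched pairs instead of the 4 intended ones, so the loss being minimized is not the one we meant to write.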
This is not the expected behavior. Let's fix it with pred = model(x).squeeze(). Now everything looks good:
New var:....... x = tensor<(4, 2), float32, cpu>
New var:....... y = tensor<(4,), float32, cpu>
New var:....... model = Model( (layer): Linear(in_features=2, out_features=1, bias=True))
New var:....... optimizer = SGD (Parameter Group 0 dampening: 0 lr: 0....omentum: 0 nesterov: False weight_decay: 0)
22:28:19.778089 line 21 for _ in range(100):
New var:....... _ = 0
22:28:19.778293 line 22 optimizer.zero_grad()
22:28:19.778436 line 23 pred = model(x).squeeze()
New var:....... pred = tensor<(4,), float32, cpu, grad>
22:28:19.780250 line 24 squared_diff = (y - pred) ** 2
New var:....... squared_diff = tensor<(4,), float32, cpu, grad>
22:28:19.781099 line 25 loss = squared_diff.mean()
New var:....... loss = tensor<(), float32, cpu, grad>
22:28:19.781361 line 26 print(loss.item())
22:28:19.781537 line 27 loss.backward()
22:28:19.798983 line 28 optimizer.step()
And the final model converges to the desired values.
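One way to catch this kind of shape bug without reading a trace is to assert that the prediction and target shapes match before computing the loss. The sketch below is a self-contained variant of the training loop under a few assumptions that are not in the original: it uses torch.nn.Linear directly instead of the Model class, fixes the random seed, records the losses in a list, and uses squeeze(-1) rather than squeeze() so that only the trailing dimension is dropped.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so runs are repeatable

x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = torch.tensor([3.0, 5.0, 4.0, 6.0])

model = torch.nn.Linear(2, 1)  # stand-in for the Model class above
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for _ in range(100):
    optimizer.zero_grad()
    pred = model(x).squeeze(-1)  # (4, 1) -> (4,)
    # Guard against silent broadcasting: fail loudly if shapes differ.
    assert pred.shape == y.shape, (pred.shape, y.shape)
    loss = ((y - pred) ** 2).mean()
    losses.append(loss.item())
    loss.backward()
    optimizer.step()

print(losses[0], losses[-1])
```

If the squeeze is removed, the assertion fails on the first iteration and reports both shapes, which points at the same mismatch the trace revealed.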