Debug PyTorch code using PySnooper.
Project description
TorchSnooper
Status:
Deploy (only run on release):
Do you want to look at the shape/dtype/etc. of every step of you model, but tired of manually writing prints?
Are you bothered by errors like RuntimeError: Expected object of scalar type Double but got scalar type Float
, and want to quickly figure out the problem?
TorchSnooper is a PySnooper extension that helps you debugging these errors.
To use TorchSnooper, you just use it like using PySnooper. Remember to replace the pysnooper.snoop
with torchsnooper.snoop
in your code.
To install:
pip install torchsnooper
TorchSnooper also support snoop. To use TorchSnooper with snoop, simply execute:
torchsnooper.register_snoop()
or
torchsnooper.register_snoop(verbose=True)
at the beginning, and use snoop normally.
Example 1: Monitoring device and dtype
We're writing a simple function:
def myfunc(mask, x): y = torch.zeros(6) y.masked_scatter_(mask, x) return y
and use it like below
mask = torch.tensor([0, 1, 0, 1, 1, 0], device='cuda') source = torch.tensor([1.0, 2.0, 3.0], device='cuda') y = myfunc(mask, source)
The above code seems to be correct, but unfortunately, we are getting the following error:
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'mask'
What is the problem? Let's snoop it! Decorate our function with torchsnooper.snoop()
:
import torch import torchsnooper @torchsnooper.snoop() def myfunc(mask, x): y = torch.zeros(6) y.masked_scatter_(mask, x) return y mask = torch.tensor([0, 1, 0, 1, 1, 0], device='cuda') source = torch.tensor([1.0, 2.0, 3.0], device='cuda') y = myfunc(mask, source)
Run our script, and we will see:
Starting var:.. mask = tensor<(6,), int64, cuda:0>
Starting var:.. x = tensor<(3,), float32, cuda:0>
21:41:42.941668 call 5 def myfunc(mask, x):
21:41:42.941834 line 6 y = torch.zeros(6)
New var:....... y = tensor<(6,), float32, cpu>
21:41:42.943443 line 7 y.masked_scatter_(mask, x)
21:41:42.944404 exception 7 y.masked_scatter_(mask, x)
Now pay attention to the devices of tensors, we notice
New var:....... y = tensor<(6,), float32, cpu>
Now, it's clear that, the problem is because y
is a tensor on CPU, that is,
we forget to specify the device on y = torch.zeros(6)
. Changing it to
y = torch.zeros(6, device='cuda')
, this problem is solved.
But when running the script again we are getting another error:
RuntimeError: Expected object of scalar type Byte but got scalar type Long for argument #2 'mask'
Look at the trace above again, pay attention to the dtype of variables, we notice
Starting var:.. mask = tensor<(6,), int64, cuda:0>
OK, the problem is that, we didn't make the mask
in the input a byte tensor. Changing the line into
mask = torch.tensor([0, 1, 0, 1, 1, 0], device='cuda', dtype=torch.uint8)
Problem solved.
Example 2: Monitoring shape
We are building a linear model
class Model(torch.nn.Module): def __init__(self): super().__init__() self.layer = torch.nn.Linear(2, 1) def forward(self, x): return self.layer(x)
and we want to fit y = x1 + 2 * x2 + 3
, so we create a dataset:
x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]) y = torch.tensor([3.0, 5.0, 4.0, 6.0])
We train our model on this dataset using SGD optimizer:
model = Model() optimizer = torch.optim.SGD(model.parameters(), lr=0.1) for _ in range(10): optimizer.zero_grad() pred = model(x) squared_diff = (y  pred) ** 2 loss = squared_diff.mean() print(loss.item()) loss.backward() optimizer.step()
But unfortunately, the loss does not go down to a low enough number.
What's wrong? Let's snoop it! Putting the training loop inside snoop:
with torchsnooper.snoop(): for _ in range(100): optimizer.zero_grad() pred = model(x) squared_diff = (y  pred) ** 2 loss = squared_diff.mean() print(loss.item()) loss.backward() optimizer.step()
Part of the trace looks like:
New var:....... x = tensor<(4, 2), float32, cpu>
New var:....... y = tensor<(4,), float32, cpu>
New var:....... model = Model( (layer): Linear(in_features=2, out_features=1, bias=True))
New var:....... optimizer = SGD (Parameter Group 0 dampening: 0 lr: 0....omentum: 0 nesterov: False weight_decay: 0)
22:27:01.024233 line 21 for _ in range(100):
New var:....... _ = 0
22:27:01.024439 line 22 optimizer.zero_grad()
22:27:01.024574 line 23 pred = model(x)
New var:....... pred = tensor<(4, 1), float32, cpu, grad>
22:27:01.026442 line 24 squared_diff = (y  pred) ** 2
New var:....... squared_diff = tensor<(4, 4), float32, cpu, grad>
22:27:01.027369 line 25 loss = squared_diff.mean()
New var:....... loss = tensor<(), float32, cpu, grad>
22:27:01.027616 line 26 print(loss.item())
22:27:01.027793 line 27 loss.backward()
22:27:01.050189 line 28 optimizer.step()
We notice that, y
has shape (4,)
, but pred
has shape (4, 1)
. As a result, squared_diff
has shape (4, 4)
due to broadcasting!
This is not the expected behavior, let's fix it: pred = model(x).squeeze()
, now everything looks good:
New var:....... x = tensor<(4, 2), float32, cpu>
New var:....... y = tensor<(4,), float32, cpu>
New var:....... model = Model( (layer): Linear(in_features=2, out_features=1, bias=True))
New var:....... optimizer = SGD (Parameter Group 0 dampening: 0 lr: 0....omentum: 0 nesterov: False weight_decay: 0)
22:28:19.778089 line 21 for _ in range(100):
New var:....... _ = 0
22:28:19.778293 line 22 optimizer.zero_grad()
22:28:19.778436 line 23 pred = model(x).squeeze()
New var:....... pred = tensor<(4,), float32, cpu, grad>
22:28:19.780250 line 24 squared_diff = (y  pred) ** 2
New var:....... squared_diff = tensor<(4,), float32, cpu, grad>
22:28:19.781099 line 25 loss = squared_diff.mean()
New var:....... loss = tensor<(), float32, cpu, grad>
22:28:19.781361 line 26 print(loss.item())
22:28:19.781537 line 27 loss.backward()
22:28:19.798983 line 28 optimizer.step()
And the final model converge to the desired values.
Example 3: Monitoring nan and inf
Let's say we have a model that output the likelihood of something. For this example, we will just use a mock:
class MockModel(torch.nn.Module): def __init__(self): super(MockModel, self).__init__() self.unused = torch.nn.Linear(6, 4) def forward(self, x): return torch.tensor([0.0, 0.25, 0.9, 0.75]) + self.unused(x) * 0.0 model = MockModel()
During training, we want to minimize the negative log likelihood, we have code:
for epoch in range(100): batch_input = torch.randn(6, 6) likelihood = model(batch_input) log_likelihood = likelihood.log() target = log_likelihood.mean() print(target.item()) optimizer.zero_grad() target.backward() optimizer.step()
Unfortunately, we first get inf
then nan
for our target during training. What's wrong? Let's snoop it:
with torchsnooper.snoop(): for epoch in range(100): batch_input = torch.randn(6, 6) likelihood = model(batch_input) log_likelihood = likelihood.log() target = log_likelihood.mean() print(target.item()) optimizer.zero_grad() target.backward() optimizer.step()
We will see the part of the output of the snoop looks like:
19:30:20.928316 line 18 for epoch in range(100):
New var:....... epoch = 0
19:30:20.928575 line 19 batch_input = torch.randn(6, 6)
New var:....... batch_input = tensor<(6, 6), float32, cpu>
19:30:20.929671 line 20 likelihood = model(batch_input)
New var:....... likelihood = tensor<(6, 4), float32, cpu, grad>
19:30:20.930284 line 21 log_likelihood = likelihood.log()
New var:....... log_likelihood = tensor<(6, 4), float32, cpu, grad, has_inf>
19:30:20.930672 line 22 target = log_likelihood.mean()
New var:....... target = tensor<(), float32, cpu, grad, has_inf>
19:30:20.931136 line 23 print(target.item())
19:30:20.931508 line 25 optimizer.zero_grad()
19:30:20.931871 line 26 target.backward()
inf
19:30:20.960028 line 27 optimizer.step()
19:30:20.960673 line 18 for epoch in range(100):
Modified var:.. epoch = 1
19:30:20.961043 line 19 batch_input = torch.randn(6, 6)
19:30:20.961423 line 20 likelihood = model(batch_input)
Modified var:.. likelihood = tensor<(6, 4), float32, cpu, grad, has_nan>
19:30:20.961910 line 21 log_likelihood = likelihood.log()
Modified var:.. log_likelihood = tensor<(6, 4), float32, cpu, grad, has_nan>
19:30:20.962302 line 22 target = log_likelihood.mean()
Modified var:.. target = tensor<(), float32, cpu, grad, has_nan>
19:30:20.962715 line 23 print(target.item())
19:30:20.963089 line 25 optimizer.zero_grad()
19:30:20.963464 line 26 target.backward()
19:30:20.964051 line 27 optimizer.step()
Reading the output, we find that, at the first epoch (epoch = 0
), the log_likelihood
has has_inf
flag.
The has_inf
flag means, your tensor contains inf
in its value. The same flag appears for target
.
And at the second epoch, starting from likelihood
, tensors all have a has_nan
flag.
From our experience with deep learning, we would guess this is because the first epoch has inf
, which causes
the gradient to be nan
, and when parameters are updated, these nan
propagate to parameters and causing all
future steps to have nan
result.
Taking a deeper look, we figure out that the likelihood
contains 0 in it, which leads to log(0) = inf
. Changing
the line
log_likelihood = likelihood.log()
into
log_likelihood = likelihood.clamp(min=1e8).log()
Problem solved.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Filename, size  File type  Python version  Upload date  Hashes 

Filename, size TorchSnooper0.8.linuxx86_64.tar.gz (9.5 kB)  File type Source  Python version None  Upload date  Hashes View hashes 
Filename, size TorchSnooper0.8py3noneany.whl (7.2 kB)  File type Wheel  Python version py3  Upload date  Hashes View hashes 
Hashes for TorchSnooper0.8.linuxx86_64.tar.gz
Algorithm  Hash digest  

SHA256  baa441da053f9ce943f2862c3fcf983f90b23f0006ed7cafe7403606e5067de2 

MD5  b80acd6ae1342b1a351d45c2ac59cab2 

BLAKE2256  4f787a8da20a9bb5883825ba3ce0bd4f646dd1d7268e9d0a8fc9f2b3488508a1 
Hashes for TorchSnooper0.8py3noneany.whl
Algorithm  Hash digest  

SHA256  5b5c7a8cd6952c4b9dada9a9502baa9ef28410747c97e6702eef1330b715115a 

MD5  bf923c0e21a17bf8001ab80a62953616 

BLAKE2256  8c317ce9965aa20d00394733dc3614b929065f402fcbcbee6fba300d0df3b061 