Measure neural network device specific metrics (latency, flops, etc.)
Project description
torchprof
A minimal dependency library for layer-by-layer profiling of Pytorch models.
All metrics are derived using the PyTorch autograd profiler.
Quickstart
pip install torchprof
import torch
import torchvision
import torchprof
model = torchvision.models.alexnet(pretrained=False).cuda()
x = torch.rand([1, 3, 224, 224]).cuda()
with torchprof.Profile(model, use_cuda=True) as prof:
model(x)
print(prof.display(show_events=False)) # equivalent to `print(prof)` and `print(prof.display())`
Module | Self CPU total | CPU total | CUDA total
---------------|----------------|-----------|-----------
AlexNet | | |
├── features | | |
│ ├── 0 | 1.938ms | 7.639ms | 7.696ms
│ ├── 1 | 65.590us | 65.590us | 66.560us
│ ├── 2 | 117.789us | 191.029us | 164.864us
│ ├── 3 | 251.648us | 963.273us | 1.737ms
│ ├── 4 | 18.019us | 18.019us | 19.456us
│ ├── 5 | 30.349us | 53.739us | 54.272us
│ ├── 6 | 130.109us | 482.766us | 645.056us
│ ├── 7 | 17.250us | 17.250us | 18.336us
│ ├── 8 | 83.779us | 297.796us | 538.656us
│ ├── 9 | 16.840us | 16.840us | 17.408us
│ ├── 10 | 85.119us | 301.186us | 441.024us
│ ├── 11 | 16.910us | 16.910us | 17.408us
│ └── 12 | 28.240us | 49.630us | 49.280us
├── avgpool | 43.489us | 76.088us | 80.896us
└── classifier | | |
├── 0 | 626.506us | 1.240ms | 1.362ms
├── 1 | 235.148us | 235.148us | 648.192us
├── 2 | 18.360us | 18.360us | 19.360us
├── 3 | 30.770us | 54.640us | 55.296us
├── 4 | 39.189us | 39.189us | 209.920us
├── 5 | 16.430us | 16.430us | 17.408us
└── 6 | 38.270us | 38.270us | 79.648us
To see the low level operations that occur within each layer, print the contents of prof.display(show_events=True)
.
Module | Self CPU total | CPU total | CUDA total
----------------------------------|----------------|-----------|-----------
AlexNet | | |
├── features | | |
│ ├── 0 | | |
│ │ ├── conv2d | 17.070us | 1.938ms | 1.950ms
│ │ ├── convolution | 12.240us | 1.921ms | 1.935ms
│ │ ├── _convolution | 36.129us | 1.908ms | 1.923ms
│ │ ├── contiguous | 6.820us | 6.820us | 6.688us
│ │ └── cudnn_convolution | 1.865ms | 1.865ms | 1.882ms
│ ├── 1 | | |
│ │ └── relu_ | 65.590us | 65.590us | 66.560us
│ ├── 2 | | |
│ │ ├── max_pool2d | 44.549us | 117.789us | 91.136us
│ │ └── max_pool2d_with_indices | 73.240us | 73.240us | 73.728us
│ ├── 3 | | |
...
The original Pytorch EventList can be returned by calling raw()
on the profile instance.
trace, event_lists_dict = prof.raw()
print(trace[2])
# Trace(path=('AlexNet', 'features', '0'), leaf=True, module=Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2)))
print(event_lists_dict[trace[2].path][0])
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CUDA total % CUDA total CUDA time avg Number of Calls
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
conv2d 0.88% 17.070us 100.00% 1.938ms 1.938ms 25.34% 1.950ms 1.950ms 1
convolution 0.63% 12.240us 99.12% 1.921ms 1.921ms 25.14% 1.935ms 1.935ms 1
_convolution 1.86% 36.129us 98.49% 1.908ms 1.908ms 24.99% 1.923ms 1.923ms 1
contiguous 0.35% 6.820us 0.35% 6.820us 6.820us 0.09% 6.688us 6.688us 1
cudnn_convolution 96.27% 1.865ms 96.27% 1.865ms 1.865ms 24.45% 1.882ms 1.882ms 1
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
Self CPU time total: 1.938ms
CUDA time total: 7.696ms
LICENSE
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
torchprof-0.2.1.tar.gz
(7.3 kB
view details)
Built Distribution
File details
Details for the file torchprof-0.2.1.tar.gz
.
File metadata
- Download URL: torchprof-0.2.1.tar.gz
- Upload date:
- Size: 7.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1a44af5a752f33054fea60c07351c07d05edddb5b2b2522da5f44b02ae1e57e0 |
|
MD5 | da80392668073f44264be4294e230436 |
|
BLAKE2b-256 | 06504213d9d8006ff8fde082569b900ce8d8ef095888d10d0e70140f41fc4804 |
File details
Details for the file torchprof-0.2.1-py3-none-any.whl
.
File metadata
- Download URL: torchprof-0.2.1-py3-none-any.whl
- Upload date:
- Size: 8.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 0dc959be125e53bd6596bd317aa1f8f485f6ae4b9cf4f21bf70bb96629c8d932 |
|
MD5 | d5520468a1c972ab2054bc6055328cd1 |
|
BLAKE2b-256 | 0c6132b27cc00e12acef06df2494bf517a38d72efa7054b52d372c446771705e |