Skip to main content

Measure neural network device specific metrics (latency, flops, etc.)

Project description

torchprof

PyPI version CircleCI

A minimal dependency library for layer-by-layer profiling of PyTorch models.

All metrics are derived using the PyTorch autograd profiler.

Quickstart

pip install torchprof

import torch
import torchvision
import torchprof

model = torchvision.models.alexnet(pretrained=False).cuda()
x = torch.rand([1, 3, 224, 224]).cuda()

# `profile_memory` was added in PyTorch 1.6, this will output a runtime warning if unsupported.
with torchprof.Profile(model, use_cuda=True, profile_memory=True) as prof:
    model(x)

# equivalent to `print(prof)` and `print(prof.display())`
print(prof.display(show_events=False))
Module         | Self CPU total | CPU total | Self CUDA total | CUDA total | Self CPU Mem | CPU Mem | Self CUDA Mem | CUDA Mem  | Number of Calls
---------------|----------------|-----------|-----------------|------------|--------------|---------|---------------|-----------|----------------
AlexNet        |                |           |                 |            |              |         |               |           |
├── features   |                |           |                 |            |              |         |               |           |
│├── 0         | 1.831ms        | 7.260ms   | 1.830ms         | 7.230ms    | 0 b          | 0 b     | 3.71 Mb       | 756.50 Kb | 1
│├── 1         | 46.768us       | 68.950us  | 46.976us        | 70.528us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
│├── 2         | 80.361us       | 166.213us | 79.872us        | 149.696us  | 0 b          | 0 b     | 1.60 Mb       | 547.00 Kb | 1
│├── 3         | 277.412us      | 1.205ms   | 492.544us       | 1.932ms    | 0 b          | 0 b     | 2.68 Mb       | 547.00 Kb | 1
│├── 4         | 28.274us       | 40.156us  | 27.872us        | 41.184us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
│├── 5         | 57.138us       | 124.176us | 56.512us        | 109.536us  | 0 b          | 0 b     | 1.11 Mb       | 380.50 Kb | 1
│├── 6         | 173.517us      | 674.434us | 210.880us       | 809.824us  | 0 b          | 0 b     | 8.27 Mb       | 253.50 Kb | 1
│├── 7         | 27.382us       | 38.754us  | 27.648us        | 39.936us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
│├── 8         | 144.863us      | 556.345us | 207.872us       | 798.368us  | 0 b          | 0 b     | 10.20 Mb      | 169.00 Kb | 1
│├── 9         | 27.552us       | 39.224us  | 26.752us        | 39.072us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
│├── 10        | 138.752us      | 531.703us | 173.056us       | 661.568us  | 0 b          | 0 b     | 7.08 Mb       | 169.00 Kb | 1
│├── 11        | 27.743us       | 39.515us  | 27.648us        | 39.936us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
│└── 12        | 60.333us       | 133.099us | 59.392us        | 116.768us  | 0 b          | 0 b     | 324.00 Kb     | 108.00 Kb | 1
├── avgpool    | 55.655us       | 110.770us | 57.344us        | 107.456us  | 0 b          | 0 b     | 108.00 Kb     | 36.00 Kb  | 1
└── classifier |                |           |                 |            |              |         |               |           |
 ├── 0         | 77.746us       | 165.089us | 77.696us        | 144.064us  | 0 b          | 0 b     | 171.00 Kb     | 45.00 Kb  | 1
 ├── 1         | 405.262us      | 425.012us | 796.672us       | 796.672us  | 0 b          | 0 b     | 32.00 Kb      | 16.00 Kb  | 1
 ├── 2         | 29.455us       | 42.329us  | 29.472us        | 42.976us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
 ├── 3         | 53.601us       | 120.870us | 53.248us        | 99.328us   | 0 b          | 0 b     | 76.00 Kb      | 20.00 Kb  | 1
 ├── 4         | 63.981us       | 79.811us  | 232.448us       | 232.448us  | 0 b          | 0 b     | 32.00 Kb      | 16.00 Kb  | 1
 ├── 5         | 27.853us       | 39.445us  | 27.648us        | 40.928us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
 └── 6         | 61.656us       | 76.714us  | 96.256us        | 96.256us   | 0 b          | 0 b     | 8.00 Kb       | 4.00 Kb   | 1

To see the low level operations that occur within each layer, print the contents of prof.display(show_events=True).

Module                              | Self CPU total | CPU total | Self CUDA total | CUDA total | Self CPU Mem | CPU Mem | Self CUDA Mem | CUDA Mem  | Number of Calls
------------------------------------|----------------|-----------|-----------------|------------|--------------|---------|---------------|-----------|----------------
AlexNet                             |                |           |                 |            |              |         |               |           |
├── features                        |                |           |                 |            |              |         |               |           |
│├── 0                              |                |           |                 |            |              |         |               |           |
││├── aten::conv2d                  | 15.779us       | 1.831ms   | 14.336us        | 1.830ms    | 0 b          | 0 b     | 756.50 Kb     | 0 b       | 1
││├── aten::convolution             | 10.139us       | 1.815ms   | 8.512us         | 1.816ms    | 0 b          | 0 b     | 756.50 Kb     | 0 b       | 1
││├── aten::_convolution            | 45.115us       | 1.805ms   | 36.288us        | 1.808ms    | 0 b          | 0 b     | 756.50 Kb     | 0 b       | 1
││├── aten::contiguous              | 8.586us        | 8.586us   | 8.160us         | 8.160us    | 0 b          | 0 b     | 0 b           | 0 b       | 3
││├── aten::cudnn_convolution       | 1.646ms        | 1.682ms   | 1.745ms         | 1.749ms    | 0 b          | 0 b     | 756.50 Kb     | -18.00 Kb | 1
││├── aten::empty                   | 21.821us       | 21.821us  | 0.000us         | 0.000us    | 0 b          | 0 b     | 774.50 Kb     | 774.50 Kb | 2
││├── aten::resize_                 | 7.324us        | 7.324us   | 0.000us         | 0.000us    | 0 b          | 0 b     | 0 b           | 0 b       | 2
││├── aten::stride                  | 2.073us        | 2.073us   | 0.000us         | 0.000us    | 0 b          | 0 b     | 0 b           | 0 b       | 4
││├── aten::reshape                 | 5.701us        | 17.603us  | 1.056us         | 1.056us    | 0 b          | 0 b     | 0 b           | 0 b       | 1
││├── aten::view                    | 11.902us       | 11.902us  | 0.000us         | 0.000us    | 0 b          | 0 b     | 0 b           | 0 b       | 1
││└── aten::add_                    | 56.837us       | 56.837us  | 17.408us        | 17.408us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
│├── 1                              |                |           |                 |            |              |         |               |           |
││├── aten::relu_                   | 24.586us       | 46.768us  | 23.424us        | 46.976us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
││└── aten::threshold_              | 22.182us       | 22.182us  | 23.552us        | 23.552us   | 0 b          | 0 b     | 0 b           | 0 b       | 1
│├── 2                              |                |           |                 |            |              |         |               |           |
...

The original Pytorch EventList can be returned by calling raw() on the profile instance.

trace, event_lists_dict = prof.raw()
print(trace[2])
# Trace(path=('AlexNet', 'features', '0'), leaf=True, module=Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2)))

print(event_lists_dict[trace[2].path][0])
---------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                       Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg       CPU Mem  Self CPU Mem      CUDA Mem  Self CUDA Mem    # of Calls  
---------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
               aten::conv2d         0.86%      15.779us       100.00%       1.831ms       1.831ms      14.336us         0.78%       1.830ms       1.830ms           0 b           0 b     756.50 Kb           0 b             1  
          aten::convolution         0.55%      10.139us        99.14%       1.815ms       1.815ms       8.512us         0.47%       1.816ms       1.816ms           0 b           0 b     756.50 Kb           0 b             1  
         aten::_convolution         2.46%      45.115us        98.58%       1.805ms       1.805ms      36.288us         1.98%       1.808ms       1.808ms           0 b           0 b     756.50 Kb           0 b             1  
           aten::contiguous         0.20%       3.697us         0.20%       3.697us       3.697us       3.616us         0.20%       3.616us       3.616us           0 b           0 b           0 b           0 b             1  
    aten::cudnn_convolution        89.88%       1.646ms        91.85%       1.682ms       1.682ms       1.745ms        95.31%       1.749ms       1.749ms           0 b           0 b     756.50 Kb     -18.00 Kb             1  
                aten::empty         0.67%      12.313us         0.67%      12.313us      12.313us       0.000us         0.00%       0.000us       0.000us           0 b           0 b     756.50 Kb     756.50 Kb             1  
           aten::contiguous         0.14%       2.575us         0.14%       2.575us       2.575us       2.464us         0.13%       2.464us       2.464us           0 b           0 b           0 b           0 b             1  
              aten::resize_         0.37%       6.843us         0.37%       6.843us       6.843us       0.000us         0.00%       0.000us       0.000us           0 b           0 b           0 b           0 b             1  
           aten::contiguous         0.13%       2.314us         0.13%       2.314us       2.314us       2.080us         0.11%       2.080us       2.080us           0 b           0 b           0 b           0 b             1  
              aten::resize_         0.03%       0.481us         0.03%       0.481us       0.481us       0.000us         0.00%       0.000us       0.000us           0 b           0 b           0 b           0 b             1  
               aten::stride         0.07%       1.203us         0.07%       1.203us       1.203us       0.000us         0.00%       0.000us       0.000us           0 b           0 b           0 b           0 b             1  
               aten::stride         0.02%       0.300us         0.02%       0.300us       0.300us       0.000us         0.00%       0.000us       0.000us           0 b           0 b           0 b           0 b             1  
               aten::stride         0.02%       0.290us         0.02%       0.290us       0.290us       0.000us         0.00%       0.000us       0.000us           0 b           0 b           0 b           0 b             1  
               aten::stride         0.02%       0.280us         0.02%       0.280us       0.280us       0.000us         0.00%       0.000us       0.000us           0 b           0 b           0 b           0 b             1  
                aten::empty         0.52%       9.508us         0.52%       9.508us       9.508us       0.000us         0.00%       0.000us       0.000us           0 b           0 b      18.00 Kb      18.00 Kb             1  
              aten::reshape         0.31%       5.701us         0.96%      17.603us      17.603us       1.056us         0.06%       1.056us       1.056us           0 b           0 b           0 b           0 b             1  
                 aten::view         0.65%      11.902us         0.65%      11.902us      11.902us       0.000us         0.00%       0.000us       0.000us           0 b           0 b           0 b           0 b             1  
                 aten::add_         3.10%      56.837us         3.10%      56.837us      56.837us      17.408us         0.95%      17.408us      17.408us           0 b           0 b           0 b           0 b             1  
---------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 1.831ms
CUDA time total: 1.830ms

Layers can be selected for individually using the optional paths kwarg. Profiling is ignored for all other layers.

model = torchvision.models.alexnet(pretrained=False)
x = torch.rand([1, 3, 224, 224])

# Layer does not have to be a leaf layer
paths = [("AlexNet", "features", "3"), ("AlexNet", "classifier")]

with torchprof.Profile(model, paths=paths) as prof:
    model(x)

print(prof)
Module         | Self CPU total | CPU total | Number of Calls
---------------|----------------|-----------|----------------
AlexNet        |                |           |
├── features   |                |           |
│├── 0         |                |           |
│├── 1         |                |           |
│├── 2         |                |           |
│├── 3         | 2.079ms        | 8.296ms   | 1
│├── 4         |                |           |
│├── 5         |                |           |
│├── 6         |                |           |
│├── 7         |                |           |
│├── 8         |                |           |
│├── 9         |                |           |
│├── 10        |                |           |
│├── 11        |                |           |
│└── 12        |                |           |
├── avgpool    |                |           |
└── classifier | 10.734ms       | 11.282ms  | 1
 ├── 0         |                |           |
 ├── 1         |                |           |
 ├── 2         |                |           |
 ├── 3         |                |           |
 ├── 4         |                |           |
 ├── 5         |                |           |
 └── 6         |                |           |

Citation

If this software is useful to your research, I would greatly appreciate a citation in your work.

@misc{awwong1-torchprof,
  title        = {torchprof},
  author       = {Alexander William Wong},
  month        = 12,
  year         = 2020,
  url          = {https://github.com/awwong1/torchprof}
  note         = {https://github.com/awwong1/torchprof}
}

LICENSE

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

torchprof-1.3.2.tar.gz (12.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

torchprof-1.3.2-py3-none-any.whl (11.1 kB view details)

Uploaded Python 3

File details

Details for the file torchprof-1.3.2.tar.gz.

File metadata

  • Download URL: torchprof-1.3.2.tar.gz
  • Upload date:
  • Size: 12.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.53.0 CPython/3.7.3

File hashes

Hashes for torchprof-1.3.2.tar.gz
Algorithm Hash digest
SHA256 d7caecb5c0e7199bb67e429910fb8e340406e16a906535d3fb12793f55f0f463
MD5 c37fb69517f7f1614d1b83e6edad95d8
BLAKE2b-256 b7a3f2f34f638827da7d42b966ca99080c2693649d25bab2aeb3ecc218c92a98

See more details on using hashes here.

File details

Details for the file torchprof-1.3.2-py3-none-any.whl.

File metadata

  • Download URL: torchprof-1.3.2-py3-none-any.whl
  • Upload date:
  • Size: 11.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.53.0 CPython/3.7.3

File hashes

Hashes for torchprof-1.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 dd823e52cade5bc4d766c6845cece3388234491a27bdab8e05ce44d7e953c073
MD5 86b145c08a1b99ac7c51a1f55cbb8a38
BLAKE2b-256 429f5f0a713cfc91fd15ed87d8acfa172b6ee3273a6335853f5fd5cf5403a302

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page