Measure neural network device-specific metrics (latency, FLOPs, etc.)

torchprof

A minimal-dependency library for layer-by-layer profiling of PyTorch models.

All metrics are derived using the PyTorch autograd profiler.

Quickstart

pip install torchprof

import torch
import torchvision
import torchprof

model = torchvision.models.alexnet(pretrained=False).cuda()
x = torch.rand([1, 3, 224, 224]).cuda()

with torchprof.Profile(model, use_cuda=True) as prof:
    model(x)

print(prof.display(show_events=False)) # equivalent to `print(prof)` and `print(prof.display())`
Module         | Self CPU total | CPU total | CUDA total
---------------|----------------|-----------|-----------
AlexNet        |                |           |
├── features   |                |           |
│├── 0         |        1.956ms |   7.714ms |    7.787ms
│├── 1         |       68.880us |  68.880us |   69.632us
│├── 2         |       85.639us | 155.948us |  155.648us
│├── 3         |      253.419us | 970.386us |    1.747ms
│├── 4         |       18.919us |  18.919us |   19.584us
│├── 5         |       30.910us |  54.900us |   55.296us
│├── 6         |      132.839us | 492.367us |  652.192us
│├── 7         |       17.990us |  17.990us |   18.432us
│├── 8         |       87.219us | 310.776us |  552.544us
│├── 9         |       17.620us |  17.620us |   17.536us
│├── 10        |       85.690us | 303.120us |  437.248us
│├── 11        |       17.910us |  17.910us |   18.400us
│└── 12        |       29.239us |  51.488us |   52.288us
├── avgpool    |       49.230us |  85.740us |   88.960us
└── classifier |                |           |
 ├── 0         |      626.236us |   1.239ms |    1.362ms
 ├── 1         |      235.669us | 235.669us |  635.008us
 ├── 2         |       17.990us |  17.990us |   18.432us
 ├── 3         |       31.890us |  56.770us |   57.344us
 ├── 4         |       39.280us |  39.280us |  212.128us
 ├── 5         |       16.800us |  16.800us |   17.600us
 └── 6         |       38.459us |  38.459us |   79.872us

To see the low-level operations that occur within each layer, print the contents of prof.display(show_events=True).

Module                        | Self CPU total | CPU total | CUDA total
------------------------------|----------------|-----------|-----------
AlexNet                       |                |           |
├── features                  |                |           |
│├── 0                        |                |           |
││├── conv2d                  |       15.740us |   1.956ms |    1.972ms
││├── convolution             |       12.000us |   1.940ms |    1.957ms
││├── _convolution            |       36.590us |   1.928ms |    1.946ms
││├── contiguous              |        6.600us |   6.600us |    6.464us
││└── cudnn_convolution       |        1.885ms |   1.885ms |    1.906ms
│├── 1                        |                |           |
││└── relu_                   |       68.880us |  68.880us |   69.632us
│├── 2                        |                |           |
││├── max_pool2d              |       15.330us |  85.639us |   84.992us
││└── max_pool2d_with_indices |       70.309us |  70.309us |   70.656us
│├── 3                        |                |           |
...
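
The two tables can be cross-checked by hand. Judging only from the sample output above (an observation about the printed numbers, not a statement about torchprof's internals), each layer row appears to be the column-wise sum of that layer's events. The sketch below uses a hypothetical helper, to_us, to parse the formatted durations, then recomputes the row for features.2 from its two events.

```python
def to_us(text):
    """Parse a formatted duration such as '85.639us' or '1.956ms' into microseconds."""
    # 'us' and 'ms' are checked before 's' because both also end in 's'.
    for suffix, scale in (("us", 1.0), ("ms", 1e3), ("s", 1e6)):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * scale
    raise ValueError(f"unrecognized duration: {text!r}")


# (Self CPU total, CPU total, CUDA total) for the events listed under features.2.
events = {
    "max_pool2d": ("15.330us", "85.639us", "84.992us"),
    "max_pool2d_with_indices": ("70.309us", "70.309us", "70.656us"),
}

# Sum each column across the layer's events, rounded to the table's precision.
column_sums = [
    round(sum(to_us(row[i]) for row in events.values()), 3) for i in range(3)
]
print(column_sums)
```

The three sums agree with the features.2 row of the summary table (85.639us, 155.948us, 155.648us). If that reading is right, the CPU total column of a layer row counts nested calls more than once, so Self CPU total is likely the better figure for comparing layers.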

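The original PyTorch EventList objects can be retrieved by calling raw() on the profile instance. It returns the list of traces together with a dictionary that maps each trace path to its event lists.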

trace, event_lists_dict = prof.raw()
print(trace[2])
# Trace(path=('AlexNet', 'features', '0'), leaf=True, module=Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2)))

print(event_lists_dict[trace[2].path][0])
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name                   Self CPU total %   Self CPU total      CPU total %        CPU total     CPU time avg     CUDA total %       CUDA total    CUDA time avg  Number of Calls
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
conv2d                           0.80%         15.740us          100.00%          1.956ms          1.956ms           25.32%          1.972ms          1.972ms                1
convolution                      0.61%         12.000us           99.20%          1.940ms          1.940ms           25.14%          1.957ms          1.957ms                1
_convolution                     1.87%         36.590us           98.58%          1.928ms          1.928ms           24.99%          1.946ms          1.946ms                1
contiguous                       0.34%          6.600us            0.34%          6.600us          6.600us            0.08%          6.464us          6.464us                1
cudnn_convolution               96.37%          1.885ms           96.37%          1.885ms          1.885ms           24.47%          1.906ms          1.906ms                1
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Self CPU time total: 1.956ms
CUDA time total: 7.787ms
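
The two return values could be combined into a per-layer summary along the following lines. This is a sketch under assumptions drawn from the output above: each Trace exposes path and leaf fields, and event_lists_dict maps a path to a list of EventList objects. self_cpu_time_total is a property of PyTorch's EventList, reported in microseconds. leaf_self_cpu_us is a hypothetical helper name, and the stand-in data at the bottom only mimics those shapes so the sketch runs without a profiling session.

```python
from collections import namedtuple
from types import SimpleNamespace


def leaf_self_cpu_us(trace, event_lists_dict):
    """Return {dotted layer path: self CPU time in us}, summed over recorded calls."""
    totals = {}
    for t in trace:
        # Skip container modules and layers that have no recorded events.
        if not t.leaf or t.path not in event_lists_dict:
            continue
        totals[".".join(t.path)] = sum(
            event_list.self_cpu_time_total for event_list in event_lists_dict[t.path]
        )
    return totals


# Stand-in data shaped like the values printed above (not real profiler objects).
Trace = namedtuple("Trace", ["path", "leaf", "module"])
demo_trace = [
    Trace(("AlexNet", "features"), False, None),
    Trace(("AlexNet", "features", "0"), True, None),
]
demo_events = {
    ("AlexNet", "features", "0"): [SimpleNamespace(self_cpu_time_total=1956.0)],
}
print(leaf_self_cpu_us(demo_trace, demo_events))
```

With a real profile, the call would be leaf_self_cpu_us(*prof.raw()).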

Individual layers can be selected for profiling with the optional paths kwarg. All other layers are skipped and show no measurements.

model = torchvision.models.alexnet(pretrained=False)
x = torch.rand([1, 3, 224, 224])

# A selected path does not have to point to a leaf layer
paths = [("AlexNet", "features", "3"), ("AlexNet", "classifier")]

with torchprof.Profile(model, paths=paths) as prof:
    model(x)

print(prof)
Module         | Self CPU total | CPU total | CUDA total
---------------|----------------|-----------|-----------
AlexNet        |                |           |           
├── features   |                |           |           
│├── 0         |                |           |           
│├── 1         |                |           |           
│├── 2         |                |           |           
│├── 3         |        2.846ms |  11.368ms |    0.000us
│├── 4         |                |           |           
│├── 5         |                |           |           
│├── 6         |                |           |           
│├── 7         |                |           |           
│├── 8         |                |           |           
│├── 9         |                |           |           
│├── 10        |                |           |           
│├── 11        |                |           |           
│└── 12        |                |           |           
├── avgpool    |                |           |           
└── classifier |       12.016ms |  12.206ms |    0.000us
 ├── 0         |                |           |           
 ├── 1         |                |           |           
 ├── 2         |                |           |           
 ├── 3         |                |           |           
 ├── 4         |                |           |           
 ├── 5         |                |           |           
 └── 6         |                |           |           
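
Path tuples can be written by hand, but they can also be derived from the dotted names that PyTorch's Module.named_modules() yields (for example "features.3"). The helper below, to_path, is a hypothetical convenience and not part of torchprof. It assumes the first path element is the model's class name, as in the ("AlexNet", ...) paths shown above.

```python
def to_path(root_name, dotted_name):
    """Turn a dotted module name such as 'features.3' into a path tuple."""
    # named_modules() yields '' for the root module itself.
    if not dotted_name:
        return (root_name,)
    return (root_name, *dotted_name.split("."))


paths = [to_path("AlexNet", name) for name in ("features.3", "classifier")]
print(paths)
```

With a live model, root_name would be type(model).__name__ and the dotted names would come from model.named_modules().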

LICENSE

MIT
