Skip to main content

Measure neural network device specific metrics (latency, flops, etc.)

Project description

torchprof

PyPI version

A minimal dependency library for layer-by-layer profiling of Pytorch models.

All metrics are derived using the PyTorch autograd profiler.

Quickstart

pip install torchprof

import torch
import torchvision
import torchprof

model = torchvision.models.alexnet(pretrained=False).cuda()
x = torch.rand([1, 3, 224, 224]).cuda()

with torchprof.Profile(model, use_cuda=True) as prof:
    model(x)

print(prof.display(show_events=False)) # equivalent to `print(prof)` and `print(prof.display())`
Module         | Self CPU total | CPU total | CUDA total
---------------|----------------|-----------|-----------
AlexNet        |                |           |
├── features   |                |           |
│├── 0         |        1.956ms |   7.714ms |    7.787ms
│├── 1         |       68.880us |  68.880us |   69.632us
│├── 2         |       85.639us | 155.948us |  155.648us
│├── 3         |      253.419us | 970.386us |    1.747ms
│├── 4         |       18.919us |  18.919us |   19.584us
│├── 5         |       30.910us |  54.900us |   55.296us
│├── 6         |      132.839us | 492.367us |  652.192us
│├── 7         |       17.990us |  17.990us |   18.432us
│├── 8         |       87.219us | 310.776us |  552.544us
│├── 9         |       17.620us |  17.620us |   17.536us
│├── 10        |       85.690us | 303.120us |  437.248us
│├── 11        |       17.910us |  17.910us |   18.400us
│└── 12        |       29.239us |  51.488us |   52.288us
├── avgpool    |       49.230us |  85.740us |   88.960us
└── classifier |                |           |
 ├── 0         |      626.236us |   1.239ms |    1.362ms
 ├── 1         |      235.669us | 235.669us |  635.008us
 ├── 2         |       17.990us |  17.990us |   18.432us
 ├── 3         |       31.890us |  56.770us |   57.344us
 ├── 4         |       39.280us |  39.280us |  212.128us
 ├── 5         |       16.800us |  16.800us |   17.600us
 └── 6         |       38.459us |  38.459us |   79.872us

To see the low level operations that occur within each layer, print the contents of prof.display(show_events=True).

Module                        | Self CPU total | CPU total | CUDA total
------------------------------|----------------|-----------|-----------
AlexNet                       |                |           |
├── features                  |                |           |
│├── 0                        |                |           |
││├── conv2d                  |       15.740us |   1.956ms |    1.972ms
││├── convolution             |       12.000us |   1.940ms |    1.957ms
││├── _convolution            |       36.590us |   1.928ms |    1.946ms
││├── contiguous              |        6.600us |   6.600us |    6.464us
││└── cudnn_convolution       |        1.885ms |   1.885ms |    1.906ms
│├── 1                        |                |           |
││└── relu_                   |       68.880us |  68.880us |   69.632us
│├── 2                        |                |           |
││├── max_pool2d              |       15.330us |  85.639us |   84.992us
││└── max_pool2d_with_indices |       70.309us |  70.309us |   70.656us
│├── 3                        |                |           |
...

The original Pytorch EventList can be returned by calling raw() on the profile instance.

trace, event_lists_dict = prof.raw()
print(trace[2])
# Trace(path=('AlexNet', 'features', '0'), leaf=True, module=Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2)))

print(event_lists_dict[trace[2].path][0])
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name                   Self CPU total %   Self CPU total      CPU total %        CPU total     CPU time avg     CUDA total %       CUDA total    CUDA time avg  Number of Calls
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
conv2d                           0.80%         15.740us          100.00%          1.956ms          1.956ms           25.32%          1.972ms          1.972ms                1
convolution                      0.61%         12.000us           99.20%          1.940ms          1.940ms           25.14%          1.957ms          1.957ms                1
_convolution                     1.87%         36.590us           98.58%          1.928ms          1.928ms           24.99%          1.946ms          1.946ms                1
contiguous                       0.34%          6.600us            0.34%          6.600us          6.600us            0.08%          6.464us          6.464us                1
cudnn_convolution               96.37%          1.885ms           96.37%          1.885ms          1.885ms           24.47%          1.906ms          1.906ms                1
---------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Self CPU time total: 1.956ms
CUDA time total: 7.787ms

LICENSE

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

torchprof-0.3.0.tar.gz (9.0 kB view details)

Uploaded Source

Built Distribution

torchprof-0.3.0-py3-none-any.whl (10.0 kB view details)

Uploaded Python 3

File details

Details for the file torchprof-0.3.0.tar.gz.

File metadata

  • Download URL: torchprof-0.3.0.tar.gz
  • Upload date:
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.3

File hashes

Hashes for torchprof-0.3.0.tar.gz
Algorithm Hash digest
SHA256 fdbaee9177bb7acf88d877707d790bdbc589ca54df972e8e902954865a5d4b07
MD5 959194581e9fc8be883b645e5ad2acc5
BLAKE2b-256 39062e76c3400f56aa6bfbe035f64bc256d7cd8ed1e280b861162f580970ec5c

See more details on using hashes here.

File details

Details for the file torchprof-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: torchprof-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 10.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.3

File hashes

Hashes for torchprof-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a9ba3fb4840ba6962d048ef39a86d8f68efaa317e7d741c349beb6637c944437
MD5 16fe73b8278d3f0584901b804ed1e0d5
BLAKE2b-256 4f7a1304b6cd588636a6cf8f2c078e9b1a1889ad0d373d148e07d5ae5e35eca7

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page