⏱ pytorch-benchmark
Easily benchmark model inference FLOPs, latency, throughput, max allocated memory and energy consumption
Install
pip install pytorch-benchmark
Usage
import torch
from torchvision.models import efficientnet_b0
from pytorch_benchmark import benchmark
model = efficientnet_b0()
sample = torch.randn(8, 3, 224, 224) # (B, C, H, W)
results = benchmark(model, sample, num_runs=100)
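The call returns a report of the collected metrics. As a minimal sketch (assuming the results object is a plain nested dictionary and that PyYAML is installed), it can be pretty-printed in the same form as the sample report below:

import yaml  # assumption: PyYAML is installed and results behaves like a plain nested dict
print(yaml.dump(results, sort_keys=False))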
Sample results 💻
MacBook Pro (16-inch, 2019), 2.6 GHz 6-Core Intel Core i7
device: cpu
flops: 401669732
machine_info:
  cpu:
    architecture: x86_64
    cores:
      physical: 6
      total: 12
    frequency: 2.60 GHz
    model: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  gpus: null
  memory:
    available: 5.86 GB
    total: 16.00 GB
    used: 7.29 GB
  system:
    node: d40049
    release: 21.2.0
    system: Darwin
params: 5288548
timing:
  batch_size_1:
    on_device_inference:
      human_readable:
        batch_latency: 74.439 ms +/- 6.459 ms [64.604 ms, 96.681 ms]
        batches_per_second: 13.53 +/- 1.09 [10.34, 15.48]
      metrics:
        batches_per_second_max: 15.478907181264278
        batches_per_second_mean: 13.528026359855625
        batches_per_second_min: 10.343281300091244
        batches_per_second_std: 1.0922382209314958
        seconds_per_batch_max: 0.09668111801147461
        seconds_per_batch_mean: 0.07443853378295899
        seconds_per_batch_min: 0.06460404396057129
        seconds_per_batch_std: 0.006458734193132054
  batch_size_8:
    on_device_inference:
      human_readable:
        batch_latency: 509.410 ms +/- 30.031 ms [405.296 ms, 621.773 ms]
        batches_per_second: 1.97 +/- 0.11 [1.61, 2.47]
      metrics:
        batches_per_second_max: 2.4673319862230025
        batches_per_second_mean: 1.9696935126370148
        batches_per_second_min: 1.6083039834656554
        batches_per_second_std: 0.11341204895590185
        seconds_per_batch_max: 0.6217730045318604
        seconds_per_batch_mean: 0.509410228729248
        seconds_per_batch_min: 0.40529608726501465
        seconds_per_batch_std: 0.030031445467788704
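Individual values can also be read out programmatically; the following is a hedged sketch assuming the results object is a nested dictionary with the same keys as the report above:

# Assumption: results mirrors the structure of the YAML report above
timing = results["timing"]["batch_size_1"]["on_device_inference"]
print(timing["human_readable"]["batch_latency"])    # formatted latency string
print(timing["metrics"]["seconds_per_batch_mean"])  # raw mean latency in seconds
print(results["flops"], results["params"])          # model complexity and parameter count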
Limitations
Usage assumptions:
- The model has a __call__ method that takes the sample, i.e. model(sample) works.
- The model also works if the sample has a batch size of 1 (first dimension).
Feature limitations:
- Allocated memory uses torch.cuda.max_memory_allocated, which is only available if the model resides on a CUDA device (see the sketch after this list).
- Energy consumption can only be measured on an Intel CPU with RAPL support or on an NVIDIA GPU.
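For instance, to have max allocated memory reported, the model and sample would need to be placed on a CUDA device first; a minimal sketch, assuming a CUDA-capable machine:

import torch
from torchvision.models import efficientnet_b0
from pytorch_benchmark import benchmark

# torch.cuda.max_memory_allocated is only meaningful for models on a CUDA device
if torch.cuda.is_available():
    model = efficientnet_b0().cuda()
    sample = torch.randn(8, 3, 224, 224).cuda()  # (B, C, H, W)
    results = benchmark(model, sample, num_runs=100)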
Citation
If you like the tool and use it in your research, please consider citing it:
@article{hedegaard2022torchbenchmark,
title={PyTorch Benchmark},
author={Lukas Hedegaard},
journal={GitHub. Note: https://github.com/LukasHedegaard/pytorch-benchmark},
year={2022}
}