⏱ pytorch-benchmark
Easily benchmark model inference FLOPs, latency, throughput, max allocated memory and energy consumption
Install
pip install pytorch-benchmark
Usage
import torch
from torchvision.models import efficientnet_b0
from pytorch_benchmark import benchmark
model = efficientnet_b0()
sample = torch.randn(8, 3, 224, 224) # (B, C, H, W)
results = benchmark(model, sample, num_runs=100)
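The call returns a report of the collected metrics. As a minimal sketch (assuming the results object is a plain nested dictionary and that PyYAML is installed), it can be pretty-printed in the same form as the sample report below:

import yaml  # assumption: PyYAML is installed and results behaves like a plain nested dict
print(yaml.dump(results, sort_keys=False))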
Sample results 💻
MacBook Pro (16-inch, 2019), 2.6 GHz 6-Core Intel Core i7
device: cpu
flops: 401669732
machine_info:
  cpu:
    architecture: x86_64
    cores:
      physical: 6
      total: 12
    frequency: 2.60 GHz
    model: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  gpus: null
  memory:
    available: 5.86 GB
    total: 16.00 GB
    used: 7.29 GB
  system:
    node: d40049
    release: 21.2.0
    system: Darwin
params: 5288548
timing:
  batch_size_1:
    on_device_inference:
      human_readable:
        batch_latency: 74.439 ms +/- 6.459 ms [64.604 ms, 96.681 ms]
        batches_per_second: 13.53 +/- 1.09 [10.34, 15.48]
      metrics:
        batches_per_second_max: 15.478907181264278
        batches_per_second_mean: 13.528026359855625
        batches_per_second_min: 10.343281300091244
        batches_per_second_std: 1.0922382209314958
        seconds_per_batch_max: 0.09668111801147461
        seconds_per_batch_mean: 0.07443853378295899
        seconds_per_batch_min: 0.06460404396057129
        seconds_per_batch_std: 0.006458734193132054
  batch_size_8:
    on_device_inference:
      human_readable:
        batch_latency: 509.410 ms +/- 30.031 ms [405.296 ms, 621.773 ms]
        batches_per_second: 1.97 +/- 0.11 [1.61, 2.47]
      metrics:
        batches_per_second_max: 2.4673319862230025
        batches_per_second_mean: 1.9696935126370148
        batches_per_second_min: 1.6083039834656554
        batches_per_second_std: 0.11341204895590185
        seconds_per_batch_max: 0.6217730045318604
        seconds_per_batch_mean: 0.509410228729248
        seconds_per_batch_min: 0.40529608726501465
        seconds_per_batch_std: 0.030031445467788704
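Individual values can also be read out programmatically; the following is a hedged sketch assuming the results object is a nested dictionary with the same keys as the report above:

# Assumption: results mirrors the structure of the YAML report above
timing = results["timing"]["batch_size_1"]["on_device_inference"]
print(timing["human_readable"]["batch_latency"])    # formatted latency string
print(timing["metrics"]["seconds_per_batch_mean"])  # raw mean latency in seconds
print(results["flops"], results["params"])          # model complexity and parameter count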
Limitations
Usage assumptions:
- The model has a __call__ method that takes the sample, i.e. model(sample) works.
- The model also works if the sample has a batch size of 1 (first dimension).
Feature limitations:
- Allocated memory uses torch.cuda.max_memory_allocated, which is only available if the model resides on a CUDA device (see the sketch after this list).
- Energy consumption can only be measured on an Intel CPU with RAPL support or on an NVIDIA GPU.
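For instance, to have max allocated memory reported, the model and sample would need to be placed on a CUDA device first; a minimal sketch, assuming a CUDA-capable machine:

import torch
from torchvision.models import efficientnet_b0
from pytorch_benchmark import benchmark

# torch.cuda.max_memory_allocated is only meaningful for models on a CUDA device
if torch.cuda.is_available():
    model = efficientnet_b0().cuda()
    sample = torch.randn(8, 3, 224, 224).cuda()  # (B, C, H, W)
    results = benchmark(model, sample, num_runs=100)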
Citation
If you like the tool and use it in your research, please consider citing it:
@article{hedegaard2022torchbenchmark,
title={PyTorch Benchmark},
author={Lukas Hedegaard},
journal={GitHub. Note: https://github.com/LukasHedegaard/pytorch-benchmark},
year={2022}
}