Torch Simple Timing
A simple yet versatile package to time CPU/GPU/Multi-GPU ops.
- "I want to time operations once"
- That's what a
Clock
is for
- That's what a
- "I want to time the same operations multiple times"
- That's what a
Timer
is for
- That's what a
In simple terms:

- A `Clock` is an object (and context manager) that computes the elapsed time between its `start()` (or `__enter__`) and `stop()` (or `__exit__`)
- A `Timer` internally manages clocks so that you can focus on readability and not data structures
Installation
```bash
pip install torch_simple_timing
```
How to use
A Clock
```python
from torch_simple_timing import Clock
import torch

t = torch.rand(2000, 2000)
gpu = torch.cuda.is_available()

# Time a block of code with a context manager
with Clock(gpu=gpu) as context_clock:
    torch.inverse(t @ t.T)

# Or time it manually with start() / stop()
clock = Clock(gpu=gpu).start()
torch.inverse(t @ t.T)
clock.stop()

print(context_clock.duration)  # 0.29688501358032227
print(clock.duration)  # 0.292896032333374
```
More examples, including how to easily share data structures between clocks using a store, can be found in the documentation.
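As a rough illustration of the store idea, the sketch below assumes `Clock` accepts a name and a shared `store` dictionary mapping each name to a list of recorded durations; the exact keyword names are an assumption here, so check the documentation for the real signature.

```python
from collections import defaultdict

import torch
from torch_simple_timing import Clock

# Hypothetical usage: the `name` and `store` keyword names are assumptions,
# not confirmed on this page. The store maps clock names -> list of durations.
store = defaultdict(list)
gpu = torch.cuda.is_available()
t = torch.rand(2000, 2000)

for _ in range(3):
    with Clock(name="inverse", store=store, gpu=gpu):
        torch.inverse(t @ t.T)

print(store["inverse"])  # expected: one duration per iteration
```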
A Timer
```python
from torch_simple_timing import Timer
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

X = torch.rand(5000, 5000, device=device)
Y = torch.rand(5000, 100, device=device)
model = torch.nn.Linear(5000, 100).to(device)
optimizer = torch.optim.Adam(model.parameters())

gpu = device.type == "cuda"
timer = Timer(gpu=gpu)

for epoch in range(10):
    timer.clock("epoch").start()
    for b in range(50):
        x = X[b * 100 : (b + 1) * 100]
        y = Y[b * 100 : (b + 1) * 100]
        optimizer.zero_grad()

        with timer.clock("forward", ignore=epoch > 0):
            p = model(x)

        loss = torch.nn.functional.cross_entropy(p, y)

        with timer.clock("backward", ignore=epoch > 0):
            loss.backward()

        optimizer.step()
    timer.clock("epoch").stop()

stats = timer.stats()
# use stats for display and/or logging
# wandb.summary.update(stats)
print(timer.display(stats=stats, precision=5))
```
This prints something like:

```
epoch    : 0.25064 ± 0.02728 (n=10)
forward  : 0.00226 ± 0.00526 (n=50)
backward : 0.00209 ± 0.00387 (n=50)
```

Note that `forward` and `backward` report n=50 rather than 500 because `ignore=epoch > 0` disables those clocks after the first epoch.
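If you want to log these numbers yourself, here is a minimal sketch continuing from the snippet above; it assumes `stats` behaves like a plain mapping keyed by clock name, as the `wandb.summary.update(stats)` comment suggests, but the exact value structure is not confirmed here.

```python
# Hedged sketch, continuing from the Timer example above.
# Assumes `stats` is a dict keyed by clock name; the value structure
# (aggregates vs. raw lists) is an assumption -- see the docs.
for name, value in stats.items():
    print(f"timing/{name}: {value}")
```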
A decorator
You can also use a decorator to time functions without much overhead in your code:
```python
from torch_simple_timing import timeit, get_global_timer, reset_global_timer
import torch

# Use the function name as the timer name
@timeit(gpu=True)
def train():
    x = torch.rand(1000, 1000, device="cuda" if torch.cuda.is_available() else "cpu")
    return torch.inverse(x @ x)

# Use a custom name
@timeit("test")
def test_cpu():
    return torch.inverse(torch.rand(1000, 1000) @ torch.rand(1000, 1000))

if __name__ == "__main__":
    for _ in range((epochs := 10)):
        train()
    test_cpu()

    timer = get_global_timer()
    print(timer.display())
    reset_global_timer()
```
Prints:

```
train : 0.045 ± 0.007 (n=10)
test  : 0.046 (n=1)
```
By default, the `@timeit` decorator takes at least a name, uses `gpu=False`, and uses the global timer (`torch_simple_timing.TIMER`). You can pass your own timer with `@timeit(name, timer=timer)`.
See the docs for more details.
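For example, here is a minimal sketch of the custom-timer form, built only from calls shown elsewhere on this page; treat it as illustrative rather than canonical.

```python
import torch
from torch_simple_timing import Timer, timeit

# A dedicated timer instead of the global one, passed via the
# @timeit(name, timer=timer) form described above.
my_timer = Timer(gpu=torch.cuda.is_available())

@timeit("matmul", timer=my_timer)
def matmul():
    x = torch.rand(1000, 1000)
    return x @ x.T

for _ in range(5):
    matmul()

print(my_timer.display())
```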
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
torch_simple_timing-0.1.4.tar.gz (12.7 kB)

Built Distribution
torch_simple_timing-0.1.4-py3-none-any.whl
Hashes for torch_simple_timing-0.1.4.tar.gz

Algorithm | Hash digest
---|---
SHA256 | e66f1602b6c45c2d237be0e846f6f22d3d80c923011794443ef0e663a10a3f1b
MD5 | 428d4c4ad0ea556975d2ceacfdcc5556
BLAKE2b-256 | 0a9cecf5670636af340b438ff9b230a83be60b529640d40b4fc21a2a29103fae
Hashes for torch_simple_timing-0.1.4-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | f9994256367d5539121b4e7f9372a8b0abd29a37af93c2827d644dcc5e032f40
MD5 | 49b69b5f3dca7c3af4ece73dff822536
BLAKE2b-256 | 1c273d70f45979089aefe0903f8bfef30f91eb1198221c317de6333720ce9861