# KappaProfiler

Simple profiling framework with little overhead.
Lightweight profiling utilities for identifying bottlenecks and timing parts of your Python application. Also supports async profiling for CUDA.
## Setup

- new install: `pip install kappaprofiler`
- upgrade to the latest version: `pip install kappaprofiler --upgrade`
## Usage

### Time your whole application

#### With decorators
```python
import kappaprofiler as kp
import time

@kp.profile
def main():
    time.sleep(0.3)  # simulate some operation
    some_method()

@kp.profile
def some_method():
    time.sleep(0.5)  # simulate some operation

if __name__ == "__main__":
    main()
    print(kp.profiler.to_string())
```
The result will be (`time.sleep` calls are not 100% accurate):

```
0.82 main
0.51 main.some_method
```
#### With context managers
```python
import kappaprofiler as kp
import time

def main():
    with kp.named_profile("main"):
        time.sleep(0.3)  # simulate some operation
        with kp.named_profile("method"):
            some_method()
    with kp.named_profile("main2"):
        time.sleep(0.2)  # simulate some operation

def some_method():
    time.sleep(0.5)  # simulate some operation

if __name__ == "__main__":
    main()
    print(kp.profiler.to_string())
```
The result will be (`time.sleep` calls are not 100% accurate):

```
0.82 main
0.51 main.method
0.20 main2
```
### Query nodes

Each profiling entry is represented by a node from which detailed information can be retrieved:

```python
query = "main.some_method"
node = kp.profiler.get_node(query)
print(f"{query} was called {node.count} time and took {node.to_string()} seconds in total")
```

```
main.some_method was called 1 time and took 0.51 seconds in total
```
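The dotted query addresses a tree of timing nodes. The idea can be sketched with a small dict-based tree; this is an illustration of the concept, not kappaprofiler's internals (`Node`, `record`, and `get_node` here are made-up names):

```python
class Node:
    """Minimal timing node: accumulated seconds, call count, child nodes."""
    def __init__(self):
        self.total = 0.0
        self.count = 0
        self.children = {}

root = Node()

def record(path, seconds):
    # walk (and create) the dotted path, accumulate at the leaf
    node = root
    for part in path.split("."):
        node = node.children.setdefault(part, Node())
    node.total += seconds
    node.count += 1

def get_node(path):
    node = root
    for part in path.split("."):
        node = node.children[part]
    return node

record("main", 0.82)
record("main.some_method", 0.51)

n = get_node("main.some_method")
print(f"main.some_method was called {n.count} time and took {n.total:.2f} seconds")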
### Time only a part of your program

```python
import kappaprofiler as kp

with kp.Stopwatch() as sw:
    # some operation
    ...
print(f"operation took {sw.elapsed_milliseconds} milliseconds")
print(f"operation took {sw.elapsed_seconds} seconds")
```
### Time subparts

```python
import kappaprofiler as kp
import time

sw1 = kp.Stopwatch()
sw2 = kp.Stopwatch()
for i in range(1, 3):
    with sw1:
        # operation1
        time.sleep(0.1 * i)
    with sw2:
        # operation2
        time.sleep(0.2 * i)
print(f"operation1 took {sw1.elapsed_seconds:.2f} seconds (average {sw1.average_lap_time:.2f})")
print(f"operation2 took {sw2.elapsed_seconds:.2f} seconds (average {sw2.average_lap_time:.2f})")
```

```
operation1 took 0.32 seconds (average 0.16)
operation2 took 0.61 seconds (average 0.30)
```
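A re-enterable stopwatch like this can be built on `time.perf_counter`. The sketch below is a hypothetical minimal version, not kappaprofiler's actual `Stopwatch` class (its attribute names and behavior may differ):

```python
import time

class Stopwatch:
    """Toy stopwatch: each `with` block adds one lap to the running total."""
    def __init__(self):
        self.elapsed_seconds = 0.0
        self.lap_count = 0

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.elapsed_seconds += time.perf_counter() - self._start
        self.lap_count += 1

    @property
    def average_lap_time(self):
        return self.elapsed_seconds / self.lap_count

sw = Stopwatch()
for i in range(1, 3):
    with sw:
        time.sleep(0.05 * i)
print(f"{sw.elapsed_seconds:.2f} seconds over {sw.lap_count} laps")
```

Accumulating in `__exit__` is what lets the same instance be reused across loop iterations, which is how the `sw1`/`sw2` example above works.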
### Time async operations

#### Showcase: timing CUDA operations in PyTorch

Asynchronous operations can only be timed properly when the asynchronous call is awaited, or when a synchronization point is created after the timed region ends. Natively in PyTorch this would look something like this:
```python
# submit a start event to the event stream
start_event = torch.cuda.Event(enable_timing=True)
start_event.record()
# submit an async operation to the event stream
...
# submit an end event to the event stream
end_event = torch.cuda.Event(enable_timing=True)
end_event.record()
# synchronize
torch.cuda.synchronize()
print(start_event.elapsed_time(end_event))
```
which is quite a lot of boilerplate for timing a single operation. With kappaprofiler it looks like this:
```python
import kappaprofiler as kp
import torch

def main():
    device = torch.device("cuda")
    x = torch.randn(15000, 15000, device=device)
    with kp.named_profile("matmul_wrong"):
        # matrix multiplication (@) is asynchronous
        _ = x @ x
        # the timing for "matmul_wrong" is only the time it took to
        # submit the x @ x operation to the cuda event stream,
        # not the actual time the x @ x operation took
    with kp.named_profile_async("matmul_right"):
        _ = x @ x
    matmul_method(x)

@kp.profile_async
def matmul_method(x):
    _ = x @ x

def start_async():
    start_event = torch.cuda.Event(enable_timing=True)
    start_event.record()
    return start_event

def end_async(start_event):
    end_event = torch.cuda.Event(enable_timing=True)
    end_event.record()
    torch.cuda.synchronize()
    # torch.cuda.Event.elapsed_time returns milliseconds but kappaprofiler expects seconds
    return start_event.elapsed_time(end_event) / 1000

if __name__ == "__main__":
    kp.setup_async(start_async, end_async)
    main()
    print(kp.profiler.to_string())
```
```
0.56 matmul_wrong
4.69 matmul_right
4.72 matmul_method
```
NOTE: Synchronization points slow down overall program execution, so they should only be used for investigating bottlenecks/runtimes.

To remove all synchronization points in your program, either:

- remove the `kp.setup_async` call -> `kp.named_profile_async`/`kp.profile_async` will default to a no-op (NOTE: this removes the node completely, so it's also not possible to query it)
- replace the `kp.setup_async` call with `kp.setup_async_as_sync` to make the asynchronous calls behave just like the synchronous calls. This will make the async times wrong (like `matmul_wrong` above) but still creates a node for the operation (e.g. for querying how often it was called).
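The pluggable start/end hook pattern that `kp.setup_async` enables can be sketched generically. The toy version below substitutes wall-clock timestamps for CUDA events; all names (`setup_async`, `named_profile_async`, `results`) are illustrative stand-ins, not kappaprofiler's internals:

```python
import time

_hooks = (None, None)  # (start_fn, end_fn); empty -> profiling is a no-op
results = {}

def setup_async(start_fn, end_fn):
    global _hooks
    _hooks = (start_fn, end_fn)

class named_profile_async:
    """Context manager: calls the start hook on enter, end hook on exit."""
    def __init__(self, name):
        self.name = name
    def __enter__(self):
        start_fn, _ = _hooks
        self.token = start_fn() if start_fn else None
    def __exit__(self, *exc):
        _, end_fn = _hooks
        if end_fn:
            results[self.name] = end_fn(self.token)

# wall-clock stand-ins for the CUDA-event start/end hooks
setup_async(time.perf_counter, lambda start: time.perf_counter() - start)

with named_profile_async("op"):
    time.sleep(0.05)
print(f"{results['op']:.2f} op")
```

Because the hooks are opaque to the context manager, the same mechanism works for CUDA events, distributed barriers, or any other start/end token pair, and skipping `setup_async` naturally degrades to a no-op.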
### Multi-process PyTorch profiling

Synchronizing CUDA operations alone is not sufficient when multiple processes are used (e.g. for multi-GPU training). In addition to CUDA synchronization, the processes have to be synced up:
```python
import torch
import torch.distributed as dist

def end_async(start_event):
    if dist.is_available() and dist.is_initialized():
        torch.cuda.synchronize()
        dist.barrier()
    end_event = torch.cuda.Event(enable_timing=True)
    end_event.record()
    torch.cuda.synchronize()
    # torch.cuda.Event.elapsed_time returns milliseconds but kappaprofiler expects seconds
    return start_event.elapsed_time(end_event) / 1000
```