NVIDIA Profiler tools
Project description
Tools for working with nvprof SQLite files, specifically for profiling scripts that train deep learning models. The files can be large and therefore slow to copy with scp and to open in NVVP. This tool aims to extract the small pieces of important information and make profiling in NVVP faster.
You can remove a large number of unimportant events and take a small time slice, shrinking the SQLite database to a few MB.
Author: Bohumír Zámečník bohumir.zamecnik@gmail.com, Rossum
License: MIT
Installing
Install the nvprof package if you just want to use it:
$ pip install nvprof
…or for development:
$ pip install -e .
Features
$ nvprof_tools --help
usage: nvprof_tools [-h] {info,truncate,slice} ...

NVIDIA Profiler tools

positional arguments:
  {info,truncate,slice}

optional arguments:
  -h, --help            show this help message and exit

$ nvprof_tools slice --help
usage: nvprof_tools slice [-h] [-s START] [-e END] db_file

positional arguments:
  db_file

optional arguments:
  -h, --help            show this help message and exit
  -s START, --start START
                        start time (sec)
  -e END, --end END     end time (sec)
Summary of the file
It can show:
total time (can be used to decide which time slice to take in nvvp)
number of events per table, sorted from highest to lowest
compute utilization percentage
number of GPUs
$ nvprof_tools info foo.sqlite
Number of GPUs: 1
Compute utilization: 10.07 %
Total time: 6.659 sec
Total number of events: 516874
Events by table:
CUPTI_ACTIVITY_KIND_RUNTIME : 348080
CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL : 63792
CUPTI_ACTIVITY_KIND_DRIVER : 48279
CUPTI_ACTIVITY_KIND_SYNCHRONIZATION : 19741
CUPTI_ACTIVITY_KIND_CUDA_EVENT : 17860
CUPTI_ACTIVITY_KIND_MEMCPY : 15974
CUPTI_ACTIVITY_KIND_MEMSET : 2816
CUPTI_ACTIVITY_KIND_OVERHEAD : 309
CUPTI_ACTIVITY_KIND_STREAM : 12
CUPTI_ACTIVITY_KIND_DEVICE_ATTRIBUTE : 8
CUPTI_ACTIVITY_KIND_NAME : 1
CUPTI_ACTIVITY_KIND_CONTEXT : 1
CUPTI_ACTIVITY_KIND_DEVICE : 1
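The "Events by table" listing can be approximated with plain SQLite queries. The following is a minimal sketch, not the tool's actual implementation: it assumes only that event tables are named with the CUPTI_ACTIVITY_KIND_ prefix seen in the output above, and the helper name count_events_by_table is hypothetical.

```python
import sqlite3


def count_events_by_table(conn):
    """Return (table_name, row_count) pairs, largest count first.

    Assumption: event tables share the CUPTI_ACTIVITY_KIND_ prefix,
    as in the sample output of `nvprof_tools info`.
    """
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name LIKE 'CUPTI_ACTIVITY_KIND_%'"
        )
    ]
    counts = []
    for name in tables:
        # Table names cannot be bound as SQL parameters, so the
        # identifier is quoted and formatted into the statement.
        (n,) = conn.execute('SELECT COUNT(*) FROM "{}"'.format(name)).fetchone()
        counts.append((name, n))
    return sorted(counts, key=lambda item: item[1], reverse=True)
```

Calling count_events_by_table(sqlite3.connect("foo.sqlite")) and summing the counts would give a total comparable to the "Total number of events" line.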
With multiple GPUs, compute utilization is calculated for each device:
Number of GPUs: 4
Compute utilization (mean): 43.04 %
  GPU 0: 42.86 %
  GPU 1: 42.34 %
  GPU 2: 43.42 %
  GPU 3: 43.55 %
Total time: 35.041 sec
Total number of events: 5670557
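The README does not state how compute utilization is calculated. One plausible reading, sketched below as an assumption, is the share of the total profile time during which at least one kernel was running on a device, with the reported mean being the arithmetic mean over devices (the four values above do average to 43.04 %). The function names and the (device_id, start, end) input format are hypothetical.

```python
def busy_time(intervals):
    """Total length covered by the union of (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged run.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping kernels must not be counted twice.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def compute_utilization(kernels, total_time):
    """Map device id -> percent of total_time with a kernel running.

    kernels: iterable of (device_id, start, end) in the same time
    unit as total_time (an assumed input format).
    """
    by_device = {}
    for device_id, start, end in kernels:
        by_device.setdefault(device_id, []).append((start, end))
    return {
        device_id: 100.0 * busy_time(intervals) / total_time
        for device_id, intervals in sorted(by_device.items())
    }


def mean_utilization(per_device):
    """Arithmetic mean of the per-device percentages."""
    return sum(per_device.values()) / len(per_device)
```

Merging overlapping intervals first matters because concurrent kernels on one device would otherwise push the figure above 100 %.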
Remove unnecessary events
Typically 80% of the events are runtime/driver CUDA calls, which are not essential for profiling deep learning scripts. Let’s remove them.
NOTE: It will overwrite the input file.
$ nvprof_tools truncate foo.sqlite
For example, this shrank one database from 29 MB to 8 MB.
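The effect of truncation can be sketched with the standard sqlite3 module. This is an illustration under assumptions, not the tool's source: it assumes the runtime and driver calls live in the CUPTI_ACTIVITY_KIND_RUNTIME and CUPTI_ACTIVITY_KIND_DRIVER tables shown in the info output, and the function name truncate_calls is hypothetical. Like the tool, it modifies the file in place, so try it on a copy.

```python
import sqlite3

# Assumption: these two tables hold the runtime/driver CUDA calls.
TABLES_TO_EMPTY = ("CUPTI_ACTIVITY_KIND_RUNTIME", "CUPTI_ACTIVITY_KIND_DRIVER")


def truncate_calls(db_path):
    """Delete runtime/driver call events and compact the database file."""
    conn = sqlite3.connect(db_path)
    try:
        existing = {
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        }
        for table in TABLES_TO_EMPTY:
            if table in existing:
                conn.execute('DELETE FROM "{}"'.format(table))
        conn.commit()
        # Deleting rows only marks pages as free; VACUUM rewrites the
        # file so that it actually shrinks on disk.
        conn.execute("VACUUM")
    finally:
        conn.close()
```

Without the final VACUUM the row counts would drop but the file size would stay roughly the same.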
Slice only a small time range
# keep only events between 5 and 6 seconds
$ nvprof_tools slice foo.sqlite -s 5.0 -e 6.0
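Slicing can be pictured as deleting every event whose start falls outside the requested window. The sketch below rests on several assumptions that the README does not confirm: event tables have an integer start column, timestamps are in nanoseconds, and the -s/-e offsets are measured from the earliest start in the file. The names tables_with_start and slice_events are hypothetical.

```python
import sqlite3

NS_PER_SEC = 10 ** 9  # assumption: timestamps are stored in nanoseconds


def tables_with_start(conn):
    """Names of tables that have a column called 'start'."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    ]
    result = []
    for name in names:
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
        columns = [
            row[1] for row in conn.execute('PRAGMA table_info("{}")'.format(name))
        ]
        if "start" in columns:
            result.append(name)
    return result


def slice_events(conn, start_sec, end_sec):
    """Keep only events starting within [start_sec, end_sec] of the profile."""
    tables = tables_with_start(conn)
    minima = [
        conn.execute('SELECT MIN(start) FROM "{}"'.format(t)).fetchone()[0]
        for t in tables
    ]
    minima = [m for m in minima if m is not None]
    if not minima:
        return
    t0 = min(minima)  # assumed origin of the time axis
    lo = t0 + int(start_sec * NS_PER_SEC)
    hi = t0 + int(end_sec * NS_PER_SEC)
    for t in tables:
        conn.execute(
            'DELETE FROM "{}" WHERE start < ? OR start > ?'.format(t), (lo, hi)
        )
    conn.commit()
```

Tables without a start column (device or context metadata, for instance) are left untouched in this sketch.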
More information