nsys2prv: Translate NVIDIA Nsight Systems traces to Paraver traces

nsys2prv is a Python package that parses and interprets the data exported from an NVIDIA Nsight Systems report and converts it to Paraver semantics, so that the trace can be browsed with Paraver. Paraver is a parallel performance analysis and trace visualization tool developed by the Performance Tools team at BSC, designed for the analysis of large-scale executions. Paraver can be obtained at https://tools.bsc.es/downloads.

The Nsight Systems traces must include, at a minimum, GPU kernel activity. Beyond that, nsys2prv can also translate information about the CUDA runtime, OpenACC constructs, the MPI runtime, GPU metrics and NVTX regions. In addition to supporting these programming model semantics, one of the main features of nsys2prv is its ability to merge several Nsight Systems reports into one trace, which eases the analysis of multi-node, large-scale parallel executions.

How it works

This tool relies on the export functionality of nsys. Data collection combines the predefined nsys stats scripts with manual parsing of the exported .sqlite data. The translation workflow is summarized in a figure on the project page.

[Figure: translation workflow]

More details on the workflow and the data parsing logic can be read on the wiki pages.
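As a rough illustration of the manual .sqlite parsing step mentioned above, the sketch below reads kernel records with Python's built-in sqlite3 module. It is not nsys2prv's actual parsing logic. The table and column names (CUPTI_ACTIVITY_KIND_KERNEL, StringIds, shortName) are assumptions based on the schema that Nsight Systems exports, and the in-memory database is a hypothetical stand-in for a real export so that the example runs without a report file.

```python
import sqlite3


def kernel_records(conn):
    """Return (start_ns, end_ns, device, stream, name) rows ordered by start time."""
    # Assumed schema: kernel names are stored as ids into the StringIds table.
    query = """
        SELECT k.start, k."end", k.deviceId, k.streamId, s.value
        FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
        JOIN StringIds AS s ON s.id = k.shortName
        ORDER BY k.start
    """
    return conn.execute(query).fetchall()


def demo_database():
    """Build a tiny in-memory stand-in for an exported report (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE StringIds (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE CUPTI_ACTIVITY_KIND_KERNEL (
            start INTEGER, "end" INTEGER, deviceId INTEGER,
            streamId INTEGER, shortName INTEGER);
        INSERT INTO StringIds VALUES (1, 'gemm_kernel'), (2, 'softmax_kernel');
        INSERT INTO CUPTI_ACTIVITY_KIND_KERNEL VALUES
            (2000, 2600, 0, 7, 2), (1000, 1500, 0, 7, 1);
    """)
    return conn


if __name__ == "__main__":
    for start, end, device, stream, name in kernel_records(demo_database()):
        print(f"{name}: device {device}, stream {stream}, {end - start} ns")
```

With a real report, the connection would instead point at the .sqlite file produced by the nsys export step.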

Installation

nsys2prv is distributed as a PyPI package and can therefore be installed with pip. It requires Python >= 3.10. The following dependencies are installed automatically by pip:

  • pandas > 2.2.2
  • sqlalchemy
  • tqdm

Additionally, it requires an installation of NVIDIA Nsight Systems in your PATH to extract the data. Alternatively, you can set the NSYS_HOME environment variable. The installed version of Nsight Systems must always be greater than the one used to obtain the trace. For translating, a minimum version of 24.6 is required.

To install the package, use pip directly or inside a virtual environment:

pip install nsys2prv
# or, inside a virtual environment
python -m venv ./venv
source ./venv/bin/activate
pip install nsys2prv
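The Nsight Systems requirement above can be checked before translating. The sketch below is a minimal pre-flight check, not part of nsys2prv. It assumes that the nsys binary lives at NSYS_HOME/bin/nsys, that `nsys --version` prints a version such as 2024.6.1, and that the documented minimum of 24.6 corresponds to 2024.6. The function names are hypothetical.

```python
import os
import re
import shutil
import subprocess
from pathlib import Path


def find_nsys(env=None):
    """Locate nsys via NSYS_HOME (assumed layout: $NSYS_HOME/bin/nsys) or PATH."""
    env = os.environ if env is None else env
    home = env.get("NSYS_HOME")
    if home:
        candidate = Path(home) / "bin" / "nsys"
        if candidate.is_file():
            return str(candidate)
    return shutil.which("nsys", path=env.get("PATH"))


def parse_version(text):
    """Extract (major, minor) from a version string, mapping 2024.6 to (24, 6)."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    major, minor = int(match.group(1)), int(match.group(2))
    if major >= 2000:  # assumption: four-digit years map to the short form
        major -= 2000
    return (major, minor)


def meets_minimum(text, minimum=(24, 6)):
    """True if the version found in text is at least the documented minimum."""
    version = parse_version(text)
    return version is not None and version >= minimum


if __name__ == "__main__":
    nsys = find_nsys()
    if nsys is None:
        print("nsys not found: add it to PATH or set NSYS_HOME")
    else:
        out = subprocess.run([nsys, "--version"], capture_output=True, text=True).stdout
        print(nsys, "OK" if meets_minimum(out) else "too old or unrecognized version")
```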

Basic usage

To translate a trace from Nsight Systems you need the .nsys-rep report file that nsys profile outputs. The basic usage of Nsight Systems to get a trace is the following:

nsys profile --gpu-metrics-devices=cuda-visible -t cuda,nvtx -o ./llm_all --capture-range=nvtx --env-var=NSYS_NVTX_PROFILER_REGISTER_ONLY=0 --nvtx-capture=RANGE_NAME python TestLLAMA.py

In this example, we ask the profiler to only trace during the “RANGE_NAME” NVTX range, to get a trace for our phase of interest.
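For the capture range to trigger, the application itself has to open an NVTX range with that name. How TestLLAMA.py does this is not shown here. The sketch below is one possible way, assuming the optional `nvtx` Python package is available; the `profiled_range` helper and the loop body are hypothetical. If the package is missing, the helper falls back to a no-op so the script still runs outside the profiler.

```python
from contextlib import contextmanager

try:
    import nvtx  # optional dependency (assumption: the `nvtx` PyPI package)
except ImportError:
    nvtx = None


@contextmanager
def profiled_range(name):
    """Open an NVTX range called `name`; do nothing if nvtx is unavailable."""
    if nvtx is None:
        yield
    else:
        with nvtx.annotate(name):
            yield


def phase_of_interest(steps):
    """Hypothetical stand-in for the code that should appear in the trace."""
    return sum(i * i for i in range(steps))


if __name__ == "__main__":
    # The name must match the value passed to --nvtx-capture.
    with profiled_range("RANGE_NAME"):
        result = phase_of_interest(1000)
    print("done", result)
```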

This should output the llm_all.nsys-rep file, which serves as input to nsys2prv:

nsys2prv -t type1,type2 llm_all.nsys-rep output-paraver-name

Currently, the user must manually specify which programming model information to translate, using the --trace (-t) flag. Kernel execution information is always extracted by default, so the nsys report must contain GPU activity. Future releases will automatically detect the information available in the report and make this flag optional. The flag accepts a comma-separated list with any of the following categories:

  • cuda_api_trace: CUDA API calls
  • nvtx_pushpop_trace: The developer defined NVTX Push/Pop regions
  • nvtx_startend_trace: The developer defined NVTX Start/End regions
  • gpu_metrics: The sampling events of hardware metrics for the GPUs
  • mpi_event_trace: The MPI calls
  • openacc: The OpenACC constructs
  • graphs: If your trace includes CUDA Graphs, include this flag to make sure that all Graph activity is translated, whether it is Node tracing or Graph tracing. If you are not sure, you can still include it and it will be disabled if there are no CUDA Graphs.
  • nccl: If your application has NCCL activity, you can add this flag to include the NCCL payloads (e.g. communication size, reduction operation, root rank, etc.) in the NCCL kernel events.
  • osrt: OS runtime calls (except pthread ones)
  • pthread: The POSIX threads library calls
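Since the -t value is a free-form comma-separated list, a typo in a category name is easy to make. The sketch below validates a selection against the categories listed above and assembles the command line shown earlier. It is a hypothetical helper, not part of nsys2prv, and it does not run the tool.

```python
# Categories documented for the --trace / -t flag.
KNOWN_CATEGORIES = (
    "cuda_api_trace", "nvtx_pushpop_trace", "nvtx_startend_trace",
    "gpu_metrics", "mpi_event_trace", "openacc",
    "graphs", "nccl", "osrt", "pthread",
)


def trace_flag(categories):
    """Join categories into the comma-separated -t value, rejecting unknown names."""
    unknown = [c for c in categories if c not in KNOWN_CATEGORIES]
    if unknown:
        raise ValueError(f"unknown trace categories: {', '.join(unknown)}")
    # dict.fromkeys drops duplicates while keeping the original order.
    return ",".join(dict.fromkeys(categories))


def build_command(report, output, categories):
    """Return the argv list for a single-report translation."""
    return ["nsys2prv", "-t", trace_flag(categories), report, output]


if __name__ == "__main__":
    cmd = build_command(
        "llm_all.nsys-rep", "output-paraver-name",
        ["cuda_api_trace", "nvtx_pushpop_trace", "gpu_metrics"],
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run once nsys2prv is installed.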

Finally, the output-paraver-name.prv trace can be opened with Paraver and analyzed.

For multi-report translation, simply add the -m flag and add all the .nsys-rep files:

nsys2prv -t type1,type2 -m source_rank0.nsys-rep [source_rank1.nsys-rep [source_rank2.nsys-rep ...]] output-paraver-name
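When there are many ranks, the report list can be assembled programmatically. The sketch below orders files by the rank number embedded in their names, so that rank 10 does not sort before rank 2, and builds the -m command line. The `rankN` naming follows the example above. Whether nsys2prv depends on the order of the reports is not stated here, so the ordering is only a convenience, and the helper names are hypothetical.

```python
import re
from pathlib import Path


def rank_of(path):
    """Extract the integer after 'rank' in the file name, or -1 if absent."""
    match = re.search(r"rank(\d+)", Path(path).name)
    return int(match.group(1)) if match else -1


def build_merge_command(reports, output, trace="cuda_api_trace"):
    """Return the argv list for a multi-report translation with -m."""
    ordered = sorted(reports, key=rank_of)
    return ["nsys2prv", "-t", trace, "-m", *[str(r) for r in ordered], output]


if __name__ == "__main__":
    # Collect every report in the current directory (may be empty).
    found = list(Path(".").glob("*rank*.nsys-rep"))
    print(" ".join(build_merge_command(found, "output-paraver-name")))
```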

How to analyze your trace

A predefined set of Paraver Config Files (CFGs) can be found in the cfgs directory of this repository. If you open these files with Paraver, you will see predefined windows showing the information available in the trace. Some of them are more advanced and will show analysis views.

Config files are grouped in 5 different sets:

  • basics: Configurations that show the available raw events in timelines
  • views: Compound views that show different behaviors and kernel semantics
  • analysis: More complex configurations that summarize insight as efficiency metrics computed from derived metrics
  • hwcounters: Specific views that show hardware counters obtained with the "gpu-metrics" flag
  • model: Profile configurations intended to support POP3's efficiency model computation
As a starting point, the views/cuda_activities.cfg configuration file shows a timeline aggregating all CUDA-related activities: CUDA API calls, kernel execution on the GPU and memory operations.

For documentation about trace analysis and config files (CFGs) provided, please refer to the wiki pages.

Bug reporting and contribution

A list of current bugs and targeted features can be found in the GitLab repository. The project is still under initial development, and a major code refactoring and changes to the command line interface (CLI) are expected. As it is a tool to support and enable performance analysts' work, new use cases and requests for support of other programming model information are always welcome. Please contact marc.clasca@bsc.es or beppp@bsc.es with any such requests, recommendations, or contribution ideas.

A list of common issues and errors has been compiled on the wiki page Troubleshooting.

If you are a regular user and would like to receive updates on important new releases and bug fixes, you can subscribe to the users mailing list by sending an email to nsys2prv-users-join@bsc.es.

Download files

Source Distribution

nsys2prv-0.6.0.dev2602111.tar.gz (1.4 MB)

Built Distribution

nsys2prv-0.6.0.dev2602111-py3-none-any.whl (72.0 kB)

File details

Details for the file nsys2prv-0.6.0.dev2602111.tar.gz.

File metadata

  • Download URL: nsys2prv-0.6.0.dev2602111.tar.gz
  • Upload date:
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.11.14 Linux/5.10.0-35-amd64

File hashes

Hashes for nsys2prv-0.6.0.dev2602111.tar.gz:

  • SHA256: 579c5c7a241f3418a1cc6f6c73e0ee8fbc419ba4177beb256a4fe1958328a3e2
  • MD5: e135d14b985cb81ea6591dc39ddc6d51
  • BLAKE2b-256: ad94b19fa0b6a316606a6aed170b9d3cf183e9cd37ead45fa69a4309a453ada8

File details

Details for the file nsys2prv-0.6.0.dev2602111-py3-none-any.whl.

File metadata

  • Download URL: nsys2prv-0.6.0.dev2602111-py3-none-any.whl
  • Upload date:
  • Size: 72.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.11.14 Linux/5.10.0-35-amd64

File hashes

Hashes for nsys2prv-0.6.0.dev2602111-py3-none-any.whl:

  • SHA256: 3c5712c0479c31a3d6a7b57d15ee54fa2b3a063dafa95250138d7f7d2b83208b
  • MD5: f90b5c2dc8c922bb032b5980f4699031
  • BLAKE2b-256: 06e1a47a44212747005c9fa52e09ee83abf338faa7494894ba5048e3ca575d29
