A performance analysis tool for distributed GPU workloads

Holistic Trace Analysis

Holistic Trace Analysis (HTA) is a performance analysis tool for identifying bottlenecks in distributed training workloads. It does this by analyzing traces collected through the PyTorch Profiler, a.k.a. Kineto.

Features

HTA provides the following features:

  1. Temporal Breakdown - Breakdown of GPU time into computation, communication, memory events, and idle time across all ranks.
  2. Kernel Breakdown - Finds the kernels with the longest duration on each rank.
  3. Kernel Duration Distribution - Distribution of the average time taken by the longest kernels across different ranks.
  4. Idle Time Breakdown - Breakdown of GPU idle time into waiting for the host, waiting for another kernel, or an unknown cause.
  5. Communication Computation Overlap - Calculates the percentage of time when communication overlaps computation.
  6. Frequent CUDA Kernel Patterns - Finds the CUDA kernels most frequently launched by any given PyTorch or user-defined operator.
  7. CUDA Kernel Launch Statistics - Distributions of GPU kernels with very small duration, large duration, and excessive launch time.
  8. Augmented Counters (Queue length, Memory bandwidth) - Augmented trace files that provide insight into the memory bandwidth utilized and the number of outstanding operations on each CUDA stream.
  9. Trace Comparison - A tool to identify and visualize the differences between traces.
  10. CUPTI Counter Analysis - An experimental API to get GPU performance counters. Attributing performance measurements from kernels to PyTorch operators enables roofline analysis and kernel optimization.
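
To make the communication computation overlap feature concrete, here is a minimal sketch in plain Python. It is not HTA's implementation. It assumes kernels are given as (start, end) pairs in microseconds and defines overlap as the share of communication time during which a computation kernel is also running; HTA's exact definition may differ. All function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end), e.g. in microseconds


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersection_length(a: List[Interval], b: List[Interval]) -> float:
    """Total time covered by both of two sorted, disjoint interval lists."""
    i = j = 0
    overlap = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            overlap += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlap


def comm_comp_overlap_pct(comp: List[Interval], comm: List[Interval]) -> float:
    """Percentage of communication time that overlaps computation (assumed definition)."""
    comp_merged = merge_intervals(comp)
    comm_merged = merge_intervals(comm)
    comm_time = sum(end - start for start, end in comm_merged)
    if comm_time == 0:
        return 0.0
    return 100.0 * intersection_length(comp_merged, comm_merged) / comm_time
```

For example, with computation kernels at (0, 10) and (5, 20) and a single communication kernel at (15, 25), half of the communication time is hidden behind computation, so the function returns 50.0.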

Installation

HTA runs on Linux and Mac with Python >= 3.10.
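
As a quick preflight, the platform and interpreter requirement above can be checked from Python before installing. This is a minimal sketch using only the standard library; the helper name `meets_hta_requirements` is hypothetical and not part of HTA.

```python
import platform
import sys


def meets_hta_requirements(
    version: tuple = tuple(sys.version_info[:2]),
    system: str = platform.system(),
) -> bool:
    """Return True on Linux or macOS with Python >= 3.10."""
    # platform.system() reports macOS as "Darwin".
    return system in ("Linux", "Darwin") and version >= (3, 10)


if __name__ == "__main__":
    if meets_hta_requirements():
        print("Platform and Python version look supported")
    else:
        print("Unsupported platform or Python version")
```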

Set up a Conda environment (optional)

See the Miniconda installation instructions to install Miniconda.

Create the environment env_name

conda create -n env_name

Activate the environment

conda activate env_name

Deactivate the environment

conda deactivate

Install using PyPI (stable)

pip install HolisticTraceAnalysis

Install from source

git clone https://github.com/facebookresearch/HolisticTraceAnalysis.git
cd HolisticTraceAnalysis
git submodule update --init
pip install -r requirements.txt
pip install -e .

Documentation

Learn more about the features and the API from our documentation.

Usage

Data Preparation

All traces collected from a job must reside in a unique folder.
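
Before pointing HTA at a folder, it can help to confirm that the folder holds only the traces of one job. The sketch below lists candidate trace files using the standard library. It assumes traces are stored as `.json` or `.json.gz` files, which is how PyTorch Profiler traces are commonly exported; the helper name `list_trace_files` is hypothetical and not part of HTA.

```python
from pathlib import Path
from typing import List


def list_trace_files(trace_dir: str) -> List[Path]:
    """Return the .json and .json.gz files directly inside trace_dir, sorted by name."""
    root = Path(trace_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{trace_dir} is not a directory")
    return sorted(
        p
        for p in root.iterdir()
        if p.is_file() and p.name.endswith((".json", ".json.gz"))
    )
```

If the list is empty or contains files from an unrelated run, move the traces for the job of interest into a folder of their own first.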

Analysis in a Jupyter notebook

Activate the Conda environment and launch a Jupyter notebook.

conda activate env_name
jupyter notebook

Import HTA and create a TraceAnalysis object

from hta.trace_analysis import TraceAnalysis
analyzer = TraceAnalysis(trace_dir="/path/to/folder/containing/the/traces")

Basic Usage

# Temporal breakdown
temporal_breakdown_df = analyzer.get_temporal_breakdown()

# Kernel breakdown
kernel_breakdown_df = analyzer.get_gpu_kernel_breakdown()

# Idle time breakdown
idle_time_df = analyzer.get_idle_time_breakdown()

# Communication computation overlap
comm_comp_overlap_df = analyzer.get_comm_comp_overlap()

# Frequent CUDA kernel patterns
frequent_patterns_df = analyzer.get_frequent_cuda_kernel_patterns(operator_name="aten::linear", output_dir="/new/trace/path")

# CUDA kernel launch statistics
cuda_launch_kernel_stats = analyzer.get_cuda_kernel_launch_stats()

# Memory bandwidth time series
memory_bw_series = analyzer.get_memory_bw_time_series()

# Memory bandwidth summary
memory_bw_summary = analyzer.get_memory_bw_summary()

# Queue length time series
ql_series = analyzer.get_queue_length_time_series()

# Queue length summary
ql_summary = analyzer.get_queue_length_summary()

For a detailed demo, run the trace_analysis_demo and trace_diff_demo notebooks in the examples folder.

Advanced Usage

Logging Level

The logging level is set through a configuration file. The default level is defined in hta/configs/logging.config and can be changed in the [logger_hta] section of that file. To use a different logging configuration file, modify hta/configs/trace_analyzer.json.
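
The sketch below shows one way to change the level in a `[logger_hta]` section programmatically. It assumes the file follows the INI format read by Python's `logging.config.fileConfig`; the `SAMPLE_CONFIG` text is an illustrative stand-in, not the actual contents of hta/configs/logging.config, and `set_hta_log_level` is a hypothetical helper.

```python
import configparser
import io
import logging
import logging.config

# Illustrative stand-in for a fileConfig-style logging file (an assumption,
# not a copy of hta/configs/logging.config).
SAMPLE_CONFIG = """\
[loggers]
keys=root,hta

[handlers]
keys=console

[formatters]
keys=simple

[logger_root]
level=WARNING
handlers=console

[logger_hta]
level=INFO
handlers=console
qualname=hta
propagate=0

[handler_console]
class=StreamHandler
level=DEBUG
formatter=simple
args=(sys.stderr,)

[formatter_simple]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
"""


def set_hta_log_level(config_text: str, level: str) -> str:
    """Return config_text with the level in [logger_hta] replaced."""
    # interpolation=None keeps the %(...)s format placeholders untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("logger_hta"):
        raise KeyError("no [logger_hta] section found")
    parser.set("logger_hta", "level", level.upper())
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


updated = set_hta_log_level(SAMPLE_CONFIG, "debug")
# Apply the modified configuration to the current process.
logging.config.fileConfig(io.StringIO(updated), disable_existing_loggers=False)
```

Editing the file by hand achieves the same result; the programmatic route is mainly useful when the level needs to vary between runs.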

Repo Map

├── examples                       # folder containing demo notebooks
│         ├── ...
├── hta
│         ├── analyzers            # core logic for each analysis
│         │       ├── ...
│         ├── common               # code common to multiple analyses
│         │       ├── ...
│         ├── configs              # config files
│         │       ├── ...
│         ├── trace_analysis.py    # entrypoint for TraceAnalysis API
│         ├── trace_diff.py        # entrypoint for TraceDiff API
│         └── utils                # utility files
│                 └── ...
├── scripts                        # generic tools for traces
│         └── ...
└── tests                          # unit tests
          └── ...

Contributing

We welcome new contributions. If you plan to contribute new features or extensions, please first open an issue and discuss the feature with us. To learn more about how to contribute, see our contributing guidelines.

Please let us know if you encounter a bug by filing an issue.

The Team

HTA is currently maintained by: Xizhou Feng, Wei Sun, Chengguang Zhu, Yifan Liu, Louis Feng and Michael Au-Yeung. Past contributors include Brian Coutinho, Sung-Han Lin, and Yuzhen Huang.

License

Holistic Trace Analysis is licensed under the MIT License.
