
A performance analysis tool for distributed GPU workloads



Holistic Trace Analysis

Holistic Trace Analysis (HTA) is a performance analysis tool for identifying performance bottlenecks in distributed training workloads. HTA does this by analyzing traces collected with the PyTorch Profiler (a.k.a. Kineto).

Features

HTA provides the following features:

  1. Temporal Breakdown - Breakdown of time taken by the GPUs in terms of time spent in computation, communication, memory events, and idle time across all ranks.
  2. Kernel Breakdown - Finds kernels with the longest duration on each rank.
  3. Kernel Duration Distribution - Distribution of the average time taken by the longest kernels across ranks.
  4. Idle Time Breakdown - Breakdown of GPU idle time into waiting for the host, waiting for another kernel, or an unknown cause.
  5. Communication Computation Overlap - Calculates the percentage of time when communication overlaps computation.
  6. Frequent CUDA Kernel Patterns - Finds the CUDA kernels most frequently launched by any given PyTorch or user-defined operator.
  7. CUDA Kernel Launch Statistics - Distributions of GPU kernels with very small duration, large duration, and excessive launch time.
  8. Augmented Counters (Queue length, Memory bandwidth) - Augmented trace files which provide insights into memory bandwidth utilized and number of outstanding operations on each CUDA stream.
  9. Trace Comparison - A trace comparison tool to identify and visualize the differences between traces.
  10. CUPTI Counter Analysis - An experimental API to get GPU performance counters. By attributing kernel performance measurements to PyTorch operators, users can perform roofline analysis and optimize their kernels.
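To make the Communication Computation Overlap metric more concrete, here is a minimal, standalone sketch of the interval arithmetic behind it. It is not HTA's implementation: it assumes you already have (start, end) timestamps for the communication and computation kernels of one rank, and it assumes the percentage is measured relative to total communication time. All function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_us, end_us); hypothetical representation


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no time is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals: List[Interval]) -> float:
    return sum(end - start for start, end in intervals)


def intersection_length(a: List[Interval], b: List[Interval]) -> float:
    """Length of the intersection of two merged, sorted interval lists (two-pointer sweep)."""
    i = j = 0
    overlap = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            overlap += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlap


def comm_comp_overlap_pct(comm: List[Interval], comp: List[Interval]) -> float:
    """Percent of communication time during which computation also runs (assumed definition)."""
    comm_merged = merge_intervals(comm)
    comp_merged = merge_intervals(comp)
    comm_time = total_length(comm_merged)
    if comm_time == 0:
        return 0.0
    return 100.0 * intersection_length(comm_merged, comp_merged) / comm_time
```

For example, `comm_comp_overlap_pct([(0, 100)], [(50, 150)])` gives 50.0, because computation covers half of the communication window. Merging first keeps concurrent kernels on different streams from being double counted.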

Installation

HTA runs on Linux and Mac with Python >= 3.10.

Set up a Conda environment (optional)

See the Miniconda documentation for installation instructions.

Create the environment env_name

conda create -n env_name python=3.10

Activate the environment

conda activate env_name

Deactivate the environment

conda deactivate

Install using PyPI (stable)

pip install HolisticTraceAnalysis

Install from source

git clone https://github.com/facebookresearch/HolisticTraceAnalysis.git
cd HolisticTraceAnalysis
git submodule update --init
pip install -r requirements.txt
pip install -e .

Documentation

Learn more about the features and the API from our documentation.

Usage

Data Preparation

All trace files collected from a single job must reside in the same folder, and that folder should contain no traces from other jobs.
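Before creating a TraceAnalysis object, it can help to confirm that the folder holds exactly one trace per rank. The helper below is a hypothetical sketch, not part of HTA. It assumes each Kineto trace is a .json or .json.gz file that records its rank under a top-level "distributedInfo" key, and it treats traces without that key as rank 0.

```python
import gzip
import json
from pathlib import Path
from typing import Dict


def _load_trace(path: Path) -> dict:
    # Kineto traces are JSON, optionally gzip-compressed.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def map_ranks_to_traces(trace_dir: str) -> Dict[int, Path]:
    """Hypothetical helper: map each rank to its trace file, rejecting duplicate ranks."""
    root = Path(trace_dir)
    files = sorted(root.glob("*.json")) + sorted(root.glob("*.json.gz"))
    ranks: Dict[int, Path] = {}
    for path in files:
        trace = _load_trace(path)
        # Assumption: rank lives at trace["distributedInfo"]["rank"]; default to 0 if absent.
        rank = trace.get("distributedInfo", {}).get("rank", 0)
        if rank in ranks:
            raise ValueError(
                f"rank {rank} appears in both {ranks[rank].name} and {path.name}"
            )
        ranks[rank] = path
    return ranks
```

A duplicate rank usually means traces from two different jobs ended up in the same folder, which is exactly what the rule above warns against.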

Analysis in a Jupyter notebook

Activate the Conda environment and launch a Jupyter notebook.

conda activate env_name
jupyter notebook

Import HTA and create a TraceAnalysis object

from hta.trace_analysis import TraceAnalysis
analyzer = TraceAnalysis(trace_dir="/path/to/folder/containing/the/traces")

Basic Usage

# Temporal breakdown
temporal_breakdown_df = analyzer.get_temporal_breakdown()

# Kernel breakdown
kernel_breakdown_df = analyzer.get_gpu_kernel_breakdown()

# Idle time breakdown
idle_time_df = analyzer.get_idle_time_breakdown()

# Communication computation overlap
comm_comp_overlap_df = analyzer.get_comm_comp_overlap()

# Frequent CUDA kernel patterns
frequent_patterns_df = analyzer.get_frequent_cuda_kernel_patterns(operator_name="aten::linear", output_dir="/new/trace/path")

# CUDA kernel launch statistics
cuda_launch_kernel_stats = analyzer.get_cuda_kernel_launch_stats()

# Memory bandwidth time series
memory_bw_series = analyzer.get_memory_bw_time_series()

# Memory bandwidth summary
memory_bw_summary = analyzer.get_memory_bw_summary()

# Queue length time series
ql_series = analyzer.get_queue_length_time_series()

# Queue length summary
ql_summary = analyzer.get_queue_length_summary()
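Conceptually, queue length is a running count of operations that the CPU has enqueued on a CUDA stream but the GPU has not yet finished. The sketch below reproduces that idea with a simple event sweep. It is an illustration under stated assumptions (a single stream, each operation given as a hypothetical (launch_ts, gpu_end_ts) pair), not how HTA computes get_queue_length_time_series().

```python
from typing import List, Tuple


def queue_length_series(ops: List[Tuple[float, float]]) -> List[Tuple[float, int]]:
    """Return (timestamp, queue_length) steps for one stream.

    Assumption: an op is outstanding from its CPU launch until its GPU end.
    """
    events: List[Tuple[float, int]] = []
    for launch_ts, end_ts in ops:
        events.append((launch_ts, +1))  # enqueued by the host
        events.append((end_ts, -1))     # completed on the device
    # Sorting by (ts, delta) processes completions before launches at the same timestamp.
    events.sort()
    series: List[Tuple[float, int]] = []
    depth = 0
    for ts, delta in events:
        depth += delta
        if series and series[-1][0] == ts:
            series[-1] = (ts, depth)  # collapse events that share a timestamp
        else:
            series.append((ts, depth))
    return series
```

Summary statistics then follow directly from the series, for example the peak depth `max(q for _, q in series)`. A consistently deep queue suggests the host stays ahead of the GPU, while a queue that keeps draining to zero points to launch-bound execution.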

For a detailed demo, run the trace_analysis_demo and trace_diff_demo notebooks in the examples folder.

Advanced Usage

Logging Level

The logging level is set through a configuration file in HTA. The default level is defined in hta/configs/logging.config and can be changed in the [logger_hta] section of that file. To use a different logging configuration file, point to it in hta/configs/trace_analyzer.json.
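If you prefer to change the level from a script rather than editing hta/configs/logging.config by hand, a small helper along these lines could work. It is a hypothetical sketch: it assumes the file follows the format of Python's logging.config.fileConfig with a [logger_hta] section that has a level key, and it rewrites the file in place, which discards any comments the file contained.

```python
import configparser

VALID_LEVELS = {"CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET"}


def set_hta_log_level(config_path: str, level: str) -> None:
    """Hypothetical helper: set `level` in the [logger_hta] section of a fileConfig-style file."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown logging level: {level}")
    # interpolation=None keeps '%(asctime)s'-style format strings verbatim.
    cfg = configparser.ConfigParser(interpolation=None)
    with open(config_path, encoding="utf-8") as f:
        cfg.read_file(f)
    if not cfg.has_section("logger_hta"):
        raise KeyError("config file has no [logger_hta] section")
    cfg["logger_hta"]["level"] = level
    # Note: writing back through configparser drops comments from the original file.
    with open(config_path, "w", encoding="utf-8") as f:
        cfg.write(f)
```

Disabling interpolation matters here because logging formatter sections typically contain `%(...)s` placeholders that the default ConfigParser would otherwise try to expand.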

Repo Map

├── examples                       # folder containing demo notebooks
│         ├── ...
├── hta
│         ├── analyzers            # core logic for each analysis
│         │       ├── ...
│         ├── common               # code shared by multiple analyses
│         │       ├── ...
│         ├── configs              # config files
│         │       ├── ...
│         ├── trace_analysis.py    # entrypoint for TraceAnalysis API
│         ├── trace_diff.py        # entrypoint for TraceDiff API
│         └── utils                # utility files
│                 └── ...
├── scripts                        # generic tools for traces
│         └── ...
└── tests                          # unit tests
          └── ...

Contributing

We welcome new contributions. If you plan to contribute new features or extensions, please first open an issue and discuss the feature with us. To learn more about how to contribute, see our contributing guidelines.

Please let us know if you encounter a bug by filing an issue.

The Team

HTA is currently maintained by: Xizhou Feng, Wei Sun, Chengguang Zhu, Yifan Liu, Louis Feng and Michael Au-Yeung. Past contributors include Brian Coutinho, Sung-Han Lin, and Yuzhen Huang.

License

Holistic Trace Analysis is licensed under the MIT License.



Download files


Source Distribution

traceinsight-0.6.0.tar.gz (194.5 kB)

Built Distribution

traceinsight-0.6.0-py3-none-any.whl (154.2 kB)

File details

Details for the file traceinsight-0.6.0.tar.gz.

File metadata

  • Download URL: traceinsight-0.6.0.tar.gz
  • Size: 194.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.13

File hashes

Hashes for traceinsight-0.6.0.tar.gz
Algorithm    Hash digest
SHA256       c9358c1e170c9a6d281cf1cd2c443a888b2d70b5225f437cc67be4f28b5651f7
MD5          f6e41452f38708019e65cd9c5dfd412f
BLAKE2b-256  b88fcb24b5afe0c491a1362b7f2ec68818c098a4c7aa9e6470e19247a39813ab


File details

Details for the file traceinsight-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: traceinsight-0.6.0-py3-none-any.whl
  • Size: 154.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.13

File hashes

Hashes for traceinsight-0.6.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       906981dbdab9eb9a03b5c6847ccaf041d247674940a2fd248f5317598d707d19
MD5          239706562df3e22dec4cc9e65af250a4
BLAKE2b-256  ac81c692c5956484df473181a2e5ef26c967e33735882b01951698c307bcc91e

