Analyze, visualize, and understand I/O performance issues in HPC workflows

Data Flow Analyzer

Overview

DFAnalyzer is an open-source tool for analyzing performance data from large-scale workflows on distributed systems. It presents a hierarchical, layer-by-layer summary of an application's execution, from high-level application events down to low-level POSIX calls. For each layer, DFAnalyzer reports time, operation counts, and data volume, and derives performance metrics such as bandwidth and operations per second. It also visualizes the overlap between layers, which helps characterize complex I/O and compute patterns.
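To make the idea of overlap between layers concrete, here is a minimal sketch. It assumes each layer's events can be reduced to (start, end) time intervals and measures what share of one layer's time coincides with another's. The helper names (merge_intervals, overlap_percent) and the sample intervals are hypothetical, and DFAnalyzer's own computation may differ.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_percent(layer: List[Interval], other: List[Interval]) -> float:
    """Percentage of the layer's busy time that coincides with the other layer."""
    a, b = merge_intervals(layer), merge_intervals(other)
    layer_time = sum(end - start for start, end in a)
    if layer_time == 0:
        return 0.0
    shared = 0.0
    i = j = 0
    # Two-pointer sweep over both disjoint, sorted interval lists.
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            shared += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return 100.0 * shared / layer_time


if __name__ == "__main__":
    # Hypothetical intervals: an I/O layer and a compute layer.
    io_layer = [(0.0, 4.0), (6.0, 10.0)]
    compute_layer = [(2.0, 8.0)]
    print(f"{overlap_percent(io_layer, compute_layer):.1f}%")  # 50.0%
```

In this toy case the I/O layer is busy for 8 seconds, 4 of which fall inside the compute interval, so half of its time is overlapped.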

Installation

To install DFAnalyzer through pip (recommended for most users):

# Ensure runtime dependencies for optional features (e.g., Darshan, Recorder) are installed.
# This might involve using your system's package manager or a tool like Spack.
# Example using Spack to prepare the environment:
# spack -e tools install
pip install dftracer-analyzer
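
After installing, a quick way to confirm that pip registered the package is to query its metadata from Python. This is a minimal sketch using only the standard library; the helper name installed_version is hypothetical, and the distribution name is the one passed to pip above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("dftracer-analyzer")
    print(f"dftracer-analyzer: {found or 'not installed'}")
```

If this reports "not installed", check that pip and the Python interpreter you are running belong to the same environment.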

To install DFAnalyzer from source (for developers or custom builds):

# 1. Install system dependencies:
#    Refer to the "Install system dependencies" step in .github/workflows/ci.yml
#    (e.g., build-essential, cmake, libarrow-dev, libhdf5-dev, ninja-build, etc.).
#    Alternatively, tools like Spack can help manage these:
#    # spack -e tools install
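#    On clusters that use environment modules, load Ninja (skip this if ninja is already on your PATH):
module load ninja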

# 2. Install Python build dependencies:
python -m pip install --upgrade pip meson-python setuptools wheel

# 3. Install DFAnalyzer from the root of this repository:
#    The following command includes optional C++ components (tests and tools).
#    The --prefix argument is optional and specifies the installation location.
pip install -e . \
  -Csetup-args="--prefix=$HOME/.local" \
  -Csetup-args="-Denable_tests=true" \
  -Csetup-args="-Denable_tools=true"

# (Optional) Install dependencies for running tests if you plan to contribute or run local tests:
# pip install -r tests/requirements.txt

Usage

Here's an example of how to run DFAnalyzer using sample data included in the repository:

# Before running, ensure the sample data is extracted.
# For example, to extract the 'dftracer-dlio' sample used below:
# mkdir -p tests/data/extracted
# tar -xzf tests/data/dftracer-dlio.tar.gz -C tests/data/extracted
# The bracketed argument is quoted so that shells such as zsh do not treat it as a glob pattern.
dfanalyzer analyzer/preset=dlio trace_path=tests/data/extracted/dftracer-dlio 'view_types=[time_range]'
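
For scripted runs, the same invocation can be assembled in Python. The sketch below is an illustration rather than part of DFAnalyzer: build_command is a hypothetical helper, and joining several view types with commas inside the brackets is an assumption based on the syntax shown above. Passing the arguments as a list avoids any shell interpretation of the brackets.

```python
import shlex
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_command(preset: str, trace_path: str, view_types: Sequence[str]) -> List[str]:
    """Assemble the dfanalyzer argument list (hypothetical helper)."""
    return [
        "dfanalyzer",
        f"analyzer/preset={preset}",
        f"trace_path={trace_path}",
        # Assumption: multiple view types are comma-separated inside brackets.
        f"view_types=[{','.join(view_types)}]",
    ]


if __name__ == "__main__":
    trace_dir = "tests/data/extracted/dftracer-dlio"
    cmd = build_command("dlio", trace_dir, ["time_range"])
    print(shlex.join(cmd))
    # Run only when the CLI is installed and the sample data has been extracted.
    if shutil.which("dfanalyzer") and Path(trace_dir).is_dir():
        subprocess.run(cmd, check=True)
```
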

This command analyzes the traces and prints a high-level summary of the application's execution. Below is a sample of the "Time Period Summary" output:

                                                  Time Period Summary
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ Metric                                                                         Unit                         Value ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ Job Time                                                                       seconds                     56.695 │
│ Total Count                                                                    count                       15,901 │
│ Total Files                                                                    count                           87 │
│ Total Nodes                                                                    count                            0 │
│ Total Processes                                                                count                           23 │
│ App Count                                                                      count                            8 │
│ Training Count                                                                 count                           40 │
│ Compute Count                                                                  count                          200 │
│ Fetch Data Count                                                               count                          160 │
│ Data Loader Count                                                              count                          808 │
│ Data Loader Fork Count                                                         count                           96 │
│ Reader Count                                                                   count                        4,008 │
│ Reader POSIX (Lustre) Count                                                    count                       10,432 │
│ Reader POSIX (Lustre) Size                                                     MB                      111833.161 │
│ Reader POSIX (Lustre) Bandwidth                                                MB/s                       874.982 │
│ Reader POSIX (Lustre) Avg Transfer Size                                        MB                          10.720 │
│ Checkpoint Count                                                               count                            8 │
│ Checkpoint POSIX (Lustre) Count                                                count                           45 │
│ Checkpoint POSIX (Lustre) Size                                                 MB                           0.011 │
│ Checkpoint POSIX (Lustre) Bandwidth                                            MB/s                         0.791 │
│ Checkpoint POSIX (Lustre) Avg Transfer Size                                    MB                           0.000 │
│ Other POSIX Count                                                              count                           96 │
└───────────────────────────────────────────────────────────────────────────────┴────────────────┴────────────────────┘

DFAnalyzer also provides a detailed breakdown of performance metrics for each layer of the application. Here is a snippet of the "Layer Breakdown" section from the same run. The percentages in parentheses show how much of each layer's time, operations, or size overlaps with its parent layer; (----) means that no percentage is reported for that entry.

                                            Layer Breakdown (w/ overlap %)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ Layer                                Time (s)             Ops    Ops/sec           Size (MB)  Bandwidth (MB/s) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ App                            441.967 (----)        8 (----)      0.018                   -                 - │
│ Training                       439.442 (----)       40 (----)      0.091                   -                 - │
│ Compute                        272.356 (----)      200 (----)      0.734                   -                 - │
│ Fetch Data                     126.179 ( 16%)      160 ( 25%)      1.268                   -                 - │
│ Data Loader                    151.471 ( 45%)      808 ( 46%)      5.334                   -                 - │
│ Data Loader Fork                 2.392 (  0%)       96 (  0%)     40.135                   -                 - │
│ Reader                         299.992 ( 40%)    4,008 ( 51%)     13.360                   -                 - │
│ Reader POSIX (Lustre)          127.812 ( 45%)   10,432 ( 48%)     81.620   111833.161 ( 46%)           874.982 │
│ Checkpoint                       0.014 (  0%)        8 (  0%)    571.551                   -                 - │
│ Checkpoint POSIX (Lustre)        0.014 (  0%)       45 (  0%)   3268.686        0.011 (  0%)             0.791 │
│ Other POSIX                      2.392 (  0%)       96 (  0%)     40.135        0.000 (----)                 - │
└─────────────────────────────┴──────────────────┴────────────────┴───────────┴────────────────────┴──────────────────┘
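
The derived columns can be checked by hand. The sketch below recomputes operations per second, bandwidth, and average transfer size for the Reader POSIX (Lustre) row from its time, operation count, and size. The formulas are inferred from the numbers in the tables above rather than taken from DFAnalyzer's source, and derived_metrics is a hypothetical helper.

```python
from typing import Dict


def derived_metrics(time_s: float, ops: int, size_mb: float) -> Dict[str, float]:
    """Recompute rate metrics from a layer's time, operation count, and size."""
    return {
        "ops_per_sec": ops / time_s,
        "bandwidth_mb_s": size_mb / time_s,
        "avg_transfer_mb": size_mb / ops,
    }


if __name__ == "__main__":
    # Values copied from the Reader POSIX (Lustre) row of the sample output.
    reader_posix = derived_metrics(time_s=127.812, ops=10_432, size_mb=111_833.161)
    for name, value in reader_posix.items():
        print(f"{name}: {value:.3f}")
    # ops_per_sec: 81.620
    # bandwidth_mb_s: 874.982
    # avg_transfer_mb: 10.720
```

The results match the Ops/sec and Bandwidth columns above and the Avg Transfer Size entry in the Time Period Summary. For very short layers such as Checkpoint, recomputing from the displayed values gives slightly different numbers because the table rounds time and size to three decimals.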

Further Information

For more details, to report issues, or to contribute to DFAnalyzer, please refer to the following resources:

  • Official DFAnalyzer Documentation: For detailed usage, configuration options, and information about analyzers.
  • Issue Tracker: To report bugs or suggest new features.
  • Contributing Guidelines: For information on how to contribute to the project, including setting up a development environment and coding standards.
  • Citation File: If you use DFAnalyzer in your research, please cite it using the information in this file.

Acknowledgments

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under the DOE Early Career Research Program (LLNL-CONF-862440). Also, this research is supported in part by the National Science Foundation (NSF) under Grants OAC-2104013, OAC-2313154, and OAC-2411318.
