
Frequency Techniques for I/O



FTIO


FTIO captures periodic I/O using frequency techniques. Many high-performance computing (HPC) applications perform their I/O in bursts that follow a periodic pattern. Predicting such patterns can benefit I/O contention-avoidance strategies, such as burst buffer management. FTIO supports both offline detection and online prediction of periodic I/O phases. It uses the discrete Fourier transform (DFT) combined with outlier detection methods to extract the dominant frequency of the signal. Additional metrics gauge the confidence in the result and indicate how far the signal is from being periodic. A complete description of the approach is provided here.
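As a rough illustration of the core idea, the sketch below computes the DFT power spectrum (a*a/N) of a bandwidth signal sampled at a fixed rate, flags frequency bins whose Z-score exceeds a threshold, and returns the strongest flagged bin as the dominant frequency. It is a minimal sketch under stated assumptions, not FTIO's implementation: the function names, the threshold of 3, and the synthetic trace are hypothetical, and a naive O(N²) DFT stands in for an FFT (e.g., numpy.fft.rfft) to keep the example dependency-free.

```python
import cmath
import math


def power_spectrum(signal):
    """Naive O(N^2) DFT power spectrum |X_k|^2 / N for bins k = 0 .. N//2."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        spectrum.append(abs(x_k) ** 2 / n)
    return spectrum


def dominant_frequency(signal, fs, z_threshold=3.0):
    """Return the dominant frequency in Hz, or None if no bin is a Z-score outlier.

    Hypothetical helper (not FTIO's API). signal: bandwidth samples taken at fs Hz.
    """
    power = power_spectrum(signal)[1:]  # drop the DC bin (k = 0)
    mean = sum(power) / len(power)
    std = math.sqrt(sum((p - mean) ** 2 for p in power) / len(power))
    if std == 0:
        return None  # flat spectrum, e.g. an all-zero signal
    # Bins whose Z-score exceeds the threshold are treated as outliers (candidates)
    candidates = [k for k, p in enumerate(power, start=1) if (p - mean) / std > z_threshold]
    if not candidates:
        return None
    best = max(candidates, key=lambda k: power[k - 1])  # strongest outlier bin
    return best * fs / len(signal)  # bin index -> frequency in Hz


if __name__ == "__main__":
    # Hypothetical trace: 2 s I/O bursts every 10 s, sampled at 1 Hz for 200 s
    samples = [1.0 if t % 10 < 2 else 0.0 for t in range(200)]
    f = dominant_frequency(samples, fs=1.0)
    print(f"dominant frequency: {f} Hz, period: {1 / f} s")
```

Only the strongest outlier is reported here; the harmonics of a burst pattern also appear as outliers, which is why the sketch picks the bin with the highest power among the candidates.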

This repository provides two main Python-based tools:

  • ftio: Uses frequency techniques and outlier detection methods to find the period of I/O phases.
  • predictor: Implements the online version of FTIO. It reinvokes FTIO whenever new traces are appended to the monitored file. See online prediction for more details. We recommend using TMIO to generate the file with the I/O traces.
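The reinvocation behavior of predictor can be pictured as a polling loop that watches the size of the trace file and reruns an analysis whenever bytes are appended. This is a hypothetical sketch only: AppendWatcher, run_online, and the polling interval are illustrative names, and predictor itself may detect new data differently.

```python
import os
import time


class AppendWatcher:
    """Hypothetical helper (not part of FTIO): detects appends to a trace file."""

    def __init__(self, path):
        self.path = path
        self.last_size = 0

    def poll(self):
        """Return True if the file grew since the previous poll."""
        try:
            size = os.path.getsize(self.path)
        except FileNotFoundError:
            return False  # file not created yet
        grew = size > self.last_size
        self.last_size = size
        return grew


def run_online(path, analyze, interval_s=1.0, max_polls=None):
    """Call analyze(path) each time new data is appended; stop after max_polls if given."""
    watcher = AppendWatcher(path)
    polls = 0
    while max_polls is None or polls < max_polls:
        if watcher.poll():
            analyze(path)  # e.g., rerun the offline analysis on the updated file
        polls += 1
        time.sleep(interval_s)
```

Polling by file size is the simplest portable option; file-system notifications would avoid the fixed delay at the cost of platform-specific code.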

Other tools:

  • ioplot: Generates interactive plots in HTML
  • ioparse: Parses and merges several traces into an Extra-P-supported format. This allows one to examine the scaling behavior of the monitored metrics. Traces generated by FTIO (frequency models), TMIO (msgpack, json, and jsonl), and other tools (Darshan, Recorder, and TAU Metric Proxy) are supported.

Table of Contents
  1. Installation
  2. Usage
  3. Testing
  4. Contributing
  5. Contact
  6. License
  7. Acknowledgments
  8. Citation
  9. Publications

Join the Slack channel or see the latest updates here: Latest News

Installation

FTIO is available on PyPI and can be installed via pip. For the most recent stable GitHub version, FTIO can be installed either automatically or manually. For the development version with the latest features, FTIO can be installed in development mode. As a prerequisite for the virtual environment, python3.11-venv is needed, which can be installed on Ubuntu, for example, with:

apt install python3.11-venv

If you want to contribute to the code, we recommend installing FTIO as described under Contributing.

Automated installation from GitHub

FTIO is installed in a virtual environment by default. For the automated installation, execute the following commands:

# clone FTIO
git clone https://github.com/tuda-parallel/FTIO.git
cd FTIO
# uses python3 by default
make install

# or using a specific python version,
# which is often needed on a cluster 
make install PYTHON=python3.12

# or additionally install all optional packages
make full PYTHON=python3.12

This creates a virtual environment in the current directory, sources .venv/bin/activate, and installs FTIO as a module. If you are working on an HPC cluster, you first need to load the Python module (e.g., module load python/3.12) and possibly add ~/.local/bin to your PATH (e.g., export PATH=$PATH:~/.local/bin) if it is not already there.

If you don't need a dedicated environment, just call:

make ftio PYTHON=python3

Automated installation from PyPI

FTIO is available on PyPI and can be installed via pip:

pip install ftio-hpc

This installs the most recent stable version of FTIO (main branch).

[!note] There are currently issues with pyDarshan on macOS and Windows, which can be solved as described here.

Manual installation from GitHub

Create a virtual environment if needed and activate it:

git clone https://github.com/tuda-parallel/FTIO.git
cd FTIO
python3 -m venv .venv
source .venv/bin/activate

Install all tools provided in this repository simply by using pip:

pip install .

# Or with external dependencies for improved performance
pip install '.[external-libs]'

# Or with external dependencies and style tools
pip install '.[external-libs,development-libs]'

# Or with external dependencies, style tools, and plot libraries (to call `ioplot` with Dash support)
pip install '.[external-libs,development-libs,plot-libs]'

[!note] To use ftio and the other tools, activate the environment with:

source path/to/venv/bin/activate

[!note] There are currently issues with pyDarshan on macOS and Windows, which can be solved as described here.

Automated Installation: Developer Environment Setup

By default, FTIO installs into an isolated virtual environment. The following steps retrieve and configure the latest development version with debug symbols and an editable installation, using the make debug target:

# 1. Clone the FTIO repository
git clone https://github.com/tuda-parallel/FTIO.git
cd FTIO

# 2. Switch to the development branch
git checkout development

# 3. Install in editable/debug mode (defaults to current python)
make debug

# To specify a different Python interpreter (e.g., on an HPC cluster):
make debug PYTHON=python3.12

This process establishes a development environment that:

  • Instantiates a virtual environment (.venv/) in the project directory.
  • Activates the environment by sourcing the .venv/bin/activate script (i.e., source .venv/bin/activate).
  • Installs FTIO in “editable” mode, ensuring that any modifications to the source code are immediately reflected upon import.

Usage

For installation instructions see installation.

To call ftio on a file execute:

ftio filename.extension

There are three ways to use ftio and predictor:

  1. Provide a supported file format to the tool. Supported extensions are json, jsonLines, msgpack, and darshan. For Recorder traces, provide the path to the folder instead of filename.extension. For more on the input format, see supported file formats. There is also an option to provide a custom format.
  2. Use the API. This is particularly good if you just want to experiment with the tool, or directly jump into using it with as little effort as possible.
  3. Send TCP messages over ZeroMQ (ZMQ) to the tools as described here. There is also an API example with ZMQ and GekkoFS here. Usually, predictor is used with ZMQ, as it makes little sense to use ftio with this option.

In all cases, various options can be provided to ftio and predictor. To see all available command line arguments, call:

ftio -h

  
usage: ftio [-h] [-m MODE] [-r RENDER] [-f FREQ] [-ts TS] [-te TE] [-tr TRANSFORMATION] [-e ENGINE]
            [-o OUTLIER] [-le LEVEL] [-t TOL] [-d] [-nd] [-re] [--no-reconstruction] [-p] [-np] [-c] [-w]
            [-fh FREQUENCY_HITS] [-v] [-s] [-ns] [-a] [-na] [-i] [-ni] [-x DXT_MODE] [-l LIMIT]
            files [files ...]

There are several options available to enhance the frequency predictions from ftio. In the standard mode, the DFT is used in combination with an outlier detection method. Additionally, autocorrelation can be used to further increase the confidence in the results:

  1. DFT + outlier detection (Z-score, DB-Scan, Isolation forest, peak detection, or LOF)
  2. Optionally: Autocorrelation + Peak detection (-c flag)
  3. If step 2. is performed, the results from both predictions are merged automatically

See offline detection for more details.
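Step 2 can be sketched in the same spirit: compute the normalized autocorrelation of the sampled signal, treat local maxima above a small threshold as peaks, and take the median spacing between peaks as the period. The names and the 0.1 peak threshold are assumptions for illustration; how FTIO merges this estimate with the DFT result in step 3 is not reproduced here.

```python
def autocorrelation(signal):
    """Normalized autocorrelation r[lag] for lag = 0 .. N-1 (r[0] == 1 unless constant)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    energy = sum(c * c for c in centered)
    if energy == 0:
        return [0.0] * n  # constant signal: nothing to correlate
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / energy
        for lag in range(n)
    ]


def peak_lags(acf, min_height=0.1):
    """Lags (excluding 0) that are local maxima with r[lag] > min_height (hypothetical threshold)."""
    return [
        lag
        for lag in range(1, len(acf) - 1)
        if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1] and acf[lag] > min_height
    ]


def period_from_autocorrelation(signal, fs):
    """Estimate the period in seconds from the median spacing of autocorrelation peaks."""
    peaks = peak_lags(autocorrelation(signal))
    if not peaks:
        return None
    spacings = sorted(b - a for a, b in zip([0] + peaks, peaks))
    return spacings[len(spacings) // 2] / fs  # lag in samples -> seconds


if __name__ == "__main__":
    # Hypothetical trace: 2 s I/O bursts every 10 s, sampled at 1 Hz
    samples = [1.0 if t % 10 < 2 else 0.0 for t in range(200)]
    print(period_from_autocorrelation(samples, fs=1.0))
```

Using the median spacing rather than the first peak makes the estimate less sensitive to a single spurious peak.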

Several flags can be specified. The most relevant settings are:

Flag Description
file file, file list (file 0 ... file n), folder, or folder list (folder 0.. folder n) containing traces (positional argument)
-h, --help show this help message and exit
-m MODE, --mode MODE if the trace file contains several I/O modes, a specific mode can be selected. Supported modes are: write_async, read_async, write_sync, read_sync
-r RENDER, --render RENDER specifies how the plots are rendered. Either dynamic (default) or static
-f FREQ, --freq FREQ specifies the sampling rate with which the continuous signal is discretized (default=10 Hz). This directly affects the highest captured frequency (the Nyquist frequency, i.e., half the sampling rate). The value is specified in Hz. If this value is set to -1, the auto mode is launched, which sets the sampling frequency automatically based on the smallest change in the bandwidth detected. Note that the lowest allowed frequency in the auto mode is 2000 Hz
-ts TS, --ts TS Modifies the start time of the examined time window
-te TE, --te TE Modifies the end time of the examined time window
-tr TRANSFORMATION, --transformation TRANSFORMATION specifies the frequency technique to use. Supported modes are: dft (default), wave_disc, and wave_cont
-e ENGINE, --engine ENGINE specifies the engine used to display the figures. Either plotly (default) or matplotlib can be used. Plotly is used to generate interactive plots as HTML files. Set this value to no if you do not want to generate plots
-o OUTLIER, --outlier OUTLIER outlier detection method: Z-score (default), DB-Scan, Isolation_forest, or LOF
-le LEVEL, --level LEVEL specifies the decomposition level for the discrete wavelet transformation (default=3). If specified as auto, the maximum decomposition level is calculated automatically
-t TOL, --tol TOL tolerance value
-d, --dtw performs dynamic time warping on the top 3 frequencies (highest contribution) calculated using the DFT if set (default=False)
-re, --reconstruction plots the reconstruction of the top 10 signals on the figure
-np, --no-psd if set, replace the power density spectrum (a*a/N) with the amplitude spectrum (a)
-au, --autocorrelation if set, autocorrelation is calculated in addition to DFT. The results are merged to a single prediction at the end
-p, --periodicity activates the calculation of a new periodicity score. Options are recurrence period density entropy (RPDE), spectral flatness (SF), correlation (corr), and individual period correlation (ind)
-w, --window_adaptation online time window adaptation. If set, after X hits the time window is shifted so that it covers X times the previous phase length back from the current instant. X corresponds to frequency_hits
-fh FREQUENCY_HITS, --frequency_hits FREQUENCY_HITS specifies the number of hits needed to adapt the time window. A hit occurs once a dominant frequency is found
-v, --verbose sets verbose on or off (default=False)
-x DXT_MODE, --dxt_mode DXT_MODE select data to extract from Darshan traces (DXT_POSIX or DXT_MPIIO (default))
-l LIMIT, --limit LIMIT maximum number of ranks to consider when reading a folder
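The effect of -f can be illustrated by turning a list of I/O requests into a bandwidth signal sampled at fs Hz. The highest frequency the DFT can then represent is the Nyquist frequency fs/2 (5 Hz at the default 10 Hz), so periods shorter than 2/fs seconds cannot be resolved. The sketch assumes each request is a hypothetical (start_s, end_s, nbytes) tuple transferred at a constant rate; this is not FTIO's trace format or code. The optional t_start/t_end arguments play a role similar to -ts/-te.

```python
import math


def to_bandwidth_signal(requests, fs, t_start=None, t_end=None):
    """Sample the aggregate bandwidth of I/O requests at fs Hz (hypothetical helper).

    requests: list of (start_s, end_s, nbytes); each request is assumed to
    transfer its bytes at a constant rate between start_s and end_s.
    Returns bandwidth samples in bytes/s, one every 1/fs seconds.
    """
    t0 = min(r[0] for r in requests) if t_start is None else t_start
    t1 = max(r[1] for r in requests) if t_end is None else t_end
    n = math.floor((t1 - t0) * fs)
    samples = [0.0] * n
    for start, end, nbytes in requests:
        if end <= start:
            continue  # skip zero-length requests
        rate = nbytes / (end - start)
        for i in range(n):
            t = t0 + i / fs  # time of the i-th sample
            if start <= t < end:
                samples[i] += rate  # overlapping requests add up
    return samples


def nyquist_frequency(fs):
    """Highest frequency representable when sampling at fs Hz."""
    return fs / 2


if __name__ == "__main__":
    reqs = [(0.0, 1.0, 100), (0.5, 1.5, 50)]  # hypothetical requests
    print(to_bandwidth_signal(reqs, fs=10), nyquist_frequency(10))
```

A higher fs captures shorter phases but produces longer signals, which increases the cost of the subsequent transform.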

predictor has the same syntax as ftio. All arguments that are available for ftio are also available for predictor.
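One possible reading of -w and -fh for online use is sketched below: every prediction that finds a dominant frequency counts as a hit, and once X hits (X = frequency_hits) have accumulated, the window start is moved to X periods before the current instant. The class and field names are hypothetical, and never moving the window start backwards is an assumption of this sketch rather than documented FTIO behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WindowAdapter:
    """Hypothetical tracker for online time-window adaptation (-w / -fh)."""

    frequency_hits: int = 3  # X: hits needed before the window is adapted
    t_start: float = 0.0     # current start of the analysis window in seconds
    hits: int = 0            # predictions so far that found a dominant frequency

    def update(self, t_now: float, dominant_freq: Optional[float]) -> float:
        """Record one prediction at time t_now and return the window start."""
        if dominant_freq is None or dominant_freq <= 0:
            return self.t_start  # no dominant frequency: not a hit
        self.hits += 1
        if self.hits >= self.frequency_hits:
            period = 1.0 / dominant_freq
            # Keep X periods before the current instant; assumption: never move the start backwards
            self.t_start = max(self.t_start, t_now - self.frequency_hits * period)
        return self.t_start
```

For example, with frequency_hits=3 and a dominant frequency of 0.5 Hz (period 2 s), the third hit at t = 14 s moves the window start to 14 - 3 * 2 = 8 s.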

Testing

An 8.jsonl file is provided for testing under examples. On your system, navigate to the folder examples/tmio/JSONL and call:

ftio 8.jsonl

Examples

Several examples are provided under examples. See also the examples provided here for the different file formats.

Alternatively, the artifact folder contains instructions and example traces from the FTIO paper that can be downloaded as described here.

As ftio supports Darshan traces, you can also download traces from https://hpcioanalysis.zdv.uni-mainz.de/ and run FTIO on them as described here.

For an online example with predictor, you can follow the instructions here for HACC-IO.

Contributing

See the instructions provided in docs/contributing.md.

[!note] If you are a student from TU Darmstadt, kindly see these instructions.

Contact

License


Distributed under the BSD 3-Clause License. See LICENCE for more information.

Acknowledgments

Authors:

  • Ahmad Tarraf

This work is a result of cooperation between the Technical University of Darmstadt and INRIA in the scope of the EuroHPC ADMIRE project.

Citation

 @inproceedings{AT24_ftio, 
  author={Tarraf, Ahmad and Bandet, Alexis and Boito, Francieli and Pallez, Guillaume and Wolf, Felix},
  booktitle={2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)}, 
  title={Capturing periodic {I/O} using frequency techniques}, 
  month=may, 
  year={2024},
  pages={465-478},
  publisher = {IEEE},
  doi={10.1109/IPDPS57955.2024.00048}
 }

Publications

  1. A. Tarraf, A. Bandet, F. Boito, G. Pallez, and F. Wolf, “Capturing Periodic I/O Using Frequency Techniques,” in 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), San Francisco, CA, USA, May 2024, pp. 465–478.

  2. A. Tarraf, A. Bandet, F. Boito, G. Pallez, and F. Wolf, “FTIO: Detecting I/O periodicity using frequency techniques.” arXiv preprint arXiv:2306.08601 (2023).



Download files


Source Distribution

ftio_hpc-0.0.8.tar.gz (305.1 kB)

Uploaded Source

Built Distribution


ftio_hpc-0.0.8-py3-none-any.whl (403.0 kB)

Uploaded Python 3

File details

Details for the file ftio_hpc-0.0.8.tar.gz.

File metadata

  • Download URL: ftio_hpc-0.0.8.tar.gz
  • Upload date:
  • Size: 305.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for ftio_hpc-0.0.8.tar.gz
Algorithm Hash digest
SHA256 6d3090a1daa15919dcebd728424c7b8d85272158404f710113520dea74736d48
MD5 84ac94c5484e39c90ea442cd7740a820
BLAKE2b-256 43c0dbd07c10e5f4a5a54566bb1451b3835d7bf5728ef481073b51ad5d500438
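The published digests can be checked locally before installing a downloaded file. Below is a minimal sketch using Python's standard hashlib module; it assumes the source archive was downloaded to the current directory. pip's hash-checking mode (--require-hashes) performs a comparable check at install time.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 digest equals the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # SHA256 digest listed above for the source distribution
    expected = "6d3090a1daa15919dcebd728424c7b8d85272158404f710113520dea74736d48"
    archive = "ftio_hpc-0.0.8.tar.gz"  # assumed to be in the current directory
    if os.path.exists(archive):
        print(matches(archive, expected))
```

Reading in chunks keeps memory use constant regardless of the archive size.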


File details

Details for the file ftio_hpc-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: ftio_hpc-0.0.8-py3-none-any.whl
  • Upload date:
  • Size: 403.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for ftio_hpc-0.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 a53720dbd2d11e6ae39aafacbf9cbd223a1bb1a5020e42cb917400b9ab39f75d
MD5 7518c05db2ed736263e86352954c580d
BLAKE2b-256 cbe149e688f5f9a95aa2d719b4a1174a5078ca6483e4b673f88adaba993488e5

