Frequency Techniques for I/O
FTIO captures periodic I/O using frequency techniques. Many high-performance computing (HPC) applications perform their I/O in bursts that follow a periodic pattern. Predicting such patterns can be very effective for I/O contention avoidance strategies, such as burst buffer management. FTIO allows offline detection and online prediction of periodic I/O phases. It uses the discrete Fourier transform (DFT), combined with outlier detection methods, to extract the dominant frequency in the signal. Additional metrics gauge the confidence in the output and indicate how far the signal is from being periodic. A complete description of the approach is provided here.
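The idea of combining the DFT with outlier detection can be illustrated with a small NumPy sketch. This is not FTIO's implementation: the function name `dominant_frequency`, the Z-score threshold of 3, and the synthetic signal are assumptions made for illustration only.

```python
import numpy as np

def dominant_frequency(signal, fs, z_threshold=3.0):
    """Return (frequency in Hz, Z-score) of the strongest DFT bin.

    The frequency is None when no bin stands out from the rest of the
    spectrum, i.e. when the signal does not look periodic.
    """
    x = np.asarray(signal, dtype=float)
    n = len(x)
    # Remove the mean so the DC component does not dominate the spectrum.
    spectrum = np.fft.rfft(x - x.mean())
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Power per bin; drop the DC bin (index 0).
    power = (np.abs(spectrum) ** 2 / n)[1:]
    freqs = freqs[1:]
    # Z-score of every bin relative to the whole spectrum.
    z = (power - power.mean()) / power.std()
    k = int(np.argmax(z))
    if z[k] < z_threshold:
        return None, float(z[k])
    return float(freqs[k]), float(z[k])

# Synthetic bandwidth signal (hypothetical): a 2 s I/O burst every 10 s,
# sampled at 10 Hz for 200 s.
fs = 10.0
samples = np.arange(2000)
bandwidth = np.where((samples % 100) < 20, 1.0, 0.0)

freq, score = dominant_frequency(bandwidth, fs)
period = 1.0 / freq  # seconds between I/O phases
print(f"dominant frequency: {freq:.3f} Hz, period: {period:.1f} s")
```

The Z-score doubles as a crude confidence value in this sketch: the further the strongest bin lies from the rest of the spectrum, the more clearly periodic the signal is.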
This repository provides two main Python-based tools:
- ftio: uses frequency techniques and outlier detection methods to find the period of I/O phases
- predictor: implements the online version of FTIO. It reinvokes FTIO whenever new traces are appended to the monitored file. See online prediction for more details. We recommend using TMIO to generate the file with the I/O traces.
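The "reinvoke whenever new traces are appended" behavior can be pictured as a loop that remembers how far it has read the monitored file. The sketch below is only an illustration of that pattern; the class `AppendWatcher`, the file name `trace.jsonl`, and the record contents are hypothetical and do not reflect how predictor is actually implemented.

```python
import os
import tempfile

class AppendWatcher:
    """Tracks a file and returns only the lines appended since the last poll."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # number of bytes already consumed

    def poll(self):
        # Read in binary mode so the offset is an exact byte position.
        with open(self.path, "rb") as fh:
            fh.seek(self.offset)
            data = fh.read()
        self.offset += len(data)
        return data.decode("utf-8").splitlines()

with tempfile.TemporaryDirectory() as tmp:
    trace = os.path.join(tmp, "trace.jsonl")  # hypothetical trace file
    open(trace, "w", encoding="utf-8").close()
    watcher = AppendWatcher(trace)

    seen = []
    for chunk in (['{"t": 0}'], ['{"t": 1}', '{"t": 2}']):
        # Simulate the monitored application appending new records.
        with open(trace, "a", encoding="utf-8") as fh:
            fh.write("\n".join(chunk) + "\n")
        new_lines = watcher.poll()
        if new_lines:
            # An online predictor would rerun its analysis at this point.
            seen.append(len(new_lines))

    idle = watcher.poll()  # nothing was appended since the last poll
```

This sketch assumes that records are always appended as complete lines; a more careful version would buffer a trailing partial line until it is finished.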
Other tools:
- ioplot: generates interactive plots in HTML
- ioparse: parses and merges several traces into an Extra-P supported format. This allows one to examine the scaling behavior of the monitored metrics. Traces generated by FTIO (frequency models), TMIO (msgpack, json, and jsonl), and other tools (Darshan, Recorder, and TAU Metric Proxy) are supported.
Join the Slack channel or see the latest updates here: Latest News
Installation
FTIO is available on PyPI and can be installed via pip:
pip install ftio-hpc
For the latest GitHub version, FTIO can be installed either automatically or manually. As a prerequisite for the virtual environment, python3.11-venv is needed, which can be installed on Ubuntu, for example, with:
apt install python3.11-venv
Note that there are currently issues with pyDarshan on Mac, which can be solved as mentioned here.
Automated installation
Simply call the make command:
make install
This generates a virtual environment in the current directory, sources .venv/bin/activate, and installs FTIO as a module.
If you don't need a dedicated environment, just call:
make ftio PYTHON=python3
Manual installation
Create a virtual environment if needed and activate it:
python3 -m venv .venv
source .venv/bin/activate
Install all tools provided in this repository simply by using pip:
pip install .
Note: you need to activate the environment to use ftio and the other tools:
source path/to/venv/bin/activate
Usage
For installation instructions see installation.
To call ftio on a single file, use:
ftio filename.extension
Supported extensions are json, jsonLines, msgpack, and darshan. For Recorder, provide the path to the folder instead of filename.extension. For more on the input format, including a custom format, see supported file formats.
FTIO provides various options and extensions. To see all available command line arguments, call:
ftio -h
usage: ftio [-h] [-m MODE] [-r RENDER] [-f FREQ] [-ts TS] [-te TE] [-tr TRANSFORMATION] [-e ENGINE]
[-o OUTLIER] [-le LEVEL] [-t TOL] [-d] [-nd] [-re] [--no-reconstruction] [-p] [-np] [-c] [-w]
[-fh FREQUENCY_HITS] [-v] [-s] [-ns] [-a] [-na] [-i] [-ni] [-x DXT_MODE] [-l LIMIT]
files [files ...]
ftio generates frequency predictions. There are several options available to enhance the predictions. In the standard mode, the DFT is used in combination with an outlier detection method. Additionally, autocorrelation can be used to further increase the confidence in the results:
- DFT + outlier detection (Z-score, DB-Scan, Isolation forest, peak detection, or LOF)
- Optionally: autocorrelation + peak detection (-c flag)
- If the optional autocorrelation step is performed, the results from both predictions are merged automatically
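The optional autocorrelation step can be sketched as follows. This is a minimal NumPy illustration, not FTIO's code: the function `autocorr_period`, the simple local-maximum peak detection, and the synthetic signal are assumptions, and the merging of the DFT and autocorrelation results is not shown.

```python
import numpy as np

def autocorr_period(signal, fs):
    """Estimate the period (in seconds) from the highest autocorrelation peak."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Autocorrelation for non-negative lags, normalized so that acf[0] == 1.
    acf = np.correlate(x, x, mode="full")[n - 1:]
    acf = acf / acf[0]
    # Simple peak detection: a lag is a peak if it is a local maximum.
    inner = acf[1:-1]
    is_peak = (inner > acf[:-2]) & (inner >= acf[2:])
    lags = np.flatnonzero(is_peak) + 1
    if lags.size == 0:
        return None
    best_lag = int(lags[np.argmax(acf[lags])])
    return best_lag / fs

# Synthetic bandwidth signal (hypothetical): a 2 s I/O burst every 10 s,
# sampled at 10 Hz for 200 s.
fs = 10.0
samples = np.arange(2000)
bandwidth = np.where((samples % 100) < 20, 1.0, 0.0)

acf_period = autocorr_period(bandwidth, fs)
print(f"period from autocorrelation: {acf_period:.1f} s")
```

Because the autocorrelation works in the time domain, it gives an estimate of the period that is independent of the DFT bins; agreement between the two estimates is what raises the confidence in the prediction.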
Several flags can be specified. The most relevant settings are:
Flag | Description |
---|---|
file | file, file list (file 0 ... file n), folder, or folder list (folder 0.. folder n) containing traces (positional argument) |
-h, --help | show this help message and exit |
-m MODE, --mode MODE | if the trace file contains several I/O modes, a specific mode can be selected. Supported modes are: write_async, read_async, write_sync, read_sync |
-r RENDER, --render RENDER | specifies how the plots are rendered. Either dynamic (default) or static |
-f FREQ, --freq FREQ | specifies the sampling rate with which the continuous signal is discretized (default=10Hz). This directly affects the highest captured frequency (Nyquist). The value is specified in Hz. In case this value is set to -1, the auto mode is launched which sets the sampling frequency automatically to the smallest change in the bandwidth detected. Note that the lowest allowed frequency in the auto mode is 2000 Hz |
-ts TS, --ts TS | Modifies the start time of the examined time window |
-te TE, --te TE | Modifies the end time of the examined time window |
-tr TRANSFORMATION, --transformation TRANSFORMATION | specifies the frequency technique to use. Supported modes are: dft (default), wave_disc, and wave_cont |
-e ENGINE, --engine ENGINE | specifies the engine used to display the figures. Either plotly (default) or mathplotlib can be used. Plotly is used to generate interactive plots as HTML files. Set this value to no if you do not want to generate plots |
-o OUTLIER, --outlier OUTLIER | outlier detection method: Z-score (default), DB-Scan, Isolation_forest, or LOF |
-le LEVEL, --level LEVEL | specifies the decomposition level for the discrete wavelet transformation (default=3). If specified as auto, the maximum decomposition level is calculated automatically |
-t TOL, --tol TOL | tolerance value |
-d, --dtw | performs dynamic time warping on the top 3 frequencies (highest contribution) calculated using the DFT if set (default=False) |
-re, --reconstruction | plots reconstruction of top 10 signals on figure |
-np, --no-psd | if set, replace the power density spectrum (a*a/N) with the amplitude spectrum (a) |
-c, --autocorrelation | if set, autocorrelation is calculated in addition to DFT. The results are merged to a single prediction at the end |
-w, --window_adaptation | online time window adaptation. If set to true, the time window is shifted on X hits to X times the previous phases from the current instance. X corresponds to frequency_hits |
-fh FREQUENCY_HITS, --frequency_hits FREQUENCY_HITS | specifies the number of hits needed to adapt the time window. A hit occurs once a dominant frequency is found |
-v, --verbose | sets verbose on or off (default=False) |
-x DXT_MODE, --dxt_mode DXT_MODE | select data to extract from Darshan traces (DXT_POSIX or DXT_MPIIO (default)) |
-l LIMIT, --limit LIMIT | max ranks to consider when reading a folder |
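To see how the sampling frequency (-f) and the time window (-ts, -te) bound what a DFT can resolve, the following sketch computes the number of samples, the frequency resolution, and the Nyquist frequency. The helper `dft_limits` is hypothetical and not part of FTIO; it only applies the standard DFT relations.

```python
def dft_limits(fs, ts, te):
    """Return (number of samples, frequency resolution in Hz, Nyquist frequency in Hz).

    fs: sampling frequency in Hz; ts, te: start and end of the time window in seconds.
    """
    duration = te - ts
    n = int(round(duration * fs))
    resolution = fs / n   # spacing between DFT bins, equal to 1 / duration
    nyquist = fs / 2.0    # highest frequency the DFT can capture
    return n, resolution, nyquist

# A 200 s window sampled at the default 10 Hz.
n, resolution, nyquist = dft_limits(fs=10.0, ts=0.0, te=200.0)
print(f"{n} samples, bins every {resolution} Hz, frequencies up to {nyquist} Hz")
```

In this example, periods shorter than 0.2 s (above 5 Hz) cannot be captured, and the lowest nonzero bin corresponds to a period equal to the 200 s window itself. A longer window refines the resolution, while a higher sampling frequency raises the upper limit.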
predictor has the same syntax as ftio. All arguments that are available for ftio are also available for predictor.
Testing
An 8.jsonl file is provided for testing under examples. On your system, navigate to the folder and call:
ftio 8.jsonl
Contributing
If you have a suggestion that would make this better, please fork the repository and create a pull request.
- Fork the project
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a pull request
Contact
- Ahmad Tarraf: ahmad.tarraf@tu-darmstadt.de
License
Distributed under the BSD 3-Clause License. See LICENCE for more information.
Acknowledgments
Authors:
- Ahmad Tarraf
This work is the result of a cooperation between the Technical University of Darmstadt and INRIA within the scope of the EuroHPC ADMIRE project.
Citation
@inproceedings{Tarraf_Bandet_Boito_Pallez_Wolf_2024,
author={Tarraf, Ahmad and Bandet, Alexis and Boito, Francieli and Pallez, Guillaume and Wolf, Felix},
title={Capturing Periodic I/O Using Frequency Techniques},
booktitle={2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)},
address={San Francisco, CA, USA},
year={2024},
month=may,
pages={1–14},
notes = {(accepted)}
}
Publications
- A. Tarraf, A. Bandet, F. Boito, G. Pallez, and F. Wolf, “Capturing Periodic I/O Using Frequency Techniques,” in 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), San Francisco, CA, USA, May 2024, pp. 1–14.
- A. Tarraf, A. Bandet, F. Boito, G. Pallez, and F. Wolf, “FTIO: Detecting I/O periodicity using frequency techniques.” arXiv preprint arXiv:2306.08601 (2023).