Extract and project fundamental factors in MPI applications.

## Project Description

Basic Analysis and Projection of Fundamental Factors

=========================================================================================

Pre-requisites:

=========================================================================================

- Point the local variable AUTOMATIC_ANALYSIS to this folder (more information in setup.sh)

- Dimemas installation (the scripts have been evaluated with version 5.2.5)

- Python 2.7.3 (other versions have not been evaluated yet)

- The following python modules:

- numpy 1.6.4

- scipy 0.11.1 (or greater)

- lmfit 0.7.2 (or greater) -- http://cars9.uchicago.edu/software/python/lmfit/

numpy and scipy can be installed from the package manager.

To install lmfit, decompress the file included here, and as a root user type:

# python setup.py install

- Traces to be analyzed (some example traces are included in nekbone_bgq)

NOTE: To verify the versions of the modules installed, you can run

> python basicanalysis/share/install/run_this_first.py
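The contents of run_this_first.py are not reproduced here. As a rough sketch of the same idea, and assuming plain dotted version strings, installed versions could be compared against the minimums listed above. The helper names parse_version and meets_minimum, and the "installed" values, are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '0.11.1' into a tuple of ints.

    Assumes purely numeric components; suffixes like 'rc1' are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, required):
    """Return True when the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


# Minimum versions taken from the pre-requisites list above.
MINIMUMS = {"scipy": "0.11.1", "lmfit": "0.7.2"}

# Hypothetical installed versions, used only to exercise the helper.
installed = {"scipy": "0.12.0", "lmfit": "0.7.0"}

report = dict((name, meets_minimum(installed[name], MINIMUMS[name]))
              for name in MINIMUMS)
print(report)
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexicographically, where "0.9" would sort after "0.11".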

=========================================================================================

1. Extracting model factors (Load Balance, Serialization, Transfer, Parallel Efficiency):

=========================================================================================

To obtain a summary with information for performance factors:

$ model_factors.py -i indat.cfg -sim {time|cycles} -lat {latency} -bw {bandwidth}

-sc {strong|weak} -phase <name_defined_by_user>

-t <list_of_traces>.prv

Parameters can also be supplied by passing only indat.cfg (an example named indat_modelfactors.cfg is in the example folder inside this directory).

For example:

$ model_factors.py -i indat.cfg -sim time -sc weak -phase nekbone_example

-t nekbone_bgq/*.prv

This extracts the performance factors from all the traces included in nekbone_bgq. These traces were obtained using a weak scaling approach, which is why 'weak' is passed here; choose 'weak' or 'strong' according to how the traces were collected. Results are written to a model_factors_<name_defined_by_user>.csv file and a gnuplot file.

To see the resulting graph:

$ gnuplot model_factors_timeBased_nekbone_example.gnuplot

1.1 Current available graphs:

- Fundamental Factors: Serialization, Transfer, Load Balance and Parallel

Efficiency.

- Speedup: Specific to weak or strong scaling executions.

- Point-to-Point Communications: some metrics about P2P communications in the traces. Zero if there are none.

- Collective Communications: some metrics about bytes sent and calls performed at collective level (Allreduce, Bcast, etc.).

- Load Balances: Instruction, IPC, and Time Load Imbalances.

- Instruction rate vs. IPC: observed instruction rate and IPC; this also depends on the type of scaling (strong or weak).

- Cycles per microsecond: Observed cycles per usec per evaluated point, useful to identify changes among the processes.

- Elapsed time: execution time of each trace.

- Other Efficiency Factors: A summary of cycles per usec and load imbalance at instruction or IPC level. Helpful as a sanity check.
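As an illustration of how the fundamental factors relate to each other, the sketch below uses the multiplicative efficiency model commonly applied to Dimemas-based analyses: Load Balance is the average over the maximum useful computation time, Serialization is the maximum useful time over the runtime on an ideal network, and Transfer is the ideal-network runtime over the measured runtime. The function name and the input numbers are hypothetical, and the scripts may compute these values differently.

```python
def model_factors(useful_times, ideal_runtime, real_runtime):
    """Compute the fundamental factors from per-process useful times.

    useful_times  -- useful computation time of each process
    ideal_runtime -- runtime simulated on an ideal network
    real_runtime  -- measured runtime of the trace
    """
    max_useful = max(useful_times)
    avg_useful = sum(useful_times) / float(len(useful_times))

    load_balance = avg_useful / max_useful
    serialization = max_useful / ideal_runtime
    transfer = ideal_runtime / real_runtime

    return {
        "load_balance": load_balance,
        "serialization": serialization,
        "transfer": transfer,
        # Parallel Efficiency is the product of the three factors.
        "parallel_efficiency": load_balance * serialization * transfer,
    }


# Hypothetical numbers for a four-process run (all in the same time unit).
factors = model_factors([8.0, 9.0, 10.0, 9.0],
                        ideal_runtime=11.0, real_runtime=12.5)
for name in sorted(factors):
    print("%s = %.3f" % (name, factors[name]))
```

Because the intermediate terms cancel, the product equals the average useful time divided by the measured runtime, which makes it a convenient consistency check on the individual factors.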

=========================================================================================

2. Projection of performance factors based on the knowledge of the application:

=========================================================================================

To extrapolate the collected performance factors (from a small number of core counts to larger core counts), the user must indicate the appropriate fitting model for each of the factors.

From the measured values, Serialization and Transfer can be extrapolated with a model based on Amdahl's Law or with a pipeline-based model, which take the following forms:

Amdahl_fit = elem_0 / (f_elem + (1-f_elem) * P)

Pipeline_fit = (elem_0 * P) / ((1-f_elem) + f_elem*(2*P-1))

*** elem_0 and f_elem are estimated using the least squares method over the collected measurements, and P is the number of processes used.
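To make the two models concrete, here is a minimal sketch of both functions and of a least-squares estimate of elem_0 and f_elem. It assumes the Amdahl form elem_0 / (f_elem + (1 - f_elem) * P), so that both models reduce to elem_0 at P = 1. lmfit is listed as a prerequisite and presumably performs the actual fit; this sketch swaps in a plain grid search over f_elem to stay dependency-free. The function fit_least_squares and the sample data are hypothetical.

```python
def amdahl_fit(elem_0, f_elem, procs):
    """Amdahl-based model; equals elem_0 when procs == 1 (assumed form)."""
    return elem_0 / (f_elem + (1.0 - f_elem) * procs)


def pipeline_fit(elem_0, f_elem, procs):
    """Pipeline-based model; also equals elem_0 when procs == 1."""
    return (elem_0 * procs) / ((1.0 - f_elem) + f_elem * (2.0 * procs - 1.0))


def fit_least_squares(model, procs, values, steps=10000):
    """Grid search over f_elem in [0, 1], with elem_0 solved in closed form.

    Both models are linear in elem_0 (value = elem_0 * shape), so for a
    fixed f_elem the best elem_0 is sum(value * shape) / sum(shape ** 2).
    """
    best = None
    for i in range(steps + 1):
        f_elem = i / float(steps)
        shape = [model(1.0, f_elem, p) for p in procs]
        elem_0 = (sum(v * s for v, s in zip(values, shape))
                  / sum(s * s for s in shape))
        error = sum((v - elem_0 * s) ** 2 for v, s in zip(values, shape))
        if best is None or error < best[0]:
            best = (error, elem_0, f_elem)
    return best[1], best[2]


# Synthetic measurements generated from the Amdahl model itself.
procs = [2, 4, 8, 16, 32]
values = [amdahl_fit(0.98, 0.999, p) for p in procs]

elem_0, f_elem = fit_least_squares(amdahl_fit, procs, values)
projected = amdahl_fit(elem_0, f_elem, 1024)
print(elem_0, f_elem, projected)
```

The grid resolution bounds the precision of f_elem, so a real fit would refine the result with a proper optimizer.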

This version adds the option to fit Serialization and Transfer using a logarithmic function of the form:

Log_fit = f_elem * log(procs) + elem_0 (logarithm base 10)
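Since this model is linear in log10(procs), its two parameters can be obtained with ordinary linear least squares. The sketch below shows that calculation on hypothetical data; the function names are illustrative and do not correspond to functions in the scripts.

```python
import math


def fit_log(procs, values):
    """Least squares for value = f_elem * log10(procs) + elem_0."""
    xs = [math.log10(p) for p in procs]
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    f_elem = sxy / sxx                  # slope
    elem_0 = mean_y - f_elem * mean_x   # intercept
    return elem_0, f_elem


def log_fit(elem_0, f_elem, procs):
    """Evaluate the logarithmic model at a given number of processes."""
    return f_elem * math.log10(procs) + elem_0


# Hypothetical measurements that lose 0.05 per decade of processes.
procs = [10, 100, 1000]
values = [0.95, 0.90, 0.85]

elem_0, f_elem = fit_log(procs, values)
projected = log_fit(elem_0, f_elem, 10000)
print(elem_0, f_elem, projected)
```

Note that the model is unbounded: for a negative f_elem the projected value eventually drops below zero, so projections far beyond the measured range should be read with care.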

Load Balance supports Amdahl-based fitting and the logarithmic function described above (log). It also supports two constants: the minimum (min), that is, the worst value among the collected measurements, and the average (avg) value.
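The two constant options amount to projecting the same Load Balance value for every core count. A minimal sketch, with a hypothetical helper name and made-up measurements:

```python
def constant_projection(values, mode):
    """Return the constant used to project Load Balance: 'min' or 'avg'."""
    if mode == "min":
        return min(values)
    if mode == "avg":
        return sum(values) / float(len(values))
    raise ValueError("mode must be 'min' or 'avg'")


# Hypothetical Load Balance measurements at four core counts.
measured = [0.96, 0.94, 0.91, 0.93]

worst = constant_projection(measured, "min")
average = constant_projection(measured, "avg")
print(worst, average)
```

Choosing min gives the more pessimistic projection, since it assumes the worst observed imbalance persists at every scale.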

In addition, the efficiency loss may not be directly driven by the number of processes. Therefore, several scenarios for the evolution of efficiency have been implemented. By default, applications are expected to lose efficiency as the number of processes increases, which corresponds to a linear relation between efficiency and processes.

In some parallel applications, processes do not interact with all their partners. They may exchange data with, say, a number of processes close to the cube root of the total number of processes, or a number that follows a base-2 logarithm of the total.

For this reason, there are three options (linear, cubic, and log) that change how the number of processes is interpreted for each of the fundamental factors.

The framework is therefore called as follows:

$ projection_efficiency.py -P_ser {linear|cubic|log}

-ser_fit <Serialization>{amdahl|pipeline|log}

-P_trf {linear|cubic|log} -trf_fit <Transfer>{amdahl|pipeline|log}

-P_lb {linear|cubic|log} -lb_fit <LoadBalance> {min|avg|amdahl|log}

-f <name_of_csv_file>.csv
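As a sketch of what the linear, cubic, and log options could mean in practice, the snippet below maps the real number of processes to the value that enters the fitting model. This assumes the option simply replaces P by its cube root or its base-2 logarithm; the scripts may apply the transformation differently, and effective_procs is a hypothetical name.

```python
import math


def effective_procs(procs, mode):
    """Map the real process count to the value used by the fitting model."""
    if mode == "linear":
        return float(procs)
    if mode == "cubic":
        return procs ** (1.0 / 3.0)   # cube root of the process count
    if mode == "log":
        return math.log(procs, 2)     # base-2 logarithm
    raise ValueError("mode must be 'linear', 'cubic' or 'log'")


table = {}
for p in (8, 64, 512, 4096):
    table[p] = [effective_procs(p, mode) for mode in ("linear", "cubic", "log")]
    print(p, table[p])
```

Under this reading, going from 8 to 4096 processes multiplies the effective P by 512 in the linear case, but only by about 8 in the cubic case and about 4 in the log case, which is why the choice strongly affects the projection.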

Continuing the previous example:

$ projection_efficiency.py -f model_factors_timeBased_nekbone_example.csv

or

$ projection_efficiency.py -P_ser linear -ser_fit pipeline -lb_fit min

-f model_factors_timeBased_nekbone_example.csv

This generates the extrapolation of the performance factors, comparing the measurements with the projected values. In the second command only Transfer is fitted with Amdahl's model (Serialization is fitted with the pipeline model, and Load Balance uses the minimum measured value as a constant). Everything is summarized in three gnuplot files and one .csv file; the latter is generated to make it easier to import the data into a spreadsheet.

For this example the number of processes has been treated as linear. The cubic option was implemented for applications where the total data is distributed among the processes in a cubic shape (e.g. HACC from the CORAL benchmarks has this characteristic), so that each process mainly interacts with only a reduced group of the total number of processes.

By default, projection_efficiency.py uses linear for the number of processes and Amdahl's model to fit all performance factors. To change these values, call the framework with pipeline or min instead of amdahl (if the performance factor supports that option for fitting). For example:

$ projection_efficiency.py -i indat.cfg -P_ser linear -ser_fit pipeline

-trf_fit pipeline -lb_fit min

-f model_factors_timeBased_nekbone_example.csv

Parameters can also be supplied by passing only indat_projection.cfg (there is a copy in the example folder inside this directory).

New fitting modules and additional enhancements are still under development.

=========================================================================================

For any further questions, please contact: crosas@bsc.es

=========================================================================================

## Download files

Filename | Size | File type | Python version | Upload date |
---|---|---|---|---|
basicanalysis-0.2b1.tar.gz | 9.6 MB | Source | None | Apr 9, 2015 |