emewscreator simplifies the creation of EMEWS workflows through the use of templates
EMEWS Creator
EMEWS Creator is a Python application for creating workflow projects for EMEWS (Extreme-scale Model Exploration with Swift). The EMEWS framework enables the direct integration of multi-language model exploration (ME) algorithms while scaling dynamic computational experiments to very large numbers (millions) of models on all major HPC platforms. EMEWS has been designed for any "black box" application code, such as agent-based and microsimulation models or training of machine learning models, that requires multiple runs as part of heuristic model explorations. One of the main goals of EMEWS is to democratize the use of large-scale computing resources by making them accessible to more researchers in many more science domains. EMEWS is built on the Swift/T parallel scripting language.
See the EMEWS Site for more information.
Installation
EMEWS Creator can be downloaded and installed from PyPI using pip.
pip install emewscreator
Using EMEWS Creator
The following provides an overview of how to use EMEWS Creator to create workflow projects. For a more comprehensive explanation, see the EMEWS Tutorial.
EMEWS Creator is run from the command line.
$ emewscreator -h
Usage: emewscreator [OPTIONS] COMMAND [ARGS]...
Options:
-V, --version Show the version and exit.
-o, --output-dir PATH Directory into which the project template will be
generated. Defaults to the current directory
-m, --model-name TEXT Name of the model application. Defaults to "model".
-w, --overwrite Overwrite existing files
-h, --help Show this message and exit.
Commands:
eqpy create an eqpy workflow
eqr create an eqr workflow
eqsql create an eqsql workflow
init_db initialize an eqsql database
sweep create a sweep workflow
The sweep, eqpy, eqr, and eqsql commands create a particular type of workflow: a sweep, an eqpy-based workflow, an eqr-based workflow, or an eqsql-based workflow. Each command has its own arguments specific to that workflow type. Those arguments are covered in the Workflow Templates section below.
The options supplied to emewscreator are common to all the workflow types.
--output-dir - the root directory of the directory structure and files created by EMEWS Creator.
--model-name - the name of the model that will be run during the workflow. This is used in the model execution bash script. Spaces will be replaced by underscores.
--overwrite - if present, EMEWS Creator will overwrite any existing files in the --output-dir directory when creating the workflow. By default, existing files are not overwritten.
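To make the --model-name and --overwrite semantics concrete, here is a minimal Python sketch of the behavior described above. It is not EMEWS Creator's implementation; the helper names normalize_name and write_generated_file are hypothetical.

```python
from pathlib import Path


def normalize_name(name: str) -> str:
    """Replace spaces with underscores, as described for --model-name."""
    return name.replace(" ", "_")


def write_generated_file(path: Path, text: str, overwrite: bool = False) -> bool:
    """Write a generated file, mirroring the --overwrite semantics described above.

    Returns True if the file was written and False if an existing file was
    left untouched because overwrite is False (the default).
    """
    if path.exists() and not overwrite:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return True
```

For example, calling write_generated_file a second time on the same path keeps the first version unless overwrite=True is passed, which matches the default of never replacing existing files.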
These values can also be supplied in a YAML format configuration file. Sample configuration files can be found in the example_cfgs directory of the EMEWS Creator GitHub repository. See the Workflow Templates section for more information.
The final command, init_db, creates and initializes the PostgreSQL database required for running an eqsql workflow. Its arguments are also covered in the Workflow Templates section below. When executing the init_db command, none of the emewscreator options described above are required.
EMEWS Project Structure
Each of the workflow types will create the default EMEWS project structure in the directory specified by the -o, --output-dir option.
EMEWS Creator is designed such that multiple workflows can be run in the same directory. For example, you can begin with a sweep workflow and then create an eqr or eqpy workflow in the same output directory. When multiple workflows are created in the same output directory, it is crucial that the workflow_name configuration template argument is unique to each individual workflow. See the Workflow Templates section for more information on the workflow_name argument.
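Before adding a second workflow to a project, you can check that its name is still free. The sketch below assumes each workflow's Swift script is named swift/{workflow_name}.swift, as listed in the Files section; the function names are hypothetical and not part of EMEWS Creator.

```python
from pathlib import Path
from typing import Set


def existing_workflow_names(output_dir: Path) -> Set[str]:
    """Collect workflow names from the swift/{workflow_name}.swift files."""
    swift_dir = output_dir / "swift"
    if not swift_dir.is_dir():
        return set()
    return {p.stem for p in swift_dir.glob("*.swift")}


def check_workflow_name(output_dir: Path, workflow_name: str) -> str:
    """Return the normalized name, or raise if another workflow already uses it."""
    name = workflow_name.replace(" ", "_")  # spaces become underscores
    if name in existing_workflow_names(output_dir):
        raise ValueError(f"workflow_name {name!r} is already used in {output_dir}")
    return name
```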
Directories
Given an --output-dir argument of my_emews_project, the default directory structure produced by all the workflow types is:
my_emews_project/
  data/
  etc/
  ext/
  python/
    test/
  R/
    test/
  scripts/
  swift/
    cfgs/
  README.md
The directories are intended to contain the following:
data - data required by the model and algorithm (inputs, etc.).
etc - additional code used by EMEWS.
ext - Swift/T extensions, including the default EMEWS utility code extension as well as the EQ/R and EQ/Py extensions.
python - Python code (e.g., model exploration algorithms written in Python).
python/test - tests of the Python code.
R - R code (e.g., model exploration algorithms written in R).
R/test - tests of the R code.
scripts - any necessary scripts (e.g., scripts to launch a model), excluding scripts used to run the workflow.
swift - Swift/T code and scripts used to submit and run the workflow.
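A generated project can be sanity-checked with a short script. The sketch below, which is not part of EMEWS Creator, reports which of the default directories are missing from an output directory; DEFAULT_DIRS is transcribed from the directory tree above.

```python
from pathlib import Path
from typing import List

# Default directories, relative to --output-dir, taken from the tree above.
DEFAULT_DIRS = [
    "data", "etc", "ext",
    "python", "python/test",
    "R", "R/test",
    "scripts",
    "swift", "swift/cfgs",
]


def missing_dirs(output_dir: Path) -> List[str]:
    """Return the default directories that do not exist under output_dir."""
    return [d for d in DEFAULT_DIRS if not (output_dir / d).is_dir()]
```

An empty list means the layout matches the defaults; anything else points at a directory that was deleted or never generated.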
Files
Each of the workflow types will generate the following files. The file names are derived from parameters specified in the workflow template configuration arguments. The names of those parameters are included in curly brackets in the file names below.
swift/run_{workflow_name}.sh - a bash script used to launch the workflow.
swift/{workflow_name}.swift - the Swift script that implements the workflow.
scripts/run_{model_name}_{workflow_name}.sh - a bash script used to run the model application.
swift/cfgs/{workflow_name}.cfg - a configuration file for running the workflow.
These files may require some user customization before they can be used. The relevant sections are marked with TODO.
Once any edits have been completed, the workflow can be run from the swift directory with:
$ ./run_{workflow_name}.sh <experiment_name> cfgs/{workflow_name}.cfg
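The file names and the run command all follow from workflow_name and model_name. This hypothetical sketch derives them, replacing spaces with underscores and defaulting model_name to "model" as the help text states. It assumes the run command is issued from the swift directory, where run_{workflow_name}.sh and cfgs/ live.

```python
from typing import Dict, List


def generated_files(workflow_name: str, model_name: str = "model") -> Dict[str, str]:
    """Paths (relative to --output-dir) of the files every workflow type generates."""
    wf = workflow_name.replace(" ", "_")
    model = model_name.replace(" ", "_")
    return {
        "launch script": f"swift/run_{wf}.sh",
        "swift script": f"swift/{wf}.swift",
        "model script": f"scripts/run_{model}_{wf}.sh",
        "config": f"swift/cfgs/{wf}.cfg",
    }


def run_command(workflow_name: str, experiment_name: str) -> List[str]:
    """Argument list for launching the workflow from inside the swift directory."""
    wf = workflow_name.replace(" ", "_")
    return [f"./run_{wf}.sh", experiment_name, f"cfgs/{wf}.cfg"]
```

The list form can be handed to subprocess.run with cwd set to the project's swift directory, which avoids shell quoting issues with experiment names.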
Workflow Templates
Each workflow template has its own set of command line arguments, but all have the following in common:
-n, --workflow-name - the name of the workflow. This is used as the file name for the workflow configuration, submission, and Swift script files. Spaces will be replaced by underscores. The workflow_name should be unique among all the workflows in the output directory.
-c, --config - path to the workflow template configuration file; optional if all the required arguments are specified on the command line.
The workflow template configuration file can be used to specify any of a workflow template's configuration parameters when those parameters are not specified on the command line. This file is in YAML format. Sample configuration files can be found in the example_cfgs directory of the EMEWS Creator GitHub repository. Arguments supplied on the command line override those supplied in a configuration file. If any required arguments are missing from the command line, then the configuration file must supply them.
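The precedence rule above can be sketched as a dictionary merge. This is an illustration, not EMEWS Creator's code: it assumes the YAML file has already been parsed into a dict, that its keys are the option names with dashes replaced by underscores (as the workflow_name example suggests), and that an option not given on the command line shows up as None.

```python
from typing import Any, Dict, List, Optional


def resolve_arguments(
    cli_args: Dict[str, Any],
    config: Optional[Dict[str, Any]],
    required: List[str],
) -> Dict[str, Any]:
    """Merge configuration file values with command line values.

    Command line values that are not None override the configuration file;
    a required argument missing from both sources is an error.
    """
    merged = dict(config or {})
    merged.update({key: value for key, value in cli_args.items() if value is not None})
    missing = [name for name in required if merged.get(name) is None]
    if missing:
        raise ValueError("missing required arguments: " + ", ".join(missing))
    return merged
```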
Sweep
The sweep command creates a sweep workflow in which EMEWS reads an input file and runs the application once for each line of that file, using the line as the input to that run.
Usage:
$ emewscreator sweep -h
Usage: emewscreator sweep [OPTIONS]
Options:
-c, --config PATH Path to the template configuration file
[required if any command line arguments are
missing]
-n, --workflow-name TEXT Name of the workflow
-h, --help Show this message and exit.
A sample sweep configuration file can be found in the example_cfgs directory of the EMEWS Creator GitHub repository.
For a more thorough explanation of the sweep workflow, see the EMEWS Tutorial.
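Because the sweep runs one application instance per input line, preparing a sweep mostly means writing that file. The sketch below writes one line per combination of a parameter grid. Encoding each line as a JSON object is an assumption made for illustration; use whatever line format your model script expects. The write_sweep_input name is hypothetical.

```python
import itertools
import json
from pathlib import Path
from typing import Dict, List


def write_sweep_input(path: Path, grid: Dict[str, List]) -> int:
    """Write one JSON object per line for every combination in grid.

    Returns the number of lines written, i.e. the number of runs the sweep
    will perform.
    """
    names = sorted(grid)
    lines = [
        json.dumps(dict(zip(names, values)))
        for values in itertools.product(*(grid[n] for n in names))
    ]
    path.write_text("\n".join(lines) + "\n")
    return len(lines)
```

A grid of two beta values and three seeds, for example, produces a six-line file and therefore six model runs.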
EQPy
The EQPy workflow template creates a workflow that uses EMEWS Queues for Python (EQPy) to run an application using input parameters provided by a Python model exploration (ME) algorithm. The workflow starts the Python ME, which then iteratively provides JSON-formatted input parameters for model execution.
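The control flow of such an iterative ME can be sketched in plain Python. In this sketch a generator plays the ME: it yields batches of JSON parameter strings and receives the model results back through send(). The drive function is a hypothetical stand-in for the Swift/T workflow that runs the models. Neither function is part of the EQPy API; the real exchange goes through the EQPy extension.

```python
import json
import random


def random_search_me(n_iterations: int, batch_size: int, seed: int = 0):
    """Toy ME: yields batches of JSON parameter strings, receives results via send().

    Returns the best (parameters, result) pair when the search is finished.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_iterations):
        batch = [json.dumps({"x": rng.uniform(-5, 5)}) for _ in range(batch_size)]
        results = yield batch  # the workflow runs the models and sends results back
        for params, result in zip(batch, results):
            if best is None or result < best[1]:
                best = (json.loads(params), result)
    return best


def drive(me, model):
    """Stand-in for the workflow: evaluate each batch and feed the results back."""
    try:
        batch = next(me)
        while True:
            batch = me.send([model(json.loads(p)) for p in batch])
    except StopIteration as stop:
        return stop.value
```

For example, drive(random_search_me(5, 4), lambda p: (p["x"] - 1.0) ** 2) evaluates 20 parameter sets in batches of 4 and returns the one closest to x = 1.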
Usage:
$ emewscreator eqpy -h
Usage: emewscreator eqpy [OPTIONS]
Options:
-c, --config PATH Path to the template configuration file
[required if any command line arguments are
missing]
-n, --workflow-name TEXT Name of the workflow
--module-name TEXT Python model exploration algorithm module
name
--me-cfg-file PATH Configuration file for the model exploration
algorithm
--trials INTEGER Number of trials / replicates to perform for
each model run. Defaults to 1
--model-output-file-name TEXT Model output base file name, file name only
(e.g., "output.csv")
--eqpy-dir PATH Directory where the eqpy extension is
located. If the extension does not exist at
this location it will be installed there.
Defaults to {output_dir}/ext/EQ-Py
-h, --help Show this message and exit.
In addition to the common configuration arguments described above, the eqpy template also has the following:
--module-name - the Python module implementing the ME algorithm.
--me-cfg-file - the path to a configuration file for the Python ME algorithm. This path is passed to the Python ME when it is initialized and is relative to the directory specified by --output-dir.
--trials - the number of trials or replicates to perform for each model run. Defaults to 1.
--model-output-file-name - each model run is passed a file path for writing its output. This is the name of that file.
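Because --me-cfg-file is relative to --output-dir, a value such as data/algo_cfg.yaml (a hypothetical file name) refers to a file inside the project directory. The small helper below, which is illustrative only, shows that resolution with pathlib.

```python
from pathlib import Path


def resolve_me_cfg(output_dir: str, me_cfg_file: str) -> Path:
    """Return the location of the ME configuration file.

    A relative me_cfg_file is interpreted relative to output_dir; pathlib's
    / operator returns an absolute me_cfg_file unchanged.
    """
    return Path(output_dir) / me_cfg_file
```

Checking resolve_me_cfg(...).is_file() before launching the workflow catches a misplaced ME configuration file early.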
In addition to the default set of files described in the EMEWS Project Structure section, the eqpy workflow template also installs the EQPy EMEWS Swift/T extension. By default, the extension is installed in ext/EQ-Py. An alternative location can be specified with the --eqpy-dir configuration parameter.
--eqpy-dir - specifies the location of the eqpy extension (defaults to ext/EQ-Py).
You can set this to use an existing EQ-Py extension; if the specified location doesn't exist, the extension will be installed there.
The extension consists of the following files.
eqpy.py
EQPy.swift
These should not be edited by the user.
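The reuse-or-install decision for --eqpy-dir can be sketched as follows. Treating the extension as present only when both eqpy.py and EQPy.swift exist is an assumption made for this illustration; the eqpy_action name is hypothetical and not EMEWS Creator's API.

```python
from pathlib import Path
from typing import Optional, Tuple

EQPY_FILES = ("eqpy.py", "EQPy.swift")  # the extension files listed above


def eqpy_action(output_dir: str, eqpy_dir: Optional[str] = None) -> Tuple[Path, str]:
    """Return the extension directory and whether it would be reused or installed."""
    ext = Path(eqpy_dir) if eqpy_dir else Path(output_dir) / "ext" / "EQ-Py"
    if all((ext / name).is_file() for name in EQPY_FILES):
        return ext, "reuse"
    return ext, "install"
```

Pointing several projects at one shared --eqpy-dir therefore installs the extension once and reuses it afterwards.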
A sample eqpy configuration file can be found in the example_cfgs directory of the EMEWS Creator GitHub repository.
For a more thorough explanation of Python-based ME workflows, see the EMEWS Tutorial.
EQR
The EQR template creates a workflow that uses EMEWS Queues for R (EQR) to run an application using input parameters provided by an R model exploration (ME) algorithm. The workflow starts the R ME, which then iteratively provides JSON-formatted input parameters for model execution.
Note: The EQR extension requires an additional compilation step. Once the template has been run, see {eqr_dir}/src/README.md for compilation instructions.
Usage:
$ emewscreator eqr -h
Usage: emewscreator eqr [OPTIONS]
Options:
-c, --config PATH Path to the template configuration file
[required if any command line arguments are
missing]
-n, --workflow-name TEXT Name of the workflow
--script-file TEXT Path to the R model exploration algorithm
--me-cfg-file PATH Configuration file for the model exploration
algorithm
--trials INTEGER Number of trials / replicates to perform for
each model run
--model-output-file-name TEXT Model output base file name, file name only
(e.g., "output.csv")
--eqr-dir PATH Directory where the eqr extension is located.
If the extension does not exist at this
location it will be installed there. Defaults
to {output_dir}/ext/EQ-R
-h, --help Show this message and exit.
In addition to the common configuration parameters described above, the eqr template also has the following:
--script-file - the path to the R script implementing the ME algorithm.
--me-cfg-file - the path to a configuration file for the R ME algorithm. This path is passed to the R ME when it is initialized and is relative to the directory specified by --output-dir.
--trials - the number of trials or replicates to perform for each model run.
--model-output-file-name - each model run is passed a file path for writing its output. This is the name of that file.
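The --trials option multiplies the number of runs: every parameter set is run trials times, and each run receives its own output file path ending in the --model-output-file-name value. The sketch below enumerates those replicate runs. The directory layout (instance_N/trial_M under an experiment directory) and the replicate_runs name are hypothetical; the generated workflow decides the actual paths.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def replicate_runs(
    param_sets: List[str],
    trials: int,
    model_output_file_name: str,
    root: str = "experiments/exp1",
) -> List[Dict]:
    """Expand each parameter set into `trials` runs, each with its own output path."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    runs = []
    for i, params in enumerate(param_sets):
        for t in range(trials):
            out = PurePosixPath(root) / f"instance_{i}" / f"trial_{t}" / model_output_file_name
            runs.append({"params": params, "trial": t, "output": str(out)})
    return runs
```

Keeping a separate output path per replicate prevents concurrent trials of the same parameter set from overwriting each other's output.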
In addition to the default set of files described in the EMEWS Project Structure section, the eqr workflow template also installs the source for the EQ/R EMEWS Swift/T extension. By default, the extension is installed in ext/EQ-R. An alternative location can be specified with the --eqr-dir configuration argument.
--eqr-dir - specifies the location of the eqr extension (defaults to ext/EQ-R).
You can set this to use an existing EQ-R extension; if the specified location doesn't exist, the extension will be installed there.
The extension needs to be compiled before it can be used. See {eqr_dir}/src/README.md for compilation instructions.
A sample EQR configuration file can be found in the example_cfgs directory of the EMEWS Creator GitHub repository.
For a more thorough explanation of R-based ME workflows, see the EMEWS Tutorial.
INIT DB
The init_db command creates the EQSQL database in a user-specified directory. It assumes that the PostgreSQL binaries are available in the user's PATH and that the eqsql package has been installed. The database name defaults to EQ_SQL and the database user to eqsql_user. Database log messages are written to a db.log file in the database directory.
Usage:
$ emewscreator init_db -h
Usage: emewscreator init_db [OPTIONS]
Options:
-d, --db-path PATH Database directory path. The database will be created in
this directory. [required]
-p, --port INTEGER The database port, if any.
-h, --help Show this message and exit.
init_db takes the following arguments:
--db-path - the directory in which to create the database. This directory must not already exist; it will be created by the command.
--port - an optional port number on which the database listens for connections. This is not required for a local database.
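Since init_db expects a fresh --db-path, PostgreSQL on the PATH, and the eqsql package, a quick preflight check can save a failed run. In the sketch below, the binary names initdb and pg_ctl (standard PostgreSQL server tools) and the importable module name eqsql are assumptions; the init_db_preflight helper is hypothetical and does not call emewscreator.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import Iterable, List


def init_db_preflight(db_path: str, binaries: Iterable[str] = ("initdb", "pg_ctl")) -> List[str]:
    """Return a list of problems that would likely make init_db fail."""
    problems = []
    if Path(db_path).exists():
        problems.append(f"{db_path} already exists; --db-path must be a new directory")
    for exe in binaries:
        if shutil.which(exe) is None:
            problems.append(f"PostgreSQL binary {exe!r} not found on PATH")
    if importlib.util.find_spec("eqsql") is None:
        problems.append("the eqsql Python package is not installed")
    return problems
```

An empty list means none of these prerequisites is obviously missing; it does not guarantee that database creation will succeed.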
EQSQL
The EQSQL workflow template creates a workflow that submits tasks (such as application runs) to a queue implemented in a database. Worker pools pop tasks off this queue for evaluation, and push the results back to a database input queue. The tasks can be provided by a Python or R language model exploration (ME) algorithm.
Usage:
$ emewscreator eqsql -h
Usage: emewscreator eqsql [OPTIONS]
Options:
-c, --config PATH Path to the template configuration file.
[required if any command line arguments are
missing]
--pool-id TEXT The name of the task worker pool.
--task-type INTEGER The task type id for the tasks consumed by
the worker pool.
-n, --workflow-name TEXT Name of the workflow.
--trials INTEGER Number of trials / replicates to perform for
each model run. Defaults to 1.
--model-output-file-name TEXT Model output base file name, file name only
(e.g., "output.csv").
--me-language [python|R|None] Model exploration algorithm programming
language: Python, R, or None.
--me-file-name TEXT The name of the model exploration algorithm
template file to generate. Omit the extension
(e.g., "algo", not "algo.py").
--me-cfg-file-name TEXT The name of the model exploration algorithm
configuration file.
--esql-db-path PATH The path to the eqsql database.
-h, --help Show this message and exit.
In addition to the common configuration arguments described above, the eqsql template also has the following:
--pool-id - a unique identifier for the Swift/T worker pool created by the template.
--task-type - an integer identifying the type of task the worker pool will consume.
--trials - the number of trials or replicates to perform for each task evaluation. Defaults to 1.
--model-output-file-name - each task evaluation is passed a file path for writing its output. This is the name of that file.
--me-language - the ME programming language (R, Python, or None). The template will create an example ME written in this language. If the value is None, no ME example is created.
--me-file-name - the name of the example ME template file to generate, without the extension (e.g., "algo", not "algo.py").
--me-cfg-file-name - the name of the YAML format configuration file that is passed to the example ME to configure it.
--esql-db-path - the path to the eqsql database. This is used in the example ME to start the database.
A sample eqsql configuration file can be found in the example_cfgs directory of the EMEWS Creator GitHub repository.
The Swift file created by the eqsql template is a worker pool that polls the database for tasks of the specified type to evaluate. The results of those task evaluations are pushed back to the database together with the pool id. The example ME contains example code for submitting tasks to the database and working with the completed tasks.
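The queue semantics can be modeled in memory to see how task types and pool ids fit together. ToyTaskQueue and run_worker_pool are hypothetical stand-ins for the database queues and the Swift/T worker pool; they are not the eqsql API and involve no database.

```python
from collections import defaultdict, deque
from itertools import count


class ToyTaskQueue:
    """In-memory stand-in for the EQSQL output (task) and input (result) queues."""

    def __init__(self):
        self._ids = count(1)
        self._tasks = defaultdict(deque)  # task_type -> pending (task_id, payload)
        self.results = {}                 # task_id -> (pool_id, result)

    def submit(self, task_type: int, payload: str) -> int:
        """ME side: queue a task and return its id."""
        task_id = next(self._ids)
        self._tasks[task_type].append((task_id, payload))
        return task_id

    def pop(self, task_type: int):
        """Worker side: take the oldest pending task of this type, or None."""
        pending = self._tasks[task_type]
        return pending.popleft() if pending else None

    def report(self, task_id: int, pool_id: str, result: str) -> None:
        """Worker side: push a result back, tagged with the pool id."""
        self.results[task_id] = (pool_id, result)


def run_worker_pool(q: ToyTaskQueue, pool_id: str, task_type: int, evaluate) -> int:
    """Evaluate every pending task of one type; return how many were processed."""
    done = 0
    while (task := q.pop(task_type)) is not None:
        task_id, payload = task
        q.report(task_id, pool_id, evaluate(payload))
        done += 1
    return done
```

A pool started with --task-type 0 only consumes type-0 tasks, so tasks of other types stay queued for a differently configured pool.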
For a more thorough explanation of EQSQL-based ME workflows, see the EMEWS Tutorial.
HPC Parameters
The workflow templates' configuration file (specified with the --config argument) can also contain optional entries for running the workflow on an HPC system where a job is submitted via an HPC scheduler (e.g., the Slurm scheduler). See your HPC resource's documentation for details on how to set these.
walltime - the estimated duration of the workflow job. The value must be surrounded by single quotes.
queue - the queue to run the workflow job on.
project - the project to run the workflow job with.
nodes - the number of nodes to allocate to the workflow job.
ppn - the number of processes per node to allocate to the workflow job.
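As an illustration, the hypothetical helper below renders these entries as YAML text, applying the single-quote rule to walltime. The values in the example are placeholders, and the walltime format (HH:MM:SS here) depends on your scheduler. The quoting presumably keeps a YAML 1.1 parser from reading a value such as 1:30:00 as a base-60 number instead of a string.

```python
def hpc_yaml(walltime: str, queue: str, project: str, nodes: int, ppn: int) -> str:
    """Render the optional HPC entries as YAML lines, single-quoting walltime."""
    return "\n".join([
        f"walltime: '{walltime}'",  # must be surrounded by single quotes
        f"queue: {queue}",
        f"project: {project}",
        f"nodes: {nodes}",
        f"ppn: {ppn}",
    ]) + "\n"
```

For example, hpc_yaml("01:30:00", "debug", "my_project", 2, 36) produces the line walltime: '01:30:00' followed by the queue, project, nodes, and ppn entries, ready to append to a workflow configuration file.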
File details
Details for the file emewscreator-1.0.1.tar.gz.
File metadata
- Download URL: emewscreator-1.0.1.tar.gz
- Upload date:
- Size: 35.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | b360120a6d5c8bc38ec6eb23eefa1ea65e314c96d34b4e403d66a6725084f995
MD5 | 6eb32e357afb14c76fb305e998ec47cf
BLAKE2b-256 | ef9902aa188ff9bf79dc17aad3b23ab18be067af08f23a0ba68d83cc5154d405
File details
Details for the file emewscreator-1.0.1-py3-none-any.whl.
File metadata
- Download URL: emewscreator-1.0.1-py3-none-any.whl
- Upload date:
- Size: 49.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 70cdd01da9d65750240830fcdba3eb9a7072d7d8ab4fef3963eeeed330dd0aa2
MD5 | f36006e8f46750d1ff681f13702ab26e
BLAKE2b-256 | 1963e6d4ded53efea8028d29a224176ce6d6d6fa1aef739cd13e1f71f298d18e