Autosubmit: a versatile tool for managing Global Climate Coupled Models in Supercomputing Environments
Autosubmit is a tool to create, manage and monitor experiments that run remotely, via ssh,
on configured computing clusters, HPCs and supercomputers.
HOW TO DEPLOY/SETUP AUTOSUBMIT FRAMEWORK
========================================
- Autosubmit has been tested
with the following operating systems:
* Linux Debian
and on the following HPCs/clusters:
* Ithaca (IC3 machine)
* MareNostrum (BSC machine)
* MareNostrum3 (BSC machine)
* HECToR (EPCC machine)
* Lindgren (PDC machine)
* C2A (ECMWF machine)
* ARCHER (EPCC machine)
- Prerequisites: the following packages must be available on the local machine: bash, python2, sqlite3,
git-scm > 1.8.2 and subversion. The following Python packages must be available to the Python runtime:
argparse, dateutil, pyparsing, numpy, pydotplus and matplotlib. The local machine must also be able to access the HPCs/clusters via password-less ssh.
- Install Autosubmit
> pip install autosubmit
or download the source distribution, unpack it and run "python setup.py install"
- Create a repository for experiments, for example "/cfu/autosubmit", then
set the repository path (LOCAL_ROOT_DIR) in autosubmit/config/dir_config.py
- Create a blank database, for example "autosubmit.db", in the repository created above:
> cp autosubmit/database/data/autosubmit.sql /cfu/autosubmit/
> cd /cfu/autosubmit
> sqlite3 autosubmit.db
sqlite> .read autosubmit.sql
> chmod 775 autosubmit.db
then set the database directory, file and name (DB_DIR, DB_FILE, DB_NAME) in autosubmit/config/dir_config.py
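The database steps above can also be scripted with Python's standard sqlite3 module, which reads the schema and runs it the same way `.read` does. A minimal sketch, assuming autosubmit.sql is plain SQL; the constant values are hypothetical examples of the dir_config.py settings (DB_NAME is omitted because its exact meaning is not described here), and create_blank_database is an illustrative helper, not part of Autosubmit.

```python
import os
import sqlite3

# Hypothetical values mirroring settings named in dir_config.py;
# the real file's layout may differ.
LOCAL_ROOT_DIR = "/cfu/autosubmit"
DB_DIR = LOCAL_ROOT_DIR
DB_FILE = "autosubmit.db"


def create_blank_database(schema_path, db_dir=DB_DIR, db_file=DB_FILE):
    """Create db_dir/db_file from an SQL schema, like `sqlite3 ... .read`."""
    db_path = os.path.join(db_dir, db_file)
    with open(schema_path) as f:
        schema = f.read()
    conn = sqlite3.connect(db_path)  # creates the file if it does not exist
    try:
        conn.executescript(schema)  # runs every statement in the schema
        conn.commit()
    finally:
        conn.close()
    os.chmod(db_path, 0o775)  # same permissions as `chmod 775 autosubmit.db`
    return db_path
```

For example, `create_blank_database("/cfu/autosubmit/autosubmit.sql")` would produce /cfu/autosubmit/autosubmit.db with the hypothetical defaults above.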
HOW TO USE AUTOSUBMIT
=====================
To run Autosubmit experiments at CFU, a production environment is set up on the local virtual machine "enterprise".
> cd bin
> python expid.py -h
> python expid.py --new --HPC ithaca --description "experiment is about..."
For example, "cxxx" is the 4-character expid generated automatically by the system.
The first character, "c", represents the platform, such as "i" for ithaca, "b" for bsc,
"h" for hector, "l" for lindgren, "e" for ecmwf and "m" for marenostrum3. The remaining
three characters, "xxx", are a unique alphanumeric identifier for the experiment.
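The expid convention above can be expressed as a small parser. A sketch only: the prefix table contains just the platforms listed in the text (the list is not exhaustive), "c" in "cxxx" is a placeholder rather than a real platform prefix, and parse_expid is a hypothetical helper, not part of Autosubmit.

```python
# Platform prefixes as listed above; "c" in "cxxx" is only a placeholder.
PLATFORM_PREFIXES = {
    "i": "ithaca",
    "b": "bsc",
    "h": "hector",
    "l": "lindgren",
    "e": "ecmwf",
    "m": "marenostrum3",
}


def parse_expid(expid):
    """Split a 4-character expid into (platform, identity)."""
    if len(expid) != 4 or not expid.isalnum():
        raise ValueError("expid must be 4 alphanumeric characters: %r" % expid)
    prefix, identity = expid[0], expid[1:]
    if prefix not in PLATFORM_PREFIXES:
        raise ValueError("unknown platform prefix: %r" % prefix)
    return PLATFORM_PREFIXES[prefix], identity
```

For instance, `parse_expid("i0a1")` returns `("ithaca", "0a1")`.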
> vi /cfu/autosubmit/cxxx/conf/expdef_cxxx.conf
> vi /cfu/autosubmit/cxxx/conf/autosubmit_cxxx.conf
> python create_exp.py cxxx
> ssh enterprise
> cd bin
> nohup python autosubmit.py cxxx >& cxxx_01.log &
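The launch step relies on `nohup` plus the `>&` redirection, which sends both stdout and stderr to one log file, and this guide numbers successive logs cxxx_01.log, cxxx_02.log and so on. A minimal Python sketch of the same idea; next_log_name and launch_detached are hypothetical helpers. `start_new_session=True` only approximates nohup (the child runs in its own session, so a terminal hangup does not reach it), and because this release targets python2 you would normally pass `interpreter="python"` instead of the default `sys.executable`.

```python
import glob
import os
import subprocess
import sys


def next_log_name(expid, log_dir="."):
    """Return the next unused '<expid>_NN.log' name in log_dir."""
    existing = glob.glob(os.path.join(log_dir, "%s_[0-9][0-9].log" % expid))
    # "cxxx_01.log" -> 1; strip the "<expid>_" prefix and ".log" suffix
    numbers = [int(os.path.basename(p)[len(expid) + 1:-4]) for p in existing]
    return "%s_%02d.log" % (expid, max(numbers, default=0) + 1)


def launch_detached(expid, script="autosubmit.py", log_dir=".",
                    interpreter=sys.executable):
    """Start `<interpreter> <script> <expid>` in the background with a log file."""
    log_path = os.path.join(log_dir, next_log_name(expid, log_dir))
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            [interpreter, script, expid],
            stdout=log,
            stderr=subprocess.STDOUT,   # like `>&`: stderr goes to the same file
            stdin=subprocess.DEVNULL,
            start_new_session=True,     # detach from the terminal's session
        )
    return proc, log_path
```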
Cautions:
- Before launching autosubmit, check that password-less ssh works for every HPC the experiment uses, for example:
> ssh ithaca # check the other HPCs in the same way
- After launching autosubmit, be aware of the login expiry limit and policy of each HPC (if any)
and renew login access (using a token, key, etc.) before it expires.
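The first caution can be automated by running a no-op command over ssh with the OpenSSH option `BatchMode=yes`, which makes ssh fail instead of prompting for a password. A minimal sketch; has_passwordless_ssh is a hypothetical helper, host names are whatever your ssh configuration resolves (ithaca is taken from the example above), and the `run` parameter exists only so the check can be tested without a real connection.

```python
import subprocess


def ssh_check_command(host, timeout=10):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",               # never prompt for passwords/passphrases
        "-o", "ConnectTimeout=%d" % timeout,  # give up on unreachable hosts
        host,
        "true",                              # run a no-op on the remote side
    ]


def has_passwordless_ssh(host, run=subprocess.run):
    """Return True if `ssh host true` succeeds without any interaction."""
    result = run(ssh_check_command(host),
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

For example, `[h for h in ("ithaca",) if not has_passwordless_ssh(h)]` lists the hosts that still need keys set up.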
HOW TO MONITOR EXPERIMENT
=========================
> cd bin
> python monitor.py -h
> python monitor.py -e cxxx -j job_list -o pdf
or
> python monitor.py -e cxxx -j job_list -o png
The generated plot, stamped with the date and time, can be found at:
/cfu/autosubmit/cxxx/plot/cxxx_date_time.pdf
or
/cfu/autosubmit/cxxx/plot/cxxx_date_time.png
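Since every plot carries its own date and time stamp, repeated runs of monitor.py accumulate files in the plot directory. A small sketch for picking the newest one, assuming the directory layout shown above; latest_plot is a hypothetical helper, and it selects by file modification time so it does not depend on the exact format of the stamp in the file name.

```python
import glob
import os


def latest_plot(expid, root="/cfu/autosubmit", fmt="pdf"):
    """Return the newest <root>/<expid>/plot/<expid>_*.<fmt>, or None if there is none."""
    pattern = os.path.join(root, expid, "plot", "%s_*.%s" % (expid, fmt))
    plots = glob.glob(pattern)
    if not plots:
        return None
    # the most recently modified file is the most recent monitor.py run
    return max(plots, key=os.path.getmtime)
```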
HOW TO RESTART EXPERIMENT
=========================
> cd bin
> python recovery.py -h
> python recovery.py -e cxxx -j job_list -g # getting/fetching completed files
> python recovery.py -e cxxx -j job_list -s # saving the pickle file
> nohup python autosubmit.py cxxx >& cxxx_02.log &
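The two recovery.py calls must succeed in order before the experiment is relaunched. A minimal sketch that runs them and stops at the first failure; run_recovery is a hypothetical helper, it assumes it is run from (or given, via `bin_dir`) the bin directory, and `python` stands for whatever interpreter runs Autosubmit on your machine. Passing `job_list="rerun_job_list"` gives the RERUN variant.

```python
import subprocess


def recovery_commands(expid, job_list="job_list", python="python"):
    """The two recovery.py calls from the restart steps, in order."""
    base = [python, "recovery.py", "-e", expid, "-j", job_list]
    return [
        base + ["-g"],  # get/fetch the files of completed jobs
        base + ["-s"],  # save the pickle file
    ]


def run_recovery(expid, job_list="job_list", bin_dir=None, run=subprocess.run):
    """Run the recovery calls from bin_dir, stopping at the first failure."""
    for cmd in recovery_commands(expid, job_list):
        # check=True raises CalledProcessError, so -s never runs after a failed -g
        run(cmd, check=True, cwd=bin_dir)
```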
HOW TO RERUN/EXTEND EXPERIMENT
==============================
> ssh enterprise
> cd bin
> vi /cfu/autosubmit/cxxx/conf/expdef_cxxx.conf # modify RERUN, CHUNKLIST
> python create_exp.py cxxx
> nohup python autosubmit.py cxxx >& cxxx_03.log &
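Editing RERUN and CHUNKLIST can also be scripted. A sketch assuming expdef_cxxx.conf is an INI-style file readable by Python's configparser and that both options live in a section named "rerun"; the section name and the "TRUE" value are assumptions, and since the CHUNKLIST syntax is not described here, set_rerun (a hypothetical helper) writes whatever string you pass. configparser does not preserve comments when it rewrites a file, so keep a backup.

```python
import configparser


def set_rerun(conf_path, chunklist, section="rerun"):
    """Set RERUN = TRUE and CHUNKLIST in an INI-style expdef file (assumed layout)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names upper-case as in the file
    parser.read(conf_path)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "RERUN", "TRUE")
    parser.set(section, "CHUNKLIST", chunklist)
    # rewrites the whole file; comments in the original are lost
    with open(conf_path, "w") as f:
        parser.write(f)
```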
Monitor for RERUN
------------------
> python monitor.py -e cxxx -j rerun_job_list -o pdf
Recovery for RERUN
-------------------
> python recovery.py -e cxxx -j rerun_job_list -g
> python recovery.py -e cxxx -j rerun_job_list -s
Source Distribution
-------------------
autosubmit-3.0.0a24.tar.gz (50.1 kB, tag: Source)
File hashes:
* SHA256: 45d9fe3940317aa2093a7b7c80ce82e34727e029c6b20e93dbad170c400dea8e
* MD5: 25747214d468004f4ed9d38ef5cee2ad
* BLAKE2b-256: 4d2924f84b964d2c301d658293cb435d8fe80a27719e07ec69c441b4e40839da
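Before installing from a downloaded archive, its integrity can be checked against the SHA256 digest listed above. A minimal sketch using Python's hashlib; sha256_of and verify_download are illustrative helper names, not part of Autosubmit.

```python
import hashlib

# SHA256 digest listed above for autosubmit-3.0.0a24.tar.gz
EXPECTED_SHA256 = "45d9fe3940317aa2093a7b7c80ce82e34727e029c6b20e93dbad170c400dea8e"


def sha256_of(path, chunk_size=65536):
    """Compute a file's SHA256 hex digest without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 matches the expected digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify_download("autosubmit-3.0.0a24.tar.gz")` should return True for an intact download.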