Autosubmit: a versatile tool for managing Global Climate Coupled Models in Supercomputing Environments

Autosubmit is a tool to create, manage and monitor experiments on
configured computing clusters, HPCs and supercomputers remotely via ssh.


HOW TO DEPLOY/SETUP AUTOSUBMIT FRAMEWORK
========================================

- Autosubmit has been tested:

with the following operating systems:
* Linux Debian

on the following HPCs/clusters:
* Ithaca (IC3 machine)
* MareNostrum (BSC machine)
* MareNostrum3 (BSC machine)
* HECToR (EPCC machine)
* Lindgren (PDC machine)
* C2A (ECMWF machine)
* ARCHER (EPCC machine)

- Prerequisites: The packages bash, python2, sqlite3, git-scm > 1.8.2 and subversion must be available on the local
machine. The packages argparse, dateutil, pyparsing, numpy, pydotplus and matplotlib must be available to the
Python runtime. The local machine must also be able to access the HPCs/clusters via password-less ssh.

- Install Autosubmit
> pip install autosubmit
or download, unpack and run "python setup.py install"

- Create a repository for experiments: for example "/cfu/autosubmit", then
set the repository path (LOCAL_ROOT_DIR) in autosubmit/config/dir_config.py

- Create a blank database: for example "autosubmit.db" in the repository created above:
> cp autosubmit/database/data/autosubmit.sql /cfu/autosubmit/
> cd /cfu/autosubmit
> sqlite3 autosubmit.db
sqlite> .read autosubmit.sql
sqlite> .quit
> chmod 775 autosubmit.db
then set the database path and file name (DB_DIR, DB_FILE, DB_NAME) in autosubmit/config/dir_config.py
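
The database creation step can also be scripted. The following is a minimal sketch, not part of Autosubmit itself: it assumes the schema file has already been copied next to the database, and the function name create_database is hypothetical. It uses only the standard sqlite3 module to execute the SQL script and then applies the same 775 permissions as the chmod command.

```python
import os
import sqlite3


def create_database(sql_path, db_path):
    """Create db_path by executing the SQL script found at sql_path."""
    with open(sql_path) as handle:
        schema = handle.read()
    connection = sqlite3.connect(db_path)
    try:
        # Runs every statement in the script, like ".read" in the sqlite3 shell.
        connection.executescript(schema)
        connection.commit()
    finally:
        connection.close()
    # Same permissions as "chmod 775 autosubmit.db".
    os.chmod(db_path, 0o775)


# Example call, using the example paths from this README:
# create_database("/cfu/autosubmit/autosubmit.sql", "/cfu/autosubmit/autosubmit.db")
```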

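The prerequisites listed at the start of this section can be checked before deployment with a short script. This is only a sketch under assumptions: the executable names "python2", "git" and "svn" are assumed to be how the listed packages appear on the PATH, the helper names are hypothetical, and the git version requirement (> 1.8.2) is not checked. Run it with the same Python interpreter that will run Autosubmit so that the module check is meaningful.

```python
import importlib
import os

# Assumed executable names for the packages listed in the prerequisites.
REQUIRED_COMMANDS = ["bash", "python2", "sqlite3", "git", "svn"]
REQUIRED_MODULES = ["argparse", "dateutil", "pyparsing", "numpy", "pydotplus", "matplotlib"]


def find_on_path(command):
    """Return the full path of an executable found on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_commands(commands):
    """Return the commands that cannot be found on PATH."""
    return [name for name in commands if find_on_path(name) is None]


def missing_modules(modules):
    """Return the modules that the current interpreter cannot import."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Example use:
# print(missing_commands(REQUIRED_COMMANDS))
# print(missing_modules(REQUIRED_MODULES))
```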

HOW TO USE AUTOSUBMIT
=====================

To run Autosubmit experiments at CFU, a production environment is set up on the local virtual machine "enterprise".

> cd bin

> python expid.py -h

> python expid.py --new --HPC ithaca --description "experiment is about..."

For example, "cxxx" is a 4-character expid generated automatically by the system.
The first character "c" represents the platform, such as "i" for ithaca, "b" for bsc,
"h" for hector, "l" for lindgren, "e" for ecmwf and "m" for marenostrum3. The remaining
three characters "xxx" form a unique alphanumeric identifier for the experiment.
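
The expid convention described above can be illustrated with a small helper. This is a sketch only: the function name describe_expid and the mapping table are hypothetical, and the table contains just the prefixes named in this README, so any other prefix is reported as "unknown".

```python
# Platform prefixes as listed in this README; other prefixes may exist.
PLATFORM_BY_PREFIX = {
    "i": "ithaca",
    "b": "bsc",
    "h": "hector",
    "l": "lindgren",
    "e": "ecmwf",
    "m": "marenostrum3",
}


def describe_expid(expid):
    """Split a 4-character expid into (platform, unique identifier)."""
    if len(expid) != 4 or not expid.isalnum():
        raise ValueError("expid must be 4 alphanumeric characters: %r" % expid)
    platform = PLATFORM_BY_PREFIX.get(expid[0], "unknown")
    return platform, expid[1:]
```

For instance, describe_expid("i001") returns ("ithaca", "001").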

> vi /cfu/autosubmit/cxxx/conf/expdef_cxxx.conf

> vi /cfu/autosubmit/cxxx/conf/autosubmit_cxxx.conf

> python create_exp.py cxxx

> ssh enterprise

> cd bin

> nohup python autosubmit.py cxxx >& cxxx_01.log &

Cautions:
- Before launching autosubmit, check that password-less ssh works for every HPC the experiment uses:
> ssh ithaca # for example; check the other HPCs in the same way
- After launching autosubmit, be aware of the login expiry limit and policy (if applicable for the HPC)
and renew the login access accordingly (by using a token/key etc.) before it expires.
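
The password-less ssh check can be automated for several hosts. The sketch below is an assumption about how one might do it, not an Autosubmit feature: it relies on the OpenSSH options BatchMode=yes (fail instead of prompting for a password) and ConnectTimeout, and the function names are hypothetical.

```python
import subprocess


def ssh_check_command(host, timeout_seconds=10):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", "ConnectTimeout=%d" % timeout_seconds,
        host,
        "true",  # harmless remote command; only the exit status matters
    ]


def passwordless_ssh_works(host, timeout_seconds=10):
    """Return True if the host accepts a non-interactive ssh login."""
    return subprocess.call(ssh_check_command(host, timeout_seconds)) == 0


# Example use:
# for host in ["ithaca"]:
#     print(host, passwordless_ssh_works(host))
```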

HOW TO MONITOR EXPERIMENT
=========================

> cd bin

> python monitor.py -h

> python monitor.py -e cxxx -j job_list -o pdf
or
> python monitor.py -e cxxx -j job_list -o png

The generated plot, named with a date and time stamp, can be found at:

/cfu/autosubmit/cxxx/plot/cxxx_date_time.pdf
or
/cfu/autosubmit/cxxx/plot/cxxx_date_time.png
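
Since each plot carries a time stamp in its name, the plot directory accumulates files. A minimal sketch for locating the most recent one follows. The function name latest_plot is hypothetical, and because the exact format of the date and time stamp is not specified here, the sketch picks the newest file by modification time rather than by parsing the name.

```python
import glob
import os


def latest_plot(root_dir, expid, extension="pdf"):
    """Return the most recently modified plot for expid, or None if there is none."""
    pattern = os.path.join(root_dir, expid, "plot", "%s_*.%s" % (expid, extension))
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


# Example use, with the example repository path from this README:
# print(latest_plot("/cfu/autosubmit", "cxxx", "png"))
```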


HOW TO RESTART EXPERIMENT
=========================

> cd bin

> python recovery.py -h

> python recovery.py -e cxxx -j job_list -g # getting/fetching completed files

> python recovery.py -e cxxx -j job_list -s # saving the pickle file

> nohup python autosubmit.py cxxx >& cxxx_02.log &
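
The two recovery steps above can be wrapped so that the second runs only if the first succeeds. This is a sketch under assumptions: the helper names are hypothetical, the flags are exactly those shown above, and the scripts are assumed to live in a "bin" directory relative to the current directory. Relaunching autosubmit.py with nohup is left as a separate manual step.

```python
import subprocess


def recovery_commands(expid, job_list="job_list"):
    """Return the two recovery commands from this section, in order."""
    return [
        # Fetch the completed files.
        ["python", "recovery.py", "-e", expid, "-j", job_list, "-g"],
        # Save the pickle file.
        ["python", "recovery.py", "-e", expid, "-j", job_list, "-s"],
    ]


def run_recovery(expid, job_list="job_list", bin_dir="bin"):
    """Run both recovery steps; raises CalledProcessError if one fails."""
    for command in recovery_commands(expid, job_list):
        subprocess.check_call(command, cwd=bin_dir)


# Example use:
# run_recovery("cxxx")
```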


HOW TO RERUN/EXTEND EXPERIMENT
==============================

> ssh enterprise

> cd bin

> vi /cfu/autosubmit/cxxx/conf/expdef_cxxx.conf # modify RERUN, CHUNKLIST

> python create_exp.py cxxx

> nohup python autosubmit.py cxxx >& cxxx_03.log &

Monitor for RERUN
-----------------

> python monitor.py -e cxxx -j rerun_job_list -o pdf

Recovery for RERUN
------------------

> python recovery.py -e cxxx -j rerun_job_list -g

> python recovery.py -e cxxx -j rerun_job_list -s


Source distribution: autosubmit-3.0.0a24.tar.gz (50.1 kB)
