A tool to orchestrate branch-based workflows and automate job submission for ACCESS experiments.
access-experiment-runner
About
The main role of the ACCESS experiment runner is to manage and monitor experiment job runs on the supercomputing environment (e.g., Gadi). It builds on Payu, handling the orchestration of multiple configuration branches, experiment setup, and job lifecycle.
Key features
- Leverages Payu to run multiple experiments from different configuration branches.
- Submits and tracks PBS jobs on Gadi; oversees the job lifecycle from submission through completion.
- When a job completes within expected run times, the tool prints a confirmation and stops further submissions.
- If a job fails, the tool detects the failure and pauses further actions, giving the user control over whether to resubmit. Users may inspect the working directory to diagnose the root cause.
- Detects already running or queued jobs and avoids redundant submissions, skipping duplicates with a user notification.
Installation
User setup
The experiment-runner is installed in the payu-dev conda environment, so loading payu/dev makes experiment-runner available directly:
module use /g/data/vk83/prerelease/modules && module load payu/dev
Alternatively, create and activate a Python virtual environment, then install via pip:
python3 -m venv <path/to/venv> --system-site-packages
source <path/to/venv>/bin/activate
pip install experiment-runner
Development setup
For contributors and developers, set up a development environment:
git clone https://github.com/ACCESS-NRI/access-experiment-runner.git
cd access-experiment-runner
# under a virtual environment
pip install -e .
Usage
experiment-runner --help
usage: experiment-runner [-h] [-i INPUT_YAML_FILE]
Manage ACCESS experiments using configurable YAML input.
If no YAML file is specified, the tool will look for 'Experiment_runner.yaml' in the current directory.
If that file is missing, you must specify one with -i / --input-yaml-file.
options:
-h, --help show this help message and exit
-i INPUT_YAML_FILE, --input-yaml-file INPUT_YAML_FILE
Path to the YAML file specifying parameter values for experiment runs.
Defaults to 'Experiment_runner.yaml' if present in the current directory.
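The lookup order described in the help text can be sketched with the standard argparse module. This is a minimal illustration, not the tool's actual source: the function name resolve_input_yaml and its cwd parameter are hypothetical, and only the option names and the default file name are taken from the help output above.

```python
import argparse
from pathlib import Path

# Default file name quoted in the help text above.
DEFAULT_YAML = "Experiment_runner.yaml"


def resolve_input_yaml(argv, cwd="."):
    """Hypothetical helper: pick the YAML file following the documented order.

    1. Use the path given with -i / --input-yaml-file, if any.
    2. Otherwise use 'Experiment_runner.yaml' in the current directory.
    3. Otherwise exit with a usage error.
    """
    parser = argparse.ArgumentParser(prog="experiment-runner")
    parser.add_argument("-i", "--input-yaml-file", default=None)
    args = parser.parse_args(argv)

    if args.input_yaml_file is not None:
        return Path(args.input_yaml_file)

    candidate = Path(cwd) / DEFAULT_YAML
    if candidate.is_file():
        return candidate

    # parser.error prints the usage line and raises SystemExit(2).
    parser.error(
        f"'{DEFAULT_YAML}' not found in {cwd}; "
        "specify one with -i / --input-yaml-file"
    )
```

Calling resolve_input_yaml(["-i", "my.yaml"]) returns the given path unchanged, while resolve_input_yaml([]) falls back to the default file only if it exists.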
An example YAML file is provided in example/Experiment_runner_example.yaml:
test_path: /g/data/{PROJECT}/{USER}/prototype-0.1.0
repository_directory: 1deg_jra55_ryf
running_branches: [ctrl, perturb_1, perturb_2]
keep_uuid: True
nruns: [1,1,1]
where,
test_path: The base path to the experiment repository on the filesystem. In this case, it points to a prototype experiment runner checkout.
repository_directory: The specific experiment configuration directory inside test_path. Here it is the 1deg_jra55_ryf setup.
running_branches: A list of git branches representing experiments to run.
keep_uuid: Preserve unique identifiers (UUIDs) across runs.
nruns: A list indicating how many runs to perform for each branch listed in running_branches.
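To make the relationship between these keys concrete, the sketch below pairs each branch with its run count and builds the repository path. It assumes, based on the descriptions above, that nruns must have one entry per branch and that repository_directory sits directly under test_path; the function name plan_runs is hypothetical and not part of the tool. The configuration is written as a plain dict so the example needs no YAML parser.

```python
def plan_runs(config):
    """Hypothetical helper: validate the config and map each branch to its run count."""
    branches = config["running_branches"]
    nruns = config["nruns"]

    # Assumption: one run count per branch, in the same order.
    if len(branches) != len(nruns):
        raise ValueError(
            f"nruns has {len(nruns)} entries but "
            f"running_branches has {len(branches)}"
        )
    if any(not isinstance(n, int) or n < 0 for n in nruns):
        raise ValueError("every entry in nruns must be a non-negative integer")

    # Assumption: the experiment repository lives at test_path/repository_directory.
    repo_path = f"{config['test_path'].rstrip('/')}/{config['repository_directory']}"
    return repo_path, dict(zip(branches, nruns))


# Same values as the example YAML above.
example_config = {
    "test_path": "/g/data/{PROJECT}/{USER}/prototype-0.1.0",
    "repository_directory": "1deg_jra55_ryf",
    "running_branches": ["ctrl", "perturb_1", "perturb_2"],
    "keep_uuid": True,
    "nruns": [1, 1, 1],
}

repo_path, run_plan = plan_runs(example_config)
```

With the example values, run_plan maps ctrl, perturb_1, and perturb_2 to one run each.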
Workflow example
- Trigger the experiment
experiment-runner -i example/Experiment_runner_example.yaml
- The tool then checks status:
- Completed:
... already completed " {doneruns}, hence no new runs.
- Failed:
Clean up a failed job {work_dir} and prepare it for resubmission.
- Running/Queued:
You have duplicated runs for in the same folder hence not submitting this job!
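The status handling above can be summarised as a small decision function. This is only a sketch of the documented behaviour: JobStatus, next_action, and the returned action names are hypothetical, and how the tool actually queries PBS or Payu for job state is not shown here.

```python
from enum import Enum


class JobStatus(Enum):
    """Hypothetical job states, mirroring the cases listed above."""
    NOT_STARTED = "not_started"
    QUEUED = "queued"
    RUNNING = "running"
    FAILED = "failed"
    COMPLETED = "completed"


def next_action(status, done_runs, requested_runs):
    """Return the action implied by the workflow description.

    - Running or queued: skip, to avoid a duplicate submission.
    - Failed: clean up the work directory and wait for the user to resubmit.
    - Requested runs already done: nothing to submit.
    - Otherwise: submit the next run.
    """
    if status in (JobStatus.RUNNING, JobStatus.QUEUED):
        return "skip_duplicate"
    if status is JobStatus.FAILED:
        return "clean_up_and_pause"
    if done_runs >= requested_runs:
        return "no_new_runs"
    return "submit"
```

Checking for running or queued jobs first reflects the duplicate-avoidance feature: a job already in the queue is never resubmitted, regardless of how many runs remain.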