FSL Cluster Submission Plugin for Slurm
Project description
fsl_sub_plugin_slurm
Job submission to SLURM variant cluster queues.
Copyright 2018-2024 University of Oxford, Oxford, UK.
Introduction
fsl_sub provides a consistent interface to various cluster backends, with a fall back to running tasks locally where no cluster is available. This fsl_sub plugin provides support for submitting tasks to SLURM clusters.
For installation instructions please see INSTALL.md; for building packages see BUILD.md and for changes see CHANGES.md.
Configuration
Use the command:
fsl_sub_config slurm > fsl_sub.yml
to generate an example configuration, including queue definitions gleaned from the SLURM software - check these, paying attention to any warnings generated.
Use the fsl_sub.yml
file as per the main fsl_sub documentation.
The configuration for the SLURM plugin is in the method_opts section, under the key slurm.
Method options
Key | Values (default/recommended in bold) | Description |
---|---|---|
queues | True | Does this method use queues/partitions (should be always be True). |
memory_in_gb | True/False | Whether SLURM reports memory in GB - normally false. |
copy_environment | True/False | Whether to replicate the environment variables in the shell that called fsl_sub into the job's shell. |
has_parallel_envs | False/True | SLURM does not provide parallel environments so this should always be false. |
script_conf | True/False | Whether --usesscript option to fsl_sub is available via this method. This option allows you to define the grid options as comments in a shell script and then provide this to the cluster for running. Should be set to True. |
mail_support | True/False | Whether the grid installation is configured to send email on job events. |
mail_modes | Dictionary of option lists | If the grid has email notifications turned on, this option configures the submission options for different verbosity levels, 'b' = job start, 'e' = job end, 'a' = job abort, 'f' = all events, 'n' = no mail. Each event type should then have a list of submission mail arguments that will be applied to the submitted job. Typically, these should not be edited. |
mail_mode | b/e/a/f/n | Which of the above mail_modes to use by default. |
notify_ram_usage | True/False | Whether to notify SLURM of the RAM you have requested. SLURM is typically configured to give jobs a small RAM allocation so you will invariably need this set to true. This helps the scheduler allocate resources efficiently, however, this RAM limit will be enforced. If you are unsure of your RAM requirements, this can be disabled on a per-job basis by setting the environment variable FSLSUB_NORAMLIMIT to '1' (or 'True'). |
set_time_limit | True/False | Whether to notify SLURM of the expected maximum run-time of your job. This helps the scheduler fill in reserved slots (for e.g. parallel environment jobs), however, this time limit will be enforced, resulting in a job being killed if it is exceeded, even if this is less than the queue run-time limit. This can be disabled on a per-job basis by setting the environment variable FSLSUB_NOTIMELIMIT to '1' (or 'True'). |
array_holds | True/False | Enable support array holds, e.g. sub-task 1 waits for parent sub-task 1. |
array_limit | True/False | Enable limiting number of concurrent array tasks. |
job_resources | True/False | Enable additional job resource specification support. |
projects | True/False | Enable support for projects typically used auditing/charging purposes. |
preseve_modules | True/False | Whether to re-load shell modules on the compute node. Required if you have multiple CPU generations and per-generation optimised libraries configured with modules. |
add_module_paths | []/ a list | List of file system paths to search for modules in addition to the system defined ones. Useful if you have your own shell modules directory but need to allow the compute node to auto-set it's MODULEPATH environment variable (e.g. to a architecture specific folder). Only used when preserve_modules is True. |
export_vars | []/List | List of environment variables that should transfered with the job to the compute node. |
keep_jobscript | True/False | Whether to preserve the generated wrapper in a file wrapper_<jobid>.sh . This file contains sufficient information to resubmit this job in the future. If the job fails to submit, this will be preserved in the file wrapper_failed_<DD>-<MMM>-<YYYY>_<HH><MM><SS>.sh . |
extra_args | []/List | List of additional SLURM arguments to pass through to the sheduler. |
strict_dependencies | True/False | Whether to use 'afterok' (True) or 'afterany' (False) when specifying simple job dependencies. This can also be controlled by the environment variable FSLSUB_STRICTDEPS (True="1", False="0"). |
allow_nested_queuing | True/False | Whether fsl_sub, when called from within a cluster job, should be able to submit further jobs (True) or run subsequent jobs with the shell plugin. You can override this on a per-job or session basis using the environmet |
Coprocessor Configuration
This plugin is not capable of automatically determining all the necessary information to configure your co-processors but will advise of the information it can find and propose queue definitions for these GPU resources.
SLURM typically selects GPU resources with a GRES (Generic RESource) that defines the type and quantity of the co-processor. Where multiple classes of co-processor are available this might be selectable via the GRES or you may need to provide a constraint. If you would like to be able to support running on a class and all superior devices you need to be able to use constraints as GRES requests do not support logical combinations. The automatically generated configuration should include useful information about your GRES and constraints, but should you wish to obtain this information yourself use the commands:
sinfo -p <partition> -o %G
- This will list all the GRES defined on <partition>.sinfo -p <partition> -o %f
- This will list all features selectable by a--constraint
as a comma-separated list.
Typically CUDA resources will be controlled using GRES or constraints with gpu in the name, so look for these.
For each coprocessor hardware type you need a sub-section given an identifier than will be used to request this type of coprocessor. For CUDA processors this sub-section must be called 'cuda' to ensure that FSL tools can auto-detect and use CUDA hardware/queues.
Key | Values (default/recommended in bold) | Description |
---|---|---|
resource | String | GRES that, when requested, selects machines with the hardware present, e.g. gpu. |
classes | True/False | Whether more than one type of this co-processor is available |
include_more_capable | True/False | Whether to automatically request all classes that are more capable than the requested class. This requires the class_constraints option to be set to True and for your SLURM cluster to be set up with GPU features/constraints |
class_types | Configuration dictionary | This contains the definition of the GPU classes... |
Key | ||
class selector | This is the letter (or word) that is used to select this class of co-processor from the fsl_sub commandline. For CUDA devices you may consider using the card name e.g. A100. | |
resource | This is the name of the SLURM GRES 'type' or contraint that will be used to select this GPU family. | |
doc | The description that appears in the fsl_sub help text about this device. | |
capability | An integer defining the feature set of the device, your most basic device should be given the value 1 and more capable devices higher values, e.g. GTX = 1, Kelper = 2, Pascal = 3, Volta = 4. | |
default_class | Class type key | The class selector for the class to assign jobs to where a class has not been specified in the fsl_sub call. For FSL tools that automatically submit to CUDA queues you should aim to select one that has good double-precision performance (K40|80, P100, V100, A30, A100, H100, H200 or B200) and ensure all higher capability devices also have good double-precision. |
class_constraint | False/string | Whether your SLURM cluster is configured to use constraints to select co-processor models/features. If so this should be set to the name of the feature that selects between the models and the co-processor class resource strings set appropriately to match the available values. |
presence_test | Program path (nvidia-smi for CUDA) | The name of a program that can be used to look for this coprocessor type, for example nvidia-smi for CUDA devices. Program needs to return non-zero exit status if there are no available coprocessors. |
GRES vs Constraints
When using GRES to select GPUs and types, it is only possible to select all GPUs or on class of card, not a mixture. To be able to support this, you have two options, one switch to using constraints to select the GPUs or advise users to add --constraint arguments using the --extra
fsl_sub argument.
Queue Definitions
Slurm refers to queues as partitions. The example configuration should contain definitions for the automatically discovered partitions but you should review these, in particular any warnings generated. To query SLURM for queue information you can use the following SLURM commands.
To get a list of all available partitions use:
sinfo -s -o %P
Then the details for a queue can be obtained with:
sinfo -p partitionname -O 'CPUs,MaxCPUsPerNode,Memory,Time,NodeHost'
This will return details for every node within that partition. The queue definition should then be setup as follows:
Key | Value type | Description |
---|---|---|
time | integer in minutes | The TIMELIMIT column reports in days-hours:minutes:seconds, this needs converting to minutes. Provide the maximum value observed, but if there are multiple values you should consider enabling job time notification so that SLURM can select the correct node. |
max_size | integer in GB | This is the maximum permitted memory on a node. This is usually reported by SLURM in MB, so for example 63000 should be configured as 63 (GB). It is equal to the maximum MEMORY value reported. Once again, if there are multiple node types you should turn on RAM nofitication so that nodes can be correctly selected. |
max_slots | integer | CPU contains the number of CPUs (threads) available on each node. Set this option to the maximum number reported. |
slot_size | Null/integer in GB | This is largely meaningless on SLURM and left at None. If you find that you need to get fsl_sub to split your job into multiple threads to achieve your memory requirements then set this to the figure provided by your cluster manager. |
group | integer | (Optional) All partitions with the same group number will be considered together when scheduling, typically this would be all queues with the same run time but differing memory/core counts. |
priority | integer | (Optional) Priority within a group - higher wins. |
default | True | Is this the default queue when no time/RAM details provided. |
copros | Co-processor dictionary | Optional If this queue has hosts with co-processors (e.g. CUDA devices), then provide this entry, with a key identical to the associated co-processor definition, e.g. cuda. |
max_quantity | An integer representing the maximum number of this coprocessor type available on a single compute node. This can be obtained by looking at the complexes entry of qconf -se <hostname> for all of the hosts in this queue. If the complex is gpu then an entry of gpu=2 would indicated that this value should be set to 2. |
|
classes | A list of coprocessor classes (as defined in the coprocessor configuration section) that this queue has hardware for. | |
exclusive | True/False - Whether this queue is only used for co-processor requiring tasks. |
Where a partition has obvious GRES or features that define GPUs a proposed GPU configuration will be added as comments to the start of the queue definition. You should review this, create/update the coproc_opts>cuda record with the information in the comments and then this section can be uncommented to enable GPU support.
Compound Queues
Some clusters may be configured with multiple variants of the same partition, e.g. short.a, short.b, with each queue having different hardware, perhaps CPU generation or maximum memory or memory available per slot. To maximise scheduling options you can define compound queues which have the configuration of the least capable constituent. To define a compound queue, the queue name (key of the YAML dictionary) should be a comma separated list of queue names (no space).
SLURM specific usage
Job dependencies
The default dependency handler (-j
) takes a job id and uses --dependency=afterany:\<jobid>
to control job hierarchies of non-array tasks. To switch to the old behaviour (pre-version 1.6.0) of using afterok
(ancestor job must complete successfully), either modify the method configuration to set strict_dependencies
to True or set the environment variable FSLSUB_STRICTDEPS
to "1".
Where you need to specify complex dependencies you may pass the raw SLURM dependency description (without the --dependency=
) in the -j
argument. For array tasks, the array_hold
argument is used instead of the -j
argument. This can take the same forms as the -j
argument (job ids or complex hold descriptions).
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file fsl_sub_plugin_slurm-1.6.5.tar.gz
.
File metadata
- Download URL: fsl_sub_plugin_slurm-1.6.5.tar.gz
- Upload date:
- Size: 38.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4078cc254ae381d40423c33ae65fbc0657a759db92affe994767159a22887a23 |
|
MD5 | 762ef704241d766c0d2d05512fb8a992 |
|
BLAKE2b-256 | c0cdda7243e534e1b4da74b283d1d3a6a02cbe7d07662723fae4c82215d52880 |
File details
Details for the file fsl_sub_plugin_slurm-1.6.5-py3-none-any.whl
.
File metadata
- Download URL: fsl_sub_plugin_slurm-1.6.5-py3-none-any.whl
- Upload date:
- Size: 31.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | f35d4b9cad342f9d14818399f293ac9b7f280b509284ba66ba9d9989645e742a |
|
MD5 | 20cd0da18e4f7803865a251a982cdb37 |
|
BLAKE2b-256 | 7fd8c786811ae69e4a222955ef14b54d198cf35ac45cfa6daea16410a4a8bf99 |