Submit jobs to SLURM seamlessly

SLURM emission

If you make heavy use of High Performance Computing (HPC) clusters that run SLURM, you may have noticed that submitting jobs can be a hassle, especially when you have to submit many jobs that use similar scripts with different parameters. Fortunately, slurm_emission comes to the rescue:

  • it automates the creation of the sh file
  • and it simplifies the submission of jobs to the cluster when the scripts to reuse are similar, and only the parameters change

You can install it with

pip install slurm-emission

I use it constantly so I thought it might be useful for you as well.

Example

Here we go through the example_1 script in detail. First, we define the job parameters: the number of GPUs and CPUs and the amount of memory we'll need. We also want to repeat the experiment for several settings; in this case, two datasets, two models, and four seeds, which gives 2 × 2 × 4 = 16 jobs. Remember to adapt script.py so that it accepts these settings as argparse arguments. We also define the location and the name of the script to run.

from slurm_emission import run_experiments

script_path = 'path/to/your/script'
script_name = 'script.py'

sbatch_args = {
    'job-name': 'example_1',
    'partition': 'gpu',
    'gres': 'gpu:1',
    'cpus-per-task': 4,
    'mem': '40G',
    'account': '1230e98kal',
    'time': '23:00:00',
}

id = 'llms'

experiments = []

datasets = ['cifar', 'mnist']
models = ['transformer', 'lstm']

experiment = {
    'seed': list(range(4)),
    'epochs': [300], 'model': models, 'dataset': datasets
}
experiments.append(experiment)
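Since every key of the experiment dictionary ends up as a command-line flag, script.py has to accept them. The following is a minimal sketch of what the argument parsing in script.py could look like; the defaults and the parse_args helper are assumptions made for illustration, not part of slurm_emission.

```python
import argparse


def parse_args(argv=None):
    """Parse the settings that each job receives as --key=value flags."""
    parser = argparse.ArgumentParser(description='Hypothetical training script')
    # One argument per key of the experiment dictionary.
    parser.add_argument('--seed', type=int, default=0)
    parser.add_argument('--epochs', type=int, default=300)
    parser.add_argument('--model', type=str, default='transformer')
    parser.add_argument('--dataset', type=str, default='cifar')
    return parser.parse_args(argv)


if __name__ == '__main__':
    args = parse_args()
    # Placeholder for the real training code.
    print(f'seed={args.seed} epochs={args.epochs} '
          f'model={args.model} dataset={args.dataset}')
```

With this in place, a call such as python script.py --seed=2 --epochs=300 --model=lstm --dataset=cifar is parsed into typed values that the rest of the script can use.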

Finally, we define the bash lines that go in the sh file. They are executed before the script, and they load the necessary modules, activate the conda environment, and change to the script directory. Then we submit the jobs with the run_experiments function, which creates the sh file and submits the jobs to the cluster.

load_modules = 'module load conda'
activate_env = 'conda activate llms'
py_location = f'cd {script_path}'
bash_prelines = f'{load_modules}\n{activate_env}\n{py_location}'

run_experiments(
    experiments,
    init_command=f'python {script_name} ',
    sbatch_args=sbatch_args,
    bash_prelines=bash_prelines,
    id=id,
)
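To make the role of sbatch_args and bash_prelines more concrete, here is a minimal sketch of how such a dictionary and such prelines could be turned into the text of a shell file. The build_sh helper is hypothetical and only meant as an illustration; the way slurm_emission builds the file internally may differ.

```python
def build_sh(sbatch_args, bash_prelines):
    """Return shell file text: one #SBATCH line per option, then the prelines."""
    lines = ['#!/bin/bash']
    lines += [f'#SBATCH --{key}={value}' for key, value in sbatch_args.items()]
    # '$1' stands for the first argument given to the shell file,
    # which is where the command of each job is expected to go.
    lines += ['', bash_prelines, '$1']
    return '\n'.join(lines) + '\n'


# Same values as in the example above.
sbatch_args = {
    'job-name': 'example_1',
    'partition': 'gpu',
    'gres': 'gpu:1',
    'cpus-per-task': 4,
    'mem': '40G',
    'account': '1230e98kal',
    'time': '23:00:00',
}
bash_prelines = 'module load conda\nconda activate llms\ncd path/to/your/script'

sh_text = build_sh(sbatch_args, bash_prelines)
print(sh_text)
```

Keeping the command out of the file and passing it as an argument is what lets a single sh file serve every job.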

Running the example_1 script produces a .sh file with the following content

#!/bin/bash
#SBATCH --job-name=example_1
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4
#SBATCH --mem=40G
#SBATCH --account=1230e98kal
#SBATCH --time=23:00:00

module load conda
conda activate llms
cd path/to/your/script
$1

This file is shared by all the submitted jobs; each job passes its own command as the first argument ($1):

Number jobs: 16/16
1/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=2 --epochs=300 --model=lstm --dataset=cifar '
2/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=3 --epochs=300 --model=lstm --dataset=cifar '
3/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=1 --epochs=300 --model=transformer --dataset=mnist '
4/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=0 --epochs=300 --model=transformer --dataset=mnist '
5/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=2 --epochs=300 --model=transformer --dataset=mnist '
6/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=0 --epochs=300 --model=lstm --dataset=cifar '
7/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=0 --epochs=300 --model=lstm --dataset=mnist '
8/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=2 --epochs=300 --model=lstm --dataset=mnist '
9/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=3 --epochs=300 --model=transformer --dataset=mnist '
10/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=1 --epochs=300 --model=lstm --dataset=mnist '
11/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=2 --epochs=300 --model=transformer --dataset=cifar '
12/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=1 --epochs=300 --model=transformer --dataset=cifar '
13/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=0 --epochs=300 --model=transformer --dataset=cifar '
14/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=1 --epochs=300 --model=lstm --dataset=cifar '
15/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=3 --epochs=300 --model=lstm --dataset=mnist '
16/16 sbatch cdir\sh\llms--2024-06-07_11-49-47OukHy.sh 'python script.py --seed=3 --epochs=300 --model=transformer --dataset=cifar '
Number jobs: 16/16
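The 16 commands above are all the combinations of the values in the experiment dictionary. As a rough sketch of that expansion, assuming a plain Cartesian product, the snippet below rebuilds the command strings with itertools.product. The expand_grid and to_command helpers are hypothetical and are not the library's API, and the sketch does not try to reproduce the order in which the jobs are listed above.

```python
from itertools import product


def expand_grid(experiment):
    """Return one dict per combination of the listed values."""
    keys = list(experiment)
    value_lists = [experiment[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


def to_command(init_command, params):
    """Append the settings to the base command as --key=value flags."""
    flags = ' '.join(f'--{key}={value}' for key, value in params.items())
    return f'{init_command}{flags} '


experiment = {
    'seed': list(range(4)),
    'epochs': [300],
    'model': ['transformer', 'lstm'],
    'dataset': ['cifar', 'mnist'],
}

commands = [to_command('python script.py ', params)
            for params in expand_grid(experiment)]

print(f'Number jobs: {len(commands)}')
for command in commands:
    print(repr(command))
```

Adding a value to any of the lists multiplies the number of jobs, so it is worth checking the count before submitting.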

Hope it helps!
