Easily manage and submit robust jobs to Slurm using Python and Bash.
Easy Slurm
Features
- Freezes source code and assets by copying them to a separate $JOB_DIR.
- Auto-submits another job if the current job times out.
- Exposes hooks for custom Bash code: setup/setup_resume, on_run/on_run_resume, and teardown.
- Formats job names using parameters from config files.
- Supports interactive jobs for easy debugging.
Installation
pip install easy-slurm
Usage
To submit a job, simply fill in the various parameters shown in the example below.
import easy_slurm

easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date}-{job_name}",
    src="./src",
    assets="./assets",
    dataset="./data.tar.gz",
    setup="""
        virtualenv "$SLURM_TMPDIR/env"
        source "$SLURM_TMPDIR/env/bin/activate"
        pip install -r "$SLURM_TMPDIR/src/requirements.txt"
    """,
    setup_resume="""
        # Runs only on subsequent runs. Call setup and do anything else needed.
        setup
    """,
    on_run="python main.py",
    on_run_resume="python main.py --resume",
    teardown="""
        # Do any cleanup tasks here.
    """,
    sbatch_options={
        "job-name": "example-simple",
        "account": "your-username",
        "time": "3:00:00",
        "nodes": "1",
    },
    resubmit_limit=64,  # Automatic resubmission limit.
)
All job files will be kept in the job_dir directory. Provide directory paths to src and assets; these will be archived and copied to the job_dir directory. Provide a file path to an archive containing the dataset. Also provide Bash code in the hooks, which will be run in the following order:
First run: | Subsequent runs: |
---|---|
setup | setup_resume |
on_run | on_run_resume |
teardown | teardown |
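The ordering in the table can be sketched in plain Python. This is only an illustration of the documented order, not Easy Slurm's implementation; the helper name hooks_for_run and its run_index argument are hypothetical.

```python
def hooks_for_run(run_index: int) -> list[str]:
    """Return hook names in the order the table lists them.

    run_index is a hypothetical counter: 0 for the first run,
    1 or more for runs that were resubmitted after a timeout.
    """
    if run_index == 0:
        return ["setup", "on_run", "teardown"]
    return ["setup_resume", "on_run_resume", "teardown"]


print(hooks_for_run(0))
print(hooks_for_run(1))
```

Note that teardown appears in both columns, so cleanup code placed there should be safe to run after every run, not only the last one.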
Full examples can be found here, including a simple example to run "training epochs" on a cluster.
Jobs can also be fully configured using YAML files. See examples/simple_yaml.
job_dir: "$HOME/jobs/{date}-{job_name}"
src: "./src"
assets: "./assets"
dataset: "./data.tar.gz"
setup: |
  virtualenv "$SLURM_TMPDIR/env"
  source "$SLURM_TMPDIR/env/bin/activate"
  pip install -r "$SLURM_TMPDIR/src/requirements.txt"
setup_resume: |
  # Runs only on subsequent runs. Call setup and do anything else needed.
  setup
on_run: "python main.py"
on_run_resume: "python main.py --resume"
teardown: |
  # Do any cleanup tasks here.
sbatch_options:
  job-name: "example-simple"
  account: "your-username"
  time: "3:00:00"
  nodes: 1
resubmit_limit: 64  # Automatic resubmission limit.
Formatting
One useful feature is formatting paths using custom template strings:
easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date:%Y-%m-%d}-{job_name}",
)
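The {date:%Y-%m-%d} field uses the same format-spec syntax that Python's built-in str.format accepts for datetime objects. The sketch below shows that standard-library behaviour only; how Easy Slurm fills in date and job_name internally is an assumption here, and the values are made up for illustration.

```python
from datetime import datetime

template = "$HOME/jobs/{date:%Y-%m-%d}-{job_name}"

# Hypothetical values, chosen only for illustration.
path = template.format(date=datetime(2023, 2, 1), job_name="example-simple")
print(path)  # $HOME/jobs/2023-02-01-example-simple
```

Because $HOME is left untouched by str.format, it would still be expanded later by the shell.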
The job names can be formatted using a config dictionary:
job_name = easy_slurm.format.format_with_config(
    "bs={hp.batch_size:04},lr={hp.lr:.1e}",
    config={"hp": {"batch_size": 32, "lr": 1e-2}},
)

easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date:%Y-%m-%d}-{job_name}",
    sbatch_options={
        "job-name": job_name,  # equals "bs=0032,lr=1.0e-02"
        ...
    },
    ...
)
This helps in automatically creating descriptive, human-readable job names.
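For intuition, dotted fields such as {hp.batch_size:04} behave like attribute lookups in Python's standard str.format. The following sketch reproduces the same job name with the standard library alone; it assumes nothing about how format_with_config is implemented, and the to_namespace helper is hypothetical.

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively turn nested dicts into objects with attribute access."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    return value


config = {"hp": {"batch_size": 32, "lr": 1e-2}}
fields = {key: to_namespace(val) for key, val in config.items()}

# "04" zero-pads the integer to width 4; ".1e" gives one-decimal scientific notation.
job_name = "bs={hp.batch_size:04},lr={hp.lr:.1e}".format(**fields)
print(job_name)  # bs=0032,lr=1.0e-02
```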
See the documentation for more information and examples.