Easy Slurm
Easily manage and submit robust jobs to Slurm using Python and Bash.
Features
- Freezes source code and assets by copying them to a separate `$JOB_DIR`.
- Auto-submits another job if the current job times out.
- Exposes hooks for custom Bash code: `setup`/`setup_resume`, `on_run`/`on_run_resume`, and `teardown`.
- Formats job names using parameters from config files.
- Supports interactive jobs for easy debugging.
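Freezing means the job runs against a snapshot of `src` and `assets`, so editing your working copy after submission does not change what the job executes. A minimal sketch of that idea, assuming a gzip-compressed tar archive (the README says the directories are archived and copied to the job directory but does not name the format); `freeze_directory` is a hypothetical helper, not part of easy-slurm's API:

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def freeze_directory(src_dir: str, job_dir: str, name: str) -> Path:
    """Archive src_dir into job_dir/<name>.tar.gz and return the archive path.

    Hypothetical helper: later edits to src_dir do not affect the archive.
    """
    job_path = Path(job_dir)
    job_path.mkdir(parents=True, exist_ok=True)
    # make_archive appends ".tar.gz" to base_name for the "gztar" format.
    archive = shutil.make_archive(
        base_name=str(job_path / name),
        format="gztar",
        root_dir=src_dir,
    )
    return Path(archive)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "src")
        src.mkdir()
        (src / "main.py").write_text("print('v1')\n")
        archive = freeze_directory(str(src), str(Path(tmp, "job")), "src")
        # Editing the working copy afterwards leaves the frozen copy unchanged.
        (src / "main.py").write_text("print('v2')\n")
        with tarfile.open(archive) as tar:
            member = next(m for m in tar.getmembers() if Path(m.name).name == "main.py")
            print(tar.extractfile(member).read().decode().strip())
```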
Installation
```bash
pip install easy-slurm
```
Usage
To submit a job, simply fill in the various parameters shown in the example below.
```python
import easy_slurm

easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date}-{job_name}",
    src="./src",
    assets="./assets",
    dataset="./data.tar.gz",
    setup="""
        virtualenv "$SLURM_TMPDIR/env"
        source "$SLURM_TMPDIR/env/bin/activate"
        pip install -r "$SLURM_TMPDIR/src/requirements.txt"
    """,
    setup_resume="""
        # Runs only on subsequent runs. Call setup and do anything else needed.
        setup
    """,
    on_run="python main.py",
    on_run_resume="python main.py --resume",
    teardown="""
        # Do any cleanup tasks here.
    """,
    sbatch_options={
        "job-name": "example-simple",
        "account": "your-username",
        "time": "3:00:00",
        "nodes": "1",
    },
    resubmit_limit=64,  # Automatic resubmission limit.
)
```
All job files are kept in the `job_dir` directory. Provide directory paths to `src` and `assets`; these are archived and copied to the `job_dir` directory. Provide a file path to an archive containing the dataset via `dataset`. Finally, provide Bash code in the hooks, which run in the following order:
| First run | Subsequent runs |
|---|---|
| `setup` | `setup_resume` |
| `on_run` | `on_run_resume` |
| `teardown` | `teardown` |
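To make the hook order concrete, here is a small, hypothetical sketch of how such hooks could be assembled into a job script. It assumes (the README does not confirm this) that each hook body becomes a Bash function, which would explain why the `setup_resume` example can simply call `setup` by name. The names `hook_order` and `render_hooks` are illustrative only.

```python
from typing import Dict, List

# Hook order per run, as listed in the table above.
FIRST_RUN = ["setup", "on_run", "teardown"]
RESUME_RUN = ["setup_resume", "on_run_resume", "teardown"]
ALL_HOOKS = ["setup", "setup_resume", "on_run", "on_run_resume", "teardown"]


def hook_order(is_resume: bool) -> List[str]:
    """Return the hooks a run executes, in order."""
    return list(RESUME_RUN if is_resume else FIRST_RUN)


def render_hooks(hooks: Dict[str, str], is_resume: bool) -> str:
    """Render a hypothetical Bash fragment: define every hook, then call them.

    Defining all hooks as functions (even ones not called this run) is what
    would let the setup_resume body invoke `setup` by name.
    """
    parts = []
    for name in ALL_HOOKS:
        body = hooks.get(name, "").strip()
        # A leading ':' (Bash no-op) keeps empty or comment-only hooks valid.
        parts.append(f"{name}() {{\n:\n{body}\n}}")
    parts.extend(hook_order(is_resume))
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    print(render_hooks({"setup": "echo 'installing deps'", "setup_resume": "setup"}, is_resume=True))
```

The leading `:` matters because a Bash function whose body is only a comment, like the `teardown` example, is a syntax error.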
Full examples can be found here, including a simple example to run "training epochs" on a cluster.
Jobs can also be fully configured using YAML files. See `examples/simple_yaml`.
```yaml
job_dir: "$HOME/jobs/{date}-{job_name}"
src: "./src"
assets: "./assets"
dataset: "./data.tar.gz"

setup: |
  virtualenv "$SLURM_TMPDIR/env"
  source "$SLURM_TMPDIR/env/bin/activate"
  pip install -r "$SLURM_TMPDIR/src/requirements.txt"

setup_resume: |
  # Runs only on subsequent runs. Call setup and do anything else needed.
  setup

on_run: "python main.py"
on_run_resume: "python main.py --resume"

teardown: |
  # Do any cleanup tasks here.

sbatch_options:
  job-name: "example-simple"
  account: "your-username"
  time: "3:00:00"
  nodes: 1

resubmit_limit: 64  # Automatic resubmission limit.
```
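In both the Python and YAML forms, `sbatch_options` maps sbatch long-option names (written without the leading `--`) to values. The sketch below assumes they end up as standard `#SBATCH` directives at the top of a job script; how easy-slurm actually hands them to `sbatch` is not stated here, and `sbatch_directives` is a hypothetical name.

```python
from typing import Mapping, Union


def sbatch_directives(options: Mapping[str, Union[str, int]]) -> str:
    """Render an sbatch_options mapping as '#SBATCH --key=value' header lines.

    Hypothetical helper: keys are assumed to be sbatch long-option names
    (e.g. "job-name", "time"), given without the leading "--".
    """
    return "\n".join(f"#SBATCH --{key}={value}" for key, value in options.items())


if __name__ == "__main__":
    print(sbatch_directives({
        "job-name": "example-simple",
        "account": "your-username",
        "time": "3:00:00",
        "nodes": 1,
    }))
```

YAML parses `nodes: 1` as an integer, while the Python example passes the string `"1"`; formatting values with an f-string renders both the same way.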
Formatting
One useful feature is formatting paths using custom template strings:
```python
easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date:%Y-%m-%d}-{job_name}",
)
```
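The `{date:%Y-%m-%d}` placeholder uses Python's format-spec syntax: `datetime` objects implement `__format__` via `strftime`, so any `strftime` pattern can follow the colon. A minimal sketch, assuming the library substitutes the submission time for `date` and the job name for `job_name` (both assumptions); `format_job_dir` is a hypothetical helper:

```python
from datetime import datetime


def format_job_dir(template: str, job_name: str, date: datetime) -> str:
    """Fill {date} and {job_name} placeholders in a job_dir template.

    datetime.__format__ delegates to strftime, so "{date:%Y-%m-%d}" works.
    "$HOME" contains no braces, so it is left for the shell to expand.
    """
    return template.format(date=date, job_name=job_name)


if __name__ == "__main__":
    print(format_job_dir(
        "$HOME/jobs/{date:%Y-%m-%d}-{job_name}",
        job_name="example-simple",
        date=datetime(2023, 2, 14),
    ))
```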
The job names can be formatted using a config dictionary:
```python
job_name = easy_slurm.format.format_with_config(
    "bs={hp.batch_size:04},lr={hp.lr:.1e}",
    config={"hp": {"batch_size": 32, "lr": 1e-2}},
)

easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date:%Y-%m-%d}-{job_name}",
    sbatch_options={
        "job-name": job_name,  # equals "bs=0032,lr=1.0e-02"
        ...
    },
    ...
)
```
This helps in automatically creating descriptive, human-readable job names.
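Dotted placeholders such as `{hp.batch_size:04}` read nested config values through attribute access. One way to get that behavior with only the standard library is to convert nested dicts to `types.SimpleNamespace` before calling `str.format`. The sketch below is a hypothetical stand-in for `format_with_config`, not its actual implementation:

```python
from types import SimpleNamespace
from typing import Any


def to_namespace(value: Any) -> Any:
    """Recursively turn nested dicts into SimpleNamespace for dotted access."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    return value


def format_with_nested_config(template: str, config: dict) -> str:
    """Hypothetical stand-in: expose top-level config keys as format fields."""
    return template.format(**{k: to_namespace(v) for k, v in config.items()})


if __name__ == "__main__":
    print(format_with_nested_config(
        "bs={hp.batch_size:04},lr={hp.lr:.1e}",
        {"hp": {"batch_size": 32, "lr": 1e-2}},
    ))
```

`:04` zero-pads the integer to four digits and `:.1e` prints one decimal place in scientific notation, which is where `0032` and `1.0e-02` come from.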
See the documentation for more information and examples.
Download files
File details
Details for the file easy_slurm-0.2.2.tar.gz.
File metadata
- Download URL: easy_slurm-0.2.2.tar.gz
- Size: 10.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.10.9 Linux/6.1.9-arch1-1
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | f76be9a247b747d3dee031af5269622827dd28f670a6963adb57b7db82caecc2 |
| MD5 | 4a831f2f52c0b6b2bc332d905d93aad7 |
| BLAKE2b-256 | e286f72e9caf252738e96f7d823f130c86a8a1ebee7274bb29c132140329e486 |
File details
Details for the file easy_slurm-0.2.2-py3-none-any.whl.
File metadata
- Download URL: easy_slurm-0.2.2-py3-none-any.whl
- Size: 11.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.10.9 Linux/6.1.9-arch1-1
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | dbe908cedc30b4fbe07b04f2e939e05d321ad9d2e1474efbf3804ea6b7d732e6 |
| MD5 | 889c72e06997221bcf2bf5d917656c28 |
| BLAKE2b-256 | 7a9e313b8546c163439f838c058368ec60bff7f84edde0d4d91420491255ac8e |