Easily manage and submit robust jobs to Slurm using Python and Bash.
Easy Slurm
Features
- Freezes source code and assets by copying to separate JOB_DIR.
- Applies performance tweaks like copying data to local filesystem of compute node (SLURM_TMPDIR) for fast I/O.
- Exposes hooks for custom bash code: setup/setup_resume, on_run/on_run_resume, and teardown.
- Interrupts running worker process before job time runs out.
- Auto-saves results back to JOB_DIR.
- Auto-submits another job if current job times out.
- Restores intermediate results and resumes running the *_resume hooks.
- Supports interactive jobs for easy debugging.
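Because the worker process is interrupted before the job's time limit, the worker benefits from stopping cleanly when that happens. The following is a minimal sketch of one way a Python worker might do this. It assumes the interrupt arrives as SIGINT or SIGTERM, which is not stated above, and StopFlag and run_steps are hypothetical names used only for illustration; they are not part of easy_slurm.

```python
import signal


class StopFlag:
    """Remembers that a stop signal arrived so the main loop can exit cleanly."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        # Signal handlers should do very little; just record the request.
        self.requested = True

    def install(self, signals=(signal.SIGINT, signal.SIGTERM)):
        # Assumption: the interrupt arrives as SIGINT or SIGTERM.
        for sig in signals:
            signal.signal(sig, self.handle)
        return self


def run_steps(total_steps, stop, start=0):
    """Run work steps until done or until a stop is requested.

    Returns the index of the next step to run, which is what a resumed
    job would need in order to continue.
    """
    step = start
    while step < total_steps and not stop.requested:
        # One unit of real work would go here.
        step += 1
    return step
```

In a worker script, `stop = StopFlag().install()` would be called once at startup, and the value returned by run_steps would be written to the results directory before exiting so that a later run can pick it up.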
Installation
pip install easy-slurm
Usage
To submit a job, simply fill in the various parameters shown in the example below.
import easy_slurm

easy_slurm.submit_job(
    job_root="$HOME/.local/share/easy_slurm/example-simple",
    src="./src",
    assets="./assets",
    dataset="./data.tar.gz",
    setup="""
        virtualenv "$SLURM_TMPDIR/env"
        source "$SLURM_TMPDIR/env/bin/activate"
        pip install -r "$SLURM_TMPDIR/src/requirements.txt"
    """,
    setup_resume="""
        # Runs only on subsequent runs. Call setup and do anything else needed.
        setup
    """,
    on_run="python main.py",
    on_run_resume="python main.py --resume",
    teardown="""
        # Copy log files to results directory.
        # The glob stays outside the quotes so that Bash expands it.
        cp "$SLURM_TMPDIR/src/"*.log "$SLURM_TMPDIR/results/"
    """,
    sbatch_options={
        "job-name": "example-simple",
        "account": "your-username",
        "time": "3:00:00",
        "nodes": "1",
    },
)
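The example runs python main.py on the first run and python main.py --resume on later runs, so main.py needs to know how to continue from saved state. The sketch below shows one possible shape for such a script. It is not taken from the project: the checkpoint.json file name, its next_epoch field, and the total_epochs parameter are hypothetical, and it assumes results are written under $SLURM_TMPDIR/results, the directory the teardown hook copies into.

```python
import argparse
import json
import os
from pathlib import Path


def results_dir():
    # Assumption: results live under $SLURM_TMPDIR/results.
    return Path(os.environ.get("SLURM_TMPDIR", ".")) / "results"


def load_epoch(path):
    """Return the next epoch to run, or 0 if no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text())["next_epoch"]
    return 0


def save_epoch(path, next_epoch):
    """Write the checkpoint via a temporary file, then rename it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"next_epoch": next_epoch}))
    os.replace(tmp, path)


def main(argv=None, total_epochs=5):
    parser = argparse.ArgumentParser()
    parser.add_argument("--resume", action="store_true")
    args = parser.parse_args(argv)

    checkpoint = results_dir() / "checkpoint.json"  # hypothetical file name
    start = load_epoch(checkpoint) if args.resume else 0
    for epoch in range(start, total_epochs):
        # One epoch of real training work would go here.
        save_epoch(checkpoint, epoch + 1)
    return start  # the epoch this run started from
```

A real script would call main() under the usual `if __name__ == "__main__":` guard. Writing the checkpoint to a temporary file and renaming it means an interrupt in the middle of a save is unlikely to leave a half-written checkpoint behind.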
All job files will be kept in the job_root directory. Provide directory paths to src and assets; these will be archived and copied to the job_root directory. Provide a file path to an archive containing the dataset. Also provide Bash code in the hooks, which will be run in the following order:
1. setup / setup_resume
2. on_run / on_run_resume
3. teardown
The first hook of each pair runs on the first job; the *_resume variant runs instead on subsequent (resumed) jobs.
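The hook ordering described above can be written down as a small function. This is only an illustration of the ordering, not code from easy_slurm, and hook_order is a hypothetical name.

```python
def hook_order(is_resume):
    """Return the hook names in the order they run for one job."""
    suffix = "_resume" if is_resume else ""
    # teardown has no *_resume variant; it runs last in both cases.
    return [f"setup{suffix}", f"on_run{suffix}", "teardown"]


first_run = hook_order(is_resume=False)
resumed_run = hook_order(is_resume=True)
```

Here first_run lists setup, on_run, teardown, and resumed_run lists setup_resume, on_run_resume, teardown.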
Full examples can be found here, including a simple example to run "training epochs" on a cluster.