
Easy Slurm


Easily manage and submit robust jobs to Slurm using Python and Bash.

Features

  • Freezes source code and assets by copying to separate $JOB_DIR.
  • Auto-submits another job if current job times out.
  • Exposes hooks for custom bash code: setup/setup_resume, on_run/on_run_resume, and teardown.
  • Formats job names using parameters from config files.
  • Supports interactive jobs for easy debugging.

Installation

pip install easy-slurm

Usage

To submit a job, simply fill in the various parameters shown in the example below.

import easy_slurm

easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date}-{job_name}",
    src="./src",
    assets="./assets",
    dataset="./data.tar.gz",
    setup="""
        virtualenv "$SLURM_TMPDIR/env"
        source "$SLURM_TMPDIR/env/bin/activate"
        pip install -r "$SLURM_TMPDIR/src/requirements.txt"
    """,
    setup_resume="""
        # Runs only on subsequent runs. Call setup and do anything else needed.
        setup
    """,
    on_run="python main.py",
    on_run_resume="python main.py --resume",
    teardown="""
        # Do any cleanup tasks here.
    """,
    sbatch_options={
        "job-name": "example-simple",
        "account": "your-username",
        "time": "3:00:00",
        "nodes": "1",
    },
    resubmit_limit=64,  # Automatic resubmission limit.
)

All job files are kept in the job_dir directory. Provide directory paths for src and assets; these are archived and copied into job_dir. Provide a file path to an archive containing the dataset. Also provide Bash code for the hooks, which run in the following order:

First run    Subsequent runs
setup        setup_resume
on_run       on_run_resume
teardown     teardown

Full examples can be found in the examples directory, including a simple example that runs "training epochs" on a cluster.
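
For instance, a resumable entry point like the main.py used in on_run and on_run_resume might checkpoint its progress so that a resubmitted job can pick up where the previous one stopped. Below is a minimal hypothetical sketch, not part of easy_slurm; the checkpoint filename and epoch loop are assumptions for illustration.

import argparse
import os

CHECKPOINT = "checkpoint.txt"  # hypothetical checkpoint file

def train_one_epoch(epoch):
    print(f"epoch {epoch}")  # stand-in for real training work

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--resume", action="store_true")
    args = parser.parse_args()

    # On --resume, continue from the last recorded epoch.
    start_epoch = 0
    if args.resume and os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            start_epoch = int(f.read())

    for epoch in range(start_epoch, 100):
        train_one_epoch(epoch)
        # Record progress so a future resubmitted job can resume here.
        with open(CHECKPOINT, "w") as f:
            f.write(str(epoch + 1))

if __name__ == "__main__":
    main()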

Jobs can also be fully configured using YAML files. See examples/simple_yaml.

job_dir: "$HOME/jobs/{date}-{job_name}"
src: "./src"
assets: "./assets"
dataset: "./data.tar.gz"
setup: |
  virtualenv "$SLURM_TMPDIR/env"
  source "$SLURM_TMPDIR/env/bin/activate"
  pip install -r "$SLURM_TMPDIR/src/requirements.txt"
setup_resume: |
  # Runs only on subsequent runs. Call setup and do anything else needed.
  setup
on_run: "python main.py"
on_run_resume: "python main.py --resume"
teardown: |
  # Do any cleanup tasks here.
sbatch_options:
  job-name: "example-simple"
  account: "your-username"
  time: "3:00:00"
  nodes: 1
resubmit_limit: 64  # Automatic resubmission limit.
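
The YAML keys mirror the keyword arguments of submit_job, so one simple way to use such a file is to load it and unpack it. This is a minimal sketch assuming the keys map one-to-one onto the function's parameters; the filename job.yaml is hypothetical.

import easy_slurm
import yaml  # requires PyYAML

# Load the job configuration and forward it to submit_job.
with open("job.yaml") as f:
    config = yaml.safe_load(f)

easy_slurm.submit_job(**config)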

Formatting

One useful feature is formatting paths using custom template strings:

easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date:%Y-%m-%d}-{job_name}",
)
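
The {date:...} field accepts strftime-style format specs, as shown above. For illustration only, here is the same spec applied directly to a plain datetime object, outside of easy_slurm; the date value is made up.

from datetime import datetime

# "{date:%Y-%m-%d}" uses datetime's own strftime-style formatting.
print("{date:%Y-%m-%d}".format(date=datetime(2023, 1, 15)))  # 2023-01-15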

The job names can be formatted using a config dictionary:

job_name = easy_slurm.format.format_with_config(
    "bs={hp.batch_size:04},lr={hp.lr:.1e}",
    config={"hp": {"batch_size": 32, "lr": 1e-2}},
)

easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date:%Y-%m-%d}-{job_name}",
    sbatch_options={
        "job-name": job_name,  # equals "bs=0032,lr=1.0e-02"
        ...
    },
    ...
)

This helps in automatically creating descriptive, human-readable job names.
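
Conceptually, this is ordinary str.format templating with dotted access into a nested config. A minimal sketch of the idea follows; it is illustrative only and not easy_slurm's actual implementation.

from types import SimpleNamespace

def to_namespace(value):
    # Recursively convert dicts to attribute-accessible namespaces
    # so that "{hp.batch_size}" style fields resolve.
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    return value

config = {"hp": {"batch_size": 32, "lr": 1e-2}}
fields = {key: to_namespace(value) for key, value in config.items()}
print("bs={hp.batch_size:04},lr={hp.lr:.1e}".format(**fields))  # bs=0032,lr=1.0e-02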

See the documentation for more information and examples.

