Easy Slurm

License: MIT

Easily manage and submit robust jobs to Slurm using Python and Bash.

Features

  • Freezes source code and assets by copying them to a separate $JOB_DIR.
  • Auto-submits another job if the current job times out.
  • Exposes hooks for custom Bash code: setup/setup_resume, on_run/on_run_resume, and teardown.
  • Formats job names using parameters from config files.
  • Supports interactive jobs for easy debugging.

Installation

pip install easy-slurm

Usage

To submit a job, call easy_slurm.submit_job with the parameters shown in the example below.

import easy_slurm

easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date}-{job_name}",
    src="./src",
    assets="./assets",
    dataset="./data.tar.gz",
    setup="""
        virtualenv "$SLURM_TMPDIR/env"
        source "$SLURM_TMPDIR/env/bin/activate"
        pip install -r "$SLURM_TMPDIR/src/requirements.txt"
    """,
    setup_resume="""
        # Runs only on subsequent runs. Call setup and do anything else needed.
        setup
    """,
    on_run="python main.py",
    on_run_resume="python main.py --resume",
    teardown="""
        # Do any cleanup tasks here.
    """,
    sbatch_options={
        "job-name": "example-simple",
        "account": "your-username",
        "time": "3:00:00",
        "nodes": "1",
    },
    resubmit_limit=64,  # Automatic resubmission limit.
)

All job files are kept in the job_dir directory. Provide directory paths for src and assets; both are archived and copied to job_dir. Provide a file path to an archive containing the dataset. Finally, provide Bash code for the hooks, which run in the following order:

First run:    Subsequent runs:
setup         setup_resume
on_run        on_run_resume
teardown      teardown
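
The ordering above can be written out as a small Python sketch. Note that hook_order is a hypothetical helper made up for illustration; it is not part of easy_slurm and only mirrors the table, not the library's actual implementation.

```python
def hook_order(is_resume):
    """Return the hook names in the order listed in the table above.

    Hypothetical helper for illustration only; not an easy_slurm function.
    """
    if is_resume:
        # Subsequent runs use the *_resume variants of setup and on_run.
        return ["setup_resume", "on_run_resume", "teardown"]
    # The first run uses the plain hooks.
    return ["setup", "on_run", "teardown"]


print(hook_order(is_resume=False))
print(hook_order(is_resume=True))
```

Only setup and on_run have a resume variant; teardown is the same hook on every run.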

Full examples, including a simple example that runs "training epochs" on a cluster, can be found in the examples directory.

Jobs can also be fully configured using YAML files. See examples/simple_yaml.

job_dir: "$HOME/jobs/{date}-{job_name}"
src: "./src"
assets: "./assets"
dataset: "./data.tar.gz"
setup: |
  virtualenv "$SLURM_TMPDIR/env"
  source "$SLURM_TMPDIR/env/bin/activate"
  pip install -r "$SLURM_TMPDIR/src/requirements.txt"
setup_resume: |
  # Runs only on subsequent runs. Call setup and do anything else needed.
  setup
on_run: "python main.py"
on_run_resume: "python main.py --resume"
teardown: |
  # Do any cleanup tasks here.
sbatch_options:
  job-name: "example-simple"
  account: "your-username"
  time: "3:00:00"
  nodes: 1
resubmit_limit: 64  # Automatic resubmission limit.

Formatting

One useful feature is formatting paths using custom template strings:

easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date:%Y-%m-%d}-{job_name}",
)
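
The {date:%Y-%m-%d} field uses the same syntax as Python's built-in format specifications for datetime objects. The sketch below uses only str.format from the standard library to show what such a template expands to. It assumes easy_slurm fills these fields in a similar way, which this README does not confirm. The date and job name are made-up values.

```python
from datetime import datetime

# Same template as above; "$HOME" contains no braces, so str.format leaves it alone.
template = "$HOME/jobs/{date:%Y-%m-%d}-{job_name}"

# Illustrative values only (assumptions, not taken from easy_slurm).
job_dir = template.format(date=datetime(2023, 1, 15), job_name="example-simple")

print(job_dir)  # $HOME/jobs/2023-01-15-example-simple
```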

The job names can be formatted using a config dictionary:

job_name = easy_slurm.format.format_with_config(
    "bs={hp.batch_size:04},lr={hp.lr:.1e}",
    config={"hp": {"batch_size": 32, "lr": 1e-2}},
)

easy_slurm.submit_job(
    job_dir="$HOME/jobs/{date:%Y-%m-%d}-{job_name}",
    sbatch_options={
        "job-name": job_name,  # equals "bs=0032,lr=1.0e-02"
        ...
    },
    ...
)

This helps in automatically creating descriptive, human-readable job names.
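
To show how a dotted field such as {hp.batch_size:04} can resolve against a nested dictionary, here is a minimal standard-library sketch. The helpers to_namespace and format_with_dotted_keys are hypothetical names used for illustration; they are not easy_slurm's implementation of format_with_config and may differ from it in details.

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively convert nested dicts so that str.format can use attribute access."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    return value


def format_with_dotted_keys(template, config):
    """Format a template whose fields use dotted paths into a nested config dict."""
    fields = {key: to_namespace(value) for key, value in config.items()}
    return template.format(**fields)


job_name = format_with_dotted_keys(
    "bs={hp.batch_size:04},lr={hp.lr:.1e}",
    {"hp": {"batch_size": 32, "lr": 1e-2}},
)
print(job_name)  # bs=0032,lr=1.0e-02
```

The format specs behave as in any Python format string: 04 zero-pads the integer to four digits, and .1e prints the learning rate in scientific notation with one decimal place.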

See the documentation for more information and examples.
