Slurm Longrun

Overview

Slurm Longrun is a Python package that provides a simple command line interface (CLI) for submitting jobs to a Slurm workload manager when they need to run longer than the walltime limit.

Usage

To use Slurm Longrun, you need Python and pip installed on your system. Install the package with pip:

pip install slurm-longrun

Command Line Interface (CLI)

The Slurm Longrun CLI provides a simple way to submit long-running jobs to a Slurm workload manager. The basic usage is as follows:

sbatch_longrun [OPTIONS] [SBATCH_ARGS ...]

Where OPTIONS are the options for the Slurm Longrun CLI and SBATCH_ARGS are the arguments for the sbatch command.
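
If you launch jobs from Python rather than from a shell, the same argument layout can be assembled as a list. This is a minimal sketch: build_longrun_command is a hypothetical helper and not part of slurm_longrun. It uses only the sbatch arguments from this README's own example, because the available OPTIONS are not listed here.

```python
import shlex


def build_longrun_command(sbatch_args, longrun_options=()):
    """Return the argv list for an sbatch_longrun call (hypothetical helper)."""
    # Slurm Longrun options come first, followed by the arguments meant for sbatch.
    return ["sbatch_longrun", *longrun_options, *sbatch_args]


cmd = build_longrun_command(
    ["--time=30:00", "--job-name=my_job", "my_script.sbatch"]
)

# Print the command as it would be typed in a shell.
print(shlex.join(cmd))
```

To submit the job for real, the list could be passed to subprocess.run(cmd, check=True) on a machine where sbatch_longrun is installed.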

For example, suppose the walltime is set to 30 minutes, but your job needs longer than that. You can submit it with the following command:

sbatch_longrun --time=30:00 --job-name=my_job my_script.sbatch

This restarts your job each time it reaches the 30-minute limit, until it completes.
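
As a rough guide to how many runs a job will take, you can divide its total running time by the walltime and round up. The sketch below assumes the job resumes from saved progress on each restart and loses no work in between. It ignores queue waiting time and checkpoint overhead, so treat the result as a lower bound. runs_needed is a hypothetical helper, not part of the package.

```python
import math


def runs_needed(total_minutes, walltime_minutes):
    """Lower bound on the number of runs, assuming no work is lost between restarts."""
    return math.ceil(total_minutes / walltime_minutes)


# A job needing 95 minutes under a 30-minute walltime: three full runs plus a final partial one.
print(runs_needed(95, 30))  # 4
```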

Graceful Timeout

slurm_longrun also provides a simple way to register signal handlers, which you can use to checkpoint model weights or save other progress when the job receives a SIGTERM signal.

import signal

import slurm_longrun

# save_checkpoint is your own callback that persists progress (e.g. model weights).
slurm_longrun.register_signal_handler(
    signal.SIGTERM,
    save_checkpoint,
)

The signal Slurm sends, and how long before termination it is sent, can be configured with the --signal option. For example, --signal=SIGUSR1@90 sends SIGUSR1 90 seconds before the job is terminated, and --signal=SIGTERM@90 does the same with SIGTERM. Register your handler for whichever signal you configure.
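
To make the checkpoint-on-signal idea concrete, here is a minimal sketch that uses only Python's standard signal module instead of slurm_longrun.register_signal_handler, since the callback signature that function expects is not documented here. The checkpoint path, the progress dictionary, and the body of save_checkpoint are all assumptions for illustration. The script raises SIGUSR1 against itself to simulate the signal Slurm would send, so it needs a Unix-like system and must run in the main thread.

```python
import json
import os
import signal
import tempfile

# Hypothetical checkpoint location and training state, for illustration only.
CHECKPOINT_PATH = os.path.join(tempfile.gettempdir(), "longrun_demo_checkpoint.json")
progress = {"step": 0}


def save_checkpoint(signum, frame):
    """Write the current progress to disk when the signal arrives."""
    # Write to a temporary file first, then rename it, so that an
    # interrupted write never leaves a half-written checkpoint behind.
    tmp_path = CHECKPOINT_PATH + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(progress, f)
    os.replace(tmp_path, CHECKPOINT_PATH)


# Standard-library registration; handlers receive (signum, frame).
signal.signal(signal.SIGUSR1, save_checkpoint)

# Simulate some work, then simulate the warning signal from Slurm.
progress["step"] = 42
signal.raise_signal(signal.SIGUSR1)

with open(CHECKPOINT_PATH) as f:
    restored = json.load(f)
print(restored)
```

On restart, a job would read the checkpoint file back in, as the last lines do, and continue from the saved step.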

Download files

Source Distribution

slurm_longrun-0.1.1.tar.gz (5.7 kB)

Uploaded Source

Built Distribution

slurm_longrun-0.1.1-py3-none-any.whl (7.7 kB)

Uploaded Python 3

File details

Details for the file slurm_longrun-0.1.1.tar.gz.

File metadata

  • Download URL: slurm_longrun-0.1.1.tar.gz
  • Upload date:
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for slurm_longrun-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       07281456665734d039cabe08b58378a16649db61e64c81d57609badde6c7e520
MD5          b9a7ef33050f0efc148ae30d06328185
BLAKE2b-256  cfe0096a3d383e64d26b5e52791984e50164bf19e32e35ab81e18d9664df752c

Provenance

The following attestation bundles were made for slurm_longrun-0.1.1.tar.gz:

Publisher: pypi-publish.yml on alexthillen/slurm_longrun

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file slurm_longrun-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: slurm_longrun-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 7.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for slurm_longrun-0.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       b015e652e22eaa9f044b2214a5f1e53967b9dc163f0b8421b6f2ad9b9eeb781c
MD5          3504c55cfbc36980348e08301b2b99a1
BLAKE2b-256  384baf1f5bc6b13ddfde6729b1c70c34030bf2b677564f9e821f1f247f47d49a

Provenance

The following attestation bundles were made for slurm_longrun-0.1.1-py3-none-any.whl:

Publisher: pypi-publish.yml on alexthillen/slurm_longrun
