Slurm Longrun
Slurm Longrun is a Python package that wraps Slurm’s sbatch command to automatically resubmit jobs that time out, allowing you to run workloads that exceed a single‐job walltime without manual intervention. It supports optional terminal detachment (so your monitor survives after you log out), configurable retry limits, and built-in logging via Loguru.
Installation
Prerequisites
- Python 3.10+
- Slurm workload manager (sbatch, sacct, scontrol in your PATH)
Install from PyPI:
pip install slurm-longrun
Quickstart
Instead of calling sbatch directly, use the sbatch_longrun wrapper:
sbatch_longrun [OPTIONS] [SBATCH_ARGS…]
Example: your job needs more than 30 minutes in total, so you request a 30 min walltime and let Longrun resubmit it on each timeout:
sbatch_longrun --max-restarts 999 --time=00:30:00 --job-name=my_job my_script.sbatch
#sbatch_longrun <thiswrapperargs> <=========sbatch args===========> <===script.sh==>
This will:
- Submit my_script.sbatch with a 30 min limit.
- When it hits the 30 min walltime (TIMEOUT), automatically resubmit (opens log file in append mode).
- Resubmit up to 999 times or until the job completes successfully.
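The resubmission behaviour listed above can be sketched in a few lines of Python. This is a minimal illustration, not the package's actual implementation: the names `parse_job_id`, `run_with_restarts`, `submit_with_sbatch`, and `poll_sacct` are hypothetical, the sketch polls only sacct, and it assumes sbatch prints its usual "Submitted batch job <id>" line.

```python
import re
import subprocess
import time
from typing import Callable, Sequence

# Slurm job states after which a job will not run any further.
TERMINAL_STATES = {
    "COMPLETED", "FAILED", "CANCELLED", "TIMEOUT", "NODE_FAIL",
    "OUT_OF_MEMORY", "PREEMPTED", "BOOT_FAIL", "DEADLINE",
}


def parse_job_id(sbatch_stdout: str) -> str:
    """Extract the job ID from sbatch's 'Submitted batch job <id>' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError(f"no job ID found in: {sbatch_stdout!r}")
    return match.group(1)


def submit_with_sbatch(args: list[str]) -> str:
    """Run sbatch and return its stdout (requires sbatch in PATH)."""
    result = subprocess.run(
        ["sbatch", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def poll_sacct(job_id: str, interval: float = 30.0) -> str:
    """Poll sacct until the job's allocation reaches a terminal state."""
    while True:
        out = subprocess.run(
            ["sacct", "-j", job_id, "-X", "-n", "-P", "-o", "State"],
            capture_output=True, text=True, check=True,
        ).stdout
        words = out.split()  # "CANCELLED by 1234" -> "CANCELLED"
        state = words[0] if words else ""  # empty until the job is recorded
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)


def run_with_restarts(
    sbatch_args: Sequence[str],
    submit: Callable[[list[str]], str] = submit_with_sbatch,
    wait_for_final_state: Callable[[str], str] = poll_sacct,
    max_restarts: int = 99,
) -> tuple[str, int]:
    """Submit, then resubmit on TIMEOUT; return (final_state, restarts_used)."""
    args = list(sbatch_args)
    restarts = 0
    while True:
        job_id = parse_job_id(submit(args))
        state = wait_for_final_state(job_id)
        if state != "TIMEOUT" or restarts >= max_restarts:
            return state, restarts
        restarts += 1
        # Later submissions append to the existing log instead of truncating it.
        if "--open-mode=append" not in args:
            args.insert(0, "--open-mode=append")
```

Passing `submit` and `wait_for_final_state` as parameters keeps the loop testable without a cluster; on a real system the defaults shell out to sbatch and sacct.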
Command-Line Interface
Usage
sbatch_longrun [OPTIONS] [SBATCH_ARGS…]
Options
--use-verbosity [DEFAULT|VERBOSE|SILENT]
Logging level (DEFAULT = INFO, VERBOSE = DEBUG, SILENT = WARNING).
--detached / --no-detached
Run the monitor loop in background (detached from your terminal).
--max-restarts INTEGER
Maximum number of resubmissions on TIMEOUT. Default: 99.
-h, --help
Show help and exit.
All other flags are forwarded to sbatch; they must be provided after the wrapper flags.
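To illustrate the ordering rule, here is a small sketch of how a command line could be split into wrapper options and forwarded sbatch arguments. It is an assumption about the behaviour described above, not the package's real parser (which may rely on click); `split_wrapper_args` and the two flag sets are hypothetical names, and -h/--help is left out.

```python
# Wrapper options documented above; everything else belongs to sbatch.
VALUE_FLAGS = {"--use-verbosity", "--max-restarts"}  # take one value
SWITCHES = {"--detached", "--no-detached"}           # take no value


def split_wrapper_args(argv: list[str]) -> tuple[list[str], list[str]]:
    """Return (wrapper_args, sbatch_args).

    Scanning stops at the first token that is not a wrapper option, so a
    wrapper flag placed after an sbatch flag is forwarded to sbatch as-is.
    """
    wrapper: list[str] = []
    i = 0
    while i < len(argv):
        token = argv[i]
        name = token.split("=", 1)[0]
        if token in SWITCHES:
            step = 1
        elif name in VALUE_FLAGS:
            # Accept both "--flag value" and "--flag=value".
            step = 1 if "=" in token else 2
        else:
            break
        wrapper.extend(argv[i:i + step])
        i += step
    return wrapper, list(argv[i:])
```

For example, `split_wrapper_args(["--max-restarts", "3", "--time=02:00:00", "train.sbatch"])` yields `["--max-restarts", "3"]` for the wrapper and `["--time=02:00:00", "train.sbatch"]` for sbatch.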
Examples
- Basic, retry up to 3 times, verbose logging:
sbatch_longrun --use-verbosity VERBOSE --max-restarts 3 \
  --time=02:00:00 --job-name=deep_train train.sbatch
--use-verbosity VERBOSE --max-restarts 3 are passed to the monitor process. --time=02:00:00 --job-name=deep_train are passed to sbatch.
- Detach the monitor so it survives logout:
sbatch_longrun --detached \
  --time=01:00:00 --job-name=data_proc data_pipeline.sbatch
# → prints “Monitor running in background PID: ”
How It Works
- Submit
Calls sbatch with your arguments; parses the returned job ID.
- Monitor
  - Polls sacct + scontrol until the job reaches a terminal state.
  - If TIMEOUT and you haven’t exceeded --max-restarts, it immediately resubmits with --open-mode=append to preserve logs.
- Detach (optional)
If --detached is passed, the process forks twice, detaches from the terminal (setsid), redirects stdio to /dev/null, and continues monitoring in background.
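The detach step follows the classic Unix double-fork pattern. The sketch below shows that pattern in isolation using only the standard library; `detach_and_run` is a hypothetical name, and the package's own code may differ in details such as how it reports the background PID.

```python
import os
from typing import Callable


def detach_and_run(work: Callable[[], None]) -> None:
    """Run `work` in a detached grandchild; return immediately in the caller."""
    pid = os.fork()
    if pid > 0:
        # Original process: reap the short-lived first child and carry on.
        os.waitpid(pid, 0)
        return

    # First child: start a new session, dropping the controlling terminal.
    os.setsid()
    if os.fork() > 0:
        os._exit(0)  # first child exits; the grandchild is orphaned

    # Grandchild: not a session leader, so it cannot reacquire a terminal.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):  # stdin, stdout, stderr
        os.dup2(devnull, fd)
    if devnull > 2:
        os.close(devnull)
    try:
        work()
    finally:
        # Exit without running the parent's cleanup handlers a second time.
        os._exit(0)
```

A caller would pass the monitoring loop as `work`, for instance `detach_and_run(lambda: monitor(job_id))`, where `monitor` stands for whatever function polls the job. Because stdio points at /dev/null afterwards, the detached process has to log to a file if its output matters.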
Environment Variables
SLURM_LONGRUN_INITIAL_JOB_ID
- Set internally to the first submission’s job ID.
- You can read it in your job script (e.g., to name checkpoints).
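As an example of the checkpoint-naming use case, a Python training script launched by the job could derive a stable directory from this variable. This is a sketch under assumptions: `checkpoint_dir` and the `checkpoints/run_<id>` layout are made up for illustration, and the fallback to Slurm's standard SLURM_JOB_ID is a convenience for runs started with plain sbatch.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def checkpoint_dir(
    base: str = "checkpoints", env: Optional[Mapping[str, str]] = None
) -> Path:
    """Return a directory name that stays the same across resubmissions."""
    env = os.environ if env is None else env
    run_id = (
        env.get("SLURM_LONGRUN_INITIAL_JOB_ID")  # same for every restart
        or env.get("SLURM_JOB_ID")               # plain sbatch, no wrapper
        or "local"                               # running outside Slurm
    )
    return Path(base) / f"run_{run_id}"
```

Every resubmitted job then resolves to the same directory, so it can look for the latest checkpoint there before starting from scratch.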
Dependencies
- click
- loguru
These are installed automatically via pip.
Summary of CLI Options
| Option | Default | Description |
|---|---|---|
| --use-verbosity | DEFAULT | Logging verbosity: DEFAULT (INFO), VERBOSE, SILENT (WARNING) |
| --detached / --no-detached | --no-detached | Detach monitoring loop into background process |
| --max-restarts | 99 | Max auto-resubmissions on TIMEOUT |
| [SBATCH_ARGS…] | / | All subsequent flags passed directly to sbatch |