Watches a SLURM job and aborts it if it stops writing to its log file.
Project description
Usage
Include slurm-watchdog in your SLURM batch script to monitor the job's progress and kill the job if it gets stuck, that is, if it stops writing to the log file.
```bash
#!/bin/bash
#SBATCH -J watchdog_example
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --output stdout.txt
#SBATCH --time 1-00:00:00

source .venv/bin/activate

# Start the watchdog in the background; it monitors the job's log file.
slurm-watchdog stdout.txt &

# Start the actual job in the background so that $! refers to srun.
srun python src/slurm_watchdog/dummyjob.py &

# Wait for srun to finish, but do not wait for the watchdog.
wait $!
```
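To make the idea of a log-file watchdog concrete, here is a minimal sketch of how such a check could work. It is not taken from the slurm-watchdog source: the function names, the `timeout` and `poll_interval` parameters, and their default values are all hypothetical. The sketch assumes the log file already exists, that the script runs inside a SLURM job (so the `SLURM_JOB_ID` environment variable is set), and that `scancel` is on the `PATH`.

```python
import os
import subprocess
import time


def seconds_since_last_write(logfile, now=None):
    """Return how many seconds ago `logfile` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(logfile)


def is_stalled(logfile, timeout, now=None):
    """Return True if `logfile` was not modified within the last `timeout` seconds."""
    return seconds_since_last_write(logfile, now) > timeout


def watch(logfile, timeout=600.0, poll_interval=30.0):
    """Poll `logfile` and cancel the enclosing SLURM job once it stalls.

    The defaults are illustrative assumptions, not slurm-watchdog's settings.
    """
    while not is_stalled(logfile, timeout):
        time.sleep(poll_interval)
    job_id = os.environ["SLURM_JOB_ID"]  # set by SLURM inside a job
    subprocess.run(["scancel", job_id], check=True)
```

Using the file's modification time keeps the check cheap, since the log never has to be read. The trade-off is that a job which legitimately stays silent for longer than `timeout` would be cancelled too, so the threshold has to be chosen with the job's normal output pattern in mind.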
Download files
Download the file for your platform.
Source Distribution
slurm_watchdog-0.1.0.tar.gz (2.2 kB)
Built Distribution
slurm_watchdog-0.1.0-py3-none-any.whl (2.9 kB)
File details
Details for the file slurm_watchdog-0.1.0.tar.gz.
File metadata
- Download URL: slurm_watchdog-0.1.0.tar.gz
- Upload date:
- Size: 2.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.4.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea2060be650374f95c4392d8ef0f75f1c2312f61d1ee6e6f91d74e3450061ddb |
| MD5 | 62fda20f709e75d0e91eed04fabb2202 |
| BLAKE2b-256 | e280a81680473fa244863e72439710e18cf301469ebacf9431a04d79b40d8a60 |
File details
Details for the file slurm_watchdog-0.1.0-py3-none-any.whl.
File metadata
- Download URL: slurm_watchdog-0.1.0-py3-none-any.whl
- Upload date:
- Size: 2.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.4.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 52fbd6cae1c7e727a556aed1ec39d13d9e9139a8c5e8516348afe141e8e492cf |
| MD5 | 68f057b16226762f9a4b5f95eb5cbdca |
| BLAKE2b-256 | 9419d000ef4f3da41f17d64ea7b671ae8abcec2e259be99d019ca324f1ffefc2 |
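The published digests can be used to check a downloaded file before installing it. The following is a small sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_digest` are illustrative and not part of slurm-watchdog.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, assuming the source distribution has been downloaded to the current directory, `matches_published_digest("slurm_watchdog-0.1.0.tar.gz", "ea2060be650374f95c4392d8ef0f75f1c2312f61d1ee6e6f91d74e3450061ddb")` should return `True` for an intact file.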