Watches a SLURM job and aborts it if it stops outputting to a logfile

Project description

Usage

Include slurm-watchdog in your SLURM batch script to monitor the job's progress and kill the job if it gets stuck, that is, if it stops writing to the log file.

#!/bin/bash
#SBATCH -J watchdog_example
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --output stdout.txt
#SBATCH --time 1-00:00:00

source .venv/bin/activate
slurm-watchdog stdout.txt &
srun python src/slurm_watchdog/dummyjob.py &
wait $! # wait for srun to finish, but do not wait for the watchdog to finish
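
To make the idea concrete, here is a minimal Python sketch of how a log-file watchdog of this kind could work. It is not the slurm-watchdog source: the function names, the 30-minute timeout, and the 60-second polling interval are all assumptions chosen for illustration. The sketch treats the log file's modification time as the sign of progress and cancels the job with SLURM's `scancel`, using the `SLURM_JOB_ID` environment variable that SLURM sets inside a running job.

```python
import os
import subprocess
import time


def seconds_since_last_write(logfile, now=None):
    """Return how many seconds ago the logfile was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(logfile)


def is_stalled(logfile, timeout_s, now=None):
    """True if the logfile has been unchanged for more than timeout_s seconds."""
    return seconds_since_last_write(logfile, now) > timeout_s


def cancel_current_job():
    """Cancel the job this process runs in (SLURM sets SLURM_JOB_ID in a job)."""
    job_id = os.environ["SLURM_JOB_ID"]
    subprocess.run(["scancel", job_id], check=True)


def watch(logfile, timeout_s=1800, poll_s=60,
          on_stall=cancel_current_job, sleep=time.sleep):
    """Poll the logfile until it stalls, then call on_stall once and return.

    Assumes the logfile already exists; the timeout and polling interval
    are hypothetical defaults, not values taken from slurm-watchdog.
    """
    while not is_stalled(logfile, timeout_s):
        sleep(poll_s)
    on_stall()
```

Passing `on_stall` and `sleep` as parameters keeps the loop testable without a SLURM cluster; in a real job the defaults would apply and the call would simply be `watch("stdout.txt")`.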

Download files

Source Distribution

slurm_watchdog-0.1.0.tar.gz (2.2 kB)

Uploaded Source

Built Distribution

slurm_watchdog-0.1.0-py3-none-any.whl (2.9 kB)

Uploaded Python 3

File details

Details for the file slurm_watchdog-0.1.0.tar.gz.

File metadata

  • Download URL: slurm_watchdog-0.1.0.tar.gz
  • Upload date:
  • Size: 2.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.4.18

File hashes

Hashes for slurm_watchdog-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       ea2060be650374f95c4392d8ef0f75f1c2312f61d1ee6e6f91d74e3450061ddb
MD5          62fda20f709e75d0e91eed04fabb2202
BLAKE2b-256  e280a81680473fa244863e72439710e18cf301469ebacf9431a04d79b40d8a60

File details

Details for the file slurm_watchdog-0.1.0-py3-none-any.whl.

File hashes

Hashes for slurm_watchdog-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       52fbd6cae1c7e727a556aed1ec39d13d9e9139a8c5e8516348afe141e8e492cf
MD5          68f057b16226762f9a4b5f95eb5cbdca
BLAKE2b-256  9419d000ef4f3da41f17d64ea7b671ae8abcec2e259be99d019ca324f1ffefc2
