Tools to supervise slurm jobs

Introduction

A tool that supervises the status of Slurm jobs and reports it through Feishu bot messages.

The currently supported job states are 'fail', 'complete', 'unavailable', 'cancel', 'running', and 'pending'. If a job is in none of these states, its reported status is 'unknown'.

Once you start a supervisor, it keeps running until every one of its jobs is in one of the states ['fail', 'complete', 'unavailable', 'cancel', 'unknown']. It then stops and reports the status of all its jobs to you through your Feishu bot.
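The stop rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the package's actual code: the names TERMINAL_STATES, ACTIVE_STATES, normalize_state, and supervisor_should_stop are hypothetical.

```python
# Hypothetical sketch of the stop rule; names are not taken from the package.
TERMINAL_STATES = {"fail", "complete", "unavailable", "cancel", "unknown"}
ACTIVE_STATES = {"running", "pending"}


def normalize_state(state):
    """Map any state outside the supported set to 'unknown'."""
    if state in TERMINAL_STATES or state in ACTIVE_STATES:
        return state
    return "unknown"


def supervisor_should_stop(job_states):
    """Return True when every job is in a terminal state.

    job_states maps a job id to its state string.
    """
    return all(normalize_state(s) in TERMINAL_STATES for s in job_states.values())


if __name__ == "__main__":
    print(supervisor_should_stop({101: "complete", 102: "running"}))
    print(supervisor_should_stop({101: "complete", 102: "cancel"}))
```

A job in 'running' or 'pending' keeps the supervisor alive; any unrecognized state is treated as 'unknown' and therefore counts as stopped.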

Setup Guide

  1. Get a webhook URL from your Feishu bot.

  2. Export the webhook as 'JOBS_SUPERVISOR_WEBHOOK' in your ~/.bashrc. For example: export JOBS_SUPERVISOR_WEBHOOK=https://open.feishu.cn/your-webhook

  3. Export the directory that stores your sbatch output as 'JOBS_SUPERVISOR_LOGDIR' in your ~/.bashrc. For example: export JOBS_SUPERVISOR_LOGDIR=path_to_log_files

  4. Run source ~/.bashrc

  5. (Optional) Create an environment for the tool with conda create -n env_tools python=3.8

  6. Activate the environment with conda activate env_tools

  7. Install this Python package in the environment with pip3 install -i https://pypi.org/simple/ jobs-supervisor==0.3.7
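Before starting a supervisor, it can help to confirm that both variables from the setup guide are visible to Python. The sketch below is a hypothetical check, not part of the package: load_config, build_text_message, and send_message are illustrative names, and the JSON body assumes the plain text-message shape accepted by Feishu custom bot webhooks.

```python
import json
import os
import urllib.request

REQUIRED_VARS = ("JOBS_SUPERVISOR_WEBHOOK", "JOBS_SUPERVISOR_LOGDIR")


def load_config(env=None):
    """Read the two variables from the setup guide; raise if either is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "webhook": env["JOBS_SUPERVISOR_WEBHOOK"],
        "logdir": env["JOBS_SUPERVISOR_LOGDIR"],
    }


def build_text_message(text):
    """Build a JSON body for a text message (assumed Feishu custom bot format)."""
    return json.dumps({"msg_type": "text", "content": {"text": text}})


def send_message(webhook, text):
    """POST the message to the webhook. Not called here; needs network access."""
    request = urllib.request.Request(
        webhook,
        data=build_text_message(text).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling load_config() in a fresh shell is a quick way to verify that source ~/.bashrc took effect.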

Usage

Assume 'jobid_start' and 'jobid_end' are integer job IDs.

  1. Activate your environment with conda activate env_tools

  2. In terminal: supervise_jobs jobid_start jobid_end

  3. Run supervise_jobs --help for more options. By default:

  • it does not report 'unavailable' jobs (jobs whose status cannot be retrieved). Set the option --show_unavailable to report them.

  • it reports only once, after all jobs have stopped. To get a report every time a job stops, set the option --show_intermedia
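The default reporting behavior can be illustrated with a small Python sketch. It is a guess at the logic, not the package's implementation: job_ids and jobs_to_report are hypothetical helpers, and treating jobid_start and jobid_end as an inclusive range is an assumption.

```python
def job_ids(jobid_start, jobid_end):
    """Expand the two command-line arguments into a list of job IDs.

    Assumption: both ends are included in the supervised range.
    """
    return list(range(jobid_start, jobid_end + 1))


def jobs_to_report(job_states, show_unavailable=False):
    """Drop 'unavailable' jobs unless show_unavailable is set.

    job_states maps a job id to its state string.
    """
    return {
        job: state
        for job, state in job_states.items()
        if show_unavailable or state != "unavailable"
    }


if __name__ == "__main__":
    states = {7: "complete", 8: "unavailable", 9: "fail"}
    print(job_ids(7, 9))
    print(jobs_to_report(states))
    print(jobs_to_report(states, show_unavailable=True))
```

The show_unavailable parameter mirrors the --show_unavailable option: without it, job 8 in the example is left out of the report.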

Download files

Source Distribution

jobs_supervisor-0.3.7.tar.gz (5.4 kB)

Uploaded Source

Built Distribution

jobs_supervisor-0.3.7-py3-none-any.whl (6.9 kB)

Uploaded Python 3

File details

Details for the file jobs_supervisor-0.3.7.tar.gz.

File metadata

  • Download URL: jobs_supervisor-0.3.7.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.3

File hashes

Hashes for jobs_supervisor-0.3.7.tar.gz

  • SHA256: ed6002c0fcad55a2a694ef0c7cc5716b60b01c7f0185a6d3622939d86bb916b0
  • MD5: cdfbd7d528aab850f91203dc15d402d5
  • BLAKE2b-256: de3f2494776b2830be8e8cd033198d31aef9abf3826e5e54a8c1ab35ccb19523

File details

Details for the file jobs_supervisor-0.3.7-py3-none-any.whl.

File hashes

Hashes for jobs_supervisor-0.3.7-py3-none-any.whl

  • SHA256: 8c67a7a7f3dc6ed92d8023b6bf212f3799be6bc4308a6f168616bbebe6abfb32
  • MD5: 7127ad5a6dc10015cb85f4d9d6aa3ba0
  • BLAKE2b-256: 98445535feb6c9f63c1da0814601daeab99f0504245e841b90849372d459f870
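To check a downloaded file against the SHA256 digests listed above, something like the following should work. It uses only the standard library; sha256_of_file and matches_published_hash are hypothetical helper names, and the file path in the usage note is wherever you saved the download.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("jobs_supervisor-0.3.7.tar.gz", "ed6002c0fcad55a2a694ef0c7cc5716b60b01c7f0185a6d3622939d86bb916b0") should return True for an unmodified copy of the source distribution.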
