Tools to supervise slurm jobs

Introduction

A tool that monitors the status of Slurm jobs and reports it through Feishu bot messages.

The currently supported job states are: 'fail', 'complete', 'unavailable', 'cancel', 'running', 'pending'. If a job is in none of these states, its reported status is 'unknown'.

Once you start a supervisor, it keeps running until every one of its jobs is in one of the states ['fail', 'complete', 'unavailable', 'cancel', 'unknown']. It then stops and reports the status of all its jobs to you through your Feishu bot.
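The stopping rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the package's actual source; the names KNOWN_STATES, ACTIVE_STATES, normalize_state and supervisor_should_stop are hypothetical.

```python
# Illustrative sketch only: these names are hypothetical and are not
# taken from the jobs-supervisor source code.
KNOWN_STATES = {"fail", "complete", "unavailable", "cancel", "running", "pending"}
ACTIVE_STATES = {"running", "pending"}


def normalize_state(state):
    """Map any state outside the supported set to 'unknown'."""
    return state if state in KNOWN_STATES else "unknown"


def supervisor_should_stop(states):
    """Return True when no job is still running or pending."""
    return all(normalize_state(s) not in ACTIVE_STATES for s in states)
```

Under this reading, a supervisor watching jobs in the states 'complete' and 'running' keeps going, while one whose jobs are all 'fail', 'cancel' or 'unknown' stops and sends its report.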

Setup Guide

  1. Get a webhook URL from your Feishu bot.

  2. Export the webhook as 'JOBS_SUPERVISOR_WEBHOOK' in your ~/.bashrc. For example: export JOBS_SUPERVISOR_WEBHOOK=https://open.feishu.cn/your-webhook

  3. Export the directory where your sbatch output is saved as 'JOBS_SUPERVISOR_LOGDIR' in your ~/.bashrc. For example: export JOBS_SUPERVISOR_LOGDIR=path_to_log_files

  4. Run source ~/.bashrc

  5. (Optional) Create an environment for the tool with conda create -n env_tools python=3.8

  6. If you created the environment, activate it with conda activate env_tools

  7. Install this Python package in the environment with pip3 install -i https://pypi.org/simple/ jobs-supervisor==0.3.5
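After finishing the steps above, it can be worth confirming that both variables are visible in the current shell. The following is a minimal check, assuming only the two variable names from the setup steps; the helper missing_settings is hypothetical and not part of the package.

```python
import os

# Variable names come from the setup steps; the helper itself is a
# hypothetical convenience and not part of jobs-supervisor.
REQUIRED_VARS = ("JOBS_SUPERVISOR_WEBHOOK", "JOBS_SUPERVISOR_LOGDIR")


def missing_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Both variables are set.")
```

If either name is listed as missing, re-check the export lines in your shell startup file and source it again.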

Usage

Assume 'jobid_start' and 'jobid_end' are integer job IDs.

  1. Activate your environment with conda activate env_tools

  2. In a terminal, run: supervise_jobs jobid_start jobid_end

  3. Run supervise_jobs --help to see more options. By default,

  • it does not report 'unavailable' jobs (jobs whose status cannot be retrieved). Set the option --show_unavailable to report them.

  • it reports only once, after all jobs have stopped. To get a report every time a job stops, set the option --show_intermedia
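The default handling of 'unavailable' jobs can be pictured as a simple filter. The sketch below only illustrates the behaviour described above and assumes the report is built from a mapping of job ID to state; the function jobs_to_report and its show_unavailable parameter are hypothetical stand-ins, not the package's API.

```python
def jobs_to_report(job_states, show_unavailable=False):
    """Select which jobs appear in a report.

    job_states maps a job ID to its state string. 'unavailable' jobs are
    dropped unless show_unavailable is True, mirroring the default
    described for the --show_unavailable option.
    (Hypothetical helper, not the package's real implementation.)
    """
    return {
        job_id: state
        for job_id, state in job_states.items()
        if show_unavailable or state != "unavailable"
    }
```

For example, with jobs {101: 'complete', 102: 'unavailable'}, the default call keeps only job 101, while passing show_unavailable=True keeps both.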
