Tools to supervise Slurm jobs
Introduction
A tool that supervises the status of Slurm jobs and reports it through Feishu bot messages.
The currently supported job states are 'fail', 'complete', 'unavailable', 'cancel', 'running' and 'pending'. A job in any other state is reported as 'unknown'.
Once started, a supervisor stops when every job it watches is in one of the states 'fail', 'complete', 'unavailable', 'cancel' or 'unknown', and then reports the status of all its jobs to you through your Feishu bot.
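The stop rule above can be sketched in a few lines of Python. The table that maps raw Slurm state names (as printed by, for example, `sacct -o State`) to the tool's categories is an assumption made for illustration; this description does not document the package's actual mapping. In this sketch, any Slurm state outside the table, such as TIMEOUT, falls into 'unknown', which matches the behavior described above.

```python
from typing import Dict, Iterable, Optional

# ASSUMPTION: illustrative mapping from raw Slurm state names to the
# categories listed above; not taken from jobs-supervisor itself.
SLURM_TO_CATEGORY: Dict[str, str] = {
    "FAILED": "fail",
    "COMPLETED": "complete",
    "CANCELLED": "cancel",
    "RUNNING": "running",
    "PENDING": "pending",
}

# Categories in which a job counts as stopped, per the description above.
STOPPED = {"fail", "complete", "unavailable", "cancel", "unknown"}


def categorize(raw_state: Optional[str]) -> str:
    """Map a raw Slurm state to a category; None means it could not be queried."""
    if raw_state is None:
        return "unavailable"
    # sacct can print e.g. "CANCELLED by 1234"; keep only the first word.
    words = raw_state.split()
    key = words[0].upper() if words else ""
    # Anything outside the table (e.g. TIMEOUT) is reported as 'unknown'.
    return SLURM_TO_CATEGORY.get(key, "unknown")


def should_stop(categories: Iterable[str]) -> bool:
    """True once every supervised job is in a stopped category."""
    return all(c in STOPPED for c in categories)
```

For example, the states ["COMPLETED", "RUNNING", None] categorize to ['complete', 'running', 'unavailable'], so should_stop returns False until the running job finishes.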
Setup Guide
- Get the webhook URL of your Feishu bot.
- Set the environment variable JOBS_SUPERVISOR_WEBHOOK to that webhook in your ~/.bashrc, for example:
    export JOBS_SUPERVISOR_WEBHOOK=https://open.feishu.cn/your-webhook
- Set the environment variable JOBS_SUPERVISOR_LOGDIR in your ~/.bashrc to the directory that holds your sbatch output files, for example:
    export JOBS_SUPERVISOR_LOGDIR=path_to_log_files
- Reload your shell configuration:
    source ~/.bashrc
- (Optional) Create a conda environment for the tools:
    conda create -n env_tools python=3.8
- Activate the environment:
    conda activate env_tools
- Install the package into the environment:
    pip3 install -i https://pypi.org/simple jobs-supervisor==0.3.4
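To check the setup before launching a supervisor, the Python sketch below reads the two environment variables and can post a test message to the webhook. The request body follows Feishu's format for custom-bot text messages ({"msg_type": "text", "content": {"text": ...}}); whether jobs-supervisor builds its messages the same way is not stated here, and the helper names load_settings, build_text_message and send are hypothetical, not part of the package's API.

```python
import json
import os
import urllib.request
from pathlib import Path
from typing import Mapping, Optional, Tuple

# The two variables set in the setup steps above.
VARS = ("JOBS_SUPERVISOR_WEBHOOK", "JOBS_SUPERVISOR_LOGDIR")


def load_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, Path]:
    """HYPOTHETICAL helper: return (webhook, log directory), failing loudly if unset."""
    env = os.environ if env is None else env
    missing = [name for name in VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "not set: " + ", ".join(missing) + " (did you run `source ~/.bashrc`?)"
        )
    return env["JOBS_SUPERVISOR_WEBHOOK"], Path(env["JOBS_SUPERVISOR_LOGDIR"]).expanduser()


def build_text_message(text: str) -> dict:
    """Request body for a Feishu custom-bot text message."""
    return {"msg_type": "text", "content": {"text": text}}


def send(webhook: str, text: str, timeout: float = 10.0) -> dict:
    """POST a text message to the bot webhook and return the decoded JSON reply."""
    request = urllib.request.Request(
        webhook,
        data=json.dumps(build_text_message(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `webhook, logdir = load_settings()` followed by `send(webhook, "hello from the cluster")` should make the bot post that text if the webhook is set correctly.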
Usage
In the commands below, jobid_start and jobid_end are integer Slurm job IDs.
- Activate your environment:
    conda activate env_tools
- In a terminal, run:
    supervise_jobs jobid_start jobid_end
- Run
    supervise_jobs --help
  to see all options. By default:
  - jobs reported as 'unavailable' (jobs whose status cannot be queried) are left out of the report. Add the option --show_unavailable to report them.
  - a report is sent only once all jobs have stopped. To get a report every time a job stops, add the option --show_intermedia.
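The two default behaviors above can be illustrated with a small Python sketch. Everything in it is hypothetical: job_ids assumes the range includes both jobid_start and jobid_end, which this description does not state, and format_report and newly_stopped only mimic what --show_unavailable and --show_intermedia are described as doing, not how the package implements them.

```python
from typing import Dict, List

# Categories in which a job counts as stopped, per the introduction.
STOPPED = {"fail", "complete", "unavailable", "cancel", "unknown"}


def job_ids(jobid_start: int, jobid_end: int) -> List[int]:
    """HYPOTHETICAL: expand the command-line range, assuming both ends are included."""
    return list(range(jobid_start, jobid_end + 1))


def format_report(statuses: Dict[int, str], show_unavailable: bool = False) -> str:
    """One 'job: state' line per job; 'unavailable' jobs are skipped unless requested."""
    return "\n".join(
        f"{job}: {state}"
        for job, state in sorted(statuses.items())
        if show_unavailable or state != "unavailable"
    )


def newly_stopped(previous: Dict[int, str], current: Dict[int, str]) -> List[int]:
    """Jobs that entered a stopped state since the last poll.

    With --show_intermedia, each such job would trigger a report; by default
    only the moment when every job has stopped does.
    """
    return sorted(
        job for job, state in current.items()
        if state in STOPPED and previous.get(job) not in STOPPED
    )
```

Keeping the filtering in format_report, separate from polling, means the same status snapshot can be rendered with or without unavailable jobs.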
Download files
Source distribution: jobs_supervisor-0.3.4.tar.gz
Built distribution: jobs_supervisor-0.3.4-py3-none-any.whl
File details
Details for the file jobs_supervisor-0.3.4.tar.gz.
File metadata
- Download URL: jobs_supervisor-0.3.4.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6759410aa0c73501916a3378d7f1fc557536879c240a178759d4405824e860de
MD5 | 407f735cbd485b513395e1dd08c27dc1
BLAKE2b-256 | d2096391a92a77357ceacb1dd6ad6aa0b9efda7067fefe7404cdfb60fc0d3dda
File details
Details for the file jobs_supervisor-0.3.4-py3-none-any.whl.
File metadata
- Download URL: jobs_supervisor-0.3.4-py3-none-any.whl
- Upload date:
- Size: 6.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 267946b633bf293e001daca2ef46bc22b91d42a41d4dab126b1b52d5571e6af4
MD5 | 1cb68dd05ec78679ceeafbd7c0700335
BLAKE2b-256 | 865587d73c6eee53e9d7f550a04c4641dd529fc891454d0aabc1a753a8a38d31
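If you download one of these files by hand, you can check it against the published SHA256 digest. The following is a minimal sketch using only the standard library's hashlib; the digests are copied from the tables above, and the helper names sha256_of and verify are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "jobs_supervisor-0.3.4.tar.gz":
        "6759410aa0c73501916a3378d7f1fc557536879c240a178759d4405824e860de",
    "jobs_supervisor-0.3.4-py3-none-any.whl":
        "267946b633bf293e001daca2ef46bc22b91d42a41d4dab126b1b52d5571e6af4",
}


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path) -> bool:
    """True if the file's name is known and its content matches the published digest."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify("jobs_supervisor-0.3.4.tar.gz")` returns True only if the local file matches the published digest. pip can also enforce digests itself in its hash-checking mode (`--require-hashes`).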