View a SLURM cluster and inspect nodes and jobs.
Slurm Viewer
Introduction
View the status of a SLURM cluster, including its nodes and job queue. The application can run on the cluster itself or on any computer that can ssh into the cluster. Running it over an ssh connection, especially through a jump host, can be slow.
Features:
- Overview of all nodes or just nodes in a set of partitions.
- Limit the view to nodes with GPUs, or to nodes with GPUs available.
- Show the running jobs on a selection of partitions and the jobs waiting to be scheduled.
- Show the GPU memory used over the last 4 weeks.
Screenshots: nodes in the selected partitions; queue of running and pending jobs; GPU utilization and memory usage.
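The partition and GPU filters in the feature list can be pictured with the small sketch below. It is not slurm-viewer's code: the `Node` record and `select_nodes` helper are hypothetical, the field names are borrowed from the `node_columns` setting, and it assumes that available GPUs are simply the total minus the allocated ones.

```python
from dataclasses import dataclass


# Hypothetical node record; field names borrowed from the node_columns setting,
# not slurm-viewer's internal data model.
@dataclass
class Node:
    node_name: str
    partitions: list[str]
    gpu_tot: int
    gpu_alloc: int

    @property
    def gpu_avail(self) -> int:
        # Assumption: available GPUs = total GPUs minus GPUs allocated to jobs.
        return self.gpu_tot - self.gpu_alloc


def select_nodes(nodes, partitions=None, gpu_only=False, available_only=False):
    """Keep nodes in any of the given partitions, optionally only those with (free) GPUs."""
    selected = []
    for node in nodes:
        # Skip nodes that share no partition with the requested set.
        if partitions and not set(node.partitions) & set(partitions):
            continue
        # Skip nodes without any GPUs when only GPU nodes are wanted.
        if gpu_only and node.gpu_tot == 0:
            continue
        # Skip nodes whose GPUs are all allocated when only free GPUs are wanted.
        if available_only and node.gpu_avail == 0:
            continue
        selected.append(node)
    return selected


nodes = [
    Node("node01", ["cpu"], gpu_tot=0, gpu_alloc=0),
    Node("node02", ["gpu"], gpu_tot=4, gpu_alloc=4),
    Node("node03", ["gpu"], gpu_tot=4, gpu_alloc=1),
]
free = select_nodes(nodes, partitions=["gpu"], available_only=True)
print([n.node_name for n in free])
```

With these three example nodes, only `node03` is kept, because `node02` has all of its GPUs allocated.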
Installation
pip install slurm-viewer
Usage
Run slurm-viewer-init to create a default settings file at ~/.config/slurm-viewer/settings.toml. Edit this file to reflect your setup, then run slurm-viewer to start the UI.
Settings
The settings file consists of several sections. You can configure multiple SLURM clusters by adding one [[clusters]] section per cluster.
[ui]
node_columns = ["node_name", "state", "gpu_tot", "gpu_alloc", "gpu_avail", "gpu_type", "gpu_mem", "cpu_tot", "cpu_alloc", "cpu_avail", "mem_tot", "mem_alloc", "mem_avail", "cpu_gpu", "mem_gpu", "cpuload", "partitions", "active_features"]
queue_columns = ["user", "job_id", "reason", "exec_host", "start_delay", "run_time", "time_limit", "command"]
priority_columns = ["user_name", "job_id", "job_priority_n", "age_n", "fair_share_n", "partition_name"]
[[clusters]]
name = "cluster_1"
partitions = ["cpu", "gpu"]
node_name_ignore_prefix = ["node"]
server = "cluster_1_logon_node"
[[clusters]]
name = "cluster_2"
partitions = ["cpu-short", "cpu-medium", "cpu-long", "gpu-short", "gpu-medium", "gpu-long"]
server = "cluster_2.logon.node"
If you need to connect through a jump host (gateway), set up the connections in ~/.ssh/config and use the Host name as the server.
Example of an ssh config:
Host gateway_1
User my_user_name
HostName gateway.somewhere
Host cluster_1
User my_user_name
HostName logonnode.somewhere
ProxyCommand ssh -W %h:%p gateway_1
Hashes for slurm_viewer-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 5b87e098761561fa519e6c2f43f475f52dbb84af798f73411f93ae1e7f838b04
MD5 | 068385a4e2895bd07bbabfe2b57ad77b
BLAKE2b-256 | 2e8d3329b6c47685fe343f881fbd594e7ec93558c991026213d9812163e79d7d