View a SLURM cluster and inspect nodes and jobs.

Slurm Viewer

Introduction

View the status of a SLURM cluster, including its nodes and job queue. The application can run on the cluster itself or on any computer that can ssh into the cluster. Running it over an ssh connection, especially through a jump host, can be slow.

Features:

  • Overview of all nodes or just nodes in a set of partitions.
  • Limit to nodes with GPUs / available GPUs.
  • Show the running jobs on a selection of partitions and the jobs waiting to be scheduled.
  • Show the GPU memory used over the last 4 weeks.

View the nodes in the selected partitions. (Screenshot: Slurmviewer Nodes)

View the queue of running and pending jobs. (Screenshot: Slurmviewer Queue)

View the GPU utilization and memory usage. (Screenshot: Slurmviewer GPU)

Installation

pip install slurm-viewer

Usage

Run slurm-viewer-init to create a default settings file, stored in ~/.config/slurm-viewer/settings.toml. Edit this file to reflect your setup. Once you have finished, run slurm-viewer to start the UI.

Settings

The config file consists of several sections. You can add multiple SLURM clusters.

[ui]
node_columns = ["node_name", "state", "gpu_tot", "gpu_alloc", "gpu_avail", "gpu_type", "gpu_mem", "cpu_tot", "cpu_alloc", "cpu_avail", "mem_tot", "mem_alloc", "mem_avail", "cpu_gpu", "mem_gpu", "cpuload", "partitions", "active_features"]
queue_columns = ["user", "job_id", "reason", "exec_host", "start_delay", "run_time", "time_limit", "command"]
priority_columns = ["user_name", "job_id", "job_priority_n", "age_n", "fair_share_n", "partition_name"]

[[clusters]]
name = "cluster_1"
partitions = ["cpu", "gpu"]
node_name_ignore_prefix = ["node"]
server = "cluster_1_logon_node"

[[clusters]]
name = "cluster_2"
partitions = ["cpu-short", "cpu-medium", "cpu-long", "gpu-short", "gpu-medium", "gpu-long"]
server = "cluster_2.logon.node"

If you need to connect through a jump host/gateway, set up the connection in ~/.ssh/config and use the Host name defined there as the server.

Example of an ssh config:

Host gateway_1
  User my_user_name
  HostName gateway.somewhere
  
Host cluster_1
  User my_user_name
  HostName logonnode.somewhere
  ProxyCommand ssh -W %h:%p gateway_1
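
Before pointing slurm-viewer at a Host alias, it can help to confirm that the alias works non-interactively. The sketch below is an assumption-laden illustration, not part of slurm-viewer. It assumes the OpenSSH client is on the PATH, that key-based login is set up (BatchMode disables password prompts), and that sinfo is available on the login node. The build_ssh_command and check_host helpers are hypothetical names.

```python
import subprocess


def build_ssh_command(host: str, remote_command: str, timeout_s: int = 10) -> list[str]:
    """Build an ssh invocation that fails fast instead of prompting."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                 # never ask for a password
        "-o", f"ConnectTimeout={timeout_s}",   # give up on unreachable hosts
        host,
        remote_command,
    ]


def check_host(host: str, remote_command: str = "sinfo --version") -> bool:
    """Return True if the remote command exits successfully on the host."""
    cmd = build_ssh_command(host, remote_command)
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        # ssh is missing or the connection hung (for example at the gateway).
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # "cluster_1" is the Host alias from the ssh config example above.
    print("reachable" if check_host("cluster_1") else "not reachable")
```

If the check fails, running the same ssh command by hand with -v usually shows whether the problem is at the gateway or at the login node.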
