Interactive GPU monitoring TUI for SLURM clusters

See your GPUs. Know who's on them. Grab one in seconds.

Python Textual SLURM NVIDIA

evgltop in action


The Problem

You SSH into the lab server. You need a GPU. You run nvidia-smi. All 4 GPUs are taken. But by who? Are they actually using them? When will one free up? You run squeue. You cross-reference job IDs. You check tmux sessions. You piece it together manually.

evgltop does all of this in one screen, updated every second.

Install

pip install evgltop

Then just run:

evgltop

That's it. No config, no setup, no root access.

  • See all 4 GPUs at a glance with live utilization bars
  • See exactly who is running what process on each GPU
  • Press n to launch a new SLURM session with auto-optimized QOS
  • Click Cancel to kill a job (and optionally its tmux session)
  • Watch the pending queue with estimated wait times

Screenshots

GPU Dashboard + System Resources

Every GPU card shows: utilization, VRAM, temperature, power, the user, their SLURM job, runtime, and remaining time. Below: CPU, RAM, disk usage, and your tmux sessions.
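For readers curious where per-card numbers like these can come from, here is a minimal sketch that reads them through nvidia-smi's CSV query mode. It is not evgltop's actual code: the `GpuSample` type, the function names, and the choice of fields are assumptions made for illustration.

```python
import csv
import io
import subprocess
from dataclasses import dataclass

# Fields requested from nvidia-smi. Which fields to ask for is an assumption.
QUERY_FIELDS = "index,utilization.gpu,memory.used,memory.total,temperature.gpu,power.draw"


@dataclass
class GpuSample:  # hypothetical container, not part of evgltop's API
    index: int
    util_pct: float
    mem_used_mib: float
    mem_total_mib: float
    temp_c: float
    power_w: float


def _num(cell: str) -> float:
    """Convert a CSV cell to float; unsupported values such as [N/A] become NaN."""
    try:
        return float(cell)
    except ValueError:
        return float("nan")


def parse_gpu_csv(text: str) -> list[GpuSample]:
    """Parse output produced with --format=csv,noheader,nounits."""
    samples = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) != 6:
            continue  # skip blank or malformed lines
        idx, util, used, total, temp, power = (cell.strip() for cell in row)
        samples.append(
            GpuSample(int(idx), _num(util), _num(used), _num(total), _num(temp), _num(power))
        )
    return samples


def read_gpus() -> list[GpuSample]:
    """Run nvidia-smi once and return one sample per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

The user, SLURM job, and time columns would have to come from other sources (for example squeue), which this sketch does not cover.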

Dashboard

One-Click Session Launch

Press n. Pick GPUs, partition, memory. evgltop auto-selects the best QOS so your job starts as fast as possible.

New session

Pending Queue with Wait Time Estimates

All GPUs taken? evgltop shows your pending jobs with estimated start times. Even when SLURM says N/A, evgltop calculates the start time from the running jobs' remaining time, and chains the estimates across multiple pending jobs.

Pending queue

Built-In Help

Press h for a quick reference of everything.

Help


Feature Highlights

GPU Waste Detection

If someone allocated a GPU but isn't using it (0% utilization for >10 min), the card turns red with IDLE status. Hard to ignore.

Smart QOS

SLURM QOS tiers have different GPU limits and priorities. evgltop checks what you're already using and picks the tier that maximizes your throughput:

Request 1 GPU:
  normal slot free?  -> normal  (priority 40000, fastest)
  normal full?       -> gpu02   (priority 20000)

Request 2 GPUs:      -> gpu02
Request 3-4 GPUs:    -> gpu04   (priority 10000)

No more QOSMaxGRESPerUser surprises.
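The selection rules in the table above can be written as one small function. This is a sketch under stated assumptions: the function name `pick_qos` and the `normal_slot_free` flag are hypothetical, and how evgltop actually decides whether the normal slot is free is not described here.

```python
def pick_qos(gpus_requested: int, normal_slot_free: bool) -> str:
    """Return the QOS tier name for a request, following the table above.

    normal_slot_free is an assumed input: True when the user can still
    submit under the `normal` QOS without hitting its per-user GPU limit.
    """
    if not 1 <= gpus_requested <= 4:
        raise ValueError("this sketch only covers requests for 1 to 4 GPUs")
    if gpus_requested == 1:
        # Highest priority tier if available, otherwise fall back to gpu02.
        return "normal" if normal_slot_free else "gpu02"
    if gpus_requested == 2:
        return "gpu02"
    return "gpu04"  # 3 or 4 GPUs
```

The tier names and the 1 to 4 GPU range come from the table; a cluster with different QOS definitions would need a different mapping.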

Pending Queue with Time Estimates

SLURM says N/A for your pending job's start time? evgltop calculates it by looking at when running jobs will end, then chains the estimates if you have multiple pending jobs.

Job 64  gres:gpu:4  ~6d 23h (03/28 00:12)  QOS limit
Job 66  gres:gpu:4  ~7d 23h (03/29 00:12)  QOS limit   <- accounts for Job 64's runtime

Tmux Session Tracking

Sessions are auto-named gpu{N}-{JOBID}. The Sessions panel shows which are running, waiting, or ended:

● gpu1-53    RUN  GPU 0  03/21 00:12   tmux attach -t gpu1-53
◌ gpu4-67    WAIT        03/21 00:36   tmux attach -t gpu4-67
○ gpu2-70    END         03/21 01:02   tmux attach -t gpu2-70

Keybindings

Key            Action
n              New SLURM session
h              Help
r              Force refresh
q              Quit
Esc            Close dialog
Click Cancel   Cancel a running/pending job

Requirements

  • Python 3.10+
  • NVIDIA GPUs with nvidia-smi
  • SLURM (squeue, scontrol, scancel)
  • tmux

Emory Vision & Graphics Lab
