Interactive GPU monitoring TUI for SLURM clusters
See your GPUs. Know who's on them. Grab one in seconds.
The Problem
You SSH into the lab server. You need a GPU. You run nvidia-smi. All 4 GPUs are taken. But by who? Are they actually using them? When will one free up? You run squeue. You cross-reference job IDs. You check tmux sessions. You piece it together manually.
evgltop does all of this in one screen, updated every second.
Install
pip install evgltop
Then just run:
evgltop
That's it. No config, no setup, no root access.
- See all 4 GPUs at a glance with live utilization bars
- See exactly who is running what process on each GPU
- Press n to launch a new SLURM session with auto-optimized QOS
- Click Cancel to kill a job (and optionally its tmux session)
- Watch the pending queue with estimated wait times
Screenshots
GPU Dashboard + System Resources
Every GPU card shows: utilization, VRAM, temperature, power, the user, their SLURM job, runtime, and remaining time. Below: CPU, RAM, disk usage, and your tmux sessions.
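For readers curious where per-card numbers like these can come from, here is a minimal sketch that parses the CSV output of `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`. The query fields are standard `nvidia-smi` ones, but `GpuStats`, `parse_gpu_stats`, and the choice of fields are illustrative assumptions, not evgltop's actual internals.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

# Standard nvidia-smi query fields; which ones evgltop really uses is an assumption.
QUERY = "index,utilization.gpu,memory.used,memory.total,temperature.gpu,power.draw"
COMMAND = ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"]


@dataclass
class GpuStats:  # hypothetical container, named for this sketch only
    index: int
    util_pct: Optional[float]
    mem_used_mib: Optional[float]
    mem_total_mib: Optional[float]
    temp_c: Optional[float]
    power_w: Optional[float]


def _to_float(field: str) -> Optional[float]:
    """Return the field as a float, or None for non-numeric values such as [N/A]."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_gpu_stats(csv_text: str) -> list[GpuStats]:
    """Parse one line per GPU, with fields in the same order as QUERY."""
    stats = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        index, *values = (field.strip() for field in row)
        stats.append(GpuStats(int(index), *(_to_float(v) for v in values)))
    return stats
```

On a machine with NVIDIA drivers, `subprocess.run(COMMAND, capture_output=True, text=True).stdout` yields text that can be passed to `parse_gpu_stats`.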
One-Click Session Launch
Press n. Pick GPUs, partition, memory. evgltop auto-selects the best QOS so your job starts as fast as possible.
Pending Queue with Wait Time Estimates
All GPUs taken? evgltop shows your pending jobs with estimated start times. Even when SLURM says N/A, evgltop calculates it from running jobs' remaining time — and chains estimates across multiple pending jobs.
Built-In Help
Press h for a quick reference of everything.
Feature Highlights
GPU Waste Detection
If someone allocated a GPU but isn't using it (0% utilization for >10 min), the card turns red with IDLE status. Hard to ignore.
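The idle rule can be sketched as a small tracker that remembers when each allocated GPU was first seen at 0% utilization. The 10-minute threshold comes from the description above; the class name, method signature, and sampling approach are assumptions made for illustration.

```python
from dataclasses import dataclass, field

IDLE_THRESHOLD_S = 10 * 60  # ">10 min", as described above


@dataclass
class IdleTracker:  # hypothetical helper, not evgltop's actual class
    # GPU index -> timestamp (seconds) when 0% utilization was first observed
    idle_since: dict[int, float] = field(default_factory=dict)

    def update(self, gpu_index: int, util_pct: float, allocated: bool, now: float) -> bool:
        """Record one sample and return True if the GPU should be shown as IDLE."""
        if not allocated or util_pct > 0:
            # Any activity, or no allocation at all, resets the idle clock.
            self.idle_since.pop(gpu_index, None)
            return False
        first_seen = self.idle_since.setdefault(gpu_index, now)
        return now - first_seen > IDLE_THRESHOLD_S
```

Calling `update` once per refresh with `time.monotonic()` as `now` is enough: a single non-zero sample clears the flag, so short bursts of work are not reported as waste.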
Smart QOS
SLURM QOS tiers have different GPU limits and priorities. evgltop checks what you're already using and picks the tier that maximizes your throughput:
Request 1 GPU:
normal slot free? -> normal (priority 40000, fastest)
normal full? -> gpu02 (priority 20000)
Request 2 GPUs: -> gpu02
Request 3-4 GPUs: -> gpu04 (priority 10000)
No more QOSMaxGRESPerUser surprises.
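The decision table above can be written as a short function. This is a sketch of the table only: the QOS names are taken from it, while `choose_qos` and the `normal_slot_free` flag (how evgltop decides whether your normal slot is free is not described here) are assumptions.

```python
def choose_qos(requested_gpus: int, normal_slot_free: bool) -> str:
    """Pick a QOS name following the table above (1 to 4 GPUs)."""
    if not 1 <= requested_gpus <= 4:
        raise ValueError("this sketch only covers requests of 1 to 4 GPUs")
    if requested_gpus == 1:
        # Prefer the highest-priority tier while its slot is still free.
        return "normal" if normal_slot_free else "gpu02"
    if requested_gpus == 2:
        return "gpu02"
    return "gpu04"  # 3 or 4 GPUs
```

For example, `choose_qos(1, normal_slot_free=False)` falls back to `gpu02`.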
Pending Queue with Time Estimates
SLURM says N/A for your pending job's start time? evgltop calculates it by looking at when running jobs will end, then chains the estimates if you have multiple pending jobs.
Job 64 gres:gpu:4 ~6d 23h (03/28 00:12) QOS limit
Job 66 gres:gpu:4 ~7d 23h (03/29 00:12) QOS limit <- accounts for Job 64's runtime
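One way to chain estimates like these is a simple first-come, first-served simulation over per-GPU free times. The sketch below assumes that every job runs to its full time limit and ignores priorities, backfill, and QOS limits, so it is an illustration of the idea rather than evgltop's actual algorithm. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RunningJob:
    gpus: int
    end_time: datetime  # start time plus time limit


@dataclass
class PendingJob:
    job_id: int
    gpus: int
    time_limit: timedelta


def estimate_starts(
    now: datetime,
    total_gpus: int,
    running: list[RunningJob],
    pending: list[PendingJob],
) -> dict[int, datetime]:
    """Estimate start times for pending jobs, processed in queue order."""
    # One entry per GPU: the time at which that GPU becomes free.
    free_at = [job.end_time for job in running for _ in range(job.gpus)]
    if len(free_at) > total_gpus:
        raise ValueError("running jobs use more GPUs than exist")
    free_at += [now] * (total_gpus - len(free_at))

    estimates: dict[int, datetime] = {}
    for job in pending:
        if job.gpus > total_gpus:
            raise ValueError(f"job {job.job_id} can never fit")
        free_at.sort()
        # The job can start once its k-th earliest GPU is free.
        start = max(now, free_at[job.gpus - 1])
        estimates[job.job_id] = start
        # Chaining: those GPUs stay busy until this job's time limit expires.
        free_at[: job.gpus] = [start + job.time_limit] * job.gpus
    return estimates
```

With all 4 GPUs busy for 7 more days and two pending 4-GPU jobs that each have a 1-day limit, the first is estimated at 7 days from now and the second at 8 days. That is the same shape as the Job 64 and Job 66 lines above.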
Tmux Session Tracking
Sessions are auto-named gpu{N}-{JOBID}. The Sessions panel shows which are running, waiting, or ended:
● gpu1-53 RUN GPU 0 03/21 00:12 tmux attach -t gpu1-53
◌ gpu4-67 WAIT 03/21 00:36 tmux attach -t gpu4-67
○ gpu2-70 END 03/21 01:02 tmux attach -t gpu2-70
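Because the names follow a fixed pattern, they are easy to recognize among other tmux sessions. The sketch below parses names of the form gpu{N}-{JOBID}. The helper names are hypothetical, and the meaning of N is not spelled out above, so it is kept as a plain integer.

```python
import re
from typing import NamedTuple, Optional

# Matches names such as "gpu1-53"; anything else is treated as an unrelated session.
SESSION_RE = re.compile(r"gpu(?P<n>\d+)-(?P<job_id>\d+)")


class SessionName(NamedTuple):  # hypothetical type for this sketch
    n: int
    job_id: int


def parse_session_name(name: str) -> Optional[SessionName]:
    """Return the parsed parts, or None if the name does not follow the pattern."""
    match = SESSION_RE.fullmatch(name)
    if match is None:
        return None
    return SessionName(int(match["n"]), int(match["job_id"]))
```

Session names can be listed with `tmux list-sessions -F "#{session_name}"`, and the job ID from each parsed name can then be looked up in `squeue` output.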
Keybindings
| Key | Action |
|---|---|
| n | New SLURM session |
| h | Help |
| r | Force refresh |
| q | Quit |
| Esc | Close dialog |
| Click Cancel | Cancel a running/pending job |
Requirements
- Python 3.10+
- NVIDIA GPUs with nvidia-smi
- SLURM (squeue, scontrol, scancel)
- tmux
Emory Vision & Graphics Lab