A summary of GPU usage on a SLURM cluster
sgpustat
sgpustat is a simple command line utility that produces a summary of GPU usage on a SLURM cluster, following the naming convention of the other SLURM tools (squeue, sinfo, scontrol, ...). The tool can be used in two ways:
- To query the current usage of GPUs on the cluster.
- To launch a daemon which will log usage over time. This log can later be queried to provide usage statistics.
All data comes from exactly two scontrol calls per invocation, so the tool stays fast even on busy clusters. GPU accounting is exact, including on nodes with NVIDIA MIG instances and for jobs submitted with untyped --gres=gpu:N requests.
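As a rough illustration of the two-call design, the sketch below gathers node and job records with one scontrol invocation each. Which two commands sgpustat actually runs is an assumption here (scontrol show node and scontrol show job -dd), and run_scontrol, split_records, and collect are hypothetical names, not the package's API.

```python
import re
import subprocess
from typing import Callable, List, Sequence, Tuple


def run_scontrol(args: Sequence[str]) -> str:
    """Run scontrol with the given arguments and return its stdout."""
    result = subprocess.run(
        ["scontrol", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def split_records(text: str) -> List[str]:
    """Split scontrol output into records, which are separated by blank lines."""
    return [block.strip() for block in re.split(r"\n\s*\n", text) if block.strip()]


def collect(
    run: Callable[[Sequence[str]], str] = run_scontrol,
) -> Tuple[List[str], List[str]]:
    """Gather node and job records using exactly two scontrol calls.

    Assumption: these are the two calls meant; the runner is injectable so the
    function can be exercised without a SLURM installation.
    """
    nodes = split_records(run(["show", "node"]))
    jobs = split_records(run(["show", "job", "-dd"]))
    return nodes, jobs
```

Passing a fake runner that returns recorded text lets the same function run on a machine without SLURM, which is one way to keep the number of scontrol calls easy to check.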
This project began as a fork of albanie/slurm_gpustat; the implementation has since been rewritten.
Installation
Install via pip install sgpustat. The pre-rename slurm_gpustat command is kept as an alias. The parsing/accounting logic lives in core.py, data collection in collect.py, rendering in render.py, the logging daemon in daemon.py, and the CLI entry point in cli.py.
Usage
To print a summary of current activity:
sgpustat
To print a summary of current activity on particular partitions, e.g. debug and normal:
sgpustat -p debug,normal or sgpustat --partition debug,normal
To include a per-node breakdown of available GPUs:
sgpustat --verbose
To output machine-readable CSV:
sgpustat --raw
Output is colorized when stdout is a terminal; --color 0 or the NO_COLOR environment variable disables it, --color 1 forces it (e.g. when piping to less -R).
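The color rules above can be sketched as a small decision function. This is a minimal sketch, not the package's code: use_color is a hypothetical name, and two details are assumptions, namely that an explicit --color value takes precedence over NO_COLOR and that an empty NO_COLOR value is ignored.

```python
import os
import sys
from typing import Mapping, Optional


def use_color(flag: Optional[int], is_tty: bool, env: Mapping[str, str]) -> bool:
    """Decide whether to colorize output.

    flag is the value of --color (1, 0, or None when the option is not given).
    """
    if flag == 1:
        return True  # forced on, e.g. when piping to less -R
    if flag == 0:
        return False  # explicitly disabled
    if env.get("NO_COLOR"):
        return False  # assumption: only a non-empty NO_COLOR disables color
    return is_tty  # default: color only when stdout is a terminal


if __name__ == "__main__":
    print(use_color(None, sys.stdout.isatty(), os.environ))
```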
To start the logging daemon:
sgpustat --action daemon-start
To view a summary of logged data:
sgpustat --action history
Example output
SLURM Cluster GPU Status
========================
GPU Summary
+----------------------------+-------+----------+-------------+
| GPU model | all | online | available |
+============================+=======+==========+=============+
| total | 214 | 193 | 51 |
+----------------------------+-------+----------+-------------+
| nvidia_geforce_rtx_3090 | 68 | 53 | 11 |
+----------------------------+-------+----------+-------------+
| nvidia_geforce_rtx_2080_ti | 54 | 54 | 22 |
+----------------------------+-------+----------+-------------+
| nvidia_a100-sxm4-80gb | 36 | 32 | 0 |
+----------------------------+-------+----------+-------------+
----------------------------------------------------------------------
Usage by User
+---------+------------------------+-------------------------------+
| User | Total GPUs Allocated | Count per GPU Type |
+=========+========================+===============================+
| user01 | 24 | nvidia_geforce_rtx_2080_ti:24 |
+---------+------------------------+-------------------------------+
With --verbose, each GPU type is broken down per node:
nvidia_geforce_rtx_3090: 11 available
-> gpunode14: 2 nvidia_geforce_rtx_3090 [cpu: 56/64, gpu: 6/8, mem: 376G/500G] [user02,user03]
-> gpunode15: 4 nvidia_geforce_rtx_3090 [cpu: 56/64, gpu: 4/8, mem: 180G/500G] [user02]
Notes on accounting
- "all" counts every configured GPU; "online" excludes nodes whose state contains DRAIN/DOWN/MAINT/etc.; "available" is unallocated GPUs on online nodes.
- GPU inventory is read from each node's Gres= field (not CfgTRES, whose typed entries can be incomplete for MIG profiles).
- Per-job allocations come from the per-node GRES=...(IDX:...) detail lines of scontrol show job -dd, falling back to the job's typed AllocTRES and then to TresPerNode.
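The accounting rules in these notes can be sketched as follows. This is an illustration under stated assumptions, not the code in core.py: Node, parse_gres, and summarize are hypothetical names, the Gres strings are assumed to look like gpu:TYPE:COUNT or gpu:COUNT with an optional parenthesised suffix, the offline flag list is partial (the notes say "etc."), and per-node allocations are taken as already extracted from the job records.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Mapping, NamedTuple

# Matches "gpu:TYPE:COUNT" or untyped "gpu:COUNT"; \b avoids matching e.g. "vgpu".
GPU_GRES = re.compile(r"\bgpu(?::(?P<type>[^:,(]+))?:(?P<count>\d+)")

# Partial list of state substrings that take a node offline (assumption).
OFFLINE_FLAGS = ("DRAIN", "DOWN", "MAINT")


class Node(NamedTuple):
    name: str
    gres: str                     # value of the node's Gres= field
    state: str                    # value of the node's State= field
    allocated: Mapping[str, int]  # GPUs in use on this node, keyed by GPU type


def parse_gres(gres: str) -> Counter:
    """Count configured GPUs per type in a Gres= value."""
    counts: Counter = Counter()
    for match in GPU_GRES.finditer(gres):
        counts[match.group("type") or "gpu"] += int(match.group("count"))
    return counts


def summarize(nodes: Iterable[Node]) -> Dict[str, Counter]:
    """Compute the all / online / available counts per GPU type."""
    totals = {"all": Counter(), "online": Counter(), "available": Counter()}
    for node in nodes:
        configured = parse_gres(node.gres)
        totals["all"].update(configured)
        if any(flag in node.state.upper() for flag in OFFLINE_FLAGS):
            continue  # offline nodes count toward "all" only
        totals["online"].update(configured)
        for gpu_type, count in configured.items():
            free = max(count - node.allocated.get(gpu_type, 0), 0)
            totals["available"][gpu_type] += free
    return totals
```

For example, one online node with 8 GPUs of which 6 are allocated, plus one drained node with 8 GPUs, gives all = 16, online = 8, available = 2 for that GPU type.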
Dependencies
- Python >= 3.8
- tabulate
- termcolor >= 2.1
Tests
Run python -m pytest tests/ to execute the test suite. No SLURM installation is required; the tests run against recorded scontrol fixtures.