A simple SLURM gpu summary tool
Project description
slurm_gpustat
slurm_gpustat
is a simple command line utility that produces a summary of GPU usage on a slurm cluster. The tool can be used in two ways:
- To query the current usage of GPUs on the cluster.
- To launch a daemon which will log usage over time. This log can later be queried to provide usage statistics.
Installation
Install via pip install slurm_gpustat
. If you prefer to hack around with the source code, it's a single python file.
Usage
To print a summary of current activity:
slurm_gpustat
To print a summary of current activity on particular partitions, e.g. debug
& normal
:
slurm_gpustat -p debug,normal
or slurm_gpustat --partition debug,normal
To start the logging dameon:
slurm_gpustat --action daemon-start
To view a summary of logged data:
`slurm_gpustat --action history`
Example outputs
Running slurm_gpustat
will produce something like this:
----------------------
Under SLURM management
----------------------
There are a total of 12 gpus [up]
3 rtx6k gpus
2 v100 gpus
3 m40 gpus
4 p40 gpus
----------------------
There are a total of 10 gpus [accessible]
1 rtx6k gpus
2 v100 gpus
3 m40 gpus
4 p40 gpus
----------------------
Usage by user:
user1 [total: 1 ] m40: 1
user2 [total: 1 ] p40: 1
user3 [total: 3 ] p40: 3
----------------------
There are 5 gpus available:
p40: 0
rtx6k: 1
v100: 2
m40: 2
Adding --verbose
to this command will produce a more detailed breakdown for the section describing gpus that are still available. Example output:
There are 18 gpus available:
m40: 5 available
-> gnodeb3: 1 m40 [cpu: 38/40, gres/gpu: 3/4, mem: 68G/257669M] [user1,user2]
-> gnodeb4: 2 m40 [cpu: 38/40, gres/gpu: 2/4, mem: 60G/193161M] [user1,user2]
-> gnodec3: 2 m40 [cpu: 12/48, gres/gpu: 2/4, mem: 36G/257669M] [user1]
p40: 5 available
-> gnodec4: 1 p40 [cpu: 20/48, gres/gpu: 3/4, mem: 216G/257669M] [user1]
-> gnoded1: 1 p40 [cpu: 44/64, gres/gpu: 3/4, mem: 60G/385192M] [user4,user5]
-> gnoded3: 1 p40 [cpu: 28/64, gres/gpu: 3/4, mem: 70G/385192M] [user2,user4,user5]
-> gnoded4: 1 p40 [cpu: 28/64, gres/gpu: 3/4, mem: 112G/385192M] [user5,user4]
-> gnodee3: 1 p40 [cpu: 36/56, gres/gpu: 3/4, mem: 60G/385192M] [user4,user2]
rtx6k: 3 available
-> gnodef1: 1 rtx6k [cpu: 36/56, gres/gpu: 3/4, mem: 194G/257669M] [user2,user3]
-> gnodeh2: 2 rtx6k [cpu: 16/24, gres/gpu: 2/4, mem: 96G/385345M] [user2,user1]
v100: 5 available
-> gnodeg1: 2 v100 [cpu: 16/64, gres/gpu: 2/4, mem: 60G/191668M] [user5,user1]
-> gnodeg2: 3 v100 [cpu: 8/64, gres/gpu: 1/4, mem: 40G/191668M] [user2]
Running slurm_gpustat --action history
(after the daemon has run for a little while) will produce something like this:
Historical data contains 7 samples (2020-01-03 11:51:43 to 2020-01-03 11:51:45)
GPU usage for user1:
v100 > avg: 2, max: 4
m40 > avg: 1, max: 1
total > avg: 3
GPU usage for user2:
p40m > avg: 3, max: 4
total > avg: 3
Dependencies
Python >= 3.6
numpy
beartype
seaborn
colored
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file slurm_gpustat-0.0.15.tar.gz
.
File metadata
- Download URL: slurm_gpustat-0.0.15.tar.gz
- Upload date:
- Size: 12.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6ae06efc95b722b0076f048a82990dd6208138286ded6b4c8c3774970202a125 |
|
MD5 | eace0f1554d829dfc1d1ae0121fd4b5f |
|
BLAKE2b-256 | 7699982b4172be024f7ed5668a6c2e57b493a250951cc40ce9e8bb85884ff9b0 |
File details
Details for the file slurm_gpustat-0.0.15-py3-none-any.whl
.
File metadata
- Download URL: slurm_gpustat-0.0.15-py3-none-any.whl
- Upload date:
- Size: 11.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 0ab1f75442a7d8832850e539009061e2c94d50dfb6f9c7d21051a514a9584e66 |
|
MD5 | 57c1a92143b64f594723f384111dd799 |
|
BLAKE2b-256 | fda5219a2c2f0ba3d418abce25755be3d62438b6f24a93727dafbd4556d15e82 |