NVIDIA GPU tools

Development Status
- 3 - Alpha
Intended Audience
- Developers
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- POSIX :: Linux
Programming Language
- Python :: 2
- Python :: 3

Project description

`nvgpu` - NVIDIA GPU tools

It provides information about GPUs and their availability for computation.

Often we want to train an ML model on one of the GPUs installed in a multi-GPU machine. Since TensorFlow allocates all GPU memory by default, only one such process can use a GPU at a time. Unfortunately, nvidia-smi provides only a text interface with information about GPUs. This package wraps it with an easier-to-use CLI and Python interface.

It's a quick and dirty solution that calls nvidia-smi and parses its output. We can take one or more GPUs available for computation based on relative memory usage, i.e. it is OK if Xorg takes a few MB.
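The idea of picking GPUs by relative memory usage can be sketched as follows. This is a minimal illustration, not nvgpu's actual implementation: the nvidia-smi query flags are real, but whether nvgpu uses them is an assumption, and the function names and the 20 % threshold are hypothetical.

```python
import subprocess

# Assumed query; nvgpu itself may invoke nvidia-smi differently.
QUERY = ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory_csv(text):
    """Parse 'index, used, total' lines (values in MiB) into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, used, total = [field.strip() for field in line.split(",")]
        gpus.append({"index": index,
                     "mem_used": int(used),
                     "mem_total": int(total)})
    return gpus


def filter_available(gpus, max_used_percent=20.0):
    """Return indices of GPUs whose memory usage is at or below the threshold.

    The default threshold is a hypothetical value chosen for illustration.
    """
    return [gpu["index"] for gpu in gpus
            if 100.0 * gpu["mem_used"] / gpu["mem_total"] <= max_used_percent]


def query_gpus():
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    output = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_memory_csv(output)
```

On a machine with the driver installed, `filter_available(query_gpus())` would return a list of index strings. A relative threshold rather than "zero memory used" is what lets a GPU count as free while a display server holds a few MB on it.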

In addition, there is a fancy table of GPUs with more information, obtained through the Python bindings to NVML.

For easier monitoring of multiple machines it's possible to deploy agents (that provide the GPU information in JSON over a REST API) and show the aggregated status in a web application.

Installing

For a user:

pip install nvgpu

or to the system:

sudo -H pip install nvgpu

Usage examples

Command-line interface:

# grab all available GPUs
CUDA_VISIBLE_DEVICES=$(nvgpu available)

# grab at most one available GPU
CUDA_VISIBLE_DEVICES=$(nvgpu available -l 1)

Print a pretty colored table of devices, availability, users, and processes:

$ nvgpu list
    status    type                 util.      temp.    MHz  users    since            pids    cmd
--  --------  -------------------  -------  -------  -----  -------  ---------------  ------  -----------
 0  [ ]       GeForce GTX 1070      0 %          44    139
 1  [~]       GeForce GTX 1080 Ti   0 %          44    139  alice    2 days ago       19028   jupyter
 2  [~]       GeForce GTX 1080 Ti   0 %          44    139  bob      14 hours ago     8479    jupyter
 3  [~]       GeForce GTX 1070     46 %          54   1506  bob      7 days ago       20883   train.py
 4  [~]       GeForce GTX 1070     35 %          64   1480  bob      7 days ago       26228   evaluate.py
 5  [!]       GeForce GTX 1080 Ti   0 %          44    139  ?                         9305
 6  [ ]       GeForce GTX 1080 Ti   0 %          44    139

Or use the shortcut:

$ nvl

Python API:

import nvgpu

nvgpu.available_gpus()
# ['0', '2']

nvgpu.gpu_info()
[{'index': '0',
  'mem_total': 8119,
  'mem_used': 7881,
  'mem_used_percent': 97.06860450794433,
  'type': 'GeForce GTX 1070',
  'uuid': 'GPU-3aa99ee6-4a9f-470e-3798-70aaed942689'},
 {'index': '1',
  'mem_total': 11178,
  'mem_used': 10795,
  'mem_used_percent': 96.57362676686348,
  'type': 'GeForce GTX 1080 Ti',
  'uuid': 'GPU-60410ded-5218-7b06-9c7a-124b77a22447'},
 {'index': '2',
  'mem_total': 11178,
  'mem_used': 10789,
  'mem_used_percent': 96.51994990159241,
  'type': 'GeForce GTX 1080 Ti',
  'uuid': 'GPU-d0a77bd4-cc70-ca82-54d6-4e2018cfdca6'},
  ...
]
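In a training script, the list returned by `nvgpu.available_gpus()` can be turned into a `CUDA_VISIBLE_DEVICES` value before the ML framework is imported, since CUDA reads that variable when it initializes. A minimal sketch follows; the helper names `cuda_visible_devices` and `reserve_gpus` are hypothetical and not part of nvgpu.

```python
import os


def cuda_visible_devices(gpu_ids, limit=None):
    """Build a CUDA_VISIBLE_DEVICES value from a list of GPU index strings."""
    chosen = gpu_ids if limit is None else gpu_ids[:limit]
    return ",".join(chosen)


def reserve_gpus(gpu_ids, limit=1, environ=os.environ):
    """Set CUDA_VISIBLE_DEVICES in `environ`; raise if no GPU is free."""
    if not gpu_ids:
        raise RuntimeError("no GPU is currently available")
    environ["CUDA_VISIBLE_DEVICES"] = cuda_visible_devices(gpu_ids, limit)
    return environ["CUDA_VISIBLE_DEVICES"]


# Typical use, before importing the ML framework:
#   import nvgpu
#   reserve_gpus(nvgpu.available_gpus(), limit=1)
```

Note that nothing here locks the GPU: another process could grab the same device between the check and the start of training.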

Web application with agents

There are multiple nodes. Agents take information from the local GPUs and provide it as JSON via a REST API. The master gathers information from the other nodes and displays it in an HTML page. By default, agents can also display their own status.

Agent

FLASK_APP=nvgpu.webapp flask run --host 0.0.0.0 --port 1080
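A script can also poll the agents directly. The sketch below only shows the general pattern of collecting JSON from several URLs while tolerating agents that are down. The endpoint path of the agent's REST API is not documented here, so the full URLs are left to the caller, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=5.0):
    """Fetch a URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def collect(urls, timeout=5.0):
    """Query each URL; map it to its decoded JSON, or to None on failure."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch_json(url, timeout)
        except (OSError, ValueError):
            # Unreachable agent or a non-JSON reply: record it and continue.
            results[url] = None
    return results
```

Recording a failure as `None` instead of raising keeps one offline node from hiding the status of the others.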

Master

List the agents in a config file. An agent is specified either by a URL to a remote machine or by 'self' for direct access to the local machine. Remove 'self' if the machine itself does not have any GPU. The default is AGENTS = ['self'], so that agents also display their own status. Set AGENTS = [] to avoid this.

# nvgpu_master.cfg
AGENTS = [
         'self', # node01 - master - direct access without using HTTP
         'http://node02:1080',
         'http://node03:1080',
         'http://node04:1080',
]

NVGPU_CLUSTER_CFG=/path/to/nvgpu_master.cfg FLASK_APP=nvgpu.webapp flask run --host 0.0.0.0 --port 1080

Open the master in the web browser: http://node01:1080.
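The config file uses Python syntax, so it can be checked before the master is started. The sketch below assumes the file is evaluated as Python in the manner of a Flask config file (an assumption about how nvgpu loads it); `parse_agents` is a hypothetical helper, not part of nvgpu.

```python
def parse_agents(cfg_text):
    """Evaluate Python-syntax config text and return its validated AGENTS list.

    Falls back to ['self'], the documented default, when AGENTS is not set.
    The text is executed as Python, so only check configs you trust.
    """
    namespace = {}
    exec(cfg_text, namespace)
    agents = list(namespace.get("AGENTS", ["self"]))
    for agent in agents:
        if agent != "self" and not agent.startswith(("http://", "https://")):
            raise ValueError("unexpected agent entry: %r" % (agent,))
    return agents


def load_agents(cfg_path):
    """Read a config file such as nvgpu_master.cfg and return its agents."""
    with open(cfg_path) as cfg_file:
        return parse_agents(cfg_file.read())
```

For example, `load_agents("/path/to/nvgpu_master.cfg")` raises `ValueError` for an entry like `'node02:1080'` that is missing its URL scheme.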

Installing as a service

On Ubuntu with systemd we can install the agents/master as a service to be run automatically on system start.

# create an unprivileged system user
sudo useradd -r nvgpu

Copy nvgpu-agent.service to:

sudo vi /etc/systemd/system/nvgpu-agent.service

List the agents in the configuration file for the master:

sudo vi /etc/nvgpu.conf

AGENTS = [
         # direct access without using HTTP
         'self',
         'http://node01:1080',
         'http://node02:1080',
         'http://node03:1080',
         'http://node04:1080',
]

Set up and start the service:

# enable for automatic startup at boot
sudo systemctl enable nvgpu-agent.service
# start
sudo systemctl start nvgpu-agent.service 
# check the status
sudo systemctl status nvgpu-agent.service

# check the service
open http://localhost:1080
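On a headless server without a browser, a quick way to confirm that the service is listening is to try a TCP connection to its port. This is a generic sketch, not an nvgpu feature; `is_port_open` is a hypothetical helper and port 1080 is taken from the examples above.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# Example: is_port_open("localhost", 1080)
```

This only checks that something accepts connections on the port, not that it is the nvgpu web application.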

Author

Bohumír Zámečník, Rossum, Ltd.
License: MIT

TODO

order GPUs by priority (decreasing power, decreasing free memory)
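One possible reading of this TODO item, sketched on the dicts returned by `nvgpu.gpu_info()`: sort by a per-model power rank first and by free memory second, both decreasing. The `POWER_RANK` table is purely illustrative and hypothetical; how "power" should actually be measured is left open by the TODO.

```python
# Hypothetical ranking of GPU models; a higher number means a more powerful GPU.
POWER_RANK = {
    "GeForce GTX 1080 Ti": 2,
    "GeForce GTX 1070": 1,
}


def order_by_priority(gpus, power_rank=POWER_RANK):
    """Sort gpu_info()-style dicts by decreasing power, then decreasing free memory."""
    def key(gpu):
        power = power_rank.get(gpu["type"], 0)  # unknown models rank last
        free = gpu["mem_total"] - gpu["mem_used"]
        return (-power, -free)
    return sorted(gpus, key=key)
```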

Release history

- 0.10.0 (this version): Mar 30, 2023
- 0.9.0: Jul 31, 2020
- 0.8.0: May 9, 2019
- 0.7.0: Oct 21, 2018
- 0.6.0: Oct 16, 2018
- 0.5.2: Jul 2, 2018
- 0.5.1: Jun 2, 2018
- 0.5: Jun 2, 2018
- 0.4: May 24, 2018
- 0.3: May 23, 2018
- 0.2: May 23, 2018
- 0.1.1: Apr 25, 2018
- 0.1: Apr 25, 2018

Download files

Source Distribution

nvgpu-0.10.0.tar.gz (8.4 kB)

Uploaded Mar 30, 2023 Source

File details

Details for the file nvgpu-0.10.0.tar.gz.

File metadata

Download URL: nvgpu-0.10.0.tar.gz
Upload date: Mar 30, 2023
Size: 8.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for nvgpu-0.10.0.tar.gz
Algorithm	Hash digest
SHA256	`c415f757e0c375357f8904a6ea0cee084ab0ce97ed11e4840f2c8839196b3918`
MD5	`83b892a015995031111df47561962709`
BLAKE2b-256	`1a955b99a5798b366ab242fe0b2190f3814b9321eb98c6e1e9c6b599b2b4ce84`
