
A lightweight web dashboard for gpustat


gpuview


GPUs are an expensive resource, and deep learning practitioners generally have to monitor their health and usage: temperature, memory, utilization, and the users running processes on them. This can be done from the terminal with tools like nvidia-smi and gpustat.

However, it is often inconvenient to ssh into a server just to check the status, especially for long-running training jobs that can last from hours to days. gpuview serves this exact purpose: it is a lightweight web dashboard that runs on top of gpustat.

With gpuview, one can monitor GPUs on the go through a web browser. What is more, multiple servers can be registered to one dashboard, so that their stats are aggregated and accessible from one place.

Screenshot: gpuview -cp

Installing gpuview also installs the latest version of gpustat from PyPI, so all the usual gpustat commands are directly available from the terminal. See gpustat -h and gpuview -h for all command-line options.

Setup

Install from PyPI:

pip install gpuview

Or install directly from the repository:

pip install git+https://github.com/fgaim/gpuview.git@master

Usage

Once gpuview is installed, it can be started as follows:

gpuview start --safe-zone

This will start the dashboard at http://0.0.0.0:9988.

By default, gpuview listens on IP 0.0.0.0 and port 9988, but these can be changed using --host and --port. The --safe-zone option tells gpuview that it is safe to report all details, including user names; omit it to withhold those details for security reasons.
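A quick way to confirm that the dashboard is listening on the chosen host and port is a plain TCP connection test. The sketch below uses only the Python standard library; the function name dashboard_is_up is hypothetical and not part of gpuview. Note that 0.0.0.0 is a bind address meaning "all interfaces", so a client should connect to 127.0.0.1 or to the machine's actual IP address instead.

```python
import socket


def dashboard_is_up(host: str = "127.0.0.1", port: int = 9988,
                    timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper, not part of gpuview. It only checks that the
    port is open; it does not verify that gpuview is what answers.
    """
    try:
        # create_connection raises OSError (e.g. connection refused,
        # timeout) when nothing is reachable at the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example: dashboard_is_up("127.0.0.1", 9988) after `gpuview start`
```

The default port matches the documented default of 9988; pass a different value if the server was started with --port.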

Execute gpuview -h to see runtime options.

  • start : Start dashboard server
    • --host : Name or IP address of host (default: 0.0.0.0)
    • --port : Port number to listen to (default: 9988)
    • --safe-zone : Safe to report all details including user names
    • --exclude-self : Report stats only to this host's own dashboard, not to other dashboards
    • -d, --debug : Run server in debug mode (for developers)
  • add : Add a GPU host to dashboard
    • --url : URL of host [IP:Port], e.g. X.X.X.X:9988
    • --name : Optional readable name for the host, e.g. Node101
  • remove : Remove a registered host from dashboard
    • --url : URL of host to remove, e.g. X.X.X.X:9988
  • -v, --version : Print versions of gpuview and gpustat
  • -h, --help : Print help for command-line options

Run as Service

To keep gpuview running permanently, it needs to be started as a background service. This can be done using nohup and & as follows:

sudo nohup gpuview start --safe-zone &

A better way of handling this is coming soon...
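For readers who prefer to launch the server from a script, the following is a minimal sketch of roughly what nohup and & achieve, written with the Python standard library. The helper name start_detached and the log file name are assumptions made for illustration; this is not a gpuview feature, and start_new_session works on POSIX systems only.

```python
import subprocess


def start_detached(cmd, log_path):
    """Launch cmd in its own session, appending its output to log_path.

    Hypothetical helper, not part of gpuview. Returns the Popen object
    so the caller can record the PID or stop the process later.
    """
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            # POSIX only: run in a new session so that closing the
            # launching terminal does not hang up the child.
            start_new_session=True,
        )
    finally:
        # The child holds its own copy of the file descriptor.
        log.close()


# Example (assumed log file name):
# start_detached(["gpuview", "start", "--safe-zone"], "gpuview.log")
```

Unlike the sudo nohup command above, this runs the server with the privileges of the calling user.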

Monitoring multiple hosts

To aggregate the stats of multiple machines, register them to one dashboard using the address and port number on which each machine runs gpuview.

Add a host as follows:

gpuview add --url <ip:port> --name <name>

Remove a registered host as follows:

gpuview remove --url <ip:port>

Note: gpuview should be running on every host in addition to the controller, which itself can be a non-GPU machine.
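When many hosts have to be registered, the add command can be scripted. The sketch below builds one gpuview add invocation per host using only the --url and --name options listed above. The function names and the example addresses are hypothetical, and the dry-run default only prints the commands instead of running them.

```python
import subprocess
from typing import Dict, List


def build_add_commands(hosts: Dict[str, str]) -> List[List[str]]:
    """Build one `gpuview add` argument list per host.

    `hosts` maps a readable name to an "ip:port" string.
    Hypothetical helper, not part of gpuview.
    """
    return [["gpuview", "add", "--url", url, "--name", name]
            for name, url in hosts.items()]


def register_hosts(hosts: Dict[str, str], dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False."""
    for cmd in build_add_commands(hosts):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if gpuview fails.
            subprocess.run(cmd, check=True)


# Example with made-up addresses:
# register_hosts({"Node101": "10.0.0.1:9988", "Node102": "10.0.0.2:9988"})
```

Running with dry_run=True first makes it easy to review the commands before they modify the dashboard's list of hosts.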

etc

Helpful tips related to the underlying performance are available at the gpustat repo.

For the sake of simplicity, gpuview does not have a user authentication feature; therefore, as a security measure, it does not report sensitive details such as user and process names by default. However, if the service is run in a trusted network, all information can be reported using the --safe-zone option of the start command. Similarly, the --exclude-self option can be used to prevent other dashboards from getting the stats of the current machine, so that the host's stats are shown only on its own dashboard.

Thumbnail view of GPUs across multiple hosts.

Screenshot: gpuview

Detailed view of GPUs across multiple hosts.

Screenshot: gpuview

License

MIT License
