
A lightweight web dashboard for monitoring GPU usage


gpuview


GPUs are an expensive resource, and deep learning practitioners have to monitor their health and usage, such as temperature, memory, utilization, and active users. This can be done from the terminal with tools like nvidia-smi and gpustat. Often, however, it is not convenient to ssh into servers just to check the GPU status. gpuview is meant to mitigate this by running a lightweight web dashboard on top of gpustat.

With gpuview one can monitor GPUs on the go, through a web browser. Moreover, multiple GPU servers can be registered into one gpuview dashboard and all stats are aggregated and accessible from one place.

The dashboard features live auto-refresh (every 3 seconds) and includes interactive tooltips, temperature-based color coding, and pause/resume controls for real-time GPU monitoring.

Dashboard view of nine GPUs across multiple servers:

Screenshot of gpuview

Setup

Python 3.9 or higher is required.

Install from PyPI:

pip install gpuview

Or install directly from the repository:

pip install git+https://github.com/fgaim/gpuview.git@master

gpuview installs the latest version of gpustat from PyPI, so gpustat's commands are also available from the terminal.

Usage

gpuview can be used in two modes: as a temporary process or as a background service.

Run gpuview

Once gpuview is installed, it can be started as follows:

gpuview run --safe-zone

This will start the dashboard at http://0.0.0.0:9988.

By default, gpuview binds to 0.0.0.0 on port 9988; both can be changed with --host and --port. The --safe-zone option reports all details, including usernames; omit it to keep those details hidden when the network is not trusted.
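To launch the dashboard from a Python script, the documented flags can be assembled into an argument list. This is a minimal sketch: build_run_command and launch are hypothetical helper names, not part of gpuview, and the sketch assumes the gpuview executable is on PATH.

```python
import subprocess


def build_run_command(host="0.0.0.0", port=9988, safe_zone=False, exclude_self=False):
    """Assemble the argument list for `gpuview run` from the documented flags."""
    cmd = ["gpuview", "run", "--host", host, "--port", str(port)]
    if safe_zone:
        cmd.append("--safe-zone")      # report all details, e.g. usernames
    if exclude_self:
        cmd.append("--exclude-self")   # keep stats on the host's own dashboard
    return cmd


def launch(**options):
    """Run gpuview in the foreground; blocks until the server exits."""
    return subprocess.run(build_run_command(**options), check=True)
```

For example, launch(host="127.0.0.1", port=8080) would serve the dashboard on the loopback interface only, so it is not reachable from other machines.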

For testing and development purposes, you can run gpuview with synthetic data:

gpuview run --demo

This displays fake GPU statistics and is useful when developing on systems without NVIDIA GPUs or when showcasing the dashboard.

API Endpoints

gpuview provides REST API endpoints for programmatic access:

  • GET /api/gpustat/self - Returns GPU statistics for the main host
  • GET /api/gpustat/all - Returns aggregated GPU statistics for all registered hosts

Legacy endpoints:

  • GET /gpustat - Returns GPU statistics for the local host (backward compatibility)
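The endpoints above can be queried from any HTTP client. The following is a minimal sketch using only the Python standard library. The helper names endpoint_url and fetch_stats are hypothetical, and the sketch assumes the response body is JSON; the exact schema is not described here, so the decoded object is returned unchanged.

```python
import json
from urllib.request import urlopen

# Endpoint paths as listed above; 9988 is the documented default port.
ENDPOINTS = {
    "self": "/api/gpustat/self",
    "all": "/api/gpustat/all",
    "legacy": "/gpustat",
}


def endpoint_url(host, port=9988, kind="self"):
    """Build the URL for one of the documented endpoints."""
    return f"http://{host}:{port}{ENDPOINTS[kind]}"


def fetch_stats(host, port=9988, kind="self", timeout=5.0):
    """GET the endpoint and decode the body.

    Assumption: the body is JSON. Raises URLError if the host is unreachable.
    """
    with urlopen(endpoint_url(host, port, kind), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a dashboard running locally, fetch_stats("127.0.0.1") would return the stats of the main host, and fetch_stats("127.0.0.1", kind="all") the aggregated stats.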

Run as a Service

On Linux systems with systemd (which is standard on most modern distributions like Ubuntu, RHEL, and Fedora), you can install gpuview to run as a permanent background service. This requires sudo privileges.

1. Install & Start the Service:
Run the start command. The first time you run this, it will also install the service. For backward compatibility, gpuview service (with no subcommand) defaults to start.

# Install and start the service with default settings
gpuview service --safe-zone

# Or apply custom configurations
gpuview service start [--host <ip>] [--port <port>] [--safe-zone] [--exclude-self]

The service will be configured with the options you provide (like --port) and set to autostart on boot.

2. Manage the Service: You can easily control the service with these built-in commands:

  • gpuview service status: Check if the service is running and see its recent logs.
  • gpuview service logs: View real-time service logs using journalctl.
  • gpuview service stop: Stop the background service.
  • gpuview service start: Start the service if it's been stopped (it will not re-install).
  • gpuview service delete: Stop, disable, and uninstall the service from your system.

Runtime options

gpuview has a few important options; use gpuview --help to see them all.

gpuview -h
  • run : Start gpuview dashboard server
    • --host : URL or IP address of host (default: 0.0.0.0)
    • --port : Port number to listen to (default: 9988)
    • --safe-zone : Safe to report all details, eg. usernames
    • --exclude-self : Don't report to others but to self-dashboard
    • --demo : Run with fake data for testing purposes
    • -d, --debug : Run server in debug mode (for developers)
  • add : Add a GPU host to the dashboard
    • --url : URL of host [IP:Port], eg. X.X.X.X:9988
    • --name : Optional readable name for the host, eg. Node101
  • remove : Remove a registered host from dashboard
    • --url : URL of host to remove, eg. X.X.X.X:9988
  • hosts : Print out all registered hosts
  • service : Manage the gpuview systemd service (Linux only). Defaults to 'start'.
    • start : Install (if needed) and start the service.
      • --host : (Optional) Host to bind (default: 0.0.0.0)
      • --port : (Optional) Port to bind (default: 9988)
      • --safe-zone : (Optional) Report all details, eg. usernames
      • --exclude-self : (Optional) Don't report to others
    • status : Check the status of the gpuview service.
    • stop : Stop the gpuview service.
    • logs : View service logs using journalctl.
    • delete : Stop, disable, and uninstall the service.
  • -v, --version : Print versions of gpuview and gpustat
  • -h, --help : Print help for command-line options

Monitoring multiple hosts

To aggregate the stats of multiple machines, register them with one dashboard using each machine's address and the port on which its gpuview instance is running.

Register a host to monitor as follows:

gpuview add --url <ip:port> --name <name>

Remove a registered host as follows:

gpuview remove --url <ip:port>

Display all registered hosts/nodes as follows:

gpuview hosts

The gpuview service needs to run on all hosts that will be monitored.

Tip: gpuview can be set up on a non-GPU machine, such as a laptop, to monitor remote GPU servers.
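Before registering a host, it can help to confirm that something is listening on its gpuview port. This is a minimal sketch with hypothetical helper names (split_host_port, is_reachable). It only tests that a TCP connection can be opened; it does not verify that the listener is actually gpuview.

```python
import socket

DEFAULT_PORT = 9988  # gpuview's documented default port


def split_host_port(url):
    """Split 'ip:port' (the format used by --url) into (host, port)."""
    host, sep, port = url.rpartition(":")
    if not sep:
        # No port given: fall back to the default.
        return url, DEFAULT_PORT
    return host, int(port)


def is_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the host's port can be opened."""
    try:
        with socket.create_connection(split_host_port(url), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, is_reachable("10.0.0.5:9988") returns False if the host is down, the port is firewalled, or nothing is listening on that port.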

etc

Helpful tips related to the underlying performance are available at the gpustat repo.

For the sake of simplicity, gpuview does not have user authentication in place. As a security measure, it does not report sensitive details such as usernames by default. If the service is running in a trusted network, this can be changed with the --safe-zone option, which reports all details.
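If stats collected with --safe-zone are forwarded outside the trusted network, usernames can be stripped first. The sketch below is illustrative only and is not gpuview's own implementation. It assumes a gpustat-style payload in which each GPU has a "processes" list whose entries carry a "username" key; the sample data is made up.

```python
import copy


def redact_usernames(stats):
    """Return a deep copy of a gpustat-style payload without usernames."""
    redacted = copy.deepcopy(stats)
    for gpu in redacted.get("gpus", []):
        # "processes" may be missing or None when nothing is reported.
        for proc in gpu.get("processes") or []:
            proc.pop("username", None)
    return redacted


# Hypothetical sample payload, for illustration only.
sample = {
    "hostname": "node101",
    "gpus": [
        {"index": 0, "processes": [{"username": "alice", "pid": 1234}]},
        {"index": 1, "processes": None},
    ],
}

cleaned = redact_usernames(sample)
```

The input is left untouched, so the full payload can still be shown on the trusted side.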

The --exclude-self option of the run command can be used to prevent other dashboards from getting stats of the current machine. This way the stats are shown only on the host's own dashboard.

Detailed view of GPUs across multiple servers:

Screenshot of gpuview

License

gpuview is licensed under the MIT License, which is a permissive open-source license that allows you to freely use, modify, and distribute this software.
