GPU Stats CLI

A beautiful, nvtop-inspired terminal UI for monitoring NVIDIA GPU metrics with intelligent bottleneck analysis.

Features

  • Real-time GPU Monitoring - Live metrics from nvidia-smi
  • Bottleneck Detection - Automatically identifies memory-bound vs compute-bound workloads
  • Beautiful Terminal UI - Clean, informative interface built with Rich
  • Multiple View Modes - Single GPU or grid view for multiple GPUs
  • Historical Trends - 60-second sparkline graphs for all metrics
  • Interactive Controls - Keyboard navigation and togglable details
  • Demo Mode - Test the interface with simulated GPU data
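The 60-second trend can be pictured as a fixed-size rolling window rendered one glyph per sample. The sketch below is a minimal illustration, not the project's actual code: the `History` class, the one-sample-per-second rate, and the four-glyph braille set are all assumptions.

```python
from collections import deque

# Braille glyphs ordered from lowest to highest (assumed glyph set).
LEVELS = "⣀⣤⣶⣿"


class History:
    """Rolling window of recent samples (60 entries at one sample per second)."""

    def __init__(self, size: int = 60) -> None:
        # deque(maxlen=...) silently drops the oldest sample once full.
        self.samples = deque(maxlen=size)

    def add(self, percent: float) -> None:
        # Clamp to 0-100 so a bad reading cannot index past the glyph set.
        self.samples.append(max(0.0, min(100.0, percent)))

    def sparkline(self) -> str:
        # Map each 0-100 sample onto one of the glyph levels, oldest first.
        top = len(LEVELS) - 1
        return "".join(
            LEVELS[min(top, int(p / 100 * len(LEVELS)))] for p in self.samples
        )
```

Calling `add()` once per refresh and printing `sparkline()` gives a line that scrolls left as new samples arrive.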

Metrics Displayed

  • VRAM Usage - Memory consumption with historical trend
  • Memory Bandwidth - Data transfer speed (TB/s)
  • Compute Utilization - Tensor/CUDA core usage (TFLOP/s)
  • Temperature & Power - Real-time thermal and power draw
  • Bottleneck Analysis - Identifies performance limitations with suggestions
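As a rough illustration of where some of these numbers can come from, the sketch below parses the CSV output of `nvidia-smi --query-gpu`. The query fields shown are real `nvidia-smi` fields, but which ones gpu-stats requests is an assumption, and `GpuSample`, `parse_samples`, and `read_samples` are hypothetical names. With `nounits`, memory values arrive in MiB, so they are divided by 1024 here.

```python
import csv
import io
import subprocess
from dataclasses import dataclass
from typing import List

# Assumed field selection; gpu-stats may query a different set.
FIELDS = "index,name,memory.used,memory.total,temperature.gpu,power.draw"
COMMAND = ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"]


@dataclass
class GpuSample:
    index: int
    name: str
    vram_used_gb: float
    vram_total_gb: float
    temperature_c: float
    power_w: float


def parse_samples(text: str) -> List[GpuSample]:
    """Turn nvidia-smi CSV text (one GPU per row) into GpuSample objects."""
    samples = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        index, name, used_mib, total_mib, temp, power = (c.strip() for c in row)
        samples.append(
            GpuSample(
                index=int(index),
                name=name,
                vram_used_gb=float(used_mib) / 1024,
                vram_total_gb=float(total_mib) / 1024,
                temperature_c=float(temp),
                power_w=float(power),  # raises ValueError if the GPU reports [N/A]
            )
        )
    return samples


def read_samples() -> List[GpuSample]:
    """Run nvidia-smi once and parse its output (needs an NVIDIA driver)."""
    result = subprocess.run(COMMAND, capture_output=True, text=True, check=True)
    return parse_samples(result.stdout)
```

Keeping the parser separate from the subprocess call means it can be exercised with a canned string on machines without a GPU.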

Installation

From PyPI

pip install gpu-stats-cli

From Source

git clone https://github.com/yourusername/gpu-stats-cli.git
cd gpu-stats-cli
pip install -e .

Requirements

  • Python 3.8 or higher
  • NVIDIA GPU with drivers installed (for real monitoring)
  • nvidia-smi in PATH (usually comes with NVIDIA drivers)
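If it is unclear whether a machine meets the last requirement, a small pre-flight check can help. This is only a sketch: `suggested_command` is a hypothetical helper, not part of gpu-stats, and it simply looks for `nvidia-smi` on PATH with the standard library.

```python
import shutil


def suggested_command(lookup=shutil.which) -> str:
    """Return the gpu-stats invocation that suits this machine.

    `lookup` is injectable so the check can be tested without a GPU.
    """
    if lookup("nvidia-smi") is None:
        # No NVIDIA tooling found: only the simulated data will work.
        return "gpu-stats --demo"
    return "gpu-stats"


if __name__ == "__main__":
    print(suggested_command())
```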

Usage

Monitor Real GPUs

gpu-stats

Demo Mode (No GPU Required)

gpu-stats --demo

This runs a simulation with three GPUs showing different workload patterns:

  • GPU 0: LLM inference (memory-bound with periodic spikes)
  • GPU 1: Training workload (compute-bound, steady)
  • GPU 2: Balanced workload
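To make the three patterns concrete, here is one way such a simulation could be written. It is a sketch under assumptions: the `simulate` function, the baseline percentages, the 10-second spike interval, and the noise range are all invented for illustration and are not taken from `gpu_stats/demo.py`.

```python
import math
import random


def _clamp(value: float) -> float:
    """Keep a percentage inside the 0-100 range."""
    return max(0.0, min(100.0, value))


def simulate(gpu: int, t: float, rng: random.Random) -> tuple:
    """Return (memory_bandwidth_pct, compute_pct) for one GPU at time t seconds."""
    noise = rng.uniform(-2.0, 2.0)
    if gpu == 0:
        # LLM inference: memory-heavy, with a spike every 10 seconds.
        spike = 15.0 if int(t) % 10 == 0 else 0.0
        mem, compute = 75.0 + spike + noise, 25.0 + noise
    elif gpu == 1:
        # Training: steady and compute-heavy.
        mem, compute = 40.0 + noise, 90.0 + noise
    else:
        # Balanced: both metrics drift slowly around the same level.
        wave = 5.0 * math.sin(t / 5.0)
        mem, compute = 60.0 + wave + noise, 60.0 - wave + noise
    return _clamp(mem), _clamp(compute)
```

Passing in a seeded `random.Random` keeps runs reproducible, which is convenient when the demo doubles as a test fixture.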

Keyboard Controls

Key       Action
q         Quit
g         Toggle grid/single view
b         Toggle bottleneck details
← or p    Previous GPU (single view)
→ or n    Next GPU (single view)
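The controls above amount to a small state machine. The sketch below shows one possible shape for it, covering the letter keys only; `ViewState` and `handle_key` are hypothetical names, and wrapping around at the first and last GPU is an assumption about the behavior.

```python
from dataclasses import dataclass


@dataclass
class ViewState:
    gpu_count: int
    selected: int = 0      # index of the GPU shown in single view
    grid: bool = False     # True while grid view is active
    details: bool = False  # True while bottleneck details are shown
    running: bool = True


def handle_key(state: ViewState, key: str) -> ViewState:
    """Apply one key press to the view state and return it."""
    if key == "q":
        state.running = False
    elif key == "g":
        state.grid = not state.grid
    elif key == "b":
        state.details = not state.details
    elif key in ("p", "n") and not state.grid:
        # Previous/next only apply in single view; wrap-around is assumed.
        step = -1 if key == "p" else 1
        state.selected = (state.selected + step) % state.gpu_count
    return state
```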

Understanding Bottlenecks

Memory-Bound

Your GPU's tensor cores are waiting for data from GPU memory (HBM, High Bandwidth Memory, on data-center cards such as the A100). Common in:

  • LLM inference (autoregressive decoding)
  • Small batch sizes
  • Memory-intensive operations

Suggestions:

  • Increase batch size
  • Enable continuous batching
  • Use KV cache optimization

Compute-Bound

Your GPU's compute units are fully utilized. This typically means:

  • Well-optimized workload
  • Large batch sizes
  • Compute-heavy operations (e.g., training)
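The distinction between the two cases can be expressed as a comparison of memory-bandwidth and compute utilization. The sketch below is one plausible rule, not the tool's actual heuristic: the `classify` function and its `high` and `gap` thresholds are assumptions chosen for illustration.

```python
def classify(
    mem_pct: float,
    compute_pct: float,
    high: float = 70.0,  # assumed threshold for "heavily used"
    gap: float = 20.0,   # assumed minimum lead over the other metric
) -> str:
    """Label a workload from its memory-bandwidth and compute utilization."""
    if mem_pct >= high and mem_pct - compute_pct >= gap:
        return "memory-bound"
    if compute_pct >= high and compute_pct - mem_pct >= gap:
        return "compute-bound"
    return "balanced"
```

For example, `classify(78, 28)` returns `"memory-bound"`, and `classify(40, 90)` returns `"compute-bound"`.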

Example Output

Single GPU View

                              GPU 1/3

╭──────────────────────────────────────────────────────────────────────────────╮
│ GPU 0  NVIDIA A100-SXM4-80GB                                   67°C  324W    │
│                                                                              │
│ MEMORY                                                                       │
│ ──────────────────────────────────────────────────────────────────────────  │
│                                                                              │
│ VRAM                                                                         │
│                                                                              │
│ ████████████████████████░░░░░░░░ 78.0% | 62.4/80 GB                         │
│ ⣿⣿⣿⣶⣦⣤⣤⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀                                                 │
│ 60s                          now                                            │
│                                                                              │
│ THROUGHPUT                                                                   │
│ ──────────────────────────────────────────────────────────────────────────  │
│                                                                              │
│ Memory Bandwidth                                                             │
│ Speed of data transfer between HBM and processors                           │
│                                                                              │
│ ████████████████████████░░░░░░░░ 78.0% | 1.59/2.04 TB/s                     │
│ ⣿⣿⣿⣿⣶⣦⣤⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀                                               │
│ 60s                          now                                            │
│                                                                              │
│ Compute                                                                      │
│ Utilization of tensor cores and CUDA cores                                  │
│                                                                              │
│ ████████░░░░░░░░░░░░░░░░░░░░░░░░ 28.0% | 87/312 TFLOP/s                     │
│ ⣿⣿⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀                                               │
│ 60s                          now                                            │
│                                                                              │
│ ● MEMORY BOUND (78% mem, 28% compute)  [Press 'b' for details]             │
╰──────────────────────────────────────────────────────────────────────────────╯

        g: Grid View  |  b: Toggle Details  |  q: Quit
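A usage bar like the ones above can be produced with a few lines of string formatting. This is a minimal sketch: `render_bar` is a hypothetical helper, and the 32-cell width and round-down fill are inferred from the sample output rather than taken from the project's source.

```python
def render_bar(used: float, total: float, unit: str, width: int = 32) -> str:
    """Render a filled/empty bar followed by the percentage and raw values."""
    percent = 100.0 * used / total if total else 0.0
    # Round down so the bar never overstates usage; cap at the full width.
    filled = min(width, int(width * percent / 100))
    bar = "█" * filled + "░" * (width - filled)
    return f"{bar} {percent:.1f}% | {used:g}/{total:g} {unit}"


if __name__ == "__main__":
    print(render_bar(62.4, 80, "GB"))
```

Called with the VRAM figures from the example (62.4 of 80 GB), this fills 24 of the 32 cells.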

Grid View

Shows all GPUs side by side in a compact layout, which suits monitoring several GPUs at once.

Development

Setup Development Environment

git clone https://github.com/yourusername/gpu-stats-cli.git
cd gpu-stats-cli
pip install -e ".[dev]"

Run Tests

pytest

Code Formatting

black gpu_stats/
ruff check gpu_stats/

Architecture

  • gpu_stats/components.py - Reusable UI components (sparklines, progress bars, displays)
  • gpu_stats/monitor.py - Real GPU monitoring using nvidia-smi
  • gpu_stats/demo.py - Simulated GPU data for testing/demo
  • gpu_stats/cli.py - Command-line interface and entry point

License

MIT License - see LICENSE file for details

Author

Shikhar Gupta

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

Inspired by nvtop - an excellent ncurses-based GPU monitor.
