A beautiful terminal UI for monitoring NVIDIA GPU metrics with bottleneck analysis
GPU Stats CLI
A beautiful, nvtop-inspired terminal UI for monitoring NVIDIA GPU metrics with intelligent bottleneck analysis.
Features
- Real-time GPU Monitoring - Live metrics from nvidia-smi
- Bottleneck Detection - Automatically identifies memory-bound vs compute-bound workloads
- Beautiful Terminal UI - Clean, informative interface built with Rich
- Multiple View Modes - Single GPU or grid view for multiple GPUs
- Historical Trends - 60-second sparkline graphs for all metrics
- Interactive Controls - Keyboard navigation and togglable details
- Demo Mode - Test the interface with simulated GPU data
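A 60-second history like the one listed above can be kept in a fixed-length buffer and rendered one glyph per sample. This is a minimal sketch, not the project's actual code: `LEVELS`, `sparkline`, and the one-sample-per-second rate are assumptions for illustration, and the glyphs are simply borrowed from the sample output.

```python
from collections import deque

# Glyph ramp from lowest to highest value. The glyphs appear in the sample
# output; the mapping from values to glyphs is an assumption.
LEVELS = "⣀⣤⣦⣶⣿"


def sparkline(samples, lo=0.0, hi=100.0):
    """Render each sample in [lo, hi] as one glyph from LEVELS."""
    span = hi - lo
    glyphs = []
    for value in samples:
        clamped = min(max(value, lo), hi)
        # Round to the nearest of the len(LEVELS) discrete levels.
        idx = int((clamped - lo) / span * (len(LEVELS) - 1) + 0.5)
        glyphs.append(LEVELS[idx])
    return "".join(glyphs)


# A deque with maxlen=60 silently drops the oldest sample once it is full,
# so it always holds the most recent 60 seconds (assuming one sample/second).
history = deque(maxlen=60)
for second in range(90):
    history.append(float(second % 101))

line = sparkline(history)
```

The oldest sample ends up on the left and the newest on the right, matching the "60s ... now" axis in the sample output.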
Metrics Displayed
- VRAM Usage - Memory consumption with historical trend
- Memory Bandwidth - Data transfer speed (TB/s)
- Compute Utilization - Tensor/CUDA core usage (TFLOP/s)
- Temperature & Power - Real-time temperature and power draw
- Bottleneck Analysis - Identifies performance limitations with suggestions
Installation
From PyPI (once published)
pip install gpu-stats-cli
From Source
git clone https://github.com/yourusername/gpu-stats-cli.git
cd gpu-stats-cli
pip install -e .
Requirements
- Python 3.8 or higher
- NVIDIA GPU with drivers installed (for real monitoring)
- nvidia-smi in PATH (usually comes with NVIDIA drivers)
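For readers curious how metrics can be read from nvidia-smi, here is a hedged sketch using its `--query-gpu` and `--format=csv,noheader,nounits` options. The helper names (`parse_line`, `query_gpus`) and the output dictionary keys are hypothetical and not taken from the project. The sketch covers only fields that nvidia-smi reports directly; how gpu-stats derives its bandwidth and TFLOP/s figures is not covered here.

```python
import subprocess

# Query fields understood by `nvidia-smi --query-gpu`.
FIELDS = ["index", "name", "temperature.gpu", "power.draw",
          "memory.used", "memory.total", "utilization.gpu"]


def _to_float(text):
    """Convert a CSV cell to float; values like '[N/A]' become None."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_line(line):
    """Parse one CSV row produced with --format=csv,noheader,nounits."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return {
        "index": int(parts[0]),
        "name": parts[1],
        "temperature_c": _to_float(parts[2]),
        "power_w": _to_float(parts[3]),
        "memory_used_mib": _to_float(parts[4]),
        "memory_total_mib": _to_float(parts[5]),
        "utilization_pct": _to_float(parts[6]),
    }


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [parse_line(row) for row in result.stdout.splitlines() if row.strip()]
```

Calling `query_gpus()` requires `nvidia-smi` in PATH; `parse_line` can be exercised on its own with a captured row.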
Usage
Monitor Real GPUs
gpu-stats
Demo Mode (No GPU Required)
gpu-stats --demo
This runs a simulation with three GPUs showing different workload patterns:
- GPU 0: LLM inference (memory-bound with periodic spikes)
- GPU 1: Training workload (compute-bound, steady)
- GPU 2: Balanced workload
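The three demo workloads described above could be generated along these lines. This is an illustrative sketch only: the function name `simulate`, the baseline percentages, the 10-second spike period, and the noise range are all assumptions, not values from the project's demo module.

```python
import math
import random


def _clamp(value):
    """Keep a percentage inside [0, 100]."""
    return min(max(value, 0.0), 100.0)


def simulate(gpu_index, t, rng):
    """Return (memory_pct, compute_pct) for a simulated GPU at time t seconds."""
    noise = rng.uniform(-2.0, 2.0)
    if gpu_index == 0:
        # LLM inference: memory-bound, with a spike every 10 seconds.
        spike = 15.0 if int(t) % 10 == 0 else 0.0
        mem, comp = 75.0 + spike + noise, 28.0 + noise
    elif gpu_index == 1:
        # Training: compute-bound and steady.
        mem, comp = 40.0 + noise, 92.0 + noise
    else:
        # Balanced: memory and compute drift slowly around the same level.
        wave = 5.0 * math.sin(t / 5.0)
        mem, comp = 60.0 + wave + noise, 60.0 - wave + noise
    return _clamp(mem), _clamp(comp)


rng = random.Random(0)  # fixed seed so repeated runs produce the same trace
trace = [[simulate(gpu, t, rng) for gpu in range(3)] for t in range(60)]
```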
Keyboard Controls
| Key | Action |
|---|---|
| q | Quit |
| g | Toggle grid/single view |
| b | Toggle bottleneck details |
| ← or p | Previous GPU (single view) |
| → or n | Next GPU (single view) |
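The key bindings above amount to a small state machine. The sketch below is one way to model it, under stated assumptions: `ViewState` and `handle_key` are hypothetical names, arrow keys are represented as the strings `"left"` and `"right"`, and wrapping around at the first and last GPU is a guess rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass
class ViewState:
    gpu_count: int
    selected: int = 0       # index of the GPU shown in single view
    grid: bool = False      # True while grid view is active
    details: bool = False   # True while bottleneck details are shown
    running: bool = True


def handle_key(state, key):
    """Apply one key press to the view state and return it."""
    if key == "q":
        state.running = False
    elif key == "g":
        state.grid = not state.grid
    elif key == "b":
        state.details = not state.details
    elif key in ("p", "left") and not state.grid:
        # Previous GPU, wrapping from the first to the last (assumption).
        state.selected = (state.selected - 1) % state.gpu_count
    elif key in ("n", "right") and not state.grid:
        # Next GPU, wrapping from the last to the first (assumption).
        state.selected = (state.selected + 1) % state.gpu_count
    return state
```

Keeping the state separate from rendering makes the key handling easy to test without a terminal.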
Understanding Bottlenecks
Memory-Bound
Your GPU's tensor cores are waiting for data from HBM (High Bandwidth Memory). Common in:
- LLM inference (autoregressive decoding)
- Small batch sizes
- Memory-intensive operations
Suggestions:
- Increase batch size
- Enable continuous batching
- Use KV cache optimization
Compute-Bound
Your GPU's compute units are fully utilized. This typically means:
- Well-optimized workload
- Large batch sizes
- Compute-heavy operations (e.g., training)
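The distinction between the two cases can be sketched as a comparison of memory-bandwidth and compute utilization. The thresholds below (`high=70.0`, `margin=20.0`), the `BALANCED` label, and the function name `classify` are assumptions made for illustration; the tool's actual rule is not documented here. The memory-bound suggestions are the ones listed above.

```python
# Suggestions per verdict; only the memory-bound case lists any in this README.
SUGGESTIONS = {
    "MEMORY BOUND": [
        "Increase batch size",
        "Enable continuous batching",
        "Use KV cache optimization",
    ],
    "COMPUTE BOUND": [],
    "BALANCED": [],
}


def classify(mem_pct, compute_pct, high=70.0, margin=20.0):
    """Label a workload from memory-bandwidth and compute utilization (%).

    `high` and `margin` are hypothetical thresholds: one side must be busy
    and clearly ahead of the other to be called the bottleneck.
    """
    if mem_pct >= high and mem_pct - compute_pct >= margin:
        return "MEMORY BOUND"
    if compute_pct >= high and compute_pct - mem_pct >= margin:
        return "COMPUTE BOUND"
    return "BALANCED"


verdict = classify(78.0, 28.0)
tips = SUGGESTIONS[verdict]
```

With these thresholds, 78% memory bandwidth against 28% compute is labeled memory-bound, which agrees with the status line in the sample output.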
Example Output
Single GPU View
GPU 1/3
╭──────────────────────────────────────────────────────────────────────────────╮
│ GPU 0 NVIDIA A100-SXM4-80GB 67°C 324W │
│ │
│ MEMORY │
│ ────────────────────────────────────────────────────────────────────────── │
│ │
│ VRAM │
│ │
│ ████████████████████████░░░░░░░░ 78.0% | 62.4/80 GB │
│ ⣿⣿⣿⣶⣦⣤⣤⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀ │
│ 60s now │
│ │
│ THROUGHPUT │
│ ────────────────────────────────────────────────────────────────────────── │
│ │
│ Memory Bandwidth │
│ Speed of data transfer between HBM and processors │
│ │
│ ████████████████████████░░░░░░░░ 78.0% | 1.59/2.04 TB/s │
│ ⣿⣿⣿⣿⣶⣦⣤⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀ │
│ 60s now │
│ │
│ Compute │
│ Utilization of tensor cores and CUDA cores │
│ │
│ ████████░░░░░░░░░░░░░░░░░░░░░░░░ 28.0% | 87/312 TFLOP/s │
│ ⣿⣿⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀ │
│ 60s now │
│ │
│ ● MEMORY BOUND (78% mem, 28% compute) [Press 'b' for details] │
╰──────────────────────────────────────────────────────────────────────────────╯
g: Grid View | b: Toggle Details | q: Quit
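The usage bars in the sample above can be reproduced with a few lines. This is a sketch rather than the project's rendering code: `progress_bar` is a hypothetical helper, and the 32-cell width and rounding down to whole cells are inferred from the sample bars.

```python
def progress_bar(used, peak, width=32):
    """Return (bar, percent) for a used/peak ratio.

    The bar has `width` cells; partially filled cells are rounded down,
    which is an assumption consistent with the sample output.
    """
    fraction = 0.0 if peak <= 0 else min(max(used / peak, 0.0), 1.0)
    filled = int(fraction * width)
    bar = "█" * filled + "░" * (width - filled)
    return bar, fraction * 100.0


vram_bar, vram_pct = progress_bar(62.4, 80)      # VRAM in GB
compute_bar, compute_pct = progress_bar(87, 312)  # compute in TFLOP/s
print(f"{vram_bar} {vram_pct:.1f}% | 62.4/80 GB")
```

Clamping the fraction keeps the bar well-formed when a reading briefly exceeds the nominal peak or the peak is unknown.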
Grid View
Shows all GPUs side by side in a compact layout, so several GPUs can be monitored at once.
Development
Setup Development Environment
git clone https://github.com/yourusername/gpu-stats-cli.git
cd gpu-stats-cli
pip install -e ".[dev]"
Run Tests
pytest
Code Formatting
black gpu_stats/
ruff check gpu_stats/
Architecture
- gpu_stats/components.py - Reusable UI components (sparklines, progress bars, displays)
- gpu_stats/monitor.py - Real GPU monitoring using nvidia-smi
- gpu_stats/demo.py - Simulated GPU data for testing/demo
- gpu_stats/cli.py - Command-line interface and entry point
License
MIT License - see LICENSE file for details
Author
Shikhar Gupta
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Acknowledgments
Inspired by nvtop - an excellent ncurses-based GPU monitor.