# vramtop

The htop for GPU memory. Beautiful. Zero-friction. NVIDIA-first.
A terminal UI for monitoring GPU memory usage across all your NVIDIA GPUs. See per-process VRAM consumption, memory phase detection, OOM prediction, and framework-aware enrichment -- all without instrumenting your code.
## Install

```shell
pip install vramtop
```

Requires Python 3.10+ and NVIDIA drivers (make sure `nvidia-smi` works).
### Optional extras

```shell
pip install "vramtop[pelt]"   # PELT changepoint detection (requires ruptures + numpy)
```
## Quick start

```shell
vramtop
```

That's it. vramtop discovers all NVIDIA GPUs and shows live memory usage.
## Features
- Per-process VRAM tracking -- see exactly which processes use how much GPU memory
- Memory phase detection -- classifies each process as stable, growing, shrinking, or volatile
- OOM prediction -- warns before your GPU runs out of memory, with time-to-OOM estimates
- Survival verdicts -- per-process OK / TIGHT / OOM risk assessment with spike detection
- Framework detection -- automatically identifies PyTorch, JAX, vLLM, Ollama, SGLang, llama.cpp, TGI
- Model file scanning -- detects which model files are loaded from /proc/fd
- Container awareness -- shows Docker/Podman container IDs per process
- PELT changepoint analysis -- optional segment-level memory phase analysis with labeled segments
- Deep mode -- get PyTorch-internal memory stats (allocated vs reserved vs active) via IPC
- HTTP scraping -- pulls KV cache usage from vLLM, Ollama, SGLang, llama.cpp metrics endpoints
- 6 themes -- dark (GitHub-dark), light, nord, catppuccin, dracula, solarized
- CSV export -- log GPU and process data to CSV for offline analysis
- SVG screenshots -- save terminal screenshots as SVG
- Accessibility -- respects `NO_COLOR`; `--accessible` mode uses text labels
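The phase labels above (stable, growing, shrinking, volatile) can be illustrated with a toy classifier over a window of VRAM samples. This is a sketch of the general idea, not vramtop's actual algorithm; the threshold values are our own assumptions.

```python
import statistics

def classify_phase(samples_mb, rate_threshold=1.0, volatility_threshold=50.0):
    """Toy memory-phase classifier over a window of VRAM samples (MB, 1 Hz).

    Illustrative sketch only -- not vramtop's implementation.
    """
    if len(samples_mb) < 2:
        return "stable"
    # High sample-to-sample noise marks the window "volatile" regardless of trend.
    if statistics.stdev(samples_mb) > volatility_threshold:
        return "volatile"
    # Average MB/s over the window decides the trend direction.
    rate = (samples_mb[-1] - samples_mb[0]) / (len(samples_mb) - 1)
    if rate > rate_threshold:
        return "growing"
    if rate < -rate_threshold:
        return "shrinking"
    return "stable"

print(classify_phase([1000, 1010, 1020, 1030]))  # growing
print(classify_phase([1000, 1001, 1000, 999]))   # stable
```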
## Usage

### Monitor (default)

```shell
vramtop                            # Launch the TUI
vramtop --refresh-rate 2           # Poll every 2 seconds (default: 1)
vramtop --no-kill                  # Disable process kill functionality
vramtop --accessible               # Text labels instead of Unicode symbols
vramtop --export-csv gpu.csv       # Log data to CSV while monitoring
vramtop --config path/to/config.toml
```
### Wrap (deep mode)

Run any command with deep-mode reporting enabled:

```shell
vramtop wrap -- python train.py
```

This sets `VRAMTOP_REPORT=1` in the subprocess environment. If your training script calls `vramtop.report()`, it sends PyTorch memory internals to vramtop over a Unix socket.
### Deep mode (in your code)

Add two lines to your PyTorch script to report allocator-level memory stats:

```python
import vramtop
vramtop.report()

# ... your training code ...
```

vramtop's TUI automatically discovers the reporter socket and shows allocated, reserved, and active memory breakdowns in the detail panel.
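If you want the same script to run unchanged outside vramtop, you can gate the call on the `VRAMTOP_REPORT` flag that `vramtop wrap` sets. A small sketch; the `maybe_report` helper is our own wrapper, not part of vramtop's API:

```python
import os

def maybe_report():
    # `vramtop wrap` sets VRAMTOP_REPORT=1 in the child environment,
    # so deep mode activates only when launched that way.
    if os.environ.get("VRAMTOP_REPORT") == "1":
        import vramtop
        vramtop.report()

maybe_report()  # no-op unless launched via `vramtop wrap`
```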
## Keyboard shortcuts

| Key | Action |
|---|---|
| `q` | Quit |
| `d` | Open detail panel for selected process |
| `k` | Kill selected process (SIGTERM, then SIGKILL) |
| `t` | Cycle through themes |
| `s` | Save SVG screenshot |
| `?` | Show help |
| Up/Down | Navigate process list |
## Configuration

Config file location: `~/.config/vramtop/config.toml`

Reload without restarting: send SIGHUP (`kill -HUP $(pgrep vramtop)`).
```toml
[general]
refresh_rate = 1.0        # Poll interval in seconds (0.5-10)
history_length = 300      # Samples to keep for sparkline/analysis

[display]
theme = "dark"            # dark, light, nord, catppuccin, dracula, solarized
layout = "auto"           # auto, full, compact, mini
show_other_users = false
enable_kill = true
phase_indicator = true

[alerts]
oom_warning_seconds = 300
temp_warning_celsius = 85
oom_min_confidence = 0.3

[scraping]
enable = true
rate_limit_seconds = 5
timeout_ms = 500
allowed_ports = [8000, 8080, 11434, 3000]

[oom_prediction]
algorithm = "variance"    # variance or pelt
min_sustained_samples = 10
min_rate_mb_per_sec = 1.0
```
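A linear time-to-OOM estimate in the spirit of the `variance` algorithm can be sketched as follows. This is our own simplification that mirrors the config knobs above, not vramtop's implementation:

```python
def time_to_oom_seconds(samples_mb, total_mb, interval_s=1.0,
                        min_sustained_samples=10, min_rate_mb_per_sec=1.0):
    """Estimate seconds until VRAM is exhausted by linear extrapolation.

    Returns None when there is no sustained growth to extrapolate from.
    Illustrative sketch only; parameter names mirror the config above.
    """
    if len(samples_mb) < min_sustained_samples:
        return None  # not enough history for a sustained trend
    window = samples_mb[-min_sustained_samples:]
    elapsed = (len(window) - 1) * interval_s
    rate = (window[-1] - window[0]) / elapsed  # MB per second
    if rate < min_rate_mb_per_sec:
        return None  # flat or shrinking: no OOM trend to warn about
    return (total_mb - window[-1]) / rate

# 100 MB/s growth toward a 24 GB card with 13100 MB of headroom left.
samples = [10000 + 100 * i for i in range(10)]
print(time_to_oom_seconds(samples, total_mb=24000))  # 131.0
```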
## Layout modes

vramtop auto-detects terminal size and picks the best layout:
| Mode | Min size | Shows |
|---|---|---|
| Full | 120x30 | GPU cards, memory bars, sparklines, process tables, phase badges |
| Compact | 80x20 | GPU cards, memory bars, process tables |
| Mini | 40x10 | Minimal GPU summary |
Override with `layout = "full"` in config, or let `"auto"` handle it.
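The auto-selection can be approximated from the table's minimum sizes. A sketch using the thresholds above (not vramtop's actual code):

```python
import shutil

# Minimum (columns, rows) per layout, from the table above, richest first.
LAYOUTS = [("full", 120, 30), ("compact", 80, 20), ("mini", 40, 10)]

def pick_layout(cols=None, rows=None):
    """Pick the richest layout that fits the terminal."""
    if cols is None or rows is None:
        size = shutil.get_terminal_size()
        cols, rows = size.columns, size.lines
    for name, min_cols, min_rows in LAYOUTS:
        if cols >= min_cols and rows >= min_rows:
            return name
    return "mini"  # fall back to the smallest layout

print(pick_layout(132, 43))  # full
print(pick_layout(80, 24))   # compact
```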
## How it works

vramtop reads GPU data through NVML (`nvidia-ml-py`) -- the same library that powers `nvidia-smi`. No root access needed, no kernel modules, no `LD_PRELOAD` hacks.
Data sources by tier:
- NVML (always) -- total/used/free VRAM, per-process memory, temperature, power, clocks
- `/proc` enrichment (always) -- framework detection from cmdline/maps, model files from fd, container from cgroup
- HTTP scraping (opt-in) -- KV cache metrics from inference server endpoints (localhost only, port-owner verified)
- Deep mode IPC (opt-in) -- PyTorch allocator stats via Unix domain socket
Security model:
- All `/proc` reads are same-UID only
- HTTP scraping enforces: localhost-only, port-owner verification, rate limiting, no redirects, 64 KB size cap
- Kill requires confirmation (SIGTERM first, SIGKILL only after 5s timeout)
- All external strings sanitized (ANSI/control char stripping, 256-char truncation)
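The sanitization step in the last bullet might look roughly like this: strip ANSI escape sequences and control characters, then truncate to 256 characters. A sketch under those stated rules, not vramtop's actual code:

```python
import re

# CSI escape sequences (e.g. color codes) and remaining C0/DEL control chars.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")
CTRL_RE = re.compile(r"[\x00-\x1f\x7f]")

def sanitize(text, max_len=256):
    """Strip ANSI/control characters from an untrusted string and truncate."""
    text = ANSI_RE.sub("", text)
    text = CTRL_RE.sub("", text)
    return text[:max_len]

print(sanitize("\x1b[31mpython\x1b[0m train.py\x07"))  # python train.py
```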
## Requirements

- Python 3.10+
- NVIDIA GPU with drivers installed
- Linux (uses `/proc` for enrichment)
## License
MIT