vllmpytop

A btop-style terminal UI for monitoring a running vLLM instance and its GPU in real time. Hand-rolled braille charts, a responsive curses layout, and a non-blocking background poller so the UI never stalls on network or NVML latency.

╭─┐¹ gpu ┌──────────────────────────────────────────────────────────────────╮
│ NVIDIA GeForce RTX 5090   util  86%   50°C  319/600W  SM 2857MHz  fan 31%   │
│ ⣿⣿⣿⣿ … braille utilisation chart …                                          │
│ VRAM 27.3GB/31.8GB ████████████████████░░░  86%                             │
│ PWR  ████████████░░░░░░░░░░░░░░░░░░░░░░░░░░  53%                             │
╰─────────────────────────────────────────────────────────────────────────────╯
╭─┐² throughput ┌────────────╮╭─┐³ requests ┌──────────────╮
│ gen 149 tok/s   ⣀⣤⣶⣿        ││ running ████░░ 1            │
│ prompt 3.5k tok/s          ││ waiting ██████ 2            │
╰─┘ tok/s └──────────────────╯╰─┘ queue └───────────────────╯
╭─┐⁴ latency ┌───────────────╮╭─┐⁵ cache ┌──────────────────╮
│ TTFT 964ms  TPOT 6ms …     ││ KV █░░░  3%  prefix 0.0%    │
╰─┘ recent avg └─────────────╯╰─────────────────────────────╯

Rounded corners, superscript panel numbers in the title tabs, and a secondary label on the bottom edge — matching btop's box style.

What it shows

  • GPU (via NVML / pynvml): utilisation %, VRAM used/total, temperature, power draw vs. limit, SM clock, fan — with green/yellow/red thresholds.
  • Throughput: generation tok/s and prompt tok/s (rates derived from vLLM counters), as big numbers + braille charts.
  • Requests / Queue: running vs. waiting requests and preemptions.
  • Latency (recent average over the last poll interval — far more useful live than the cumulative average): TTFT, inter-token (TPOT), end-to-end, queue time.
  • Cache: KV-cache usage % and prefix-cache hit rate.

Data comes from vLLM's Prometheus /metrics endpoint plus in-process NVML polling. If vLLM goes away (e.g. a container restart) the UI shows a disconnect banner and keeps the GPU panel live, then reconnects automatically.
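The disconnect behaviour described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `Poller` class, its method names, the snapshot keys, and the injected `fetch_vllm` / `fetch_gpu` callables are all hypothetical. The only detail taken from this README is that /metrics is fetched with stdlib urllib, whose errors are subclasses of `OSError`.

```python
import threading
import time
import urllib.request
from typing import Any, Callable


def fetch_metrics_text(base_url: str, timeout: float = 2.0) -> str:
    """Fetch the raw Prometheus text from <base_url>/metrics (hypothetical helper)."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


class Poller:
    """Hypothetical background poller that survives vLLM outages."""

    def __init__(self, fetch_vllm: Callable[[], Any], fetch_gpu: Callable[[], Any],
                 interval: float = 1.0) -> None:
        self._fetch_vllm = fetch_vllm
        self._fetch_gpu = fetch_gpu
        self._interval = interval
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._latest = {"ts": 0.0, "connected": False, "vllm": None, "gpu": None}

    def poll_once(self) -> None:
        # GPU stats are collected independently, so a vLLM outage cannot blank them.
        try:
            gpu = self._fetch_gpu()
        except Exception:  # broad on purpose in this sketch; NVML errors vary
            gpu = None
        try:
            vllm, connected = self._fetch_vllm(), True
        except OSError:  # urllib.error.URLError and timeouts subclass OSError
            vllm, connected = None, False
        snapshot = {"ts": time.time(), "connected": connected, "vllm": vllm, "gpu": gpu}
        with self._lock:  # swap the whole snapshot in one short critical section
            self._latest = snapshot

    def latest(self) -> dict:
        with self._lock:
            return self._latest

    def run(self) -> None:
        # Retrying on every tick is what makes the reconnect automatic.
        while not self._stop.is_set():
            self.poll_once()
            self._stop.wait(self._interval)

    def start(self) -> threading.Thread:
        thread = threading.Thread(target=self.run, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()
```

A UI loop built on this sketch would call `latest()` on each tick and draw a banner whenever `connected` is false, while still rendering the `gpu` part of the snapshot.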

Install

Requires Python 3.10+ on Linux (curses is stdlib). A working NVIDIA driver is needed for the GPU panel.

pip install vllmpytop
# or from a checkout:
pip install .
# or, for development:
pip install -e ".[dev]"

This installs two equivalent commands — vllmpytop and the shorter alias vllmtop.

Dependencies: nvidia-ml-py (NVML bindings) and prometheus-client (exposition parser). The /metrics fetch uses stdlib urllib.

Usage

vllmpytop                            # monitor http://localhost:8000
vllmpytop --url http://host:8000     # a remote vLLM server
vllmpytop --interval 0.5             # poll twice a second
vllmpytop --no-gpu                   # skip the GPU panel
vllmtop                              # 'vllmtop' is an equivalent alias
python -m vllmpytop                  # same thing, without the entry point

The server URL can also be set via the VLLMTOP_URL environment variable.
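One common way to wire an environment-variable fallback into a command-line flag is shown below. It is only a sketch: `build_parser` and `DEFAULT_URL` are hypothetical names, and the precedence it implements (explicit `--url`, then `VLLMTOP_URL`, then the built-in default) is an assumption, since this README does not state which source wins.

```python
import argparse
import os
from typing import Mapping, Optional

DEFAULT_URL = "http://localhost:8000"  # default taken from the usage examples


def build_parser(env: Optional[Mapping[str, str]] = None) -> argparse.ArgumentParser:
    """Build a parser whose --url default comes from VLLMTOP_URL when it is set."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="vllmpytop")
    parser.add_argument(
        "--url",
        default=env.get("VLLMTOP_URL", DEFAULT_URL),
        help="vLLM base URL (env VLLMTOP_URL)",
    )
    return parser


# Assumed precedence: an explicit flag overrides the environment variable.
args = build_parser({"VLLMTOP_URL": "http://gpu-box:8000"}).parse_args([])
print(args.url)
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.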

Options

Flag          Default                 Description
--url         http://localhost:8000   vLLM base URL (env VLLMTOP_URL)
--interval    1.0                     poll interval in seconds
--gpu-index   0                       NVML GPU index
--no-gpu      off                     disable the GPU panel
--dump-json   off                     print a snapshot as JSON and exit (no TTY needed)

Keybindings

Key        Action
q / Esc    quit
+ / -      faster / slower refresh
p          pause / resume polling
1-5        toggle a panel on/off (¹gpu ²throughput ³requests ⁴latency ⁵cache)
h / ?      toggle help overlay

Each panel's title carries a superscript number (btop-style) showing the key that toggles it. Hiding panels reflows the rest to fill the freed space.

Headless smoke test

--dump-json collects two snapshots an interval apart (so rates are populated), prints the result as JSON, and exits. Works without a TTY — handy for CI or verifying connectivity:

python -m vllmpytop --dump-json --url http://localhost:8000
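For a CI job, the same command can be wrapped in a small check. The sketch below makes two assumptions that this README does not confirm: that the JSON printed to stdout is a single top-level object, and that a failed run shows up either as a non-zero exit code or as unparseable output. The function names are hypothetical, and no particular keys inside the snapshot are assumed.

```python
import json
import subprocess
import sys


def parse_snapshot(stdout: str) -> dict:
    """Parse --dump-json output; raise ValueError if it is not a JSON object."""
    data = json.loads(stdout)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


def smoke_test(url: str = "http://localhost:8000", timeout: float = 30.0) -> dict:
    """Run the headless dump once and return the parsed snapshot."""
    proc = subprocess.run(
        [sys.executable, "-m", "vllmpytop", "--dump-json", "--url", url],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return parse_snapshot(proc.stdout)
```

Calling `smoke_test()` from a test or a CI script then either returns the snapshot or raises, which is usually enough to fail the job.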

How it works

  • A background poller thread scrapes /metrics and polls NVML once per poll interval (--interval seconds), storing the latest combined snapshot under a lock. This keeps all I/O latency off the render path.
  • The UI loop wakes on a short tick (250 ms), reads the latest snapshot, appends derived values (rates, recent-average latencies) to per-series ring buffers, and redraws — so render cadence is independent of poll cadence.
  • Counters → rates: Δvalue / Δt, guarded against Δt ≤ 0 and counter resets. Histograms → recent average: Δsum / Δcount between polls.
  • Braille charts: each character cell is a 2×4 Unicode braille dot matrix, so a chart w cells wide and h cells tall resolves 2w × 4h dots, which gives the smooth btop look.
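The derived-value math and the braille packing from the list above can be sketched as follows. The function names are hypothetical and the chart is drawn as a simple filled bar graph, which may differ from the project's actual renderer. The bit layout is the standard Unicode one: braille patterns start at U+2800, with one bit per dot.

```python
from typing import Optional, Sequence


def rate(prev: float, curr: float, dt: float) -> Optional[float]:
    """Counter -> per-second rate; None when dt <= 0 or the counter was reset."""
    if dt <= 0 or curr < prev:
        return None
    return (curr - prev) / dt


def recent_average(prev_sum: float, curr_sum: float,
                   prev_count: float, curr_count: float) -> Optional[float]:
    """Histogram -> mean of the observations made between two polls."""
    d_sum = curr_sum - prev_sum
    d_count = curr_count - prev_count
    if d_count <= 0 or d_sum < 0:  # no new samples, or the histogram was reset
        return None
    return d_sum / d_count


# Bit for each dot, indexed as DOT_BITS[column][row] with row 0 at the top.
DOT_BITS = ((0x01, 0x02, 0x04, 0x40), (0x08, 0x10, 0x20, 0x80))


def braille_chart(values: Sequence[float], height: int) -> list[str]:
    """Render values in [0, 1] as a bar chart, two values per character cell."""
    dots_high = 4 * height
    width = (len(values) + 1) // 2
    cells = [[0] * width for _ in range(height)]
    for x, value in enumerate(values):
        clamped = max(0.0, min(1.0, value))
        filled = int(clamped * dots_high + 0.5)  # round half up
        for k in range(filled):  # k = 0 is the bottom dot row
            y = dots_high - 1 - k  # dot row counted from the top
            cells[y // 4][x // 2] |= DOT_BITS[x % 2][y % 4]
    return ["".join(chr(0x2800 + bits) for bits in row) for row in cells]


print(rate(100, 250, 1.0))
print("\n".join(braille_chart([0.1, 0.3, 0.5, 0.7, 0.9, 1.0], 2)))
```

Returning None for an invalid delta lets the caller skip that sample instead of plotting a spurious spike after a server restart.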

Development

pytest        # parser-against-fixture, rate math, braille rendering

License

MIT — see LICENSE.
