A btop-style terminal UI for monitoring a vLLM instance and its GPU in real time.

vllmtop/vllmpytop

Inspired by the excellent TUI style and functionality of btop, vllmtop is a CLI resource monitor that tracks a vLLM instance and its GPU in real time. It offers simple braille charts, a responsive curses layout, and a non-blocking background poller, so the UI never stalls on network or NVML latency.

Quickstart

pip install vllmpytop    # install from pypi
vllmtop                  # or vllmpytop

What it shows

  • GPU (via NVML / pynvml): utilisation %, VRAM used/total, temperature, power draw vs. limit, SM clock, fan — with green/yellow/red thresholds. Its chart is a btop-style mirrored graph: GPU utilisation grows up from a centre line, and the request count grows down from it as a stacked two-band series — running (green) nearest the centre, waiting (magenta) beyond. The same panel folds in a compact vLLM summary: the served model, uptime, KV-cache precision (cache_dtype), requests served, prefix-caching on/off, KV blocks, GPU-memory target, and engine awake/sleeping state.
  • Throughput: generation tok/s and prompt tok/s (rates derived from vLLM counters), as a mirrored chart in btop's network colours — gen (purple) grows up from the centre line, prompt/prefill (pink) grows down — each labelled with its current value.
  • Requests / Queue: running, waiting, and KV-cache-usage bars, and — when a log source is configured (--docker <container> or --log-file <path>) — a live feed beneath them of the HTTP calls vLLM serves: age, status, method, endpoint, client, newest first (like btop's process list). This is the request envelope only — vLLM doesn't log prompt/response text unless started with --enable-log-requests, so no prompt text is shown.
  • Perf (recent average over the last poll interval — far more useful live than the cumulative average): TTFT, inter-token (TPOT), end-to-end, and queue latencies, plus the KV-cache usage chart and prefix-cache hit rate.
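The rates and recent averages in the list above both come from differencing two consecutive samples. A minimal sketch of that arithmetic follows; the function names are hypothetical, and returning None for an unusable sample is an assumption about how such cases could be handled, not a description of vllmtop's internals.

```python
from __future__ import annotations


def counter_rate(prev_value: float, curr_value: float,
                 prev_time: float, curr_time: float) -> float | None:
    """Per-second rate between two samples of a monotonically increasing counter.

    Returns None when the samples cannot yield a meaningful rate:
    a non-positive time delta, or a counter that went backwards
    (for example after a server restart).
    """
    dt = curr_time - prev_time
    dv = curr_value - prev_value
    if dt <= 0 or dv < 0:
        return None
    return dv / dt


def recent_average(prev_sum: float, curr_sum: float,
                   prev_count: float, curr_count: float) -> float | None:
    """Mean of the observations recorded between two histogram samples.

    Uses the histogram's running sum and count: delta(sum) / delta(count).
    Returns None when no new observations arrived or the series was reset.
    """
    d_count = curr_count - prev_count
    d_sum = curr_sum - prev_sum
    if d_count <= 0 or d_sum < 0:
        return None
    return d_sum / d_count


# 600 new tokens over one second -> 600 tok/s
print(counter_rate(1000, 1600, 10.0, 11.0))
# 4 new requests adding 2.0 s of total latency -> 0.5 s recent average
print(recent_average(10.0, 12.0, 20, 24))
```

Differencing the histogram's sum and count gives the mean over just the last interval, which is why it reacts to load changes faster than the cumulative sum / count.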

Data comes from vLLM's Prometheus /metrics endpoint plus in-process NVML polling. If vLLM goes away (e.g. a container restart) the UI shows a disconnect banner and keeps the GPU panel live, then reconnects automatically.
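One way to get this behaviour is a background thread that stores the last good snapshot under a lock and only flips a connected flag when a fetch fails. The sketch below illustrates that idea under stated assumptions: the Poller class, its fetch callable, and the (snapshot, connected) return shape are all hypothetical and not taken from vllmtop's source.

```python
import threading


class Poller:
    """Calls fetch() every `interval` seconds on a background thread.

    `fetch` is any zero-argument callable returning a snapshot; it is
    expected to raise OSError (urllib's URLError is a subclass) when
    the server is unreachable.
    """

    def __init__(self, fetch, interval=1.0):
        self._fetch = fetch
        self._interval = interval
        self._lock = threading.Lock()
        self._snapshot = None
        self._connected = False
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def poll_once(self):
        """Run one fetch; keep the previous snapshot if it fails."""
        try:
            data = self._fetch()
        except OSError:
            with self._lock:
                self._connected = False  # UI can show a disconnect banner
            return
        with self._lock:
            self._snapshot = data
            self._connected = True       # next success reconnects automatically

    def _run(self):
        while not self._stop.is_set():
            self.poll_once()
            self._stop.wait(self._interval)  # sleep, but wake early on stop()

    def latest(self):
        """Return (snapshot, connected) without blocking on any I/O."""
        with self._lock:
            return self._snapshot, self._connected
```

Because the render loop only ever calls latest(), a slow or failing fetch delays the next snapshot but never the redraw.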

Install

Available on PyPI: pypi.org/project/vllmpytop.

Requires Python 3.10+ on Linux (curses is stdlib). A working NVIDIA driver is needed for the GPU panel.

Install from PyPI

pip install vllmpytop

Install locally

# from a checkout:
pip install .

# or, for development (editable install with the dev extras):
pip install -e ".[dev]"

This installs two equivalent commands — vllmpytop and the shorter alias vllmtop.

Dependencies: nvidia-ml-py (NVML bindings) and prometheus-client (exposition parser). The /metrics fetch uses stdlib urllib.

Usage

vllmtop                            # monitor http://localhost:8000
vllmtop --url http://host:8000     # a remote vLLM server
vllmtop --interval 0.5             # poll twice a second
vllmtop --no-gpu                   # skip the GPU panel
vllmtop --docker vllm-server       # + call feed in the requests panel (docker logs)
vllmtop --log-file /var/log/vllm.log   # + call feed from a log file
python -m vllmpytop                # same thing, without the entry point

The server URL can also be set via the VLLMTOP_URL environment variable.
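A common way to wire a flag to an environment variable with argparse is to use the variable as the flag's default. The sketch below assumes the precedence is flag, then VLLMTOP_URL, then the built-in default; the README does not state the order, and build_parser is a hypothetical helper rather than vllmtop's actual code.

```python
import argparse
import os

DEFAULT_URL = "http://localhost:8000"


def build_parser(env=None):
    """Build a parser whose --url default falls back to VLLMTOP_URL.

    `env` is injectable for testing; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="vllmtop")
    parser.add_argument(
        "--url",
        default=env.get("VLLMTOP_URL", DEFAULT_URL),
        help="vLLM base URL (env VLLMTOP_URL)",
    )
    return parser


# An explicit flag wins over the environment in this sketch.
args = build_parser({"VLLMTOP_URL": "http://gpu-box:8000"}).parse_args(
    ["--url", "http://other:9000"]
)
print(args.url)
```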

Options

Flag         Default                 Description
--url        http://localhost:8000   vLLM base URL (env VLLMTOP_URL)
--interval   1.0                     poll interval in seconds
--gpu-index  0                       NVML GPU index
--no-gpu     off                     disable the GPU panel
--docker                             stream docker logs -f <container> for the requests call feed
--log-file                           tail this access-log file for the requests call feed (env VLLMTOP_LOG_FILE)
--dump-json  off                     print one snapshot as JSON and exit (no TTY needed)

Keybindings

Key       Action
q / Esc   quit
+ / -     faster / slower refresh
p         pause / resume polling
1-4       toggle a panel on/off (¹gpu ²throughput ³requests ⁴perf)
h / ?     toggle help overlay

Each panel's title carries a superscript number (btop-style) showing the key that toggles it. Hiding panels reflows the rest to fill the freed space.

Headless smoke test

--dump-json collects two snapshots an interval apart (so rates are populated), prints the result as JSON, and exits. Works without a TTY — handy for CI or verifying connectivity:

python -m vllmpytop --dump-json --url http://localhost:8000

How it works

  • A background poller thread scrapes /metrics and polls NVML every interval seconds, storing the latest combined snapshot under a lock. This keeps all I/O latency off the render path.
  • The UI loop wakes on a short tick (250 ms), reads the latest snapshot, appends derived values (rates, recent-average latencies) to per-series ring buffers, and redraws — so render cadence is independent of poll cadence.
  • Counters → rates: Δvalue / Δt, guarded against Δt ≤ 0 and counter resets. Histograms → recent average: Δsum / Δcount between polls.
  • Braille charts: each cell is a 2×4 Unicode braille dot matrix, giving 2w × 4h-dot resolution for the smooth btop look.
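The braille technique in the last bullet can be sketched as follows. Unicode braille patterns start at U+2800, and each of the eight dots in a cell maps to one bit of the code point offset. The braille_chart function and its 0..1 value range are illustrative assumptions, not vllmtop's renderer.

```python
# Bit for each dot, indexed [row][column], with row 0 at the top of the cell.
DOT_BITS = (
    (0x01, 0x08),
    (0x02, 0x10),
    (0x04, 0x20),
    (0x40, 0x80),
)
BRAILLE_BASE = 0x2800  # code point of the empty braille cell


def braille_chart(values, height):
    """Render values in 0..1 as a filled chart, one value per dot column.

    Returns `height` strings (top row first). Each character covers two
    values horizontally and four dot rows vertically.
    """
    dot_rows = height * 4
    width = (len(values) + 1) // 2
    cells = [[0] * width for _ in range(height)]
    for x, value in enumerate(values):
        clamped = min(max(value, 0.0), 1.0)
        filled = round(clamped * dot_rows)
        for level in range(filled):       # level 0 is the bottom dot row
            y = dot_rows - 1 - level      # dot row counted from the top
            cells[y // 4][x // 2] |= DOT_BITS[y % 4][x % 2]
    return ["".join(chr(BRAILLE_BASE + bits) for bits in row) for row in cells]


# Eight samples fit in four characters; two text rows give eight dot rows.
for line in braille_chart([0.1, 0.3, 0.5, 0.7, 0.9, 1.0, 0.6, 0.2], height=2):
    print(line)
```

A mirrored chart like the ones described earlier could be built from the same bit table by filling dots downward from the top row for the second series.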

Development

pytest        # parser-against-fixture, rate math, braille rendering

License

MIT — see LICENSE.
