
Production-grade vLLM metrics monitoring TUI with persistent storage and Grafana-style visualizations


vllmtop

Production-grade TUI dashboard for monitoring vLLM inference servers. Real-time metrics, persistent storage, and Grafana-style visualizations in your terminal.

Features

  • Live Dashboard - Real-time KPI cards, gauge bars, and sparklines with 60s rolling history
  • Historical Explorer - Time-range analysis with interactive charts
  • Request Breakdown - Request completion outcomes and statistics
  • Cache & System Analytics - KV cache usage, prefix cache hit rates, and system metrics
  • Configurable Alerts - Threshold-based alert rules with persistent alert history
  • Multiple Graph Styles - Line, braille, and block charts (press g to cycle)
  • Multi-Server Monitoring - Monitor multiple vLLM instances from a single dashboard
  • Persistent Storage - All metrics saved to SQLite with automatic retention management
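The 60s rolling history behind the live sparklines can be modeled as a fixed-size window with one slot per poll. This is a minimal sketch, assuming one sample per poll interval and min-max scaling onto Unicode block glyphs. `RollingHistory` and `BLOCKS` are hypothetical names, not vllmtop's actual code.

```python
from collections import deque

# Eight block glyphs, lowest to highest (a common sparkline convention).
BLOCKS = "▁▂▃▄▅▆▇█"


class RollingHistory:
    """Hypothetical helper: keeps the last `window_s` seconds of samples."""

    def __init__(self, window_s: float = 60.0, poll_interval_s: float = 1.0) -> None:
        # One slot per poll; deque(maxlen=...) drops the oldest sample automatically.
        size = max(1, round(window_s / poll_interval_s))
        self.samples: deque[float] = deque(maxlen=size)

    def add(self, value: float) -> None:
        self.samples.append(value)

    def sparkline(self) -> str:
        if not self.samples:
            return ""
        lo, hi = min(self.samples), max(self.samples)
        if hi == lo:
            # Flat series: draw every point at the lowest level.
            return BLOCKS[0] * len(self.samples)
        top = len(BLOCKS) - 1
        # Min-max scale each sample onto the glyph index range [0, top].
        return "".join(BLOCKS[round((v - lo) / (hi - lo) * top)] for v in self.samples)


history = RollingHistory(window_s=5, poll_interval_s=1)
for v in range(7):
    history.add(v)
print(list(history.samples), history.sparkline())  # [2, 3, 4, 5, 6] ▁▃▅▆█
```

With a 1-second poll interval the default window holds 60 samples. A 2-second interval halves that, so the window still spans 60 seconds.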

Installation

Requires Python 3.11+.

From the root of a source checkout:

pip install .

For development (editable install with the dev extras):

pip install -e ".[dev]"

Quick Start

# Connect to a local vLLM server (default: http://localhost:8000)
vllmtop

# Connect to a remote server
vllmtop --url http://gpu-server:8000

# Use a config file
vllmtop --config config.yaml

# Custom poll interval (seconds) and retention (days)
vllmtop --url http://localhost:8000 --interval 2 --retention 60
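vLLM serves Prometheus metrics at `/metrics` on the same port as its API, and that endpoint is what a monitor like this polls. The sketch below performs one scrape using only the standard library. It assumes the plain-text exposition format and no `}` characters inside label values. `parse_metrics` and `fetch_metrics` are hypothetical names. For full-fidelity parsing, `prometheus_client.parser.text_string_to_metric_families` is a sturdier choice.

```python
import re
import urllib.request

# "name value" or "name{labels} value [timestamp]" from the Prometheus text format.
# Simplification: assumes no "}" inside label values.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_metrics(text: str, prefix: str = "vllm:") -> dict[str, float]:
    """Map each series (name plus raw label set) to its value, keeping only `prefix` metrics."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        m = SAMPLE_RE.match(line)
        if m and m.group(1).startswith(prefix):
            series = m.group(1) + (m.group(2) or "")
            samples[series] = float(m.group(3))  # float() also accepts NaN and +Inf
    return samples


def fetch_metrics(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict[str, float]:
    """Scrape the server's Prometheus endpoint once."""
    url = base_url.rstrip("/") + "/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, `fetch_metrics("http://gpu-server:8000")` returns series keyed like `vllm:num_requests_running{...}`. Exact metric and label names vary between vLLM versions.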

Configuration

Copy the example config and customize it:

cp config.example.yaml config.yaml

Example config.yaml:

targets:
  - url: http://localhost:8000
    name: "GPU Server 1"

graph_style: "line"       # line, braille, or block
poll_interval: 1.0        # seconds
db_path: "./vllm_metrics.db"
retention_days: 30

alert_rules:
  - name: "KV Cache Critical"
    metric: "vllm:kv_cache_usage_perc"
    operator: ">"
    threshold: 90.0
    enabled: true

See config.example.yaml for the full configuration reference.
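Each `alert_rules` entry pairs a metric with a comparison operator and a threshold. This is a minimal evaluation sketch, assuming the latest values are keyed by bare metric name and an operator set beyond the `>` shown in the example. `AlertRule` and `firing_alerts` are hypothetical names, not vllmtop's API.

```python
import operator as op
from dataclasses import dataclass

# Assumed operator vocabulary; only ">" appears in the example config.
OPS = {">": op.gt, ">=": op.ge, "<": op.lt, "<=": op.le, "==": op.eq}


@dataclass
class AlertRule:
    name: str
    metric: str
    operator: str
    threshold: float
    enabled: bool = True


def firing_alerts(rules: list[AlertRule], latest: dict[str, float]) -> list[str]:
    """Names of enabled rules whose metric currently breaches its threshold."""
    fired = []
    for rule in rules:
        if not rule.enabled or rule.metric not in latest:
            continue  # rule switched off, or metric not reported yet
        if OPS[rule.operator](latest[rule.metric], rule.threshold):
            fired.append(rule.name)
    return fired
```

vLLM documents its KV-cache usage gauge as a fraction, where 1 means 100 percent. A threshold of 90.0 therefore presumably relies on vllmtop rescaling the value to a percentage. Compare against the value shown in the dashboard before relying on the rule.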

CLI Options

Option          Default                  Description
--url           http://localhost:8000    vLLM server URL
--db            ./vllm_metrics.db        SQLite database path
--retention     30                       Data retention in days
--interval      1.0                      Poll interval in seconds
--config        (none)                   Path to YAML config file
--graph-style   line                     Graph style: line, braille, or block

Keyboard Shortcuts

Key      Action
1-5      Switch tabs
g        Cycle graph style
s        Screenshot
ctrl+p   Command palette
q        Quit

Metrics Tracked

  • Requests - Running, waiting, swapped, queue time
  • Cache - KV cache usage, GPU cache usage, prefix cache hit rate
  • Tokens - Prompt tokens, generation tokens, totals
  • Latency - Time-to-first-token (TTFT), time-per-output-token (TPOT), end-to-end latency
  • Throughput - Prompt and generation throughput (tok/s)

License

MIT



Download files

Download the file for your platform.

Source Distribution

vllm_mon-0.1.0.tar.gz (18.1 MB)

Built Distribution

vllm_mon-0.1.0-py3-none-any.whl (45.4 kB)

File details

Details for the file vllm_mon-0.1.0.tar.gz.

File metadata

  • Download URL: vllm_mon-0.1.0.tar.gz
  • Size: 18.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for vllm_mon-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        2b082fb5742b118f0eae8ce4ae3d93801706593c93e258de71f3fbe1d2836d4b
MD5           cdca5d44f7d57c9ddb81e2058ee06776
BLAKE2b-256   39e1f8bce1a464a4933d4941523ca968253ca006eaf652811ca228fce71cc8e9


File details

Details for the file vllm_mon-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: vllm_mon-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 45.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for vllm_mon-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        13bc0e833a37804b48253ed664dd82585cd2aa2505bea778e13b53d5aa3c8193
MD5           05345b102d91c42e572728765e09a4bd
BLAKE2b-256   85f81701069546315d18409956659309d7e7e2ca30a29313e8f2355424a9f148
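The SHA256 digests listed above can be checked against a downloaded file before installing. This is a minimal sketch using Python's `hashlib`; the helper names are hypothetical. pip can also enforce digests itself through `--require-hashes` with `--hash=sha256:...` entries in a requirements file, and `pip hash <file>` prints a file's digest.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large archives are not loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example: check the wheel against its published SHA256 digest.
# verify("vllm_mon-0.1.0-py3-none-any.whl",
#        "13bc0e833a37804b48253ed664dd82585cd2aa2505bea778e13b53d5aa3c8193")
```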
