Production-grade vLLM metrics monitoring TUI with persistent storage and Grafana-style visualizations
vllmtop
Production-grade TUI dashboard for monitoring vLLM inference servers. Real-time metrics, persistent storage, and Grafana-style visualizations in your terminal.
Features
- Live Dashboard - Real-time KPI cards, gauge bars, and sparklines with 60s rolling history
- Historical Explorer - Time-range analysis with interactive charts
- Request Breakdown - Request completion outcomes and statistics
- Cache & System Analytics - KV cache usage, prefix cache hit rates, and system metrics
- Configurable Alerts - Threshold-based alert rules with persistent alert history
- Multiple Graph Styles - Line, braille, and block charts (press g to cycle)
- Multi-Server Monitoring - Monitor multiple vLLM instances from a single dashboard
- Persistent Storage - All metrics saved to SQLite with automatic retention management
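To illustrate what "automatic retention management" on top of SQLite can look like, here is a minimal sketch using only the standard library. The `samples` table, its columns, and the `open_store`, `record`, and `prune` helpers are hypothetical names chosen for this example; they are not taken from vllmtop's actual schema or code.

```python
import sqlite3
import time

SECONDS_PER_DAY = 86400


def open_store(db_path=":memory:"):
    """Open (or create) a metrics database with one hypothetical samples table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        " ts REAL NOT NULL,"      # Unix timestamp of the poll
        " target TEXT NOT NULL,"  # server name or URL
        " metric TEXT NOT NULL,"  # e.g. vllm:kv_cache_usage_perc
        " value REAL NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_samples_ts ON samples (ts)")
    return conn


def record(conn, target, metric, value, ts=None):
    """Insert one sample; ts defaults to the current time."""
    conn.execute(
        "INSERT INTO samples (ts, target, metric, value) VALUES (?, ?, ?, ?)",
        (time.time() if ts is None else ts, target, metric, value),
    )
    conn.commit()


def prune(conn, retention_days, now=None):
    """Delete samples older than retention_days and return how many were removed."""
    reference = time.time() if now is None else now
    cutoff = reference - retention_days * SECONDS_PER_DAY
    cursor = conn.execute("DELETE FROM samples WHERE ts < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount
```

Calling `prune(conn, 30)` periodically (for example once per poll cycle or once per hour) would keep the database bounded in the spirit of the `--retention` option; how often vllmtop itself prunes is not stated here.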
Installation
Requires Python 3.11+.
pip install .
For development:
pip install -e ".[dev]"
Quick Start
# Connect to a local vLLM server (default: http://localhost:8000)
vllmtop
# Connect to a remote server
vllmtop --url http://gpu-server:8000
# Use a config file
vllmtop --config config.yaml
# Custom poll interval and retention
vllmtop --url http://localhost:8000 --interval 2 --retention 60
Configuration
Copy the example config and customize:
cp config.example.yaml config.yaml
targets:
  - url: http://localhost:8000
    name: "GPU Server 1"
graph_style: "line" # line, braille, or block
poll_interval: 1.0 # seconds
db_path: "./vllm_metrics.db"
retention_days: 30
alert_rules:
  - name: "KV Cache Critical"
    metric: "vllm:kv_cache_usage_perc"
    operator: ">"
    threshold: 90.0
    enabled: true
See config.example.yaml for the full configuration reference.
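As a rough illustration of how a threshold rule like the one above could be evaluated against a polled sample, consider the sketch below. The `AlertRule` dataclass mirrors the fields of the example config, but the class itself, the `firing_rules` helper, and the set of operators other than `>` are assumptions made for this example, not vllmtop's implementation. It also assumes the sampled value is on the same scale as the configured threshold.

```python
import operator as op
from dataclasses import dataclass

# Comparison operators a rule may name. Only ">" appears in the example
# config; the rest are assumed here for illustration.
_OPERATORS = {
    ">": op.gt,
    ">=": op.ge,
    "<": op.lt,
    "<=": op.le,
    "==": op.eq,
    "!=": op.ne,
}


@dataclass
class AlertRule:
    """Hypothetical in-memory form of one alert_rules entry."""
    name: str
    metric: str
    operator: str
    threshold: float
    enabled: bool = True


def firing_rules(rules, sample):
    """Return the names of enabled rules whose condition holds for this sample.

    sample maps metric names to their latest values; rules whose metric is
    missing from the sample are skipped rather than treated as firing.
    """
    fired = []
    for rule in rules:
        if not rule.enabled or rule.metric not in sample:
            continue
        compare = _OPERATORS[rule.operator]
        if compare(sample[rule.metric], rule.threshold):
            fired.append(rule.name)
    return fired
```

For example, with the "KV Cache Critical" rule loaded, `firing_rules(rules, {"vllm:kv_cache_usage_perc": 95.0})` returns that rule's name, while a sample of `50.0` returns an empty list.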
CLI Options
| Option | Default | Description |
|---|---|---|
| --url | http://localhost:8000 | vLLM server URL |
| --db | ./vllm_metrics.db | SQLite database path |
| --retention | 30 | Data retention in days |
| --interval | 1.0 | Poll interval in seconds |
| --config | - | Path to YAML config file |
| --graph-style | line | Graph style: line, braille, or block |
Keyboard Shortcuts
| Key | Action |
|---|---|
| 1-5 | Switch tabs |
| g | Cycle graph style |
| s | Screenshot |
| ctrl+p | Command palette |
| q | Quit |
Metrics Tracked
- Requests - Running, waiting, swapped, queue time
- Cache - KV cache usage, GPU cache usage, prefix cache hit rate
- Tokens - Prompt tokens, generation tokens, totals
- Latency - Time-to-first-token (TTFT), time-per-output-token (TPOT), end-to-end latency
- Throughput - Prompt and generation throughput (tok/s)
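vLLM exposes these metrics in Prometheus text format on its `/metrics` endpoint. The sketch below shows one way such text could be reduced to a name-to-value mapping, and how a tok/s throughput figure could be derived from two successive counter readings. `parse_metrics` and `rate` are hypothetical helpers written for this example; summing samples across label sets is a simplification that suits gauges and counters but not histogram buckets.

```python
import re

# metric name, optional {labels}, then the sample value
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)")


def parse_metrics(text):
    """Sum sample values per metric name, ignoring labels and comment lines."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and # HELP / # TYPE lines
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, _labels, raw = match.groups()
        try:
            value = float(raw)
        except ValueError:
            continue  # not a numeric sample; ignore it
        totals[name] = totals.get(name, 0.0) + value
    return totals


def rate(previous, current, interval_s):
    """Per-second rate between two counter readings.

    Returns None when the interval is not positive or the counter went
    backwards, which usually means the server restarted.
    """
    if interval_s <= 0 or current < previous:
        return None
    return (current - previous) / interval_s
```

With a 2-second poll interval, a generation token counter that moves from 1000 to 1200 gives `rate(1000.0, 1200.0, 2.0)`, that is 100 tok/s.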
License
MIT