Lightweight monitoring dashboard for vLLM inference servers

⚡ vLLM Metrics Monitor

Real-time monitoring dashboard for vLLM inference servers.

Dashboard Preview

✨ Features

  • 📊 9 real-time time-series charts — Running/Waiting Requests, Requests/s, Output/Input Tokens/s, KV Cache, Cache Hit Rate, Latency (TTFT/ITL/E2E), Per-Engine Requests
  • 🃏 Live status cards — Key metrics at a glance
  • 🕐 Selectable time range — 15m / 1h / 6h / 24h
  • 💾 SQLite persistence — Data stored at ~/.vmm/data.db, survives restarts
  • Zero external Python dependencies — Standard library only
  • 🐳 Per-engine breakdown — Individual engine status table and chart

🚀 Installation

Recommended — uv:

uv tool install vllm-metrics-monitor

Or with pip:

pip install vllm-metrics-monitor

📖 Usage

# Start monitoring
vmm http://your-vllm:8000/metrics

Open http://localhost:8080 in your browser.

vmm [URL] [OPTIONS]

Positional:
  URL                   vLLM Prometheus metrics endpoint
                        (default: http://localhost:8000/metrics)

Options:
  -p, --port PORT       Dashboard HTTP port (default: 8080)
  -i, --interval SEC    Scrape interval in seconds (default: 3)
  --retention HOURS     Data retention period (default: 24)
  --db PATH             SQLite database path (default: ~/.vmm/data.db)
  --reset               Delete existing database and start fresh
  --debug               Enable debug logging

Examples

# Basic
vmm http://vllm-server:8000/metrics

# Custom port and slower scrape
vmm http://vllm-server:8000/metrics -p 9090 -i 5

# Fresh start
vmm http://vllm-server:8000/metrics --reset

# Longer retention with custom db path
vmm http://vllm-server:8000/metrics --retention 72 --db /data/vmm.db

🏗️ Architecture

graph LR
    A[vLLM /metrics] -->|scrape every 3s| B[vmm]
    B --> C[Scraper Thread]
    C --> D[(SQLite<br/>~/.vmm/data.db)]
    B --> E[HTTP Server]
    E -->|JSON API| F[Browser<br/>Chart.js Dashboard]
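
The scraper-thread-to-SQLite leg of the diagram can be illustrated with the standard library alone. This is a minimal sketch, not vmm's actual code: the `samples` table schema and the names `init_db` and `scrape_loop` are hypothetical, and `fetch` stands in for whatever function reads and parses the `/metrics` endpoint.

```python
import sqlite3
import threading
import time
from typing import Callable, Dict


def init_db(path: str) -> None:
    """Create a simple samples table (hypothetical schema, not vmm's)."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS samples (ts REAL, name TEXT, value REAL)"
        )
        conn.commit()
    finally:
        conn.close()


def scrape_loop(
    fetch: Callable[[], Dict[str, float]],
    db_path: str,
    interval: float,
    stop: threading.Event,
) -> None:
    """Call fetch() every `interval` seconds and persist each sample."""
    # Open the connection inside the thread that uses it, since sqlite3
    # connections are bound to their creating thread by default.
    conn = sqlite3.connect(db_path)
    try:
        while not stop.is_set():
            now = time.time()
            rows = [(now, name, value) for name, value in fetch().items()]
            with conn:  # commits on success, rolls back on error
                conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
            # Event.wait doubles as an interruptible sleep.
            stop.wait(interval)
    finally:
        conn.close()
```

A caller would start `scrape_loop` in a `threading.Thread` and set the `stop` event on shutdown; the HTTP server side would then read from the same database file with its own connection.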

🔌 API

Endpoint                    Description
GET /                       Dashboard UI
GET /api/current            Latest metrics snapshot with computed rates
GET /api/history?minutes=N  Time-series data for the last N minutes
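
The JSON endpoints can be queried from a script as well as from the dashboard. The sketch below assumes the dashboard is running on the default port 8080 and that the responses are JSON documents; the exact response fields are not documented here, so the code only decodes and prints them. The helper names `history_url` and `fetch_json` are illustrative, not part of vmm.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def history_url(base: str, minutes: int) -> str:
    """Build the /api/history URL for the last `minutes` minutes."""
    return f"{base.rstrip('/')}/api/history?{urlencode({'minutes': minutes})}"


def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the body as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    base = "http://localhost:8080"  # assumed: dashboard on the default port
    print(fetch_json(f"{base}/api/current"))
    print(fetch_json(history_url(base, 60)))
```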

📈 Monitored Metrics

Metric            Source                                Type
Running Requests  vllm:num_requests_running             Gauge
Waiting Requests  vllm:num_requests_waiting             Gauge
KV Cache Usage    vllm:kv_cache_usage_perc              Gauge
Cache Hit Rate    prompt_tokens_cached / prompt_tokens  Derived
Requests/s        vllm:request_success_total delta      Counter rate
Output Tokens/s   vllm:generation_tokens_total delta    Counter rate
Input Tokens/s    vllm:prompt_tokens_total delta        Counter rate
TTFT              time_to_first_token_seconds           Histogram avg
ITL               inter_token_latency_seconds           Histogram avg
E2E Latency       e2e_request_latency_seconds           Histogram avg
Uptime            process_start_time_seconds            Gauge
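
The "Counter rate", "Histogram avg", and "Derived" types above can be computed from two consecutive scrapes. The following is a sketch of one common way to do that, not necessarily how vmm does it: it assumes samples carry no trailing timestamp, sums series across labels (for example across engines), and averages a histogram over the scrape interval using the deltas of its `_sum` and `_count` series. All function names are hypothetical.

```python
from typing import Dict, Optional


def parse_samples(text: str) -> Dict[str, float]:
    """Sum sample values per metric name from Prometheus text, ignoring labels."""
    totals: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        # Assumes "name{labels} value" with no trailing timestamp.
        name_part, _, value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def counter_rate(prev: float, curr: float, seconds: float) -> float:
    """Per-second rate of a counter; returns 0.0 if the counter was reset."""
    delta = curr - prev
    if seconds <= 0 or delta < 0:
        return 0.0
    return delta / seconds


def histogram_avg(
    prev_sum: float, curr_sum: float, prev_count: float, curr_count: float
) -> Optional[float]:
    """Mean observation over the interval; None if nothing was observed."""
    observed = curr_count - prev_count
    if observed <= 0:
        return None
    return (curr_sum - prev_sum) / observed


def hit_rate(cached_tokens: float, prompt_tokens: float) -> float:
    """Cache hit rate as cached prompt tokens over all prompt tokens."""
    return cached_tokens / prompt_tokens if prompt_tokens > 0 else 0.0
```

With a 3-second scrape interval, a `vllm:generation_tokens_total` that moves from 100 to 160 between scrapes gives `counter_rate(100, 160, 3)`, or 20 output tokens per second.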

🛠️ Development

git clone https://github.com/zjxszzzcb/vllm-metrics-monitor.git
cd vllm-metrics-monitor
uv venv && uv pip install -e .

# Run in dev mode
vmm http://your-vllm:8000/metrics --debug

License

MIT
