Diagnostic tool for vLLM inference servers
Diagnose vLLM serving issues from /metrics.
vLLM Doctor reads production metrics and turns them into operational findings: what looks wrong, how confident the diagnosis is, and which vLLM knobs are worth checking first.
```shell
vllm-doctor --url http://localhost:8000/metrics
```
vLLM Doctor is not a dashboard replacement. It is a fast diagnostic snapshot for a single server or Prometheus target.
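The first step of any such diagnosis is reading a `/metrics` scrape. Below is a minimal sketch that parses the Prometheus text exposition format into `(name, labels, value)` samples and sums the gauges shown in the example output. The metric names (`vllm:num_requests_running`, `vllm:num_requests_waiting`, `vllm:gpu_cache_usage_perc`) are assumptions based on what vLLM commonly exposes; names vary across vLLM versions (newer releases may report `vllm:kv_cache_usage_perc` instead). The helpers `parse_exposition` and `gauge` are hypothetical and are not vLLM Doctor's own parser.

```python
import re
from typing import Dict, List, Tuple

# One sample line: metric name, optional {labels}, then the value.
# A trailing timestamp, if present, is ignored.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse Prometheus text format into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        m = SAMPLE_RE.match(line)
        if m is None:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


def gauge(samples: List[Sample], name: str) -> float:
    """Sum a gauge across label sets (e.g. several model_name labels)."""
    return sum(v for n, _, v in samples if n == name)


# Hypothetical scrape; metric names are assumptions (see lead-in).
SCRAPE = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="demo"} 12.0
vllm:num_requests_waiting{model_name="demo"} 7.0
vllm:gpu_cache_usage_perc{model_name="demo"} 0.94
"""

samples = parse_exposition(SCRAPE)
print(gauge(samples, "vllm:num_requests_running"))  # 12.0
print(gauge(samples, "vllm:num_requests_waiting"))  # 7.0
print(round(gauge(samples, "vllm:gpu_cache_usage_perc") * 100))  # 94
```

Summing across label sets keeps the sketch correct when one endpoint serves several models; a real tool would likely also keep per-label values.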
Why not just a dashboard?
Dashboards show metrics. vLLM Doctor explains inference-system behavior.
| | Dashboards | vLLM Doctor |
|---|---|---|
| Shows raw metrics | ✓ | ✓ |
| Explains what's wrong | ✗ | ✓ |
| Recommends vLLM configs | ✗ | ✓ |
| Requires setup | ✓ | ✗ |
| Works on a single server | ✗ | ✓ |
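To make the "Explains what's wrong" and "Recommends vLLM configs" rows concrete, here is a minimal rule-based sketch that maps a metrics snapshot to findings with a confidence level and suggested knobs, using the values from the example output (94% KV cache usage, 7 waiting requests). The thresholds, the `Finding` type, and the `diagnose` function are hypothetical; vLLM Doctor's actual rules are not documented here. `max_num_seqs` and `gpu_memory_utilization` are real vLLM engine arguments.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:  # hypothetical shape, not vLLM Doctor's output schema
    title: str
    confidence: str  # "high" or "low"
    evidence: str
    actions: List[str] = field(default_factory=list)


def diagnose(cache_usage: float, waiting: int,
             cache_threshold: float = 0.90) -> List[Finding]:
    """Hypothetical rules; the 90% threshold is an illustrative assumption."""
    findings: List[Finding] = []
    if cache_usage >= cache_threshold and waiting > 0:
        # Two corroborating signals: KV cache nearly full AND requests queueing.
        findings.append(Finding(
            "KV cache pressure", "high",
            f"GPU KV cache usage: {cache_usage:.0%} · Waiting requests: {waiting}",
            ["Reduce max_num_seqs to limit concurrent sequences",
             "Increase gpu_memory_utilization if GPU memory headroom exists"],
        ))
    if waiting > 0:
        # A queue on its own is ambiguous (bursty traffic vs. undersized fleet),
        # so it is reported with low confidence.
        findings.append(Finding(
            "Queue pressure", "low", f"Waiting requests: {waiting}",
            ["Add replicas or increase concurrency limits",
             "Inspect autoscaling thresholds"],
        ))
    return findings


for f in diagnose(cache_usage=0.94, waiting=7):
    print(f"{f.title} [{f.confidence} confidence]: {f.evidence}")
```

Requiring two signals before issuing a high-confidence finding mirrors the example output, where the KV cache finding cites both cache usage and the waiting queue, while the queue finding alone is marked low confidence.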
Installation
With pip:

```shell
pip install vllm-doctor
```

With uv:

```shell
uv tool install vllm-doctor
```
Quickstart
Direct scrape:

```shell
vllm-doctor --url http://localhost:8000/metrics
```

Prometheus:

```shell
vllm-doctor --url http://localhost:9090
```

JSON output:

```shell
vllm-doctor --url http://localhost:8000/metrics --format json
```

Verbose:

```shell
vllm-doctor --url http://localhost:8000/metrics --verbose
```
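When `--url` points at a Prometheus server rather than a raw `/metrics` endpoint, windowed values such as p95 latencies can be computed by Prometheus itself with PromQL. A minimal sketch, assuming the standard Prometheus HTTP API (`/api/v1/query`) and vLLM's `vllm:time_to_first_token_seconds` histogram; the queries vLLM Doctor actually issues are not documented here, and `p95_query` and `build_query_url` are hypothetical helpers.

```python
from urllib.parse import urlencode


def p95_query(histogram: str, window: str = "5m") -> str:
    """PromQL p95 over a rate window, summed across label sets per bucket."""
    return (
        f"histogram_quantile(0.95, "
        f"sum by (le) (rate({histogram}_bucket[{window}])))"
    )


def build_query_url(prometheus: str, promql: str) -> str:
    """Instant-query URL for the Prometheus HTTP API."""
    return f"{prometheus.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Metric name is an assumption; it may differ between vLLM versions.
query = p95_query("vllm:time_to_first_token_seconds")
print(query)
print(build_query_url("http://localhost:9090", query))
```

Fetching that URL (for example with `urllib.request.urlopen`) returns JSON whose `data.result` list holds the computed value. The default `5m` window is chosen here to match the `Window: 5m` header in the example output.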
Example output
```text
─────────── vLLM Doctor · Health: CRITICAL · Window: 5m ────────────
╭─ ✖ KV cache pressure [high confidence] ─────────────────────────────╮
│ GPU KV cache usage: 94% · Waiting requests: 7 │
│ │
│ → Reduce max_num_seqs to limit concurrent sequences │
│ → Increase gpu_memory_utilization if GPU memory headroom exists │
╰──────────────────────────────────────────────────────────────────────╯
╭─ ⚠ Queue pressure [low confidence] ─────────────────────────────────╮
│ Waiting requests: 7 │
│ │
│ → Add replicas or increase concurrency limits │
│ → Inspect autoscaling thresholds │
╰──────────────────────────────────────────────────────────────────────╯
─────────────────────────── Observed Metrics ───────────────────────────
Requests Running 12
Requests Waiting 7
GPU Cache Usage ███████████████████░ 94%
Generation Tokens/s 42.0
TTFT p95 (s) 3.200
TPOT p95 (s) 0.050
```
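The `TTFT p95` (time to first token) and `TPOT p95` (time per output token) rows are quantiles estimated from Prometheus histograms. A direct scrape has no query engine, so one approach is to take two scrapes of the cumulative `_bucket` counters, subtract them to get counts for the window, and interpolate linearly inside the bucket that contains the target rank, which is the same estimate PromQL's `histogram_quantile` makes for classic histograms. A minimal sketch under those assumptions; `window_quantile` is a hypothetical helper, and the bucket bounds and counts are invented for illustration.

```python
import math
from typing import Dict


def window_quantile(q: float, before: Dict[float, float],
                    after: Dict[float, float]) -> float:
    """Estimate quantile q from two snapshots of cumulative histogram buckets.

    Keys are bucket upper bounds (the `le` label, math.inf for "+Inf");
    values are cumulative counts. The +Inf bucket is assumed present.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    bounds = sorted(after)
    counts = [after[b] - before.get(b, 0.0) for b in bounds]  # window deltas
    total = counts[-1]  # the +Inf bucket holds the total observation count
    if total <= 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in zip(bounds, counts):
        if count >= rank:
            if math.isinf(bound):
                # Rank lands in the +Inf bucket: report the last finite bound.
                return prev_bound
            # Linear interpolation inside (prev_bound, bound].
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return math.nan


# Two hypothetical TTFT scrapes taken one window apart.
before = {0.5: 10, 1.0: 20, 2.5: 30, 5.0: 40, math.inf: 40}
after = {0.5: 50, 1.0: 80, 2.5: 110, 5.0: 140, math.inf: 140}
print(round(window_quantile(0.95, before, after), 3))  # 4.375
```

Differencing the counters is what turns a cumulative, since-startup histogram into a windowed one; without it, the p95 would describe the server's whole lifetime rather than recent behavior.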
Documentation
Read the full documentation: https://aminalaee.github.io/vllm-doctor
Download files
Source distribution: vllm_doctor-0.1.0.tar.gz
Built distribution: vllm_doctor-0.1.0-py3-none-any.whl
File details
Details for the file vllm_doctor-0.1.0.tar.gz.
File metadata
- Download URL: vllm_doctor-0.1.0.tar.gz
- Upload date:
- Size: 12.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.14 (`publish` subcommand, Ubuntu 24.04, CI)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 36bdade153a99d2e398928219b19632c12e4c96a88838b4ff14a19a54eb3c682 |
| MD5 | b615418dc239ef4d234adb051f0dec63 |
| BLAKE2b-256 | 8abf36171821bd311e297647f1dc78e8381a02a89d69f4d07ef288484c84542b |
File details
Details for the file vllm_doctor-0.1.0-py3-none-any.whl.
File metadata
- Download URL: vllm_doctor-0.1.0-py3-none-any.whl
- Upload date:
- Size: 21.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.14 (`publish` subcommand, Ubuntu 24.04, CI)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d18369e160718154d806be55167c6c93d321ea8d93f2095b5aa0ee95bed8bb87 |
| MD5 | bd25f88007b86c23f41ba273dde3bd3e |
| BLAKE2b-256 | 348760ca8a3edd7e7a619c1be0d277ae60733c5ef2c6a716b02933f55e2bddd9 |