
Diagnostic tool for vLLM inference servers


vLLM Doctor


Diagnose vLLM serving issues from /metrics.

vLLM Doctor reads production metrics and turns them into operational findings: what looks wrong, how confident the diagnosis is, and which vLLM knobs are worth checking first.

vllm-doctor --url http://localhost:8000/metrics

vLLM Doctor is not a dashboard replacement. It is a fast diagnostic snapshot for a single server or Prometheus target.
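To make "reads production metrics" concrete, here is a minimal sketch of scraping a /metrics endpoint and reducing the Prometheus text format to one number per metric. It is not vLLM Doctor's actual code. The helper names `fetch_metrics` and `parse_metrics` are hypothetical, and the metric names in the sample (`vllm:num_requests_running`, `vllm:num_requests_waiting`, `vllm:gpu_cache_usage_perc`) are assumptions that may differ between vLLM versions.

```python
from urllib.request import urlopen

# Assumed sample of what a vLLM /metrics endpoint might return.
SAMPLE = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="demo model"} 12.0
vllm:num_requests_waiting{model_name="demo model"} 7.0
vllm:gpu_cache_usage_perc{model_name="demo model"} 0.94
"""


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the raw text exposition from a /metrics endpoint."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> dict[str, float]:
    """Sum sample values per metric name.

    Summing suits a single-model server; with several label sets,
    ratio-style gauges would need per-label handling instead.
    """
    totals: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Label values may contain spaces, so cut around the braces.
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :]
        else:
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        try:
            value = float(fields[0])  # an optional timestamp may follow
        except ValueError:
            continue
        totals[name] = totals.get(name, 0.0) + value
    return totals


metrics = parse_metrics(SAMPLE)
print(metrics["vllm:num_requests_waiting"])
```

Against a live server, `parse_metrics(fetch_metrics("http://localhost:8000/metrics"))` would produce the same kind of dictionary.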

Why not just a dashboard?

Dashboards show metrics. vLLM Doctor explains inference-system behavior.

Comparison criteria (Dashboards vs. vLLM Doctor):

  • Shows raw metrics
  • Explains what's wrong
  • Recommends vLLM configs
  • Requires setup
  • Works on a single server

Installation

With pip:

pip install vllm-doctor

With uv:

uv tool install vllm-doctor

Quickstart

Direct scrape:

vllm-doctor --url http://localhost:8000/metrics

Prometheus (point --url at the server's base URL):

vllm-doctor --url http://localhost:9090
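For comparison, reading the same kind of value from a Prometheus server by hand goes through its HTTP API rather than a text scrape. The sketch below uses the standard instant-query endpoint `/api/v1/query`. The helper names are hypothetical, the queried metric name is an assumption, and which queries vLLM Doctor itself issues is not stated here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def vector_values(payload: dict) -> list[float]:
    """Extract sample values from an instant-query response of type 'vector'."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type: {data['resultType']}")
    # Each sample carries [unix_timestamp, "value-as-string"].
    return [float(sample["value"][1]) for sample in data["result"]]


def instant_query(base_url: str, promql: str, timeout: float = 5.0) -> list[float]:
    """Run one instant query against a live Prometheus server."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return vector_values(json.load(resp))


# Offline example with a response shaped like the Prometheus API returns.
example = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "vllm:num_requests_waiting"},
             "value": [1700000000.0, "7"]},
        ],
    },
}
print(vector_values(example))
```

With a server running, `instant_query("http://localhost:9090", "vllm:num_requests_waiting")` would return the current values, one per matching series.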

JSON output:

vllm-doctor --url http://localhost:8000/metrics --format json

Verbose:

vllm-doctor --url http://localhost:8000/metrics --verbose

Example output

─────────── vLLM Doctor  ·  Health: CRITICAL  ·  Window: 5m ────────────

╭─  KV cache pressure  [high confidence] ─────────────────────────────╮
│   GPU KV cache usage: 94%  ·  Waiting requests: 7                    │
│                                                                      │
│    Reduce max_num_seqs to limit concurrent sequences                │
│    Increase gpu_memory_utilization if GPU memory headroom exists    │
╰──────────────────────────────────────────────────────────────────────╯
╭─  Queue pressure  [low confidence] ─────────────────────────────────╮
│   Waiting requests: 7                                                │
│                                                                      │
│    Add replicas or increase concurrency limits                      │
│    Inspect autoscaling thresholds                                   │
╰──────────────────────────────────────────────────────────────────────╯

─────────────────────────── Observed Metrics ───────────────────────────

  Requests Running                             12
  Requests Waiting                              7
  GPU Cache Usage        ███████████████████░ 94%
  Generation Tokens/s                        42.0
  TTFT p95 (s)                              3.200
  TPOT p95 (s)                              0.050
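The report above pairs each finding with a confidence level. As an illustration only, the sketch below shows one way such rules could be written. The 0.90 threshold, the confidence logic, and the names `Finding` and `diagnose` are all hypothetical and are not taken from vLLM Doctor's implementation.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    confidence: str
    evidence: str


def diagnose(cache_usage: float, waiting: int,
             cache_threshold: float = 0.90) -> list[Finding]:
    """Turn two gauges into findings using hypothetical rules."""
    findings: list[Finding] = []

    # Assumed rule: a nearly full KV cache plus queued requests is strong evidence.
    cache_pressure = cache_usage >= cache_threshold and waiting > 0
    if cache_pressure:
        findings.append(Finding(
            title="KV cache pressure",
            confidence="high",
            evidence=f"GPU KV cache usage: {cache_usage:.0%} · Waiting requests: {waiting}",
        ))

    # Assumed rule: a queue already explained by cache pressure is weaker
    # evidence of a separate capacity problem, so confidence drops.
    if waiting > 0:
        findings.append(Finding(
            title="Queue pressure",
            confidence="low" if cache_pressure else "medium",
            evidence=f"Waiting requests: {waiting}",
        ))
    return findings


for finding in diagnose(cache_usage=0.94, waiting=7):
    print(f"{finding.title} [{finding.confidence} confidence]: {finding.evidence}")
```

Keeping each rule as a small predicate over already-parsed metrics makes the thresholds easy to inspect and adjust independently.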

Documentation

Read the full documentation: https://aminalaee.github.io/vllm-doctor
