Skip to main content

A web interface for managing and interacting with vLLM servers

Project description

vLLM Playground

A modern web interface for managing and interacting with vLLM servers (www.github.com/vllm-project/vllm). Supports GPU and CPU modes, with special optimizations for macOS Apple Silicon and enterprise deployment on OpenShift/Kubernetes.

๐Ÿ†• vLLM-Omni Multimodal Generation

vLLM-Omni Audio Generation

Generate images, edit photos, create speech, and produce music - all with vLLM-Omni integration.

โœจ Claude Code Integration

vLLM Playground Claude Code

Run Claude Code with open-source models served by vLLM - your private, local coding assistant.

โœจ Agentic-Ready with MCP Support

vLLM Playground MCP Integration

MCP (Model Context Protocol) integration enables models to use external tools with human-in-the-loop approval.

๐Ÿ–ผ๏ธ VLM (Vision Language Model)

VLM Support

Upload images and chat with vision models like Qwen2.5-VL, LLaVA, and more.

๐Ÿงฉ Multiple instances & backends

Run subprocess, container, and remote vLLM servers side by side; switch tabs, save configs, and manage everything from Management โ†’ Instances. See Multi-Instance Guide for details.

vLLM Server with multiple instance tabs and saved backends

Instances management grid

๐Ÿ†• What's New in v0.1.6

Observability Dashboard

Real-time Observability Dashboard with auto-discovered vLLM metrics, category filtering, and threshold alerts.

  • ๐Ÿ“Š Observability Dashboard - Full-page metrics dashboard with time-series charts, threshold alerts, and auto-discovery
  • ๐Ÿ” PagedAttention Visualizer - Real-time KV cache utilization heatmap with eviction alerts
  • ๐Ÿ”ข Token Counter & Logprobs - Live token estimation and per-token probability heatmap
  • โšก Speculative Decoding Dashboard - Acceptance rate, speedup factor, and method configuration

See Changelog for full details.


๐Ÿš€ Quick Start

# Install from PyPI
pip install vllm-playground

# Pre-download container image (~10GB for GPU)
vllm-playground pull

# Start the playground
vllm-playground

Open http://localhost:7860 and click "Start Server" - that's it! ๐ŸŽ‰

CLI Options

vllm-playground pull                # Pre-download GPU image (NVIDIA)
vllm-playground pull --nvidia       # Pre-download NVIDIA GPU image
vllm-playground pull --amd          # Pre-download AMD ROCm image
vllm-playground pull --tpu          # Pre-download Google TPU image
vllm-playground pull --cpu          # Pre-download CPU image
vllm-playground pull --all          # Pre-download all images
vllm-playground --port 8080         # Custom port
vllm-playground stop                # Stop running instance
vllm-playground status              # Check status

โœจ Key Features

Feature Description
๐ŸŒ Remote Server Connect to any remote vLLM instance via URL + API key
๐Ÿงฉ Multi-instance Several backends at once (subprocess, container, remote); tabs + Instances page
๐Ÿ–ผ๏ธ VLM Support Upload images and chat with vision models (Qwen2.5-VL, LLaVA)
๐Ÿค– Claude Code Use open-source models as Claude Code backend via vLLM
๐Ÿ’ฌ Modern Chat UI Markdown-rendered chat with streaming responses
๐Ÿ”ง Tool Calling Function calling with Llama, Mistral, Qwen, and more
๐Ÿ”— MCP Integration Connect to MCP servers for agentic capabilities
๐Ÿ—๏ธ Structured Outputs Constrain responses to JSON Schema, Regex, or Grammar
๐Ÿณ Container Mode Zero-setup vLLM via automatic container management
โ˜ธ๏ธ OpenShift/K8s Enterprise deployment with dynamic pod creation
๐Ÿ“Š Benchmarking GuideLLM integration for load testing
๐Ÿ“š Recipes One-click configs from vLLM community recipes

๐Ÿ“ฆ Installation Options

Method Command Best For
PyPI pip install vllm-playground Most users
With Benchmarking pip install vllm-playground[benchmark] Load testing
From Source git clone + python run.py Development
OpenShift/K8s ./openshift/deploy.sh Enterprise

๐Ÿ“– See Installation Guide for detailed instructions.


๐Ÿ”ง Configuration

Tool Calling

Enable in Server Configuration before starting:

  1. Check "Enable Tool Calling"
  2. Select parser (or "Auto-detect")
  3. Start server
  4. Define tools in the ๐Ÿ”ง toolbar panel

Supported Models:

  • Llama 3.x (llama3_json)
  • Mistral (mistral)
  • Qwen (hermes)
  • Hermes (hermes)

Claude Code Integration

Use vLLM to serve open-source models as a backend for Claude Code:

  1. Go to Claude Code in the sidebar
  2. Start vLLM with a recommended model (see tips on the page)
  3. The embedded terminal connects automatically

Requirements:

  • vLLM v0.12.0+ (for Anthropic Messages API)
  • Model with native 65K+ context and tool calling support
  • ttyd installed for web terminal

Recommended Model for most GPUs:

meta-llama/Llama-3.1-8B-Instruct
--max-model-len 65536 --enable-auto-tool-choice --tool-call-parser llama3_json

Note: This integration demonstrates using vLLM as a backend for Claude Code. Claude Code is a separate product by Anthropic - users must install it independently and comply with Anthropic's Commercial Terms of Service. vLLM Playground provides the terminal interface only.

MCP Servers

Connect to external tools via Model Context Protocol:

  1. Go to MCP Servers in the sidebar
  2. Add a server (presets available: Filesystem, Git, Fetch, Time)
  3. Connect and enable in chat panel

โš ๏ธ MCP requires Python 3.10+

CPU Mode (macOS)

Edit config/vllm_cpu.env:

export VLLM_CPU_KVCACHE_SPACE=40
export VLLM_CPU_OMP_THREADS_BIND=auto

Metal GPU Support (macOS Apple Silicon)

vLLM Playground supports Apple Silicon GPU acceleration:

  1. Install vllm-metal following official instructions
  2. Configure playground to use Metal:
    • Run Mode: Subprocess
    • Compute Mode: Metal
    • Venv Path: ~/.venv-vllm-metal (or your installation path)

See macOS Metal Guide for details.

Custom vLLM Installations

Use specific vLLM versions or custom builds:

  1. Install vLLM in a virtual environment
  2. Configure playground:
    • Run Mode: Subprocess
    • Venv Path: /path/to/your/venv

See Custom venv Guide for details.


๐Ÿ“– Documentation

Getting Started

Features

Deployment

Reference

Releases

  • Changelog - Version history and changes
  • v0.1.7 - Hotfix & tutorials
  • v0.1.6 - Observability dashboard, PagedAttention visualizer, token counter, logprobs
  • v0.1.5 - Remote server, VLM vision support, markdown rendering
  • v0.1.4 - vLLM-Omni multimodal, Studio UI
  • v0.1.3 - Multi-accelerators, Claude Code, vLLM-Metal
  • v0.1.2 - ModelScope integration, i18n improvements
  • v0.1.1 - MCP integration, runtime detection
  • v0.1.0 - First release, modern UI, tool calling

๐Ÿ—๏ธ Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   User Browser   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚ http://localhost:7860
         โ†“
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   Web UI (Host)  โ”‚  โ† FastAPI + JavaScript
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚
    โ”Œโ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”
    โ†“         โ†“
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€-โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ vLLM    โ”‚ โ”‚  MCP   โ”‚  โ† Containers / External Servers
โ”‚Containerโ”‚ โ”‚Servers โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€-โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๐Ÿ“– See Architecture Overview for details.


๐Ÿ†˜ Quick Troubleshooting

Issue Solution
Port in use vllm-playground stop
Container won't start podman logs vllm-service
Tool calling fails Restart with "Enable Tool Calling" checked
Image pull errors vllm-playground pull --all

๐Ÿ“– See Troubleshooting Guide for more.


๐Ÿ”— Related Projects


๐Ÿ“ License

Apache 2.0 License - See LICENSE file for details.

๐Ÿค Contributing

Contributions welcome! Please see CONTRIBUTING.md for setup instructions and guidelines.


Made with โค๏ธ for the vLLM community

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vllm_playground-0.1.8rc2.tar.gz (9.2 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vllm_playground-0.1.8rc2-py3-none-any.whl (9.2 MB view details)

Uploaded Python 3

File details

Details for the file vllm_playground-0.1.8rc2.tar.gz.

File metadata

  • Download URL: vllm_playground-0.1.8rc2.tar.gz
  • Upload date:
  • Size: 9.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.17

File hashes

Hashes for vllm_playground-0.1.8rc2.tar.gz
Algorithm Hash digest
SHA256 de96309032d5e47777f2f29842484296e4b79784af1d2fbea34c2247b58abe6e
MD5 196be96df37a514eb5f3a4770c5f5d7e
BLAKE2b-256 ecd5475bbdddd984e226907307b43fb08602cdc7969b018dc4feb480c2fa44d0

See more details on using hashes here.

File details

Details for the file vllm_playground-0.1.8rc2-py3-none-any.whl.

File metadata

File hashes

Hashes for vllm_playground-0.1.8rc2-py3-none-any.whl
Algorithm Hash digest
SHA256 5a33e6ccb275ee51fe60232c6279e045bd7993d0da9614d01c6ef92b86fb644b
MD5 b7cb54659fa3d73a7329898108e47b76
BLAKE2b-256 d5bf2a4d3d3458b61a5f009339e1dea1017333926f0fa418085aeeb51f20e208

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page