A CLI tool to conveniently serve LLMs with vLLM

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

These details have not been verified by PyPI

Project description

vLLM CLI

A command-line interface tool for serving Large Language Models using vLLM. Provides both interactive and command-line modes with features for configuration profiles, model management, and server monitoring.

vLLM CLI Welcome Screen Interactive terminal interface with GPU status and system overview
Tip: You can customize the GPU stats bar in settings

Features

🎯 Interactive Mode - Rich terminal interface with menu-driven navigation
⚡ Command-Line Mode - Direct CLI commands for automation and scripting
🤖 Model Management - Automatic discovery of local models with HuggingFace and Ollama support
🔧 Configuration Profiles - Pre-configured and custom server profiles for different use cases
📊 Server Monitoring - Real-time monitoring of active vLLM servers
🖥️ System Information - GPU, memory, and CUDA compatibility checking
📝 Advanced Configuration - Full control over vLLM parameters with validation

What's New in v0.2.5

Multi-Model Proxy Server (Experimental)

The Multi-Model Proxy is a new experimental feature that enables serving multiple LLMs through a single unified API endpoint. This feature is currently under active development and available for testing.

What It Does:

Single Endpoint - All your models accessible through one API
Live Management - Add or remove models without stopping the service
Dynamic GPU Management - Efficient GPU resource distribution through vLLM's sleep/wake functionality
Interactive Setup - User-friendly wizard guides you through configuration

Note: This is an experimental feature under active development. Your feedback helps us improve! Please share your experience through GitHub Issues.

For complete documentation, see the 🌐 Multi-Model Proxy Guide.

What's New in v0.2.4

🚀 Hardware-Optimized Profiles for GPT-OSS Models

New built-in profiles specifically optimized for serving GPT-OSS models on different GPU architectures:

gpt_oss_ampere - Optimized for NVIDIA A100 GPUs
gpt_oss_hopper - Optimized for NVIDIA H100/H200 GPUs
gpt_oss_blackwell - Optimized for NVIDIA Blackwell GPUs

Based on official vLLM GPT recipes for maximum performance.

⚡ Shortcuts System

Save and quickly launch your favorite model + profile combinations:

vllm-cli serve --shortcut my-gpt-server

🦙 Full Ollama Integration

Automatic discovery of Ollama models
GGUF format support (experimental)
System and user directory scanning

🔧 Enhanced Configuration

Environment Variables - Universal and profile-specific environment variable management
GPU Selection - Choose specific GPUs for model serving (--device 0,1)
Enhanced System Info - vLLM feature detection with attention backend availability

See CHANGELOG.md for detailed release notes.

Quick Start

Important: vLLM Installation Notes

⚠️ Binary Compatibility Warning: vLLM contains pre-compiled CUDA kernels that must match your PyTorch version exactly. Installing mismatched versions will cause errors.

vLLM-CLI will not install vLLM or Pytorch by default.

Installation

Option 1: Install vLLM seperately and then install vLLM CLI (Recommended)

# Install vLLM -- Skip this step if you have vllm installed in your environment
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install vllm --torch-backend=auto
# Or specify a backend: uv pip install vllm --torch-backend=cu128

# Install vLLM CLI
uv pip install --upgrade vllm-cli
uv run vllm-cli

# If you are using conda:
# Activate the environment you have vllm installed in
pip install vllm-cli
vllm-cli

Option 2: Install vLLM CLI + vLLM

# Install vLLM CLI + vLLM
pip install vllm-cli[vllm]
vllm-cli

Option 3: Build from source (You still need to install vLLM seperately)

git clone https://github.com/Chen-zexi/vllm-cli.git
cd vllm-cli
pip install -e .

Option 4: For Isolated Installation (pipx/system packages)

⚠️ Compatibility Note: pipx creates isolated environments which may have compatibility issues with vLLM's CUDA dependencies. Consider using uv or conda (see above) for better PyTorch/CUDA compatibility.

# If you do not want to use virtual environment and want to install vLLM along with vLLM CLI
pipx install "vllm-cli[vllm]"

# If you want to install pre-release version
pipx install --pip-args="--pre" "vllm-cli[vllm]"

Prerequisites

Python 3.9+
CUDA-compatible GPU (recommended)
vLLM package installed
For dependency issues, see Troubleshooting Guide

Basic Usage

# Interactive mode - menu-driven interface
vllm-cl
# Serve a model
vllm-cli serve --model openai/gpt-oss-20b

# Use a shortcut
vllm-cli serve --shortcut my-model

For detailed usage instructions, see the 📘 Usage Guide and 🌐 Multi-Model Proxy Guide.

Configuration

Built-in Profiles

vLLM CLI includes 7 optimized profiles for different use cases:

General Purpose:

standard - Minimal configuration with smart defaults
high_throughput - Maximum performance configuration
low_memory - Memory-constrained environments
moe_optimized - Optimized for Mixture of Experts models

Hardware-Specific (GPT-OSS):

gpt_oss_ampere - NVIDIA A100 GPUs
gpt_oss_hopper - NVIDIA H100/H200 GPUs
gpt_oss_blackwell - NVIDIA Blackwell GPUs

See 📋 Profiles Guide for detailed information.

Configuration Files

Main Config: ~/.config/vllm-cli/config.yaml
User Profiles: ~/.config/vllm-cli/user_profiles.json
Shortcuts: ~/.config/vllm-cli/shortcuts.json

Documentation

📘 Usage Guide - Complete usage instructions
🌐 Multi-Model Proxy - Serve multiple models simultaneously
📋 Profiles Guide - Built-in profiles details
❓ Troubleshooting - Common issues and solutions
📸 Screenshots - Visual feature overview
🔍 Model Discovery - Model management guide
🦙 Ollama Integration - Using Ollama models
⚙️ Custom Models - Serving custom models
🗺️ Roadmap - Future development plans

Integration with hf-model-tool

vLLM CLI uses hf-model-tool for model discovery:

Comprehensive model scanning
Ollama model support
Shared configuration

Development

Project Structure

src/vllm_cli/
├── cli/           # CLI command handling
├── config/        # Configuration management
├── models/        # Model management
├── server/        # Server lifecycle
├── ui/            # Terminal interface
└── schemas/       # JSON schemas

Contributing

Contributions are welcome! Please feel free to open an issue or submit a pull request.

License

MIT License - see LICENSE file for details.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

zc2610

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.2.5

Aug 25, 2025

0.2.5rc2 pre-release

Aug 24, 2025

0.2.5rc1 pre-release

Aug 22, 2025

0.2.4

Aug 20, 2025

0.2.4rc2 pre-release

Aug 19, 2025

0.2.4rc1 pre-release

Aug 19, 2025

0.2.3

Aug 18, 2025

0.2.2

Aug 18, 2025

0.2.1

Aug 17, 2025

0.2.1rc1 pre-release

Aug 17, 2025

0.2.0

Aug 17, 2025

0.1.1

Aug 16, 2025

0.1.0

Aug 15, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vllm_cli-0.2.5.tar.gz (230.4 kB view details)

Uploaded Aug 25, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

vllm_cli-0.2.5-py3-none-any.whl (249.6 kB view details)

Uploaded Aug 25, 2025 Python 3

File details

Details for the file vllm_cli-0.2.5.tar.gz.

File metadata

Download URL: vllm_cli-0.2.5.tar.gz
Upload date: Aug 25, 2025
Size: 230.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for vllm_cli-0.2.5.tar.gz
Algorithm	Hash digest
SHA256	`4b384958a82f40650e63b4bdf5538c543c35c3680422ab66ec2ccba7ce0b032d`
MD5	`70393b1acb280ea54a880cc1034446ab`
BLAKE2b-256	`ad734ebb1471317a6dfa55bc9c8c21fe5d2d108f22aaa8357702b13a4d9069e8`

See more details on using hashes here.

Provenance

The following attestation bundles were made for vllm_cli-0.2.5.tar.gz:

Publisher: python-publish.yml on Chen-zexi/vllm-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vllm_cli-0.2.5.tar.gz
- Subject digest: 4b384958a82f40650e63b4bdf5538c543c35c3680422ab66ec2ccba7ce0b032d
- Sigstore transparency entry: 429887159
- Sigstore integration time: Aug 25, 2025
Source repository:
- Permalink: Chen-zexi/vllm-cli@b49de5b4635b20dbf81cc985fb7b913551c5166a
- Branch / Tag: refs/tags/v0.2.5
- Owner: https://github.com/Chen-zexi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@b49de5b4635b20dbf81cc985fb7b913551c5166a
- Trigger Event: release

File details

Details for the file vllm_cli-0.2.5-py3-none-any.whl.

File metadata

Download URL: vllm_cli-0.2.5-py3-none-any.whl
Upload date: Aug 25, 2025
Size: 249.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for vllm_cli-0.2.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b100dd8b001adda684e8035ee306fc024604317200b1db3d8cfebbdd4d9f2df8`
MD5	`0e48b8caeaaed2468b1a0b5ec9aa2a9e`
BLAKE2b-256	`8c1238f0a8d045ab23dac015d37a268bcc480e0738a71d13005ab66609b7486a`

See more details on using hashes here.

Provenance

The following attestation bundles were made for vllm_cli-0.2.5-py3-none-any.whl:

Publisher: python-publish.yml on Chen-zexi/vllm-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vllm_cli-0.2.5-py3-none-any.whl
- Subject digest: b100dd8b001adda684e8035ee306fc024604317200b1db3d8cfebbdd4d9f2df8
- Sigstore transparency entry: 429887164
- Sigstore integration time: Aug 25, 2025
Source repository:
- Permalink: Chen-zexi/vllm-cli@b49de5b4635b20dbf81cc985fb7b913551c5166a
- Branch / Tag: refs/tags/v0.2.5
- Owner: https://github.com/Chen-zexi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@b49de5b4635b20dbf81cc985fb7b913551c5166a
- Trigger Event: release

vllm-cli 0.2.5

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

vLLM CLI

Features

What's New in v0.2.5

Multi-Model Proxy Server (Experimental)

What's New in v0.2.4

🚀 Hardware-Optimized Profiles for GPT-OSS Models

⚡ Shortcuts System

🦙 Full Ollama Integration

🔧 Enhanced Configuration

Quick Start

Important: vLLM Installation Notes

Installation

Option 1: Install vLLM seperately and then install vLLM CLI (Recommended)

Option 2: Install vLLM CLI + vLLM

Option 3: Build from source (You still need to install vLLM seperately)

Option 4: For Isolated Installation (pipx/system packages)

Prerequisites

Basic Usage

Configuration

Built-in Profiles

Configuration Files

Documentation

Integration with hf-model-tool

Development

Project Structure

Contributing

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance