4 projects
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
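For context, a minimal sketch of offline batch inference with the vLLM Python API; the model name and sampling settings below are illustrative assumptions, not part of this listing.

```python
from vllm import LLM, SamplingParams

# Load a model into the engine (model name is just an example).
llm = LLM(model="facebook/opt-125m")

# Illustrative sampling settings.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Batched generation over a list of prompts.
outputs = llm.generate(["Hello, my name is"], params)
for out in outputs:
    print(out.outputs[0].text)
```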
vllm-omni
A framework for efficient inference with omni-modality models
vllm-router
A high-performance Rust-based load balancer for vLLM with multiple routing algorithms and prefill-decode disaggregation support
vllm-tpu
A high-throughput and memory-efficient inference and serving engine for LLMs on TPUs