4 projects
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
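For context, a minimal sketch of offline batch inference with the vLLM Python API; the model name and sampling settings below are illustrative assumptions, not part of this listing.

```python
from vllm import LLM, SamplingParams

# Load a model into the engine (model name is just an example).
llm = LLM(model="facebook/opt-125m")

# Illustrative sampling settings.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Batched generation over a list of prompts.
outputs = llm.generate(["Hello, my name is"], params)
for out in outputs:
    print(out.outputs[0].text)
```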
vllm-omni
A framework for efficient inference with omni-modality models
vllm-router
A high-performance Rust-based load balancer for vLLM with multiple routing algorithms and prefill-decode disaggregation support
vllm-tpu
A high-throughput and memory-efficient inference and serving engine for LLMs on TPUs