Skip to main content

SGLang is a fast serving framework for large language models and vision language models.

Project description

logo

PyPI PyPI - Downloads license issue resolution open issues Ask DeepWiki


Blog | Documentation | Roadmap | Join Slack | Weekly Dev Meeting | Slides

News

  • [2026/02] 🔥 Unlocking 25x Inference Performance with SGLang on NVIDIA GB300 NVL72 (blog).
  • [2026/01] 🔥 SGLang Diffusion accelerates video and image generation (blog).
  • [2025/12] SGLang provides day-0 support for latest open models (MiMo-V2-Flash, Nemotron 3 Nano, Mistral Large 3, LLaDA 2.0 Diffusion LLM, MiniMax M2).
  • [2025/10] 🔥 SGLang now runs natively on TPU with the SGLang-Jax backend (blog).
  • [2025/09] Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part II): 3.8x Prefill, 4.8x Decode Throughput (blog).
  • [2025/09] SGLang Day 0 Support for DeepSeek-V3.2 with Sparse Attention (blog).
  • [2025/08] SGLang x AMD SF Meetup on 8/22: Hands-on GPU workshop, tech talks by AMD/xAI/SGLang, and networking (Roadmap, Large-scale EP, Highlights, AITER/MoRI, Wave).
More
  • [2025/11] SGLang Diffusion accelerates video and image generation (blog).
  • [2025/10] PyTorch Conference 2025 SGLang Talk (slide).
  • [2025/10] SGLang x Nvidia SF Meetup on 10/2 (recap).
  • [2025/08] SGLang provides day-0 support for OpenAI gpt-oss model (instructions)
  • [2025/06] SGLang, the high-performance serving infrastructure powering trillions of tokens daily, has been awarded the third batch of the Open Source AI Grant by a16z (a16z blog).
  • [2025/05] Deploying DeepSeek with PD Disaggregation and Large-scale Expert Parallelism on 96 H100 GPUs (blog).
  • [2025/06] Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part I): 2.7x Higher Decoding Throughput (blog).
  • [2025/03] Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X (AMD blog)
  • [2025/03] SGLang Joins PyTorch Ecosystem: Efficient LLM Serving Engine (PyTorch blog)
  • [2025/02] Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU (AMD blog)
  • [2025/01] SGLang provides day one support for DeepSeek V3/R1 models on NVIDIA and AMD GPUs with DeepSeek-specific optimizations. (instructions, AMD blog, 10+ other companies)
  • [2024/12] v0.4 Release: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs (blog).
  • [2024/10] The First SGLang Online Meetup (slides).
  • [2024/09] v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision (blog).
  • [2024/07] v0.2 Release: Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) (blog).
  • [2024/02] SGLang enables 3x faster JSON decoding with compressed finite state machine (blog).
  • [2024/01] SGLang provides up to 5x faster inference with RadixAttention (blog).
  • [2024/01] SGLang powers the serving of the official LLaVA v1.6 release demo (usage).

About

SGLang is a high-performance serving framework for large language models and multimodal models. It is designed to deliver low-latency and high-throughput inference across a wide range of setups, from a single GPU to large distributed clusters. Its core features include:

  • Fast Runtime: Provides efficient serving with RadixAttention for prefix caching, a zero-overhead CPU scheduler, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, quantization (FP4/FP8/INT4/AWQ/GPTQ), and multi-LoRA batching.
  • Broad Model Support: Supports a wide range of language models (Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, Mistral, etc.), embedding models (e5-mistral, gte, mcdse), reward models (Skywork), and diffusion models (WAN, Qwen-Image), with easy extensibility for adding new models. Compatible with most Hugging Face models and OpenAI APIs.
  • Extensive Hardware Support: Runs on NVIDIA GPUs (GB200/B300/H100/A100/Spark/5090), AMD GPUs (MI355/MI300), Intel Xeon CPUs, Google TPUs, Ascend NPUs, and more.
  • Active Community: SGLang is open-source and supported by a vibrant community with widespread industry adoption, powering over 400,000 GPUs worldwide.
  • RL & Post-Training Backbone: SGLang is a proven rollout backend used for training many frontier models, with native RL integrations and adoption by well-known post-training frameworks such as AReaL, Miles, slime, Tunix, verl and more.

Getting Started

Benchmark and Performance

Learn more in the release blogs: v0.2 blog, v0.3 blog, v0.4 blog, Large-scale expert parallelism, GB200 rack-scale parallelism, GB300 long context.

Adoption and Sponsorship

SGLang has been deployed at large scale, generating trillions of tokens in production each day. It is trusted and adopted by a wide range of leading enterprises and institutions, including xAI, AMD, NVIDIA, Intel, LinkedIn, Cursor, Oracle Cloud, Google Cloud, Microsoft Azure, AWS, Atlas Cloud, Voltage Park, Nebius, DataCrunch, Novita, InnoMatrix, MIT, UCLA, the University of Washington, Stanford, UC Berkeley, Tsinghua University, Jam & Tea Studios, Baseten, and other major technology organizations. As an open-source LLM inference engine, SGLang has become the de facto industry standard, with deployments running on over 400,000 GPUs worldwide. SGLang is currently hosted under the non-profit open-source organization LMSYS.

logo

Contact Us

For enterprises interested in adopting or deploying SGLang at scale, including technical consulting, sponsorship opportunities, or partnership inquiries, please contact us at sglang@lmsys.org.

Long-term active SGLang contributors are eligible for coding agent sponsorship, such as Cursor, Claude Code, or OpenAI Codex. Email sglang@lmsys.org with your most important commits or pull requests.

Acknowledgment

We learned the design and reused code from the following projects: Guidance, vLLM, LightLLM, FlashInfer, Outlines, and LMQL.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

sglang-0.5.13.post1-cp313-cp313-manylinux_2_34_x86_64.whl (11.9 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.34+ x86-64

sglang-0.5.13.post1-cp313-cp313-manylinux_2_34_aarch64.whl (11.7 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.34+ ARM64

sglang-0.5.13.post1-cp312-cp312-manylinux_2_34_x86_64.whl (11.9 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.34+ x86-64

sglang-0.5.13.post1-cp312-cp312-manylinux_2_34_aarch64.whl (11.7 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.34+ ARM64

sglang-0.5.13.post1-cp311-cp311-manylinux_2_34_x86_64.whl (11.9 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.34+ x86-64

sglang-0.5.13.post1-cp311-cp311-manylinux_2_34_aarch64.whl (11.7 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.34+ ARM64

sglang-0.5.13.post1-cp310-cp310-manylinux_2_34_x86_64.whl (11.9 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.34+ x86-64

sglang-0.5.13.post1-cp310-cp310-manylinux_2_34_aarch64.whl (11.7 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.34+ ARM64

File details

Details for the file sglang-0.5.13.post1-cp313-cp313-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for sglang-0.5.13.post1-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 73a0130a0a0e1365f7dc5517648807b2da3cba3067259d993289e4458e3da5aa
MD5 91f39a76d141fe96997fb5b172724702
BLAKE2b-256 b899019d2d7a3b3dc7fe0c8de392473515855b07d2e652c31d68dfc5358232e6

See more details on using hashes here.

File details

Details for the file sglang-0.5.13.post1-cp313-cp313-manylinux_2_34_aarch64.whl.

File metadata

File hashes

Hashes for sglang-0.5.13.post1-cp313-cp313-manylinux_2_34_aarch64.whl
Algorithm Hash digest
SHA256 d17b257c92f443e464079c788ef9068f61c49be14b67853ce8ac83d45f6e48f7
MD5 7f6bd110b7bd3000ab887b73d5a5fb85
BLAKE2b-256 0e19de541dabc13755e08d4c9f826837c67b6b958c36909256894927835897ef

See more details on using hashes here.

File details

Details for the file sglang-0.5.13.post1-cp312-cp312-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for sglang-0.5.13.post1-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 55a8f3a54cfb56d562ca5b3e77678d28a1ecb97eb89fc9f673c2fc3e33f2d544
MD5 1c313f522bcae70e206d06929fa956e8
BLAKE2b-256 7060aeb365ed95b7c8bb27595c2de737bcdca741324cb5e8ab0ca5fc855823ad

See more details on using hashes here.

File details

Details for the file sglang-0.5.13.post1-cp312-cp312-manylinux_2_34_aarch64.whl.

File metadata

File hashes

Hashes for sglang-0.5.13.post1-cp312-cp312-manylinux_2_34_aarch64.whl
Algorithm Hash digest
SHA256 e1ce7d4654a3e2a11a2a31dd3d2ea9cb011a1fd00406521c2b92bf4698ee073f
MD5 7437b19bd0e5c1ebacc1c38656255a86
BLAKE2b-256 37024207d79c01a2d91e2710ea835074c52717f3ecbc4c2b02677b9f7023f100

See more details on using hashes here.

File details

Details for the file sglang-0.5.13.post1-cp311-cp311-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for sglang-0.5.13.post1-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 1c1653ecf2893c7d7b1cf0c3e1cc9b7550ebad95c809de2f44bcc91b7a1e1fe3
MD5 dd1abc937779111ab61a2834ce69476b
BLAKE2b-256 e0699d098c10f35fa2475e896c246b1ef615d96484df9189cdfd8fbc231e2a39

See more details on using hashes here.

File details

Details for the file sglang-0.5.13.post1-cp311-cp311-manylinux_2_34_aarch64.whl.

File metadata

File hashes

Hashes for sglang-0.5.13.post1-cp311-cp311-manylinux_2_34_aarch64.whl
Algorithm Hash digest
SHA256 261e31eb6bfb767657ee9015b8fb795077cb9d5c425b192c9c35b9869f994781
MD5 80a7b543352ff7abefa08c1ae3d64b65
BLAKE2b-256 b79062f3ef3d2a43a610895bc49e40518f19f1e9a875e3fa0cc675abcb5afdd4

See more details on using hashes here.

File details

Details for the file sglang-0.5.13.post1-cp310-cp310-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for sglang-0.5.13.post1-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 26c20e1379cba9312675ee8d322660757b5695e22d08e2f977d35c6517497e59
MD5 bfc48bd924fe48cf2270c261a6321fd0
BLAKE2b-256 f0ddb7c890688ab18deceec51fb3ff603c19575480b453eb1f8db6d85ec91780

See more details on using hashes here.

File details

Details for the file sglang-0.5.13.post1-cp310-cp310-manylinux_2_34_aarch64.whl.

File metadata

File hashes

Hashes for sglang-0.5.13.post1-cp310-cp310-manylinux_2_34_aarch64.whl
Algorithm Hash digest
SHA256 b21ea502fc7bc326901ede6aed8f7f123b2e0f03fecaf1f625a0a0f2bba74810
MD5 b0506aa547940ef1d8cd1acc481d7e0c
BLAKE2b-256 63bf86d4d6900df18260a344e033881c8b980a7616dcd10f23e940191b4881ec

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page