A high-throughput and memory-efficient inference and serving engine for LLMs

vLLM

Easy, fast, and cheap LLM serving for everyone

| Documentation | Blog | Discussions |


Latest News 🔥

  • [2023/08] We would like to express our sincere gratitude to Andreessen Horowitz (a16z) for providing a generous grant to support the open-source development and research of vLLM.
  • [2023/07] Added support for LLaMA-2! You can run and serve 7B/13B/70B LLaMA-2s on vLLM with a single command!
  • [2023/06] Serving vLLM on any cloud with SkyPilot. Check out a 1-click example to start the vLLM demo, and the blog post for the story behind vLLM development in the cloud.
  • [2023/06] We officially released vLLM! FastChat-vLLM integration has powered LMSYS Vicuna and Chatbot Arena since mid-April. Check out our blog post.

vLLM is a fast and easy-to-use library for LLM inference and serving.

vLLM is fast with:

  • State-of-the-art serving throughput
  • Efficient management of attention key and value memory with PagedAttention
  • Continuous batching of incoming requests
  • Optimized CUDA kernels
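
To give a feel for the paging idea behind the key/value memory management mentioned above, here is a toy sketch. It is not vLLM's implementation: the class names, the block size, and the pool size are all hypothetical. It only shows that a sequence can grow one fixed-size block at a time, drawing blocks from a shared pool instead of reserving one large contiguous region up front.

```python
# Toy illustration only; NOT vLLM's actual code or data structures.
from typing import List


class BlockAllocator:
    """Hands out fixed-size physical blocks from a shared pool (hypothetical)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("no free KV-cache blocks")
        return self.free_blocks.pop()

    def free(self, block_id: int) -> None:
        self.free_blocks.append(block_id)


class Sequence:
    """Tracks which physical blocks hold one sequence's tokens (hypothetical)."""

    def __init__(self, allocator: BlockAllocator) -> None:
        self.allocator = allocator
        self.block_table: List[int] = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self) -> None:
        # A new block is requested only when the current last block is full.
        if self.num_tokens % self.allocator.block_size == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def release(self) -> None:
        # Return every block to the pool so other sequences can reuse them.
        for block_id in self.block_table:
            self.allocator.free(block_id)
        self.block_table.clear()
        self.num_tokens = 0
```

With a block size of 16, a sequence of 17 tokens occupies two blocks, and the unused slots in the second block are the only memory held in reserve.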

vLLM is flexible and easy to use with:

  • Seamless integration with popular HuggingFace models
  • High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more
  • Tensor parallelism support for distributed inference
  • Streaming outputs
  • OpenAI-compatible API server
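
As a rough sketch of what talking to an OpenAI-compatible server can look like, the snippet below builds (but does not send) a completion request using only the Python standard library. The base URL, port, model name, and the helper name `build_completion_request` are assumptions for illustration; the `/v1/completions` path and the `n` field follow the OpenAI API convention, so check the vLLM documentation for the endpoints your version actually exposes.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str,
    model: str,
    prompt: str,
    n: int = 1,
    max_tokens: int = 64,
) -> urllib.request.Request:
    """Build an OpenAI-style completion request (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "n": n,  # number of completions to sample for this prompt
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local server address and model name; adjust to your deployment.
request = build_completion_request(
    "http://localhost:8000", "facebook/opt-125m", "Hello, my name is", n=3
)
```

Sending the request with `urllib.request.urlopen(request)` only works while a server is listening at that address.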

vLLM seamlessly supports many HuggingFace models, including the following architectures:

  • Aquila (BAAI/Aquila-7B, BAAI/AquilaChat-7B, etc.)
  • Baichuan (baichuan-inc/Baichuan-7B, baichuan-inc/Baichuan-13B-Chat, etc.)
  • BLOOM (bigscience/bloom, bigscience/bloomz, etc.)
  • Falcon (tiiuae/falcon-7b, tiiuae/falcon-40b, tiiuae/falcon-rw-7b, etc.)
  • GPT-2 (gpt2, gpt2-xl, etc.)
  • GPT BigCode (bigcode/starcoder, bigcode/gpt_bigcode-santacoder, etc.)
  • GPT-J (EleutherAI/gpt-j-6b, nomic-ai/gpt4all-j, etc.)
  • GPT-NeoX (EleutherAI/gpt-neox-20b, databricks/dolly-v2-12b, stabilityai/stablelm-tuned-alpha-7b, etc.)
  • InternLM (internlm/internlm-7b, internlm/internlm-chat-7b, etc.)
  • LLaMA & LLaMA-2 (meta-llama/Llama-2-70b-hf, lmsys/vicuna-13b-v1.3, young-geng/koala, openlm-research/open_llama_13b, etc.)
  • MPT (mosaicml/mpt-7b, mosaicml/mpt-30b, etc.)
  • OPT (facebook/opt-66b, facebook/opt-iml-max-30b, etc.)
  • Qwen (Qwen/Qwen-7B, Qwen/Qwen-7B-Chat, etc.)

Install vLLM with pip or from source:

pip install vllm

Getting Started

Visit our documentation to get started.

Performance

vLLM outperforms HuggingFace Transformers (HF) by up to 24x and Text Generation Inference (TGI) by up to 3.5x in throughput. For details, check out our blog post.


Serving throughput when each request asks for 1 output completion.


Serving throughput when each request asks for 3 output completions.

Contributing

We welcome and value any contributions and collaborations. Please check out CONTRIBUTING.md for how to get involved.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vllm-0.1.7.tar.gz (114.9 kB)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

vllm-0.1.7-cp311-cp311-manylinux1_x86_64.whl (19.1 MB)

Uploaded CPython 3.11

vllm-0.1.7-cp310-cp310-manylinux1_x86_64.whl (19.1 MB)

Uploaded CPython 3.10

vllm-0.1.7-cp39-cp39-manylinux1_x86_64.whl (19.1 MB)

Uploaded CPython 3.9

vllm-0.1.7-cp38-cp38-manylinux1_x86_64.whl (19.1 MB)

Uploaded CPython 3.8
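
The wheel names listed above follow the standard wheel naming scheme, `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. The sketch below splits a file name into those parts so you can see which interpreter and platform a given file targets. The helper `parse_wheel_filename` and the `WheelTags` type are hypothetical names for illustration, and the parser is deliberately minimal: it drops the optional build tag and does no validation beyond counting fields.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its dash-separated components."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 6:
        # An optional build tag sits between the version and the python tag.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel file name layout: {filename!r}")
    return WheelTags(*parts)


tags = parse_wheel_filename("vllm-0.1.7-cp311-cp311-manylinux1_x86_64.whl")
print(tags.python_tag, tags.platform_tag)
```

For the file above, the python tag `cp311` indicates CPython 3.11 and the platform tag `manylinux1_x86_64` indicates 64-bit x86 Linux.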

File details

Details for the file vllm-0.1.7.tar.gz.

File metadata

  • Download URL: vllm-0.1.7.tar.gz
  • Upload date:
  • Size: 114.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for vllm-0.1.7.tar.gz
Algorithm Hash digest
SHA256 d871f2dae1ce45e796882fe95a818a3fd98e0fd00fc454586f60e0164c6895e6
MD5 1999f78b81daf62aab55816f9f12f845
BLAKE2b-256 a3e7dd194e23233059103f7380b14755a0a12aea3e8f3d2c32a003682320b54d

See more details on using hashes here.
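
One way to use the digests listed on this page is to hash a downloaded file locally and compare the result with the published SHA256 value. The sketch below uses only the Python standard library; the helper names and the local file path are assumptions, and the expected digest is the SHA256 value listed above for vllm-0.1.7.tar.gz.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# SHA256 digest published above for the source distribution.
EXPECTED_SDIST_SHA256 = (
    "d871f2dae1ce45e796882fe95a818a3fd98e0fd00fc454586f60e0164c6895e6"
)

# Example call, assuming the file was downloaded to the current directory:
# matches_published_digest("vllm-0.1.7.tar.gz", EXPECTED_SDIST_SHA256)
```

pip can perform a similar check itself through hash-pinned requirement files, which is usually the more convenient route for automated installs.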

File details

Details for the file vllm-0.1.7-cp311-cp311-manylinux1_x86_64.whl.

File hashes

Hashes for vllm-0.1.7-cp311-cp311-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 c002a99a2073e3670172730fde198d869a22885d17691fc6c42dfac4ca137858
MD5 987a313c2e76f7c9de24bce29f680ac5
BLAKE2b-256 156efa764023f65825e09083c765eb80ed66db96a033882e289709d28354539b

File details

Details for the file vllm-0.1.7-cp310-cp310-manylinux1_x86_64.whl.

File hashes

Hashes for vllm-0.1.7-cp310-cp310-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 3a4b6f62a91de422c391b9b7e408ab4ecd33e2c6fd1f2b04dcbf8c5f192bff86
MD5 eb5816369ddfff7811aad148cbc51e86
BLAKE2b-256 26ae804920b9bb72503d2c7ee12f4781306472aa8c24185d03cc29e0a012675a

File details

Details for the file vllm-0.1.7-cp39-cp39-manylinux1_x86_64.whl.

File hashes

Hashes for vllm-0.1.7-cp39-cp39-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 184c51cef885d9e236f312e20a13326fbbe90297ac2d88ee16356c237d8956c8
MD5 6118d29b1f45322273e1d90259c194cf
BLAKE2b-256 4aec6af0087697d4197586843f27041accdba31c635ab7288aeac0e477b9b711

File details

Details for the file vllm-0.1.7-cp38-cp38-manylinux1_x86_64.whl.

File hashes

Hashes for vllm-0.1.7-cp38-cp38-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 e73b9b658c51b9a554f45928e1a86bb97b001bd6b2201d1d53ff168594be6114
MD5 b190a8982f4850b6dfc5ea4b46d3f1ef
BLAKE2b-256 e56606d1f707cfad4363880930ec5edb4b1048cd027efdb4f872585824e4e7da
