
A high-throughput and memory-efficient inference and serving engine for LLMs

Project description

vLLM

Easy, fast, and cheap LLM serving for everyone

| Documentation | Blog | Paper | Discord |


Latest News 🔥

  • [2023/10] We hosted the first vLLM meetup in SF! Please find the meetup slides here.
  • [2023/09] We created our Discord server! Join us to discuss vLLM and LLM serving! We will also post the latest announcements and updates there.
  • [2023/09] We released our PagedAttention paper on arXiv!
  • [2023/08] We would like to express our sincere gratitude to Andreessen Horowitz (a16z) for providing a generous grant to support the open-source development and research of vLLM.
  • [2023/07] Added support for LLaMA-2! You can run and serve 7B/13B/70B LLaMA-2s on vLLM with a single command!
  • [2023/06] Serving vLLM On any Cloud with SkyPilot. Check out a 1-click example to start the vLLM demo, and the blog post for the story behind vLLM development on the clouds.
  • [2023/06] We officially released vLLM! FastChat-vLLM integration has powered LMSYS Vicuna and Chatbot Arena since mid-April. Check out our blog post.

vLLM is a fast and easy-to-use library for LLM inference and serving.

vLLM is fast with:

  • State-of-the-art serving throughput
  • Efficient management of attention key and value memory with PagedAttention
  • Continuous batching of incoming requests
  • Optimized CUDA kernels

vLLM is flexible and easy to use with:

  • Seamless integration with popular Hugging Face models
  • High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more
  • Tensor parallelism support for distributed inference
  • Streaming outputs
  • OpenAI-compatible API server
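For example, offline batched inference takes only a few lines of Python. The sketch below follows the documented quickstart; the model name and sampling settings are illustrative choices, not requirements:

from vllm import LLM, SamplingParams

# A batch of prompts; vLLM schedules them with continuous batching.
prompts = [
    "Hello, my name is",
    "The capital of France is",
]

# Illustrative sampling settings. SamplingParams also exposes other
# decoding algorithms, e.g. parallel sampling (n > 1) and beam search
# (use_beam_search=True).
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Any supported Hugging Face model name works here. For distributed
# inference across GPUs, pass tensor_parallel_size=<number of GPUs>.
llm = LLM(model="facebook/opt-125m")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r} -> {output.outputs[0].text!r}")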

vLLM seamlessly supports many Hugging Face models, including the following architectures:

  • Aquila & Aquila2 (BAAI/AquilaChat2-7B, BAAI/AquilaChat2-34B, BAAI/Aquila-7B, BAAI/AquilaChat-7B, etc.)
  • Baichuan (baichuan-inc/Baichuan-7B, baichuan-inc/Baichuan-13B-Chat, etc.)
  • BLOOM (bigscience/bloom, bigscience/bloomz, etc.)
  • Falcon (tiiuae/falcon-7b, tiiuae/falcon-40b, tiiuae/falcon-rw-7b, etc.)
  • GPT-2 (gpt2, gpt2-xl, etc.)
  • GPT BigCode (bigcode/starcoder, bigcode/gpt_bigcode-santacoder, etc.)
  • GPT-J (EleutherAI/gpt-j-6b, nomic-ai/gpt4all-j, etc.)
  • GPT-NeoX (EleutherAI/gpt-neox-20b, databricks/dolly-v2-12b, stabilityai/stablelm-tuned-alpha-7b, etc.)
  • InternLM (internlm/internlm-7b, internlm/internlm-chat-7b, etc.)
  • LLaMA & LLaMA-2 (meta-llama/Llama-2-70b-hf, lmsys/vicuna-13b-v1.3, young-geng/koala, openlm-research/open_llama_13b, etc.)
  • Mistral (mistralai/Mistral-7B-v0.1, mistralai/Mistral-7B-Instruct-v0.1, etc.)
  • MPT (mosaicml/mpt-7b, mosaicml/mpt-30b, etc.)
  • OPT (facebook/opt-66b, facebook/opt-iml-max-30b, etc.)
  • Qwen (Qwen/Qwen-7B, Qwen/Qwen-7B-Chat, etc.)

Install vLLM with pip or from source:

pip install vllm
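Once installed, the OpenAI-compatible API server can be launched with a single command (the model choice is illustrative; the server listens on port 8000 by default):

python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m

The server can then be queried like any OpenAI Completions endpoint. A minimal sketch using the third-party requests library (not a vLLM dependency; the prompt and parameters are illustrative):

import requests

# Query the local vLLM server via the OpenAI-compatible completions API.
# Setting "stream": True in the payload returns server-sent events
# instead, which need incremental handling rather than response.json().
response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",
        "prompt": "San Francisco is a",
        "max_tokens": 16,
        "temperature": 0.7,
    },
)
print(response.json()["choices"][0]["text"])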

Getting Started

Visit our documentation to get started.

Contributing

We welcome and value any contributions and collaborations. Please check out CONTRIBUTING.md for how to get involved.

Citation

If you use vLLM for your research, please cite our paper:

@inproceedings{kwon2023efficient,
  title={Efficient Memory Management for Large Language Model Serving with PagedAttention},
  author={Woosuk Kwon and Zhuohan Li and Siyuan Zhuang and Ying Sheng and Lianmin Zheng and Cody Hao Yu and Joseph E. Gonzalez and Hao Zhang and Ion Stoica},
  booktitle={Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles},
  year={2023}
}

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • vllm-0.2.1.post1.tar.gz (132.2 kB, Source)

Built Distributions

  • vllm-0.2.1.post1-cp311-cp311-manylinux1_x86_64.whl (28.7 MB, CPython 3.11)
  • vllm-0.2.1.post1-cp310-cp310-manylinux1_x86_64.whl (28.6 MB, CPython 3.10)
  • vllm-0.2.1.post1-cp39-cp39-manylinux1_x86_64.whl (28.6 MB, CPython 3.9)
  • vllm-0.2.1.post1-cp38-cp38-manylinux1_x86_64.whl (28.6 MB, CPython 3.8)

File details

Details for the file vllm-0.2.1.post1.tar.gz.

File metadata

  • Download URL: vllm-0.2.1.post1.tar.gz
  • Size: 132.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for vllm-0.2.1.post1.tar.gz

Algorithm    Hash digest
SHA256       3f9deb68b4dfa4464650e4a4b13527543124a16ffd112cfb51bb033a3ac351fd
MD5          d395f97317cc723b5c4fdd25826b643d
BLAKE2b-256  b22eb5f6222ff465405c60d2b9b2fb1e1e63009887f25b63f602c934b95e82a1

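These digests can be checked against a downloaded file before installing it. A minimal sketch in Python, using the SHA256 digest listed above for the sdist (the local filename is assumed to match the download):

import hashlib

# Expected SHA256 for vllm-0.2.1.post1.tar.gz, taken from the table above.
expected = "3f9deb68b4dfa4464650e4a4b13527543124a16ffd112cfb51bb033a3ac351fd"

h = hashlib.sha256()
with open("vllm-0.2.1.post1.tar.gz", "rb") as f:
    # Read in chunks so large files do not need to fit in memory.
    for chunk in iter(lambda: f.read(8192), b""):
        h.update(chunk)

assert h.hexdigest() == expected, "SHA256 mismatch: do not install this file"
print("SHA256 verified")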

File details

Details for the file vllm-0.2.1.post1-cp311-cp311-manylinux1_x86_64.whl.

File hashes

Hashes for vllm-0.2.1.post1-cp311-cp311-manylinux1_x86_64.whl

Algorithm    Hash digest
SHA256       abce4346f8ba1cbce402983514758a0d6d5cd0149fdb28cee48bfabf25443ab2
MD5          200e409c032b21d2008189fed8b240f1
BLAKE2b-256  a9e704bbed21b18904feb5b4e6f78f1716e6a160ef961ed95df1711437c28989


File details

Details for the file vllm-0.2.1.post1-cp310-cp310-manylinux1_x86_64.whl.

File hashes

Hashes for vllm-0.2.1.post1-cp310-cp310-manylinux1_x86_64.whl

Algorithm    Hash digest
SHA256       6fb7d1512ff8c2984e6ae19e11c2f6492205a3de6affe06a6d32645e6b0c4865
MD5          4ae042ef80a7ae292d79d362cd59da00
BLAKE2b-256  a549b9c00ca8651f3d83cec5208a499a4e84eab5afabd75457acaa496e328cd6


File details

Details for the file vllm-0.2.1.post1-cp39-cp39-manylinux1_x86_64.whl.

File hashes

Hashes for vllm-0.2.1.post1-cp39-cp39-manylinux1_x86_64.whl

Algorithm    Hash digest
SHA256       4066596d08bcc8321a61c38e353f1e5e96234f5e9fbfb768a04ebbc26a211a26
MD5          9e1335aa520ad320596e7f0dd85be894
BLAKE2b-256  b70a25449e11d1b71b4688280fb35d3af26f7674fe146afad8d7470ce50bbe15


File details

Details for the file vllm-0.2.1.post1-cp38-cp38-manylinux1_x86_64.whl.

File hashes

Hashes for vllm-0.2.1.post1-cp38-cp38-manylinux1_x86_64.whl

Algorithm    Hash digest
SHA256       9ed8a77254735a07fd2cd122215c84eb87199b36480f7e9a79e914b5aa4c2b91
MD5          07a2b212cd525e4c0f7808838dd9a1fc
BLAKE2b-256  34f37a752bd6730f3d975d3a6ad70c033a2b296f57e2bf73122dd54ddf38ef5e

