A high-throughput and memory-efficient inference and serving engine for LLMs
Project description
nm-vllm
Overview
This repo nm-vllm-ent
contains all the source for the Neuralmagic Enterprise Edition of vllm
. The nm-vllm
packages built from this repo are supported enterprise distributions of vLLM. Packages are versioned Python wheels and docker images. These are released as "production level" official releases and "beta level" Nightly's.
Official releases are made at the discretion of Neuralmagic, but typically track with vllm
releases. These wheels are available via "public pypi" as well as "nm-pypi".
Nightly's are released every night given green runs in automation. The wheels are available at "nm-pypi".
Installation
PyPI
The nm-vllm PyPi package includes pre-compiled binaries for CUDA (version 12.1) kernels. For other PyTorch or CUDA versions, please compile the package from source.
Install it using pip:
pip install nm-vllm --extra-index-url https://pypi.neuralmagic.com/simple
To utilize the weight sparsity features, include the optional sparse
dependencies.
pip install nm-vllm[sparse] --extra-index-url https://pypi.neuralmagic.com/simple
You can also build and install nm-vllm
from source (this will take ~10 minutes):
git clone https://github.com/neuralmagic/nm-vllm.git
cd nm-vllm
pip install -e .[sparse] --extra-index-url https://pypi.neuralmagic.com/simple
Docker
The nm-vllm
container registry includes premade docker images.
Launch the OpenAI-compatible server with:
MODEL_ID=Qwen/Qwen2-0.5B-Instruct
docker run --gpus all --shm-size 2g ghcr.io/neuralmagic/nm-vllm-openai:latest --model $MODEL_ID
Models
Neural Magic maintains a variety of optimized models on our Hugging Face organization profiles:
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
File details
Details for the file nm_vllm-0.5.2.0-cp38-abi3-manylinux_2_17_x86_64.whl
.
File metadata
- Download URL: nm_vllm-0.5.2.0-cp38-abi3-manylinux_2_17_x86_64.whl
- Upload date:
- Size: 147.0 MB
- Tags: CPython 3.8+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2dec31a07d6e43800c5b77f8516633fb39addf374133897242edca79dbe7f5c6 |
|
MD5 | 94d3470e9736f5c764ce941db803a9c9 |
|
BLAKE2b-256 | db8662128ea9363e47ec0258a9edd21cb6666018be333223cc0459f736cd743c |