# kani-ext-vllm

A vLLM backend for kani. This repository adds the `VLLMEngine`.

This package is considered provisional and is maintained on a best-effort basis.
To install this package from PyPI:

```shell
$ pip install kani-ext-vllm
```

Alternatively, you can install it from the git source:

```shell
$ pip install git+https://github.com/zhudotexe/kani-ext-vllm.git@main
```
See https://docs.vllm.ai/en/latest/index.html for more information on vLLM.
## Usage

This package provides three main methods of serving models with vLLM:
- Offline mode
- vLLM-Native API mode
- OpenAI-Compatible API mode
These are generally equivalent, but each mode offers slightly different options:
| Mode | Communication | Multiple Parallel Models? | Prompt Template/Parsing | Best For |
|---|---|---|---|---|
| Offline | Local | No | kani | Low-level control over the model |
| vLLM API | HTTP | Yes | kani | Running multiple different models in parallel |
| OpenAI API | HTTP | Yes | vLLM | Fast iteration and testing multiple models; multimodal models |
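Each mode is served by its own engine class, all demonstrated in the sections below. As a quick orientation, here is an illustrative sketch (the mode labels are informal names for the rows above, not package identifiers):

```python
# Illustrative only: maps the informal mode names from the table above to the
# engine classes this package exports.
ENGINE_CLASS_BY_MODE = {
    "offline": "VLLMEngine",
    "vllm-native-api": "VLLMServerEngine",
    "openai-compatible-api": "VLLMOpenAIEngine",
}

print(ENGINE_CLASS_BY_MODE["offline"])  # -> VLLMEngine
```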
### Offline Mode

```python
from kani import Kani, chat_in_terminal
from kani.ext.vllm import VLLMEngine

engine = VLLMEngine(model_id="meta-llama/Meta-Llama-3-8B-Instruct")
ai = Kani(engine)
chat_in_terminal(ai)
```
### vLLM-Native API Mode

> [!NOTE]
> Offline mode is preferred unless you need to load multiple models in parallel.

> [!NOTE]
> The vLLM server will be started on a random free port. It will not be exposed to the wider internet (i.e., it binds to localhost).

When loading a model in API mode, the model's context length cannot be read from the configuration, so you must pass
`max_context_size`.

```python
from kani import Kani, chat_in_terminal
from kani.ext.vllm import VLLMServerEngine

engine = VLLMServerEngine(model_id="meta-llama/Meta-Llama-3-8B-Instruct", max_context_size=128000)
ai = Kani(engine)
chat_in_terminal(ai)
```
### OpenAI-Compatible API Mode

> [!NOTE]
> The vLLM server will be started on a random free port. It will not be exposed to the wider internet (i.e., it binds to localhost).

When loading a model in API mode, the model's context length cannot be read from the configuration, so you must pass
`max_context_size`.

```python
from kani import Kani, chat_in_terminal
from kani.ext.vllm import VLLMOpenAIEngine

engine = VLLMOpenAIEngine(model_id="meta-llama/Meta-Llama-3-8B-Instruct", max_context_size=128000)
ai = Kani(engine)
chat_in_terminal(ai)
```
### Using Multiple GPUs

For multi-GPU support (probably needed), add `model_load_kwargs={"tensor_parallel_size": 4}`. Replace `4` with the
number of GPUs you have available.

> [!NOTE]
> If you are loading in an API mode, use `vllm_args={"tensor_parallel_size": 4}` instead.
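As a sketch of that difference, this hypothetical helper (`parallel_kwargs` is not part of this package) builds the correct constructor keyword argument for either mode:

```python
# Hypothetical helper (not part of kani-ext-vllm): build the constructor
# kwargs that spread a model across `num_gpus` GPUs. Offline mode takes
# model_load_kwargs, while both API modes take vllm_args instead.

def parallel_kwargs(num_gpus: int, api_mode: bool = False) -> dict:
    key = "vllm_args" if api_mode else "model_load_kwargs"
    return {key: {"tensor_parallel_size": num_gpus}}

print(parallel_kwargs(4))
# -> {'model_load_kwargs': {'tensor_parallel_size': 4}}
print(parallel_kwargs(4, api_mode=True))
# -> {'vllm_args': {'tensor_parallel_size': 4}}
```

It could then be spliced into a constructor call, e.g. `VLLMEngine(model_id="...", **parallel_kwargs(4))`.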
## Examples

### Offline Mode

```python
from kani.ext.vllm import VLLMEngine
from vllm import SamplingParams

model = VLLMEngine(
    model_id="mistralai/Mistral-Small-Instruct-2409",
    model_load_kwargs={"tensor_parallel_size": 2, "tokenizer_mode": "auto"},
    sampling_params=SamplingParams(temperature=0, max_tokens=2048),
)
```
### vLLM-Native API Mode

```python
from kani.ext.vllm import VLLMServerEngine

model = VLLMServerEngine(
    model_id="mistralai/Mistral-Small-Instruct-2409",
    max_context_size=32000,
    vllm_args={"tensor_parallel_size": 2, "tokenizer_mode": "auto"},
    # note that these should not be wrapped in SamplingParams!
    temperature=0,
    max_tokens=2048,
)
```
See https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html#completions-api_1 for a list of valid decoding parameters that can be specified in the engine constructor.
### OpenAI-Compatible API Mode

```python
from kani.ext.vllm import VLLMOpenAIEngine

model = VLLMOpenAIEngine(
    model_id="Qwen/Qwen3-Omni-30B-A3B-Instruct",
    max_context_size=32768,
    vllm_args={"tensor_parallel_size": 2, "allowed_local_media_path": "/"},
    # note that these should not be wrapped in SamplingParams!
    temperature=0,
    max_tokens=2048,
)
```
See https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html#chat-api_1 for a list of valid decoding parameters that can be specified in the engine constructor.
## File details

Details for the file `kani_ext_vllm-0.2.1.tar.gz`.

### File metadata

- Download URL: kani_ext_vllm-0.2.1.tar.gz
- Size: 16.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `24fe6c79e62bf4446b244af4270bf80142642e781f5f96d295c77d96e13a4889` |
| MD5 | `874a5ecee389f1f93a4f2a52cd4f7883` |
| BLAKE2b-256 | `cac6fa3fe65c7416297f8732e79ae9ef3c89e1d2d32458a1b15c28342b16f2bf` |
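A downloaded file can be checked against these digests locally. As a minimal sketch using Python's standard-library `hashlib` (run in the directory containing the download):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the result against the SHA256 digest from the table above, e.g.:
# sha256_of("kani_ext_vllm-0.2.1.tar.gz") should equal
# "24fe6c79e62bf4446b244af4270bf80142642e781f5f96d295c77d96e13a4889"
```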
## Provenance

The following attestation bundles were made for `kani_ext_vllm-0.2.1.tar.gz`:

Publisher: `pythonpublish.yml` on zhudotexe/kani-ext-vllm

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kani_ext_vllm-0.2.1.tar.gz
- Subject digest: 24fe6c79e62bf4446b244af4270bf80142642e781f5f96d295c77d96e13a4889
- Sigstore transparency entry: 697203529
- Permalink: zhudotexe/kani-ext-vllm@872c075ab212089882a4b9eebcfbbf72b332d724
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/zhudotexe
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pythonpublish.yml@872c075ab212089882a4b9eebcfbbf72b332d724
- Trigger Event: release
## File details

Details for the file `kani_ext_vllm-0.2.1-py3-none-any.whl`.

### File metadata

- Download URL: kani_ext_vllm-0.2.1-py3-none-any.whl
- Size: 13.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `f27a81794b1c396d7aeeaa0854a59ce6a461f30be26c9dbe752624bb965fbb40` |
| MD5 | `0af8ff17ab6b47b326fa359b1f4233c2` |
| BLAKE2b-256 | `061f624901d55986c3a85762817748683a2639ff2cc78256df4adb9cd0a93fa9` |
## Provenance

The following attestation bundles were made for `kani_ext_vllm-0.2.1-py3-none-any.whl`:

Publisher: `pythonpublish.yml` on zhudotexe/kani-ext-vllm

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kani_ext_vllm-0.2.1-py3-none-any.whl
- Subject digest: f27a81794b1c396d7aeeaa0854a59ce6a461f30be26c9dbe752624bb965fbb40
- Sigstore transparency entry: 697203563
- Permalink: zhudotexe/kani-ext-vllm@872c075ab212089882a4b9eebcfbbf72b332d724
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/zhudotexe
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pythonpublish.yml@872c075ab212089882a4b9eebcfbbf72b332d724
- Trigger Event: release