
kani-ext-vllm

This package provides a vLLM backend for kani through the VLLMEngine, along with the API-mode VLLMServerEngine and a model-specific CommandRVLLMEngine.

This package is considered provisional and maintained on a best-effort basis.

To install this package from PyPI:

$ pip install kani-ext-vllm

Alternatively, you can install it from the git source:

$ pip install git+https://github.com/zhudotexe/kani-ext-vllm.git@main

See https://docs.vllm.ai/en/latest/index.html for more information on vLLM.
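
After installing, a quick import check confirms the extension is visible (a sanity check only; it does not load a model):

$ python -c "from kani.ext.vllm import VLLMEngine"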

Usage

This package provides two main methods of serving models with vLLM: offline mode (preferred) and API mode. The two are generally equivalent and differ only in how kani communicates with the vLLM workers.

Generally, you should use offline mode unless you need to load multiple models in parallel.

Offline Mode

from kani import Kani, chat_in_terminal
from kani.ext.vllm import VLLMEngine

engine = VLLMEngine(model_id="meta-llama/Meta-Llama-3-8B-Instruct")
ai = Kani(engine)
chat_in_terminal(ai)
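
The terminal helper is handy for quick testing; for programmatic use you can call kani's async chat API directly. A minimal sketch (chat_round_str is standard kani, not specific to this extension):

import asyncio

from kani import Kani
from kani.ext.vllm import VLLMEngine

engine = VLLMEngine(model_id="meta-llama/Meta-Llama-3-8B-Instruct")
ai = Kani(engine)

async def main():
    # send one user message and return the assistant's reply as a string
    resp = await ai.chat_round_str("Hello from vLLM!")
    print(resp)

asyncio.run(main())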

API Mode

[!IMPORTANT] Using offline mode is preferred unless you need to load multiple models in parallel.

[!NOTE] The vLLM server will be started on a random free port. It will not be exposed to the wider internet (i.e., it binds to localhost).

When loading a model in API mode, the model's context length cannot be read from its configuration, so you must pass max_context_len explicitly.

from kani import Kani, chat_in_terminal
from kani.ext.vllm import VLLMServerEngine

engine = VLLMServerEngine(model_id="meta-llama/Meta-Llama-3-8B-Instruct", max_context_len=128000)
ai = Kani(engine)
chat_in_terminal(ai)
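
The main reason to use API mode is running several models side by side, since each engine launches its own vLLM server process. A hedged sketch (the model choices are illustrative, and gpu_memory_utilization is a standard vLLM engine argument passed through vllm_args here on the assumption that both servers share one GPU):

from kani import Kani
from kani.ext.vllm import VLLMServerEngine

# each engine starts its own vLLM server on its own random free localhost port
engine_a = VLLMServerEngine(
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",
    max_context_len=8192,
    vllm_args={"gpu_memory_utilization": 0.45},
)
engine_b = VLLMServerEngine(
    model_id="mistralai/Mistral-Small-Instruct-2409",
    max_context_len=32000,
    vllm_args={"gpu_memory_utilization": 0.45},
)

ai_a = Kani(engine_a)
ai_b = Kani(engine_b)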

Command R

[!NOTE] Command R only supports loading in offline mode.

Command R's Hugging Face implementation does not support the full 128k context length. Cohere recommends using vLLM instead, so here we are.

from kani import Kani, chat_in_terminal
from kani.ext.vllm import CommandRVLLMEngine

engine = CommandRVLLMEngine(model_id="CohereForAI/c4ai-command-r-v01")
ai = Kani(engine)
chat_in_terminal(ai)
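
Command R is tuned for tool use, and kani's standard @ai_function interface should work with it as with any other engine. A hedged sketch (the weather function is illustrative, and the engine is assumed to expose kani's usual function-calling support):

from typing import Annotated

from kani import AIParam, Kani, ai_function, chat_in_terminal
from kani.ext.vllm import CommandRVLLMEngine

class WeatherKani(Kani):
    @ai_function()
    def get_weather(self, city: Annotated[str, AIParam(desc="The city to get the weather for.")]):
        """Get the current weather in a given city."""
        # hypothetical stub; replace with a real weather lookup
        return f"It is sunny in {city}."

engine = CommandRVLLMEngine(model_id="CohereForAI/c4ai-command-r-v01")
ai = WeatherKani(engine)
chat_in_terminal(ai)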

Using Multiple GPUs

For multi-GPU support (which you will probably need for larger models), add model_load_kwargs={"tensor_parallel_size": 4}. Replace 4 with the number of GPUs you have available.

[!NOTE] If you are loading in API mode, use vllm_args={"tensor_parallel_size": 4} instead.

Examples

Offline Mode

from kani.ext.vllm import VLLMEngine
from vllm import SamplingParams

model = VLLMEngine(
    model_id="mistralai/Mistral-Small-Instruct-2409",
    model_load_kwargs={"tensor_parallel_size": 2, "tokenizer_mode": "auto"},
    sampling_params=SamplingParams(temperature=0, max_tokens=2048),
)

API Mode

from kani.ext.vllm import VLLMServerEngine

model = VLLMServerEngine(
    model_id="mistralai/Mistral-Small-Instruct-2409",
    max_context_len=32000,
    vllm_args={"tensor_parallel_size": 2, "tokenizer_mode": "auto"},
    # note that these should not be wrapped in SamplingParams!
    temperature=0,
    max_tokens=2048,
)
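
In either mode, the engine holds GPU memory (and, in API mode, a server subprocess) until it is released; kani engines provide an async close() you can await when you are done:

# free the engine's resources (GPU memory / server process) when finished
await engine.close()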
