vLLM backend for kani

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

zhu.exe

These details have not been verified by PyPI

Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

kani-ext-vllm

This repository adds vLLM-backed engines for kani: VLLMEngine (offline mode) and VLLMServerEngine (API mode).

This package is considered provisional and maintained on a best-effort basis.

Install this package from PyPI:

$ pip install kani-ext-vllm

Alternatively, you can install it using the git source:

$ pip install git+https://github.com/zhudotexe/kani-ext-vllm.git@main

See https://docs.vllm.ai/en/latest/index.html for more information on vLLM.
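
After installing, you may want to confirm which version pip actually resolved. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and is not part of this package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# Prints the installed version string, or None if the package is missing.
print(installed_version("kani-ext-vllm"))
```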

Usage

This package provides two main ways to serve models with vLLM: offline mode (preferred) and API mode. The two are generally equivalent and differ only in how Kani communicates with the vLLM workers.

Generally, you should use offline mode unless you need to load multiple models in parallel.

Offline Mode

from kani import Kani, chat_in_terminal
from kani.ext.vllm import VLLMEngine

engine = VLLMEngine(model_id="meta-llama/Meta-Llama-3-8B-Instruct")
ai = Kani(engine)
chat_in_terminal(ai)

API Mode

[!NOTE] Using offline mode is preferred unless you need to load multiple models in parallel.

[!NOTE] The vLLM server will be started on a random free port. It will not be exposed to the wider internet (i.e., it binds to localhost).
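
To illustrate what "a random free port bound to localhost" means, one common technique is to bind a socket to port 0 and let the operating system choose. This is only a sketch of the general idea and not necessarily how this package selects its port; the helper name `find_free_port` is hypothetical.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port on the given interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to pick any free port. Binding to 127.0.0.1
        # keeps the socket unreachable from other machines.
        sock.bind((host, 0))
        return sock.getsockname()[1]


print(find_free_port())
```

Note that the port is released when the socket closes, so there is a small window in which another process could claim it before a server binds to it.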

When loading a model in API mode, the model's context length cannot be read from its configuration, so you must pass max_context_len explicitly.

from kani import Kani, chat_in_terminal
from kani.ext.vllm import VLLMServerEngine

# Llama 3 8B Instruct has an 8192-token context window
engine = VLLMServerEngine(model_id="meta-llama/Meta-Llama-3-8B-Instruct", max_context_len=8192)
ai = Kani(engine)
chat_in_terminal(ai)
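
If you are unsure what value to pass for max_context_len, many Hugging Face model configurations record the context window in the `max_position_embeddings` field of config.json. The sketch below extracts that field from the text of a config file you have already obtained. The helper name is hypothetical, and some models store their limit under a different key, so treat the result as a starting point rather than a guarantee.

```python
import json
from typing import Optional


def context_len_from_config_text(config_text: str) -> Optional[int]:
    """Read max_position_embeddings from the text of a config.json file."""
    config = json.loads(config_text)
    value = config.get("max_position_embeddings")
    # Return None rather than guessing when the field is missing.
    return int(value) if value is not None else None


# Example with a tiny stand-in for a real config.json.
sample = '{"model_type": "llama", "max_position_embeddings": 8192}'
print(context_len_from_config_text(sample))
```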

Using Multiple GPUs

To split a model across multiple GPUs (which you will probably need), pass model_load_kwargs={"tensor_parallel_size": 4} to the engine, replacing 4 with the number of GPUs you have available.

[!NOTE] If you are loading in API mode, use vllm_args={"tensor_parallel_size": 4} instead.
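
Because the keyword differs between the two modes, a small helper can keep the choice in one place. This is a sketch under stated assumptions: `parallel_kwargs` and its mode names ("offline" and "api") are hypothetical and not part of this package; only the model_load_kwargs and vllm_args keywords come from the text above.

```python
from typing import Any, Dict


def parallel_kwargs(mode: str, num_gpus: int) -> Dict[str, Dict[str, Any]]:
    """Build the tensor-parallel keyword argument for the chosen mode."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    settings = {"tensor_parallel_size": num_gpus}
    if mode == "offline":
        return {"model_load_kwargs": settings}  # keyword used by VLLMEngine
    if mode == "api":
        return {"vllm_args": settings}  # keyword used by VLLMServerEngine
    raise ValueError(f"unknown mode: {mode!r}")


print(parallel_kwargs("offline", 4))
print(parallel_kwargs("api", 4))
```

The returned dictionary can then be unpacked into the matching engine constructor with `**`, alongside model_id and any other arguments.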

Examples

Offline Mode

from kani.ext.vllm import VLLMEngine
from vllm import SamplingParams

model = VLLMEngine(
    model_id="mistralai/Mistral-Small-Instruct-2409",
    model_load_kwargs={"tensor_parallel_size": 2, "tokenizer_mode": "auto"},
    sampling_params=SamplingParams(temperature=0, max_tokens=2048),
)

API Mode

from kani.ext.vllm import VLLMServerEngine

model = VLLMServerEngine(
    model_id="mistralai/Mistral-Small-Instruct-2409",
    max_context_len=32000,
    vllm_args={"tensor_parallel_size": 2, "tokenizer_mode": "auto"},
    # note that these should not be wrapped in SamplingParams!
    temperature=0,
    max_tokens=2048,
)

Release history

- 0.3.1: May 6, 2026
- 0.3.0: Mar 4, 2026
- 0.2.2: Dec 8, 2025
- 0.2.1: Nov 12, 2025
- 0.2.0: Oct 30, 2025
- 0.1.0: Aug 21, 2025 (this version)
- 0.0.8: Feb 3, 2025

Download files

Source Distribution

kani_ext_vllm-0.1.0.tar.gz (11.8 kB)

Uploaded Aug 21, 2025 Source

Built Distribution

kani_ext_vllm-0.1.0-py3-none-any.whl (9.9 kB)

Uploaded Aug 21, 2025 Python 3

File details

Details for the file kani_ext_vllm-0.1.0.tar.gz.

File metadata

Download URL: kani_ext_vllm-0.1.0.tar.gz
Upload date: Aug 21, 2025
Size: 11.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for kani_ext_vllm-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`78975ab458d72d740ec1d8a9d654230b80031317646b5d7babfffb01671ddace`
MD5	`8a9be8884296c3d5f424aacd5d00fa9f`
BLAKE2b-256	`07dcf29b79a4b8a4075b4773b2c20fa1f0a3b1fd5db71980637644cf48d7ffce`
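
To check a downloaded file against a published digest, you can compute its SHA256 locally and compare. The following is a minimal standard-library sketch; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest against a published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the source archive has been downloaded locally:
# matches_expected(
#     "kani_ext_vllm-0.1.0.tar.gz",
#     "78975ab458d72d740ec1d8a9d654230b80031317646b5d7babfffb01671ddace",
# )
```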

Provenance

The following attestation bundles were made for kani_ext_vllm-0.1.0.tar.gz:

Publisher: pythonpublish.yml on zhudotexe/kani-ext-vllm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kani_ext_vllm-0.1.0.tar.gz
- Subject digest: 78975ab458d72d740ec1d8a9d654230b80031317646b5d7babfffb01671ddace
- Sigstore transparency entry: 417313605
- Sigstore integration time: Aug 21, 2025
Source repository:
- Permalink: zhudotexe/kani-ext-vllm@5a465e398e028b0d5972250914e49698f67b9863
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/zhudotexe
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pythonpublish.yml@5a465e398e028b0d5972250914e49698f67b9863
- Trigger Event: release

File details

Details for the file kani_ext_vllm-0.1.0-py3-none-any.whl.

File metadata

Download URL: kani_ext_vllm-0.1.0-py3-none-any.whl
Upload date: Aug 21, 2025
Size: 9.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for kani_ext_vllm-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7cbbe4c73c59e7c568d8acba36cc7157963a2d993158ab1345c46a106850bacf`
MD5	`5c1aabfab3068de821e9eacdd2a895a8`
BLAKE2b-256	`0b91c30aaf47fa1d92f9cdc8b997823c83208b8336771a5430351679f0887909`

Provenance

The following attestation bundles were made for kani_ext_vllm-0.1.0-py3-none-any.whl:

Publisher: pythonpublish.yml on zhudotexe/kani-ext-vllm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kani_ext_vllm-0.1.0-py3-none-any.whl
- Subject digest: 7cbbe4c73c59e7c568d8acba36cc7157963a2d993158ab1345c46a106850bacf
- Sigstore transparency entry: 417313618
- Sigstore integration time: Aug 21, 2025
Source repository:
- Permalink: zhudotexe/kani-ext-vllm@5a465e398e028b0d5972250914e49698f67b9863
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/zhudotexe
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pythonpublish.yml@5a465e398e028b0d5972250914e49698f67b9863
- Trigger Event: release
