Minimal Python SDK for the vLLM API

vLLM SDK

Minimal Python SDK for the vLLM API. This package provides a lightweight client library for interacting with vLLM API servers, with only httpx and pydantic as dependencies.

Installation

pip install vllm-sdk

Quick Start

import asyncio
from vllm_sdk import VLLMClient, ChatMessage

async def main():
    async with VLLMClient(base_url="http://localhost:8000") as client:
        # Non-streaming chat completion
        response = await client.chat_completions(
            model="meta-llama/Meta-Llama-3.3-70B-Instruct",
            messages=[
                ChatMessage(role="user", content="Hello!")
            ],
        )
        print(response.choices[0].message.content)

        # Streaming chat completion
        async for chunk in client.chat_completions_stream(
            model="meta-llama/Meta-Llama-3.3-70B-Instruct",
            messages=[
                ChatMessage(role="user", content="Tell me a story")
            ],
        ):
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(main())
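
When streaming, each chunk carries only a fragment of the reply, so a caller who needs the full text has to join the fragments. The sketch below shows one way to do that. It does not call the SDK: `fake_stream` and the `Chunk`, `Choice`, and `Delta` dataclasses are hypothetical stand-ins that only mimic the `chunk.choices[0].delta.content` shape used above, so the example runs without a server.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Optional


# Hypothetical stand-ins that mimic the chunk shape from the Quick Start.
@dataclass
class Delta:
    content: Optional[str] = None


@dataclass
class Choice:
    delta: Delta


@dataclass
class Chunk:
    choices: List[Choice]


async def fake_stream(parts: List[Optional[str]]) -> AsyncIterator[Chunk]:
    """Stand-in for client.chat_completions_stream(...); yields one chunk per part."""
    for part in parts:
        await asyncio.sleep(0)  # hand control back to the event loop, as a network read would
        yield Chunk(choices=[Choice(delta=Delta(content=part))])


async def collect_text(stream: AsyncIterator[Chunk]) -> str:
    """Join the non-empty delta fragments of a stream into one string."""
    pieces: List[str] = []
    async for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        content = chunk.choices[0].delta.content
        if content:
            pieces.append(content)
    return "".join(pieces)


if __name__ == "__main__":
    print(asyncio.run(collect_text(fake_stream(["Once ", None, "upon ", "a time"]))))
```

With a real client, `collect_text` would presumably accept the async generator returned by `chat_completions_stream()` in place of `fake_stream(...)`.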

Features

Minimal Dependencies: Only requires httpx and pydantic
Type Safety: Full Pydantic schema validation for requests and responses
Async Support: Built on httpx, with an async/await interface
Streaming: Support for streaming chat completions
Feature Search: Search sparse autoencoder (SAE) features by semantic similarity
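
Because the client is async, several requests can be in flight at once. A minimal sketch of that pattern follows. `fake_chat` is a hypothetical stand-in for an awaitable call such as `client.chat_completions(...)`, so the example runs without a server, and the default limit of 2 concurrent requests is an arbitrary choice.

```python
import asyncio
from typing import List


async def fake_chat(prompt: str) -> str:
    """Hypothetical stand-in for an awaitable SDK call; echoes the prompt."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"reply to: {prompt}"


async def ask_all(prompts: List[str], limit: int = 2) -> List[str]:
    """Run one request per prompt, at most `limit` at a time, preserving order."""
    semaphore = asyncio.Semaphore(limit)

    async def ask(prompt: str) -> str:
        async with semaphore:  # cap the number of simultaneous requests
            return await fake_chat(prompt)

    # gather returns results in the order the awaitables were passed in
    return await asyncio.gather(*(ask(p) for p in prompts))


if __name__ == "__main__":
    print(asyncio.run(ask_all(["Hello!", "Tell me a story", "Goodbye"])))
```

The semaphore keeps a large batch from opening too many connections at once; the right limit depends on the server.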

API Reference

VLLMClient

The main client class for interacting with the vLLM API.

Methods

chat_completions() - Create a non-streaming chat completion
chat_completions_stream() - Stream chat completions (async generator)
feature_search() - Search for SAE features

Schemas

All request and response models are available for import:

ChatMessage - Individual chat message
ChatCompletionRequest - Chat completion request
ChatCompletionResponse - Chat completion response
ChatCompletionChunk - Streaming chunk
FeatureSearchRequest - Feature search request
FeatureSearchResponse - Feature search response
ModelName - Supported model names enum
InterventionSpec - Feature intervention passed to a chat completion (see With Interventions)

Examples

Feature Search

import asyncio
from vllm_sdk import VLLMClient

async def main():
    async with VLLMClient(base_url="http://localhost:8000") as client:
        # Search for features semantically related to the query
        response = await client.feature_search(
            query="pirate speech",
            model="meta-llama/Meta-Llama-3.3-70B-Instruct",
            top_k=10,
        )
        for feature in response.data:
            print(f"{feature.id}: {feature.label} (layer {feature.layer})")

asyncio.run(main())

With Interventions

import asyncio
from vllm_sdk import VLLMClient, ChatMessage, InterventionSpec

async def main():
    async with VLLMClient(base_url="http://localhost:8000") as client:
        response = await client.chat_completions(
            model="meta-llama/Meta-Llama-3.3-70B-Instruct",
            messages=[ChatMessage(role="user", content="Hello!")],
            # Apply one feature intervention to this request
            interventions=[
                InterventionSpec(
                    feature_id="feature_123",
                    strength=2.0,
                    mode="add",
                )
            ],
        )
        print(response.choices[0].message.content)

asyncio.run(main())
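
The two examples can be combined: search for features, then pass the top matches as interventions. The sketch below shows only the glue step, turning search results into intervention arguments. `Feature` is a hypothetical stand-in with the `id`, `label`, and `layer` fields printed in the Feature Search example, and it is an assumption that a result's `id` is the value `InterventionSpec` expects as `feature_id`.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Feature:
    """Hypothetical stand-in for one item of response.data."""
    id: str
    label: str
    layer: int


def to_intervention_kwargs(
    features: List[Feature],
    strength: float = 2.0,
    mode: str = "add",
    limit: int = 3,
) -> List[Dict[str, Union[str, float]]]:
    """Build keyword arguments for InterventionSpec from the first `limit` features."""
    return [
        {"feature_id": feature.id, "strength": strength, "mode": mode}
        for feature in features[:limit]
    ]


if __name__ == "__main__":
    found = [Feature("feature_123", "pirate speech", 20)]
    print(to_intervention_kwargs(found))
```

Under the same assumption, `[InterventionSpec(**kw) for kw in to_intervention_kwargs(response.data)]` would produce the list passed as `interventions=`.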

License

Apache 2.0

Release history

0.2.0 - Dec 8, 2025
0.1.4 - Dec 3, 2025
0.1.3 - Nov 26, 2025
0.1.2 - Nov 24, 2025
0.1.1 - Nov 14, 2025
0.1.0 - Nov 13, 2025 (this version)

Download files

Source Distribution

vllm_sdk-0.1.0.tar.gz (9.4 kB) - Uploaded Nov 13, 2025, Source

Built Distribution

vllm_sdk-0.1.0-py3-none-any.whl (11.1 kB) - Uploaded Nov 13, 2025, Python 3

File details

Details for the file vllm_sdk-0.1.0.tar.gz.

File metadata

Download URL: vllm_sdk-0.1.0.tar.gz
Upload date: Nov 13, 2025
Size: 9.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for vllm_sdk-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`cc0de1fab1189ff3bd3afe05b15a8b07a7500287b95594f8e6a77bdbcaaa4137`
MD5	`012469c68ef716d912ed761e128d9359`
BLAKE2b-256	`c95984551771ae6399889586a77a786d43fa2e773b549007c6428032a3cfe555`


File details

Details for the file vllm_sdk-0.1.0-py3-none-any.whl.

File metadata

Download URL: vllm_sdk-0.1.0-py3-none-any.whl
Upload date: Nov 13, 2025
Size: 11.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for vllm_sdk-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4f2fba27952b2bfbe7536afeae167e1feb74b43b72ef8c4d6994b008b66df60e`
MD5	`8e81e568c2973c5dd1f68c96090baa58`
BLAKE2b-256	`3765fd99831a4779951447e0c475184c92d8c4d4181dd42c9d030d0574452aa9`
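
The published digests can be used to check a downloaded file before installing it. A minimal sketch using only the standard library follows; the function names are illustrative, and the file is assumed to be in the current directory under the name shown above.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


if __name__ == "__main__":
    wheel = Path("vllm_sdk-0.1.0-py3-none-any.whl")
    published = "4f2fba27952b2bfbe7536afeae167e1feb74b43b72ef8c4d6994b008b66df60e"
    if wheel.exists():
        print("hash ok" if matches_published_hash(wheel, published) else "hash mismatch")
```

pip can enforce the same check at install time when a requirements file pins the digest with `--hash=sha256:...`.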

