
Minimal Python SDK for the vLLM API


vLLM SDK

A lightweight async client library for talking to vLLM API servers. Its only dependencies are httpx and pydantic.

Installation

pip install vllm-sdk

Quick Start

import asyncio
from vllm_sdk import VLLMClient, ChatMessage

async def main():
    async with VLLMClient(base_url="http://localhost:8000") as client:
        # Non-streaming chat completion
        response = await client.chat_completions(
            model="meta-llama/Llama-3.3-70B-Instruct",
            messages=[
                ChatMessage(role="user", content="Hello!")
            ],
        )
        print(response.choices[0].message.content)

        # Streaming chat completion
        async for chunk in client.chat_completions_stream(
            model="meta-llama/Llama-3.3-70B-Instruct",
            messages=[
                ChatMessage(role="user", content="Tell me a story")
            ],
        ):
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(main())
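
When the full reply is needed as a single string rather than printed piece by piece, the streamed deltas can be collected. The following is a minimal sketch, not part of the SDK: `collect_stream`, `fake_chunk`, and `fake_stream` are hypothetical names, and the fake stream only imitates the `chunk.choices[0].delta.content` shape used in the Quick Start so the example runs without a server.

```python
import asyncio
from types import SimpleNamespace
from typing import AsyncIterator


async def collect_stream(chunks: AsyncIterator) -> str:
    """Join the text deltas of a chat-completion stream into one string."""
    parts = []
    async for chunk in chunks:
        # Skip chunks that carry no choices or no text.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    # Hypothetical stand-in with the same attribute shape as a streamed chunk.
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


async def fake_stream():
    # Stand-in for client.chat_completions_stream(...); no server required.
    for text in ["Once ", None, "upon ", "a time"]:
        yield fake_chunk(text)
    yield SimpleNamespace(choices=[])  # a chunk without choices


story = asyncio.run(collect_stream(fake_stream()))
print(story)
```

With a real client, the same helper would presumably be called as `await collect_stream(client.chat_completions_stream(model=..., messages=...))`.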

Features

  • Minimal Dependencies: Only requires httpx and pydantic
  • Type Safety: Full Pydantic schema validation for requests and responses
  • Async Support: Built on httpx, with an async/await interface throughout
  • Streaming: Support for streaming chat completions
  • Feature Search: Search SAE (sparse autoencoder) features by semantic similarity

API Reference

VLLMClient

The main client class for interacting with the vLLM API.

Methods

  • chat_completions() - Create a non-streaming chat completion
  • chat_completions_stream() - Stream chat completions (async generator)
  • feature_search() - Search for SAE features

Schemas

All request and response models are available for import:

  • ChatMessage - Individual chat message
  • ChatCompletionRequest - Chat completion request
  • ChatCompletionResponse - Chat completion response
  • ChatCompletionChunk - Streaming chunk
  • FeatureSearchRequest - Feature search request
  • FeatureSearchResponse - Feature search response
  • ModelName - Supported model names enum

Examples

Feature Search

import asyncio
from vllm_sdk import VLLMClient

async def main():
    async with VLLMClient(base_url="http://localhost:8000") as client:
        # Return up to 10 features ranked by semantic similarity to the query
        response = await client.feature_search(
            query="pirate speech",
            model="meta-llama/Llama-3.3-70B-Instruct",
            top_k=10,
        )
        for feature in response.data:
            print(f"{feature.id}: {feature.label} (layer {feature.layer})")

asyncio.run(main())
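
The loop above reads `id`, `label`, and `layer` from each result. When many features come back, grouping them by layer can make the output easier to scan. This is a minimal sketch, assuming only those three attributes: `group_by_layer` is a hypothetical helper and the sample data is invented for illustration, not real search output.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_layer(features):
    """Return {layer: [features...]} with layers in ascending order."""
    grouped = defaultdict(list)
    for feature in features:
        grouped[feature.layer].append(feature)
    return dict(sorted(grouped.items()))


# Hypothetical stand-in data with the id/label/layer attributes used above.
sample = [
    SimpleNamespace(id=101, label="pirate speech", layer=20),
    SimpleNamespace(id=7, label="nautical terms", layer=12),
    SimpleNamespace(id=42, label="archaic English", layer=20),
]

by_layer = group_by_layer(sample)
for layer, items in by_layer.items():
    print(layer, [f.label for f in items])
```

With a real response, `group_by_layer(response.data)` would be the equivalent call.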

With Interventions

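# Client and Variant are not covered in the API Reference above;
# this example calls them without async/await.
from vllm_sdk import Client, Variant

client = Client(api_key="your-api-key")

# Attach a feature intervention to a model variant
variant = Variant("meta-llama/Llama-3.3-70B-Instruct")
variant.add_intervention(feature_id=12345, strength=0.8, mode="add")

# Pass the variant in place of a model name
response = client.chat.completions.create(
    model=variant,
    messages=[{"role": "user", "content": "Hello!"}],
    max_completion_tokens=256,
)
print(response.choices[0].message.content)
client.close()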

License

Apache 2.0


