Minimal Python SDK for the vLLM API

vLLM SDK

Minimal Python SDK for the vLLM API. This package provides a lightweight client library for interacting with vLLM API servers, with only httpx and pydantic as dependencies.

Installation

pip install vllm-sdk
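The distribution is installed as `vllm-sdk` but imported as `vllm_sdk`. The standard library can report which release is installed. `installed_version` is a small helper written for this README and is not part of vllm-sdk.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "vllm-sdk") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version() or "vllm-sdk is not installed")
```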

Quick Start

import asyncio
from vllm_sdk import VLLMClient, ChatMessage

async def main():
    async with VLLMClient(base_url="http://localhost:8000") as client:
        # Non-streaming chat completion
        response = await client.chat_completions(
            model="meta-llama/Llama-3.3-70B-Instruct",
            messages=[
                ChatMessage(role="user", content="Hello!")
            ],
        )
        print(response.choices[0].message.content)

        # Streaming chat completion
        async for chunk in client.chat_completions_stream(
            model="meta-llama/Llama-3.3-70B-Instruct",
            messages=[
                ChatMessage(role="user", content="Tell me a story")
            ],
        ):
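            # Defensive: skip chunks whose choices list is empty
            if chunk.choices and chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="", flush=True)
        print()  # finish the streamed line with a newline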

asyncio.run(main())
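Every client method is a coroutine, so one client can run several requests concurrently. The sketch below uses an `asyncio.Semaphore` to limit how many requests are in flight at once. `bounded_gather` and `fake_request` are hypothetical helpers and are not part of vllm-sdk. With the real client, you would pass factories such as `lambda: client.chat_completions(...)` in place of the stand-in.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


async def bounded_gather(
    factories: Sequence[Callable[[], Awaitable[T]]], limit: int = 4
) -> list:
    """Await the results of `factories`, at most `limit` at a time, in input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(factory: Callable[[], Awaitable[T]]) -> T:
        # Wait for a free slot before starting this request
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(run_one(f) for f in factories))


# Hypothetical stand-in for a call such as client.chat_completions(...)
async def fake_request(prompt: str) -> str:
    await asyncio.sleep(0.01)
    return prompt.upper()


async def demo() -> list:
    prompts = ["hello", "tell me a story", "goodbye"]
    # p=p binds each prompt now instead of sharing the last loop value
    return await bounded_gather([lambda p=p: fake_request(p) for p in prompts], limit=2)


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['HELLO', 'TELL ME A STORY', 'GOODBYE']
```

`asyncio.gather` returns results in the order the factories were given, so a slow request does not reorder the output.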

Features

  • Minimal Dependencies: Only requires httpx and pydantic
  • Type Safety: Full Pydantic schema validation for requests and responses
  • Async Support: An async/await client built on httpx
  • Streaming: Support for streaming chat completions
  • Feature Search: Search sparse autoencoder (SAE) features by semantic similarity

API Reference

VLLMClient

The main client class for interacting with the vLLM API.

Methods

  • chat_completions() - Create a non-streaming chat completion
  • chat_completions_stream() - Stream chat completions (async generator)
  • feature_search() - Search for SAE features
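`chat_completions_stream()` yields parsed chunks, so you normally never see the wire format. When debugging, it helps to know the format the client most likely consumes. vLLM's OpenAI-compatible server streams chat completions as server-sent events: one `data: {json}` line per chunk, followed by a final `data: [DONE]`. The sketch below assumes that format and joins the `delta.content` pieces of the first choice using only the standard library. `collect_stream_text` is a hypothetical helper and is not part of vllm-sdk.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join delta.content from OpenAI-style SSE lines (assumed wire format)."""
    parts = []
    for raw in lines:
        line = raw.strip()
        # Only "data:" lines carry chunks; skip blank keep-alive lines
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:  # role-only chunks carry no text
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Once upon"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": " a time"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # Once upon a time
```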

Schemas

All request and response models are available for import:

  • ChatMessage - Individual chat message
  • ChatCompletionRequest - Chat completion request
  • ChatCompletionResponse - Chat completion response
  • ChatCompletionChunk - Streaming chunk
  • FeatureSearchRequest - Feature search request
  • FeatureSearchResponse - Feature search response
  • ModelName - Supported model names enum
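The schemas describe the JSON sent to the server. The sketch below is for illustration only. It uses stdlib dataclasses instead of Pydantic to build an OpenAI-style chat request body. The field names `model`, `messages`, `role`, `content`, and `max_completion_tokens` come from the examples in this README. The `Message` and `ChatRequest` classes and the allowed-roles set are assumptions and do not reproduce the SDK's real validation.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional

# Assumed role set for this sketch; the real ChatMessage schema may differ
ALLOWED_ROLES = {"system", "user", "assistant"}


@dataclass
class Message:
    role: str
    content: str

    def __post_init__(self) -> None:
        # Reject unknown roles early, as schema validation would
        if self.role not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {self.role!r}")


@dataclass
class ChatRequest:
    model: str
    messages: List[Message]
    max_completion_tokens: Optional[int] = None

    def to_json(self) -> str:
        body = asdict(self)  # also converts the nested Message dataclasses
        # Omit unset optional fields instead of sending nulls
        return json.dumps({k: v for k, v in body.items() if v is not None})


req = ChatRequest(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[Message(role="user", content="Hello!")],
)
print(req.to_json())
```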

Examples

Feature Search

import asyncio
from vllm_sdk import VLLMClient

async def main():
    async with VLLMClient(base_url="http://localhost:8000") as client:
        # Return the 10 features most similar to the query text
        response = await client.feature_search(
            query="pirate speech",
            model="meta-llama/Llama-3.3-70B-Instruct",
            top_k=10,
        )
        for feature in response.data:
            print(f"{feature.id}: {feature.label} (layer {feature.layer})")

asyncio.run(main())
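Search results are a plain list, so you can post-process them directly. The sketch below groups hits by layer and keeps the original order within each layer. So that it runs without a server, it uses a hypothetical `Hit` dataclass that carries only the `id`, `label`, and `layer` attributes printed in the example above, with made-up sample values. With the real client, you would pass `response.data` instead.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical stand-in for a search result (only the fields used above)
    id: int
    label: str
    layer: int


def group_by_layer(hits):
    """Map layer -> hits in their original order, with layers sorted ascending."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit.layer].append(hit)
    return dict(sorted(groups.items()))


# Made-up sample values for illustration only
hits = [
    Hit(101, "pirate slang", 50),
    Hit(202, "nautical terms", 40),
    Hit(303, "archaic English", 50),
]
for layer, items in group_by_layer(hits).items():
    print(layer, [h.label for h in items])
# 40 ['nautical terms']
# 50 ['pirate slang', 'archaic English']
```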

With Interventions

from vllm_sdk import Client, Variant

client = Client(api_key="your-api-key")
try:
    variant = Variant("meta-llama/Llama-3.3-70B-Instruct")
    # Intervene on feature 12345 with strength 0.8 in "add" mode
    variant.add_intervention(feature_id=12345, strength=0.8, mode="add")

    response = client.chat.completions.create(
        model=variant,
        messages=[{"role": "user", "content": "Hello!"}],
        max_completion_tokens=256,
    )
    print(response.choices[0].message.content)
finally:
    # Close the client even if the request raises
    client.close()
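This README does not say how the server applies an intervention. One common approach to SAE-based steering, assumed here purely for illustration, is the following: when `mode="add"`, the server adds `strength` times the feature's decoder direction to a layer's hidden activations. The sketch shows that arithmetic on plain Python lists. `apply_add_intervention` and the toy vectors are hypothetical. The real server may scale, normalize, or clamp the update differently.

```python
def apply_add_intervention(hidden, direction, strength):
    """Return hidden + strength * direction (assumed meaning of mode="add")."""
    if len(hidden) != len(direction):
        raise ValueError("hidden state and feature direction must have the same size")
    return [h + strength * d for h, d in zip(hidden, direction)]


hidden = [0.5, -1.0, 2.0]     # toy hidden state at one token position
direction = [1.0, 0.0, -0.5]  # toy decoder direction for one feature
print(apply_add_intervention(hidden, direction, strength=0.8))
```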

License

Apache 2.0

Download files

Source Distribution

vllm_sdk-0.1.3.tar.gz (14.1 kB)

Built Distribution

vllm_sdk-0.1.3-py3-none-any.whl (16.3 kB)

File details

Details for the file vllm_sdk-0.1.3.tar.gz.

File metadata

  • Download URL: vllm_sdk-0.1.3.tar.gz
  • Size: 14.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for vllm_sdk-0.1.3.tar.gz
Algorithm Hash digest
SHA256 c3b65e862d6e86409239436606a86bb3f7897751a1045eb14f647646881bda4b
MD5 5b5e794e7f40d03e1b087b23c6e10d5f
BLAKE2b-256 4ed34610f0d9981d2a2fa3dc3a902f5e64a84412294a6f2561cbec4d328570de
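To check a downloaded archive against the SHA256 digest listed above, hash the file in chunks with `hashlib` and compare the result. `sha256_of` and `verify` are small helpers written for this README and are not part of vllm-sdk. pip can also enforce digests itself in hash-checking mode, using a requirements line such as `vllm-sdk==0.1.3 --hash=sha256:<digest>`.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected_hex):
    # compare_digest avoids an early-exit comparison; lower() accepts uppercase digests
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


if __name__ == "__main__":
    expected = "c3b65e862d6e86409239436606a86bb3f7897751a1045eb14f647646881bda4b"
    archive = Path("vllm_sdk-0.1.3.tar.gz")  # path to the downloaded sdist
    if archive.exists():
        print("OK" if verify(archive, expected) else "MISMATCH")
    else:
        print(f"{archive} not found")
```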

File details

Details for the file vllm_sdk-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: vllm_sdk-0.1.3-py3-none-any.whl
  • Size: 16.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for vllm_sdk-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 d1fa140adfcd8ca5ebc610bb3a47750c1797caf88018b6fb043980738140f40b
MD5 1d30f664f190086abaa3144dfb4db3dd
BLAKE2b-256 53f1a49ee62867f61fce8333e0c7d0f6574b6c9fd9b02f895a0b38de8df4d3a6
