vLLM SDK

Minimal Python SDK for the vLLM API. This package provides a lightweight client library for interacting with vLLM API servers, with only httpx and pydantic as dependencies.

Installation

pip install vllm-sdk

Quick Start

import asyncio
from vllm_sdk import VLLMClient, ChatMessage

async def main():
    async with VLLMClient(base_url="http://localhost:8000") as client:
        # Non-streaming chat completion
        response = await client.chat_completions(
            model="meta-llama/Meta-Llama-3.3-70B-Instruct",
            messages=[
                ChatMessage(role="user", content="Hello!")
            ],
        )
        print(response.choices[0].message.content)

        # Streaming chat completion
        async for chunk in client.chat_completions_stream(
            model="meta-llama/Meta-Llama-3.3-70B-Instruct",
            messages=[
                ChatMessage(role="user", content="Tell me a story")
            ],
        ):
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(main())
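
The streaming loop above prints each delta as it arrives. To keep the complete reply instead, one option is to join the non-empty deltas. The sketch below runs without a server: `Delta`, `Choice`, `Chunk`, and `fake_stream` are hypothetical stand-ins that only mirror the `chunk.choices[0].delta.content` attribute path used in Quick Start, not the SDK's own classes.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Optional


# Stand-in types (assumption): they copy only the attribute path from
# Quick Start and are NOT the classes shipped in vllm_sdk.
@dataclass
class Delta:
    content: Optional[str] = None


@dataclass
class Choice:
    delta: Delta


@dataclass
class Chunk:
    choices: List[Choice]


async def fake_stream() -> AsyncIterator[Chunk]:
    """Hypothetical stream standing in for client.chat_completions_stream(...)."""
    for piece in ["Once ", None, "upon ", "a time"]:
        yield Chunk(choices=[Choice(delta=Delta(content=piece))])


async def collect_text(stream: AsyncIterator[Chunk]) -> str:
    """Join every non-empty delta into the complete reply."""
    parts: List[str] = []
    async for chunk in stream:
        piece = chunk.choices[0].delta.content
        if piece:  # skip chunks whose delta carries no text
            parts.append(piece)
    return "".join(parts)


def demo_collect() -> str:
    """Run the collector on the fake stream from synchronous code."""
    return asyncio.run(collect_text(fake_stream()))
```

With a real client, `collect_text` would presumably receive the async generator returned by `client.chat_completions_stream(...)` in place of `fake_stream()`.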

Features

  • Minimal Dependencies: Only requires httpx and pydantic
  • Type Safety: Full Pydantic schema validation for requests and responses
  • Async Support: Built on httpx, with an async/await API throughout
  • Streaming: Support for streaming chat completions
  • Feature Search: Search sparse autoencoder (SAE) features by semantic similarity

API Reference

VLLMClient

The main client class for interacting with the vLLM API.

Methods

  • chat_completions() - Create a non-streaming chat completion
  • chat_completions_stream() - Stream chat completions (async generator)
  • feature_search() - Search for SAE features
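
The first two methods have different call shapes: a non-streaming call is awaited once, while an async generator is consumed with `async for`. The sketch below illustrates only that difference. `StubClient` is a hypothetical stand-in with simplified signatures, not the real `VLLMClient`.

```python
import asyncio
from typing import AsyncIterator, List, Tuple


class StubClient:
    """Hypothetical stand-in with the same two call shapes (not VLLMClient)."""

    async def chat_completions(self, prompt: str) -> str:
        # Coroutine: one awaited call returns one complete result.
        return f"echo: {prompt}"

    async def chat_completions_stream(self, prompt: str) -> AsyncIterator[str]:
        # Async generator: results are yielded piece by piece.
        for word in prompt.split():
            yield word


async def run_both(prompt: str) -> Tuple[str, List[str]]:
    client = StubClient()

    # Non-streaming: await the call to get the whole result.
    full = await client.chat_completions(prompt)

    # Streaming: iterate with `async for`; the call itself is not awaited.
    pieces: List[str] = []
    async for piece in client.chat_completions_stream(prompt):
        pieces.append(piece)

    return full, pieces
```

Writing `await client.chat_completions_stream(...)` would fail with a `TypeError`, because an async generator object cannot be awaited.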

Schemas

All request and response models are available for import:

  • ChatMessage - Individual chat message
  • ChatCompletionRequest - Chat completion request
  • ChatCompletionResponse - Chat completion response
  • ChatCompletionChunk - Streaming chunk
  • FeatureSearchRequest - Feature search request
  • FeatureSearchResponse - Feature search response
  • ModelName - Supported model names enum
  • InterventionSpec - Feature intervention for a chat completion (feature_id, strength, mode)
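
A conversation is passed as a list of messages, each with a role and content as in Quick Start. A multi-turn exchange can be sketched by appending the previous reply and the next user turn to that list. `Message` below is a hypothetical stand-in for `ChatMessage`, and the `"assistant"` role is an assumption based on the common chat-completion convention, not something this page confirms.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """Stand-in mirroring ChatMessage(role=..., content=...); not the SDK class."""
    role: str
    content: str


def extend_history(
    history: List[Message], reply: str, next_user_turn: str
) -> List[Message]:
    """Return a new history with the assistant reply and next user turn appended."""
    return history + [
        Message(role="assistant", content=reply),  # role name is an assumption
        Message(role="user", content=next_user_turn),
    ]


history = [Message(role="user", content="Hello!")]
history = extend_history(history, "Hi! How can I help?", "Tell me a story")
roles = [m.role for m in history]
```

The function returns a new list rather than mutating its argument, so earlier snapshots of the conversation stay unchanged.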

Examples

Feature Search

import asyncio
from vllm_sdk import VLLMClient

async def main():
    async with VLLMClient(base_url="http://localhost:8000") as client:
        # Request the top 10 features for the query
        response = await client.feature_search(
            query="pirate speech",
            model="meta-llama/Meta-Llama-3.3-70B-Instruct",
            top_k=10,
        )
        for feature in response.data:
            print(f"{feature.id}: {feature.label} (layer {feature.layer})")

asyncio.run(main())
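
Search results expose an id, a label, and a layer, so a natural follow-up is to group them by layer. The sketch below does this on made-up data. `Feature` is a hypothetical stand-in with only those three attributes, and the field types (`str`, `str`, `int`) are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Feature:
    """Stand-in with the three attributes printed above; types are assumed."""
    id: str
    label: str
    layer: int


def group_by_layer(features: Iterable[Feature]) -> Dict[int, List[Feature]]:
    """Bucket features by layer, returning the layers in ascending order."""
    grouped: Dict[int, List[Feature]] = defaultdict(list)
    for feature in features:
        grouped[feature.layer].append(feature)
    return dict(sorted(grouped.items()))


# Illustrative data only; real ids and labels come from response.data.
sample = [
    Feature(id="f1", label="pirate slang", layer=12),
    Feature(id="f2", label="nautical terms", layer=20),
    Feature(id="f3", label="archaic English", layer=12),
]
grouped = group_by_layer(sample)
```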

With Interventions

import asyncio
from vllm_sdk import VLLMClient, ChatMessage, InterventionSpec

async def main():
    async with VLLMClient(base_url="http://localhost:8000") as client:
        response = await client.chat_completions(
            model="meta-llama/Meta-Llama-3.3-70B-Instruct",
            messages=[ChatMessage(role="user", content="Hello!")],
            # Apply one intervention: feature_123 at strength 2.0 in "add" mode
            interventions=[
                InterventionSpec(
                    feature_id="feature_123",
                    strength=2.0,
                    mode="add",
                )
            ],
        )
        print(response.choices[0].message.content)

asyncio.run(main())
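
When several features should be applied at the same strength, the intervention list can be built programmatically. The sketch below shows one way, using a hypothetical `Intervention` dataclass that mirrors the three fields passed to `InterventionSpec` above. It is not the SDK class, `"feature_456"` is a made-up id, and whether the ids returned by feature search can be used directly as `feature_id` is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Intervention:
    """Stand-in mirroring InterventionSpec(feature_id, strength, mode)."""
    feature_id: str
    strength: float
    mode: str = "add"  # the only mode shown in the example above


def interventions_for(
    feature_ids: Iterable[str], strength: float
) -> List[Intervention]:
    """Build one intervention per feature id, all at the same strength."""
    return [Intervention(feature_id=fid, strength=strength) for fid in feature_ids]


# "feature_456" is illustrative only.
specs = interventions_for(["feature_123", "feature_456"], strength=2.0)
```

With the real SDK, the same comprehension would presumably construct `InterventionSpec` objects and pass the list as `interventions=`.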

License

Apache 2.0
