Skip to main content

Minimal Python SDK for the vLLM API

Project description

vLLM SDK

Minimal Python SDK for the vLLM API. This package provides a lightweight client library for interacting with vLLM API servers, with only httpx and pydantic as dependencies.

Installation

pip install vllm-sdk

Quick Start

import asyncio
from vllm_sdk import VLLMClient, ChatMessage

async def main():
    async with VLLMClient(base_url="http://localhost:8000") as client:
        # Non-streaming chat completion
        response = await client.chat_completions(
            model="meta-llama/Meta-Llama-3.3-70B-Instruct",
            messages=[
                ChatMessage(role="user", content="Hello!")
            ],
        )
        print(response.choices[0].message.content)

        # Streaming chat completion
        async for chunk in client.chat_completions_stream(
            model="meta-llama/Meta-Llama-3.3-70B-Instruct",
            messages=[
                ChatMessage(role="user", content="Tell me a story")
            ],
        ):
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(main())

Features

  • Minimal Dependencies: Only requires httpx and pydantic
  • Type Safety: Full Pydantic schema validation for requests and responses
  • Async Support: Built on httpx for async/await support
  • Streaming: Support for streaming chat completions
  • Feature Search: Search SAE features by semantic similarity

API Reference

VLLMClient

The main client class for interacting with the vLLM API.

Methods

  • chat_completions() - Create a non-streaming chat completion
  • chat_completions_stream() - Stream chat completions (async generator)
  • feature_search() - Search for SAE features

Schemas

All request and response models are available for import:

  • ChatMessage - Individual chat message
  • ChatCompletionRequest - Chat completion request
  • ChatCompletionResponse - Chat completion response
  • ChatCompletionChunk - Streaming chunk
  • FeatureSearchRequest - Feature search request
  • FeatureSearchResponse - Feature search response
  • ModelName - Supported model names enum

Examples

Feature Search

from vllm_sdk import VLLMClient, FeatureSearchRequest

async with VLLMClient(base_url="http://localhost:8000") as client:
    response = await client.feature_search(
        query="pirate speech",
        model="meta-llama/Meta-Llama-3.3-70B-Instruct",
        top_k=10,
    )
    for feature in response.data:
        print(f"{feature.id}: {feature.label} (layer {feature.layer})")

With Interventions

from vllm_sdk import VLLMClient, ChatMessage, InterventionSpec

async with VLLMClient(base_url="http://localhost:8000") as client:
    response = await client.chat_completions(
        model="meta-llama/Meta-Llama-3.3-70B-Instruct",
        messages=[ChatMessage(role="user", content="Hello!")],
        interventions=[
            InterventionSpec(
                feature_id="feature_123",
                strength=2.0,
                mode="add"
            )
        ],
    )

License

Apache 2.0

Links

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vllm_sdk-0.1.1.tar.gz (10.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vllm_sdk-0.1.1-py3-none-any.whl (11.9 kB view details)

Uploaded Python 3

File details

Details for the file vllm_sdk-0.1.1.tar.gz.

File metadata

  • Download URL: vllm_sdk-0.1.1.tar.gz
  • Upload date:
  • Size: 10.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for vllm_sdk-0.1.1.tar.gz
Algorithm Hash digest
SHA256 14e23162b622806b0f5cd9971067aca432ee668a3a359732f2da1cbc138d3bb1
MD5 328137644abbef9d3cb924e1a5dc4698
BLAKE2b-256 6625a4d262298a63864a1a90edf500ef6d7ac3d0a4234ae48db500d83af5cff3

See more details on using hashes here.

File details

Details for the file vllm_sdk-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: vllm_sdk-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 11.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for vllm_sdk-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 99cd0f2d102316da81a05b0e7be53c610bbb764311943e2cada51f10787f6db9
MD5 953f4eacb7d99ba4e9e5d23a6cffe261
BLAKE2b-256 310aa591374dcd7598db68a4930174d59d0d14005e4051e08d64f94d61789073

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page