Minimal Python SDK for the vLLM API
vLLM SDK
Minimal Python SDK for the vLLM API. This package provides a lightweight client library for interacting with vLLM API servers, with only httpx and pydantic as dependencies.
Installation
pip install vllm-sdk
Quick Start
import asyncio

from vllm_sdk import VLLMClient, ChatMessage


async def main():
    async with VLLMClient(base_url="http://localhost:8000") as client:
        # Non-streaming chat completion
        response = await client.chat_completions(
            model="meta-llama/Meta-Llama-3.3-70B-Instruct",
            messages=[
                ChatMessage(role="user", content="Hello!")
            ],
        )
        print(response.choices[0].message.content)

        # Streaming chat completion
        async for chunk in client.chat_completions_stream(
            model="meta-llama/Meta-Llama-3.3-70B-Instruct",
            messages=[
                ChatMessage(role="user", content="Tell me a story")
            ],
        ):
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="", flush=True)


asyncio.run(main())
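The streaming loop in the Quick Start prints each piece as it arrives. If the full reply is needed as one string, the pieces can be collected instead. The following is a minimal sketch that runs without a server: `make_chunk` and `fake_stream` are hypothetical stand-ins that only mimic the `chunk.choices[0].delta.content` attribute path used above, and `collect_text` is an illustrative helper, not part of the SDK.

```python
import asyncio
from types import SimpleNamespace


def make_chunk(text):
    """Build a stand-in chunk with the attribute path the Quick Start reads."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


async def fake_stream():
    """Hypothetical stand-in for client.chat_completions_stream(...)."""
    for piece in ["Once ", "upon ", None, "a time."]:
        yield make_chunk(piece)


async def collect_text(stream):
    """Concatenate the non-empty delta.content pieces of a chunk stream."""
    parts = []
    async for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        piece = chunk.choices[0].delta.content
        if piece:  # content may be None or empty on some chunks
            parts.append(piece)
    return "".join(parts)


full_text = asyncio.run(collect_text(fake_stream()))
print(full_text)
```

With a real client, the same helper would presumably accept the async generator returned by `client.chat_completions_stream(...)` in place of `fake_stream()`.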
Features
- Minimal Dependencies: Only requires httpx and pydantic
- Type Safety: Full Pydantic schema validation for requests and responses
- Async Support: Built on httpx for async/await support
- Streaming: Support for streaming chat completions
- Feature Search: Search SAE features by semantic similarity
API Reference
VLLMClient
The main client class for interacting with the vLLM API.
Methods
- chat_completions() - Create a non-streaming chat completion
- chat_completions_stream() - Stream chat completions (async generator)
- feature_search() - Search for SAE features
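For background on what a streaming method like chat_completions_stream() might consume: vLLM's OpenAI-compatible server sends streamed completions as server-sent events, one `data: {...}` line per chunk, ending with `data: [DONE]`. Whether this SDK parses the stream exactly this way is an assumption. The sketch below uses only the standard library, and `parse_sse_lines` is a hypothetical helper, not an SDK function.

```python
import json


def parse_sse_lines(lines):
    """Yield decoded JSON payloads from 'data: ...' lines until '[DONE]'."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # ignore blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(payload)


# Made-up sample input in the assumed wire format
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]

text = "".join(
    chunk["choices"][0]["delta"].get("content") or ""
    for chunk in parse_sse_lines(sample)
)
print(text)
```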
Schemas
All request and response models are available for import:
- ChatMessage - Individual chat message
- ChatCompletionRequest - Chat completion request
- ChatCompletionResponse - Chat completion response
- ChatCompletionChunk - Streaming chunk
- FeatureSearchRequest - Feature search request
- FeatureSearchResponse - Feature search response
- ModelName - Supported model names enum
Examples
Feature Search
import asyncio

from vllm_sdk import VLLMClient


async def main():
    async with VLLMClient(base_url="http://localhost:8000") as client:
        # Search for up to 10 SAE features semantically similar to the query
        response = await client.feature_search(
            query="pirate speech",
            model="meta-llama/Meta-Llama-3.3-70B-Instruct",
            top_k=10,
        )
        for feature in response.data:
            print(f"{feature.id}: {feature.label} (layer {feature.layer})")


asyncio.run(main())
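Search results carry a layer number, so it can be handy to see which layers the matches fall in. The sketch below groups results by layer. The `Feature` dataclass is a stand-in that assumes only the `id`, `label`, and `layer` attributes printed above, and the sample values are made up for illustration; `group_by_layer` is a hypothetical helper, not part of the SDK.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feature:
    """Stand-in for one search result; only id, label, and layer are assumed."""
    id: str
    label: str
    layer: int


def group_by_layer(features):
    """Map each layer number to the labels of the features found in it."""
    grouped = defaultdict(list)
    for feature in features:
        grouped[feature.layer].append(feature.label)
    # Return a plain dict with layers in ascending order
    return dict(sorted(grouped.items()))


# Made-up results standing in for response.data
results = [
    Feature("feature_7", "nautical terms", 20),
    Feature("feature_123", "pirate speech", 12),
    Feature("feature_9", "archaic English", 20),
]

by_layer = group_by_layer(results)
print(by_layer)
```

With a live server, `response.data` from feature_search() would presumably be passed in place of `results`.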
With Interventions
import asyncio

from vllm_sdk import VLLMClient, ChatMessage, InterventionSpec


async def main():
    async with VLLMClient(base_url="http://localhost:8000") as client:
        # Apply one feature intervention to this request
        response = await client.chat_completions(
            model="meta-llama/Meta-Llama-3.3-70B-Instruct",
            messages=[ChatMessage(role="user", content="Hello!")],
            interventions=[
                InterventionSpec(
                    feature_id="feature_123",
                    strength=2.0,
                    mode="add",
                )
            ],
        )
        print(response.choices[0].message.content)


asyncio.run(main())
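When several features should be applied at once, the interventions list can be built programmatically. The sketch below shows the idea with a stand-in: `InterventionArgs` mirrors only the three fields used in the example above (`feature_id`, `strength`, `mode`) and is not the SDK's `InterventionSpec`. The helper `interventions_for` and the id `feature_456` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InterventionArgs:
    """Stand-in mirroring the three InterventionSpec fields used above."""
    feature_id: str
    strength: float
    mode: str = "add"


def interventions_for(feature_ids, strength=2.0, mode="add"):
    """Build one intervention per feature id, all with the same strength and mode."""
    return [InterventionArgs(fid, strength, mode) for fid in feature_ids]


specs = interventions_for(["feature_123", "feature_456"], strength=1.5)
for spec in specs:
    print(spec)
```

In real use, the list comprehension would presumably construct `InterventionSpec` objects instead and be passed as the `interventions` argument of chat_completions().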
License
Apache 2.0
File details
Details for the file vllm_sdk-0.1.0.tar.gz.
File metadata
- Download URL: vllm_sdk-0.1.0.tar.gz
- Upload date:
- Size: 9.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cc0de1fab1189ff3bd3afe05b15a8b07a7500287b95594f8e6a77bdbcaaa4137 |
| MD5 | 012469c68ef716d912ed761e128d9359 |
| BLAKE2b-256 | c95984551771ae6399889586a77a786d43fa2e773b549007c6428032a3cfe555 |
File details
Details for the file vllm_sdk-0.1.0-py3-none-any.whl.
File metadata
- Download URL: vllm_sdk-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4f2fba27952b2bfbe7536afeae167e1feb74b43b72ef8c4d6994b008b66df60e |
| MD5 | 8e81e568c2973c5dd1f68c96090baa58 |
| BLAKE2b-256 | 3765fd99831a4779951447e0c475184c92d8c4d4181dd42c9d030d0574452aa9 |