vox
Model-agnostic LLM execution library for Python. One interface, every provider.
Write your code once and run it against OpenAI, Anthropic, Google Gemini, OpenRouter, or local models via LM Studio — with streaming, tool use, structured output, and reasoning support out of the box.
Installation
# Core library (no provider SDKs)
pip install vox-llm
# With a specific provider
pip install "vox-llm[openai]"
pip install "vox-llm[anthropic]"
pip install "vox-llm[gemini]"
# All providers
pip install "vox-llm[all]"
Note: the PyPI package is vox-llm (the name vox was already taken). The Python import name is still vox, so from vox import VoxClient works unchanged.
From GitHub (pinned to a tag):
pip install "vox-llm[all] @ git+https://github.com/benballintyn/vox.git@v0.1.0"
Requires Python 3.11+.
Quick Start
from vox import VoxClient, Message
client = VoxClient(openai_api_key="sk-...")
response = client.complete(
    messages=[Message(role="user", content="What is the speed of light?")],
    model="gpt-4o",
)
print(response.message.text)
Switch providers by changing the model name — no other code changes needed:
# OpenAI
response = client.complete(messages, model="gpt-4o")
# Anthropic
response = client.complete(messages, model="claude-sonnet-4-20250514")
# Gemini
response = client.complete(messages, model="gemini-2.5-pro")
Provider Setup
Pass API keys directly or via environment variables:
client = VoxClient(
    openai_api_key="sk-...",  # or OPENAI_API_KEY env var
    anthropic_api_key="sk-ant-...",  # or ANTHROPIC_API_KEY env var
    gemini_api_key="...",  # or GEMINI_API_KEY env var
    openrouter_api_key="sk-or-...",  # or OPENROUTER_API_KEY env var
    lmstudio_base_url="http://localhost:1234/v1",  # default
)
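Before constructing the client, it can help to check which keys are actually present in the environment. The following is a minimal sketch that uses only the environment variable names listed above; the helper name `configured_providers` and the provider labels `anthropic` and `gemini` are illustrative assumptions, not part of vox.

```python
import os

# Environment variable names taken from the comments above; the provider
# labels on the left are assumed for illustration.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return sorted(name for name, var in ENV_VARS.items() if env.get(var))


if __name__ == "__main__":
    print("Keys found for:", configured_providers())
```

Passing a plain dict as `env` makes the check easy to exercise in tests without touching the real environment.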
Provider Auto-Detection
Vox resolves the provider from the model name automatically:
| Model prefix | Provider |
|---|---|
| gpt-, o1, o3, o4 | OpenAI |
| claude- | Anthropic |
| gemini- | Gemini |
For OpenRouter and LM Studio, pass provider= explicitly:
response = client.complete(
    messages=messages,
    model="meta-llama/llama-3-70b",
    provider="openrouter",
)
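The resolution rules above can be sketched as a simple prefix lookup in which an explicit provider always wins. This is an illustration of the documented behaviour, not vox's internal code; `resolve_provider`, the prefix table, and the lowercase labels `anthropic` and `gemini` are assumptions.

```python
from typing import Optional

# Hypothetical prefix table mirroring the documented rules.
PREFIX_TO_PROVIDER = (
    ("gpt-", "openai"),
    ("o1", "openai"),
    ("o3", "openai"),
    ("o4", "openai"),
    ("claude-", "anthropic"),
    ("gemini-", "gemini"),
)


def resolve_provider(model: str, provider: Optional[str] = None) -> str:
    """Return the explicit provider if given, else infer one from the model name."""
    if provider is not None:
        return provider
    for prefix, name in PREFIX_TO_PROVIDER:
        if model.startswith(prefix):
            return name
    raise ValueError(f"Cannot infer a provider for {model!r}; pass provider= explicitly")
```

A model name such as `meta-llama/llama-3-70b` matches no prefix, which is why OpenRouter and LM Studio calls need `provider=`.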
Per-Provider Configuration
Override defaults with ProviderConfig:
from vox import VoxClient, ProviderConfig
client = VoxClient(
    provider_configs={
        "openai": ProviderConfig(
            api_key="sk-...",
            timeout=60.0,
            max_retries=3,
        ),
        "openrouter": ProviderConfig(
            api_key="sk-or-...",
            app_name="MyApp",  # sent as X-Title header
            app_url="https://myapp.com",  # sent as HTTP-Referer header
        ),
    }
)
Completions
Basic
from vox import VoxClient, Message
client = VoxClient(openai_api_key="sk-...")
response = client.complete(
    messages=[
        Message(role="system", content="You are a helpful assistant."),
        Message(role="user", content="Explain quantum entanglement."),
    ],
    model="gpt-4o",
    max_tokens=500,
    temperature=0.7,
)
print(response.message.text)
print(f"Tokens: {response.usage.total_tokens}")
Async
response = await client.acomplete(
    messages=[Message(role="user", content="Hello")],
    model="claude-sonnet-4-20250514",
)
Streaming
for chunk in client.stream(
    messages=[Message(role="user", content="Write a haiku about Python.")],
    model="gpt-4o",
):
    if chunk.type == "text":
        print(chunk.text, end="", flush=True)
    elif chunk.type == "usage":
        print(f"\nTokens: {chunk.usage.total_tokens}")
    elif chunk.type == "done":
        print(f"\nFinish reason: {chunk.finish_reason}")
Async Streaming
async for chunk in client.astream(messages=messages, model="gemini-2.5-pro"):
    if chunk.type == "text":
        print(chunk.text, end="")
Stream Chunk Types
| chunk.type | Fields | Description |
|---|---|---|
| "text" | text | Content delta |
| "tool_call_start" | tool_call | New tool call (id, name, arguments) |
| "tool_call_delta" | tool_call_id, arguments_delta | Partial JSON for tool arguments |
| "thinking" | thinking_text | Reasoning/thinking delta |
| "usage" | usage | Final token counts |
| "done" | finish_reason | Generation complete |
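Because tool arguments arrive as partial JSON, a consumer usually buffers the deltas per tool call and parses them once the stream ends. The sketch below shows that pattern with a stand-in `FakeChunk` dataclass that carries the fields from the table; it is not vox's `StreamChunk`, and the dict shape used for `tool_call` (keys `id`, `name`, `arguments`, with `arguments` as a string) is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeChunk:
    """Stand-in carrying the documented chunk fields; not vox's StreamChunk."""
    type: str
    text: Optional[str] = None
    tool_call: Optional[dict] = None  # assumed keys: id, name, arguments
    tool_call_id: Optional[str] = None
    arguments_delta: Optional[str] = None


def collect(chunks):
    """Join text deltas and rebuild each tool call's JSON arguments."""
    text_parts = []
    buffers = {}  # tool call id -> {"name": str, "json": str}
    for chunk in chunks:
        if chunk.type == "text":
            text_parts.append(chunk.text)
        elif chunk.type == "tool_call_start":
            call = chunk.tool_call
            buffers[call["id"]] = {"name": call["name"], "json": call.get("arguments") or ""}
        elif chunk.type == "tool_call_delta":
            buffers[chunk.tool_call_id]["json"] += chunk.arguments_delta
    # Parse only after the stream is exhausted, when the JSON is complete.
    calls = {
        call_id: {"name": buf["name"], "arguments": json.loads(buf["json"] or "{}")}
        for call_id, buf in buffers.items()
    }
    return "".join(text_parts), calls
```

Parsing is deferred to the end on purpose: an individual `arguments_delta` is rarely valid JSON on its own.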
Tool Use (Function Calling)
Define tools, let the model call them, feed results back:
from vox import VoxClient, Message, Tool, ToolResult
client = VoxClient(openai_api_key="sk-...")
# 1. Define tools
tools = [
    Tool(
        name="get_weather",
        description="Get current weather for a city.",
        parameters={
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    ),
]
# 2. Send messages with tools
messages = [Message(role="user", content="What's the weather in Tokyo?")]
response = client.complete(messages=messages, model="gpt-4o", tools=tools)
# 3. Handle tool calls
if response.message.tool_calls:
    messages.append(response.message)  # add assistant's tool call message
    for tc in response.message.tool_calls:
        # Execute the function (your code)
        result = get_weather(tc.arguments["city"])
        # Return result to the model
        tool_result = ToolResult(
            tool_call_id=tc.id,
            name=tc.name,
            content=result,
        )
        messages.append(tool_result.to_message())
# 4. Get final response
final = client.complete(messages=messages, model="gpt-4o", tools=tools)
print(final.message.text)
This works identically across OpenAI, Anthropic, Gemini, and OpenRouter — vox translates the tool definitions and results to each provider's native format.
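With more than one tool, step 3 is often written as a lookup from tool name to Python function. The sketch below is one way to do that; `FakeToolCall` is a stand-in exposing the `id`, `name`, and `arguments` attributes used in the example above, and `run_tool`, `REGISTRY`, and the placeholder `get_weather` body are hypothetical. The returned flag is meant to feed the documented `is_error` field of `ToolResult`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class FakeToolCall:
    """Stand-in for a tool call object; not vox's ToolCallData."""
    id: str
    name: str
    arguments: Dict[str, Any]


def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # placeholder; a real tool would query a weather API


REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool(call: FakeToolCall, registry: Dict[str, Callable[..., Any]]) -> Tuple[str, bool]:
    """Execute one tool call and return (content, is_error)."""
    func = registry.get(call.name)
    if func is None:
        return f"Unknown tool: {call.name}", True
    try:
        return str(func(**call.arguments)), False
    except Exception as exc:  # report the failure to the model instead of crashing
        return f"{type(exc).__name__}: {exc}", True
```

Reporting failures back as text lets the model correct its arguments on the next turn rather than ending the conversation with an exception.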
Provider-native (server-side) tools
Some providers offer server-side tools that run on their infrastructure — Anthropic's web_search_20250305, OpenAI's web_search_preview, Gemini's Google Search grounding, and others. These have provider-specific shapes and no cross-provider abstraction, so vox does not model them as a Tool. Instead, the tools list accepts raw dicts alongside vox Tool objects — raw dicts are passed through to the provider verbatim:
response = client.complete(
    messages=[Message(role="user", content="What's the current 10Y JGB yield?")],
    model="claude-sonnet-4-5-20250929",
    tools=[
        my_function_tool,  # vox Tool, translated to the provider's format
        {  # raw dict, passed through verbatim
            "type": "web_search_20250305",
            "name": "web_search",
            "max_uses": 5,
        },
    ],
)
The caller is responsible for matching the resolved provider's expected schema — a raw dict shaped for one provider won't work on another. An entry that is neither a Tool nor a dict raises a TypeError.
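The acceptance rule described above (Tool objects are translated, dicts pass through, anything else is rejected) can be sketched as follows. `FakeTool` is a stand-in for vox's `Tool`, and `split_tools` is a hypothetical helper, so treat this as an illustration of the rule rather than the library's code.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTool:
    """Stand-in for vox's Tool, with the same three documented fields."""
    name: str
    description: str
    parameters: dict = field(default_factory=dict)


def split_tools(tools):
    """Separate function tools from raw pass-through dicts; reject anything else."""
    function_tools, raw_tools = [], []
    for entry in tools:
        if isinstance(entry, FakeTool):
            function_tools.append(entry)
        elif isinstance(entry, dict):
            raw_tools.append(entry)
        else:
            raise TypeError(
                f"tools entries must be Tool or dict, got {type(entry).__name__}"
            )
    return function_tools, raw_tools
```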
Structured Output
Pass a Pydantic model to get validated, typed responses:
from pydantic import BaseModel
from vox import VoxClient, Message
class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str
    pros: list[str]
    cons: list[str]

client = VoxClient(openai_api_key="sk-...")
response = client.complete(
    messages=[Message(role="user", content="Review the movie Inception.")],
    model="gpt-4o",
    response_schema=MovieReview,
)
review: MovieReview = response.parsed
print(f"{review.title}: {review.rating}/10")
print(f"Pros: {', '.join(review.pros)}")
The schema is automatically converted to each provider's native format:
- OpenAI: JSON schema in response_format
- Anthropic: Synthetic tool with forced invocation
- Gemini: response_schema parameter
- OpenRouter/LM Studio: JSON schema in response_format
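To make the list above concrete, here is a rough sketch of what the per-provider request fragments could look like for a given JSON schema. The shapes follow the providers' public APIs as far as I know them; the function `schema_fragment`, the lowercase provider labels, and the generated tool description are assumptions, and vox's actual translation may differ in detail.

```python
def schema_fragment(provider: str, name: str, schema: dict) -> dict:
    """Build an illustrative request fragment that asks for schema-shaped output."""
    if provider in ("openai", "openrouter", "lmstudio"):
        # JSON schema carried in response_format
        return {
            "response_format": {
                "type": "json_schema",
                "json_schema": {"name": name, "schema": schema},
            }
        }
    if provider == "anthropic":
        # A synthetic tool whose input schema is the target schema,
        # plus a tool_choice that forces the model to call it.
        return {
            "tools": [
                {
                    "name": name,
                    "description": f"Return a {name} object.",
                    "input_schema": schema,
                }
            ],
            "tool_choice": {"type": "tool", "name": name},
        }
    if provider == "gemini":
        return {"response_mime_type": "application/json", "response_schema": schema}
    raise ValueError(f"Unknown provider: {provider!r}")
```

For a Pydantic model, the `schema` argument would typically come from `MovieReview.model_json_schema()` (Pydantic v2).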
Reasoning / Thinking
Enable extended reasoning for models that support it:
from vox import VoxClient, Message, ReasoningConfig
client = VoxClient(anthropic_api_key="sk-ant-...")
response = client.complete(
    messages=[Message(role="user", content="Prove that sqrt(2) is irrational.")],
    model="claude-sonnet-4-20250514",
    reasoning=ReasoningConfig(enabled=True, budget_tokens=10000),
)
# Access thinking blocks
if response.thinking:
    for block in response.thinking:
        print(f"[Thinking] {block.text[:200]}...")
print(response.message.text)
Configuration by Provider
| Provider | Config | Description |
|---|---|---|
| Anthropic | budget_tokens | Token budget for extended thinking |
| OpenAI (o-series) | level ("low"/"medium"/"high") | Reasoning effort level |
| Gemini 2.5 | budget_tokens | Thinking token budget |
| Gemini 3+ | level ("low"/"medium"/"high") | Thinking level |
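Code that targets several providers needs to pick the right knob from the table. A small helper along these lines is one option; `reasoning_kwargs` is hypothetical, and detecting the Gemini generation from the `gemini-2.5` and `gemini-3` name prefixes is an assumption about model naming, not something vox documents.

```python
from typing import Optional


def reasoning_kwargs(
    provider: str,
    model: str,
    budget_tokens: int = 10000,
    level: str = "medium",
) -> Optional[dict]:
    """Pick the documented reasoning setting; return None when the table has no match."""
    if provider == "anthropic" or (provider == "gemini" and model.startswith("gemini-2.5")):
        return {"enabled": True, "budget_tokens": budget_tokens}
    if provider == "openai" or (provider == "gemini" and model.startswith("gemini-3")):
        return {"enabled": True, "level": level}
    return None
```

The resulting dict uses only the three documented `ReasoningConfig` fields, so it could be passed as `ReasoningConfig(**kwargs)` when it is not `None`.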
Multimodal (Vision)
Send images alongside text:
from vox import Message, TextContent, ImageContent
message = Message(
    role="user",
    content=[
        TextContent(text="What's in this image?"),
        ImageContent(
            source_type="url",
            media_type="image/jpeg",
            data="https://example.com/photo.jpg",
        ),
    ],
)
response = client.complete(messages=[message], model="gpt-4o")
For base64 images:
import base64
with open("photo.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()
message = Message(
    role="user",
    content=[
        TextContent(text="Describe this image."),
        ImageContent(source_type="base64", media_type="image/png", data=b64),
    ],
)
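When the image type is not known in advance, the media type can be derived from the file name with the standard library instead of being hard-coded. The helper below is a sketch: `image_fields` is a hypothetical name, and it returns a plain dict with the three documented `ImageContent` fields rather than constructing the vox object.

```python
import base64
import mimetypes


def image_fields(filename: str, raw: bytes) -> dict:
    """Return source_type, media_type, and data for a base64-encoded image."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"{filename!r} does not look like an image file")
    return {
        "source_type": "base64",
        "media_type": media_type,
        "data": base64.b64encode(raw).decode("ascii"),
    }
```

The result could then be unpacked as `ImageContent(**image_fields(path, data))`, assuming the constructor takes exactly these keyword arguments as in the example above.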
Error Handling
All provider errors are normalized to a consistent hierarchy:
from vox.errors import (
    VoxError,  # base class
    AuthenticationError,  # invalid/missing API key
    RateLimitError,  # rate limited (has .retry_after)
    QuotaExceededError,  # billing/quota limit
    InvalidRequestError,  # malformed request
    ProviderError,  # server error (5xx)
    ContentFilterError,  # safety system blocked content
    ModelNotFoundError,  # model doesn't exist
)
try:
    response = client.complete(messages=messages, model="gpt-4o")
except RateLimitError as e:
    print(f"Rate limited by {e.provider}, retry after {e.retry_after}s")
except AuthenticationError as e:
    print(f"Auth failed for {e.provider}: {e}")
except VoxError as e:
    print(f"LLM error: {e}")
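Since rate-limit errors expose `.retry_after`, a caller can wait for the suggested interval and try again. The sketch below shows one such loop using a stand-in `FakeRateLimitError`; the helper `call_with_retry` and its exponential fallback are assumptions, and the source does not say whether the `max_retries` setting in `ProviderConfig` already covers rate limits.

```python
import time


class FakeRateLimitError(Exception):
    """Stand-in for vox.errors.RateLimitError with a retry_after attribute."""

    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after


def call_with_retry(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on rate limits and honouring retry_after when present."""
    for attempt in range(max_attempts):
        try:
            return func()
        except FakeRateLimitError as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller handle it
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** attempt  # exponential fallback
            sleep(delay)
```

With the real library, `func` would be something like `lambda: client.complete(messages=messages, model="gpt-4o")` and the except clause would name `RateLimitError`.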
API Reference
VoxClient
VoxClient(
    openai_api_key: str | None = None,
    anthropic_api_key: str | None = None,
    gemini_api_key: str | None = None,
    openrouter_api_key: str | None = None,
    lmstudio_base_url: str = "http://localhost:1234/v1",
    openrouter_app_name: str | None = None,
    openrouter_app_url: str | None = None,
    provider_configs: dict[str, ProviderConfig] | None = None,
)
Methods
| Method | Signature | Returns |
|---|---|---|
| complete() | (messages, model, *, provider, max_tokens, temperature, tools, response_schema, reasoning, stop, **kwargs) | CompletionResponse |
| acomplete() | Same as above | CompletionResponse (async) |
| stream() | Same as above | Iterator[StreamChunk] |
| astream() | Same as above | AsyncIterator[StreamChunk] |
CompletionResponse
| Field | Type | Description |
|---|---|---|
| message | Message | Assistant's response message |
| usage | Usage | Token counts |
| provider | str | Provider name |
| model | str | Model used |
| finish_reason | str \| None | Why generation stopped |
| thinking | list[ThinkingBlock] \| None | Reasoning blocks |
| parsed | Any | Validated Pydantic instance (when response_schema used) |
Message
| Field | Type | Description |
|---|---|---|
| role | "system" \| "user" \| "assistant" \| "tool" | Message role |
| content | str \| list[ContentPart] | Text or multimodal content |
| tool_calls | list[ToolCallData] \| None | Tool calls (assistant messages) |
| tool_call_id | str \| None | Tool result reference |
| name | str \| None | Tool name (for tool messages) |
Property: .text — extracts plain text from any content format.
Tool
Tool(
    name: str,  # Function name
    description: str,  # What the function does
    parameters: dict,  # JSON Schema for arguments
)
ToolResult
ToolResult(
    tool_call_id: str,  # ID from ToolCallData
    name: str,  # Tool name
    content: str,  # Result content
    is_error: bool = False,  # Whether execution failed
)
Method: .to_message() — converts to a Message with role="tool".
Usage
| Field | Type | Description |
|---|---|---|
| prompt_tokens | int | Input tokens |
| completion_tokens | int | Output tokens |
| total_tokens | int | Total tokens |
| reasoning_tokens | int | Reasoning/thinking tokens |
| cache_read_tokens | int | Prompt cache hits |
| cache_creation_tokens | int | Prompt cache writes |
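In a multi-turn or tool-use loop, it is often useful to total these counters across calls. The sketch below uses a stand-in `FakeUsage` dataclass with the six documented integer fields; it is not vox's `Usage` class, the zero defaults are an assumption, and `add_usage` is a hypothetical helper.

```python
from dataclasses import dataclass, fields


@dataclass
class FakeUsage:
    """Stand-in with the documented Usage fields; defaults are assumed."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    reasoning_tokens: int = 0
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0


def add_usage(a: FakeUsage, b: FakeUsage) -> FakeUsage:
    """Return a new FakeUsage whose counters are the field-wise sums of a and b."""
    return FakeUsage(
        **{f.name: getattr(a, f.name) + getattr(b, f.name) for f in fields(FakeUsage)}
    )
```

If the real fields can be `None` for providers that do not report them, the sum would need a guard such as `(value or 0)`.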
ProviderConfig
ProviderConfig(
    api_key: str | None = None,
    base_url: str | None = None,
    default_model: str | None = None,
    app_name: str | None = None,  # OpenRouter: X-Title header
    app_url: str | None = None,  # OpenRouter: HTTP-Referer header
    timeout: float = 120.0,
    max_retries: int = 2,
)
ReasoningConfig
ReasoningConfig(
    enabled: bool = True,
    budget_tokens: int | None = None,  # Anthropic, Gemini 2.5
    level: str | None = None,  # "low" | "medium" | "high"; OpenAI o-series, Gemini 3+
)
LM Studio (Local Models)
Run models locally with LM Studio:
client = VoxClient(lmstudio_base_url="http://localhost:1234/v1")
response = client.complete(
    messages=[Message(role="user", content="Hello!")],
    model="local-model",
    provider="lmstudio",
)
Make sure LM Studio is running with a model loaded. The default base URL is http://localhost:1234/v1.
License
MIT