vox

Model-agnostic LLM execution library for Python. One interface, every provider.

Write your code once and run it against OpenAI, Anthropic, Google Gemini, OpenRouter, or local models via LM Studio — with streaming, tool use, structured output, and reasoning support out of the box.

Installation

# Core library (no provider SDKs)
pip install vox-llm

# With a specific provider
pip install "vox-llm[openai]"
pip install "vox-llm[anthropic]"
pip install "vox-llm[gemini]"

# All providers
pip install "vox-llm[all]"

Note: the PyPI package is vox-llm (the name vox was already taken). The Python import name is still vox, so from vox import VoxClient works unchanged.

From GitHub (pinned to a tag):

pip install "vox-llm[all] @ git+https://github.com/benballintyn/vox.git@v0.1.0"

Requires Python 3.11+.

Quick Start

from vox import VoxClient, Message

client = VoxClient(openai_api_key="sk-...")

response = client.complete(
    messages=[Message(role="user", content="What is the speed of light?")],
    model="gpt-4o",
)
print(response.message.text)

Switch providers by changing the model name — no other code changes needed:

# OpenAI
response = client.complete(messages, model="gpt-4o")

# Anthropic
response = client.complete(messages, model="claude-sonnet-4-20250514")

# Gemini
response = client.complete(messages, model="gemini-2.5-pro")

Provider Setup

Pass API keys directly or via environment variables:

client = VoxClient(
    openai_api_key="sk-...",           # or OPENAI_API_KEY env var
    anthropic_api_key="sk-ant-...",    # or ANTHROPIC_API_KEY env var
    gemini_api_key="...",              # or GEMINI_API_KEY env var
    openrouter_api_key="sk-or-...",    # or OPENROUTER_API_KEY env var
    lmstudio_base_url="http://localhost:1234/v1",  # default
)
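
This README does not say which source wins when a key is both passed in and set in the environment. If you want that choice to be explicit in your own code, a small helper along these lines may help. It is only a sketch: resolve_api_key is a hypothetical name and not part of vox.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str], env_var: str) -> Optional[str]:
    """Return the explicit key if given, otherwise the environment variable's value.

    Hypothetical helper, not part of vox. Empty strings count as missing.
    """
    if explicit:
        return explicit
    return os.environ.get(env_var) or None
```

The result can then be handed to the documented constructor argument, for example VoxClient(openai_api_key=resolve_api_key(None, "OPENAI_API_KEY")).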

Provider Auto-Detection

Vox resolves the provider from the model name automatically:

Model prefix        Provider
gpt-, o1, o3, o4    OpenAI
claude-             Anthropic
gemini-             Gemini

For OpenRouter and LM Studio, pass provider= explicitly:

response = client.complete(
    messages=messages,
    model="meta-llama/llama-3-70b",
    provider="openrouter",
)
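
To make the prefix rule concrete, here is a rough sketch of how such a lookup could work. The prefixes come from the table above; the function name guess_provider, the provider strings "anthropic" and "gemini", and the matching logic are assumptions for illustration, not vox's actual implementation.

```python
from typing import Optional

# Prefixes copied from the table; the lookup itself is an illustrative assumption.
_PREFIXES = (
    ("gpt-", "openai"),
    ("o1", "openai"),
    ("o3", "openai"),
    ("o4", "openai"),
    ("claude-", "anthropic"),
    ("gemini-", "gemini"),
)


def guess_provider(model: str) -> Optional[str]:
    """Return the provider implied by the model name, or None if no prefix matches."""
    for prefix, provider in _PREFIXES:
        if model.startswith(prefix):
            return provider
    return None
```

A None result corresponds to the case where provider= has to be passed explicitly, as with OpenRouter model names like "meta-llama/llama-3-70b".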

Per-Provider Configuration

Override defaults with ProviderConfig:

from vox import VoxClient, ProviderConfig

client = VoxClient(
    provider_configs={
        "openai": ProviderConfig(
            api_key="sk-...",
            timeout=60.0,
            max_retries=3,
        ),
        "openrouter": ProviderConfig(
            api_key="sk-or-...",
            app_name="MyApp",           # sent as X-Title header
            app_url="https://myapp.com", # sent as HTTP-Referer header
        ),
    }
)

Completions

Basic

from vox import VoxClient, Message

client = VoxClient(openai_api_key="sk-...")

response = client.complete(
    messages=[
        Message(role="system", content="You are a helpful assistant."),
        Message(role="user", content="Explain quantum entanglement."),
    ],
    model="gpt-4o",
    max_tokens=500,
    temperature=0.7,
)

print(response.message.text)
print(f"Tokens: {response.usage.total_tokens}")

Async

response = await client.acomplete(
    messages=[Message(role="user", content="Hello")],
    model="claude-sonnet-4-20250514",
)
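
A bare await only works inside a coroutine, so in a script the call above has to run under something like asyncio.run. The sketch below shows one way to fan out several prompts concurrently with a concurrency cap. It uses only the standard library; fan_out and fake_ask are hypothetical names, and fake_ask stands in for a wrapper around client.acomplete.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fan_out(
    ask: Callable[[str], Awaitable[str]],
    prompts: List[str],
    limit: int = 4,
) -> List[str]:
    """Run ask(prompt) for every prompt, at most `limit` at a time, keeping input order."""
    gate = asyncio.Semaphore(limit)

    async def guarded(prompt: str) -> str:
        async with gate:
            return await ask(prompt)

    return await asyncio.gather(*(guarded(p) for p in prompts))


async def fake_ask(prompt: str) -> str:
    """Stand-in for a real request; replace with a wrapper around client.acomplete."""
    await asyncio.sleep(0)
    return prompt.upper()


if __name__ == "__main__":
    print(asyncio.run(fan_out(fake_ask, ["a", "b", "c"], limit=2)))
```

With vox, ask would typically build a one-message list, await client.acomplete(...), and return response.message.text.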

Streaming

for chunk in client.stream(
    messages=[Message(role="user", content="Write a haiku about Python.")],
    model="gpt-4o",
):
    if chunk.type == "text":
        print(chunk.text, end="", flush=True)
    elif chunk.type == "usage":
        print(f"\nTokens: {chunk.usage.total_tokens}")
    elif chunk.type == "done":
        print(f"\nFinish reason: {chunk.finish_reason}")

Async Streaming

async for chunk in client.astream(messages=messages, model="gemini-2.5-pro"):
    if chunk.type == "text":
        print(chunk.text, end="")

Stream Chunk Types

chunk.type           Fields                          Description
"text"               text                            Content delta
"tool_call_start"    tool_call                       New tool call (id, name, arguments)
"tool_call_delta"    tool_call_id, arguments_delta   Partial JSON for tool arguments
"thinking"           thinking_text                   Reasoning/thinking delta
"usage"              usage                           Final token counts
"done"               finish_reason                   Generation complete

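Because tool arguments arrive as partial JSON, a consumer has to buffer the deltas per tool call before parsing them. The following is a minimal sketch of that bookkeeping. It runs against hand-built stand-in objects rather than real StreamChunk instances, and it assumes that a tool_call_start chunk always precedes its deltas and that arguments on the start chunk is either empty or already a dict. Neither assumption is confirmed by this README.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def collect_tool_calls(chunks: Iterable[Any]) -> Dict[str, Dict[str, Any]]:
    """Merge tool-call chunks into {tool_call_id: {"name": ..., "arguments": ...}}."""
    calls: Dict[str, Dict[str, Any]] = {}
    for chunk in chunks:
        if chunk.type == "tool_call_start":
            call = chunk.tool_call
            calls[call.id] = {"name": call.name, "initial": call.arguments, "raw": ""}
        elif chunk.type == "tool_call_delta":
            # Assumes the matching tool_call_start chunk has already been seen.
            calls[chunk.tool_call_id]["raw"] += chunk.arguments_delta
    merged: Dict[str, Dict[str, Any]] = {}
    for call_id, call in calls.items():
        # Prefer the streamed JSON; fall back to whatever the start chunk carried.
        arguments = json.loads(call["raw"]) if call["raw"] else (call["initial"] or {})
        merged[call_id] = {"name": call["name"], "arguments": arguments}
    return merged


# Hand-built chunks that imitate the fields listed in the table above.
demo_chunks = [
    SimpleNamespace(
        type="tool_call_start",
        tool_call=SimpleNamespace(id="call_1", name="get_weather", arguments={}),
    ),
    SimpleNamespace(type="tool_call_delta", tool_call_id="call_1", arguments_delta='{"city": '),
    SimpleNamespace(type="tool_call_delta", tool_call_id="call_1", arguments_delta='"Tokyo"}'),
    SimpleNamespace(type="done", finish_reason="tool_calls"),
]
```

In real use the same function could be fed the iterator returned by client.stream(...), provided the chunk attributes match the table.
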
Tool Use (Function Calling)

Define tools, let the model call them, feed results back:

from vox import VoxClient, Message, Tool, ToolResult

client = VoxClient(openai_api_key="sk-...")

# 1. Define tools
tools = [
    Tool(
        name="get_weather",
        description="Get current weather for a city.",
        parameters={
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    ),
]

# 2. Send messages with tools
messages = [Message(role="user", content="What's the weather in Tokyo?")]
response = client.complete(messages=messages, model="gpt-4o", tools=tools)

# 3. Handle tool calls
if response.message.tool_calls:
    messages.append(response.message)  # add assistant's tool call message

    for tc in response.message.tool_calls:
        # Execute the function (your code)
        result = get_weather(tc.arguments["city"])

        # Return result to the model
        tool_result = ToolResult(
            tool_call_id=tc.id,
            name=tc.name,
            content=result,
        )
        messages.append(tool_result.to_message())

    # 4. Get final response
    final = client.complete(messages=messages, model="gpt-4o", tools=tools)
    print(final.message.text)

This works identically across OpenAI, Anthropic, Gemini, and OpenRouter — vox translates the tool definitions and results to each provider's native format.
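
The loop above calls get_weather directly. Once there are several tools, a name-to-function registry keeps that step generic. The sketch below is one possible shape for it, using only the standard library: TOOL_HANDLERS, tool_handler, and run_tool are hypothetical names, and the get_weather body is a placeholder.

```python
import json
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry mapping tool names to plain Python callables.
TOOL_HANDLERS: Dict[str, Callable[..., Any]] = {}


def tool_handler(name: str):
    """Decorator that registers a function under the given tool name."""
    def register(fn):
        TOOL_HANDLERS[name] = fn
        return fn
    return register


@tool_handler("get_weather")
def get_weather(city: str) -> dict:
    """Placeholder implementation; a real one would query a weather service."""
    return {"city": city, "forecast": "unknown"}


def run_tool(name: str, arguments: Dict[str, Any]) -> Tuple[str, bool]:
    """Execute a registered handler and return (content, is_error)."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"Unknown tool: {name}", True
    try:
        return json.dumps(handler(**arguments)), False
    except Exception as exc:  # report the failure to the model instead of crashing
        return f"{type(exc).__name__}: {exc}", True
```

Inside the tool-call loop, content, is_error = run_tool(tc.name, tc.arguments) can then feed ToolResult(tool_call_id=tc.id, name=tc.name, content=content, is_error=is_error).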

Provider-native (server-side) tools

Some providers offer server-side tools that run on their infrastructure — Anthropic's web_search_20250305, OpenAI's web_search_preview, Gemini's Google Search grounding, and others. These have provider-specific shapes and no cross-provider abstraction, so vox does not model them as a Tool. Instead, the tools list accepts raw dicts alongside vox Tool objects — raw dicts are passed through to the provider verbatim:

response = client.complete(
    messages=[Message(role="user", content="What's the current 10Y JGB yield?")],
    model="claude-sonnet-4-5-20250929",
    tools=[
        my_function_tool,  # vox Tool — translated to the provider's format
        {                  # raw dict — passed through verbatim
            "type": "web_search_20250305",
            "name": "web_search",
            "max_uses": 5,
        },
    ],
)

The caller is responsible for matching the resolved provider's expected schema — a raw dict shaped for one provider won't work on another. An entry that is neither a Tool nor a dict raises a TypeError.
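
As an illustration of the rule just described, the sketch below separates a mixed tools list into entries to translate and entries to pass through, and rejects anything else. It is not vox's code: the Tool class here is a stand-in so the example runs without vox installed, and split_tools is a hypothetical name.

```python
from typing import Any, List, Tuple


class Tool:
    """Stand-in for vox.Tool so this sketch runs without vox installed."""

    def __init__(self, name: str):
        self.name = name


def split_tools(tools: List[Any]) -> Tuple[List[Tool], List[dict]]:
    """Separate Tool objects (to be translated) from raw dicts (passed through as-is)."""
    portable: List[Tool] = []
    native: List[dict] = []
    for entry in tools:
        if isinstance(entry, Tool):
            portable.append(entry)
        elif isinstance(entry, dict):
            native.append(entry)
        else:
            raise TypeError(
                f"tools entries must be Tool or dict, got {type(entry).__name__}"
            )
    return portable, native
```

A check like this can also be useful in application code, to fail early before a request is sent.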

Structured Output

Pass a Pydantic model to get validated, typed responses:

from pydantic import BaseModel
from vox import VoxClient, Message

class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str
    pros: list[str]
    cons: list[str]

client = VoxClient(openai_api_key="sk-...")

response = client.complete(
    messages=[Message(role="user", content="Review the movie Inception.")],
    model="gpt-4o",
    response_schema=MovieReview,
)

review: MovieReview = response.parsed
print(f"{review.title}: {review.rating}/10")
print(f"Pros: {', '.join(review.pros)}")

The schema is automatically converted to each provider's native format:

  • OpenAI: JSON schema in response_format
  • Anthropic: Synthetic tool with forced invocation
  • Gemini: response_schema parameter
  • OpenRouter/LM Studio: JSON schema in response_format
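
To show what two of these target formats look like on the wire, here is a small sketch that wraps one JSON Schema in the OpenAI response_format shape and in an Anthropic forced-tool request fragment. The payload shapes follow those providers' public APIs as I understand them. The function names and the hand-written schema are illustrative, and how vox builds these payloads internally is not shown in this README.

```python
from typing import Any, Dict

# Hand-written JSON Schema standing in for what a Pydantic model would generate.
REVIEW_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "rating": {"type": "number"},
    },
    "required": ["title", "rating"],
}


def openai_response_format(name: str, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a schema in the response_format shape used for OpenAI structured outputs."""
    return {"type": "json_schema", "json_schema": {"name": name, "schema": schema}}


def anthropic_forced_tool(name: str, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Express the schema as a single tool that the model is required to call."""
    return {
        "tools": [
            {
                "name": name,
                "description": f"Return a {name} object.",
                "input_schema": schema,
            }
        ],
        "tool_choice": {"type": "tool", "name": name},
    }
```

With the Anthropic approach the structured result comes back as the input of the forced tool call rather than as message text, which is why a library has to unwrap it before validation.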

Reasoning / Thinking

Enable extended reasoning for models that support it:

from vox import VoxClient, Message, ReasoningConfig

client = VoxClient(anthropic_api_key="sk-ant-...")

response = client.complete(
    messages=[Message(role="user", content="Prove that sqrt(2) is irrational.")],
    model="claude-sonnet-4-20250514",
    reasoning=ReasoningConfig(enabled=True, budget_tokens=10000),
)

# Access thinking blocks
if response.thinking:
    for block in response.thinking:
        print(f"[Thinking] {block.text[:200]}...")

print(response.message.text)

Configuration by Provider

Provider            Config                            Description
Anthropic           budget_tokens                     Token budget for extended thinking
OpenAI (o-series)   level ("low"/"medium"/"high")     Reasoning effort level
Gemini 2.5          budget_tokens                     Thinking token budget
Gemini 3+           level ("low"/"medium"/"high")     Thinking level
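
Since the right field depends on the model family, code that targets several providers may want to pick it programmatically. The helper below is a sketch based only on the table above. reasoning_fields is a hypothetical name, and the prefix checks (for example "gemini-3" for Gemini 3+) are assumptions about how model names are spelled.

```python
from typing import Any, Dict


def reasoning_fields(
    model: str, budget_tokens: int = 10000, level: str = "medium"
) -> Dict[str, Any]:
    """Pick the ReasoningConfig field that the table lists for this model family."""
    if model.startswith("claude-") or model.startswith("gemini-2.5"):
        return {"budget_tokens": budget_tokens}
    if model.startswith(("o1", "o3", "o4", "gemini-3")):
        if level not in ("low", "medium", "high"):
            raise ValueError(f"level must be low, medium, or high, got {level!r}")
        return {"level": level}
    raise ValueError(f"no reasoning setting known for model {model!r}")
```

The returned dict is meant to be splatted into the documented constructor, as in ReasoningConfig(enabled=True, **reasoning_fields(model)).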

Multimodal (Vision)

Send images alongside text:

from vox import Message, TextContent, ImageContent

message = Message(
    role="user",
    content=[
        TextContent(text="What's in this image?"),
        ImageContent(
            source_type="url",
            media_type="image/jpeg",
            data="https://example.com/photo.jpg",
        ),
    ],
)

response = client.complete(messages=[message], model="gpt-4o")

For base64 images:

import base64

with open("photo.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

message = Message(
    role="user",
    content=[
        TextContent(text="Describe this image."),
        ImageContent(source_type="base64", media_type="image/png", data=b64),
    ],
)
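
The media_type has to match the file being sent, and hard-coding it is easy to get wrong. A small helper can infer it from the file name with the standard library. This is a sketch: image_fields is a hypothetical name, and it returns a plain dict whose keys mirror the ImageContent arguments shown above.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Dict


def image_fields(path: str) -> Dict[str, str]:
    """Return source_type / media_type / data values for a local image file."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"cannot infer an image media type from {path!r}")
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"source_type": "base64", "media_type": media_type, "data": data}
```

The result can be passed along as ImageContent(**image_fields("photo.png")).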

Error Handling

All provider errors are normalized to a consistent hierarchy:

from vox.errors import (
    VoxError,              # base class
    AuthenticationError,   # invalid/missing API key
    RateLimitError,        # rate limited (has .retry_after)
    QuotaExceededError,    # billing/quota limit
    InvalidRequestError,   # malformed request
    ProviderError,         # server error (5xx)
    ContentFilterError,    # safety system blocked content
    ModelNotFoundError,    # model doesn't exist
)

try:
    response = client.complete(messages=messages, model="gpt-4o")
except RateLimitError as e:
    print(f"Rate limited by {e.provider}, retry after {e.retry_after}s")
except AuthenticationError as e:
    print(f"Auth failed for {e.provider}: {e}")
except VoxError as e:
    print(f"LLM error: {e}")
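
The retry_after attribute makes application-level retries straightforward. Below is a minimal sketch of a retry wrapper that honors it and falls back to exponential backoff otherwise. To stay runnable without vox, it defines a stand-in RateLimitError; in real code you would import the one from vox.errors. call_with_retry is a hypothetical name, and whether the max_retries setting in ProviderConfig already covers rate limits is not stated here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for vox.errors.RateLimitError so the sketch runs without vox."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError and honoring retry_after when present."""
    for attempt in range(attempts):
        try:
            return fn()
        except RateLimitError as exc:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
    raise RuntimeError("attempts must be at least 1")
```

Typical use would be call_with_retry(lambda: client.complete(messages=messages, model="gpt-4o")). The sleep parameter exists so tests can substitute a recorder for the real delay.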

API Reference

VoxClient

VoxClient(
    openai_api_key: str | None = None,
    anthropic_api_key: str | None = None,
    gemini_api_key: str | None = None,
    openrouter_api_key: str | None = None,
    lmstudio_base_url: str = "http://localhost:1234/v1",
    openrouter_app_name: str | None = None,
    openrouter_app_url: str | None = None,
    provider_configs: dict[str, ProviderConfig] | None = None,
)

Methods

Method        Returns                       Signature
complete()    CompletionResponse            (messages, model, *, provider, max_tokens, temperature, tools, response_schema, reasoning, stop, **kwargs)
acomplete()   CompletionResponse (async)    Same as above
stream()      Iterator[StreamChunk]         Same as above
astream()     AsyncIterator[StreamChunk]    Same as above

CompletionResponse

Field           Type                          Description
message         Message                       Assistant's response message
usage           Usage                         Token counts
provider        str                           Provider name
model           str                           Model used
finish_reason   str | None                    Why generation stopped
thinking        list[ThinkingBlock] | None    Reasoning blocks
parsed          Any                           Validated Pydantic instance (when response_schema used)

Message

Field          Type                                        Description
role           "system" | "user" | "assistant" | "tool"    Message role
content        str | list[ContentPart]                     Text or multimodal content
tool_calls     list[ToolCallData] | None                   Tool calls (assistant messages)
tool_call_id   str | None                                  Tool result reference
name           str | None                                  Tool name (for tool messages)

Property: .text — extracts plain text from any content format.

Tool

Tool(
    name: str,              # Function name
    description: str,       # What the function does
    parameters: dict,       # JSON Schema for arguments
)

ToolResult

ToolResult(
    tool_call_id: str,      # ID from ToolCallData
    name: str,              # Tool name
    content: str,           # Result content
    is_error: bool = False, # Whether execution failed
)

Method: .to_message() — converts to a Message with role="tool".

Usage

Field                   Type   Description
prompt_tokens           int    Input tokens
completion_tokens       int    Output tokens
total_tokens            int    Total tokens
reasoning_tokens        int    Reasoning/thinking tokens
cache_read_tokens       int    Prompt cache hits
cache_creation_tokens   int    Prompt cache writes

ProviderConfig

ProviderConfig(
    api_key: str | None = None,
    base_url: str | None = None,
    default_model: str | None = None,
    app_name: str | None = None,     # OpenRouter: X-Title header
    app_url: str | None = None,      # OpenRouter: HTTP-Referer header
    timeout: float = 120.0,
    max_retries: int = 2,
)

ReasoningConfig

ReasoningConfig(
    enabled: bool = True,
    budget_tokens: int | None = None,   # Anthropic, Gemini 2.5
    level: str | None = None,           # "low" | "medium" | "high" — OpenAI o-series, Gemini 3+
)

LM Studio (Local Models)

Run models locally with LM Studio:

client = VoxClient(lmstudio_base_url="http://localhost:1234/v1")

response = client.complete(
    messages=[Message(role="user", content="Hello!")],
    model="local-model",
    provider="lmstudio",
)

Make sure LM Studio is running with a model loaded. The default base URL is http://localhost:1234/v1.
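
One way to confirm the server is up before sending a request is to query the OpenAI-compatible model listing endpoint that LM Studio's local server exposes under the base URL. The sketch below uses only the standard library. list_local_models is a hypothetical name, and the response shape (a "data" list of objects with an "id") is assumed to follow the OpenAI models endpoint.

```python
import json
import urllib.error
import urllib.request
from typing import List


def list_local_models(
    base_url: str = "http://localhost:1234/v1", timeout: float = 2.0
) -> List[str]:
    """Return model ids reported by the server, or an empty list if it is unreachable."""
    url = f"{base_url.rstrip('/')}/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return []
    if not isinstance(payload, dict):
        return []
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]
```

An empty list means either that nothing is listening at that address or that no model is loaded. A returned id is a reasonable candidate for the model= argument.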

License

MIT
