
quartermaster-providers

Unified multi-LLM provider abstraction for Python. Write once, run against OpenAI, Anthropic, Google, Groq, xAI, or any OpenAI-compatible endpoint.

PyPI version Python 3.11+ License: Apache 2.0

Features

  • 6 Providers: OpenAI, Anthropic, Google, Groq, xAI, plus a generic OpenAI-compatible adapter
  • Streaming: Async generators for token-by-token responses
  • Tool Calling: Unified interface for function/tool invocation across all providers
  • Structured Output: JSON-schema-constrained response generation
  • Extended Thinking: Claude and o-series reasoning chains
  • Vision: Image understanding on supported models
  • Transcription: Audio-to-text via OpenAI Whisper
  • Token Counting: Estimate tokens and cost before making requests
  • Provider Registry: Register providers once, resolve by name or model pattern
  • Testing Utilities: MockProvider and InMemoryHistory for unit tests
  • Type-Safe: Dataclass responses with full type hints

New in v0.4.0

  • CircuitBreaker -- CircuitBreaker(failure_threshold=..., recovery_timeout=...) wraps any provider; it raises CircuitOpenError once consecutive failures reach the threshold and lets requests through again after the recovery timeout elapses.
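The open/recover behaviour can be sketched in plain Python. This is a simplified illustration of the circuit-breaker pattern, not the library's implementation; SimpleCircuitBreaker and the local CircuitOpenError are stand-ins:

```python
import time

class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""

class SimpleCircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit is open; request rejected")
            # Timeout elapsed: close the circuit and allow a trial call.
            self.opened_at = None
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

The library's CircuitBreaker applies the same state machine around async provider calls.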

Installation

pip install quartermaster-providers

Install with provider-specific extras:

pip install quartermaster-providers[openai]
pip install quartermaster-providers[anthropic]
pip install quartermaster-providers[openai,anthropic,google]
pip install quartermaster-providers[all]

Supported Providers

| Provider | Class | Models (examples) |
|---|---|---|
| OpenAI | OpenAIProvider | gpt-4o, gpt-4-turbo, o1, o3-mini |
| Anthropic | AnthropicProvider | claude-sonnet-4-20250514, claude-3-haiku |
| Google | GoogleProvider | gemini-1.5-pro, gemini-pro |
| Groq | GroqProvider | llama-3-70b, mixtral-8x7b |
| xAI | XAIProvider | grok-2, grok-2-mini |
| Quartermaster | QuartermasterProvider | All models via one API key |
| Custom | OpenAICompatibleProvider | Any OpenAI-compatible API |

Local / Self-Hosted Providers

| Provider | Class | Description |
|---|---|---|
| Ollama | OllamaProvider | Local models via Ollama |
| vLLM | VLLMProvider | High-throughput inference server |
| LM Studio | LMStudioProvider | Desktop LLM app |
| TGI | TGIProvider | HuggingFace Text Generation Inference |
| LocalAI | LocalAIProvider | OpenAI-compatible local server |
| llama.cpp | LlamaCppProvider | llama.cpp HTTP server |

Register local providers with one line — the module-level helper builds a registry, normalises base_url (auto-appends /v1 if missing), honours the OLLAMA_HOST env var, and remembers the default model so callers don't have to repeat it:

from quartermaster_providers import register_local

provider_registry = register_local(
    "ollama",
    base_url="http://localhost:11434",   # or set $OLLAMA_HOST
    default_model="gemma4:26b",
)
provider = provider_registry.get("ollama")

Quick Start

Text Generation

import asyncio
from quartermaster_providers import LLMConfig
from quartermaster_providers.providers import OpenAIProvider

async def main():
    provider = OpenAIProvider(api_key="sk-...")
    config = LLMConfig(
        model="gpt-4o",
        provider="openai",
        temperature=0.7,
        max_output_tokens=1024,
    )

    response = await provider.generate_text_response(
        prompt="Explain gradient descent in two sentences.",
        config=config,
    )
    print(response.content)  # str
    print(response.stop_reason)  # "end_turn", "max_tokens", etc.

asyncio.run(main())

Tool Calling

import asyncio
from quartermaster_providers import LLMConfig, ToolDefinition
from quartermaster_providers.providers import AnthropicProvider

async def main():
    provider = AnthropicProvider(api_key="sk-ant-...")
    config = LLMConfig(model="claude-sonnet-4-20250514", provider="anthropic")

    tools = [
        ToolDefinition(
            name="get_weather",
            description="Get current weather for a location",
            input_schema={
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                },
                "required": ["location"],
            },
        ),
    ]

    response = await provider.generate_tool_parameters(
        prompt="What is the weather in Tokyo?",
        tools=tools,
        config=config,
    )

    for call in response.tool_calls:
        print(f"{call.tool_name}({call.parameters})")
        # get_weather({'location': 'Tokyo'})

    print(f"Usage: {response.usage.total_tokens} tokens")

asyncio.run(main())

Tool Calling with quartermaster-tools

Tools created with @tool() integrate directly via ToolDescriptor:

from quartermaster_tools import tool

@tool()
def get_weather(city: str) -> dict:
    """Get current weather for a city.

    Args:
        city: The city name to look up.
    """
    return {"city": city, "temperature": 22}

# Convert to provider-compatible format
tool_def = get_weather.info().to_anthropic_tools()
# Or for OpenAI:
tool_def = get_weather.info().to_openai_tools()

Streaming

import asyncio
from quartermaster_providers import LLMConfig
from quartermaster_providers.providers import OpenAIProvider

async def main():
    provider = OpenAIProvider(api_key="sk-...")
    config = LLMConfig(model="gpt-4o", provider="openai", stream=True)

    async for chunk in await provider.generate_text_response(
        prompt="Write a haiku about Python.",
        config=config,
    ):
        print(chunk.content, end="", flush=True)

asyncio.run(main())

Structured Output

import asyncio
from quartermaster_providers import LLMConfig
from quartermaster_providers.providers import OpenAIProvider

async def main():
    provider = OpenAIProvider(api_key="sk-...")
    config = LLMConfig(model="gpt-4o", provider="openai")

    schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "summary": {"type": "string"},
            "topics": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "summary", "topics"],
    }

    response = await provider.generate_structured_response(
        prompt="Analyze the concept of reinforcement learning.",
        response_schema=schema,
        config=config,
    )

    print(response.structured_output["title"])
    print(response.structured_output["topics"])

asyncio.run(main())

API Reference

LLMConfig

Controls request behavior across all providers.

from quartermaster_providers import LLMConfig

config = LLMConfig(
    model="gpt-4o",             # Provider model identifier
    provider="openai",          # Provider name
    stream=False,               # Stream token-by-token
    temperature=0.7,            # 0.0 (deterministic) to 2.0 (creative)
    system_message=None,        # System prompt
    max_input_tokens=None,      # Input token limit
    max_output_tokens=None,     # Output token limit
    max_messages=None,          # Conversation context limit
    vision=False,               # Enable image understanding
    thinking_enabled=False,     # Extended thinking (Claude, o-series)
    thinking_budget=None,       # Max thinking tokens
    top_p=None,                 # Nucleus sampling
    top_k=None,                 # Top-k sampling
    frequency_penalty=None,     # Frequency penalty (OpenAI)
    presence_penalty=None,      # Presence penalty (OpenAI)
)

AbstractLLMProvider Methods

Every provider implements these methods:

| Method | Returns | Description |
|---|---|---|
| await list_models() | list[str] | Available model identifiers |
| estimate_token_count(text, model) | int | Token estimate without an API call |
| prepare_tool(tool) | Any | Convert a ToolDefinition to provider format |
| await generate_text_response(prompt, config) | TokenResponse or AsyncIterator[TokenResponse] | Text generation (streaming when config.stream=True) |
| await generate_tool_parameters(prompt, tools, config) | ToolCallResponse | Function/tool calling |
| await generate_native_response(prompt, tools, config) | NativeResponse | Text + thinking + tool calls combined |
| await generate_structured_response(prompt, schema, config) | StructuredResponse | JSON-schema-constrained output |
| await transcribe(audio_path) | str | Audio-to-text transcription |

Cost estimation (non-abstract, returns None if pricing unavailable):

| Method | Returns |
|---|---|
| get_cost_per_1k_input_tokens(model) | float \| None |
| get_cost_per_1k_output_tokens(model) | float \| None |
| estimate_cost(text, model, output_tokens) | float \| None |

Response Types

TokenResponse -- single response or streaming chunk:

response.content      # str -- text content
response.stop_reason  # str | None -- "end_turn", "max_tokens", "tool_use"

ToolCallResponse -- tool invocation results:

response.text_content  # str -- any text alongside tool calls
response.tool_calls    # list[ToolCall] -- each has .tool_name, .tool_id, .parameters
response.stop_reason   # str | None
response.usage         # TokenUsage | None

StructuredResponse -- JSON-schema-constrained output:

response.structured_output  # dict[str, Any] -- parsed JSON
response.raw_output         # str -- raw model text
response.usage              # TokenUsage | None

NativeResponse -- complete model output:

response.text_content  # str
response.thinking      # list[ThinkingResponse] -- reasoning blocks
response.tool_calls    # list[ToolCall]
response.usage         # TokenUsage | None

TokenUsage -- token accounting:

usage.input_tokens                  # int
usage.output_tokens                 # int
usage.cache_creation_input_tokens   # int (Anthropic prompt caching)
usage.cache_read_input_tokens       # int
usage.total_tokens                  # property: input + output
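The shape of TokenUsage can be sketched as a dataclass in which total_tokens is a derived property over the two base counters (field names are from the list above; the implementation is illustrative, not the library's source):

```python
from dataclasses import dataclass

@dataclass
class TokenUsage:
    input_tokens: int = 0
    output_tokens: int = 0
    cache_creation_input_tokens: int = 0  # Anthropic prompt caching
    cache_read_input_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        # Derived, not stored: always input + output
        return self.input_tokens + self.output_tokens

usage = TokenUsage(input_tokens=12, output_tokens=30)
print(usage.total_tokens)  # → 42
```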

ProviderRegistry

Register providers once and resolve by name or model pattern:

from quartermaster_providers import ProviderRegistry
from quartermaster_providers.providers import OpenAIProvider, AnthropicProvider

registry = ProviderRegistry()
registry.register("openai", OpenAIProvider, api_key="sk-...")
registry.register("anthropic", AnthropicProvider, api_key="sk-ant-...")

# Get by name
provider = registry.get("openai")

# Auto-resolve from model name (gpt-* -> openai, claude-* -> anthropic, etc.)
provider = registry.get_for_model("gpt-4o")
provider = registry.get_for_model("claude-sonnet-4-20250514")

# List registered providers
registry.list_providers()  # ["anthropic", "openai"]

Model-to-provider inference patterns: gpt-*/o1-*/o3-* -> openai, claude-* -> anthropic, gemini-* -> google, llama-*/mixtral-* -> groq, grok-* -> xai.

Token Counting and Cost Estimation

from quartermaster_providers.providers import OpenAIProvider

provider = OpenAIProvider(api_key="sk-...")

tokens = provider.estimate_token_count("Hello, world!", "gpt-4o")
print(f"Estimated tokens: {tokens}")

cost = provider.estimate_cost("Hello, world!", "gpt-4o", output_tokens=100)
if cost is not None:
    print(f"Estimated cost: ${cost:.6f}")

Error Handling

All providers raise consistent exceptions from quartermaster_providers.exceptions:

from quartermaster_providers.exceptions import (
    ProviderError,          # Base exception (has .provider, .status_code)
    AuthenticationError,    # Invalid/missing API key (401)
    RateLimitError,         # Rate limited (429, has .retry_after)
    InvalidModelError,      # Model not available (404, has .model)
    InvalidRequestError,    # Malformed request (400)
    ContentFilterError,     # Blocked by safety filter (400)
    ContextLengthError,     # Input exceeds context window (400)
    ServiceUnavailableError,  # Provider temporarily down (503)
)
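Because RateLimitError carries .retry_after, a simple backoff loop falls out naturally. A hedged sketch of the pattern; the stub exception here stands in for the library's, and with_retry is an illustrative helper, not part of the package:

```python
import asyncio

class RateLimitError(Exception):
    """Stand-in for quartermaster_providers.exceptions.RateLimitError."""
    def __init__(self, retry_after: float = 1.0):
        super().__init__("rate limited")
        self.retry_after = retry_after

async def with_retry(make_call, max_attempts: int = 3):
    """Retry an async call factory, sleeping for retry_after between attempts."""
    for attempt in range(max_attempts):
        try:
            return await make_call()
        except RateLimitError as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            await asyncio.sleep(exc.retry_after)
```

In real code, make_call would wrap something like provider.generate_text_response(prompt=..., config=...).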

Testing

Use MockProvider for unit tests without real API calls:

import asyncio

from quartermaster_providers import LLMConfig, TokenResponse
from quartermaster_providers.testing import MockProvider

mock = MockProvider(responses=[
    TokenResponse(content="Paris", stop_reason="end_turn"),
    TokenResponse(content="Berlin", stop_reason="end_turn"),
])

config = LLMConfig(model="mock", provider="mock")

async def main():
    response = await mock.generate_text_response("Capital of France?", config)
    assert response.content == "Paris"
    assert mock.call_count == 1
    assert mock.last_prompt == "Capital of France?"

asyncio.run(main())

# InMemoryHistory for conversation testing
from quartermaster_providers.testing import InMemoryHistory

history = InMemoryHistory()
history.add_message("user", "Hello")
history.add_message("assistant", "Hi there!")
assert len(history) == 2

Contributing

Contributions welcome. See CONTRIBUTING.md for guidelines.

License

Apache License 2.0. See LICENSE for details.
