
quartermaster-providers

Unified multi-LLM provider abstraction for Python. Write once, run against OpenAI, Anthropic, Google, Groq, xAI, or any OpenAI-compatible endpoint.

Python 3.11+ · License: Apache 2.0

Features

  • 6 Providers: OpenAI, Anthropic, Google, Groq, xAI, plus a generic OpenAI-compatible adapter
  • Streaming: Async generators for token-by-token responses
  • Tool Calling: Unified interface for function/tool invocation across all providers
  • Structured Output: JSON-schema-constrained response generation
  • Extended Thinking: Claude and o-series reasoning chains
  • Vision: Image understanding on supported models
  • Transcription: Audio-to-text via OpenAI Whisper
  • Token Counting: Estimate tokens and cost before making requests
  • Provider Registry: Register providers once, resolve by name or model pattern
  • Testing Utilities: MockProvider and InMemoryHistory for unit tests
  • Type-Safe: Dataclass responses with full type hints

New in v0.4.0

  • Native Ollama /api/chat for tool calls -- auto-detected when the provider is "ollama", eliminating tool-name hallucinations on Gemma models. Override with ollama_tool_protocol= in qm.configure().
  • CircuitBreaker -- CircuitBreaker(failure_threshold=, recovery_timeout=) wraps any provider; raises CircuitOpenError when the failure threshold is reached and recovers after the timeout.
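The breaker semantics can be illustrated with a self-contained sketch (a stand-in, not the library's implementation): after `failure_threshold` consecutive failures the circuit opens and every call raises `CircuitOpenError` until `recovery_timeout` seconds have elapsed, at which point one trial call is let through.

```python
import time


class CircuitOpenError(RuntimeError):
    """Stand-in for quartermaster_providers' CircuitOpenError."""


class SketchCircuitBreaker:
    """Minimal illustration of failure_threshold / recovery_timeout semantics."""

    def __init__(self, failure_threshold=5, recovery_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if time.monotonic() - self._opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit open; retry after recovery_timeout")
            # Timeout elapsed: half-open, allow one trial call through.
            self._opened_at = None
            self._failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = time.monotonic()
            raise
        self._failures = 0
        return result
```

In the library, the same idea wraps any provider instance so repeated upstream failures fail fast instead of piling up timeouts.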

Installation

pip install quartermaster-providers

Install with provider-specific extras:

pip install quartermaster-providers[openai]
pip install quartermaster-providers[anthropic]
pip install quartermaster-providers[openai,anthropic,google]
pip install quartermaster-providers[all]

Supported Providers

| Provider | Class | Models (examples) |
| --- | --- | --- |
| OpenAI | OpenAIProvider | gpt-4o, gpt-4-turbo, o1, o3-mini |
| Anthropic | AnthropicProvider | claude-sonnet-4-20250514, claude-3-haiku |
| Google | GoogleProvider | gemini-1.5-pro, gemini-pro |
| Groq | GroqProvider | llama-3-70b, mixtral-8x7b |
| xAI | XAIProvider | grok-2, grok-2-mini |
| Quartermaster | QuartermasterProvider | All models via one API key |
| Custom | OpenAICompatibleProvider | Any OpenAI-compatible API |

Local / Self-Hosted Providers

| Provider | Class | Description |
| --- | --- | --- |
| Ollama | OllamaProvider | Local models via Ollama |
| vLLM | VLLMProvider | High-throughput inference server |
| LM Studio | LMStudioProvider | Desktop LLM app |
| TGI | TGIProvider | HuggingFace Text Generation Inference |
| LocalAI | LocalAIProvider | OpenAI-compatible local server |
| llama.cpp | LlamaCppProvider | llama.cpp HTTP server |

Register local providers with one line. The module-level register_local helper builds a registry, normalises base_url (appending /v1 if missing), honours the OLLAMA_HOST environment variable, and remembers the default model so callers don't have to repeat it:

from quartermaster_providers import register_local

provider_registry = register_local(
    "ollama",
    base_url="http://localhost:11434",   # or set $OLLAMA_HOST
    default_model="gemma4:26b",
)
provider = provider_registry.get("ollama")
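The base_url normalisation described above can be sketched as follows (an illustration of the documented behaviour, not the library's code):

```python
import os


def normalise_base_url(base_url=None):
    """Sketch: honour $OLLAMA_HOST and append /v1 when it is missing."""
    url = base_url or os.environ.get("OLLAMA_HOST", "http://localhost:11434")
    url = url.rstrip("/")
    if not url.endswith("/v1"):
        url += "/v1"
    return url
```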

Sync OllamaProvider.chat() shim

For one-shot calls from sync code (Celery workers, Django views, CLI scripts), OllamaProvider exposes a synchronous shim over the native /api/chat endpoint. No asgiref.async_to_sync wrapper is required, and thinking/reasoning text is automatically promoted into content, so reasoning models such as gemma4:26b never return an empty result on short prompts:

from quartermaster_providers.providers.local import OllamaProvider

provider = OllamaProvider(default_model="gemma4:26b")  # honours $OLLAMA_HOST
result = provider.chat(
    messages=[
        {"role": "system", "content": "Respond in Slovenian."},
        {"role": "user", "content": "Pozdravljen!"},
    ],
    max_output_tokens=128,    # honoured — capped at Ollama's `num_predict`
    thinking_level="off",     # off / low / medium / high
)
print(result.content)         # str — promoted from `reasoning` if `content` empty
print(result.tool_calls)      # list[ToolCall]
print(result.usage)           # {prompt_tokens, completion_tokens, total_tokens}

Connection errors raise ServiceUnavailableError; HTTP errors raise ProviderError with status_code attached. Neither is swallowed into a soft "no answer" result, as the OpenAI-compatible path previously did.

Quick Start

Text Generation

import asyncio
from quartermaster_providers import LLMConfig
from quartermaster_providers.providers import OpenAIProvider

async def main():
    provider = OpenAIProvider(api_key="sk-...")
    config = LLMConfig(
        model="gpt-4o",
        provider="openai",
        temperature=0.7,
        max_output_tokens=1024,
    )

    response = await provider.generate_text_response(
        prompt="Explain gradient descent in two sentences.",
        config=config,
    )
    print(response.content)  # str
    print(response.stop_reason)  # "end_turn", "max_tokens", etc.

asyncio.run(main())

Tool Calling

import asyncio
from quartermaster_providers import LLMConfig, ToolDefinition
from quartermaster_providers.providers import AnthropicProvider

async def main():
    provider = AnthropicProvider(api_key="sk-ant-...")
    config = LLMConfig(model="claude-sonnet-4-20250514", provider="anthropic")

    tools = [
        ToolDefinition(
            name="get_weather",
            description="Get current weather for a location",
            input_schema={
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                },
                "required": ["location"],
            },
        ),
    ]

    response = await provider.generate_tool_parameters(
        prompt="What is the weather in Tokyo?",
        tools=tools,
        config=config,
    )

    for call in response.tool_calls:
        print(f"{call.tool_name}({call.parameters})")
        # get_weather({'location': 'Tokyo'})

    print(f"Usage: {response.usage.total_tokens} tokens")

asyncio.run(main())

Tool Calling with quartermaster-tools

Tools created with @tool() integrate directly via ToolDescriptor:

from quartermaster_tools import tool

@tool()
def get_weather(city: str) -> dict:
    """Get current weather for a city.

    Args:
        city: The city name to look up.
    """
    return {"city": city, "temperature": 22}

# Convert to provider-compatible format
tool_def = get_weather.info().to_anthropic_tools()
# Or for OpenAI:
tool_def = get_weather.info().to_openai_tools()

Streaming

import asyncio
from quartermaster_providers import LLMConfig
from quartermaster_providers.providers import OpenAIProvider

async def main():
    provider = OpenAIProvider(api_key="sk-...")
    config = LLMConfig(model="gpt-4o", provider="openai", stream=True)

    async for chunk in await provider.generate_text_response(
        prompt="Write a haiku about Python.",
        config=config,
    ):
        print(chunk.content, end="", flush=True)

asyncio.run(main())

Structured Output

import asyncio
from quartermaster_providers import LLMConfig
from quartermaster_providers.providers import OpenAIProvider

async def main():
    provider = OpenAIProvider(api_key="sk-...")
    config = LLMConfig(model="gpt-4o", provider="openai")

    schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "summary": {"type": "string"},
            "topics": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "summary", "topics"],
    }

    response = await provider.generate_structured_response(
        prompt="Analyze the concept of reinforcement learning.",
        response_schema=schema,
        config=config,
    )

    print(response.structured_output["title"])
    print(response.structured_output["topics"])

asyncio.run(main())

API Reference

LLMConfig

Controls request behavior across all providers.

from quartermaster_providers import LLMConfig

config = LLMConfig(
    model="gpt-4o",             # Provider model identifier
    provider="openai",          # Provider name
    stream=False,               # Stream token-by-token
    temperature=0.7,            # 0.0 (deterministic) to 2.0 (creative)
    system_message=None,        # System prompt
    max_input_tokens=None,      # Input token limit
    max_output_tokens=None,     # Output token limit
    max_messages=None,          # Conversation context limit
    vision=False,               # Enable image understanding
    thinking_enabled=False,     # Extended thinking (Claude, o-series)
    thinking_budget=None,       # Max thinking tokens
    top_p=None,                 # Nucleus sampling
    top_k=None,                 # Top-k sampling
    frequency_penalty=None,     # Frequency penalty (OpenAI)
    presence_penalty=None,      # Presence penalty (OpenAI)
)

AbstractLLMProvider Methods

Every provider implements these methods:

| Method | Returns | Description |
| --- | --- | --- |
| await list_models() | list[str] | Available model identifiers |
| estimate_token_count(text, model) | int | Token estimate without an API call |
| prepare_tool(tool) | Any | Convert ToolDefinition to provider format |
| await generate_text_response(prompt, config) | TokenResponse or AsyncIterator[TokenResponse] | Text generation (streaming when config.stream=True) |
| await generate_tool_parameters(prompt, tools, config) | ToolCallResponse | Function/tool calling |
| await generate_native_response(prompt, tools, config) | NativeResponse | Text, thinking, and tool calls combined |
| await generate_structured_response(prompt, schema, config) | StructuredResponse | JSON-schema-constrained output |
| await transcribe(audio_path) | str | Audio-to-text transcription |

Cost estimation (non-abstract, returns None if pricing unavailable):

| Method | Returns |
| --- | --- |
| get_cost_per_1k_input_tokens(model) | float or None |
| get_cost_per_1k_output_tokens(model) | float or None |
| estimate_cost(text, model, output_tokens) | float or None |
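As a worked illustration of how the per-1k rates combine into an estimate (the arithmetic is assumed and the rates below are invented; check your provider's actual pricing):

```python
def estimate_cost(input_tokens, output_tokens, cost_per_1k_in, cost_per_1k_out):
    """Sketch of the arithmetic behind estimate_cost (rates here are made up)."""
    return (input_tokens / 1000) * cost_per_1k_in + (output_tokens / 1000) * cost_per_1k_out


# e.g. 500 input and 100 output tokens at $0.0025 / $0.0100 per 1k tokens:
cost = estimate_cost(500, 100, 0.0025, 0.0100)
print(f"${cost:.6f}")  # prints $0.002250
```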

Response Types

TokenResponse -- single response or streaming chunk:

response.content      # str -- text content
response.stop_reason  # str | None -- "end_turn", "max_tokens", "tool_use"

ToolCallResponse -- tool invocation results:

response.text_content  # str -- any text alongside tool calls
response.tool_calls    # list[ToolCall] -- each has .tool_name, .tool_id, .parameters
response.stop_reason   # str | None
response.usage         # TokenUsage | None

StructuredResponse -- JSON-schema-constrained output:

response.structured_output  # dict[str, Any] -- parsed JSON
response.raw_output         # str -- raw model text
response.usage              # TokenUsage | None

NativeResponse -- complete model output:

response.text_content  # str
response.thinking      # list[ThinkingResponse] -- reasoning blocks
response.tool_calls    # list[ToolCall]
response.usage         # TokenUsage | None

TokenUsage -- token accounting:

usage.input_tokens                  # int
usage.output_tokens                 # int
usage.cache_creation_input_tokens   # int (Anthropic prompt caching)
usage.cache_read_input_tokens       # int
usage.total_tokens                  # property: input + output

ProviderRegistry

Register providers once and resolve by name or model pattern:

from quartermaster_providers import ProviderRegistry
from quartermaster_providers.providers import OpenAIProvider, AnthropicProvider

registry = ProviderRegistry()
registry.register("openai", OpenAIProvider, api_key="sk-...")
registry.register("anthropic", AnthropicProvider, api_key="sk-ant-...")

# Get by name
provider = registry.get("openai")

# Auto-resolve from model name (gpt-* -> openai, claude-* -> anthropic, etc.)
provider = registry.get_for_model("gpt-4o")
provider = registry.get_for_model("claude-sonnet-4-20250514")

# List registered providers
registry.list_providers()  # ["anthropic", "openai"]

Model-to-provider inference patterns: gpt-*/o1-*/o3-* -> openai, claude-* -> anthropic, gemini-* -> google, llama-*/mixtral-* -> groq, grok-* -> xai.
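These inference patterns can be sketched as a simple prefix-matching table (an illustration of the documented mapping; the registry's real matcher may differ):

```python
from fnmatch import fnmatch

# Pattern table mirroring the documented model-to-provider inference.
_MODEL_PATTERNS = [
    ("gpt-*", "openai"),
    ("o1-*", "openai"),
    ("o3-*", "openai"),
    ("claude-*", "anthropic"),
    ("gemini-*", "google"),
    ("llama-*", "groq"),
    ("mixtral-*", "groq"),
    ("grok-*", "xai"),
]


def infer_provider(model):
    """Return the provider name for a model identifier, or raise KeyError."""
    for pattern, provider in _MODEL_PATTERNS:
        if fnmatch(model, pattern):
            return provider
    raise KeyError(f"no provider pattern matches {model!r}")
```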

Token Counting and Cost Estimation

from quartermaster_providers.providers import OpenAIProvider

provider = OpenAIProvider(api_key="sk-...")

tokens = provider.estimate_token_count("Hello, world!", "gpt-4o")
print(f"Estimated tokens: {tokens}")

cost = provider.estimate_cost("Hello, world!", "gpt-4o", output_tokens=100)
if cost is not None:
    print(f"Estimated cost: ${cost:.6f}")

Error Handling

All providers raise consistent exceptions from quartermaster_providers.exceptions:

from quartermaster_providers.exceptions import (
    ProviderError,          # Base exception (has .provider, .status_code)
    AuthenticationError,    # Invalid/missing API key (401)
    RateLimitError,         # Rate limited (429, has .retry_after)
    InvalidModelError,      # Model not available (404, has .model)
    InvalidRequestError,    # Malformed request (400)
    ContentFilterError,     # Blocked by safety filter (400)
    ContextLengthError,     # Input exceeds context window (400)
    ServiceUnavailableError,  # Provider temporarily down (503)
)
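A common pattern is to retry on RateLimitError using its retry_after hint. A minimal sketch follows, with a stand-in exception class so the snippet runs standalone; in real code, import RateLimitError from quartermaster_providers.exceptions instead:

```python
import time


class RateLimitError(Exception):
    """Stand-in for quartermaster_providers.exceptions.RateLimitError."""

    def __init__(self, message, retry_after=1.0):
        super().__init__(message)
        self.retry_after = retry_after


def call_with_retry(fn, max_attempts=3):
    """Retry a provider call, sleeping for retry_after between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as exc:
            if attempt == max_attempts:
                raise
            time.sleep(exc.retry_after)
```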

Testing

Use MockProvider for unit tests without real API calls:

from quartermaster_providers import LLMConfig, TokenResponse
from quartermaster_providers.testing import MockProvider

mock = MockProvider(responses=[
    TokenResponse(content="Paris", stop_reason="end_turn"),
    TokenResponse(content="Berlin", stop_reason="end_turn"),
])

config = LLMConfig(model="mock", provider="mock")

# Inside an async test function (e.g. with pytest-asyncio):
response = await mock.generate_text_response("Capital of France?", config)
assert response.content == "Paris"
assert mock.call_count == 1
assert mock.last_prompt == "Capital of France?"

# InMemoryHistory for conversation testing
from quartermaster_providers.testing import InMemoryHistory

history = InMemoryHistory()
history.add_message("user", "Hello")
history.add_message("assistant", "Hi there!")
assert len(history) == 2

Contributing

Contributions welcome. See CONTRIBUTING.md for guidelines.

License

Apache License 2.0. See LICENSE for details.
