
quartermaster-providers

Unified multi-LLM provider abstraction for Python. Write once, run against OpenAI, Anthropic, Google, Groq, xAI, or any OpenAI-compatible endpoint.

PyPI version Python 3.11+ License: Apache 2.0

Features

  • Providers: OpenAI, Anthropic, Google, Groq, xAI, Quartermaster, a generic OpenAI-compatible adapter, plus six local/self-hosted adapters (Ollama, vLLM, LM Studio, TGI, LocalAI, llama.cpp)
  • Streaming: Async generators for token-by-token responses
  • Tool Calling: Unified interface for function/tool invocation across all providers
  • Structured Output: JSON-schema-constrained response generation
  • Extended Thinking: Claude and o-series reasoning chains
  • Vision: Image understanding on supported models
  • Transcription: Audio-to-text via OpenAI Whisper
  • Token Counting: Estimate tokens and cost before making requests
  • Provider Registry: Register providers once, resolve by name or model pattern
  • Testing Utilities: MockProvider and InMemoryHistory for unit tests
  • Type-Safe: Dataclass responses with full type hints

Installation

pip install quartermaster-providers

Install with provider-specific extras:

pip install quartermaster-providers[openai]
pip install quartermaster-providers[anthropic]
pip install quartermaster-providers[openai,anthropic,google]
pip install quartermaster-providers[all]

Supported Providers

Provider        Class                       Models (examples)
OpenAI          OpenAIProvider              gpt-4o, gpt-4-turbo, o1, o3-mini
Anthropic       AnthropicProvider           claude-sonnet-4-20250514, claude-3-haiku
Google          GoogleProvider              gemini-1.5-pro, gemini-pro
Groq            GroqProvider                llama-3-70b, mixtral-8x7b
xAI             XAIProvider                 grok-2, grok-2-mini
Quartermaster   QuartermasterProvider       All models via one API key
Custom          OpenAICompatibleProvider    Any OpenAI-compatible API

Local / Self-Hosted Providers

Provider    Class              Description
Ollama      OllamaProvider     Local models via Ollama
vLLM        VLLMProvider       High-throughput inference server
LM Studio   LMStudioProvider   Desktop LLM app
TGI         TGIProvider        HuggingFace Text Generation Inference
LocalAI     LocalAIProvider    OpenAI-compatible local server
llama.cpp   LlamaCppProvider   llama.cpp HTTP server

Register local providers with one line. The module-level helper builds a registry, normalises base_url (auto-appending /v1 if missing), honours the OLLAMA_HOST env var, and remembers the default model so callers don't have to repeat it:

from quartermaster_providers import register_local

provider_registry = register_local(
    "ollama",
    base_url="http://localhost:11434",   # or set $OLLAMA_HOST
    default_model="gemma4:26b",
)
provider = provider_registry.get("ollama")

Sync OllamaProvider.chat() shim

For one-shot calls from sync code (Celery workers, Django views, CLI scripts), the OllamaProvider exposes a synchronous shim over the native /api/chat endpoint, with no asgiref.async_to_sync wrapper required. Thinking/reasoning text is auto-promoted into content, so reasoning models such as gemma4:26b never return an empty result on short prompts:

from quartermaster_providers.providers.local import OllamaProvider

provider = OllamaProvider(default_model="gemma4:26b")  # honours $OLLAMA_HOST
result = provider.chat(
    messages=[
        {"role": "system", "content": "Respond in Slovenian."},
        {"role": "user", "content": "Pozdravljen!"},
    ],
    max_output_tokens=128,    # honoured — capped at Ollama's `num_predict`
    thinking_level="off",     # off / low / medium / high
)
print(result.content)         # str — promoted from `reasoning` if `content` empty
print(result.tool_calls)      # list[ToolCall]
print(result.usage)           # {prompt_tokens, completion_tokens, total_tokens}

Connection errors raise ServiceUnavailableError; HTTP errors raise ProviderError with status_code attached. Neither is silently swallowed into a soft "no answer" result, as the OpenAI-compatible path used to do.
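
A minimal sketch of guarding the sync shim with these exceptions (model and prompt are illustrative):

from quartermaster_providers.exceptions import ProviderError, ServiceUnavailableError
from quartermaster_providers.providers.local import OllamaProvider

provider = OllamaProvider(default_model="gemma4:26b")
try:
    result = provider.chat(messages=[{"role": "user", "content": "Pozdravljen!"}])
except ServiceUnavailableError:
    print("Ollama is unreachable -- is the server running?")
except ProviderError as exc:
    print(f"Ollama returned HTTP {exc.status_code}")
else:
    print(result.content)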

Quick Start

Text Generation

import asyncio
from quartermaster_providers import LLMConfig
from quartermaster_providers.providers import OpenAIProvider

async def main():
    provider = OpenAIProvider(api_key="sk-...")
    config = LLMConfig(
        model="gpt-4o",
        provider="openai",
        temperature=0.7,
        max_output_tokens=1024,
    )

    response = await provider.generate_text_response(
        prompt="Explain gradient descent in two sentences.",
        config=config,
    )
    print(response.content)  # str
    print(response.stop_reason)  # "end_turn", "max_tokens", etc.

asyncio.run(main())

Tool Calling

import asyncio
from quartermaster_providers import LLMConfig, ToolDefinition
from quartermaster_providers.providers import AnthropicProvider

async def main():
    provider = AnthropicProvider(api_key="sk-ant-...")
    config = LLMConfig(model="claude-sonnet-4-20250514", provider="anthropic")

    tools = [
        ToolDefinition(
            name="get_weather",
            description="Get current weather for a location",
            input_schema={
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                },
                "required": ["location"],
            },
        ),
    ]

    response = await provider.generate_tool_parameters(
        prompt="What is the weather in Tokyo?",
        tools=tools,
        config=config,
    )

    for call in response.tool_calls:
        print(f"{call.tool_name}({call.parameters})")
        # get_weather({'location': 'Tokyo'})

    print(f"Usage: {response.usage.total_tokens} tokens")

asyncio.run(main())

Tool Calling with quartermaster-tools

Tools created with @tool() integrate directly via ToolDescriptor:

from quartermaster_tools import tool

@tool()
def get_weather(city: str) -> dict:
    """Get current weather for a city.

    Args:
        city: The city name to look up.
    """
    return {"city": city, "temperature": 22}

# Convert to provider-compatible format
tool_def = get_weather.info().to_anthropic_tools()
# Or for OpenAI:
tool_def = get_weather.info().to_openai_tools()

Streaming

import asyncio
from quartermaster_providers import LLMConfig
from quartermaster_providers.providers import OpenAIProvider

async def main():
    provider = OpenAIProvider(api_key="sk-...")
    config = LLMConfig(model="gpt-4o", provider="openai", stream=True)

    async for chunk in await provider.generate_text_response(
        prompt="Write a haiku about Python.",
        config=config,
    ):
        print(chunk.content, end="", flush=True)

asyncio.run(main())

Structured Output

import asyncio
from quartermaster_providers import LLMConfig
from quartermaster_providers.providers import OpenAIProvider

async def main():
    provider = OpenAIProvider(api_key="sk-...")
    config = LLMConfig(model="gpt-4o", provider="openai")

    schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "summary": {"type": "string"},
            "topics": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "summary", "topics"],
    }

    response = await provider.generate_structured_response(
        prompt="Analyze the concept of reinforcement learning.",
        response_schema=schema,
        config=config,
    )

    print(response.structured_output["title"])
    print(response.structured_output["topics"])

asyncio.run(main())

API Reference

LLMConfig

Controls request behavior across all providers.

from quartermaster_providers import LLMConfig

config = LLMConfig(
    model="gpt-4o",             # Provider model identifier
    provider="openai",          # Provider name
    stream=False,               # Stream token-by-token
    temperature=0.7,            # 0.0 (deterministic) to 2.0 (creative)
    system_message=None,        # System prompt
    max_input_tokens=None,      # Input token limit
    max_output_tokens=None,     # Output token limit
    max_messages=None,          # Conversation context limit
    vision=False,               # Enable image understanding
    thinking_enabled=False,     # Extended thinking (Claude, o-series)
    thinking_budget=None,       # Max thinking tokens
    top_p=None,                 # Nucleus sampling
    top_k=None,                 # Top-k sampling
    frequency_penalty=None,     # Frequency penalty (OpenAI)
    presence_penalty=None,      # Presence penalty (OpenAI)
)

AbstractLLMProvider Methods

Every provider implements these methods:

  • await list_models() -> list[str] -- available model identifiers
  • estimate_token_count(text, model) -> int -- token estimate without an API call
  • prepare_tool(tool) -> Any -- convert a ToolDefinition to the provider's format
  • await generate_text_response(prompt, config) -> TokenResponse | AsyncIterator[TokenResponse] -- text generation (streams when config.stream=True)
  • await generate_tool_parameters(prompt, tools, config) -> ToolCallResponse -- function/tool calling
  • await generate_native_response(prompt, tools, config) -> NativeResponse -- text + thinking + tool calls combined
  • await generate_structured_response(prompt, schema, config) -> StructuredResponse -- JSON-schema-constrained output
  • await transcribe(audio_path) -> str -- audio-to-text transcription
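
transcribe is the only method that takes a file path rather than a prompt. A minimal sketch (the audio file name is illustrative):

import asyncio
from quartermaster_providers.providers import OpenAIProvider

async def main():
    provider = OpenAIProvider(api_key="sk-...")
    # Audio-to-text via OpenAI Whisper; returns the transcript as a plain string
    text = await provider.transcribe("standup-recording.wav")
    print(text)

asyncio.run(main())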

Cost-estimation helpers (non-abstract; each returns None if pricing is unavailable):

  • get_cost_per_1k_input_tokens(model) -> float | None
  • get_cost_per_1k_output_tokens(model) -> float | None
  • estimate_cost(text, model, output_tokens) -> float | None

Response Types

TokenResponse -- single response or streaming chunk:

response.content      # str -- text content
response.stop_reason  # str | None -- "end_turn", "max_tokens", "tool_use"

ToolCallResponse -- tool invocation results:

response.text_content  # str -- any text alongside tool calls
response.tool_calls    # list[ToolCall] -- each has .tool_name, .tool_id, .parameters
response.stop_reason   # str | None
response.usage         # TokenUsage | None

StructuredResponse -- JSON-schema-constrained output:

response.structured_output  # dict[str, Any] -- parsed JSON
response.raw_output         # str -- raw model text
response.usage              # TokenUsage | None

NativeResponse -- complete model output:

response.text_content  # str
response.thinking      # list[ThinkingResponse] -- reasoning blocks
response.tool_calls    # list[ToolCall]
response.usage         # TokenUsage | None
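
Extended thinking surfaces through NativeResponse. A minimal sketch of enabling it via LLMConfig; passing an empty tools list is an assumption here, and the thinking blocks are printed as-is since ThinkingResponse's attributes aren't documented above:

import asyncio
from quartermaster_providers import LLMConfig
from quartermaster_providers.providers import AnthropicProvider

async def main():
    provider = AnthropicProvider(api_key="sk-ant-...")
    config = LLMConfig(
        model="claude-sonnet-4-20250514",
        provider="anthropic",
        thinking_enabled=True,  # extended thinking (Claude, o-series)
        thinking_budget=2048,   # cap on thinking tokens
    )
    # generate_native_response returns text, thinking blocks, and tool calls together
    response = await provider.generate_native_response(
        prompt="Plan the steps to migrate a SQLite schema to Postgres.",
        tools=[],  # assumption: no tools needed for a text-plus-thinking call
        config=config,
    )
    for block in response.thinking:  # list[ThinkingResponse]
        print(block)
    print(response.text_content)

asyncio.run(main())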

TokenUsage -- token accounting:

usage.input_tokens                  # int
usage.output_tokens                 # int
usage.cache_creation_input_tokens   # int (Anthropic prompt caching)
usage.cache_read_input_tokens       # int
usage.total_tokens                  # property: input + output
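
A quick sketch of reading usage off any response that carries it (usage can be None, as the response types above note):

def log_usage(response) -> None:
    # Works with ToolCallResponse, StructuredResponse, or NativeResponse
    usage = response.usage
    if usage is None:
        return  # provider didn't report usage
    print(usage.input_tokens, usage.output_tokens)
    print(usage.total_tokens)  # property: input_tokens + output_tokens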

ProviderRegistry

Register providers once and resolve by name or model pattern:

from quartermaster_providers import ProviderRegistry
from quartermaster_providers.providers import OpenAIProvider, AnthropicProvider

registry = ProviderRegistry()
registry.register("openai", OpenAIProvider, api_key="sk-...")
registry.register("anthropic", AnthropicProvider, api_key="sk-ant-...")

# Get by name
provider = registry.get("openai")

# Auto-resolve from model name (gpt-* -> openai, claude-* -> anthropic, etc.)
provider = registry.get_for_model("gpt-4o")
provider = registry.get_for_model("claude-sonnet-4-20250514")

# List registered providers
registry.list_providers()  # ["anthropic", "openai"]

Model-to-provider inference patterns: gpt-*/o1-*/o3-* -> openai, claude-* -> anthropic, gemini-* -> google, llama-*/mixtral-* -> groq, grok-* -> xai.

Token Counting and Cost Estimation

from quartermaster_providers.providers import OpenAIProvider

provider = OpenAIProvider(api_key="sk-...")

tokens = provider.estimate_token_count("Hello, world!", "gpt-4o")
print(f"Estimated tokens: {tokens}")

cost = provider.estimate_cost("Hello, world!", "gpt-4o", output_tokens=100)
if cost is not None:
    print(f"Estimated cost: ${cost:.6f}")

Error Handling

All providers raise consistent exceptions from quartermaster_providers.exceptions:

from quartermaster_providers.exceptions import (
    ProviderError,          # Base exception (has .provider, .status_code)
    AuthenticationError,    # Invalid/missing API key (401)
    RateLimitError,         # Rate limited (429, has .retry_after)
    InvalidModelError,      # Model not available (404, has .model)
    InvalidRequestError,    # Malformed request (400)
    ContentFilterError,     # Blocked by safety filter (400)
    ContextLengthError,     # Input exceeds context window (400)
    ServiceUnavailableError,  # Provider temporarily down (503)
)
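
A common pattern is retrying on rate limits using the attached retry_after hint, falling back to a fixed delay when the provider doesn't send one; a minimal sketch:

import asyncio
from quartermaster_providers.exceptions import RateLimitError

async def generate_with_retry(provider, prompt, config, attempts=3):
    last_exc = None
    for _ in range(attempts):
        try:
            return await provider.generate_text_response(prompt=prompt, config=config)
        except RateLimitError as exc:
            last_exc = exc
            # .retry_after is populated from the 429 response when the provider sends it
            await asyncio.sleep(exc.retry_after or 1.0)
    raise last_exc  # still rate limited after all attempts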

Testing

Use MockProvider for unit tests without real API calls:

from quartermaster_providers import LLMConfig, TokenResponse
from quartermaster_providers.testing import MockProvider

mock = MockProvider(responses=[
    TokenResponse(content="Paris", stop_reason="end_turn"),
    TokenResponse(content="Berlin", stop_reason="end_turn"),
])

config = LLMConfig(model="mock", provider="mock")

# Inside an async test function (e.g. with pytest-asyncio):
response = await mock.generate_text_response("Capital of France?", config)
assert response.content == "Paris"
assert mock.call_count == 1
assert mock.last_prompt == "Capital of France?"

# InMemoryHistory for conversation testing
from quartermaster_providers.testing import InMemoryHistory

history = InMemoryHistory()
history.add_message("user", "Hello")
history.add_message("assistant", "Hi there!")
assert len(history) == 2

Contributing

Contributions welcome. See CONTRIBUTING.md for guidelines.

License

Apache License 2.0. See LICENSE for details.
