quartermaster-providers
Unified multi-LLM provider abstraction for Python. Write once, run against OpenAI, Anthropic, Google, Groq, xAI, or any OpenAI-compatible endpoint.
Features
- 6 Providers: OpenAI, Anthropic, Google, Groq, xAI, plus a generic OpenAI-compatible adapter
- Streaming: Async generators for token-by-token responses
- Tool Calling: Unified interface for function/tool invocation across all providers
- Structured Output: JSON-schema-constrained response generation
- Extended Thinking: Claude and o-series reasoning chains
- Vision: Image understanding on supported models
- Transcription: Audio-to-text via OpenAI Whisper
- Token Counting: Estimate tokens and cost before making requests
- Provider Registry: Register providers once, resolve by name or model pattern
- Testing Utilities: MockProvider and InMemoryHistory for unit tests
- Type-Safe: Dataclass responses with full type hints
New in v0.4.0
CircuitBreaker -- CircuitBreaker(failure_threshold=…, recovery_timeout=…) wraps any provider; it raises CircuitOpenError when the failure threshold is reached and recovers after the timeout.
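A minimal usage sketch, assuming CircuitBreaker and CircuitOpenError are importable from the package root and that the breaker takes the wrapped provider as its first constructor argument (threshold and timeout values are illustrative):

from quartermaster_providers import CircuitBreaker, CircuitOpenError, LLMConfig  # CircuitBreaker import location assumed
from quartermaster_providers.providers import OpenAIProvider

config = LLMConfig(model="gpt-4o", provider="openai")

# Illustrative policy: trip after 5 consecutive failures, retry after 30 seconds.
breaker = CircuitBreaker(
    OpenAIProvider(api_key="sk-..."),
    failure_threshold=5,
    recovery_timeout=30.0,
)

try:
    response = await breaker.generate_text_response(prompt="Hello", config=config)
except CircuitOpenError:
    # The breaker is open: calls fail fast until recovery_timeout elapses,
    # then a trial request is let through.
    ...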
Installation
pip install quartermaster-providers
Install with provider-specific extras:
pip install quartermaster-providers[openai]
pip install quartermaster-providers[anthropic]
pip install quartermaster-providers[openai,anthropic,google]
pip install quartermaster-providers[all]
Supported Providers
| Provider | Class | Models (examples) |
|---|---|---|
| OpenAI | OpenAIProvider | gpt-4o, gpt-4-turbo, o1, o3-mini |
| Anthropic | AnthropicProvider | claude-sonnet-4-20250514, claude-3-haiku |
| Google | GoogleProvider | gemini-1.5-pro, gemini-pro |
| Groq | GroqProvider | llama-3-70b, mixtral-8x7b |
| xAI | XAIProvider | grok-2, grok-2-mini |
| Quartermaster | QuartermasterProvider | All models via one API key |
| Custom | OpenAICompatibleProvider | Any OpenAI-compatible API |
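For endpoints not listed above, the generic adapter is the escape hatch. A hedged sketch, assuming the constructor accepts api_key and base_url (the base_url keyword is borrowed from the register_local helper shown below):

from quartermaster_providers import LLMConfig
from quartermaster_providers.providers import OpenAICompatibleProvider

# Constructor arguments are assumptions; adjust to your deployment.
provider = OpenAICompatibleProvider(
    api_key="unused-for-local",
    base_url="http://localhost:8000/v1",
)
config = LLMConfig(model="my-local-model", provider="custom")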
Local / Self-Hosted Providers
| Provider | Class | Description |
|---|---|---|
| Ollama | OllamaProvider | Local models via Ollama |
| vLLM | VLLMProvider | High-throughput inference server |
| LM Studio | LMStudioProvider | Desktop LLM app |
| TGI | TGIProvider | HuggingFace Text Generation Inference |
| LocalAI | LocalAIProvider | OpenAI-compatible local server |
| llama.cpp | LlamaCppProvider | llama.cpp HTTP server |
Register local providers with one line — the module-level helper builds a
registry, normalises base_url (auto-appends /v1 if missing), honours the
OLLAMA_HOST env var, and remembers the default model so callers don't have
to repeat it:
from quartermaster_providers import register_local

provider_registry = register_local(
    "ollama",
    base_url="http://localhost:11434",  # or set $OLLAMA_HOST
    default_model="gemma4:26b",
)
provider = provider_registry.get("ollama")
Quick Start
Text Generation
import asyncio

from quartermaster_providers import LLMConfig
from quartermaster_providers.providers import OpenAIProvider

async def main():
    provider = OpenAIProvider(api_key="sk-...")
    config = LLMConfig(
        model="gpt-4o",
        provider="openai",
        temperature=0.7,
        max_output_tokens=1024,
    )
    response = await provider.generate_text_response(
        prompt="Explain gradient descent in two sentences.",
        config=config,
    )
    print(response.content)      # str
    print(response.stop_reason)  # "end_turn", "max_tokens", etc.

asyncio.run(main())
Tool Calling
import asyncio

from quartermaster_providers import LLMConfig, ToolDefinition
from quartermaster_providers.providers import AnthropicProvider

async def main():
    provider = AnthropicProvider(api_key="sk-ant-...")
    config = LLMConfig(model="claude-sonnet-4-20250514", provider="anthropic")
    tools = [
        ToolDefinition(
            name="get_weather",
            description="Get current weather for a location",
            input_schema={
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                },
                "required": ["location"],
            },
        ),
    ]
    response = await provider.generate_tool_parameters(
        prompt="What is the weather in Tokyo?",
        tools=tools,
        config=config,
    )
    for call in response.tool_calls:
        print(f"{call.tool_name}({call.parameters})")
        # get_weather({'location': 'Tokyo'})
    print(f"Usage: {response.usage.total_tokens} tokens")

asyncio.run(main())
Tool Calling with quartermaster-tools
Tools created with @tool() integrate directly via ToolDescriptor:
from quartermaster_tools import tool

@tool()
def get_weather(city: str) -> dict:
    """Get current weather for a city.

    Args:
        city: The city name to look up.
    """
    return {"city": city, "temperature": 22}

# Convert to provider-compatible format
tool_def = get_weather.info().to_anthropic_tools()

# Or for OpenAI:
tool_def = get_weather.info().to_openai_tools()
Streaming
import asyncio

from quartermaster_providers import LLMConfig
from quartermaster_providers.providers import OpenAIProvider

async def main():
    provider = OpenAIProvider(api_key="sk-...")
    config = LLMConfig(model="gpt-4o", provider="openai", stream=True)
    async for chunk in await provider.generate_text_response(
        prompt="Write a haiku about Python.",
        config=config,
    ):
        print(chunk.content, end="", flush=True)

asyncio.run(main())
Structured Output
import asyncio

from quartermaster_providers import LLMConfig
from quartermaster_providers.providers import OpenAIProvider

async def main():
    provider = OpenAIProvider(api_key="sk-...")
    config = LLMConfig(model="gpt-4o", provider="openai")
    schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "summary": {"type": "string"},
            "topics": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "summary", "topics"],
    }
    response = await provider.generate_structured_response(
        prompt="Analyze the concept of reinforcement learning.",
        response_schema=schema,
        config=config,
    )
    print(response.structured_output["title"])
    print(response.structured_output["topics"])

asyncio.run(main())
API Reference
LLMConfig
Controls request behavior across all providers.
from quartermaster_providers import LLMConfig

config = LLMConfig(
    model="gpt-4o",          # Provider model identifier
    provider="openai",       # Provider name
    stream=False,            # Stream token-by-token
    temperature=0.7,         # 0.0 (deterministic) to 2.0 (creative)
    system_message=None,     # System prompt
    max_input_tokens=None,   # Input token limit
    max_output_tokens=None,  # Output token limit
    max_messages=None,       # Conversation context limit
    vision=False,            # Enable image understanding
    thinking_enabled=False,  # Extended thinking (Claude, o-series)
    thinking_budget=None,    # Max thinking tokens
    top_p=None,              # Nucleus sampling
    top_k=None,              # Top-k sampling
    frequency_penalty=None,  # Frequency penalty (OpenAI)
    presence_penalty=None,   # Presence penalty (OpenAI)
)
AbstractLLMProvider Methods
Every provider implements these methods:
| Method | Returns | Description |
|---|---|---|
| await list_models() | list[str] | Available model identifiers |
| estimate_token_count(text, model) | int | Token estimate without API call |
| prepare_tool(tool) | Any | Convert ToolDefinition to provider format |
| await generate_text_response(prompt, config) | TokenResponse or AsyncIterator[TokenResponse] | Text generation (streaming when config.stream=True) |
| await generate_tool_parameters(prompt, tools, config) | ToolCallResponse | Function/tool calling |
| await generate_native_response(prompt, tools, config) | NativeResponse | Text + thinking + tool calls combined |
| await generate_structured_response(prompt, schema, config) | StructuredResponse | JSON-schema-constrained output |
| await transcribe(audio_path) | str | Audio-to-text transcription |
Cost estimation (non-abstract, returns None if pricing unavailable):
| Method | Returns |
|---|---|
| get_cost_per_1k_input_tokens(model) | float \| None |
| get_cost_per_1k_output_tokens(model) | float \| None |
| estimate_cost(text, model, output_tokens) | float \| None |
Response Types
TokenResponse -- single response or streaming chunk:
response.content # str -- text content
response.stop_reason # str | None -- "end_turn", "max_tokens", "tool_use"
ToolCallResponse -- tool invocation results:
response.text_content # str -- any text alongside tool calls
response.tool_calls # list[ToolCall] -- each has .tool_name, .tool_id, .parameters
response.stop_reason # str | None
response.usage # TokenUsage | None
StructuredResponse -- JSON-schema-constrained output:
response.structured_output # dict[str, Any] -- parsed JSON
response.raw_output # str -- raw model text
response.usage # TokenUsage | None
NativeResponse -- complete model output:
response.text_content # str
response.thinking # list[ThinkingResponse] -- reasoning blocks
response.tool_calls # list[ToolCall]
response.usage # TokenUsage | None
TokenUsage -- token accounting:
usage.input_tokens # int
usage.output_tokens # int
usage.cache_creation_input_tokens # int (Anthropic prompt caching)
usage.cache_read_input_tokens # int
usage.total_tokens # property: input + output
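Because the pricing helpers and TokenUsage share units, the realized cost of a completed request can be computed after the fact. A small sketch, assuming provider, config, and response come from the tool-calling example above:

# Post-hoc cost accounting from a real response (rates may be None when
# pricing for the model is unavailable).
usage = response.usage
input_rate = provider.get_cost_per_1k_input_tokens(config.model)
output_rate = provider.get_cost_per_1k_output_tokens(config.model)
if usage and input_rate is not None and output_rate is not None:
    cost = (usage.input_tokens / 1000) * input_rate + (usage.output_tokens / 1000) * output_rate
    print(f"Actual cost: ${cost:.6f}")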
ProviderRegistry
Register providers once and resolve by name or model pattern:
from quartermaster_providers import ProviderRegistry
from quartermaster_providers.providers import OpenAIProvider, AnthropicProvider
registry = ProviderRegistry()
registry.register("openai", OpenAIProvider, api_key="sk-...")
registry.register("anthropic", AnthropicProvider, api_key="sk-ant-...")
# Get by name
provider = registry.get("openai")
# Auto-resolve from model name (gpt-* -> openai, claude-* -> anthropic, etc.)
provider = registry.get_for_model("gpt-4o")
provider = registry.get_for_model("claude-sonnet-4-20250514")
# List registered providers
registry.list_providers() # ["anthropic", "openai"]
Model-to-provider inference patterns: gpt-*/o1-*/o3-* -> openai, claude-* -> anthropic, gemini-* -> google, llama-*/mixtral-* -> groq, grok-* -> xai.
Token Counting and Cost Estimation
from quartermaster_providers.providers import OpenAIProvider
provider = OpenAIProvider(api_key="sk-...")
tokens = provider.estimate_token_count("Hello, world!", "gpt-4o")
print(f"Estimated tokens: {tokens}")
cost = provider.estimate_cost("Hello, world!", "gpt-4o", output_tokens=100)
if cost is not None:
    print(f"Estimated cost: ${cost:.6f}")
Error Handling
All providers raise consistent exceptions from quartermaster_providers.exceptions:
from quartermaster_providers.exceptions import (
    ProviderError,            # Base exception (has .provider, .status_code)
    AuthenticationError,      # Invalid/missing API key (401)
    RateLimitError,           # Rate limited (429, has .retry_after)
    InvalidModelError,        # Model not available (404, has .model)
    InvalidRequestError,      # Malformed request (400)
    ContentFilterError,       # Blocked by safety filter (400)
    ContextLengthError,       # Input exceeds context window (400)
    ServiceUnavailableError,  # Provider temporarily down (503)
)
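Since every provider raises the same hierarchy, retry logic can be written once. A hedged sketch using only the documented attributes (.retry_after on RateLimitError):

import asyncio

from quartermaster_providers.exceptions import RateLimitError

async def generate_with_retry(provider, prompt, config, max_attempts=3):
    last_exc = None
    for attempt in range(max_attempts):
        try:
            return await provider.generate_text_response(prompt=prompt, config=config)
        except RateLimitError as exc:
            last_exc = exc
            # Prefer the provider's hint; fall back to exponential backoff.
            await asyncio.sleep(exc.retry_after or 2 ** attempt)
    raise last_exc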
Testing
Use MockProvider for unit tests without real API calls:
from quartermaster_providers import LLMConfig, TokenResponse
from quartermaster_providers.testing import MockProvider
mock = MockProvider(responses=[
    TokenResponse(content="Paris", stop_reason="end_turn"),
    TokenResponse(content="Berlin", stop_reason="end_turn"),
])
config = LLMConfig(model="mock", provider="mock")
response = await mock.generate_text_response("Capital of France?", config)
assert response.content == "Paris"
assert mock.call_count == 1
assert mock.last_prompt == "Capital of France?"
# InMemoryHistory for conversation testing
from quartermaster_providers.testing import InMemoryHistory
history = InMemoryHistory()
history.add_message("user", "Hello")
history.add_message("assistant", "Hi there!")
assert len(history) == 2
Contributing
Contributions welcome. See CONTRIBUTING.md for guidelines.
License
Apache License 2.0. See LICENSE for details.