# covenance

Type-safe LLM outputs across any provider. Track every call and its cost.
```python
from covenance import ask_llm

review = ask_llm("Write a short review of Inception", model="gpt-4.1-nano")
is_positive = ask_llm(
    f"Is this review positive? '{review}'",
    model="gemini-2.5-flash-lite",
    response_type=bool,
)
print(is_positive)  # True/False
```
## Use cases

- **Structured outputs that work**: same code, any provider. Pydantic models, primitives, lists, tuples.
- **Zero routing code**: the model name determines the provider automatically (`gemini-*`, `claude-*`, `gpt-*`).
- **Convenience**: automatic retries on TPM (tokens-per-minute) rate limits, and when the LLM fails to return the type you requested.
- **Visibility**: know what you're calling and spending. Every call is logged with token counts and cost: `print_usage()` for totals, `print_call_timeline()` for a visual waterfall.
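The automatic retry behavior can be pictured as a simple backoff loop. A minimal sketch using only the standard library (illustrative, not covenance's actual implementation; `RuntimeError` stands in for a provider's rate-limit exception):

```python
import time

def with_backoff(call, max_attempts=4, base_delay=1.0):
    """Retry call() with exponential backoff on (stand-in) rate-limit errors."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError:  # stand-in for a provider rate-limit error
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
```

Covenance applies this kind of logic for you, so callers never write retry loops themselves.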
## Installation

Install only the providers you need:

```shell
pip install covenance[openai]     # OpenAI, Grok, OpenRouter
pip install covenance[anthropic]  # Anthropic Claude
pip install covenance[google]     # Google Gemini
pip install covenance[mistral]    # Mistral

# Multiple providers
pip install covenance[openai,anthropic]

# All providers
pip install covenance[all]
```
## Structured outputs

Pass `response_type` to get validated, typed results:

```python
from pydantic import BaseModel
from covenance import ask_llm

# Pydantic models
class Evaluation(BaseModel):
    reasoning: str
    is_correct: bool

result = ask_llm("Is 2+2=5?", model="gemini-2.5-flash-lite", response_type=Evaluation)
print(result.reasoning)   # "2+2 equals 4, not 5"
print(result.is_correct)  # False

# Primitives
answer = ask_llm("Is Python interpreted?", model="gpt-4.1-nano", response_type=bool)
print(answer)  # True

# Collections
items = ask_llm("List 3 prime numbers", model="claude-sonnet-4-20250514", response_type=list[int])
print(items)  # [2, 3, 5]
```
Works identically across OpenAI, Gemini, Anthropic, Mistral, Grok, and OpenRouter.
## Cost tracking

Every call is recorded with token counts and cost:

```python
from covenance import ask_llm, print_usage, print_call_timeline, get_records

ask_llm("Hello", model="gpt-4.1-nano")
ask_llm("Hello", model="gemini-2.5-flash-lite")

print_usage()
# ==================================================
# LLM Usage Summary (default client)
# ==================================================
# Calls:  2
# Tokens: 45 (In: 12, Out: 33)
# Cost:   $0.0001
# Models: gemini/gemini-2.5-flash-lite, openai/gpt-4.1-nano

# Access individual records
for record in get_records():
    print(f"{record.model}: {record.cost_usd}")
```
Persist records by setting `COVENANCE_RECORDS_DIR` or calling `set_llm_call_records_dir()`.
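Because records are plain objects with `model` and `cost_usd` fields (as in the loop above), per-model cost totals fall out of a simple reduction. A small sketch (the helper name is ours, not part of covenance):

```python
from collections import defaultdict

def cost_by_model(records):
    """Sum cost_usd per model across call records."""
    totals = defaultdict(float)
    for record in records:
        totals[record.model] += record.cost_usd
    return dict(totals)
```

For example, `cost_by_model(get_records())` would return a dict mapping each model name to its accumulated spend.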
## Call timeline

Visualize call sequences and parallelism in your terminal:

```python
from covenance import print_call_timeline

print_call_timeline()
# LLM Call Timeline (4.4s total, 5 calls)
#                    |0s                              4.4s|
# gpt-4.1-nano  1.3s |████████████████                    |
# g2.5-flash-l  1.1s |    ████████████                    |
# g2.5-flash-l  1.1s |    ████████████                    |
# g2.5-flash-l  1.5s |        ████████████████            |
# g2.5-flash-l  1.5s |                   █████████████████|
```
Each line is a call, sorted by start time. Blocks show when each call was active - parallel calls appear as overlapping bars on different rows.
## Consensus for quality

Run parallel LLM calls and integrate the results for higher quality:

```python
from covenance import llm_consensus

result = llm_consensus(
    "Explain quantum entanglement",
    model="gpt-4.1-nano",
    response_type=Evaluation,  # the Pydantic model defined above
    num_candidates=3,  # 3 parallel calls + integration
)
```
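Conceptually, consensus fans out the candidate calls in parallel and then integrates their answers. A generic standard-library sketch, with a majority vote standing in for the LLM-based integration step (this is not covenance's internal implementation):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def consensus(call, num_candidates=3):
    """Run call() num_candidates times in parallel; majority-vote the results."""
    with ThreadPoolExecutor(max_workers=num_candidates) as pool:
        candidates = list(pool.map(lambda _: call(), range(num_candidates)))
    return Counter(candidates).most_common(1)[0][0]
```

The parallel fan-out is why consensus calls appear as overlapping bars in the call timeline.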
## Supported providers

The provider is determined by the model name prefix:

| Prefix | Provider |
|---|---|
| `gpt-*`, `o1-*`, `o3-*` | OpenAI |
| `gemini-*` | Google Gemini |
| `claude-*` | Anthropic |
| `mistral-*`, `codestral-*` | Mistral |
| `grok-*` | xAI Grok |
| `org/model` (contains `/`) | OpenRouter |
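The prefix rule above can be sketched as a small lookup (illustrative only; the provider labels and error behavior here are our assumptions, not covenance's actual routing code):

```python
def infer_provider(model: str) -> str:
    """Map a model name to its provider, mirroring the prefix table above."""
    if "/" in model:  # org/model names go to OpenRouter
        return "openrouter"
    prefixes = {
        "gpt-": "openai", "o1-": "openai", "o3-": "openai",
        "gemini-": "google",
        "claude-": "anthropic",
        "mistral-": "mistral", "codestral-": "mistral",
        "grok-": "xai",
    }
    for prefix, provider in prefixes.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"unrecognized model name: {model!r}")
```

Because the mapping is purely name-based, switching providers is a one-argument change at the call site.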
## Structured output reliability
Providers differ in how they enforce JSON schema compliance:
| Provider | Method | Guarantee |
|---|---|---|
| OpenAI | Constrained decoding | 100% schema-valid JSON |
| Google Gemini | Controlled generation | 100% schema-valid JSON |
| Grok | Constrained decoding | 100% schema-valid JSON |
| Anthropic | Structured outputs beta | 100% schema-valid JSON* |
| Mistral | Best-effort | Probabilistic |
| OpenRouter | Varies | Depends on underlying model |
\*Anthropic structured outputs require SDK >= 0.74.1 (uses `anthropic-beta: structured-outputs-2025-11-13`). Mistral uses probabilistic generation; Covenance retries automatically (up to 3 times) on JSON parse errors for Mistral.
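The parse-and-retry loop for probabilistic providers can be sketched as follows (an illustration of the idea, not covenance's internal code; `call` stands for one raw LLM request):

```python
import json

def parse_with_retries(call, max_attempts=3):
    """Re-invoke a JSON-producing call until its output parses, or give up."""
    for attempt in range(1, max_attempts + 1):
        raw = call()
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            if attempt == max_attempts:
                raise  # still malformed after max_attempts calls
```

Providers with constrained decoding never need this loop, since their output is schema-valid by construction.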
## API keys

Set environment variables for the providers you use:

- `OPENAI_API_KEY`
- `GOOGLE_API_KEY` (or `GEMINI_API_KEY`)
- `ANTHROPIC_API_KEY`
- `MISTRAL_API_KEY`
- `OPENROUTER_API_KEY`
- `XAI_API_KEY` (for Grok)

A `.env` file in the working directory is loaded automatically.
## Isolated clients

Use `Covenance` instances for separate API keys and call records per subsystem:

```python
from covenance import Covenance
from pydantic import BaseModel

# Each client tracks its own usage
question_client = Covenance(label="questions")
review_client = Covenance(label="review")

answer = question_client.ask_llm("Who is David Blaine?", model="gpt-4.1-nano")

class Evaluation(BaseModel):
    reasoning: str
    is_correct: bool

evaluation = review_client.llm_consensus(
    f"Is this accurate? '''{answer}'''",
    model="gemini-2.5-flash-lite",
    response_type=Evaluation,
)

question_client.print_usage()  # Shows only the question call
review_client.print_usage()    # Shows only the review call
```
## How it works: dual backend

Covenance uses two backends for structured output and picks the better one per provider:

- **Native SDK**: calls the provider's API directly (e.g., the OpenAI Responses API with `responses.parse`)
- **pydantic-ai**: uses pydantic-ai as a unified layer
The default routing:
| Provider | Backend | Why |
|---|---|---|
| OpenAI | Native | Responses API with constrained decoding handles enums, recursive types, and large schemas more reliably |
| Grok | Native | OpenAI-compatible API, same benefits |
| Gemini | pydantic-ai | Native SDK hits RecursionError on self-referencing types (e.g., tree nodes) |
| Anthropic | pydantic-ai | No native client implemented |
| Mistral | pydantic-ai | Similar pass rates; pydantic-ai handles recursive types better |
| OpenRouter | pydantic-ai | No native client implemented |
These defaults are based on a stress-test suite that runs 14 test categories across providers with both backends. Results for the cheapest model per provider:

- OpenAI (`gpt-4.1-nano`): native 14/14, pydantic-ai 10/14
- Gemini (`gemini-2.5-flash-lite`): native 11/14, pydantic-ai 13/14
- Mistral (`mistral-small-latest`): native 9/14, pydantic-ai 8/14
Where native beats pydantic-ai on OpenAI: enum adherence (strict values vs. hallucinated ones), recursive types (deeper trees), real-world schemas (fewer empty fields), and extreme schema limits (100+ fields with Literal types).
Where pydantic-ai beats native on Gemini: recursive/self-referencing types (native Google SDK crashes with RecursionError).
### Overriding the backend

Each `Covenance` instance has a `backends` object with a field per provider. You can inspect and override them:
```python
from covenance import Covenance

client = Covenance()
print(client.backends)
# Backends(native=[openai, grok], pydantic=[gemini, anthropic, mistral, openrouter])

# Override a specific provider
client.backends.anthropic = "native"

# Force all providers to one backend (useful for benchmarking)
client.backends.set_all("native")
```
Only `"native"` and `"pydantic"` are accepted; anything else raises `ValueError`.
Every call records which backend was used:

```python
for record in client.get_records():
    print(f"{record.model}: {record.backend}")  # "native" or "pydantic"
```
The backend also shows in `print_call_timeline()` as `(N)` or `(P)`:

```python
print_call_timeline()
# LLM Call Timeline (2.1s total, 2 calls)
#                       |0s                                2.1s|
# gpt-4.1-nano(N)  0.8s |█████████████████                     |
# g2.5-flash-l(P)  1.1s |          ██████████████████████████  |
```
To see routing decisions in real time, enable debug logging:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
# DEBUG:covenance:ask_llm: model=gpt-4.1-nano provider=openai backend=native
```