Unified Python interface for OpenAI, Anthropic, Google, and Ollama LLMs
LLMRing
A Python library for LLM integration with a unified interface and MCP support. It supports OpenAI, Anthropic, Google Gemini, and Ollama through consistent APIs.
Features
- Unified Interface: Single API for all major LLM providers
- Streaming Support: Streaming for all providers
- Native Tool Calling: Provider-native function calling with consistent interface
- Unified Structured Output: JSON schema works across all providers with automatic adaptation
- Conversational Configuration: MCP chat interface for natural language lockfile setup
- Aliases: Semantic aliases (deep, fast, balanced) with registry-based recommendations
- Cost Tracking: Cost calculation with on-demand receipt generation
- Registry Integration: Centralized model capabilities and pricing
- Fallback Models: Automatic failover to alternative models
- Type Safety: Typed exceptions and error handling
- MCP Integration: Model Context Protocol support for tool ecosystems
- MCP Chat Client: Chat interface with persistent history for any MCP server
Quick Start
Installation
# With uv (recommended)
uv add llmring
# With pip
pip install llmring
Including Lockfiles in Your Package:
If you build with Hatch and want to ship your llmring.lock with your package (as llmring itself does), add this to your pyproject.toml:
[tool.hatch.build]
include = [
"src/yourpackage/**/*.py",
"src/yourpackage/**/*.lock", # Include lockfiles
]
Basic Usage
import asyncio
from llmring.service import LLMRing
from llmring.schemas import LLMRequest, Message
async def main():
    # Initialize service with context manager (auto-closes resources)
    async with LLMRing() as service:
        # Simple chat
        request = LLMRequest(
            model="fast",
            messages=[
                Message(role="system", content="You are a helpful assistant."),
                Message(role="user", content="Hello!")
            ]
        )
        response = await service.chat(request)
        print(response.content)
# async with / await must run inside a coroutine
asyncio.run(main())
Claude Code Skills
LLMRing provides expert guidance skills for Claude Code that teach Claude how to work with the library effectively. When you use Claude Code with LLMRing, these skills automatically activate to provide:
- ✅ Production-ready code examples
- ✅ Best practices and patterns
- ✅ Common pitfalls to avoid
- ✅ Multi-provider integration guidance
- ✅ Configuration and setup help
No manual needed - just ask Claude naturally and the right skills load automatically!
Installation
# In Claude Code terminal, add the marketplace
/plugin marketplace add juanre/ai-tools
# Install all llmring skills (recommended)
/plugin install llmring@juanre-ai-tools
# Or install individual skills
/plugin install llmring-chat@juanre-ai-tools
/plugin install llmring-streaming@juanre-ai-tools
Available Skills
| Skill | Description | Install |
|---|---|---|
| llmring | All llmring skills (recommended) | /plugin install llmring@juanre-ai-tools |
| llmring-chat | Basic chat completions | /plugin install llmring-chat@juanre-ai-tools |
| llmring-streaming | Streaming responses | /plugin install llmring-streaming@juanre-ai-tools |
| llmring-tools | Function calling and tool use | /plugin install llmring-tools@juanre-ai-tools |
| llmring-structured | JSON schema and typed responses | /plugin install llmring-structured@juanre-ai-tools |
| llmring-lockfile | Aliases, profiles, and configuration | /plugin install llmring-lockfile@juanre-ai-tools |
| llmring-providers | Multi-provider switching and fallbacks | /plugin install llmring-providers@juanre-ai-tools |
How It Works
Example: Setting up streaming
You ask:
"Help me add streaming responses from OpenAI"
What happens:
- Claude sees "streaming", "OpenAI"
- Automatically loads the llmring-streaming skill
- Provides expert guidance with working code
- Shows you exactly what you need
Result: Production-ready streaming implementation with best practices built-in!
Example: Configuring lockfiles
You ask:
"Set up model aliases for development and production"
What happens:
- Claude sees "aliases", "development", "production"
- Loads the llmring-lockfile skill
- Guides you through profiles and configuration
- Shows you how to use llmring lock chat for conversational setup
Result: Complete lockfile configuration with environment-specific settings!
Overview
Streaming
async with LLMRing() as service:
    # Streaming for all providers
    request = LLMRequest(
        model="balanced",
        messages=[Message(role="user", content="Count to 10")]
    )
    accumulated_usage = None
    async for chunk in service.chat_stream(request):
        print(chunk.content, end="", flush=True)
        # Capture final usage stats
        if chunk.usage:
            accumulated_usage = chunk.usage
    print()  # Newline after streaming
    if accumulated_usage:
        print(f"Tokens used: {accumulated_usage.get('total_tokens', 0)}")
Tool Calling
async with LLMRing() as service:
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }]
    request = LLMRequest(
        model="balanced",
        messages=[Message(role="user", content="What's the weather in NYC?")],
        tools=tools
    )
    response = await service.chat(request)
    if response.tool_calls:
        print("Function called:", response.tool_calls[0]["function"]["name"])
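The example above stops at printing the function name. A minimal sketch of the next step, running the requested function locally, is below. It assumes each entry in response.tool_calls has the OpenAI-style shape the example indexes into ({"function": {"name": ..., "arguments": ...}}), with arguments usually arriving as a JSON string; get_weather, TOOL_REGISTRY and dispatch_tool_call are hypothetical names for illustration.

```python
import json
from typing import Any, Callable

# Hypothetical local implementation backing the "get_weather" tool schema above.
def get_weather(location: str) -> str:
    return f"Sunny in {location}"

# Map tool names (as declared in the tools list) to Python callables.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}

def dispatch_tool_call(tool_call: dict) -> Any:
    """Run one tool call, assuming the OpenAI-style {"function": {...}} shape."""
    function = tool_call["function"]
    handler = TOOL_REGISTRY.get(function["name"])
    if handler is None:
        raise KeyError(f"No local handler for tool {function['name']!r}")
    raw_args = function.get("arguments") or "{}"
    # Accept an already-decoded dict too, in case arguments arrive parsed.
    args = json.loads(raw_args) if isinstance(raw_args, str) else raw_args
    return handler(**args)

example_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"location": "NYC"}'},
}
print(dispatch_tool_call(example_call))  # Sunny in NYC
```

Keeping a name-to-callable registry means the tools list sent to the model and the code that executes calls can be checked against each other in one place.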
Resource Management
Context Manager (Recommended)
from llmring import LLMRing, LLMRequest, Message
# Automatic resource cleanup with context manager
async with LLMRing() as service:
    request = LLMRequest(
        model="fast",
        messages=[Message(role="user", content="Hello!")]
    )
    response = await service.chat(request)
# Resources are automatically cleaned up when exiting the context
Manual Cleanup
# Manual resource management
service = LLMRing()
try:
    response = await service.chat(request)
finally:
    await service.close()  # Ensure resources are cleaned up
Advanced Features
Unified Structured Output
# JSON schema API works across all providers
request = LLMRequest(
    model="balanced",  # Works with any provider
    messages=[Message(role="user", content="Generate a person")],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "email": {"type": "string"}
                },
                "required": ["name", "age"]
            }
        },
        "strict": True  # Validates across all providers
    }
)
response = await service.chat(request)
print("JSON:", response.content)  # Valid JSON string
print("Data:", response.parsed)  # Python dict ready to use
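Even with strict validation, a cheap defensive check before trusting the parsed data can catch surprises from providers that adapt the schema differently. The sketch below uses only the standard library and covers just what this schema needs (required keys and simple JSON types); it is not a full JSON Schema validator, and check_required_fields is a hypothetical helper.

```python
import json

PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
    },
    "required": ["name", "age"],
}

# Map the JSON Schema primitive type names to Python types.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}

def check_required_fields(data: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key in data and "type" in spec:
            value = data[key]
            # bool is a subclass of int in Python, so reject it for numeric types.
            if isinstance(value, bool) and spec["type"] in ("integer", "number"):
                problems.append(f"{key}: expected {spec['type']}, got boolean")
            elif not isinstance(value, JSON_TYPES[spec["type"]]):
                problems.append(f"{key}: expected {spec['type']}, got {type(value).__name__}")
    return problems

# Parse a JSON string like response.content into a dict for checking.
sample = json.loads('{"name": "Ada", "age": 36}')
print(check_required_fields(sample, PERSON_SCHEMA))  # []
```

For anything beyond flat objects, a dedicated JSON Schema library is the safer choice.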
Provider-Specific Parameters
# Anthropic: Prompt caching for up to 90% savings on cached input tokens
# (cache_control applies when "balanced" resolves to an Anthropic model)
request = LLMRequest(
    model="balanced",
    messages=[
        Message(
            role="system",
            content="Very long system prompt...",  # 1024+ tokens
            metadata={"cache_control": {"type": "ephemeral"}}
        ),
        Message(role="user", content="Hello")
    ]
)
# Extra parameters for provider-specific features
request = LLMRequest(
    model="fast",
    messages=[Message(role="user", content="Hello")],
    extra_params={
        "logprobs": True,
        "top_logprobs": 5,
        "presence_penalty": 0.1,
        "seed": 12345
    }
)
Model Aliases and Lockfiles
LLMRing uses lockfiles to map semantic aliases to models, with support for fallback pools and environment-specific profiles:
# Create a lockfile in the current directory
llmring lock init
# Conversational configuration with AI advisor (recommended)
llmring lock chat # Natural language interface for lockfile management
# View current aliases
llmring aliases
Lockfile Resolution Order:
- Explicit path via the lockfile_path parameter (file must exist)
- LLMRING_LOCKFILE_PATH environment variable (file must exist)
- ./llmring.lock in the current directory (if it exists)
- Bundled lockfile at src/llmring/llmring.lock (minimal fallback with the advisor alias)
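The resolution order can be expressed as a short function. This is an illustrative sketch of the documented precedence only, not LLMRing's actual code: resolve_lockfile_path is a hypothetical name, and the bundled path is passed in rather than located inside the installed package.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

def resolve_lockfile_path(
    lockfile_path: Optional[str],
    bundled: Path,
    cwd: Optional[Path] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick a lockfile following the documented precedence (sketch)."""
    env = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else cwd

    # 1. Explicit parameter: must exist, no silent fallback.
    if lockfile_path is not None:
        path = Path(lockfile_path)
        if not path.is_file():
            raise FileNotFoundError(f"lockfile_path does not exist: {path}")
        return path

    # 2. Environment variable: must exist as well.
    env_path = env.get("LLMRING_LOCKFILE_PATH")
    if env_path:
        path = Path(env_path)
        if not path.is_file():
            raise FileNotFoundError(f"LLMRING_LOCKFILE_PATH does not exist: {path}")
        return path

    # 3. ./llmring.lock in the working directory, only if present.
    local = cwd / "llmring.lock"
    if local.is_file():
        return local

    # 4. Bundled minimal fallback.
    return bundled
```

In this sketch a stale explicit path or LLMRING_LOCKFILE_PATH fails loudly instead of falling through to ./llmring.lock, which is one reading of the "(file must exist)" wording above.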
Packaging Your Own Lockfile: Libraries using LLMRing can ship with their own lockfiles. See Lockfile Documentation for details on:
- Including lockfiles in your package distribution
- Lockfile resolution order and precedence
- Creating lockfiles with fallback models
- Environment-specific profiles and configuration
Conversational Configuration via llmring lock chat:
- Describe your requirements in natural language
- Get AI-powered recommendations based on registry analysis
- Configure aliases with multiple fallback models
- Understand cost implications and tradeoffs
- Set up environment-specific profiles
# Use semantic aliases (always current, with fallbacks)
request = LLMRequest(
    model="deep",  # → most capable reasoning model
    messages=[Message(role="user", content="Hello")]
)
# Or use other aliases:
# model="fast" → cost-effective quick responses
# model="balanced" → optimal all-around model
# model="advisor" → Claude Opus 4.1 - powers conversational config
Key features:
- Registry-based recommendations
- Fallback models provide automatic failover
- Cost analysis and recommendations
- Environment-specific configurations for dev/staging/prod
Using LLMRing in Libraries
If you're building a library that uses LLMRing, follow this pattern to ship with defaults while allowing users to override your model choices:
Library Pattern
Libraries should:
- Ship with a bundled llmring.lock in their package
- Accept an optional lockfile_path parameter
- Validate required aliases on initialization
- Document which aliases they require
This allows:
- Library works out of the box with defaults
- Users can override with their own lockfile
- Clear errors if user's lockfile is incomplete
Simple Library Example
# my-library/src/my_library/__init__.py
from pathlib import Path
from llmring import LLMRing
# Library's bundled lockfile (shipped with package)
DEFAULT_LOCKFILE = Path(__file__).parent / "llmring.lock"
REQUIRED_ALIASES = ["summarizer"]
class MyLibrary:
    """Example library using llmring with configurable lockfile."""
    def __init__(self, lockfile_path=None):
        """Initialize library with optional custom lockfile.
        Args:
            lockfile_path: Path to lockfile. If None, uses library's bundled lockfile.
                Users can override to control model choices.
        Raises:
            ValueError: If lockfile missing required aliases
        """
        # Use provided lockfile or library's default
        lockfile = lockfile_path or DEFAULT_LOCKFILE
        # Initialize LLMRing with explicit lockfile
        self.ring = LLMRing(lockfile_path=lockfile)
        # Validate required aliases exist (fail fast with clear error)
        self.ring.require_aliases(REQUIRED_ALIASES, context="my-library")
    def summarize(self, text: str) -> str:
        """Summarize text using 'summarizer' alias."""
        response = self.ring.chat("summarizer", messages=[
            {"role": "user", "content": f"Summarize: {text}"}
        ])
        return response.content
Library's lockfile (my-library/src/my_library/llmring.lock):
version = "1.0"
default_profile = "default"
[profiles.default]
name = "default"
[[profiles.default.bindings]]
alias = "summarizer"
models = [
"anthropic:claude-3-5-haiku-20241022",
"openai:gpt-4o-mini",
"google:gemini-1.5-flash"
]
User Override Pattern
Users can use library defaults:
from my_library import MyLibrary
# Uses library's bundled lockfile automatically
lib = MyLibrary()
result = lib.summarize("Some text")
Or override with their own lockfile:
# Create custom lockfile: ./my-llmring.lock
# [profiles.default]
# [[profiles.default.bindings]]
# alias = "summarizer"
# models = ["anthropic:claude-3-5-sonnet-20241022", "openai:gpt-4o"]
# Use custom lockfile
lib = MyLibrary(lockfile_path="./my-llmring.lock")
result = lib.summarize("Some text")
Library Composition
When Library B uses Library A, pass the same lockfile to both:
# library-b/src/library_b/__init__.py
from pathlib import Path
from llmring import LLMRing
from library_a import LibraryA
DEFAULT_LOCKFILE = Path(__file__).parent / "llmring.lock"
REQUIRED_ALIASES = ["analyzer"]
class LibraryB:
    def __init__(self, lockfile_path=None):
        """Initialize Library B (which uses Library A).
        Args:
            lockfile_path: Lockfile controlling models for both libraries.
                Must include aliases required by both Library A and Library B.
        """
        lockfile = lockfile_path or DEFAULT_LOCKFILE
        # Pass lockfile to Library A (controls Library A's model choices)
        self.lib_a = LibraryA(lockfile_path=lockfile)
        # Initialize our own LLMRing with same lockfile
        self.ring = LLMRing(lockfile_path=lockfile)
        self.ring.require_aliases(REQUIRED_ALIASES, context="library-b")
    def analyze(self, text: str):
        # Use Library A (which uses our lockfile)
        summary = self.lib_a.summarize(text)
        # Do our own analysis
        analysis = self.ring.chat("analyzer", messages=[...])
        return {"summary": summary, "analysis": analysis}
Library B's lockfile must include aliases for both libraries:
# library-b/src/library_b/llmring.lock
[profiles.default]
name = "default"
# Library A's requirement (we choose which model)
[[profiles.default.bindings]]
alias = "summarizer"
models = [
"anthropic:claude-3-5-sonnet-20241022",
"openai:gpt-4o"
]
# Library B's requirement
[[profiles.default.bindings]]
alias = "analyzer"
models = [
"openai:gpt-4o",
"google:gemini-1.5-pro"
]
Users can override the entire chain:
# User's lockfile with their preferred models for BOTH libraries
lib_b = LibraryB(lockfile_path="./user-models.lock")
# This lockfile controls both Library A and Library B
Validation Helpers
LLMRing provides validation helpers for library authors:
from llmring import LLMRing
ring = LLMRing(lockfile_path="./my.lock")
# Check if alias exists (returns bool, never raises)
if ring.has_alias("summarizer"):
    # Safe to use
    response = ring.chat("summarizer", messages=[...])
# Validate required aliases (raises ValueError with helpful message if missing)
ring.require_aliases(
    ["summarizer", "analyzer"],
    context="my-library"  # Included in error message
)
# Raises: "Lockfile missing required aliases for my-library: analyzer.
# Lockfile path: /path/to/lockfile.lock
# Please ensure your lockfile defines these aliases."
Packaging Lockfiles
Include lockfiles in your package distribution:
pyproject.toml (Hatch):
[tool.hatch.build]
include = [
"src/my_library/**/*.py",
"src/my_library/**/*.lock", # Include lockfiles
]
Or with setuptools in MANIFEST.in:
include src/my_library/*.lock
Library Best Practices
- Ship with bundled lockfile - Include your defaults in the package
- Accept a lockfile_path parameter - Let users override everything
- Validate early - Use require_aliases() in __init__
- Document requirements - List required aliases in your README
- Use semantic names - Aliases like "summarizer" are clearer than model IDs
- Pass lockfile down - When using other libraries, pass your lockfile to them
Profiles: Environment-Specific Configurations
LLMRing supports profiles to manage different model configurations for different environments (dev, staging, prod, etc.):
# Use different models based on environment
# Development: Use cheaper/faster models
# Production: Use higher-quality models
# Set profile via environment variable
export LLMRING_PROFILE=dev # or prod, staging, etc.
# Or specify profile in code
async with LLMRing() as service:
    # Uses 'dev' profile bindings
    response = await service.chat(request, profile="dev")
Profile Configuration in Lockfiles:
# llmring.lock (truncated for brevity)
version = "1.0"
default_profile = "default"
[profiles.default]
name = "default"
[[profiles.default.bindings]]
alias = "assistant"
models = ["anthropic:claude-3-5-sonnet-20241022"]
[profiles.dev]
name = "dev"
[[profiles.dev.bindings]]
alias = "assistant"
models = ["openai:gpt-4o-mini"] # Cheaper for development
[profiles.test]
name = "test"
[[profiles.test.bindings]]
alias = "assistant"
models = ["ollama:llama3"] # Local model for testing
Using Profiles with CLI:
# Bind aliases to specific profiles
llmring bind assistant "openai:gpt-4o-mini" --profile dev
llmring bind assistant "anthropic:claude-3-5-sonnet-20241022" --profile prod
# List aliases in a profile
llmring aliases --profile dev
# Use profile for chat
llmring chat "Hello" --profile dev
# Set default profile via environment
export LLMRING_PROFILE=dev
llmring chat "Hello" # Now uses dev profile
Profile Selection Priority:
- Explicit parameter: profile="dev" or --profile dev (highest priority)
- Environment variable: LLMRING_PROFILE=dev
- Default: the default profile (when neither is specified)
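The precedence above amounts to a three-step lookup. A minimal sketch of that logic, assuming the final fallback is the lockfile's default_profile value ("default" in these examples); select_profile is a hypothetical helper, not part of LLMRing's API.

```python
import os
from typing import Mapping, Optional

def select_profile(
    explicit: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
    default: str = "default",
) -> str:
    """Apply the documented precedence: explicit > LLMRING_PROFILE > default."""
    env = os.environ if environ is None else environ
    if explicit:
        return explicit
    from_env = env.get("LLMRING_PROFILE")
    if from_env:
        return from_env
    return default

print(select_profile(environ={}))  # default
print(select_profile(environ={"LLMRING_PROFILE": "dev"}))  # dev
print(select_profile("prod", environ={"LLMRING_PROFILE": "dev"}))  # prod
```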
Common Use Cases:
- Development: Use cheaper models to reduce costs during development
- Testing: Use local models (Ollama) or mock responses
- Staging: Use production models but with different rate limits
- Production: Use highest quality models for best user experience
- A/B Testing: Test different models for the same alias
Fallback Models
Aliases can specify multiple models for automatic failover:
# In llmring.lock
[profiles.default]
name = "default"
[[profiles.default.bindings]]
alias = "assistant"
models = [
"anthropic:claude-3-5-sonnet-20241022", # Primary
"openai:gpt-4o", # First fallback
"google:gemini-1.5-pro" # Second fallback
]
If the primary model fails (rate limit, availability, etc.), LLMRing automatically tries the fallbacks.
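To make the failover behaviour concrete, here is a minimal sketch of the pattern: try models in list order and move on only when a call fails with a retryable error. It is an illustration, not LLMRing's implementation; TransientModelError, chat_with_fallbacks, call_model and fake_call are hypothetical, and exactly which errors LLMRing treats as retryable is not specified here.

```python
import asyncio
from typing import Awaitable, Callable

class TransientModelError(Exception):
    """Hypothetical stand-in for rate-limit or availability failures."""

async def chat_with_fallbacks(
    models: list[str],
    call_model: Callable[[str], Awaitable[str]],
) -> tuple[str, str]:
    """Try each model in order; return (model, result) from the first success."""
    errors = []
    for model in models:
        try:
            return model, await call_model(model)
        except TransientModelError as exc:
            # Record the failure and move on to the next model in the list.
            errors.append((model, exc))
    raise RuntimeError(f"All models failed: {errors}")

async def fake_call(model: str) -> str:
    # Simulate the primary (Anthropic) model being rate limited.
    if model.startswith("anthropic:"):
        raise TransientModelError("rate limited")
    return f"reply from {model}"

models = [
    "anthropic:claude-3-5-sonnet-20241022",
    "openai:gpt-4o",
    "google:gemini-1.5-pro",
]
print(asyncio.run(chat_with_fallbacks(models, fake_call)))
```

Only the designated transient error triggers a fallback here; anything else (for example an authentication failure) propagates immediately, since trying another model would not fix it.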
Advanced: Direct Model References
While aliases are recommended, you can still use direct provider:model references when needed:
# Direct model reference (escape hatch)
request = LLMRequest(
    model="anthropic:claude-3-5-sonnet",  # Direct provider:model reference
    messages=[Message(role="user", content="Hello")]
)
# Or specify exact model versions
request = LLMRequest(
    model="openai:gpt-4o",  # Specific model version when needed
    messages=[Message(role="user", content="Hello")]
)
Terminology:
- Alias: Semantic name like fast, balanced, deep (recommended)
- Model Reference: Full provider:model format like openai:gpt-4o (escape hatch)
- Raw SDK Access: Bypassing LLMRing entirely using provider clients directly (see Provider Guide)
Recommendation: Use aliases for maintainability and cost optimization. Use direct model references only when you need a specific model version or provider-specific features.
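The practical difference between an alias and a model reference is the provider: prefix. A minimal sketch of telling them apart, assuming (as in every example here) that alias names never contain a colon and that everything after the first colon is the provider's model ID; split_model_string is a hypothetical helper, not an LLMRing function.

```python
from typing import Optional, Tuple

def split_model_string(model: str) -> Tuple[Optional[str], str]:
    """Return (provider, model_id) for "provider:model", or (None, alias) otherwise."""
    provider, sep, name = model.partition(":")
    if sep and provider and name:
        return provider, name
    return None, model

print(split_model_string("openai:gpt-4o"))  # ('openai', 'gpt-4o')
print(split_model_string("fast"))  # (None, 'fast')
```

Splitting only on the first colon keeps model IDs that contain their own colon (such as Ollama tags like llama3:8b) intact.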
Raw SDK Access
When you need direct access to the underlying SDKs:
# Access provider SDK clients directly
openai_client = service.get_provider("openai").client # openai.AsyncOpenAI
anthropic_client = service.get_provider("anthropic").client # anthropic.AsyncAnthropic
google_client = service.get_provider("google").client # google.genai.Client
ollama_client = service.get_provider("ollama").client # ollama.AsyncClient
# Use SDK features not exposed by LLMRing
response = await openai_client.chat.completions.create(
    model="gpt-4o-mini",  # Raw clients need the provider's own model ID; LLMRing aliases are not resolved here
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,
    top_logprobs=10,
    # Any OpenAI parameter
)
# Anthropic with all SDK features
response = await anthropic_client.messages.create(
    model="claude-3-5-haiku-20241022",  # Provider-native model ID, not an alias
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100,
    top_p=0.9,
    top_k=40,
    system=[{
        "type": "text",
        "text": "You are helpful",
        "cache_control": {"type": "ephemeral"}
    }]
)
# Google with native SDK features (google-genai takes options via config=)
response = google_client.models.generate_content(
    model="gemini-1.5-flash",  # Provider-native model ID, not an alias
    contents="Hello",
    config={
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 40,
        "candidate_count": 3,
        "safety_settings": [{
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE"
        }]
    }
)
When to use raw clients:
- SDK features not exposed by LLMRing
- Provider-specific optimizations
- Complex configurations
- Performance-critical applications
Provider Support
| Provider | Models | Streaming | Tools | Special Features |
|---|---|---|---|---|
| OpenAI | GPT-4o, GPT-4o-mini, o1 | Yes | Native | JSON schema, PDF processing |
| Anthropic | Claude 3.5 Sonnet/Haiku | Yes | Native | Prompt caching, large context |
| Google | Gemini 1.5/2.0 Pro/Flash | Yes | Native | Multimodal, up to 2M-token context |
| Ollama | Llama, Mistral, etc. | Yes | Prompt-based | Local models, custom options |
Setup
Environment Variables
# Add to your .env file
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GEMINI_API_KEY=AIza...
# Optional
OLLAMA_BASE_URL=http://localhost:11434 # Default
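Before creating LLMRing, a quick preflight check can report which providers have credentials in the current process environment. The sketch below mirrors only the variable names listed above; configured_providers and ollama_base_url are hypothetical helpers, and the sketch does not read the .env file itself.

```python
import os
from typing import Mapping, Optional

# Variable names as listed in the .env example above.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_GEMINI_API_KEY",
}

def configured_providers(environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return providers whose API key variable is set to a non-empty value."""
    env = os.environ if environ is None else environ
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]

def ollama_base_url(environ: Optional[Mapping[str, str]] = None) -> str:
    """Ollama needs no key; return the configured URL or the documented default."""
    env = os.environ if environ is None else environ
    return env.get("OLLAMA_BASE_URL", "http://localhost:11434")

print(configured_providers({"OPENAI_API_KEY": "sk-test"}))  # ['openai']
print(ollama_base_url({}))  # http://localhost:11434
```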
Conversational Setup
# Create optimized configuration with AI advisor
llmring lock chat
# This opens an interactive chat where you can describe your needs
# and get personalized recommendations based on the registry
Dependencies
# Required for specific providers
pip install "openai>=1.0"  # OpenAI
pip install "anthropic>=0.67"  # Anthropic
pip install google-genai  # Google Gemini
pip install "ollama>=0.4"  # Ollama
MCP Integration
from llmring.mcp.client import create_enhanced_llm
from llmring.schemas import Message
# Create MCP-enabled LLM with tools
llm = await create_enhanced_llm(
    model="fast",
    mcp_server_path="path/to/mcp/server"
)
# Now has access to MCP tools
response = await llm.chat([
    Message(role="user", content="Use available tools to help me")
])
Documentation
- Lockfile Documentation - Complete guide to lockfiles, aliases, and profiles
- Conversational Lockfile - Natural language lockfile management
- MCP Integration - Model Context Protocol and chat client
- API Reference - Core API documentation
- Provider Guide - Provider-specific features
- Structured Output - Unified JSON schema support
- File Utilities - Vision and multimodal file handling
- CLI Reference - Command-line interface guide
- Receipts & Cost Tracking - On-demand receipt generation and cost tracking
- Migration to On-Demand Receipts - Upgrade guide from automatic to on-demand receipts
- Examples - Working code examples:
  - Quick Start - Basic usage patterns
  - MCP Chat - MCP integration
  - Streaming - Streaming with tools
Development
# Install for development
uv sync --group dev
# Run tests
uv run pytest
# Lint and format
uv run ruff check src/
uv run ruff format src/
Error Handling
LLMRing uses typed exceptions for better error handling:
from llmring.exceptions import (
    ProviderAuthenticationError,
    ModelNotFoundError,
    ProviderRateLimitError,
    ProviderTimeoutError
)
try:
    response = await service.chat(request)
except ProviderAuthenticationError:
    print("Invalid API key")
except ModelNotFoundError:
    print("Model not supported")
except ProviderRateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after}s")
except ProviderTimeoutError:
    print("Request timed out")
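A common next step is to retry only on rate limits. The sketch below is self-contained, so it defines a stand-in RateLimitError carrying a retry_after attribute like the e.retry_after used above; in real code you would catch ProviderRateLimitError instead. It assumes retry_after may be None, in which case it falls back to exponential backoff; retry_on_rate_limit is a hypothetical helper.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")

class RateLimitError(Exception):
    """Stand-in for ProviderRateLimitError; only retry_after is modeled."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after

async def retry_on_rate_limit(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Await call(), retrying on RateLimitError up to max_attempts attempts in total."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return await call()
        except RateLimitError as exc:
            if attempt >= max_attempts:
                raise  # Out of attempts: surface the original error.
            # Prefer the provider's hint; otherwise back off exponentially (1s, 2s, 4s, ...).
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay)
            attempt += 1

# Demo: fail twice, then succeed (tiny delays keep the demo fast).
attempts = {"count": 0}

async def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError(retry_after=0.01)
    return "ok"

print(asyncio.run(retry_on_rate_limit(flaky_call)))  # ok
```

With LLMRing, call could be a lambda such as lambda: service.chat(request), so each retry issues a fresh request.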
Key Features Summary
- Unified Interface: Switch providers without code changes
- Performance: Streaming, prompt caching, optimized requests
- Reliability: Circuit breakers, retries, typed error handling
- Observability: Cost tracking, on-demand receipt generation, batch certification
- Flexibility: Provider-specific features and raw SDK access
- Standards: Type-safe, well-tested
License
MIT License - see LICENSE file for details.
Contributing
- Fork the repository
- Create a feature branch
- Add tests for your changes
- Ensure all tests pass: uv run pytest
- Submit a pull request
Examples
See the examples/ directory for complete working examples:
- Basic chat and streaming
- Tool calling and function execution
- Provider-specific features
- MCP integration
- On-demand receipt generation and cost tracking
Download files
Source Distribution
Built Distribution
File details
Details for the file llmring-1.3.0.tar.gz.
File metadata
- Download URL: llmring-1.3.0.tar.gz
- Upload date:
- Size: 228.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d79e4c54db00940c8cf1ee7cc31d5626ac194f1dca8a74cd1db495fb92fb6983 |
| MD5 | a822e1d41cd5db3bcf1a5f19b613fda0 |
| BLAKE2b-256 | 32aa710d759bb8d01d9f41aca2f4b84a91c7b858757d3af285f9a77dadf0bbb4 |
File details
Details for the file llmring-1.3.0-py3-none-any.whl.
File metadata
- Download URL: llmring-1.3.0-py3-none-any.whl
- Upload date:
- Size: 276.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f7c84e193c681e1985b71a632f0e0e488bad0947486c6cd97f9b6cbca8da5f75 |
| MD5 | ee766b1133baf14470d8d81caf4df170 |
| BLAKE2b-256 | 9ec25b5530e1176e7a086bf579836918b3b1e1f8657e5f50e31672c3af42d752 |