Unified Python interface for OpenAI, Anthropic, Google, and Ollama LLMs

These details have not been verified by PyPI

LLMRing

A Python library for LLM integration with a unified interface and MCP support. It supports OpenAI, Anthropic, Google Gemini, and Ollama through consistent APIs.

Features

Unified Interface: Single API for all major LLM providers
Streaming Support: Streaming for all providers
Native Tool Calling: Provider-native function calling with consistent interface
Unified Structured Output: JSON schema works across all providers with automatic adaptation
Conversational Configuration: MCP chat interface for natural language lockfile setup
Aliases: Semantic aliases (deep, fast, balanced) with registry-based recommendations
Cost Tracking: Cost calculation with on-demand receipt generation
Registry Integration: Centralized model capabilities and pricing
Fallback Models: Automatic failover to alternative models
Type Safety: Typed exceptions and error handling
MCP Integration: Model Context Protocol support for tool ecosystems
MCP Chat Client: Chat interface with persistent history for any MCP server

Quick Start

Installation

# With uv (recommended)
uv add llmring

# With pip
pip install llmring

Including Lockfiles in Your Package:

To ship your llmring.lock with your package (as llmring itself does), add the following to your pyproject.toml (this example uses the Hatch build backend):

[tool.hatch.build]
include = [
    "src/yourpackage/**/*.py",
    "src/yourpackage/**/*.lock",  # Include lockfiles
]

Basic Usage

import asyncio

from llmring.service import LLMRing
from llmring.schemas import LLMRequest, Message


async def main():
    # Initialize service with context manager (auto-closes resources)
    async with LLMRing() as service:
        # Simple chat; "fast" is an alias resolved through the lockfile
        request = LLMRequest(
            model="fast",
            messages=[
                Message(role="system", content="You are a helpful assistant."),
                Message(role="user", content="Hello!"),
            ],
        )

        response = await service.chat(request)
        print(response.content)


# `async with` and `await` must run inside a coroutine
asyncio.run(main())

Streaming

async with LLMRing() as service:
    # Streaming for all providers
    request = LLMRequest(
        model="balanced",
        messages=[Message(role="user", content="Count to 10")]
    )

    accumulated_usage = None
    async for chunk in service.chat_stream(request):
        print(chunk.content, end="", flush=True)
        # Capture final usage stats
        if chunk.usage:
            accumulated_usage = chunk.usage

    print()  # Newline after streaming
    if accumulated_usage:
        print(f"Tokens used: {accumulated_usage.get('total_tokens', 0)}")
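
The loop above prints chunks as they arrive. When the full text is needed afterwards, the same pattern can collect it. The sketch below runs without any provider: `FakeChunk` and `fake_stream` are hypothetical stand-ins that mimic only the `content` and `usage` fields used above, and they assume usage arrives on the final chunk.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed chunk (content + usage only)."""
    content: str
    usage: Optional[dict] = None


async def fake_stream() -> AsyncIterator[FakeChunk]:
    """Hypothetical stream: text pieces first, usage on the last chunk."""
    for piece in ["1, ", "2, ", "3"]:
        yield FakeChunk(content=piece)
    yield FakeChunk(content="", usage={"total_tokens": 12})


async def collect(stream) -> tuple:
    """Join all text pieces and keep the last usage dict seen."""
    parts, usage = [], None
    async for chunk in stream:
        if chunk.content:
            parts.append(chunk.content)
        if chunk.usage:
            usage = chunk.usage
    return "".join(parts), usage


text, usage = asyncio.run(collect(fake_stream()))
print(text, usage)
```

With a real service, `collect(service.chat_stream(request))` would take the place of the fake stream.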

Tool Calling

async with LLMRing() as service:
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }]

    request = LLMRequest(
        model="balanced",
        messages=[Message(role="user", content="What's the weather in NYC?")],
        tools=tools
    )

    response = await service.chat(request)
    if response.tool_calls:
        print("Function called:", response.tool_calls[0]["function"]["name"])
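
The response above only reports which function the model selected; running it is left to the caller. A minimal sketch of that dispatch step follows. It assumes tool calls use the OpenAI-style shape with an `arguments` field holding a JSON string, which this README does not show (only `name` appears above). `get_weather`, `TOOL_REGISTRY`, and `dispatch_tool_call` are hypothetical names.

```python
import json


def get_weather(location: str) -> str:
    """Hypothetical local implementation of the get_weather tool."""
    return f"Sunny in {location}"


# Map tool names (as declared in the `tools` list) to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(tool_call: dict) -> str:
    """Run the local function named by a tool call and return its result."""
    function = tool_call["function"]
    handler = TOOL_REGISTRY[function["name"]]
    raw_args = function.get("arguments", "{}")
    # Assumption: arguments arrive as a JSON string; a dict is accepted too.
    kwargs = json.loads(raw_args) if isinstance(raw_args, str) else raw_args
    return handler(**kwargs)


example_call = {
    "function": {"name": "get_weather", "arguments": '{"location": "NYC"}'}
}
print(dispatch_tool_call(example_call))
```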

Resource Management

Context Manager (Recommended)

from llmring import LLMRing, LLMRequest, Message

# Automatic resource cleanup with context manager
async with LLMRing() as service:
    request = LLMRequest(
        model="fast",
        messages=[Message(role="user", content="Hello!")]
    )
    response = await service.chat(request)
    # Resources are automatically cleaned up when exiting the context

Manual Cleanup

# Manual resource management
service = LLMRing()
try:
    response = await service.chat(request)
finally:
    await service.close()  # Ensure resources are cleaned up

Advanced Features

Unified Structured Output

# JSON schema API works across all providers
request = LLMRequest(
    model="balanced",  # Works with any provider
    messages=[Message(role="user", content="Generate a person")],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "email": {"type": "string"}
                },
                "required": ["name", "age"]
            }
        },
        "strict": True  # Validates across all providers
    }
)

response = await service.chat(request)
print("JSON:", response.content)   # Valid JSON string
print("Data:", response.parsed)    # Python dict ready to use
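
Once `response.parsed` is a plain dict, it can be loaded into a typed object. The sketch below mirrors the `person` schema above with a dataclass; `Person` and `person_from_parsed` are hypothetical helpers, and the checks are a hand-written subset of JSON Schema validation rather than anything LLMRing provides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    age: int
    email: Optional[str] = None  # not listed under "required" in the schema


def person_from_parsed(parsed: dict) -> Person:
    """Build a Person from a parsed response dict, checking required fields."""
    missing = [key for key in ("name", "age") if key not in parsed]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    age = parsed["age"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        raise TypeError("age must be an integer")
    return Person(name=parsed["name"], age=age, email=parsed.get("email"))


print(person_from_parsed({"name": "Ada", "age": 36}))
```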

Provider-Specific Parameters

# Anthropic: prompt caching (up to 90% lower cost on cached input tokens)
request = LLMRequest(
    model="balanced",
    messages=[
        Message(
            role="system",
            content="Very long system prompt...",  # 1024+ tokens
            metadata={"cache_control": {"type": "ephemeral"}}
        ),
        Message(role="user", content="Hello")
    ]
)

# Extra parameters for provider-specific features
request = LLMRequest(
    model="fast",
    messages=[Message(role="user", content="Hello")],
    extra_params={
        "logprobs": True,
        "top_logprobs": 5,
        "presence_penalty": 0.1,
        "seed": 12345
    }
)

Model Aliases and Lockfiles

LLMRing uses lockfiles to map semantic aliases to models, with support for fallback models and environment-specific profiles:

# Initialize a lockfile (creates llmring.lock in the current directory)
llmring lock init

# Conversational configuration with AI advisor (recommended)
llmring lock chat  # Natural language interface for lockfile management

# Analyze your configuration
llmring lock analyze

# View current aliases
llmring aliases

Lockfile Resolution Order:

1. Explicit path via the lockfile_path parameter (file must exist)
2. LLMRING_LOCKFILE_PATH environment variable (file must exist)
3. ./llmring.lock in the current directory (if it exists)
4. Bundled lockfile at src/llmring/llmring.lock (minimal fallback with the advisor alias)
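
The resolution order above can be sketched as a small function. This is an illustration of the documented precedence, not LLMRing's actual implementation; `resolve_lockfile` and its parameters are hypothetical, and raising `FileNotFoundError` is one possible reading of "file must exist".

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_lockfile(
    explicit_path: Optional[str],
    env: Mapping[str, str],
    cwd: Path,
    bundled: Path,
) -> Path:
    """Return the lockfile path chosen by the documented precedence."""
    # 1. Explicit path: must exist.
    if explicit_path is not None:
        path = Path(explicit_path)
        if not path.is_file():
            raise FileNotFoundError(f"lockfile_path does not exist: {path}")
        return path
    # 2. Environment variable: must exist.
    env_path = env.get("LLMRING_LOCKFILE_PATH")
    if env_path:
        path = Path(env_path)
        if not path.is_file():
            raise FileNotFoundError(f"LLMRING_LOCKFILE_PATH does not exist: {path}")
        return path
    # 3. Lockfile in the working directory, if present.
    local = cwd / "llmring.lock"
    if local.is_file():
        return local
    # 4. Bundled fallback.
    return bundled


print(resolve_lockfile(None, {}, Path.cwd(), Path("src/llmring/llmring.lock")))
```

Passing the environment and working directory as arguments keeps the function easy to test without touching real process state.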

Packaging Your Own Lockfile: Libraries using LLMRing can ship with their own lockfiles. See Lockfile Documentation for details on:

Including lockfiles in your package distribution
Lockfile resolution order and precedence
Creating lockfiles with fallback models
Environment-specific profiles and configuration

Conversational Configuration via llmring lock chat:

Describe your requirements in natural language
Get AI-powered recommendations based on registry analysis
Configure aliases with multiple fallback models
Understand cost implications and tradeoffs
Set up environment-specific profiles

# Use semantic aliases (always current, with fallbacks)
request = LLMRequest(
    model="deep",      # → most capable reasoning model
    messages=[Message(role="user", content="Hello")]
)
# Or use other aliases:
# model="fast"      → cost-effective quick responses
# model="balanced"  → optimal all-around model
# model="advisor"   → Claude Opus 4.1 - powers conversational config

Key features:

Registry-based recommendations
Fallback models provide automatic failover
Cost analysis and recommendations
Environment-specific configurations for dev/staging/prod

Profiles: Environment-Specific Configurations

LLMRing supports profiles to manage different model configurations for different environments (dev, staging, prod, etc.):

# Use different models based on environment
# Development: Use cheaper/faster models
# Production: Use higher-quality models

# Shell: set the profile via environment variable
export LLMRING_PROFILE=dev  # or prod, staging, etc.

# Python: or specify the profile in code
async with LLMRing() as service:
    # Uses 'dev' profile bindings
    response = await service.chat(request, profile="dev")

Profile Configuration in Lockfiles:

# llmring.lock - Different models per environment
[profiles.default]
[[profiles.default.bindings]]
alias = "assistant"
models = ["anthropic:claude-3-5-sonnet"]  # Production quality

[profiles.dev]
[[profiles.dev.bindings]]
alias = "assistant"
models = ["openai:gpt-4o-mini"]  # Cheaper for development

[profiles.test]
[[profiles.test.bindings]]
alias = "assistant"
models = ["ollama:llama3"]  # Local model for testing

Using Profiles with CLI:

# Bind aliases to specific profiles
llmring bind assistant "openai:gpt-4o-mini" --profile dev
llmring bind assistant "anthropic:claude-3-5-sonnet" --profile prod

# List aliases in a profile
llmring aliases --profile dev

# Use profile for chat
llmring chat "Hello" --profile dev

# Set default profile via environment
export LLMRING_PROFILE=dev
llmring chat "Hello"  # Now uses dev profile

Profile Selection Priority:

1. Explicit parameter: profile="dev" or --profile dev (highest priority)
2. Environment variable: LLMRING_PROFILE=dev
3. Default: the default profile (if nothing else is specified)
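
The priority rules above reduce to a short fallthrough. The sketch below illustrates the documented behavior; `select_profile` is a hypothetical name, and treating an empty string as "not set" is an assumption.

```python
from typing import Mapping, Optional


def select_profile(explicit: Optional[str], env: Mapping[str, str]) -> str:
    """Pick a profile: explicit argument, then LLMRING_PROFILE, then 'default'."""
    if explicit:
        return explicit
    # Assumption: an unset or empty variable falls through to the default.
    return env.get("LLMRING_PROFILE") or "default"


print(select_profile(None, {"LLMRING_PROFILE": "dev"}))
```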

Common Use Cases:

Development: Use cheaper models to reduce costs during development
Testing: Use local models (Ollama) or mock responses
Staging: Use production models but with different rate limits
Production: Use highest quality models for best user experience
A/B Testing: Test different models for the same alias

Fallback Models

Aliases can specify multiple models for automatic failover:

# In llmring.lock
[[bindings]]
alias = "assistant"
models = [
    "anthropic:claude-3-5-sonnet",  # Primary
    "openai:gpt-4o",                 # First fallback
    "google:gemini-1.5-pro"          # Second fallback
]

If the primary model fails (rate limit, availability, etc.), LLMRing automatically tries the fallbacks.
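
The failover behavior described above amounts to trying each model in order until one succeeds. The sketch below shows that loop in isolation. `ModelUnavailable`, `call_with_fallback`, and `flaky_call` are hypothetical stand-ins; which errors LLMRing actually treats as retryable is not specified here.

```python
from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Hypothetical stand-in for a rate-limit or availability error."""


def call_with_fallback(models: Sequence[str], call: Callable[[str], str]) -> str:
    """Try each model in order; return the first successful result."""
    last_error = None
    for model in models:
        try:
            return call(model)
        except ModelUnavailable as exc:
            last_error = exc  # remember the failure and try the next model
    raise RuntimeError("all models failed") from last_error


def flaky_call(model: str) -> str:
    """Simulated provider call: the Anthropic model is rate limited."""
    if model.startswith("anthropic:"):
        raise ModelUnavailable(f"{model} is rate limited")
    return f"answered by {model}"


models = ["anthropic:claude-3-5-sonnet", "openai:gpt-4o", "google:gemini-1.5-pro"]
print(call_with_fallback(models, flaky_call))
```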

Advanced: Direct Model References

While aliases are recommended, you can still use direct provider:model references when needed:

# Direct model reference (escape hatch)
request = LLMRequest(
    model="anthropic:claude-3-5-sonnet",  # Direct provider:model reference
    messages=[Message(role="user", content="Hello")]
)

# Or specify exact model versions
request = LLMRequest(
    model="openai:gpt-4o",  # Specific model version when needed
    messages=[Message(role="user", content="Hello")]
)

Terminology:

Alias: Semantic name like fast, balanced, deep (recommended)
Model Reference: Full provider:model format like openai:gpt-4o (escape hatch)
Raw SDK Access: Bypassing LLMRing entirely using provider clients directly (see Provider Guide)

Recommendation: Use aliases for maintainability and cost optimization. Use direct model references only when you need a specific model version or provider-specific features.
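
The difference between an alias and a model reference is visible in the string itself: a reference contains a provider prefix and a colon. A minimal sketch of telling them apart follows. `ModelRef` and `parse_model` are hypothetical names, and the sketch assumes aliases never contain a colon.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    provider: Optional[str]  # None means the string is an alias
    name: str


def parse_model(value: str) -> ModelRef:
    """Split 'provider:model' on the first colon; anything else is an alias."""
    provider, sep, name = value.partition(":")
    if sep and provider and name:
        return ModelRef(provider=provider, name=name)
    return ModelRef(provider=None, name=value)


print(parse_model("fast"))
print(parse_model("openai:gpt-4o"))
```

Splitting on the first colon only keeps model names that themselves contain colons (as some Ollama tags do) intact.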

Raw SDK Access

When you need direct access to the underlying SDKs:

# Access provider SDK clients directly
openai_client = service.get_provider("openai").client      # openai.AsyncOpenAI
anthropic_client = service.get_provider("anthropic").client # anthropic.AsyncAnthropic
google_client = service.get_provider("google").client       # google.genai.Client
ollama_client = service.get_provider("ollama").client       # ollama.AsyncClient

# Use SDK features not exposed by LLMRing.
# Raw clients do not resolve LLMRing aliases or provider:model references,
# so pass each provider's own model ID.
response = await openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,
    top_logprobs=10,
    # Any OpenAI parameter
)

# Anthropic with all SDK features
response = await anthropic_client.messages.create(
    model="claude-3-5-sonnet-latest",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100,
    top_p=0.9,
    top_k=40,
    system=[{
        "type": "text",
        "text": "You are helpful",
        "cache_control": {"type": "ephemeral"}
    }]
)

# Google with native SDK features.
# google-genai takes generation and safety options through `config`.
response = google_client.models.generate_content(
    model="gemini-1.5-pro",
    contents="Hello",
    config={
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 40,
        "candidate_count": 3,
        "safety_settings": [{
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE"
        }]
    }
)

When to use raw clients:

SDK features not exposed by LLMRing
Provider-specific optimizations
Complex configurations
Performance-critical applications

Provider Support

Provider	Models	Streaming	Tools	Special Features
OpenAI	GPT-4o, GPT-4o-mini, o1	Yes	Native	JSON schema, PDF processing
Anthropic	Claude 3.5 Sonnet/Haiku	Yes	Native	Prompt caching, large context
Google	Gemini 1.5/2.0 Pro/Flash	Yes	Native	Multimodal, 2M+ context
Ollama	Llama, Mistral, etc.	Yes	Prompt-based	Local models, custom options

Setup

Environment Variables

# Add to your .env file
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GEMINI_API_KEY=AIza...

# Optional
OLLAMA_BASE_URL=http://localhost:11434  # Default
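
Before making the first request, it can help to check which of these variables are actually set. The sketch below does that with the standard library only. `configured_providers` and `ollama_base_url` are hypothetical helpers, not part of LLMRing; the variable names and the default URL are the ones listed above.

```python
import os
from typing import Mapping

# Environment variable names as listed in this README.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_GEMINI_API_KEY",
}


def configured_providers(env: Mapping[str, str]) -> list:
    """Return providers whose API key variable is set and non-empty."""
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]


def ollama_base_url(env: Mapping[str, str]) -> str:
    """Return OLLAMA_BASE_URL, falling back to the documented default."""
    return env.get("OLLAMA_BASE_URL") or "http://localhost:11434"


print(configured_providers(os.environ))
print(ollama_base_url(os.environ))
```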

Conversational Setup

# Create optimized configuration with AI advisor
llmring lock chat

# This opens an interactive chat where you can describe your needs
# and get personalized recommendations based on the registry

Dependencies

# Required for specific providers
# (quote the specifiers so the shell does not treat ">" as a redirect)
pip install "openai>=1.0"      # OpenAI
pip install "anthropic>=0.67"  # Anthropic
pip install google-genai       # Google Gemini
pip install "ollama>=0.4"      # Ollama

MCP Integration

from llmring.mcp.client import create_enhanced_llm
from llmring.schemas import Message

# Create MCP-enabled LLM with tools
llm = await create_enhanced_llm(
    model="fast",
    mcp_server_path="path/to/mcp/server"
)

# Now has access to MCP tools
response = await llm.chat([
    Message(role="user", content="Use available tools to help me")
])

Documentation

Lockfile Documentation - Complete guide to lockfiles, aliases, and profiles
Conversational Lockfile - Natural language lockfile management
MCP Integration - Model Context Protocol and chat client
API Reference - Core API documentation
Provider Guide - Provider-specific features
Structured Output - Unified JSON schema support
File Utilities - Vision and multimodal file handling
CLI Reference - Command-line interface guide
Receipts & Cost Tracking - On-demand receipt generation and cost tracking
Migration to On-Demand Receipts - Upgrade guide from automatic to on-demand receipts
Examples - Working code examples:
- Quick Start - Basic usage patterns
- MCP Chat - MCP integration
- Streaming - Streaming with tools

Development

# Install for development
uv sync --group dev

# Run tests
uv run pytest

# Lint and format
uv run ruff check src/
uv run ruff format src/

Error Handling

LLMRing uses typed exceptions for better error handling:

from llmring.exceptions import (
    ProviderAuthenticationError,
    ModelNotFoundError,
    ProviderRateLimitError,
    ProviderTimeoutError
)

try:
    response = await service.chat(request)
except ProviderAuthenticationError:
    print("Invalid API key")
except ModelNotFoundError:
    print("Model not supported")
except ProviderRateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after}s")
except ProviderTimeoutError:
    print("Request timed out")
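
A rate-limit error that carries a `retry_after` hint lends itself to a simple retry loop. The sketch below shows one way to use that hint. `RateLimited` is a hypothetical stand-in for `ProviderRateLimitError` so the example runs without LLMRing installed, and it assumes `retry_after` is always a number of seconds; `call_with_retry` and `flaky` are illustrative names.

```python
import asyncio


class RateLimited(Exception):
    """Hypothetical stand-in for ProviderRateLimitError with a retry_after hint."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


async def call_with_retry(call, max_attempts: int = 3, sleep=asyncio.sleep):
    """Await `call()`, sleeping for retry_after between rate-limited attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except RateLimited as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            await sleep(exc.retry_after)


attempts = []


async def flaky():
    """Simulated call that is rate limited twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimited(retry_after=0.01)
    return "ok"


result = asyncio.run(call_with_retry(flaky))
print(result, "after", len(attempts), "attempts")
```

With a real service, `call` could be `lambda: service.chat(request)` and the `except` clause would catch `ProviderRateLimitError` instead.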

Key Features Summary

Unified Interface: Switch providers without code changes
Performance: Streaming, prompt caching, optimized requests
Reliability: Circuit breakers, retries, typed error handling
Observability: Cost tracking, on-demand receipt generation, batch certification
Flexibility: Provider-specific features and raw SDK access
Standards: Type-safe, well-tested

License

MIT License - see LICENSE file for details.

Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for your changes
4. Ensure all tests pass: uv run pytest
5. Submit a pull request

Examples

See the examples/ directory for complete working examples:

Basic chat and streaming
Tool calling and function execution
Provider-specific features
MCP integration
On-demand receipt generation and cost tracking

Release history

Version	Release date
1.4.0	Jan 3, 2026
1.3.0	Nov 2, 2025
1.2.0	Oct 26, 2025
1.1.1 (this version)	Oct 14, 2025
1.1.0	Sep 29, 2025
1.0.0	Sep 29, 2025
0.4.0	Sep 29, 2025
0.3.0	Aug 20, 2025

Download files

Source Distribution

llmring-1.1.1.tar.gz (212.0 kB view details)

Uploaded Oct 14, 2025 Source

Built Distribution

llmring-1.1.1-py3-none-any.whl (266.4 kB view details)

Uploaded Oct 14, 2025 Python 3

File details

Details for the file llmring-1.1.1.tar.gz.

File metadata

Download URL: llmring-1.1.1.tar.gz
Upload date: Oct 14, 2025
Size: 212.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.4

File hashes

Hashes for llmring-1.1.1.tar.gz
Algorithm	Hash digest
SHA256	`7155ae7e0acdb832acf8350604456be951befc6bd3d2f2f2bc2e252f5277f0f5`
MD5	`0cbf93aa74643eaf091f2125c2e0959c`
BLAKE2b-256	`94870b65912cdbcc3cbcc31757b0fe70ab81c16f57a2fdb559aeda0c9df9b940`

File details

Details for the file llmring-1.1.1-py3-none-any.whl.

File metadata

Download URL: llmring-1.1.1-py3-none-any.whl
Upload date: Oct 14, 2025
Size: 266.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.4

File hashes

Hashes for llmring-1.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f44f984d546b2d3b639d79d8eb85b204b90ebc2d82c0df1e2b73c02b549e05b2`
MD5	`4e0531b8ae8fa0406f0b6e7e47988037`
BLAKE2b-256	`71ef188e3949d3a1737e3d14331cfcd03ac5c72fd065791dfa3b618e8bb77618`
