Unified Python interface for OpenAI, Anthropic, Google, and Ollama LLMs

LLMRing

A Python library for LLM integration with a unified interface, advanced provider features, and MCP support. It supports OpenAI, Anthropic, Google Gemini, and Ollama through a consistent API.

✨ Key Features

🔄 Unified Interface: Single API for all major LLM providers
⚡ Streaming Support: Real streaming for all providers (not simulated)
🛠️ Native Tool Calling: Provider-native function calling with consistent interface
📋 Unified Structured Output: JSON schema works across all providers with automatic adaptation
🧠 Intelligent Configuration: AI-powered lockfile creation with registry analysis
📋 Smart Aliases: Always-current semantic aliases (deep, fast, balanced) via intelligent recommendations
💰 Cost Tracking: Automatic cost calculation and receipt generation
🎯 Registry Integration: Centralized model capabilities and pricing
🔧 Advanced Features:
- OpenAI: JSON schema, o1 models, PDF processing
- Anthropic: Prompt caching (up to 90% savings on cached input tokens)
- Google: Native function calling, multimodal, up to 2M-token context
- Ollama: Local models, streaming, custom options
🔒 Type Safety: Comprehensive typed exceptions and error handling
🌐 MCP Integration: Model Context Protocol support for tool ecosystems
💬 MCP Chat Client: Generic chat interface with persistent history for any MCP server

🚀 Quick Start

Installation

# With uv (recommended)
uv add llmring

# With pip
pip install llmring

Basic Usage

import asyncio

from llmring.service import LLMRing
from llmring.schemas import LLMRequest, Message


async def main():
    # Initialize service with context manager (auto-closes resources)
    async with LLMRing() as service:
        # Simple chat using the "fast" alias
        request = LLMRequest(
            model="fast",
            messages=[
                Message(role="system", content="You are a helpful assistant."),
                Message(role="user", content="Hello!"),
            ],
        )

        response = await service.chat(request)
        print(response.content)


# `async with` and `await` must run inside a coroutine
asyncio.run(main())

Streaming

async with LLMRing() as service:
    # Real streaming for all providers
    request = LLMRequest(
        model="balanced",
        messages=[Message(role="user", content="Count to 10")],
        stream=True
    )

    async for chunk in await service.chat(request):
        print(chunk.delta, end="", flush=True)
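
To keep the full reply as well as printing it, the deltas can be accumulated as they arrive. The sketch below is self-contained and does not import LLMRing: `Chunk` and `fake_stream` are hypothetical stand-ins for the stream returned by `service.chat(request)`, and only the `delta` attribute shown above is assumed.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class Chunk:
    """Hypothetical stand-in for a streamed chunk; only `delta` is assumed."""
    delta: str


async def fake_stream() -> AsyncIterator[Chunk]:
    """Stand-in for the stream returned by `await service.chat(request)`."""
    for piece in ["1, ", "2, ", "3"]:
        yield Chunk(delta=piece)


async def collect_stream(stream: AsyncIterator[Chunk]) -> str:
    """Print each delta as it arrives and return the assembled text."""
    parts = []
    async for chunk in stream:
        if chunk.delta:  # skip empty deltas
            print(chunk.delta, end="", flush=True)
            parts.append(chunk.delta)
    print()
    return "".join(parts)


full_text = asyncio.run(collect_stream(fake_stream()))
```

With a real request, `collect_stream` would presumably be given the object returned by `await service.chat(request)` instead of `fake_stream()`.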

Tool Calling

async with LLMRing() as service:
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }]

    request = LLMRequest(
        model="balanced",
        messages=[Message(role="user", content="What's the weather in NYC?")],
        tools=tools
    )

    response = await service.chat(request)
    if response.tool_calls:
        print("Function called:", response.tool_calls[0]["function"]["name"])
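
Once a tool call comes back, the application still has to run the matching function itself. The following is a minimal sketch of that dispatch step on plain dictionaries. The `arguments` key and its JSON-string encoding are assumptions modeled on the OpenAI-style shape used above, and `get_weather` and `TOOL_REGISTRY` are hypothetical names.

```python
import json
from typing import Any, Callable, Dict


def get_weather(location: str) -> str:
    """Hypothetical local implementation of the `get_weather` tool."""
    return f"Sunny in {location}"


# Map tool names (as declared in `tools`) to local callables.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(tool_call: Dict[str, Any]) -> Any:
    """Look up the requested function and call it with its decoded arguments."""
    function = tool_call["function"]
    name = function["name"]
    if name not in TOOL_REGISTRY:
        raise KeyError(f"Unknown tool: {name}")
    arguments = function.get("arguments", {})
    # Assumption: arguments may arrive as a JSON string or as a dict.
    if isinstance(arguments, str):
        arguments = json.loads(arguments)
    return TOOL_REGISTRY[name](**arguments)


# Example tool call in the shape used above (the `arguments` key is assumed).
example_call = {"function": {"name": "get_weather", "arguments": '{"location": "NYC"}'}}
result = run_tool_call(example_call)
print(result)
```

Keeping the registry explicit means an unexpected tool name fails loudly instead of calling arbitrary code.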

📚 Resource Management

Context Manager (Recommended)

# Automatic resource cleanup with context manager
async with LLMRing() as service:
    response = await service.chat(request)
    # Resources are automatically cleaned up when exiting the context

Manual Cleanup

# Manual resource management
service = LLMRing()
try:
    response = await service.chat(request)
finally:
    await service.close()  # Ensure resources are cleaned up
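
The two patterns above should behave the same way: cleanup runs whether the call succeeds or raises. The sketch below illustrates this with `FakeService`, a hypothetical stand-in class that models only the cleanup behavior. It is not LLMRing's implementation.

```python
import asyncio


class FakeService:
    """Hypothetical stand-in for LLMRing; only the cleanup behavior is modeled."""

    def __init__(self) -> None:
        self.closed = False

    async def chat(self, request: str) -> str:
        if request == "boom":
            raise RuntimeError("provider failure")
        return f"echo: {request}"

    async def close(self) -> None:
        self.closed = True

    async def __aenter__(self) -> "FakeService":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self.close()  # runs on success and on error


async def demo() -> bool:
    service = FakeService()
    try:
        async with service:
            await service.chat("boom")  # raises inside the context
    except RuntimeError:
        pass
    return service.closed


closed_after_error = asyncio.run(demo())
print("closed after error:", closed_after_error)
```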

🔧 Advanced Features

🎯 Unified Structured Output (All Providers)

# Same JSON schema API works across ALL providers!
request = LLMRequest(
    model="balanced",  # Works with any provider
    messages=[Message(role="user", content="Generate a person")],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "email": {"type": "string"}
                },
                "required": ["name", "age"]
            }
        },
        "strict": True  # Validates across all providers
    }
)

response = await service.chat(request)
print("JSON:", response.content)   # Valid JSON string
print("Data:", response.parsed)    # Python dict ready to use
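
Even with schema-constrained output, a defensive check on the parsed dictionary before use can be worthwhile. The sketch below is a deliberately small check using only the standard library. It covers required keys and a few primitive types and is not a full JSON Schema validator. `schema_problems` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, List

PERSON_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
    },
    "required": ["name", "age"],
}

# Only the JSON types used in this example are mapped.
_JSON_TYPES = {"string": str, "integer": int, "object": dict}


def schema_problems(schema: Dict[str, Any], data: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means the data passed."""
    problems = []
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(spec.get("type"))
        if key not in data or expected is None:
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for field: {key}")
    return problems


parsed = json.loads('{"name": "Ada", "age": 36}')
print(schema_problems(PERSON_SCHEMA, parsed))
```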

Provider-Specific Parameters

# Anthropic: prompt caching (up to 90% savings on cached input tokens)
request = LLMRequest(
    model="balanced",
    messages=[
        Message(
            role="system",
            content="Very long system prompt...",  # 1024+ tokens
            metadata={"cache_control": {"type": "ephemeral"}}
        ),
        Message(role="user", content="Hello")
    ]
)

# Extra parameters for provider-specific features
request = LLMRequest(
    model="fast",
    messages=[Message(role="user", content="Hello")],
    extra_params={
        "logprobs": True,
        "top_logprobs": 5,
        "presence_penalty": 0.1,
        "seed": 12345
    }
)

🧠 Intelligent Model Aliases

LLMRing features intelligent lockfile creation that analyzes the current registry and recommends optimal aliases:

# Interactive mode: prompts you about your needs
llmring lock init --interactive

# With requirements from a file
llmring lock init --interactive --requirements-file requirements.txt

# With requirements directly in command
llmring lock init --interactive --requirements "I need cost-effective models for coding"

# Analyze your configuration
llmring lock analyze

# Optimize existing lockfile
llmring lock optimize

Interactive Mode prompts you for:

Use cases: What you'll primarily use LLMs for (coding, writing, analysis, etc.)
Budget preference: Low cost, balanced, or maximum performance
Capabilities: Vision, function calling, or auto-detect from registry
Usage volume: Expected monthly request volume
Custom aliases: Specific aliases you want (fast, deep, coder, writer, vision, etc.)

# Use semantic aliases (always current, registry-based).
# Pass exactly one alias as `model`:
#   "deep"     → most capable reasoning model
#   "fast"     → cost-effective quick responses
#   "balanced" → optimal all-around model
#   "advisor"  → Claude Opus 4.1 - powers intelligent lockfile creation
request = LLMRequest(
    model="deep",
    messages=[Message(role="user", content="Hello")]
)

Key Benefits:

Always current: Aliases point to latest registry models, not outdated hardcoded ones
Intelligent selection: AI advisor analyzes registry and recommends optimal configuration
Cost-aware: Transparent cost analysis and recommendations
Self-hosted: Uses LLMRing's own API to power intelligent lockfile creation

🚪 Advanced: Direct Model Access

While aliases are recommended, you can still use the direct provider:model format when needed:

# Direct model specification (escape hatch)
request = LLMRequest(
    model="anthropic:claude-3-5-sonnet",  # Direct provider:model format
    messages=[Message(role="user", content="Hello")]
)

# Direct models can be used alongside aliases in the same application
request = LLMRequest(
    model="openai:gpt-4o",  # Specific model when needed
    messages=[Message(role="user", content="Hello")]
)

Recommendation: Use aliases for maintainability and cost optimization. Use direct model strings only when you need a specific model version or provider-specific features.
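
Conceptually, a model string is either an alias that maps to a `provider:model` pair or already such a pair. The sketch below shows one way that resolution could work. The `ALIASES` table and `resolve_model` helper are hypothetical illustrations, not LLMRing's internals, and the alias bindings shown are made up for the example.

```python
from typing import Dict, Tuple

# Hypothetical alias table; in LLMRing the real bindings live in the lockfile.
ALIASES: Dict[str, str] = {
    "fast": "openai:gpt-4o-mini",
    "balanced": "anthropic:claude-3-5-sonnet",
}


def resolve_model(model: str, aliases: Dict[str, str]) -> Tuple[str, str]:
    """Return (provider, model_name) for an alias or a provider:model string."""
    target = model if ":" in model else aliases.get(model)
    if target is None:
        raise KeyError(f"Unknown alias: {model}")
    # Split on the first colon only, so model names may contain colons.
    provider, _, name = target.partition(":")
    if not provider or not name:
        raise ValueError(f"Expected provider:model, got: {target}")
    return provider, name


print(resolve_model("fast", ALIASES))
print(resolve_model("openai:gpt-4o", ALIASES))
```

Splitting on the first colon keeps tagged local model names such as `ollama:llama3:8b` intact.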

🚪 Raw SDK Access (Escape Hatch)

When you need the full power of the underlying SDKs:

# Access any provider's raw client for maximum SDK features
openai_client = service.get_provider("openai").client      # openai.AsyncOpenAI
anthropic_client = service.get_provider("anthropic").client # anthropic.AsyncAnthropic
google_client = service.get_provider("google").client       # google.genai.Client
ollama_client = service.get_provider("ollama").client       # ollama.AsyncClient

# Use any SDK feature not exposed by LLMRing
response = await openai_client.chat.completions.create(
    model="gpt-4o",  # Provider's own model ID; LLMRing aliases are not resolved by the raw SDK
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,
    top_logprobs=10,
    parallel_tool_calls=False,
    # Any OpenAI parameter
)

# Anthropic with all SDK features
response = await anthropic_client.messages.create(
    model="claude-3-5-sonnet-latest",  # Provider's own model ID; LLMRing aliases are not resolved by the raw SDK
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100,
    top_p=0.9,
    top_k=40,
    system=[{
        "type": "text",
        "text": "You are helpful",
        "cache_control": {"type": "ephemeral"}
    }]
)

# Google with native SDK features
response = google_client.models.generate_content(
    model="gemini-2.0-flash",  # Provider's own model ID; LLMRing aliases are not resolved by the raw SDK
    contents="Hello",
    # google-genai takes generation and safety settings through `config`
    config={
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 40,
        "candidate_count": 3,
        "safety_settings": [{
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE"
        }]
    }
)

When to use raw clients:

Advanced SDK features not in LLMRing
Provider-specific optimizations
Complex configurations
Performance-critical applications

🌐 Provider Support

Provider	Models	Streaming	Tools	Special Features
OpenAI	GPT-4o, GPT-4o-mini, o1	✅ Real	✅ Native	JSON schema, PDF processing
Anthropic	Claude 3.5 Sonnet/Haiku	✅ Real	✅ Native	Prompt caching, large context
Google	Gemini 1.5/2.0 Pro/Flash	✅ Real	✅ Native	Multimodal, up to 2M-token context
Ollama	Llama, Mistral, etc.	✅ Real	🔧 Prompt	Local models, custom options

📦 Setup

Environment Variables

# Add to your .env file
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GEMINI_API_KEY=AIza...

# Optional
OLLAMA_BASE_URL=http://localhost:11434  # Default
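
Before the first request it can help to check which provider keys are actually visible to the process. The sketch below reads only the process environment, using the variable names listed above. It does not load a `.env` file itself, and `configured_providers` is a hypothetical helper name.

```python
import os
from typing import List, Mapping

# Environment variable names taken from the list above.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_GEMINI_API_KEY",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return providers whose API key variable is set to a non-empty value."""
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


print("Configured providers:", configured_providers())
```

Passing a plain dictionary as `env` makes the check easy to exercise in tests without touching real credentials.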

Intelligent Setup

# Create optimized configuration with AI advisor
llmring lock init --interactive

# The advisor analyzes the current registry and your API keys
# to recommend optimal model aliases for your workflow

Dependencies

# Required for specific providers
# (quote the version specifiers so the shell does not treat ">" as a redirect)
pip install "openai>=1.0"      # OpenAI
pip install "anthropic>=0.67"  # Anthropic
pip install google-genai       # Google Gemini
pip install "ollama>=0.4"      # Ollama

🔗 MCP Integration

from llmring.mcp.client.enhanced_llm import create_enhanced_llm
from llmring.schemas import Message

# Create MCP-enabled LLM with tool ecosystem
llm = await create_enhanced_llm(
    model="fast",
    mcp_server_path="path/to/mcp/server"
)

# Now has access to MCP tools
response = await llm.chat([
    Message(role="user", content="Use available tools to help me")
])

📚 Documentation

Provider Usage Guide - Provider-specific features and examples
API Reference - Detailed API documentation
Structured Output - Unified JSON schema across all providers
MCP Integration - Model Context Protocol guide
MCP Chat Client - Generic MCP chat client with persistent history
Conversational Lockfile - Natural language lockfile management
Examples - Working code examples

🧪 Development

# Install for development
uv sync --group dev

# Run tests
uv run pytest

# Lint and format
uv run ruff check src/
uv run ruff format src/

🛠️ Error Handling

LLMRing uses typed exceptions for better error handling:

from llmring.exceptions import (
    ProviderAuthenticationError,
    ModelNotFoundError,
    ProviderRateLimitError,
    ProviderTimeoutError
)

try:
    response = await service.chat(request)
except ProviderAuthenticationError:
    print("Invalid API key")
except ModelNotFoundError:
    print("Model not supported")
except ProviderRateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after}s")
except ProviderTimeoutError:
    print("Request timed out")
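
If the application wants its own retry policy on top of these exceptions, a small wrapper is usually enough. The sketch below is self-contained: `RateLimited` is a hypothetical stand-in for `ProviderRateLimitError`, assumed to carry the `retry_after` attribute used above, and `with_retries` is an illustrative helper rather than part of LLMRing.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for ProviderRateLimitError with a retry_after hint."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("rate limited")
        self.retry_after = retry_after


async def with_retries(
    call: Callable[[], Awaitable[T]], attempts: int = 3, base_delay: float = 0.01
) -> T:
    """Retry `call` on RateLimited, honoring retry_after or backing off exponentially."""
    for attempt in range(attempts):
        try:
            return await call()
        except RateLimited as exc:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            delay = exc.retry_after if exc.retry_after is not None else base_delay * 2 ** attempt
            await asyncio.sleep(delay)
    raise RuntimeError("attempts must be at least 1")


calls = {"count": 0}


async def flaky() -> str:
    """Fails twice with a rate limit, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimited(retry_after=0.01)
    return "ok"


result = asyncio.run(with_retries(flaky))
print(result, "after", calls["count"], "calls")
```

In real use, `call` would be something like `lambda: service.chat(request)` and the `except` clause would name the library's own exception.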

🎯 Key Benefits

🔄 Unified Interface: Switch providers without code changes
⚡ Performance: Real streaming, prompt caching, optimized requests
🛡️ Reliability: Circuit breakers, retries, typed error handling
📊 Observability: Cost tracking, usage analytics, receipt generation
🔧 Flexibility: Provider-specific features + raw SDK access
📏 Standards: Type-safe, well-tested, production-ready

📄 License

MIT License - see LICENSE file for details.

🤝 Contributing

Fork the repository
Create a feature branch
Add tests for your changes
Ensure all tests pass: uv run pytest
Submit a pull request

🌟 Examples

See the examples/ directory for complete working examples:

Basic chat and streaming
Tool calling and function execution
Provider-specific features
MCP integration
Cost tracking and receipts

LLMRing: The comprehensive LLM library for Python developers 🚀

Release history

Version	Release date
1.4.0	Jan 3, 2026
1.3.0	Nov 2, 2025
1.2.0	Oct 26, 2025
1.1.1	Oct 14, 2025
1.1.0	Sep 29, 2025
1.0.0 (this version)	Sep 29, 2025
0.4.0	Sep 29, 2025
0.3.0	Aug 20, 2025

Download files

Source Distribution

llmring-1.0.0.tar.gz (180.9 kB view details)

Uploaded Sep 29, 2025 Source

Built Distribution

llmring-1.0.0-py3-none-any.whl (227.6 kB view details)

Uploaded Sep 29, 2025 Python 3

File details

Details for the file llmring-1.0.0.tar.gz.

File metadata

Download URL: llmring-1.0.0.tar.gz
Upload date: Sep 29, 2025
Size: 180.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.4

File hashes

Hashes for llmring-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`973b10ba5d2cf8ec64bcf451fa0cc9bf5b09713d95597f652fa315ee7bd3faa8`
MD5	`d4011ca96f30a9893bc237f979582a5c`
BLAKE2b-256	`430b7f37ee4639f9f9c3898ffe2d08b9778addddcc3faad6fda4ac1da8bcd87f`

See more details on using hashes here.
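
A downloaded archive can be checked against the published SHA256 digest with the standard library alone. The sketch below streams the file in chunks and compares the result with the digest from the table above. The local path `llmring-1.0.0.tar.gz` is an assumption about where the file was saved, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_sha256(path: Path, expected: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected.strip().lower()


# SHA256 digest published for llmring-1.0.0.tar.gz in the table above.
EXPECTED = "973b10ba5d2cf8ec64bcf451fa0cc9bf5b09713d95597f652fa315ee7bd3faa8"

archive = Path("llmring-1.0.0.tar.gz")  # assumed download location
if archive.exists():
    print("hash ok" if matches_sha256(archive, EXPECTED) else "HASH MISMATCH")
```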

File details

Details for the file llmring-1.0.0-py3-none-any.whl.

File metadata

Download URL: llmring-1.0.0-py3-none-any.whl
Upload date: Sep 29, 2025
Size: 227.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.4

File hashes

Hashes for llmring-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6402d8f43a702b875c18da0c8e44da9409a65133459378273e721c411d8f592d`
MD5	`850fc72034ae8350d0d567361bff1f22`
BLAKE2b-256	`dfebbbd6c5a8123456c51b7cda0a0bb302fe79c92ef29b99d4e6aee154013178`

See more details on using hashes here.
