Unified Python interface for OpenAI, Anthropic, Google, and Ollama LLMs

LLMRing

A Python library for LLM integration with a unified interface, advanced provider features, and MCP support. It supports OpenAI, Anthropic, Google Gemini, and Ollama through one consistent API.

Key Features

  • Unified Interface: Single API for all major LLM providers
  • Streaming Support: Real streaming for all providers (not simulated)
  • Native Tool Calling: Provider-native function calling with a consistent interface
  • Unified Structured Output: JSON schema works across all providers with automatic adaptation
  • Alias Management: Semantic model aliases via lockfile (deep, fast, balanced)
  • Cost Tracking: Automatic cost calculation and receipt generation
  • Registry Integration: Centralized model capabilities and pricing
  • Advanced Features:
    • OpenAI: JSON schema, o1 models, PDF processing
    • Anthropic: Prompt caching (up to 90% cost savings on cached input)
    • Google: Native function calling, multimodal, 2M+ context
    • Ollama: Local models, streaming, custom options
  • Type Safety: Comprehensive typed exceptions and error handling
  • MCP Integration: Model Context Protocol support for tool ecosystems

Quick Start

Installation

# With uv (recommended)
uv add llmring

# With pip
pip install llmring

Basic Usage

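import asyncio

from llmring.service import LLMRing
from llmring.schemas import LLMRequest, Message


async def main():
    # Initialize service (auto-detects API keys)
    service = LLMRing()

    # Simple chat
    request = LLMRequest(
        model="openai:gpt-4o",
        messages=[
            Message(role="system", content="You are a helpful assistant."),
            Message(role="user", content="Hello!"),
        ],
    )

    response = await service.chat(request)
    print(response.content)


# `await` only works inside a coroutine. The later snippets assume the same
# setup: a `service` instance, used from inside an async function.
asyncio.run(main())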

Streaming

# Real streaming for all providers
request = LLMRequest(
    model="anthropic:claude-3-5-sonnet",
    messages=[Message(role="user", content="Count to 10")],
    stream=True
)

async for chunk in await service.chat(request):
    print(chunk.delta, end="", flush=True)
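
When the complete text is also needed once streaming finishes, the deltas can be accumulated as they arrive. The sketch below does not call a provider: `FakeChunk` and `fake_stream` are hypothetical stand-ins, and the only thing assumed about a real chunk is the string `delta` attribute used in the example above.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed chunk; only `delta` is assumed."""
    delta: str


async def fake_stream() -> AsyncIterator[FakeChunk]:
    # Stand-in for the stream returned by `await service.chat(request)`.
    for piece in ["1, ", "2, ", "3"]:
        yield FakeChunk(delta=piece)


async def collect_text(stream: AsyncIterator[FakeChunk]) -> str:
    """Print deltas as they arrive and return the concatenated text."""
    parts = []
    async for chunk in stream:
        if chunk.delta:  # skip empty deltas defensively
            print(chunk.delta, end="", flush=True)
            parts.append(chunk.delta)
    print()
    return "".join(parts)


full_text = asyncio.run(collect_text(fake_stream()))
```

With the real service, `collect_text(await service.chat(request))` would presumably take the place of `collect_text(fake_stream())`.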

Tool Calling

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]

request = LLMRequest(
    model="google:gemini-1.5-pro",
    messages=[Message(role="user", content="What's the weather in NYC?")],
    tools=tools
)

response = await service.chat(request)
if response.tool_calls:
    print("Function called:", response.tool_calls[0]["function"]["name"])
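
After the model requests a tool, the application usually has to run the matching function itself. The following is a minimal sketch of that dispatch step. It assumes an OpenAI-style tool call whose `function` entry carries an `arguments` field (a JSON string or a dict); the source only shows the `name` field, so treat the `arguments` shape as an assumption. `get_weather`, `TOOL_REGISTRY`, and `dispatch_tool_call` are hypothetical names.

```python
import json


def get_weather(location: str) -> str:
    # Hypothetical local implementation; a real one would call a weather API.
    return f"Sunny in {location}"


# Map tool names (as declared in `tools`) to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(tool_call: dict) -> str:
    """Run the local function named by a tool call and return its result."""
    function = tool_call["function"]
    name = function["name"]
    if name not in TOOL_REGISTRY:
        raise KeyError(f"Unknown tool: {name}")
    arguments = function.get("arguments", {})
    # Assumption: arguments arrive as a JSON string or an already-parsed dict.
    if isinstance(arguments, str):
        arguments = json.loads(arguments)
    return TOOL_REGISTRY[name](**arguments)


# Stand-in for response.tool_calls[0]
example_call = {"function": {"name": "get_weather", "arguments": '{"location": "NYC"}'}}
result = dispatch_tool_call(example_call)
print(result)
```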

Advanced Features

Unified Structured Output (All Providers)

# Same JSON schema API works across ALL providers!
request = LLMRequest(
    model="anthropic:claude-3-5-sonnet",  # Works with any provider
    messages=[Message(role="user", content="Generate a person")],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "email": {"type": "string"}
                },
                "required": ["name", "age"]
            }
        },
        "strict": True  # Validates across all providers
    }
)

response = await service.chat(request)
print("JSON:", response.content)   # Valid JSON string
print("Data:", response.parsed)    # Python dict ready to use
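
It can still be worth sanity-checking the parsed payload before using it downstream. The sketch below checks required keys and basic types with the standard library only. It is not a full JSON Schema validator (a dedicated library would be the better choice for that), and `check_against_schema` is a hypothetical helper, not part of LLMRing. It assumes `response.parsed` is a plain dict, as the example above suggests.

```python
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
    },
    "required": ["name", "age"],
}

# Only the JSON types used in the schema above are mapped here.
_PY_TYPES = {"string": str, "integer": int}


def check_against_schema(data: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key not in data:
            continue
        expected = _PY_TYPES.get(spec.get("type"))
        if expected is None:
            continue  # type not covered by this sketch
        value = data[key]
        # bool is a subclass of int in Python, so reject it for integers.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            problems.append(f"wrong type for {key}: expected {spec['type']}")
    return problems


parsed = {"name": "Ada", "age": 36}  # stand-in for response.parsed
print(check_against_schema(parsed, PERSON_SCHEMA))
```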

Provider-Specific Parameters

# Anthropic: prompt caching (up to 90% savings on cached input tokens)
request = LLMRequest(
    model="anthropic:claude-3-5-sonnet",
    messages=[
        Message(
            role="system",
            content="Very long system prompt...",  # 1024+ tokens
            metadata={"cache_control": {"type": "ephemeral"}}
        ),
        Message(role="user", content="Hello")
    ]
)

# Extra parameters for provider-specific features
request = LLMRequest(
    model="openai:gpt-4o",
    messages=[Message(role="user", content="Hello")],
    extra_params={
        "logprobs": True,
        "top_logprobs": 5,
        "presence_penalty": 0.1,
        "seed": 12345
    }
)

Model Aliases

# Initialize lockfile with smart defaults (run in a shell)
llmring lock init

# Use semantic aliases instead of specific models (Python)
# Pass exactly one alias as `model`:
#   "deep"      → claude-3-opus (powerful reasoning)
#   "fast"      → gpt-4o-mini (quick responses)
#   "balanced"  → claude-3-5-sonnet (best overall)
request = LLMRequest(
    model="balanced",
    messages=[Message(role="user", content="Hello")]
)
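
Conceptually, an alias is one more level of indirection in front of a `provider:model` reference. The sketch below illustrates that idea with an in-memory table. The lockfile format is not shown here, so `ALIASES` and `resolve_model` are hypothetical, and the provider prefixes on the alias targets are assumptions based on the model names in the comments above.

```python
# Hypothetical in-memory alias table mirroring the comments above.
ALIASES = {
    "deep": "anthropic:claude-3-opus",
    "fast": "openai:gpt-4o-mini",
    "balanced": "anthropic:claude-3-5-sonnet",
}


def resolve_model(name: str, aliases: dict[str, str] = ALIASES) -> tuple[str, str]:
    """Resolve an alias or a "provider:model" string to (provider, model)."""
    target = aliases.get(name, name)
    provider, sep, model = target.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"Unknown alias or malformed model reference: {name!r}")
    return provider, model


print(resolve_model("fast"))
print(resolve_model("google:gemini-1.5-pro"))
```

Keeping the mapping in one place means application code can keep asking for "fast" while the concrete model behind it changes.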

Raw SDK Access (Escape Hatch)

When you need the full power of the underlying SDKs:

# Access any provider's raw client for maximum SDK features
openai_client = service.get_provider("openai").client      # openai.AsyncOpenAI
anthropic_client = service.get_provider("anthropic").client # anthropic.AsyncAnthropic
google_client = service.get_provider("google").client       # google.genai.Client
ollama_client = service.get_provider("ollama").client       # ollama.AsyncClient

# Use any SDK feature not exposed by LLMRing
response = await openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,
    top_logprobs=10,
    parallel_tool_calls=False,
    # Any OpenAI parameter
)

# Anthropic with all SDK features
response = await anthropic_client.messages.create(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100,
    top_p=0.9,
    top_k=40,
    system=[{
        "type": "text",
        "text": "You are helpful",
        "cache_control": {"type": "ephemeral"}
    }]
)

# Google with native SDK features (google-genai takes generation and
# safety options through a single `config` argument)
response = google_client.models.generate_content(
    model="gemini-1.5-pro",
    contents="Hello",
    config={
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 40,
        "candidate_count": 3,
        "safety_settings": [{
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE"
        }]
    }
)

When to use raw clients:

  • Advanced SDK features not in LLMRing
  • Provider-specific optimizations
  • Complex configurations
  • Performance-critical applications

๐ŸŒ Provider Support

Provider Models Streaming Tools Special Features
OpenAI GPT-4o, GPT-4o-mini, o1 โœ… Real โœ… Native JSON schema, PDF processing
Anthropic Claude 3.5 Sonnet/Haiku โœ… Real โœ… Native Prompt caching, large context
Google Gemini 1.5/2.0 Pro/Flash โœ… Real โœ… Native Multimodal, 2M+ context
Ollama Llama, Mistral, etc. โœ… Real ๐Ÿ”ง Prompt Local models, custom options

Setup

Environment Variables

# Add to your .env file
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GEMINI_API_KEY=AIza...

# Optional
OLLAMA_BASE_URL=http://localhost:11434  # Default
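
Before creating the service, it can help to check which provider keys are actually visible to the process. The sketch below inspects the process environment only. Whether LLMRing loads a `.env` file on its own is not stated here, so load the file first if needed. `PROVIDER_KEYS`, `configured_providers`, and `ollama_base_url` are hypothetical helpers built from the variable names listed above.

```python
import os
from typing import Mapping

# Variable names as listed above; Ollama needs no key, only an optional base URL.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_GEMINI_API_KEY",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return providers whose API key variable is set to a non-empty value."""
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


def ollama_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Return OLLAMA_BASE_URL, falling back to the default shown above."""
    return env.get("OLLAMA_BASE_URL") or "http://localhost:11434"


print("Configured providers:", configured_providers())
print("Ollama URL:", ollama_base_url())
```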

Dependencies

# Required for specific providers
# (quotes keep the shell from treating ">" as a redirect)
pip install "openai>=1.0"      # OpenAI
pip install "anthropic>=0.67"  # Anthropic
pip install google-genai       # Google Gemini
pip install "ollama>=0.4"      # Ollama

MCP Integration

from llmring.mcp.client.enhanced_llm import create_enhanced_llm
from llmring.schemas import Message

# Create MCP-enabled LLM with tool ecosystem
llm = await create_enhanced_llm(
    model="openai:gpt-4o",
    mcp_server_path="path/to/mcp/server"
)

# Now has access to MCP tools
response = await llm.chat([
    Message(role="user", content="Use available tools to help me")
])

Documentation

Development

# Install for development
uv sync --group dev

# Run tests
uv run pytest

# Lint and format
uv run ruff check src/
uv run ruff format src/

๐Ÿ› ๏ธ Error Handling

LLMRing uses typed exceptions for better error handling:

from llmring.exceptions import (
    ProviderAuthenticationError,
    ModelNotFoundError,
    ProviderRateLimitError,
    ProviderTimeoutError
)

try:
    response = await service.chat(request)
except ProviderAuthenticationError:
    print("Invalid API key")
except ModelNotFoundError:
    print("Model not supported")
except ProviderRateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after}s")
except ProviderTimeoutError:
    print("Request timed out")
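
Rate-limit errors are usually worth retrying rather than just reporting. The sketch below shows one way to do that with exponential backoff, preferring the server's `retry_after` hint when it is present. It runs without the library: `FakeRateLimitError`, `call_with_retry`, and `flaky_call` are hypothetical, and the only thing assumed about the real exception is the `retry_after` attribute used above.

```python
import asyncio


class FakeRateLimitError(Exception):
    """Hypothetical stand-in for ProviderRateLimitError with a retry_after hint."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after


async def call_with_retry(make_call, max_attempts=3, base_delay=0.01,
                          retry_on=(FakeRateLimitError,)):
    """Await make_call(), retrying on rate limits with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Prefer the server's hint when present, else back off exponentially.
            delay = getattr(exc, "retry_after", None) or base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay)


attempts = {"count": 0}


async def flaky_call():
    # Fails twice, then succeeds; stands in for service.chat(request).
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimitError(retry_after=0.01)
    return "ok"


result = asyncio.run(call_with_retry(flaky_call))
print(result, "after", attempts["count"], "attempts")
```

With the real library, this would presumably be called as `call_with_retry(lambda: service.chat(request), retry_on=(ProviderRateLimitError,))` with a larger `base_delay`.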

Key Benefits

  • Unified Interface: Switch providers without code changes
  • Performance: Real streaming, prompt caching, optimized requests
  • Reliability: Circuit breakers, retries, typed error handling
  • Observability: Cost tracking, usage analytics, receipt generation
  • Flexibility: Provider-specific features + raw SDK access
  • Standards: Type-safe, well-tested, production-ready

License

MIT License - see LICENSE file for details.

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for your changes
  4. Ensure all tests pass: uv run pytest
  5. Submit a pull request

Examples

See the examples/ directory for complete working examples:

  • Basic chat and streaming
  • Tool calling and function execution
  • Provider-specific features
  • MCP integration
  • Cost tracking and receipts

LLMRing: The comprehensive LLM library for Python developers
