
chuk_llm

A unified, production-ready Python library for Large Language Model (LLM) providers with real-time streaming, function calling, middleware support, and comprehensive provider management.

🚀 Features

Multi-Provider Support

  • OpenAI - GPT-4, GPT-3.5 with full API support
  • Anthropic - Claude 3.5 Sonnet, Claude 3 Haiku
  • Google Gemini - Gemini 2.0 Flash, Gemini 1.5 Pro
  • Groq - Lightning-fast inference with Llama models
  • Perplexity - Real-time web search with Sonar models
  • Ollama - Local model deployment and management

Core Capabilities

  • 🌊 Real-time Streaming - True streaming without buffering
  • 🛠️ Function Calling - Standardized tool/function execution
  • 🔧 Middleware Stack - Logging, metrics, caching, retry logic
  • 📊 Performance Monitoring - Built-in benchmarking and metrics
  • 🔄 Error Handling - Automatic retries with exponential backoff
  • 🎯 Type Safety - Full Pydantic validation and type hints
  • 🧩 Extensible Architecture - Easy to add new providers

Advanced Features

  • Vision Support - Image analysis across compatible providers (see the sketch after this list)
  • JSON Mode - Structured output generation
  • Real-time Web Search - Live information retrieval with citations
  • Parallel Function Calls - Execute multiple tools simultaneously
  • Connection Pooling - Efficient HTTP connection management
  • Configuration Management - Environment-based provider setup
  • Capability Detection - Automatic feature detection per provider
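
Vision input is worth a quick illustration. Below is a minimal sketch, assuming the library accepts OpenAI-style multimodal content parts (the exact message shape for images is an assumption here, not something this README confirms); JSON mode itself is shown later in the Unified Interface section.

from chuk_llm.llm.llm_client import get_llm_client

async def vision_example():
    client = get_llm_client("openai", model="gpt-4o")
    # Hypothetical multimodal message: a text part plus an image URL part
    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }]
    response = await client.create_completion(messages)
    print(response["response"])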

📦 Installation

pip install chuk_llm

Optional Dependencies

# For all providers
pip install chuk_llm[all]

# For specific providers
pip install chuk_llm[openai]       # OpenAI support
pip install chuk_llm[anthropic]    # Anthropic support  
pip install chuk_llm[google]       # Google Gemini support
pip install chuk_llm[groq]         # Groq support
pip install chuk_llm[perplexity]   # Perplexity support
pip install chuk_llm[ollama]       # Ollama support

🚀 Quick Start

Basic Usage

import asyncio
from chuk_llm.llm.llm_client import get_llm_client

async def main():
    # Get a client for any provider
    client = get_llm_client("openai", model="gpt-4o-mini")
    
    # Simple completion
    response = await client.create_completion([
        {"role": "user", "content": "Hello! How are you?"}
    ])
    
    print(response["response"])

asyncio.run(main())

Perplexity Web Search Example

async def perplexity_search_example():
    # Use Perplexity for real-time web information
    client = get_llm_client("perplexity", model="sonar-pro")
    
    messages = [
        {"role": "user", "content": "What are the latest developments in AI today?"}
    ]
    
    response = await client.create_completion(messages)
    print(response["response"])  # Includes real-time web search results with citations

asyncio.run(perplexity_search_example())

Streaming Responses

async def streaming_example():
    client = get_llm_client("openai", model="gpt-4o-mini")
    
    messages = [
        {"role": "user", "content": "Write a short story about AI"}
    ]
    
    async for chunk in client.create_completion(messages, stream=True):
        if chunk.get("response"):
            print(chunk["response"], end="", flush=True)

asyncio.run(streaming_example())

Function Calling

async def function_calling_example():
    client = get_llm_client("openai", model="gpt-4o-mini")
    
    # Define tools
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get weather information",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City name"},
                        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                    },
                    "required": ["location"]
                }
            }
        }
    ]
    
    response = await client.create_completion(
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=tools
    )
    
    if response.get("tool_calls"):
        for tool_call in response["tool_calls"]:
            print(f"Function: {tool_call['function']['name']}")
            print(f"Arguments: {tool_call['function']['arguments']}")

asyncio.run(function_calling_example())
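
The example above only inspects the requested calls. In practice you execute the tool and send its result back so the model can produce a final answer. A minimal sketch of that second round trip, assuming OpenAI-style "tool" result messages (the exact result format chuk_llm expects per provider is an assumption here):

import json

async def tool_round_trip(client, messages, tools, response):
    # Execute each requested tool locally and append the results
    for tool_call in response["tool_calls"]:
        args = json.loads(tool_call["function"]["arguments"])
        # Hypothetical local implementation of get_weather
        result = {"location": args["location"], "forecast": "sunny", "temp": 21}
        messages.append({"role": "assistant", "tool_calls": [tool_call]})
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.get("id"),
            "content": json.dumps(result),
        })
    # A second call lets the model turn tool output into a user-facing answer
    final = await client.create_completion(messages, tools=tools)
    print(final["response"])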

🔧 Configuration

Environment Variables

# API Keys
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export GOOGLE_API_KEY="your-google-key"
export GROQ_API_KEY="your-groq-key"
export PERPLEXITY_API_KEY="your-perplexity-key"

# Custom endpoints
export OPENAI_API_BASE="https://api.openai.com/v1"
export PERPLEXITY_API_BASE="https://api.perplexity.ai"
export OLLAMA_API_BASE="http://localhost:11434"

Provider Configuration

from chuk_llm.llm.configuration.provider_config import ProviderConfig

# Custom configuration
config = ProviderConfig({
    "openai": {
        "api_key": "your-key",
        "api_base": "https://custom-endpoint.com",
        "default_model": "gpt-4o"
    },
    "anthropic": {
        "api_key": "your-anthropic-key",
        "default_model": "claude-3-5-sonnet-20241022"
    },
    "perplexity": {
        "api_key": "your-perplexity-key",
        "default_model": "sonar-pro"
    }
})

client = get_llm_client("openai", config=config)

๐Ÿ› ๏ธ Advanced Usage

Middleware Stack

from chuk_llm.llm.middleware import LoggingMiddleware, MetricsMiddleware
from chuk_llm.llm.core.enhanced_base import get_enhanced_llm_client

# Create client with middleware
client = get_enhanced_llm_client(
    provider="openai",
    model="gpt-4o-mini",
    enable_logging=True,
    enable_metrics=True,
    enable_caching=True
)

# Use normally - middleware runs automatically
response = await client.create_completion(messages)

# Access metrics
if hasattr(client, 'middleware_stack'):
    for middleware in client.middleware_stack.middlewares:
        if hasattr(middleware, 'get_metrics'):
            print(middleware.get_metrics())

Multi-Provider Chat

from chuk_llm.llm.features import multi_provider_chat

# Compare responses across providers
responses = await multi_provider_chat(
    message="Explain quantum computing",
    providers=["openai", "anthropic", "perplexity", "groq"],
    model_map={
        "openai": "gpt-4o-mini",
        "anthropic": "claude-3-5-sonnet-20241022",
        "perplexity": "sonar-pro",
        "groq": "llama-3.3-70b-versatile"
    }
)

for provider, response in responses.items():
    print(f"{provider}: {response[:100]}...")

Real-time Information with Perplexity

async def current_events_example():
    # Perplexity excels at current information
    client = get_llm_client("perplexity", model="sonar-reasoning-pro")
    
    messages = [
        {"role": "user", "content": "What are the latest tech industry layoffs this week?"}
    ]
    
    response = await client.create_completion(messages)
    print("Real-time information with citations:")
    print(response["response"])

asyncio.run(current_events_example())

Unified Interface

from chuk_llm.llm.features import UnifiedLLMInterface

# High-level interface
interface = UnifiedLLMInterface("openai", "gpt-4o-mini")

# Simple chat
response = await interface.simple_chat("Hello!")

# Chat with options
response = await interface.chat(
    messages=[{"role": "user", "content": "Explain AI"}],
    temperature=0.7,
    max_tokens=500,
    json_mode=True
)

System Prompt Generation

from chuk_llm.llm.system_prompt_generator import (
    SystemPromptGenerator, 
    PromptStyle, 
    PromptContext
)

# Create generator
generator = SystemPromptGenerator(PromptStyle.FUNCTION_FOCUSED)

# Define tools
tools = {
    "functions": [
        {
            "name": "calculate",
            "description": "Perform calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                }
            }
        }
    ]
}

# Generate optimized prompt
prompt = generator.generate_for_provider(
    provider="openai",
    model="gpt-4o",
    tools=tools,
    user_instructions="You are a math tutor."
)

# Use in completion
messages = [
    {"role": "system", "content": prompt},
    {"role": "user", "content": "What is 15 * 23?"}
]

📊 Benchmarking

from benchmarks.llm_benchmark import LLMBenchmark

# Create benchmark
benchmark = LLMBenchmark()

# Test multiple providers
results = await benchmark.benchmark_multiple([
    ("openai", "gpt-4o-mini"),
    ("anthropic", "claude-3-5-sonnet-20241022"),
    ("perplexity", "sonar-pro"),
    ("groq", "llama-3.3-70b-versatile")
])

# Generate report
report = benchmark.generate_report(results)
print(report)

๐Ÿ” Provider Capabilities

from chuk_llm.llm.configuration.capabilities import PROVIDER_CAPABILITIES, Feature

# Check what a provider supports
openai_caps = PROVIDER_CAPABILITIES["openai"]
print(f"Supports streaming: {openai_caps.supports(Feature.STREAMING)}")
print(f"Supports vision: {openai_caps.supports(Feature.VISION)}")
print(f"Max context: {openai_caps.max_context_length}")

# Find best provider for requirements
from chuk_llm.llm.configuration.capabilities import CapabilityChecker

best = CapabilityChecker.get_best_provider({
    Feature.STREAMING, 
    Feature.TOOLS, 
    Feature.VISION
})
print(f"Best provider: {best}")

๐ŸŒ Provider Models

OpenAI

  • GPT-4 - gpt-4o, gpt-4o-mini, gpt-4-turbo
  • GPT-3.5 - gpt-3.5-turbo

Anthropic

  • Claude 3.5 - claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022
  • Claude 3 - claude-3-opus-20240229, claude-3-sonnet-20240229

Google Gemini

  • Gemini 2.0 - gemini-2.0-flash-exp
  • Gemini 1.5 - gemini-1.5-pro, gemini-1.5-flash

Groq

  • Llama 3.3 - llama-3.3-70b-versatile
  • Llama 3.1 - llama-3.1-70b-versatile, llama-3.1-8b-instant
  • Mixtral - mixtral-8x7b-32768

Perplexity 🔍

Perplexity offers specialized models optimized for real-time web search and reasoning with citations.

Search Models (Online)

  • sonar-pro - Premier search model built on Llama 3.3 70B, optimized for answer quality and speed (1200 tokens/sec)
  • sonar - Cost-effective model for quick factual queries and current events
  • llama-3.1-sonar-small-128k-online - 8B parameter model with 128k context, web search enabled
  • llama-3.1-sonar-large-128k-online - 70B parameter model with 128k context, web search enabled

Reasoning Models

  • sonar-reasoning-pro - Expert reasoning with Chain of Thought (CoT) and search capabilities
  • sonar-reasoning - Fast real-time reasoning model for quick problem-solving

Research Models

  • sonar-research - Deep research model that runs exhaustive searches and produces comprehensive reports

Chat Models (No Search)

  • llama-3.1-sonar-small-128k-chat - 8B parameter chat model without web search
  • llama-3.1-sonar-large-128k-chat - 70B parameter chat model without web search

Ollama

  • Local Models - Any compatible GGUF model (Llama, Mistral, CodeLlama, etc.)
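
Running against Ollama uses the same client API as the hosted providers. A quick sketch, assuming a model has already been pulled (e.g. with ollama pull llama3.1) and the Ollama daemon is serving on the default port configured via OLLAMA_API_BASE above:

from chuk_llm.llm.llm_client import get_llm_client

async def local_example():
    # No API key needed; the local Ollama daemon handles inference
    client = get_llm_client("ollama", model="llama3.1")
    response = await client.create_completion(
        [{"role": "user", "content": "Hello from a local model!"}]
    )
    print(response["response"])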

๐Ÿ—๏ธ Architecture

Core Components

  • BaseLLMClient - Abstract interface for all providers
  • MiddlewareStack - Request/response processing pipeline
  • ProviderConfig - Configuration management system
  • ConnectionPool - HTTP connection optimization
  • SystemPromptGenerator - Dynamic prompt generation

Provider Implementations

Each provider implements the BaseLLMClient interface (sketched after this list) with:

  • Standardized message format (ChatML)
  • Real-time streaming support
  • Function calling normalization
  • Error handling and retries
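
Concretely, that contract looks roughly like the sketch below. The signature matches the one shown under Adding New Providers; the exact abstract base module isn't shown in this README.

from abc import ABC, abstractmethod

class BaseLLMClient(ABC):
    @abstractmethod
    def create_completion(self, messages, tools=None, *, stream=False, **kwargs):
        """With stream=False, return an awaitable resolving to a response
        dict ({"response": ..., "tool_calls": ...}); with stream=True,
        return an async iterator of chunk dicts."""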

Middleware System

# Custom middleware example
from chuk_llm.llm.middleware import Middleware

class CustomMiddleware(Middleware):
    async def process_request(self, messages, tools=None, **kwargs):
        # Pre-process request
        return messages, tools, kwargs
    
    async def process_response(self, response, duration, is_streaming=False):
        # Post-process response
        return response
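
How a custom middleware gets attached isn't shown above. One plausible route, assuming the enhanced client exposes its stack the way the metrics example earlier does (treat this as a sketch, not a stable API):

from chuk_llm.llm.core.enhanced_base import get_enhanced_llm_client

client = get_enhanced_llm_client(provider="openai", model="gpt-4o-mini")

# Append to the existing stack, attribute shown in the Middleware Stack example
if hasattr(client, "middleware_stack"):
    client.middleware_stack.middlewares.append(CustomMiddleware())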

🧪 Testing & Diagnostics

# Extended streaming test
from diagnostics.streaming_extended import test_extended_streaming

await test_extended_streaming()

# Health check
from chuk_llm.llm.connection_pool import get_llm_health_status

health = await get_llm_health_status()
print(health)

📈 Performance

Streaming Performance

  • Zero-buffering streaming - Chunks delivered in real-time
  • Parallel requests - Multiple concurrent streams (see the sketch after this list)
  • Connection pooling - Reduced latency
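
A small sketch of running several streams concurrently with asyncio.gather; the client calls mirror the streaming example in Quick Start, and the rest is standard asyncio:

import asyncio
from chuk_llm.llm.llm_client import get_llm_client

async def consume(provider, model, prompt):
    client = get_llm_client(provider, model=model)
    parts = []
    async for chunk in client.create_completion(
        [{"role": "user", "content": prompt}], stream=True
    ):
        if chunk.get("response"):
            parts.append(chunk["response"])
    return "".join(parts)

async def parallel_streams():
    # Two concurrent streams sharing the pooled connections
    results = await asyncio.gather(
        consume("openai", "gpt-4o-mini", "Summarize HTTP/2 in one line"),
        consume("groq", "llama-3.3-70b-versatile", "Summarize HTTP/3 in one line"),
    )
    for text in results:
        print(text)

asyncio.run(parallel_streams())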

Benchmarks

Provider Comparison (avg response time):
├── Groq: 0.8s (ultra-fast inference)
├── Perplexity: 1.0s (real-time search + generation)
├── OpenAI: 1.2s (balanced performance)
├── Anthropic: 1.5s (high quality)
├── Gemini: 1.8s (multimodal)
└── Ollama: 2.5s (local processing)

Real-time Web Search Performance

Perplexity's Sonar models stream search-grounded answers at up to 1,200 tokens per second, nearly 10x faster than comparable models such as Gemini 2.0 Flash.

🔒 Security & Safety

  • API key management - Environment variable support (see the sketch after this list)
  • Request validation - Input sanitization
  • Error handling - No sensitive data leakage
  • Rate limiting - Built-in provider limit awareness
  • Tool name sanitization - Safe function calling
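
In practice, keys never need to be hardcoded: export the variables listed under Configuration and the client resolves them itself (via the api_key_env mapping shown under Adding New Providers). A quick sketch of relying on that behavior:

import os
from chuk_llm.llm.llm_client import get_llm_client

assert "OPENAI_API_KEY" in os.environ, "export OPENAI_API_KEY first"
client = get_llm_client("openai")  # key resolved from the environment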

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

Adding New Providers

# Implement BaseLLMClient
class NewProviderClient(BaseLLMClient):
    def create_completion(self, messages, tools=None, *, stream=False, **kwargs):
        # Implementation here
        pass

# Add to provider config
DEFAULTS["newprovider"] = {
    "client": "chuk_llm.llm.providers.newprovider_client:NewProviderClient",
    "api_key_env": "NEWPROVIDER_API_KEY",
    "default_model": "default-model"
}

📄 License

MIT License - see LICENSE file for details.

๐Ÿ™ Acknowledgments

  • OpenAI for the ChatML format and function calling standards
  • Anthropic for advanced reasoning capabilities
  • Google for multimodal AI innovations
  • Groq for ultra-fast inference
  • Perplexity for real-time web search and information retrieval
  • Ollama for local AI deployment

chuk_llm - Unified LLM interface for production applications
