# chuk_llm
A unified, production-ready Python library for Large Language Model (LLM) providers with real-time streaming, function calling, middleware support, and comprehensive provider management.
## Features

### Multi-Provider Support
- OpenAI - GPT-4, GPT-3.5 with full API support
- Anthropic - Claude 3.5 Sonnet, Claude 3 Haiku
- Google Gemini - Gemini 2.0 Flash, Gemini 1.5 Pro
- Groq - Lightning-fast inference with Llama models
- Perplexity - Real-time web search with Sonar models
- Ollama - Local model deployment and management
### Core Capabilities
- Real-time Streaming - True streaming without buffering
- Function Calling - Standardized tool/function execution
- Middleware Stack - Logging, metrics, caching, retry logic
- Performance Monitoring - Built-in benchmarking and metrics
- Error Handling - Automatic retries with exponential backoff
- Type Safety - Full Pydantic validation and type hints
- Extensible Architecture - Easy to add new providers
### Advanced Features
- Vision Support - Image analysis across compatible providers (see the sketch after this list)
- JSON Mode - Structured output generation
- Real-time Web Search - Live information retrieval with citations
- Parallel Function Calls - Execute multiple tools simultaneously
- Connection Pooling - Efficient HTTP connection management
- Configuration Management - Environment-based provider setup
- Capability Detection - Automatic feature detection per provider
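
For the vision item above, here is a minimal sketch of an image request. The `get_llm_client` call matches the Quick Start below; the `image_url` content-part shape is an assumption borrowed from the OpenAI multimodal convention, not confirmed by this README, so check your provider's documentation.

```python
import asyncio
from chuk_llm.llm.llm_client import get_llm_client

async def vision_example():
    client = get_llm_client("openai", model="gpt-4o-mini")

    # Assumed message shape: OpenAI-style multimodal content parts.
    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }]

    response = await client.create_completion(messages)
    print(response["response"])

asyncio.run(vision_example())
```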
## Installation

```bash
pip install chuk_llm
```
### Optional Dependencies
```bash
# For all providers
pip install chuk_llm[all]

# For specific providers
pip install chuk_llm[openai]      # OpenAI support
pip install chuk_llm[anthropic]   # Anthropic support
pip install chuk_llm[google]      # Google Gemini support
pip install chuk_llm[groq]        # Groq support
pip install chuk_llm[perplexity]  # Perplexity support
pip install chuk_llm[ollama]      # Ollama support
```
## Quick Start

### Basic Usage
```python
import asyncio
from chuk_llm.llm.llm_client import get_llm_client

async def main():
    # Get a client for any provider
    client = get_llm_client("openai", model="gpt-4o-mini")

    # Simple completion
    response = await client.create_completion([
        {"role": "user", "content": "Hello! How are you?"}
    ])
    print(response["response"])

asyncio.run(main())
```
### Perplexity Web Search Example
```python
async def perplexity_search_example():
    # Use Perplexity for real-time web information
    client = get_llm_client("perplexity", model="sonar-pro")

    messages = [
        {"role": "user", "content": "What are the latest developments in AI today?"}
    ]

    response = await client.create_completion(messages)
    print(response["response"])  # Includes real-time web search results with citations

asyncio.run(perplexity_search_example())
```
### Streaming Responses
```python
async def streaming_example():
    client = get_llm_client("openai", model="gpt-4o-mini")

    messages = [
        {"role": "user", "content": "Write a short story about AI"}
    ]

    async for chunk in client.create_completion(messages, stream=True):
        if chunk.get("response"):
            print(chunk["response"], end="", flush=True)

asyncio.run(streaming_example())
```
### Function Calling
```python
async def function_calling_example():
    client = get_llm_client("openai", model="gpt-4o-mini")

    # Define tools
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get weather information",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City name"},
                        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                    },
                    "required": ["location"]
                }
            }
        }
    ]

    response = await client.create_completion(
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=tools
    )

    if response.get("tool_calls"):
        for tool_call in response["tool_calls"]:
            print(f"Function: {tool_call['function']['name']}")
            print(f"Arguments: {tool_call['function']['arguments']}")

asyncio.run(function_calling_example())
```
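
A natural follow-up is executing the requested function locally and sending the result back for a final natural-language answer. The sketch below is hedged: the `assistant`/`tool` message shapes follow the OpenAI convention and are an assumption for this library, and the weather lookup is a stand-in.

```python
import json

async def complete_tool_round_trip(client, messages, tools, response):
    tool_call = response["tool_calls"][0]
    args = json.loads(tool_call["function"]["arguments"])

    # Stand-in for a real weather lookup.
    result = {"location": args["location"], "temperature_c": 18}

    # Assumed: OpenAI-style assistant/tool messages are accepted as-is.
    followup = messages + [
        {"role": "assistant", "content": None, "tool_calls": [tool_call]},
        {"role": "tool", "tool_call_id": tool_call["id"], "content": json.dumps(result)},
    ]
    final = await client.create_completion(followup, tools=tools)
    print(final["response"])
```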
## Configuration

### Environment Variables
```bash
# API Keys
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export GOOGLE_API_KEY="your-google-key"
export GROQ_API_KEY="your-groq-key"
export PERPLEXITY_API_KEY="your-perplexity-key"

# Custom endpoints
export OPENAI_API_BASE="https://api.openai.com/v1"
export PERPLEXITY_API_BASE="https://api.perplexity.ai"
export OLLAMA_API_BASE="http://localhost:11434"
```
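
Before constructing clients it can be worth verifying that the keys for the providers you intend to use are actually set; a small stdlib check (the list of required keys is up to you):

```python
import os

required = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "PERPLEXITY_API_KEY"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing API keys: {', '.join(missing)}")
```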
### Provider Configuration
```python
from chuk_llm.llm.configuration.provider_config import ProviderConfig

# Custom configuration
config = ProviderConfig({
    "openai": {
        "api_key": "your-key",
        "api_base": "https://custom-endpoint.com",
        "default_model": "gpt-4o"
    },
    "anthropic": {
        "api_key": "your-anthropic-key",
        "default_model": "claude-3-5-sonnet-20241022"
    },
    "perplexity": {
        "api_key": "your-perplexity-key",
        "default_model": "sonar-pro"
    }
})

client = get_llm_client("openai", config=config)
```
## Advanced Usage

### Middleware Stack
```python
from chuk_llm.llm.middleware import LoggingMiddleware, MetricsMiddleware
from chuk_llm.llm.core.enhanced_base import get_enhanced_llm_client

# Create client with middleware
client = get_enhanced_llm_client(
    provider="openai",
    model="gpt-4o-mini",
    enable_logging=True,
    enable_metrics=True,
    enable_caching=True
)

# Use normally - middleware runs automatically
response = await client.create_completion(messages)

# Access metrics
if hasattr(client, 'middleware_stack'):
    for middleware in client.middleware_stack.middlewares:
        if hasattr(middleware, 'get_metrics'):
            print(middleware.get_metrics())
```
### Multi-Provider Chat
```python
from chuk_llm.llm.features import multi_provider_chat

# Compare responses across providers
responses = await multi_provider_chat(
    message="Explain quantum computing",
    providers=["openai", "anthropic", "perplexity", "groq"],
    model_map={
        "openai": "gpt-4o-mini",
        "anthropic": "claude-3-5-sonnet-20241022",
        "perplexity": "sonar-pro",
        "groq": "llama-3.3-70b-versatile"
    }
)

for provider, response in responses.items():
    print(f"{provider}: {response[:100]}...")
```
### Real-time Information with Perplexity
```python
async def current_events_example():
    # Perplexity excels at current information
    client = get_llm_client("perplexity", model="sonar-reasoning-pro")

    messages = [
        {"role": "user", "content": "What are the latest tech industry layoffs this week?"}
    ]

    response = await client.create_completion(messages)
    print("Real-time information with citations:")
    print(response["response"])

asyncio.run(current_events_example())
```
### Unified Interface
```python
from chuk_llm.llm.features import UnifiedLLMInterface

# High-level interface
interface = UnifiedLLMInterface("openai", "gpt-4o-mini")

# Simple chat
response = await interface.simple_chat("Hello!")

# Chat with options
response = await interface.chat(
    messages=[{"role": "user", "content": "Explain AI"}],
    temperature=0.7,
    max_tokens=500,
    json_mode=True
)
```
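
Because `json_mode=True` asks the model for structured output, the reply can be parsed directly. A small sketch, assuming `response["response"]` carries the raw text as in the other examples; parsing defensively guards against the occasional malformed reply:

```python
import json

try:
    data = json.loads(response["response"])
except json.JSONDecodeError:
    data = None  # fall back to treating the reply as plain text
print(data)
```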
### System Prompt Generation
```python
from chuk_llm.llm.system_prompt_generator import (
    SystemPromptGenerator,
    PromptStyle,
    PromptContext
)

# Create generator
generator = SystemPromptGenerator(PromptStyle.FUNCTION_FOCUSED)

# Define tools
tools = {
    "functions": [
        {
            "name": "calculate",
            "description": "Perform calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                }
            }
        }
    ]
}

# Generate optimized prompt
prompt = generator.generate_for_provider(
    provider="openai",
    model="gpt-4o",
    tools=tools,
    user_instructions="You are a math tutor."
)

# Use in completion
messages = [
    {"role": "system", "content": prompt},
    {"role": "user", "content": "What is 15 * 23?"}
]
```
## Benchmarking
```python
from benchmarks.llm_benchmark import LLMBenchmark

# Create benchmark
benchmark = LLMBenchmark()

# Test multiple providers
results = await benchmark.benchmark_multiple([
    ("openai", "gpt-4o-mini"),
    ("anthropic", "claude-3-5-sonnet-20241022"),
    ("perplexity", "sonar-pro"),
    ("groq", "llama-3.3-70b-versatile")
])

# Generate report
report = benchmark.generate_report(results)
print(report)
```
## Provider Capabilities
```python
from chuk_llm.llm.configuration.capabilities import PROVIDER_CAPABILITIES, Feature

# Check what a provider supports
openai_caps = PROVIDER_CAPABILITIES["openai"]
print(f"Supports streaming: {openai_caps.supports(Feature.STREAMING)}")
print(f"Supports vision: {openai_caps.supports(Feature.VISION)}")
print(f"Max context: {openai_caps.max_context_length}")

# Find best provider for requirements
from chuk_llm.llm.configuration.capabilities import CapabilityChecker

best = CapabilityChecker.get_best_provider({
    Feature.STREAMING,
    Feature.TOOLS,
    Feature.VISION
})
print(f"Best provider: {best}")
```
## Provider Models

### OpenAI
- GPT-4 - gpt-4o, gpt-4o-mini, gpt-4-turbo
- GPT-3.5 - gpt-3.5-turbo
### Anthropic
- Claude 3.5 - claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022
- Claude 3 - claude-3-opus-20240229, claude-3-sonnet-20240229
### Google Gemini
- Gemini 2.0 - gemini-2.0-flash-exp
- Gemini 1.5 - gemini-1.5-pro, gemini-1.5-flash
### Groq
- Llama 3.3 - llama-3.3-70b-versatile
- Llama 3.1 - llama-3.1-70b-versatile, llama-3.1-8b-instant
- Mixtral - mixtral-8x7b-32768
### Perplexity
Perplexity offers specialized models optimized for real-time web search and reasoning with citations.
#### Search Models (Online)
- sonar-pro - Premier search model built on Llama 3.3 70B, optimized for answer quality and speed (1200 tokens/sec)
- sonar - Cost-effective model for quick factual queries and current events
- llama-3.1-sonar-small-128k-online - 8B parameter model with 128k context, web search enabled
- llama-3.1-sonar-large-128k-online - 70B parameter model with 128k context, web search enabled
#### Reasoning Models
- sonar-reasoning-pro - Expert reasoning with Chain of Thought (CoT) and search capabilities
- sonar-reasoning - Fast real-time reasoning model for quick problem-solving
#### Research Models
- sonar-research - Deep research model conducting exhaustive searches and comprehensive reports
#### Chat Models (No Search)
- llama-3.1-sonar-small-128k-chat - 8B parameter chat model without web search
- llama-3.1-sonar-large-128k-chat - 70B parameter chat model without web search
### Ollama
- Local Models - Any compatible GGUF model (Llama, Mistral, CodeLlama, etc.)
## Architecture

### Core Components
- `BaseLLMClient` - Abstract interface for all providers
- `MiddlewareStack` - Request/response processing pipeline
- `ProviderConfig` - Configuration management system
- `ConnectionPool` - HTTP connection optimization
- `SystemPromptGenerator` - Dynamic prompt generation
### Provider Implementations

Each provider implements the `BaseLLMClient` interface with:
- Standardized message format (ChatML; example below)
- Real-time streaming support
- Function calling normalization
- Error handling and retries
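
Concretely, the normalized ChatML format from the first bullet is a list of role/content dictionaries, the same shape used throughout the examples above:

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize ChatML in one line."},
    {"role": "assistant", "content": "A conversation as a list of role/content dicts."},
]
```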
### Middleware System
```python
# Custom middleware example
from chuk_llm.llm.middleware import Middleware

class CustomMiddleware(Middleware):
    async def process_request(self, messages, tools=None, **kwargs):
        # Pre-process request
        return messages, tools, kwargs

    async def process_response(self, response, duration, is_streaming=False):
        # Post-process response
        return response
```
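
A concrete instance of the same interface: a middleware that records per-call durations. The `process_request`/`process_response` signatures are taken directly from the example above; the duration bookkeeping itself is illustrative.

```python
class TimingMiddleware(Middleware):
    def __init__(self):
        self.durations = []

    async def process_request(self, messages, tools=None, **kwargs):
        # Nothing to modify on the way in.
        return messages, tools, kwargs

    async def process_response(self, response, duration, is_streaming=False):
        # `duration` is supplied by the middleware stack.
        self.durations.append(duration)
        return response
```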
## Testing & Diagnostics
```python
# Extended streaming test
from diagnostics.streaming_extended import test_extended_streaming
await test_extended_streaming()

# Health check
from chuk_llm.llm.connection_pool import get_llm_health_status
health = await get_llm_health_status()
print(health)
```
## Performance

### Streaming Performance
- Zero-buffering streaming - Chunks delivered in real-time
- Parallel requests - Multiple concurrent streams (see the sketch below)
- Connection pooling - Reduced latency
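
The parallel-requests point above is plain asyncio. A sketch fanning out several completions concurrently with the Quick Start client API, assuming `create_completion` returns awaitables when `stream=False`, as the earlier examples suggest:

```python
import asyncio
from chuk_llm.llm.llm_client import get_llm_client

async def fan_out(prompts):
    client = get_llm_client("openai", model="gpt-4o-mini")
    tasks = [
        client.create_completion([{"role": "user", "content": p}])
        for p in prompts
    ]
    # All requests run concurrently over the shared connection pool.
    return await asyncio.gather(*tasks)

results = asyncio.run(fan_out(["Hi!", "What is 2 + 2?", "Name a color."]))
for result in results:
    print(result["response"])
```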
### Benchmarks
```text
Provider Comparison (avg response time):
├── Groq:       0.8s (ultra-fast inference)
├── Perplexity: 1.0s (real-time search + generation)
├── OpenAI:     1.2s (balanced performance)
├── Anthropic:  1.5s (high quality)
├── Gemini:     1.8s (multimodal)
└── Ollama:     2.5s (local processing)
```
### Real-time Web Search Performance
Perplexity's Sonar models deliver search-grounded answers at up to 1,200 tokens per second, nearly 10x faster than comparable models such as Gemini 2.0 Flash.
## Security & Safety
- API key management - Environment variable support
- Request validation - Input sanitization
- Error handling - No sensitive data leakage
- Rate limiting - Built-in provider limit awareness
- Tool name sanitization - Safe function calling
## Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request
### Adding New Providers
```python
# Implement BaseLLMClient
class NewProviderClient(BaseLLMClient):
    def create_completion(self, messages, tools=None, *, stream=False, **kwargs):
        # Implementation here
        pass

# Add to provider config
DEFAULTS["newprovider"] = {
    "client": "chuk_llm.llm.providers.newprovider_client:NewProviderClient",
    "api_key_env": "NEWPROVIDER_API_KEY",
    "default_model": "default-model"
}
```
## License
MIT License - see LICENSE file for details.
## Acknowledgments
- OpenAI for the ChatML format and function calling standards
- Anthropic for advanced reasoning capabilities
- Google for multimodal AI innovations
- Groq for ultra-fast inference
- Perplexity for real-time web search and information retrieval
- Ollama for local AI deployment
*chuk_llm - Unified LLM interface for production applications*