A unified Python library for Large Language Model (LLM) providers with real-time streaming, function calling, middleware support, automatic session tracking, dynamic model discovery, and intelligent system prompt generation.
chuk-llm
The intelligent model capability engine. Production-ready Python library with dynamic model discovery, capability-based selection, real-time streaming, and Pydantic-native architecture.
from chuk_llm import quick_question
print(quick_question("What is 2+2?")) # "2 + 2 equals 4."
✨ What's New in v0.14
Revolutionary Registry System:
- 🧠 Dynamic Model Discovery - No more hardcoded model lists, automatic capability detection
- 🎯 Intelligent Selection - Find models by capabilities, cost, and quality tier
- 🔍 Smart Queries - find_best(requires_tools=True, quality_tier="cheap")
- 🏗️ Pydantic V2 Native - Type-safe models throughout, no dictionary goop
- ⚡ Async-First Architecture - True async/await with sync wrappers for convenience
- 📊 Layered Capability Resolution - Heuristics → YAML cache → Provider APIs
- 🚀 Zero-Config - Pull a new Ollama model, use it immediately
Latest Models (December 2025):
- 🤖 Gemini 2.5/3 Pro - 1M token context, adaptive thinking, multimodal (gemini-2.5-flash, gemini-3-pro-preview)
- 🚀 Mistral Large 3 - 675B MoE, 41B active, Apache 2.0 (mistral-large-2512, ministral-8b-2512, ministral-14b-2512)
- 💡 DeepSeek V3.2 - 671B MoE, ultra-efficient at $0.27/M tokens (deepseek-chat, deepseek-reasoner)
Performance:
- ⚡ 52x faster imports - Lazy loading reduces import time from 735ms to 14ms
- 🚀 112x faster client creation - Automatic thread-safe caching
- 📊 <0.015% overhead - Negligible library overhead vs API latency
See REGISTRY_COMPLETE.md for architecture details.
Why chuk-llm?
- 🧠 Intelligent: Dynamic registry selects models by capabilities, not names
- 🔍 Auto-Discovery: Pull new models, use immediately - no configuration needed
- ⚡ Lightning Fast: Massive performance improvements (see Performance)
- 🛠️ Clean Tools API: Function calling without complexity - tools are just parameters
- 🏗️ Type-Safe: Pydantic V2 models throughout, no dictionary goop
- ⚡ Async-Native: True async/await with sync wrappers when needed
- 📊 Built-in Analytics: Automatic cost and usage tracking with session isolation
- 🎯 Production-Ready: Thread-safe caching, connection pooling, negligible overhead
Quick Start
Installation
# Core functionality
pip install chuk_llm
# Or with extras
pip install chuk_llm[redis] # Persistent sessions
pip install chuk_llm[cli] # Enhanced CLI experience
pip install chuk_llm[all] # Everything
Basic Usage
# Simplest approach - auto-detects available providers
from chuk_llm import quick_question
answer = quick_question("Explain quantum computing in one sentence")
# Provider-specific (auto-generated functions!)
from chuk_llm import ask_openai_sync, ask_claude_sync, ask_ollama_llama3_2_sync
response = ask_openai_sync("Tell me a joke")
response = ask_claude_sync("Write a haiku")
response = ask_ollama_llama3_2_sync("Explain Python") # Auto-discovered!
Latest Models (December 2025)
from chuk_llm import ask
# Gemini 3 Pro - Advanced reasoning with 1M context
response = await ask(
"Explain consciousness vs intelligence in AI",
provider="gemini",
model="gemini-3-pro-preview"
)
# Mistral Large 3 - 675B MoE, Apache 2.0
response = await ask(
"Write a Python function for binary search",
provider="mistral",
model="mistral-large-2512"
)
# Ministral 8B - Fast, efficient, cost-effective
response = await ask(
"Summarize this text",
provider="mistral",
model="ministral-8b-2512"
)
# DeepSeek V3.2 - Ultra-efficient at $0.27/M tokens
response = await ask(
"Solve this math problem step by step",
provider="deepseek",
model="deepseek-chat"
)
Async & Streaming
import asyncio
from chuk_llm import ask, stream
async def main():
    # Async call
    response = await ask("What's the capital of France?")

    # Real-time streaming
    async for chunk in stream("Write a story"):
        print(chunk, end="", flush=True)

asyncio.run(main())
Function Calling (Tools)
from chuk_llm import ask
from chuk_llm.api.tools import tools_from_functions
def get_weather(location: str) -> dict:
    return {"temp": 22, "location": location, "condition": "sunny"}
# Tools are just a parameter!
toolkit = tools_from_functions(get_weather)
response = await ask(
"What's the weather in Paris?",
tools=toolkit.to_openai_format()
)
print(response) # Returns dict with tool_calls when tools provided
CLI Usage
# Quick commands with global aliases
chuk-llm ask_gpt "What is Python?"
chuk-llm ask_claude "Explain quantum computing"
# Auto-discovered Ollama models work instantly
chuk-llm ask_ollama_gemma3 "Hello world"
chuk-llm stream_ollama_mistral "Write a long story"
# llama.cpp with automatic model resolution
chuk-llm ask "What is Python?" --provider llamacpp --model qwen3
chuk-llm ask "Count to 5" --provider llamacpp --model llama3.2
# Discover new models
chuk-llm discover ollama
🧠 Dynamic Registry System
The registry is the intelligent core of chuk-llm. Instead of hardcoding model names, it dynamically discovers models and their capabilities, then selects the best one for your needs.
Intelligent Model Selection
from chuk_llm.registry import get_registry
from chuk_llm import ask
# Get the registry (auto-discovers all available models)
registry = await get_registry()
# Find the best cheap model with tool support
model = await registry.find_best(
requires_tools=True,
quality_tier="cheap"
)
print(f"Selected: {model.spec.provider}:{model.spec.name}")
# Selected: groq:llama-3.3-70b-versatile
# Use the selected model with ask()
response = await ask(
"Summarize this document",
provider=model.spec.provider,
model=model.spec.name
)
# Find best model for vision with large context
model = await registry.find_best(
requires_vision=True,
min_context=128_000,
quality_tier="balanced"
)
# Returns: openai:gpt-4o-mini or gemini:gemini-2.0-flash-exp
# Custom queries with multiple requirements
from chuk_llm.registry import ModelQuery
results = await registry.query(ModelQuery(
requires_tools=True,
requires_vision=True,
min_context=100_000,
max_cost_per_1m_input=2.0,
quality_tier="balanced"
))
How It Works
3-Tier Capability Resolution:
- Heuristic Resolver - Infers capabilities from model name patterns (e.g., "gpt-4" → likely supports tools)
- YAML Cache - Tested capabilities stored in registry/capabilities/*.yaml for fast, reliable access
- Provider APIs - Queries provider APIs dynamically (Ollama /api/tags, Gemini models API, etc.)
Dynamic Discovery Sources:
- OpenAI /v1/models API
- Anthropic known models
- Google Gemini models API
- Ollama /api/tags (local models)
- llama.cpp /v1/models (local GGUF + Ollama bridge)
- DeepSeek /v1/models API
- Moonshot AI /v1/models API
- Groq, Mistral, Perplexity, and more
Provider APIs are cached on disk and refreshed periodically (or via chuk-llm discover), so new models appear without needing a chuk-llm release.
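Conceptually, each tier layers on top of the previous one, refining a capability guess. The snippet below is only a toy illustration of that layering idea - the names are hypothetical, not chuk-llm's actual resolver classes:
# Toy sketch of layered capability resolution (hypothetical names, not chuk-llm internals)
from typing import Awaitable, Callable

Caps = dict[str, bool]
Layer = Callable[[str, Caps], Awaitable[Caps]]

async def heuristic_layer(model_name: str, caps: Caps) -> Caps:
    # Cheap guess from the model name pattern
    return {**caps, "tools": "gpt-4" in model_name or "gpt-5" in model_name}

async def resolve(model_name: str, layers: list[Layer]) -> Caps:
    caps: Caps = {}
    for layer in layers:  # e.g. [heuristic_layer, yaml_cache_layer, provider_api_layer]
        caps = await layer(model_name, caps)  # later, more authoritative layers refine earlier guesses
    return caps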
Benefits:
- ✅ No hardcoded model lists - Pull new Ollama models, use immediately
- ✅ Capability-based selection - Declare requirements, not model names
- ✅ Cost-aware - Find cheapest model that meets requirements
- ✅ Quality tiers - BEST, BALANCED, CHEAP classification
- ✅ Extensible - Add custom sources and resolvers via protocols
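Because sources and resolvers are plain protocols, you can plug in your own discovery source. The sketch below is illustrative only - the class shape and return format are assumptions, not the documented protocol (see REGISTRY_COMPLETE.md for the real interfaces):
# Hypothetical sketch of a custom discovery source - names and return shape are
# illustrative, not the actual chuk_llm.registry protocol
import httpx

class InternalGatewaySource:
    """Discover models from an in-house OpenAI-compatible gateway."""

    def __init__(self, base_url: str):
        self.base_url = base_url

    async def discover(self) -> list[dict]:
        # Most OpenAI-compatible servers expose GET /v1/models
        async with httpx.AsyncClient() as client:
            resp = await client.get(f"{self.base_url}/v1/models")
            resp.raise_for_status()
            return [{"provider": "internal", "name": m["id"]} for m in resp.json()["data"]]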
Key Features
🔍 Automatic Model Discovery
Pull new Ollama models and use them immediately - no configuration needed:
# Terminal 1: Pull a new model
ollama pull llama3.2
ollama pull mistral-small:latest
# Terminal 2: Use immediately in Python
from chuk_llm import ask_ollama_llama3_2_sync, ask_ollama_mistral_small_latest_sync
response = ask_ollama_llama3_2_sync("Hello!")
# Or via CLI
chuk-llm ask_ollama_mistral_small_latest "Tell me a joke"
🦙 llama.cpp Integration
Run local GGUF models with advanced control via llama.cpp server. Reuse Ollama's downloaded models without re-downloading!
CLI Usage (✨ Now fully supported!):
# Simple usage - model names automatically resolve to GGUF files
chuk-llm ask "What is Python?" --provider llamacpp --model qwen3
chuk-llm ask "Count to 5" --provider llamacpp --model llama3.2
# Streaming (default)
chuk-llm ask "Write a story" --provider llamacpp --model qwen3
# Non-streaming
chuk-llm ask "Quick question" --provider llamacpp --model qwen3 --no-stream
Python API (Simple - Recommended):
from chuk_llm import ask
# Model names automatically resolve to Ollama's GGUF files!
response = await ask(
"What is Python?",
provider="llamacpp",
model="qwen3" # Auto-resolves to ~/.ollama/models/blobs/sha256-xxx
)
print(response)
# Streaming
from chuk_llm import stream
async for chunk in stream("Tell me a story", provider="llamacpp", model="llama3.2"):
print(chunk, end="", flush=True)
Python API (Advanced - Full Control):
from chuk_llm.registry.resolvers.llamacpp_ollama import discover_ollama_models
from chuk_llm.llm.providers.llamacpp_client import LlamaCppLLMClient
from chuk_llm.core import Message, MessageRole
# Discover Ollama models (finds GGUF blobs in ~/.ollama/models/blobs/)
models = discover_ollama_models()
print(f"Found {len(models)} Ollama models") # e.g., "Found 48 Ollama models"
# Create client with auto-managed server
client = LlamaCppLLMClient(
model=str(models[0].gguf_path), # Reuse Ollama's GGUF!
ctx_size=8192,
n_gpu_layers=-1, # Use all GPU layers
)
messages = [Message(role=MessageRole.USER, content="Hello!")]
result = await client.create_completion(messages=messages)
print(result["response"])
# Cleanup
await client.stop_server()
Key Features:
- ✅ CLI Support - Full integration with chuk-llm CLI (model name resolution)
- ✅ Ollama Bridge - Automatically discovers and reuses Ollama's downloaded models (no re-download!)
- ✅ Auto-Resolution - Model names (qwen3, llama3.2) resolve to GGUF file paths automatically
- ✅ Process Management - Auto-managed server lifecycle (start/stop/health checks)
- ✅ OpenAI-Compatible - Uses standard OpenAI client (streaming, tools, etc.)
- ✅ High Performance - Benchmarks show llama.cpp is 1.53x faster than Ollama (311 vs 204 tok/s)
- ✅ Advanced Control - Custom sampling, grammars, GPU layers, context size
- ✅ Cross-Platform - Works on macOS, Linux, Windows
Performance Comparison (same GGUF file, qwen3:0.6b):
- llama.cpp: 311.4 tok/s
- Ollama: 204.2 tok/s
- llama.cpp is 1.53x faster!
See examples/providers/llamacpp_ollama_usage_examples.py and examples/providers/benchmark_ollama_vs_llamacpp.py for full examples.
📊 Automatic Session Tracking
Every call is automatically tracked for analytics:
from chuk_llm import ask_sync, get_session_stats
ask_sync("What's the capital of France?")
ask_sync("What's 2+2?")
stats = get_session_stats()
print(f"Total cost: ${stats['estimated_cost']:.6f}")
print(f"Total tokens: {stats['total_tokens']}")
🎭 Stateful Conversations
Build conversational AI with memory:
from chuk_llm import conversation
async with conversation() as chat:
    await chat.ask("My name is Alice")
    response = await chat.ask("What's my name?")
    # AI responds: "Your name is Alice"
⚡ Concurrent Execution
Run multiple queries in parallel for massive speedups:
import asyncio
from chuk_llm import ask
# 3-7x faster than sequential!
responses = await asyncio.gather(
ask("What is AI?"),
ask("Capital of Japan?"),
ask("Meaning of life?")
)
Supported Providers
All providers are dynamically discovered via the registry system - no hardcoded model lists!
| Provider | Discovery Method | Special Features | Status |
|---|---|---|---|
| OpenAI | /v1/models API | GPT-5 / GPT-5.1, o3-family reasoning, industry standard | ✅ Dynamic |
| Azure OpenAI | Deployment config | SOC2, HIPAA compliant, VNet, multi-region | ✅ Dynamic |
| Anthropic | Known models† | Claude 3.5 Sonnet, advanced reasoning, 200K context | ✅ Static |
| Google Gemini | Models API | Gemini 2.5/3 Pro, 1M token context, adaptive thinking, multimodal | ✅ Dynamic |
| Groq | /v1/models API | Llama 3.3, ultra-fast (our benchmarks: ~526 tok/s) | ✅ Dynamic |
| Ollama | /api/tags | Any local model, auto-discovery, offline, privacy | ✅ Dynamic |
| llama.cpp | /v1/models | Local GGUF models, Ollama bridge, advanced control | ✅ Dynamic |
| IBM watsonx | Known models† | Granite 3.3, enterprise, on-prem, compliance | ✅ Static |
| Perplexity | Known models† | Sonar, real-time web search, citations | ✅ Static |
| Mistral | Known models† | Large 3 (675B MoE), Ministral 3 (3B/8B/14B), Apache 2.0 | ✅ Static |
| DeepSeek | /v1/models API | DeepSeek V3.2 (671B MoE), ultra-efficient, $0.27/M tokens | ✅ Dynamic |
| Moonshot AI | /v1/models API | Kimi K2, 256K context, coding, Chinese language | ✅ Dynamic |
| OpenRouter | Known models† | Access to 100+ models via single API | ✅ Static |
† Static = discovered from curated model list + provider docs, not via /models endpoint
Capabilities (auto-detected by registry):
- ✅ Streaming responses
- ✅ Function calling / tool use
- ✅ Vision / multimodal inputs
- ✅ JSON mode / structured outputs
- ✅ Async and sync interfaces
- ✅ Automatic client caching
- ✅ Session tracking
- ✅ Conversation management
Configuration
Environment Variables
# API Keys - Cloud Providers
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..." # For Gemini 2.5/3 models
export GROQ_API_KEY="..."
export DEEPSEEK_API_KEY="..." # For DeepSeek V3.2 (chat/reasoner)
export MOONSHOT_API_KEY="..."
export MISTRAL_API_KEY="..." # For Mistral Large 3 & Ministral 3
# Azure Configuration
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
# Local Servers
# (No API keys needed for Ollama or llama.cpp)
# Session Storage (optional)
export SESSION_PROVIDER=redis # Default: memory
export SESSION_REDIS_URL=redis://localhost:6379/0
# Performance Settings
export CHUK_LLM_CACHE_CLIENTS=1 # Enable client caching (default: 1)
export CHUK_LLM_AUTO_DISCOVER=true # Auto-discover new models (default: true)
Python Configuration
from chuk_llm import configure, ask_sync
configure(
provider="azure_openai",
model="gpt-4o-mini",
temperature=0.7
)
# All subsequent calls use these settings
response = ask_sync("Hello!")
Client Caching (Advanced)
Automatic client caching is enabled by default for maximum performance:
from chuk_llm.llm.client import get_client
# First call creates client (~12ms)
client1 = get_client("openai", model="gpt-4o")
# Subsequent calls return cached instance (~125µs)
client2 = get_client("openai", model="gpt-4o")
assert client1 is client2 # Same instance!
# Disable caching for specific call
client3 = get_client("openai", model="gpt-4o", use_cache=False)
# Monitor cache performance
from chuk_llm.client_registry import print_registry_stats
print_registry_stats()
# Cache statistics:
# - Total clients: 1
# - Cache hits: 1
# - Cache misses: 1
# - Hit rate: 50.0%
Advanced Features
🛠️ Function Calling / Tool Use
ChukLLM provides a clean, unified API for function calling. Recommended approach: Use the Tools class for automatic execution.
from chuk_llm import Tools, tool
# Recommended: Class-based tools with auto-execution
class MyTools(Tools):
    @tool(description="Get weather for a city")
    def get_weather(self, location: str) -> dict:
        return {"temp": 22, "location": location, "condition": "sunny"}

    @tool  # Description auto-extracted from docstring
    def calculate(self, expr: str) -> float:
        """Evaluate a mathematical expression"""
        return eval(expr)
# Auto-executes tools and returns final response
tools = MyTools()
response = await tools.ask("What's the weather in Paris and what's 2+2?")
print(response) # "The weather in Paris is 22°C and sunny. 2+2 equals 4."
# Sync version
response = tools.ask_sync("Calculate 15 * 4")
print(response) # "15 * 4 equals 60"
Alternative: Direct API usage (for more control):
from chuk_llm import ask
from chuk_llm.api.tools import tools_from_functions
def get_weather(location: str) -> dict:
    """Get weather information for a location"""
    return {"temp": 22, "location": location}
# Create toolkit
toolkit = tools_from_functions(get_weather)
# Returns dict with tool_calls - you handle execution
response = await ask(
"What's the weather in Paris?",
tools=toolkit.to_openai_format()
)
print(response) # {"response": "...", "tool_calls": [...]}
Streaming with Tools
from chuk_llm import stream
# Streaming with tools
async for chunk in stream(
    "What's the weather in Tokyo?",
    tools=toolkit.to_openai_format(),
    return_tool_calls=True  # Include tool calls in stream
):
    if isinstance(chunk, dict):
        print(f"Tool call: {chunk['tool_calls']}")
    else:
        print(chunk, end="", flush=True)
🌳 Conversation Branching
from chuk_llm import conversation

async with conversation() as chat:
    await chat.ask("Planning a vacation")

    # Explore different options
    async with chat.branch() as japan_branch:
        await japan_branch.ask("Tell me about Japan")

    async with chat.branch() as italy_branch:
        await italy_branch.ask("Tell me about Italy")

    # Main conversation unaffected by branches
    await chat.ask("I'll go with Japan!")
📈 Provider Comparison
from chuk_llm import compare_providers
results = compare_providers(
"Explain quantum computing",
["openai", "anthropic", "groq", "ollama"]
)
for provider, response in results.items():
    print(f"{provider}: {response[:100]}...")
🎯 Intelligent System Prompts
ChukLLM automatically generates optimized system prompts based on provider capabilities:
# Each provider gets optimized prompts
response = ask_claude_sync("Help me code", tools=tools)
# Claude gets: "You are Claude, an AI assistant created by Anthropic..."
response = ask_openai_sync("Help me code", tools=tools)
# OpenAI gets: "You are a helpful assistant with function calling..."
CLI Commands
# Quick access to any model
chuk-llm ask_gpt "Your question"
chuk-llm ask_claude "Your question"
chuk-llm ask_ollama_llama3_2 "Your question"
# llama.cpp with automatic model resolution
chuk-llm ask "Your question" --provider llamacpp --model qwen3
chuk-llm ask "Your question" --provider llamacpp --model llama3.2
# Discover and test
chuk-llm discover ollama # Find new models
chuk-llm test llamacpp # Test llamacpp provider
chuk-llm test azure_openai # Test connection
chuk-llm providers # List all providers
chuk-llm models ollama # Show available models
chuk-llm functions # List all generated functions
# Advanced usage
chuk-llm ask "Question" --provider azure_openai --model gpt-4o-mini --json
chuk-llm ask "Question" --provider llamacpp --model qwen3 --no-stream
chuk-llm ask "Question" --stream --verbose
# Function calling / Tool use from CLI
chuk-llm ask "Calculate 15 * 4" --tools calculator_tools.py
chuk-llm stream "What's the weather?" --tools weather_tools.py --return-tool-calls
# Zero-install with uvx
uvx chuk-llm ask_claude "Hello world"
uvx chuk-llm ask "Question" --provider llamacpp --model qwen3
Performance
chuk-llm is designed for high throughput with negligible overhead:
Key Metrics
| Operation | Time | Notes |
|---|---|---|
| Import | 14ms | 52x faster than eager loading |
| Client creation (cached) | 125µs | 112x faster, thread-safe |
| Request overhead | 50-140µs | <0.015% of typical API call |
Production Features
- Automatic client caching - Thread-safe, 112x faster repeated operations
- Lazy imports - Only load what you use
- Connection pooling - Efficient HTTP/2 reuse
- Async-native - Built on asyncio for maximum throughput
- Smart caching - Model discovery results cached on disk
Benchmarks
Run comprehensive benchmarks:
uv run python benchmarks/benchmark_client_registry.py
uv run python benchmarks/llm_benchmark.py
See PERFORMANCE_OPTIMIZATIONS.md for detailed analysis and micro-benchmarks.
Architecture
ChukLLM uses a registry-driven, async-native architecture designed for scale:
🏗️ Core Design Principles
- Dynamic Registry - Models discovered and selected by capabilities, not names
- Pydantic V2 Native - Type-safe models throughout, no dictionary goop
- Async-First - Built on asyncio with sync wrappers for convenience
- Stateless Clients - Clients don't store conversation history; your application manages state
- Lazy Loading - Modules load on-demand for instant imports (14ms)
- Automatic Caching - Thread-safe client registry eliminates duplicate initialization
🔄 Request Flow
User Code
↓
import chuk_llm (14ms - lazy loading)
↓
get_client() (2µs - cached registry lookup)
↓
[Cached Client Instance]
↓
async ask() (~50µs - minimal overhead)
↓
Provider SDK (~50µs - efficient request building)
↓
HTTP Request (50-500ms - network I/O)
↓
Response Parsing (~50µs - orjson)
↓
Return to User
Total chuk-llm Overhead: ~150µs (<0.015% of API call)
🔐 Session Isolation
Important: Conversation history is NOT shared between calls. Each conversation is independent:
from chuk_llm.llm.client import get_client
from chuk_llm.core.models import Message
client = get_client("openai", model="gpt-4o")
# Conversation 1
conv1 = [Message(role="user", content="My name is Alice")]
response1 = await client.create_completion(conv1)
# Conversation 2 (completely separate)
conv2 = [Message(role="user", content="What's my name?")]
response2 = await client.create_completion(conv2)
# AI won't know the name - conversations are isolated!
Key Insights:
- ✅ Clients are stateless (safe to cache and share)
- ✅ Conversation state lives in YOUR application
- ✅ HTTP sessions shared for performance (connection pooling)
- ✅ No cross-conversation or cross-user leakage
- ✅ Thread-safe for concurrent use
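Because state lives in your application, multi-turn context is simply a message list you keep appending to. A minimal sketch - it assumes the completion result exposes its text under "response" (as in the llama.cpp example above) and that an "assistant" role string is accepted:
from chuk_llm.llm.client import get_client
from chuk_llm.core.models import Message

client = get_client("openai", model="gpt-4o-mini")

history = [Message(role="user", content="My name is Alice")]
reply = await client.create_completion(history)

# Your application carries the history forward
history.append(Message(role="assistant", content=reply["response"]))  # assumed result shape
history.append(Message(role="user", content="What's my name?"))
reply = await client.create_completion(history)  # now the model can answer "Alice"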
See CONVERSATION_ISOLATION.md for detailed architecture.
📦 Module Organization
chuk-llm/
├── api/ # Public API (ask, stream, conversation)
├── registry/ # ⭐ Dynamic model registry (THE BRAIN)
│ ├── core.py # ModelRegistry orchestrator
│ ├── models.py # Pydantic models (ModelSpec, ModelCapabilities)
│ ├── sources/ # Discovery sources (OpenAI, Ollama, Gemini, etc.)
│ └── resolvers/ # Capability resolvers (Heuristic, YAML, APIs)
├── core/ # Pydantic V2 models (Message, Tool, ContentPart)
│ ├── models.py # Core Pydantic models
│ ├── enums.py # Type-safe enums (Provider, Feature, etc.)
│ └── constants.py # Constants
├── llm/
│ ├── providers/ # 15+ provider implementations
│ ├── client.py # Client factory with registry integration
│ └── features.py # Feature detection
├── configuration/ # Unified configuration system
└── client_registry.py # Thread-safe client caching
Used by the CHUK Stack
chuk-llm is the canonical LLM layer for the entire CHUK ecosystem:
- chuk-ai-planner uses the registry to select planning vs drafting models by capability
- chuk-acp-agent uses capability-based policies per agent (e.g., "requires tools + 128k context")
- chuk-mcp-remotion uses it to pick video-script models with vision + long context
Instead of hardcoding "use GPT-4o", CHUK components declare what they need, and the registry finds the best available model.
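For example, a "requires tools + 128k context" policy maps directly onto the registry API shown earlier:
from chuk_llm.registry import get_registry
from chuk_llm import ask

registry = await get_registry()
model = await registry.find_best(requires_tools=True, min_context=128_000)

response = await ask(
    "Draft a release plan",
    provider=model.spec.provider,
    model=model.spec.name
)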
Documentation
- 📚 Full Documentation
- 🎯 Examples (33)
- ⚡ Performance Optimizations
- 🗄️ Client Registry
- 🔄 Lazy Imports
- 🔐 Conversation Isolation
- 📊 Registry System
- 🔧 Debug Tools - Test OpenAI-compatible API capabilities
- 🏗️ Migration Guide
- 🤝 Contributing
Quick Comparison
| Feature | chuk-llm | LangChain | LiteLLM | OpenAI SDK |
|---|---|---|---|---|
| Import speed | ⚡ 14ms | 🐌 1-2s | 🐌 500ms+ | ⚡ Fast |
| Client caching | ✅ Auto (112x) | ❌ | ❌ | ❌ |
| Auto-discovery | ✅ | ❌ | ❌ | ❌ |
| Native streaming | ✅ | ⚠️ | ✅ | ✅ |
| Function calling | ✅ Clean API | ✅ Complex | ⚠️ Basic | ✅ |
| Session tracking | ✅ Built-in | ⚠️ Manual | ❌ | ❌ |
| Session isolation | ✅ Guaranteed | ⚠️ Varies | ⚠️ Unclear | ⚠️ Manual |
| CLI included | ✅ | ❌ | ⚠️ Basic | ❌ |
| Provider functions | ✅ Auto-generated | ❌ | ❌ | ❌ |
| Conversations | ✅ Branching | ✅ | ❌ | ⚠️ Manual |
| Thread-safe | ✅ | ⚠️ Varies | ⚠️ | ✅ |
| Async-native | ✅ | ⚠️ Mixed | ✅ | ✅ |
| Setup complexity | Simple | Complex | Simple | Simple |
| Dependencies | Minimal | Heavy | Moderate | Minimal |
| Performance overhead | <0.015% | ~2-5% | ~1-2% | Minimal |
Installation Options
| Command | Features | Use Case |
|---|---|---|
| pip install chuk_llm | Core + Session tracking | Development |
| pip install chuk_llm[redis] | + Redis persistence | Production |
| pip install chuk_llm[cli] | + Rich CLI formatting | CLI tools |
| pip install chuk_llm[all] | Everything | Full features |
License
Apache 2.0 License - see LICENSE file for details.
Support
- 🐛 Issues
- 💬 Discussions
Built with ❤️ for developers who just want their LLMs to work.