dj-llm
A beautiful unified LLM interface for Django. One API for OpenAI, Anthropic, Google, Azure OpenAI, and Ollama with persistence, streaming, and production-ready features.
Inspired by RubyLLM.
Features
- Unified API - Same interface for OpenAI, Anthropic, Google, Azure OpenAI, and Ollama
- Django Integration - Built-in model persistence with Conversation and StoredMessage
- Streaming - Easy streaming with callbacks or iterators
- Tool Calling - Define tools as classes or functions
- Structured Output - JSON schema responses with dataclasses or Pydantic
- Async Support - Full async/await support with aask() and astream()
- Cost Tracking - Automatic cost estimation per request
- Observability - Logging hooks for metrics and monitoring
- Retry Logic - Exponential backoff for transient failures
- Local Models - Run local LLMs with Ollama
- Fluent Interface - Chain configuration methods for clean code
- Minimal Dependencies - Just Django and httpx
Installation
pip install dj-llm
Add to your INSTALLED_APPS:
INSTALLED_APPS = [
# ...
'django_llm',
]
Run migrations:
python manage.py migrate
Quick Start
import django_llm
# Simple chat
response = django_llm.chat().ask("What is Python?")
print(response.content)
# With specific model
from django_llm import Chat
chat = Chat(model="gpt-4o")
response = chat.ask("Hello!")
# Streaming
for chunk in chat.stream("Tell me a story"):
print(chunk, end="", flush=True)
Configuration
Environment Variables
# Cloud providers
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..." # or GEMINI_API_KEY
# Azure OpenAI
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_API_VERSION="2024-08-01-preview" # optional
# Ollama (local)
export OLLAMA_BASE_URL="http://localhost:11434/v1" # optional, this is the default
Programmatic Configuration
import django_llm
django_llm.configure(
openai_api_key="sk-...",
default_model="gpt-4o",
timeout=60.0,
max_retries=3,
)
Providers
OpenAI
chat = Chat(model="gpt-4o")
response = chat.ask("Hello!")
Anthropic (Claude)
chat = Chat(model="claude-sonnet-4-20250514")
response = chat.ask("Hello!")
Google (Gemini)
chat = Chat(model="gemini-2.0-flash")
response = chat.ask("Hello!")
Azure OpenAI
Use the azure: prefix with your deployment name:
chat = Chat(model="azure:my-gpt4-deployment")
response = chat.ask("Hello!")
Ollama (Local Models)
Run LLMs locally with Ollama. Use the ollama: prefix:
# First, pull a model: ollama pull llama3.2
chat = Chat(model="ollama:llama3.2")
response = chat.ask("Hello!")
# Other popular models
chat = Chat(model="ollama:mistral")
chat = Chat(model="ollama:codellama")
chat = Chat(model="ollama:gemma2")
Fluent Interface
from django_llm import Chat
response = (
Chat()
.with_model("claude-sonnet-4-20250514")
.with_temperature(0.7)
.with_instructions("You are a helpful assistant.")
.ask("What's the weather like?")
)
Multi-turn Conversations
chat = Chat(system="You remember everything I tell you.")
chat.ask("My name is Alice.")
response = chat.ask("What's my name?")
# "Your name is Alice."
Streaming
With Iterator
for chunk in chat.stream("Write a poem"):
print(chunk, end="", flush=True)
With Callback
def handle_chunk(chunk):
print(chunk, end="", flush=True)
chat.on_chunk(handle_chunk).ask("Write a poem", stream=True)
Async Streaming
async for chunk in chat.astream("Write a poem"):
print(chunk, end="", flush=True)
Async Support
Full async/await support for non-blocking operations:
from django_llm import Chat
chat = Chat(model="gpt-4o")
# Async ask
response = await chat.aask("Hello!")
# Async streaming
async for chunk in chat.astream("Tell me a story"):
print(chunk, end="", flush=True)
Tool Calling
Class-based Tools
from django_llm import Tool
class WeatherTool(Tool):
name = "get_weather"
description = "Get the current weather for a location"
def execute(self, location: str, unit: str = "celsius") -> str:
# Your implementation here
return f"Weather in {location}: 22{unit[0].upper()}"
chat = Chat().with_tools([WeatherTool()])
response = chat.ask("What's the weather in Paris?")
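Conceptually, a tool-calling loop matches the tool name the model requests against the registered tools and calls the matching execute() with the decoded arguments. A minimal sketch of that dispatch step (illustrative only, not dj-llm's internals):

```python
def dispatch_tool(tools, name, arguments):
    # Find the registered tool whose name matches the model's tool call
    # and invoke it with the parsed JSON arguments as keyword args.
    for tool in tools:
        if tool.name == name:
            return tool.execute(**arguments)
    raise LookupError(f"No tool registered as {name!r}")
```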
Function-based Tools
from django_llm.tool import tool
@tool(description="Add two numbers together")
def add(a: int, b: int) -> int:
return a + b
chat = Chat().with_tools([add])
response = chat.ask("What is 2 + 3?")
Structured Output
Get typed responses with JSON schema validation:
from dataclasses import dataclass
from django_llm import Chat
@dataclass
class Person:
name: str
age: int
city: str
chat = Chat().with_schema(Person)
response = chat.ask("Extract: John is 30 years old and lives in NYC")
person = response.parsed # Person(name='John', age=30, city='NYC')
Also works with Pydantic models:
from pydantic import BaseModel
class Person(BaseModel):
name: str
age: int
chat = Chat().with_schema(Person)
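Structured output works by turning the class's fields into a JSON schema that the provider is asked to conform to. For the flat dataclass case, the mapping looks roughly like this (a sketch of the idea, not dj-llm's actual schema builder):

```python
from dataclasses import dataclass, fields

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def rough_schema(cls):
    # Map each dataclass field's Python type to its JSON-schema equivalent.
    return {
        "type": "object",
        "properties": {f.name: {"type": _JSON_TYPES[f.type]} for f in fields(cls)},
        "required": [f.name for f in fields(cls)],
    }

@dataclass
class Person:
    name: str
    age: int
```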
Django Model Persistence
Basic Usage
from django_llm.models import Conversation
# Create a new conversation
conv = Conversation.objects.create(
model_id="gpt-4o",
system_prompt="You are a helpful assistant.",
name="Support Chat",
)
# Chat and persist
response = conv.ask("Hello!")
conv.sync_messages() # Save to database
# Later, restore the conversation
conv = Conversation.objects.get(pk=1)
response = conv.ask("What did we discuss?")
Auto-save Mode
conv = Conversation.objects.create(model_id="gpt-4o")
conv.with_auto_save(True) # Messages saved automatically
conv.ask("Hello!") # Automatically persisted
With User Association
conv = Conversation.objects.create(
model_id="gpt-4o",
name="Support Chat",
user=request.user,
)
# Query user's conversations
user_convos = request.user.conversations.all()
With Metadata
conv = Conversation.objects.create(
model_id="gpt-4o",
metadata={"topic": "support", "priority": "high"},
)
Cost Tracking
Automatic cost estimation for cloud providers:
response = chat.ask("Hello!")
# Per-message cost
print(f"Cost: ${response.cost}") # e.g., $0.000125
# Token usage
print(f"Input tokens: {response.tokens.input_tokens}")
print(f"Output tokens: {response.tokens.output_tokens}")
Note: Cost is None for Ollama (local) and models without pricing data.
Observability Hooks
Add custom logging, metrics, or APM integration:
import django_llm
def on_request(model, messages, **kwargs):
print(f"Calling {model} with {len(messages)} messages")
# statsd.increment('llm.requests', tags=[f'model:{model}'])
def on_response(model, message, duration_ms, **kwargs):
print(f"Response in {duration_ms:.0f}ms")
# statsd.timing('llm.latency', duration_ms)
if message.cost:
print(f"Cost: ${message.cost}")
def on_error(model, error, duration_ms, **kwargs):
print(f"Error after {duration_ms:.0f}ms: {error}")
# sentry_sdk.capture_exception(error)
django_llm.add_request_hook(on_request)
django_llm.add_response_hook(on_response)
django_llm.add_error_hook(on_error)
# Remove hooks when done
django_llm.clear_hooks()
Django Logging Configuration
LOGGING = {
'version': 1,
'handlers': {
'console': {'class': 'logging.StreamHandler'},
},
'loggers': {
'django_llm': {
'handlers': ['console'],
'level': 'INFO',
},
},
}
Retry Logic
Automatic retry with exponential backoff for transient failures:
django_llm.configure(
max_retries=3, # Number of retries (default: 3)
retry_base_delay=1.0, # Initial delay in seconds (default: 1.0)
retry_max_delay=60.0, # Maximum delay in seconds (default: 60.0)
)
Retries automatically on:
- Rate limit errors (429)
- Server errors (5xx)
- Connection errors
- Timeouts
Does NOT retry on:
- Authentication errors (401)
- Invalid request errors (400)
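With exponential backoff, the wait roughly doubles on each attempt up to the cap. A quick way to see the schedule implied by a given configuration (backoff_delays is our own illustration of the pattern, not a dj-llm function):

```python
def backoff_delays(max_retries=3, base_delay=1.0, max_delay=60.0):
    # Exponential backoff: base_delay, then doubling each attempt,
    # capped at max_delay.
    return [min(base_delay * 2 ** attempt, max_delay)
            for attempt in range(max_retries)]

print(backoff_delays())   # [1.0, 2.0, 4.0]
print(backoff_delays(8))  # later attempts are capped at 60.0
```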
Error Handling
from django_llm import Chat
from django_llm.exceptions import (
DjangoLLMError, # Base exception
ProviderError, # Provider-specific errors
AuthenticationError, # Invalid API key
RateLimitError, # Too many requests
InvalidRequestError, # Bad request
ModelNotFoundError, # Unknown model
)
try:
chat = Chat()
response = chat.ask("Hello")
except AuthenticationError:
print("Check your API key")
except RateLimitError:
print("Too many requests, slow down")
except ProviderError as e:
print(f"Provider error: {e}")
Supported Models
OpenAI
- gpt-4o, gpt-4o-mini
- gpt-4-turbo, gpt-4
- gpt-3.5-turbo
- o1, o1-mini, o3-mini
Anthropic
- claude-sonnet-4-20250514
- claude-3-5-sonnet-20241022
- claude-3-5-haiku-20241022
- claude-3-opus-20240229
Google (Gemini)
- gemini-2.0-flash
- gemini-1.5-pro, gemini-1.5-flash
Azure OpenAI
- Use the azure: prefix with your deployment name
- Example: azure:my-gpt4-deployment
Ollama (Local)
- Use the ollama: prefix with any Ollama model
- Examples: ollama:llama3.2, ollama:mistral, ollama:codellama
Django Admin
The admin interface is automatically registered. View and search conversations and messages at /admin/django_llm/.
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=django_llm
# Run specific test file
pytest tests/test_chat.py
Dependencies
- Python 3.10+
- Django 4.2+
- httpx
License
MIT