Multi-provider LLM orchestration with fallback chains and three-tier cognition
Project description
netrun-llm
Multi-provider LLM orchestration with automatic fallback chains and three-tier cognition system.
IMPORTANT - Version 2.0.0 Migration Notice
Version 2.0.0 introduces a namespace change from
netrun_llmtonetrun.llmas part of the Netrun namespace consolidation.Old imports (v1.x):
from netrun_llm import LLMFallbackChainNew imports (v2.x):
from netrun.llm import LLMFallbackChainThe old
netrun_llmnamespace still works but is deprecated and will be removed in v3.0.0. Please update your code.
Features
- Multi-Adapter Fallback Chains: Automatic failover between LLM providers (Claude -> GPT-4 -> Llama3)
- Three-Tier Cognition: Fast ack (<100ms), RAG response (<2s), Deep insight (<5s)
- Circuit Breaker Protection: Per-adapter circuit breakers prevent cascade failures
- Cost Tracking: Automatic cost estimation and tracking across all providers
- Async-First: Full async support with sync wrappers for compatibility
- Project-Agnostic: No Wilbur-specific dependencies, works in any Python project
Installation
# Base installation (Ollama support only)
pip install netrun-llm
# With Claude/Anthropic support
pip install netrun-llm[anthropic]
# With OpenAI support
pip install netrun-llm[openai]
# Full installation (all providers)
pip install netrun-llm[all]
Quick Start
Basic Usage with Fallback Chain
from netrun.llm import LLMFallbackChain
# Create default chain: Claude -> OpenAI -> Ollama
chain = LLMFallbackChain()
# Execute with automatic fallback
response = chain.execute("Explain quantum computing in 3 sentences")
print(f"Response: {response.content}")
print(f"Handled by: {response.adapter_name}")
print(f"Cost: ${response.cost_usd:.6f}")
print(f"Fallbacks used: {response.metadata.get('fallback_attempts', 0)}")
Three-Tier Cognition (Streaming)
import asyncio
from netrun.llm import ThreeTierCognition, CognitionTier
async def main():
cognition = ThreeTierCognition()
async for response in cognition.stream_response("What is machine learning?"):
if response.tier == CognitionTier.FAST_ACK:
print(f"[Thinking...] {response.content}")
elif response.tier == CognitionTier.RAG:
print(f"[Context] {response.content}")
elif response.tier == CognitionTier.DEEP:
print(f"[Answer] {response.content}")
asyncio.run(main())
Individual Adapters
from netrun.llm import ClaudeAdapter, OpenAIAdapter, OllamaAdapter
# Claude adapter
claude = ClaudeAdapter()
response = claude.execute("Write a haiku about Python")
print(response.content)
# OpenAI adapter
openai = OpenAIAdapter()
response = openai.execute("What is 2+2?")
print(response.content)
# Ollama adapter (local, free)
ollama = OllamaAdapter(model="llama3")
if ollama.check_availability():
response = ollama.execute("Hello, world!")
print(response.content)
Configuration
Environment Variables
# API Keys (use placeholders in code, set actual values in env)
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
OLLAMA_HOST=http://localhost:11434
# Optional: Default models
CLAUDE_DEFAULT_MODEL=claude-sonnet-4-5-20250929
OPENAI_DEFAULT_MODEL=gpt-4-turbo
OLLAMA_DEFAULT_MODEL=llama3
# Optional: Timeouts and limits
LLM_REQUEST_TIMEOUT=30
LLM_DEFAULT_MAX_TOKENS=4096
Using Placeholders (Security Best Practice)
from netrun.llm import ClaudeAdapter, LLMConfig
# Placeholders are resolved from environment at runtime
config = LLMConfig(
anthropic_api_key="{{ANTHROPIC_API_KEY}}", # Resolved from env
openai_api_key="{{OPENAI_API_KEY}}",
ollama_host="{{OLLAMA_HOST}}",
)
# Validate configuration
issues = config.validate()
if issues:
print(f"Configuration issues: {issues}")
Adapters
ClaudeAdapter (Anthropic)
from netrun.llm import ClaudeAdapter
adapter = ClaudeAdapter(
default_model="claude-sonnet-4-5-20250929",
max_tokens=4096,
)
response = adapter.execute(
"Analyze this code",
context={
"model": "claude-3-opus-20240229", # Override model
"temperature": 0.7,
"system": "You are a code reviewer.",
}
)
Supported Models:
- claude-sonnet-4-5-20250929 (recommended)
- claude-3-5-sonnet-20241022
- claude-3-opus-20240229
- claude-3-sonnet-20240229
- claude-3-haiku-20240307
OpenAIAdapter
from netrun.llm import OpenAIAdapter
adapter = OpenAIAdapter(
default_model="gpt-4-turbo",
max_tokens=4096,
timeout=30,
)
response = adapter.execute(
"Write a Python function to sort a list",
context={
"model": "gpt-4o",
"temperature": 0.5,
}
)
Supported Models:
- gpt-4-turbo (recommended)
- gpt-4o, gpt-4o-mini
- gpt-4
- gpt-3.5-turbo
OllamaAdapter (Local/Free)
from netrun.llm import OllamaAdapter
adapter = OllamaAdapter(
model="llama3",
host="http://localhost:11434",
fallback_hosts=["http://backup-server:11434"],
)
# Check if Ollama is running
if adapter.check_availability():
response = adapter.execute("Hello!")
print(response.content)
print(f"Cost: ${response.cost_usd}") # Always $0.00
# List available models
models = adapter.list_available_models()
print(f"Available: {models}")
Supported Models:
- llama3, llama3.1, llama3.2
- codellama
- mistral
- phi-3
- gemma2
- qwen2
Fallback Chain
Default Chain
from netrun.llm import LLMFallbackChain
# Default: Claude -> OpenAI -> Ollama
chain = LLMFallbackChain()
Custom Chain
from netrun.llm import LLMFallbackChain, ClaudeAdapter, OpenAIAdapter, OllamaAdapter
# Cost-optimized: Free first, premium last
chain = LLMFallbackChain(adapters=[
OllamaAdapter(model="llama3"), # Free
OpenAIAdapter(default_model="gpt-3.5-turbo"), # Cheap
ClaudeAdapter(), # Premium fallback
])
response = chain.execute("Simple question")
print(f"Cost: ${response.cost_usd}") # Likely $0.00 if Ollama available
Chain Metrics
metrics = chain.get_metrics()
print(f"Success rate: {metrics['success_rate']:.1f}%")
print(f"Fallback rate: {metrics['fallback_rate']:.1f}%")
print(f"Total cost: ${metrics['total_cost_usd']:.4f}")
print(f"Adapter usage: {metrics['adapter_usage']}")
Three-Tier Cognition
The cognition system provides progressive response generation with latency targets:
| Tier | Target Latency | Purpose |
|---|---|---|
| FAST_ACK | <100ms | Immediate acknowledgment |
| RAG | <2s | Knowledge-enhanced response |
| DEEP | <5s | Full LLM reasoning |
Streaming Mode
import asyncio
from netrun.llm import ThreeTierCognition, CognitionTier
async def chat():
cognition = ThreeTierCognition()
async for response in cognition.stream_response("Explain quantum computing"):
print(f"[{response.tier.name}] {response.content}")
print(f" Latency: {response.latency_ms}ms, Final: {response.is_final}")
asyncio.run(chat())
Blocking Mode
async def quick_answer():
cognition = ThreeTierCognition()
# Returns best response within timeout
response = await cognition.execute("What is 2+2?", min_confidence=0.5)
print(f"Answer: {response.content}")
print(f"Tier: {response.tier.name}, Confidence: {response.confidence}")
asyncio.run(quick_answer())
With RAG Integration
from netrun.llm import ThreeTierCognition
async def retrieve_documents(query: str) -> list[str]:
"""Your document retrieval function."""
# Could use Pinecone, Chroma, etc.
return ["Relevant document 1", "Relevant document 2"]
cognition = ThreeTierCognition(
enable_rag=True,
rag_retrieval=retrieve_documents,
)
Error Handling
from netrun.llm import (
LLMFallbackChain,
AllAdaptersFailedError,
RateLimitError,
CircuitBreakerOpenError,
)
chain = LLMFallbackChain()
try:
response = chain.execute("Test prompt")
except AllAdaptersFailedError as e:
print(f"All adapters failed: {e.failed_adapters}")
print(f"Errors: {e.errors}")
except RateLimitError as e:
print(f"Rate limited on {e.adapter_name}")
print(f"Retry after: {e.retry_after_seconds}s")
except CircuitBreakerOpenError as e:
print(f"Circuit breaker open for {e.adapter_name}")
print(f"Cooldown: {e.cooldown_remaining_seconds}s")
Pricing Reference (2025)
| Provider | Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| Claude | Sonnet 4.5/3.5 | $3.00 | $15.00 |
| Claude | Opus 3 | $15.00 | $75.00 |
| Claude | Haiku 3 | $0.25 | $1.25 |
| OpenAI | GPT-4 Turbo | $10.00 | $30.00 |
| OpenAI | GPT-4o | $5.00 | $15.00 |
| OpenAI | GPT-3.5 Turbo | $0.50 | $1.50 |
| Ollama | All models | $0.00 | $0.00 |
License
MIT License - see LICENSE file for details.
Contributing
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
Support
- Documentation: https://netrunsystems.com/docs/netrun-llm
- Issues: https://github.com/netrunsystems/netrun-service-library/issues
- Email: support@netrunsystems.com
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file netrun_llm-2.0.0.tar.gz.
File metadata
- Download URL: netrun_llm-2.0.0.tar.gz
- Upload date:
- Size: 44.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
15f74fea971c3fb27393827350e57252bd771c665c1c44ff053d212453b44f56
|
|
| MD5 |
2e60a74ccadf2991c0f46d1511985805
|
|
| BLAKE2b-256 |
caae3360f54666d780eba1605e35dcd010c19fd1a7d0abc21c92e5f2e138f457
|
File details
Details for the file netrun_llm-2.0.0-py3-none-any.whl.
File metadata
- Download URL: netrun_llm-2.0.0-py3-none-any.whl
- Upload date:
- Size: 61.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
95903f6d690aef342edf4be54d669c7d256da31e6a0cf82d88459d6587fce3fe
|
|
| MD5 |
f643d8cef585eeba51ee38f8e3499398
|
|
| BLAKE2b-256 |
6590b75201f86202b024f23045bfe2327b0ccf614513b0d4af11ff148f7a2997
|