langchain-fused-model
Intelligent routing and management for multiple LangChain ChatModel instances with advanced features like rate limiting, automatic fallback, and structured output support.
Table of Contents
- Overview
- Why langchain-fused-model
- Features
- Installation
- Quick Start
- Routing Strategies
- Structured Output
- Rate Limiting and Fallback
- LangChain Integration
- Usage Statistics
- Advanced Configuration
- Examples
- Requirements
- Contributing
- License
- Support
Overview
langchain-fused-model provides a MultiModelManager class that acts as a unified interface for managing multiple LangChain ChatModel instances. It enables dynamic model selection based on configurable strategies while maintaining full LangChain compatibility.
The manager inherits from LangChain's BaseChatModel, making it a drop-in replacement for any ChatModel in chains, agents, and other workflows.
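For intuition, here is a minimal sketch of the delegation pattern this enables. DelegatingChat is hypothetical illustration code, not the package's source:

from langchain_core.language_models.chat_models import BaseChatModel
from langchain_core.outputs import ChatGeneration, ChatResult

class DelegatingChat(BaseChatModel):
    """Sketch: a BaseChatModel that forwards every call to one wrapped model."""
    inner: BaseChatModel

    @property
    def _llm_type(self) -> str:
        return "delegating-chat"

    def _generate(self, messages, stop=None, run_manager=None, **kwargs):
        # A real manager would pick the target per request via its routing
        # strategy; this sketch always forwards to the single wrapped model.
        reply = self.inner.invoke(messages, stop=stop, **kwargs)
        return ChatResult(generations=[ChatGeneration(message=reply)])

Because the subclass satisfies the BaseChatModel contract, anything that accepts a ChatModel accepts it unchanged.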
Why langchain-fused-model
Many developers rely on multiple large language model providers to balance cost, availability, latency, and capabilities. LangChain, however, does not provide a unified interface that dynamically routes requests across multiple models based on rate limits or priorities. This project fills that gap.
Whether you're managing free-tier APIs, orchestrating across OpenAI and Anthropic, or experimenting with cost-based strategies, langchain-fused-model helps you:
- Fail gracefully when APIs are throttled or down
- Reduce latency or cost by routing requests optimally
- Extract structured outputs even from models that don't support it natively
- Scale production chains and agents with built-in observability and fallback
Features
- Multiple Routing Strategies: Priority-based, round-robin, least-used, and cost-aware routing
- Automatic Rate Limiting: Per-model rate limits (RPM/RPS) with automatic fallback
- Error Resilience: Automatic fallback to alternative models on failures
- Structured Output: Pydantic-validated responses with native support detection and fallback
- Full LangChain Compatibility: Implements BaseChatModel and Runnable interfaces
- Usage Tracking: Monitor requests, tokens, and success rates per model
- Extensible: Support for custom routing strategies and error handlers
- Production Ready: Comprehensive logging and error handling
Installation
Install from PyPI:
pip install langchain-fused-model
For development installation:
git clone https://github.com/yourusername/langchain-fused-model
cd langchain-fused-model
pip install -e .
Quick Start
Here's a simple example to get you started:
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_fused_model import MultiModelManager, RoutingStrategy
# Initialize your models
models = [
    ChatOpenAI(model="gpt-4"),
    ChatOpenAI(model="gpt-3.5-turbo"),
    ChatAnthropic(model="claude-3-opus-20240229"),
]
# Create manager with priority-based routing
manager = MultiModelManager(
    models=models,
    strategy=RoutingStrategy.PRIORITY
)
# Use like any LangChain ChatModel
response = manager.invoke("What is the capital of France?")
print(response.content)
Routing Strategies
The MultiModelManager supports multiple routing strategies to control how requests are distributed across models.
Priority-Based Routing
Routes requests to the highest priority available model. Perfect for preferring premium models with fallback to cheaper alternatives.
from langchain_fused_model import MultiModelManager, RoutingStrategy, ModelConfig
configs = [
    ModelConfig(priority=100, max_rpm=60),   # Highest priority - GPT-4
    ModelConfig(priority=50, max_rpm=120),   # Medium priority - GPT-3.5
    ModelConfig(priority=10, max_rpm=200),   # Lowest priority - Local model
]

manager = MultiModelManager(
    models=models,
    model_configs=configs,
    strategy=RoutingStrategy.PRIORITY
)
Cost-Aware Routing
Automatically routes to the lowest cost model based on cost_per_1k_tokens. Ideal for cost optimization.
configs = [
    ModelConfig(cost_per_1k_tokens=0.03),    # GPT-4 - $0.03/1k tokens
    ModelConfig(cost_per_1k_tokens=0.002),   # GPT-3.5 - $0.002/1k tokens
    ModelConfig(cost_per_1k_tokens=0.015),   # Claude - $0.015/1k tokens
]

manager = MultiModelManager(
    models=models,
    model_configs=configs,
    strategy=RoutingStrategy.COST_AWARE
)
Round-Robin Routing
Distributes requests evenly across all available models. Great for load balancing.
manager = MultiModelManager(
    models=models,
    strategy=RoutingStrategy.ROUND_ROBIN
)
Least-Used Routing
Routes to the model with the fewest total requests. Helps balance usage across models.
manager = MultiModelManager(
    models=models,
    strategy=RoutingStrategy.LEAST_USED
)
Custom Strategies
You can provide a custom routing function for advanced use cases:
def custom_strategy(models, configs, usage_stats, available_models):
    """Custom strategy: prefer models with the highest success rate."""
    best_model = available_models[0]
    best_rate = 0.0
    for idx in available_models:
        stats = usage_stats.get(idx)
        if stats and stats.total_requests > 0:
            success_rate = stats.successful_requests / stats.total_requests
            if success_rate > best_rate:
                best_rate = success_rate
                best_model = idx
    return best_model

manager = MultiModelManager(
    models=models,
    strategy=custom_strategy
)
Structured Output
Get Pydantic-validated responses from any model, with automatic fallback for models without native structured output support.
from pydantic import BaseModel, Field
class Person(BaseModel):
    """Information about a person."""
    name: str = Field(description="The person's full name")
    age: int = Field(description="The person's age in years")
    occupation: str = Field(description="The person's job or profession")
# Create structured output runnable
structured_manager = manager.with_structured_output(Person)
# Get validated Pydantic object
person = structured_manager.invoke("Tell me about Albert Einstein")
print(f"{person.name} was {person.age} years old and worked as a {person.occupation}")
# Output: Albert Einstein was 76 years old and worked as a Theoretical Physicist
The structured output handler automatically:
- Detects if the model has native structured output support
- Uses native support when available for better performance
- Falls back to prompt injection and JSON parsing when needed
- Validates all responses against your Pydantic schema
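To make the fallback path concrete, here is a minimal sketch of prompt injection plus JSON parsing. structured_fallback is a hypothetical helper for illustration, not the package's actual implementation:

import json
from typing import Type
from pydantic import BaseModel

def structured_fallback(manager, prompt: str, schema: Type[BaseModel]) -> BaseModel:
    # Inject the JSON schema into the prompt, ask for JSON only,
    # then validate the reply against the schema.
    instruction = (
        f"{prompt}\n\n"
        "Respond ONLY with a JSON object matching this schema:\n"
        f"{json.dumps(schema.model_json_schema())}"
    )
    reply = manager.invoke(instruction)
    return schema.model_validate_json(reply.content)

# person = structured_fallback(manager, "Tell me about Albert Einstein", Person)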
Rate Limiting and Fallback
Configure per-model rate limits and automatic fallback behavior:
from langchain_fused_model import ModelConfig
configs = [
    ModelConfig(
        priority=100,
        max_rpm=60,        # 60 requests per minute
        max_rps=2,         # 2 requests per second
        timeout=30.0,      # 30 second timeout
        retry_on_errors=[TimeoutError, ConnectionError]
    ),
    ModelConfig(
        priority=50,
        max_rpm=120,       # Fallback model with higher limits
    ),
]

manager = MultiModelManager(
    models=models,
    model_configs=configs,
    strategy=RoutingStrategy.PRIORITY,
    default_fallback=True  # Enable automatic fallback
)
# Automatically falls back if rate limit exceeded or errors occur
response = manager.invoke("Your prompt here")
When a model fails or hits rate limits:
- The manager automatically selects the next available model
- A cooldown period is set for rate-limited models
- The request is retried with the new model
- All failures are logged for monitoring
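As a mental model of the rate-limit bookkeeping (illustrative only, not the package's internals), a per-model RPM limiter with a cooldown might look like this:

import time
from collections import deque

class RpmLimiter:
    def __init__(self, max_rpm: int, cooldown: float = 5.0):
        self.max_rpm = max_rpm
        self.cooldown = cooldown
        self.calls = deque()       # timestamps of recent requests
        self.blocked_until = 0.0   # deadline of the current cooldown

    def allow(self) -> bool:
        now = time.monotonic()
        if now < self.blocked_until:
            return False           # still cooling down
        while self.calls and now - self.calls[0] > 60.0:
            self.calls.popleft()   # drop requests older than one minute
        if len(self.calls) >= self.max_rpm:
            self.blocked_until = now + self.cooldown
            return False           # over the cap: caller falls back
        self.calls.append(now)
        return True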
LangChain Integration
The MultiModelManager works seamlessly with all LangChain features:
Chains
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Use in chains with the pipe operator
prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}")
chain = prompt | manager | StrOutputParser()
result = chain.invoke({"topic": "programming"})
print(result)
Batch Processing
# Process multiple inputs in parallel
questions = [
    "What is Python?",
    "What is JavaScript?",
    "What is Rust?",
]

responses = manager.batch(questions)
for response in responses:
    print(response.content)
Streaming (if supported by underlying models)
# Stream responses token by token
for chunk in manager.stream("Write a long story about AI"):
    print(chunk.content, end="", flush=True)
Agents
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain.tools import Tool

# Use as the LLM for agents (assumes `tools`, a list of Tool objects, and
# `prompt`, an agent prompt template, are defined elsewhere)
agent = create_openai_functions_agent(manager, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)
result = agent_executor.invoke({"input": "What's the weather in Paris?"})
Usage Statistics
Monitor model performance and usage:
import datetime

# Get statistics for all models
stats = manager._usage_tracker.get_all_stats()

for model_idx, stat in stats.items():
    print(f"\nModel {model_idx} ({models[model_idx]._llm_type}):")
    print(f"  Total requests: {stat.total_requests}")
    print(f"  Successful: {stat.successful_requests}")
    print(f"  Failed: {stat.failed_requests}")
    if stat.total_requests > 0:
        success_rate = stat.successful_requests / stat.total_requests
        print(f"  Success rate: {success_rate:.2%}")
    print(f"  Total tokens: {stat.total_tokens}")
    if stat.last_used:
        last_used = datetime.datetime.fromtimestamp(stat.last_used)
        print(f"  Last used: {last_used}")

# Get statistics for a specific model
model_0_stats = manager._usage_tracker.get_stats(0)
print(f"Model 0 has handled {model_0_stats.total_requests} requests")
Advanced Configuration
Complete Configuration Example
from langchain_fused_model import MultiModelManager, ModelConfig, RoutingStrategy
configs = [
    ModelConfig(
        priority=100,              # Highest priority
        max_rpm=60,                # Rate limits
        max_rps=2,
        cost_per_1k_tokens=0.03,   # Cost tracking
        timeout=30.0,              # Request timeout
        retry_on_errors=[          # Custom retry conditions
            TimeoutError,
            ConnectionError,
        ]
    ),
    ModelConfig(
        priority=50,
        max_rpm=120,
        max_rps=5,
        cost_per_1k_tokens=0.002,
        timeout=20.0,
    ),
]

manager = MultiModelManager(
    models=models,
    model_configs=configs,
    strategy=RoutingStrategy.PRIORITY,
    default_fallback=True
)
Logging Configuration
The package uses Python's standard logging module:
import logging
# Enable debug logging to see model selection decisions
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('langchain_fused_model')
logger.setLevel(logging.DEBUG)
# Now you'll see detailed logs about model selection and fallback
response = manager.invoke("Test prompt")
Examples
Check out the examples/ directory for Jupyter notebooks demonstrating:
- basic_usage.ipynb: Getting started with MultiModelManager
- routing_strategies.ipynb: Comparing all routing strategies
- structured_output.ipynb: Working with Pydantic models and structured data
Requirements
- Python 3.8+
- langchain-core >= 0.1.0
- pydantic >= 2.0.0
Optional dependencies for specific providers:
- langchain-openai (for OpenAI models)
- langchain-anthropic (for Anthropic models)
- langchain-google-genai (for Google models)
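For example, to use OpenAI and Anthropic models together:
pip install langchain-fused-model langchain-openai langchain-anthropic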
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
To set up for development:
git clone https://github.com/yourusername/langchain-fused-model
cd langchain-fused-model
pip install -e ".[dev]"
pytest tests/
License
MIT License - see LICENSE file for details.
Support
- GitHub Issues: https://github.com/yourusername/langchain-fused-model/issues
- Documentation: https://github.com/yourusername/langchain-fused-model#readme
- Examples: See the examples/ directory for Jupyter notebooks
Note: This package is designed to work with any LangChain-compatible ChatModel. Make sure to install the appropriate provider packages (e.g., langchain-openai, langchain-anthropic) for the models you want to use.