Cost-aware model routing for LangChain agents based on task phase
langchain-router
Your agent doesn't need the expensive model for every call.
Most calls are just the model picking which file to read next or which pattern to search for. A smaller model does that fine. This middleware detects when the agent is doing that kind of work and routes to a fast model automatically.
Quick Install
```shell
pip install langchain-router
```
🤔 What is this?
Agent sessions have a pattern. The user says something, the agent thinks about it (planning). Then it reads files, searches code, runs commands (execution). Sometimes something breaks (recovery). Then the user says something again.
Planning and recovery need the primary model. Execution doesn't. RouterMiddleware detects which phase the agent is in and routes accordingly.
| What just happened | Phase | Model |
|---|---|---|
| User spoke | planning | primary |
| Tool call succeeded | execution | fast |
| Tool call failed | recovery | primary |
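The table can be read as a small lookup from "what just happened" to a phase, and from the phase to a model. The sketch below only illustrates that mapping. It is not the package's source, and the `Event` enum, `PHASE_FOR_EVENT`, and `pick_model` are hypothetical names introduced here.

```python
from enum import Enum


class Event(Enum):
    """What happened right before the next model call (hypothetical names)."""

    USER_MESSAGE = "user_message"
    TOOL_SUCCESS = "tool_success"
    TOOL_FAILURE = "tool_failure"


# Event -> phase, mirroring the three rows of the table.
PHASE_FOR_EVENT = {
    Event.USER_MESSAGE: "planning",
    Event.TOOL_SUCCESS: "execution",
    Event.TOOL_FAILURE: "recovery",
}


def pick_model(event: Event, primary: str, fast: str) -> tuple[str, str]:
    """Return (phase, model) for the next call; only execution gets the fast model."""
    phase = PHASE_FOR_EVENT[event]
    model = fast if phase == "execution" else primary
    return phase, model


if __name__ == "__main__":
    for event in Event:
        print(event.value, pick_model(event, primary="primary-model", fast="fast-model"))
```

Planning and recovery both fall through to the primary model, so the fast model is only ever chosen after a successful tool call.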
On a simulated 18-call session, 83% of calls (15 of 18) route to the fast model.
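To get a rough feel for what that share means for spend, here is a back-of-the-envelope sketch. It assumes every call costs about the same number of tokens, which real sessions will not satisfy exactly, and the price ratio is a made-up input rather than a figure from this project or from any provider. `relative_cost` is a hypothetical helper, not part of the package.

```python
from fractions import Fraction


def relative_cost(fast_share: Fraction, fast_price_ratio: Fraction) -> Fraction:
    """Blended cost per call, relative to sending every call to the primary model.

    fast_share: fraction of calls routed to the fast model.
    fast_price_ratio: fast-model price divided by primary-model price
        (an assumed input, not a published number).
    """
    return fast_share * fast_price_ratio + (1 - fast_share)


fast_share = Fraction(15, 18)   # 15 of 18 calls, which rounds to the 83% quoted above
assumed_ratio = Fraction(1, 3)  # hypothetical: the fast model costs a third as much

print(relative_cost(fast_share, assumed_ratio))  # prints 4/9
```

Under those assumptions the session costs 4/9 of the all-primary baseline. Swap in your own price ratio and routing share to get a number that matches your setup.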
```python
from langchain.agents import create_agent
from langchain_router import RouterMiddleware

agent = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[...],
    middleware=[RouterMiddleware(fast="anthropic:claude-haiku-4-5-20251001")],
)
```
With CollapseMiddleware
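```python
from langchain_collapse import CollapseMiddleware
from langchain_router import RouterMiddleware

# CollapseMiddleware is listed first, RouterMiddleware second.
middleware = [
    CollapseMiddleware(),
    RouterMiddleware(fast="anthropic:claude-haiku-4-5-20251001"),
]
```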
```mermaid
flowchart TB
    A["📥 37 messages"] --> B["CollapseMiddleware"]
    B --> C["📥 9 messages"]
    C --> D["RouterMiddleware"]
    D --> E{"phase?"}
    E --> |"execution · 83%"| F["⚡ Haiku"]
    E --> |"planning"| G["🧠 Sonnet"]
    E --> |"recovery"| G
    style A fill:#ff6b6b,stroke:#e03131,color:#fff
    style B fill:#339af0,stroke:#1c7ed6,color:#fff
    style C fill:#339af0,stroke:#1c7ed6,color:#fff
    style D fill:#51cf66,stroke:#2f9e44,color:#fff
    style E fill:#fff3bf,stroke:#f59f00,color:#333
    style F fill:#20c997,stroke:#099268,color:#fff
    style G fill:#845ef7,stroke:#7048e8,color:#fff
```
On false positives
The error heuristic checks tool output for the words `error`, `traceback`, `exception`, and `failed`. Tool output that merely contains those words inside code (like `def handle_error`) is treated as a failure too, so it routes to the primary model. That's the safe direction: more capability than needed, never less.
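A minimal sketch of a keyword check of this kind shows why the false positive happens. This is not the package's implementation: `ERROR_MARKERS` and `looks_like_failure` are hypothetical names, and both the plain substring match and the case-insensitive comparison are assumptions.

```python
# The four words listed above; matching them as lowercase substrings is an assumption.
ERROR_MARKERS = ("error", "traceback", "exception", "failed")


def looks_like_failure(tool_output: str) -> bool:
    """Return True if any marker word appears anywhere in the tool output."""
    text = tool_output.lower()
    return any(marker in text for marker in ERROR_MARKERS)


if __name__ == "__main__":
    # A genuine failure.
    print(looks_like_failure("Traceback (most recent call last):"))
    # A false positive: the file being read just defines an error handler.
    print(looks_like_failure("def handle_error(exc):\n    return None"))
    # Ordinary successful output.
    print(looks_like_failure("3 files matched"))
```

The second call returns `True` even though nothing went wrong, which costs one primary-model call instead of a fast one. The check never errs in the other direction unless a real failure avoids all four words.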
📖 Documentation
- Source (single file, ~170 lines)
- Benchmark (simulated session with cost breakdown)
- Tests (unit tests + property-based invariant tests)
💁 Contributing
```shell
git clone https://github.com/johanity/langchain-router.git
cd langchain-router
pip install -e ".[test]"
pytest
```
📕 License
MIT