AI model router MCP server with multi-provider support (Gemini, Groq, Cerebras)
RouteMCP — AI Model Router MCP Server
An MCP server that classifies tasks and routes prompts to the best available AI model. It supports Google Gemini, Groq, and Cerebras with automatic failover.
How It Works
- Classify: Analyzes the prompt with an LLM (Groq `llama-3.1-8b`), falling back to keyword matching, to detect the task type (code, math, reasoning, creative, vision, long_context, multilingual, general).
- Route: Selects the best model for the task using the priorities configured in `config.json`.
- Fallback: If the best model fails (API error, timeout, or a 429 rate limit with Retry-After), tries the next model in the list.
- Ask: Sends the prompt directly to a specific model.
- Compare: Sends the same prompt to multiple models in parallel and compares the results.
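The keyword fallback in the Classify step can be pictured as a simple scoring pass. The sketch below is a minimal illustration, not RouteMCP's actual classifier: the `KEYWORDS` table and the `classify_by_keywords` function are hypothetical, and the real keyword lists are read from `config.json`.

```python
# Hypothetical keyword table; the real lists are read from config.json.
KEYWORDS = {
    "code": ["python", "function", "bug", "compile"],
    "math": ["solve", "equation", "integral", "prove"],
    "creative": ["poem", "story", "lyrics"],
}


def classify_by_keywords(prompt: str, default: str = "general") -> str:
    """Return the task type whose keywords appear most often in the prompt."""
    words = prompt.lower().split()
    scores = {
        task: sum(words.count(keyword) for keyword in keywords)
        for task, keywords in KEYWORDS.items()
    }
    best_task = max(scores, key=scores.get)
    # Fall back to the default type when no keyword matched at all.
    return best_task if scores[best_task] > 0 else default
```

With this table, a prompt such as "write a Python function" scores two hits for `code`, while a prompt with no matching words falls through to `general`.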
Routing Logic (via config.json)
| Task Type | Preferred Models |
|---|---|
| code | gemini-2.5-pro → llama-3.3-70b → gemini-2.5-flash → cerebras-llama-3.3-70b |
| reasoning | gemini-2.5-pro → llama-3.3-70b → gemini-2.5-flash |
| math | gemini-2.5-pro → gemini-2.5-flash → llama-3.3-70b |
| creative | gemini-2.0-flash → gemini-3-flash-preview → mixtral-8x7b |
| general | gemini-2.0-flash → gemini-2.5-flash → llama-3.3-70b → gemini-3-flash-preview |
| vision | gemini-2.5-pro → gemini-2.0-flash → gemini-2.5-flash → gemini-3-flash-preview |
| long_context | gemini-2.5-pro → gemini-2.0-flash → gemini-2.5-flash → gemini-3-flash-preview |
| speed | llama-3.1-8b → cerebras-llama-3.3-70b → llama-3.3-70b → gemini-2.0-flash |
| multilingual | gemini-2.0-flash → mixtral-8x7b → gemini-2.5-flash |
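Read left to right, each row is a fallback chain: the router tries the first model and moves to the next when a call fails. A minimal sketch of that loop follows, assuming a hypothetical `call_model` callable and a local `ProviderError` stand-in. The `ROUTES` excerpt copies two rows from the table above, and none of this is RouteMCP's actual engine code.

```python
from typing import Callable


class ProviderError(Exception):
    """Stand-in for a provider failure (API error, timeout, rate limit)."""


# Two rows copied from the routing table; the full table lives in config.json.
ROUTES = {
    "math": ["gemini-2.5-pro", "gemini-2.5-flash", "llama-3.3-70b"],
    "speed": ["llama-3.1-8b", "cerebras-llama-3.3-70b", "llama-3.3-70b", "gemini-2.0-flash"],
}


def route_with_fallback(
    task: str, prompt: str, call_model: Callable[[str, str], str]
) -> tuple[str, str]:
    """Try each preferred model in order; return (model, answer) from the first success."""
    errors = []
    for model in ROUTES[task]:
        try:
            return model, call_model(model, prompt)
        except ProviderError as exc:
            errors.append(f"{model}: {exc}")  # remember why this model was skipped
    raise ProviderError("all models failed: " + "; ".join(errors))
```

Returning the model name alongside the answer lets the caller see which entry in the chain actually responded.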
Features
| Tool | Description |
|---|---|
| `ask` | Sends a prompt to a specific model |
| `models` | Lists available models with their capabilities, context window, and cost |
| `classify_task` | Classifies a prompt and shows the recommended models |
| `route` | Automatically routes the prompt to the best model for the task |
| `compare` | Compares responses from multiple models for the same prompt (run in parallel) |
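The parallel execution behind `compare` can be sketched with `asyncio.gather`. This is an illustrative sketch only: `ask_model` is a hypothetical placeholder for a real provider call, and the comma-separated `models` string follows the form used when the tool is invoked.

```python
import asyncio


async def ask_model(model: str, prompt: str) -> str:
    """Hypothetical placeholder for a provider call; a real one would do HTTP I/O."""
    await asyncio.sleep(0.01)
    return f"[{model}] answer to: {prompt}"


async def compare(prompt: str, models: str) -> dict[str, str]:
    """Send one prompt to several models concurrently and collect the results."""
    names = [name.strip() for name in models.split(",") if name.strip()]
    # return_exceptions=True returns failures as values instead of raising,
    # so one failing model does not discard the other answers.
    results = await asyncio.gather(
        *(ask_model(name, prompt) for name in names),
        return_exceptions=True,
    )
    return {
        name: (f"error: {result}" if isinstance(result, Exception) else result)
        for name, result in zip(names, results)
    }
```

Calling `asyncio.run(compare("solve 2+2", "gemini-2.0-flash,llama-3.3-70b"))` returns a dictionary keyed by model name, in the order the models were listed.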
Configuration note: All routing logic, the model list, and the keywords are generated in and read from a `config.json` file at the project root. You can edit it to customize your models without touching the source code.
Providers
| Provider | API Key | Models |
|---|---|---|
| Google Gemini | `GEMINI_API_KEY` | gemini-2.5-pro, gemini-2.0-flash, gemini-2.5-flash, gemini-3-flash-preview |
| Groq | `GROQ_API_KEY` | llama-3.3-70b, llama-3.1-8b, mixtral-8x7b |
| Cerebras | `CEREBRAS_API_KEY` | cerebras-llama-3.3-70b |
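The table above doubles as a lookup for deciding which provider owns a model and which environment variable it needs. A minimal sketch, assuming a hypothetical `PROVIDERS` dictionary and hypothetical `provider_for` and `has_api_key` helpers (RouteMCP reads its real model list from `config.json`):

```python
import os

# Hypothetical in-code copy of the providers table.
PROVIDERS = {
    "gemini": {
        "env": "GEMINI_API_KEY",
        "models": {
            "gemini-2.5-pro",
            "gemini-2.0-flash",
            "gemini-2.5-flash",
            "gemini-3-flash-preview",
        },
    },
    "groq": {
        "env": "GROQ_API_KEY",
        "models": {"llama-3.3-70b", "llama-3.1-8b", "mixtral-8x7b"},
    },
    "cerebras": {
        "env": "CEREBRAS_API_KEY",
        "models": {"cerebras-llama-3.3-70b"},
    },
}


def provider_for(model: str) -> str:
    """Return the provider that owns `model`, or raise ValueError if none does."""
    for name, info in PROVIDERS.items():
        if model in info["models"]:
            return name
    raise ValueError(f"unknown model: {model}")


def has_api_key(provider: str) -> bool:
    """True when the provider's API key is set to a non-empty value."""
    return bool(os.environ.get(PROVIDERS[provider]["env"]))
```

Checking the owner before forwarding a request is one way to reject a model name that does not belong to any configured provider.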
Tech Stack
- Python: `>=3.11`
- Framework: `mcp` (FastMCP) via stdio JSON-RPC
- HTTP: `httpx` (async) with rate-limit handling (Retry-After)
- Classifier: Hybrid (LLM-based on Groq `llama-3.1-8b` + keyword-based fallback)
- Configuration: External, via `config.json`
🔧 Recent Improvements
- Cerebras Model Fixed: Model parameter is now properly mapped (was always sending `llama3.3-70b` regardless of input)
- Retry Logic Deduplicated: Shared `retry_ask()` helper in `base.py` replaces 3 copies of identical retry code
- `is_available()` Cached: Provider availability cached for 60s TTL (no HTTP call on every routing decision)
- Configurable Temperature/Max Tokens: `ask()` and `compare()` tools now accept `temperature` and `max_tokens` parameters
- Lazy Provider Init: Providers are created only when first needed, not at server startup
- Provider-Model Validation: Engine validates that the requested model belongs to the correct provider before forwarding
- Prompt Injection Mitigation: Classifier truncates user input to 1000 chars with clear delimiters
- Atomic Config Write: `config.json` uses `.tmp` + `os.replace()` to prevent corruption on concurrent writes
- Module-Level Imports: `classify_task` no longer re-imports modules on every call
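The atomic-write pattern named in the list above is worth seeing in miniature: write the new contents to a temporary file in the same directory, then swap it in with `os.replace()`, so readers see either the old file or the new one and never a partial write. The function name `save_config` is hypothetical, and this sketch is not RouteMCP's actual code.

```python
import json
import os


def save_config(path: str, config: dict) -> None:
    """Write config as JSON via a temporary file, then atomically replace the target."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
        handle.flush()
        os.fsync(handle.fileno())  # push the data to disk before the rename
    # os.replace() overwrites the destination in a single step when both
    # paths are on the same filesystem.
    os.replace(tmp_path, path)
```

A fixed `.tmp` name protects readers from half-written files. If several processes may write at the same time, a unique temporary name (for example one created with the `tempfile` module) would likely be safer.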
Quick Start
# Set API keys
export GEMINI_API_KEY="..."
export GROQ_API_KEY="..."
export CEREBRAS_API_KEY="..."
# Install dependencies
pip install mcp httpx
# Run the server
python server.py
Examples
# `session` is assumed to be an initialized mcp ClientSession connected to server.py
# List available models
result = await session.call_tool("models", {})
# Classify a task
result = await session.call_tool("classify_task", {"prompt": "write a Python function"})
# Route automatically
result = await session.call_tool("route", {"prompt": "explain quantum computing"})
# Ask a specific model
result = await session.call_tool("ask", {"model": "gemini-2.0-flash", "prompt": "hello"})
# Compare models
result = await session.call_tool("compare", {
    "prompt": "solve 2+2",
    "models": "gemini-2.0-flash,llama-3.3-70b"
})
Project Structure
routemcp/
├── server.py                     # MCP server entry point (tools)
├── router/
│   ├── __init__.py
│   ├── engine.py                 # RouterEngine: routing & fallback logic, async compare
│   ├── classifier.py             # Task classifier (LLM Hybrid + keyword scoring)
│   ├── models.py                 # Model definitions & config.json loader
│   └── providers/
│       ├── __init__.py
│       ├── base.py               # AIProvider base class & ProviderError
│       ├── google_provider.py    # Google Gemini API
│       ├── groq_provider.py      # Groq API
│       └── cerebras_provider.py  # Cerebras API
├── client.py                     # Test client CLI
└── pyproject.toml