
AI model router MCP server with multi-provider support (Gemini, Groq, Cerebras)


RouteMCP — AI Model Router MCP Server


MCP server that classifies tasks and routes prompts to the best available AI model. Supports Google Gemini, Groq, and Cerebras with automatic failover.

How It Works

  1. Classify: Analyzes the prompt with an LLM (Groq llama-3.1-8b), falling back to keyword matching, to detect the task type (code, math, reasoning, creative, vision, long_context, multilingual, general).
  2. Route: Selects the best model for the task using the priorities configured in config.json.
  3. Fallback: If the preferred model fails (API error, timeout, or a 429 rate limit with Retry-After), tries the next model in the list.
  4. Ask: Sends the prompt directly to a specific model.
  5. Compare: Sends the same prompt to multiple models concurrently and returns their responses for side-by-side comparison.
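The keyword fallback in step 1 could look roughly like the sketch below. The keyword lists and the function name `classify_by_keywords` are illustrative assumptions; the project's actual keywords are read from config.json and may differ.

```python
import re

# Hypothetical keyword lists, for illustration only.
KEYWORDS: dict[str, list[str]] = {
    "code": ["python", "function", "bug", "compile", "refactor"],
    "math": ["solve", "equation", "integral", "derivative"],
    "creative": ["poem", "story", "lyrics"],
}


def classify_by_keywords(prompt: str, keywords: dict[str, list[str]] = KEYWORDS) -> str:
    """Return the task type with the most keyword hits, or "general" if nothing matches."""
    words = re.findall(r"[a-z0-9_+#]+", prompt.lower())
    scores = {task: sum(words.count(k) for k in kws) for task, kws in keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"


print(classify_by_keywords("write a Python function"))  # code
```

A scorer like this needs no network call, which is why it works as a fallback when the LLM classifier is unavailable.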

Routing Logic (via config.json)

Task Type      Preferred Models
code           gemini-2.5-pro → llama-3.3-70b → gemini-2.5-flash → cerebras-llama-3.3-70b
reasoning      gemini-2.5-pro → llama-3.3-70b → gemini-2.5-flash
math           gemini-2.5-pro → gemini-2.5-flash → llama-3.3-70b
creative       gemini-2.0-flash → gemini-3-flash-preview → mixtral-8x7b
general        gemini-2.0-flash → gemini-2.5-flash → llama-3.3-70b → gemini-3-flash-preview
vision         gemini-2.5-pro → gemini-2.0-flash → gemini-2.5-flash → gemini-3-flash-preview
long_context   gemini-2.5-pro → gemini-2.0-flash → gemini-2.5-flash → gemini-3-flash-preview
speed          llama-3.1-8b → cerebras-llama-3.3-70b → llama-3.3-70b → gemini-2.0-flash
multilingual   gemini-2.0-flash → mixtral-8x7b → gemini-2.5-flash
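Each row of the table is an ordered fallback chain. The sketch below shows one way such a chain could be walked. It is synchronous for brevity (the project itself is async), and `route_with_fallback`, `call_model`, and the local `ProviderError` stand-in are assumptions made for illustration, not the project's actual API.

```python
from typing import Callable

# Two rows copied from the routing table; the full table lives in config.json.
ROUTING: dict[str, list[str]] = {
    "code": ["gemini-2.5-pro", "llama-3.3-70b", "gemini-2.5-flash", "cerebras-llama-3.3-70b"],
    "math": ["gemini-2.5-pro", "gemini-2.5-flash", "llama-3.3-70b"],
}


class ProviderError(Exception):
    """Stand-in for the error a provider raises on API failure, timeout, or rate limit."""


def route_with_fallback(
    task: str,
    prompt: str,
    call_model: Callable[[str, str], str],
    routing: dict[str, list[str]] = ROUTING,
) -> tuple[str, str]:
    """Try each model in the task's chain in order; return (model, response) from the first success."""
    errors: list[str] = []
    for model in routing.get(task, []):
        try:
            return model, call_model(model, prompt)
        except ProviderError as exc:
            errors.append(f"{model}: {exc}")  # remember the failure, then try the next model
    raise ProviderError("all models failed: " + "; ".join(errors))
```

Collecting the per-model errors makes the final exception useful when every model in the chain fails.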

Features

Tool            Description
ask             Sends a prompt to a specific model
models          Lists available models with their capabilities, context window, and cost
classify_task   Classifies a prompt and shows the recommended models
route           Automatically routes to the best model for the task
compare         Compares responses from multiple models for the same prompt (runs in parallel)
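The parallel execution behind compare can be sketched with `asyncio.gather`. The names `compare_models` and `ask` below are hypothetical placeholders for whatever coroutine sends a prompt to a single model; this is not the project's actual implementation.

```python
import asyncio
from typing import Awaitable, Callable


async def compare_models(
    prompt: str,
    models: list[str],
    ask: Callable[[str, str], Awaitable[str]],
) -> dict[str, str]:
    """Send the same prompt to every model concurrently and collect results per model."""
    # return_exceptions=True keeps one failing model from cancelling the others.
    results = await asyncio.gather(*(ask(m, prompt) for m in models), return_exceptions=True)
    return {
        model: f"error: {result}" if isinstance(result, Exception) else result
        for model, result in zip(models, results)
    }
```

With this shape, total latency is roughly that of the slowest model rather than the sum of all calls.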

Configuration note: All routing logic, the model list, and the classifier keywords are generated in and read from a config.json file in the project root. You can edit it to customize your models without touching the source code.
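The README does not document the layout of config.json, so the sketch below assumes a hypothetical schema with a `models` mapping and a `routing` mapping. It shows how a loader might reject a routing chain that names a model missing from the model list; inspect the generated config.json for the real keys before relying on any of this.

```python
import json
from pathlib import Path


def load_routing(path: str) -> dict[str, list[str]]:
    """Load routing chains from a config file, assuming a {"models": ..., "routing": ...} layout."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    known = set(config["models"])  # assumed key: model name -> model settings
    for task, chain in config["routing"].items():  # assumed key: task type -> ordered model names
        unknown = [m for m in chain if m not in known]
        if unknown:
            raise ValueError(f"routing[{task!r}] references unknown models: {unknown}")
    return config["routing"]
```

Validating at load time surfaces a typo in an edited config immediately instead of at the first routed request.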

Providers

Provider        API Key            Models
Google Gemini   GEMINI_API_KEY     gemini-2.5-pro, gemini-2.0-flash, gemini-2.5-flash, gemini-3-flash-preview
Groq            GROQ_API_KEY       llama-3.3-70b, llama-3.1-8b, mixtral-8x7b
Cerebras        CEREBRAS_API_KEY   cerebras-llama-3.3-70b

Tech Stack

  • Python>=3.11
  • Framework: mcp (FastMCP) via stdio JSON-RPC
  • HTTP: httpx (async) with rate-limit handling (Retry-After)
  • Classifier: Hybrid (Groq llama-3.1-8b LLM with a keyword-based fallback)
  • Configuration: External, via config.json

🔧 Recent Improvements

  • Cerebras Model Fixed: Model parameter is now properly mapped (was always sending llama3.3-70b regardless of input)
  • Retry Logic Deduplicated: Shared retry_ask() helper in base.py replaces 3 copies of identical retry code
  • is_available() Cached: Provider availability cached for 60s TTL (no HTTP call on every routing decision)
  • Configurable Temperature/Max Tokens: ask() and compare() tools now accept temperature and max_tokens parameters
  • Lazy Provider Init: Providers are created only when first needed, not at server startup
  • Provider-Model Validation: Engine validates that the requested model belongs to the correct provider before forwarding
  • Prompt Injection Mitigation: Classifier truncates user input to 1000 chars with clear delimiters
  • Atomic Config Write: config.json uses .tmp + os.replace() to prevent corruption on concurrent writes
  • Module-Level Imports: classify_task no longer re-imports modules on every call
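Two of the items above, the atomic config write and the 60-second availability cache, follow common patterns that can be sketched with the standard library. The names `write_config_atomic` and `TTLFlag` are hypothetical; this shows the general technique, not the project's code.

```python
import json
import os
import time
from typing import Callable


def write_config_atomic(path: str, config: dict) -> None:
    """Write JSON to a sibling .tmp file, then swap it into place with os.replace()."""
    tmp = f"{path}.tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes are on disk before the swap
    os.replace(tmp, path)  # readers see either the old file or the new one, never a partial write


class TTLFlag:
    """Cache the result of an expensive boolean probe for `ttl` seconds."""

    def __init__(self, probe: Callable[[], bool], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._probe = probe
        self._ttl = ttl
        self._clock = clock
        self._value = False
        self._expires: float | None = None

    def get(self) -> bool:
        now = self._clock()
        if self._expires is None or now >= self._expires:
            self._value = self._probe()  # only hit the provider when the cached value is stale
            self._expires = now + self._ttl
        return self._value
```

Note that a fixed `.tmp` name still lets two simultaneous writers overwrite each other's temporary file; a unique temporary name per writer would avoid that.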

Quick Start

# Set API keys
export GEMINI_API_KEY="..."
export GROQ_API_KEY="..."
export CEREBRAS_API_KEY="..."

# Install dependencies
pip install mcp httpx

# Run the server
python server.py

Examples

# `session` is assumed to be an initialized MCP ClientSession connected to server.py over stdio

# List available models
result = await session.call_tool("models", {})

# Classify a task
result = await session.call_tool("classify_task", {"prompt": "write a Python function"})

# Route automatically
result = await session.call_tool("route", {"prompt": "explain quantum computing"})

# Ask a specific model
result = await session.call_tool("ask", {"model": "gemini-2.0-flash", "prompt": "hello"})

# Compare models (comma-separated list of model names)
result = await session.call_tool("compare", {
    "prompt": "solve 2+2",
    "models": "gemini-2.0-flash,llama-3.3-70b"
})

Project Structure

routemcp/
├── server.py                 # MCP server entry point (tools)
├── router/
│   ├── __init__.py
│   ├── engine.py             # RouterEngine: routing & fallback logic, async compare
│   ├── classifier.py         # Task classifier (LLM Hybrid + keyword scoring)
│   ├── models.py             # Model definitions & config.json loader
│   └── providers/
│       ├── __init__.py
│       ├── base.py           # AIProvider base class & ProviderError
│       ├── google_provider.py   # Google Gemini API
│       ├── groq_provider.py     # Groq API
│       └── cerebras_provider.py # Cerebras API
├── client.py                 # Test client CLI
└── pyproject.toml


