Intelligent LLM Router

Intelligent LLM Router is a Python library that routes requests across multiple LLM providers with automatic failover, quota management, and response caching.

Features

  • Multi-Provider Support: OpenAI, Anthropic, Google Gemini, Groq, Mistral, Cohere, DeepSeek, Together, HuggingFace, OpenRouter, xAI, DashScope, and Ollama
  • Automatic Failover: Automatically switches to the next provider if one fails
  • Quota Management: Tracks per-provider requests-per-minute and requests-per-day (RPM/RPD) limits and routes around rate limits
  • Response Caching: Exact and semantic caching to reduce costs and latency
  • Multiple Routing Strategies: Auto, Cost-Optimized, Quality-First, Latency-First, Round-Robin
  • Vision & Embeddings: Full support for vision models and embeddings

Installation

pip install llm_router

Or with specific extras:

pip install "llm_router[server]"  # Includes FastAPI server

Quick Start

import asyncio
from llm_router import IntelligentRouter, RoutingOptions, RoutingStrategy

async def main():
    # Initialize the router
    router = IntelligentRouter()
    await router.start()
    
    # Define your request
    request_data = {
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "temperature": 0.7,
    }
    
    # Configure routing options (optional)
    options = RoutingOptions(
        strategy=RoutingStrategy.AUTO,
    )
    
    # Route the request
    response = await router.route(request_data, options)
    
    print(response["choices"][0]["message"]["content"])
    print(f"Provider: {response['routing_metadata']['provider']}")
    
    await router.stop()

asyncio.run(main())

Configuration

Environment Variables

All configuration can be done via environment variables with the ROUTER_ prefix:

Variable                       Default                Description
ROUTER_LLM_TIMEOUT             60                     Timeout for LLM calls, in seconds
ROUTER_MAX_RETRIES             3                      Maximum retry attempts
ROUTER_ENABLE_OLLAMA_FALLBACK  true                   Enable Ollama fallback when cloud providers fail
ROUTER_CACHE_DIR               /tmp/llm_router_cache  Directory for the response cache
ROUTER_RESPONSE_CACHE_TTL      3600                   Cache TTL in seconds
ROUTER_DEFAULT_STRATEGY        auto                   Default routing strategy
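
For example, a minimal sketch of setting these from Python (the values are illustrative, and depending on when the library reads its settings, the variables may need to be set before llm_router is imported):

import os

# Illustrative values only: tighten the per-call timeout and pick a
# different default strategy via the ROUTER_* variables above
os.environ["ROUTER_LLM_TIMEOUT"] = "30"
os.environ["ROUTER_DEFAULT_STRATEGY"] = "latency_first"
os.environ["ROUTER_CACHE_DIR"] = "/var/cache/llm_router"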

Provider API Keys

Set API keys as environment variables:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GROQ_API_KEY=gsk_...
export GEMINI_API_KEY=AIza...
# etc.

Routing Strategies

Strategy        Description
auto            Balanced: quota + latency + quality (default)
cost_optimized  Maximize remaining free quota
quality_first   Prioritize the highest-quality models
latency_first   Prioritize the fastest-responding models
round_robin     Spread requests uniformly across providers

Request Options

from llm_router import RoutingOptions, CachePolicy

options = RoutingOptions(
    strategy=RoutingStrategy.COST_OPTIMIZED,
    free_tier_only=True,  # Only use free tier providers
    preferred_providers=["groq", "gemini"],  # Prefer these providers
    excluded_providers=["openai"],  # Skip these providers
    cache_policy=CachePolicy.ENABLED,  # Enable response caching
)

API Reference

IntelligentRouter

router = IntelligentRouter()
await router.start()  # Initialize router
response = await router.route(request_data, options)  # Route request
stats = router.get_stats()  # Get router statistics
await router.stop()  # Cleanup
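
A typical lifecycle wraps routing in try/finally so the router is always cleaned up. This is a minimal sketch using only the calls above; it assumes the options argument to route() is optional (as the Quick Start comment suggests), and the shape of the get_stats() result is not documented here:

import asyncio
from llm_router import IntelligentRouter

async def main():
    router = IntelligentRouter()
    await router.start()
    try:
        request_data = {"messages": [{"role": "user", "content": "Ping"}]}
        response = await router.route(request_data)
        print(response["choices"][0]["message"]["content"])
        print(router.get_stats())  # router statistics; exact shape undocumented
    finally:
        await router.stop()  # always release provider clients

asyncio.run(main())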

Models

  • RoutingOptions: Configuration for routing behavior
  • RoutingStrategy: Enum for routing strategies
  • TaskType: Enum for task types (chat, embeddings, vision, etc.)
  • CachePolicy: Enum for caching behavior
  • settings: Global settings object
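
As a quick sanity check, the enums can be introspected at runtime. This sketch assumes standard Enum semantics for the classes above; the attribute names on settings are an assumption inferred from the ROUTER_* variables (e.g. ROUTER_LLM_TIMEOUT -> llm_timeout):

from llm_router import RoutingStrategy, TaskType, CachePolicy, settings

# Print the string value of each enum member (standard Enum .value access)
print([s.value for s in RoutingStrategy])
print([t.value for t in TaskType])
print([c.value for c in CachePolicy])

# Assumed mapping from ROUTER_LLM_TIMEOUT; verify against your installed version
print(settings.llm_timeout)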

Running the Server

# Install with server extras
pip install "llm_router[server]"

# Run the server
llm-router

# Or with custom settings
ROUTER_PORT=8000 llm-router

The server (built with FastAPI) exposes an OpenAI-style API at /v1/chat/completions and /v1/embeddings, plus a /health endpoint.
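
Once the server is up, any HTTP client can call it. A minimal stdlib-only sketch, assuming the server is listening on localhost:8000 (as in the ROUTER_PORT example above) and accepts the OpenAI-style request body shown in Quick Start:

import json
import urllib.request

payload = {"messages": [{"role": "user", "content": "What is the capital of France?"}]}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])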

Development

# Clone the repository
git clone https://github.com/anomalyco/llm_router.git

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src/

License

MIT License - see LICENSE for details.
