
Intelligent LLM Router


Intelligent LLM Router is a Python library that routes requests across multiple LLM providers with automatic failover, quota management, and response caching.

Features

  • Multi-Provider Support: OpenAI, Anthropic, Google Gemini, Groq, Mistral, Cohere, DeepSeek, Together, HuggingFace, OpenRouter, xAI, DashScope, and Ollama
  • Automatic Failover: Automatically switches to the next provider if one fails
  • Quota Management: Tracks RPM/RPD limits per provider and routes around rate limits
  • Response Caching: Exact and semantic caching to reduce costs and latency
  • Multiple Routing Strategies: Auto, Cost-Optimized, Quality-First, Latency-First, Round-Robin
  • Vision & Embeddings: Full support for vision models and embeddings
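Automatic failover can be pictured as trying providers in order until one succeeds. The sketch below is purely illustrative; `call_with_failover` and the fake providers are hypothetical and not part of the library's API:

```python
# Illustrative sketch of automatic failover: try each provider in order,
# moving on when one raises. Names here are hypothetical, not llm_router API.

def call_with_failover(providers, request):
    """Return the first successful (provider_name, response) pair."""
    errors = {}
    for name, call in providers:
        try:
            return name, call(request)
        except Exception as exc:  # a real router would narrow this
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")

# Fake providers: the first always fails, the second succeeds.
def flaky(request):
    raise ConnectionError("rate limited")

def healthy(request):
    return {"content": "Paris"}

name, resp = call_with_failover(
    [("openai", flaky), ("groq", healthy)],
    {"q": "capital of France?"},
)
# name == "groq", resp == {"content": "Paris"}
```

The real router layers quota tracking and latency scoring on top of this basic try-next-provider loop.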

Installation

pip install llm_router

Or with specific extras:

pip install llm_router[server]  # Includes FastAPI server

Quick Start

import asyncio
from llm_router import IntelligentRouter, RoutingOptions, RoutingStrategy

async def main():
    # Initialize the router
    router = IntelligentRouter()
    await router.start()
    
    # Define your request
    request_data = {
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "temperature": 0.7,
    }
    
    # Configure routing options (optional)
    options = RoutingOptions(
        strategy=RoutingStrategy.AUTO,
    )
    
    # Route the request
    response = await router.route(request_data, options)
    
    print(response["choices"][0]["message"]["content"])
    print(f"Provider: {response['routing_metadata']['provider']}")
    
    await router.stop()

asyncio.run(main())

Configuration

Environment Variables

All configuration can be done via environment variables with the ROUTER_ prefix:

| Variable | Default | Description |
|---|---|---|
| `ROUTER_LLM_TIMEOUT` | `60` | Timeout for LLM calls, in seconds |
| `ROUTER_MAX_RETRIES` | `3` | Maximum retry attempts |
| `ROUTER_ENABLE_OLLAMA_FALLBACK` | `true` | Enable Ollama fallback when cloud providers fail |
| `ROUTER_CACHE_DIR` | `/tmp/llm_router_cache` | Directory for the response cache |
| `ROUTER_RESPONSE_CACHE_TTL` | `3600` | Cache TTL, in seconds |
| `ROUTER_DEFAULT_STRATEGY` | `auto` | Default routing strategy |
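For example, a deployment might pin these in its environment. The values below are illustrative, not recommendations; unset variables keep the documented defaults:

```shell
# Router behaviour (ROUTER_ prefix); values here are examples.
export ROUTER_LLM_TIMEOUT=30                    # fail LLM calls faster than the 60 s default
export ROUTER_MAX_RETRIES=3
export ROUTER_ENABLE_OLLAMA_FALLBACK=true
export ROUTER_CACHE_DIR=/var/cache/llm_router   # default is /tmp/llm_router_cache
export ROUTER_RESPONSE_CACHE_TTL=3600
export ROUTER_DEFAULT_STRATEGY=cost_optimized   # default is auto
```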

Provider API Keys

Set API keys as environment variables:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GROQ_API_KEY=gsk_...
export GEMINI_API_KEY=AIza...
# etc.

Routing Strategies

| Strategy | Description |
|---|---|
| `auto` | Balanced: quota + latency + quality (default) |
| `cost_optimized` | Maximize remaining free quota |
| `quality_first` | Prioritize the highest-quality models |
| `latency_first` | Prioritize the fastest-responding models |
| `round_robin` | Spread requests uniformly across providers |

Request Options

from llm_router import RoutingOptions, CachePolicy

options = RoutingOptions(
    strategy=RoutingStrategy.COST_OPTIMIZED,
    free_tier_only=True,  # Only use free tier providers
    preferred_providers=["groq", "gemini"],  # Prefer these providers
    excluded_providers=["openai"],  # Skip these providers
    cache_policy=CachePolicy.ENABLED,  # Enable response caching
)
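The `preferred_providers`/`excluded_providers` semantics can be read as: drop excluded providers, then move preferred ones to the front. A rough, hypothetical sketch of that ordering (not the library's actual routing code):

```python
def order_candidates(available, preferred=(), excluded=()):
    """Drop excluded providers, then move preferred ones to the front,
    keeping the original order within each group. Illustrative only."""
    pool = [p for p in available if p not in set(excluded)]
    front = [p for p in preferred if p in pool]
    rest = [p for p in pool if p not in front]
    return front + rest

providers = ["openai", "anthropic", "groq", "gemini"]
print(order_candidates(providers, preferred=["groq", "gemini"], excluded=["openai"]))
# ['groq', 'gemini', 'anthropic']
```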

API Reference

IntelligentRouter

router = IntelligentRouter()
await router.start()  # Initialize router
response = await router.route(request_data, options)  # Route request
stats = router.get_stats()  # Get router statistics
await router.stop()  # Cleanup

Models

  • RoutingOptions: Configuration for routing behavior
  • RoutingStrategy: Enum for routing strategies
  • TaskType: Enum for task types (chat, embeddings, vision, etc.)
  • CachePolicy: Enum for caching behavior
  • settings: Global settings object

Running the Server

# Install with server extras
pip install llm_router[server]

# Run the server
llm-router

# Or with custom settings
ROUTER_PORT=8000 llm-router

The FastAPI server exposes an OpenAI-compatible API at /v1/chat/completions and /v1/embeddings, plus a /health endpoint.
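Because the endpoints mirror OpenAI's chat-completions shape, any OpenAI-style payload should work. A minimal stdlib sketch; the port and the exact response fields are assumptions:

```python
import json
import urllib.request  # used by the commented-out POST below

# Build an OpenAI-style chat-completions payload for the router's
# /v1/chat/completions endpoint. Field names follow the OpenAI shape.
def build_chat_request(prompt, temperature=0.7):
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

payload = build_chat_request("What is the capital of France?")

# Posting it (assumes the server started by `llm-router` listens on :8000):
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```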

Development

# Clone the repository
git clone https://github.com/anomalyco/llm_router.git

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src/

License

MIT License - see LICENSE for details.
