
Smart LLM routing across providers - automatically picks the most cost-efficient model for your prompt


orkestra

Smart LLM routing across providers. Automatically selects the most cost-efficient model for your prompt using KNN-based routing.

Install

pip install orkestra

# With provider extras (quoted so shells such as zsh do not expand the brackets)
pip install "orkestra[google]"
pip install "orkestra[anthropic]"
pip install "orkestra[openai]"
pip install "orkestra[all]"

Quick Start

import orkestra as o

provider = o.Provider("google", "YOUR_GEMINI_API_KEY")
response = provider.chat("Explain quantum computing")

print(response.text)
print(f"Provider: {response.provider}")
print(f"Model: {response.model}")
print(f"Cost: ${response.cost:.6f}")
print(f"Saved: {response.savings_percent:.1f}%")

Multi-Provider Routing

Combine multiple providers and let orkestra pick the best one:

import orkestra as o

google = o.Provider("google", "GOOGLE_KEY")
anthropic = o.Provider("anthropic", "ANTHROPIC_KEY")
openai = o.Provider("openai", "OPENAI_KEY")

multi = o.MultiProvider([google, anthropic, openai])

# Route to cheapest option across all providers
response = multi.chat("What is 2+2?", strategy="cheapest")

# Route to smartest option
response = multi.chat("Prove the Riemann hypothesis", strategy="smartest")

# Balanced: prefer mid-tier models, break ties by cost
response = multi.chat("Write a Python function", strategy="balanced")
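
The three strategies can be read as different sort keys over the per-provider routes. The following is a minimal sketch under stated assumptions, not orkestra's actual implementation: `Candidate`, `pick`, and `TIER_RANK` are hypothetical names, each provider is assumed to have already proposed one model, and the cost tie-break for "smartest" is assumed by analogy with "balanced".

```python
from dataclasses import dataclass

# Hypothetical tier ordering; orkestra's internal representation may differ.
TIER_RANK = {"budget": 0, "balanced": 1, "premium": 2}


@dataclass(frozen=True)
class Candidate:
    """One provider's routed model plus its estimated cost (illustrative type)."""
    provider: str
    model: str
    tier: str
    cost: float


def pick(candidates: list[Candidate], strategy: str = "cheapest") -> Candidate:
    """Choose one candidate among the per-provider routes."""
    if strategy == "cheapest":
        return min(candidates, key=lambda c: c.cost)
    if strategy == "smartest":
        # Highest tier first; cost breaks ties (assumed tie-break).
        return min(candidates, key=lambda c: (-TIER_RANK[c.tier], c.cost))
    if strategy == "balanced":
        # Closest to the mid tier first; cost breaks ties.
        return min(candidates, key=lambda c: (abs(TIER_RANK[c.tier] - 1), c.cost))
    raise ValueError(f"unknown strategy: {strategy!r}")


routes = [
    Candidate("google", "gemini-3-flash-preview", "balanced", 0.003250),
    Candidate("anthropic", "claude-sonnet-4-5", "balanced", 0.016500),
    Candidate("openai", "gpt-4o", "balanced", 0.011250),
]
for strategy in ("cheapest", "smartest", "balanced"):
    choice = pick(routes, strategy)
    print(f"{strategy:<9} -> {choice.provider}/{choice.model}")
```

Under these assumptions, when every provider routes to the same tier, all three strategies reduce to the cheapest candidate.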

Streaming

import orkestra as o

provider = o.Provider("google", "YOUR_KEY")
for chunk in provider.stream_text("Write a poem"):
    # flush so each chunk appears as soon as it arrives
    print(chunk, end="", flush=True)
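
If the complete text is also needed once streaming finishes, the chunks can be collected while they are printed. A small sketch: `stream_and_collect` is a hypothetical helper, and `fake_stream` is a stand-in for `provider.stream_text(...)` so the example runs without an API key.

```python
from typing import Iterable, Iterator


def fake_stream(prompt: str) -> Iterator[str]:
    """Stand-in for provider.stream_text(prompt); yields str chunks."""
    yield from ["Roses ", "are ", "red"]


def stream_and_collect(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the joined text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # final newline once the stream ends
    return "".join(parts)


full_text = stream_and_collect(fake_stream("Write a poem"))
```

With a real provider, the call would presumably be `stream_and_collect(provider.stream_text("Write a poem"))`.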

Explore

======================================================================
PROVIDER MODEL CATALOG
======================================================================

  GOOGLE
    gemini-2.5-flash-lite        budget     $0.10/$0.40 per 1M tokens
    gemini-3-flash-preview       balanced   $0.50/$3.00 per 1M tokens
    gemini-3-pro-preview         premium    $2.00/$12.00 per 1M tokens

  ANTHROPIC
    claude-haiku-4               budget     $0.80/$4.00 per 1M tokens
    claude-sonnet-4-5            balanced   $3.00/$15.00 per 1M tokens
    claude-opus-4                premium    $15.00/$75.00 per 1M tokens

  OPENAI
    gpt-4o-mini                  budget     $0.15/$0.60 per 1M tokens
    gpt-4o                       balanced   $2.50/$10.00 per 1M tokens
    o3                           premium    $10.00/$40.00 per 1M tokens

======================================================================
ROUTER PREDICTIONS (which model gets picked per prompt)
======================================================================

  [Simple] What is the capital of Japan?
    google       → gemini-3-flash-preview (balanced)
    anthropic    → claude-sonnet-4-5 (balanced)
    openai       → gpt-4o (balanced)

  [Moderate] Explain how a hash table works with collision handling
    google       → gemini-3-flash-preview (balanced)
    anthropic    → claude-sonnet-4-5 (balanced)
    openai       → gpt-4o (balanced)

  [Complex] Implement a B-tree with insert and search in Python. Handle node ...
    google       → gemini-3-pro-preview (premium)
    anthropic    → claude-opus-4 (premium)
    openai       → o3 (premium)

======================================================================
COST SIMULATION (500 input tokens, 1000 output tokens)
======================================================================

  [Simple] What is the capital of Japan?
    google       gemini-3-flash-preview       $0.003250  (saves $0.009750, 75% vs gemini-3-pro-preview)
    anthropic    claude-sonnet-4-5            $0.016500  (saves $0.066000, 80% vs claude-opus-4)
    openai       gpt-4o                       $0.011250  (saves $0.033750, 75% vs o3)

  [Moderate] Explain how a hash table works with collision handling
    google       gemini-3-flash-preview       $0.003250  (saves $0.009750, 75% vs gemini-3-pro-preview)
    anthropic    claude-sonnet-4-5            $0.016500  (saves $0.066000, 80% vs claude-opus-4)
    openai       gpt-4o                       $0.011250  (saves $0.033750, 75% vs o3)

  [Complex] Implement a B-tree with insert and search in Python. Handle node ...
    google       gemini-3-pro-preview         $0.013000  (saves $0.000000, 0% vs gemini-3-pro-preview)
    anthropic    claude-opus-4                $0.082500  (saves $0.000000, 0% vs claude-opus-4)
    openai       o3                           $0.045000  (saves $0.000000, 0% vs o3)

======================================================================
STRATEGY COMPARISON (multi-provider selection)
======================================================================

  [Simple] What is the capital of Japan?
    Per-provider routes: google→gemini-3-flash-preview, anthropic→claude-sonnet-4-5, openai→gpt-4o
    cheapest   → google/gemini-3-flash-preview (balanced) $0.003250
    smartest   → google/gemini-3-flash-preview (balanced) $0.003250
    balanced   → google/gemini-3-flash-preview (balanced) $0.003250

  [Moderate] Explain how a hash table works with collision handling
    Per-provider routes: google→gemini-3-flash-preview, anthropic→claude-sonnet-4-5, openai→gpt-4o
    cheapest   → google/gemini-3-flash-preview (balanced) $0.003250
    smartest   → google/gemini-3-flash-preview (balanced) $0.003250
    balanced   → google/gemini-3-flash-preview (balanced) $0.003250

  [Complex] Implement a B-tree with insert and search in Python. Handle node ...
    Per-provider routes: google→gemini-3-pro-preview, anthropic→claude-opus-4, openai→o3
    cheapest   → google/gemini-3-pro-preview (premium) $0.013000
    smartest   → google/gemini-3-pro-preview (premium) $0.013000
    balanced   → google/gemini-3-pro-preview (premium) $0.013000

======================================================================
PUBLIC API
======================================================================

  import orkestra as o

  # Single provider (routes within one provider's model family)
  provider = o.Provider("google", "API_KEY")
  response = provider.chat("your prompt")
  stream   = provider.stream_text("your prompt")

  # Multi-provider (picks best provider+model using a strategy)
  multi = o.MultiProvider([provider1, provider2])
  response = multi.chat("your prompt", strategy="cheapest")
  response = multi.chat("your prompt", strategy="smartest")
  response = multi.chat("your prompt", strategy="balanced")

  # Response fields
  response.text             # generated text
  response.model            # model used (e.g. "gemini-2.5-flash-lite")
  response.provider         # provider name (e.g. "google")
  response.cost             # total cost in dollars
  response.input_tokens     # input token count
  response.output_tokens    # output token count
  response.savings          # dollars saved vs premium model
  response.savings_percent  # savings as percentage
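
The figures in the cost simulation above follow directly from the per-million-token prices in the catalog. A short sketch of that arithmetic, where `request_cost` is a hypothetical helper and not part of orkestra:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in dollars, with prices quoted per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Prices from the catalog: gemini-3-flash-preview vs gemini-3-pro-preview.
routed = request_cost(500, 1000, 0.50, 3.00)
premium = request_cost(500, 1000, 2.00, 12.00)
savings = premium - routed
print(f"${routed:.6f} (saves ${savings:.6f}, {savings / premium:.0%} vs premium)")
# prints: $0.003250 (saves $0.009750, 75% vs premium)
```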

How It Works

Orkestra uses a k-nearest neighbors (KNN) router, trained on embeddings of benchmark queries, to predict which model tier will perform best for your prompt. Simple queries are routed to cheaper models and complex ones to premium models.

Each call:

  1. Embeds your prompt using Longformer (768-dim)
  2. KNN finds the 5 nearest training queries
  3. Routes to the model that performed best on similar queries
  4. Calls the selected model via the provider's API
  5. Returns the response with cost and savings info
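
Steps 2 and 3 can be illustrated with a toy example. This is a sketch only: it uses 2-dimensional vectors in place of 768-dimensional Longformer embeddings, and the training data, Euclidean distance, and majority vote are all assumptions rather than a description of orkestra's router.

```python
import math
from collections import Counter

# Toy training set: (embedding, tier that performed best). Illustrative data only.
TRAINING = [
    ((0.1, 0.2), "budget"),
    ((0.2, 0.1), "budget"),
    ((0.15, 0.15), "budget"),
    ((0.3, 0.3), "balanced"),
    ((0.4, 0.5), "balanced"),
    ((0.5, 0.4), "balanced"),
    ((0.9, 0.8), "premium"),
    ((0.8, 0.9), "premium"),
    ((1.0, 1.0), "premium"),
]


def route(embedding: tuple[float, ...], k: int = 5) -> str:
    """Return the majority tier among the k nearest training queries."""
    nearest = sorted(TRAINING, key=lambda item: math.dist(item[0], embedding))[:k]
    votes = Counter(tier for _, tier in nearest)
    return votes.most_common(1)[0][0]


print(route((0.35, 0.4)))   # a query near the "balanced" cluster
print(route((0.95, 0.9)))   # a query near the "premium" cluster
```

A real router would replace the toy tuples with prompt embeddings and may weight neighbors by distance or by measured model quality instead of counting votes.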

Router models are downloaded automatically on first use and cached in ~/.orkestra/routers/.
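
To see what has been cached, the directory can be inspected with the standard library. A minimal sketch: `cached_router_files` is a hypothetical helper, and the layout of files inside the cache directory is not documented here, so the code simply lists whatever it finds.

```python
from pathlib import Path


def cached_router_files(
    cache_dir: Path = Path.home() / ".orkestra" / "routers",
) -> list[Path]:
    """List files under the router cache, or an empty list if it does not exist yet."""
    if not cache_dir.is_dir():
        return []
    return sorted(p for p in cache_dir.rglob("*") if p.is_file())


files = cached_router_files()
total_mb = sum(p.stat().st_size for p in files) / 1_000_000
print(f"{len(files)} cached file(s), {total_mb:.1f} MB")
```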

Supported Providers

Google Gemini

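| Tier     | Model                  | Input ($/1M tokens) | Output ($/1M tokens) |
|----------|------------------------|---------------------|----------------------|
| Budget   | gemini-2.5-flash-lite  | $0.10               | $0.40                |
| Balanced | gemini-3-flash-preview | $0.50               | $3.00                |
| Premium  | gemini-3-pro-preview   | $2.00               | $12.00               |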

Anthropic Claude

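| Tier     | Model             | Input ($/1M tokens) | Output ($/1M tokens) |
|----------|-------------------|---------------------|----------------------|
| Budget   | claude-haiku-4    | $0.80               | $4.00                |
| Balanced | claude-sonnet-4-5 | $3.00               | $15.00               |
| Premium  | claude-opus-4     | $15.00              | $75.00               |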

OpenAI

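| Tier     | Model       | Input ($/1M tokens) | Output ($/1M tokens) |
|----------|-------------|---------------------|----------------------|
| Budget   | gpt-4o-mini | $0.15               | $0.60                |
| Balanced | gpt-4o      | $2.50               | $10.00               |
| Premium  | o3          | $10.00              | $40.00               |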

API Reference

orkestra.Provider(name, api_key)

Create a provider with automatic routing.

  • name: "google", "anthropic", or "openai"
  • api_key: Your API key for the provider
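
Rather than hard-coding the key, it can be read from the environment. A small sketch: `api_key_from_env` is a hypothetical helper and the variable name is an assumption, since orkestra itself only takes the key as a string.

```python
import os


def api_key_from_env(var_name: str) -> str:
    """Return the API key stored in an environment variable, or fail loudly."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before creating the provider")
    return key


# Usage (variable name is an assumption; use whatever your deployment defines):
# provider = o.Provider("google", api_key_from_env("GEMINI_API_KEY"))
```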

provider.chat(prompt, *, max_tokens=8192, temperature=1.0)

Generate a response with automatic model routing. Returns a Response.

provider.stream_text(prompt, *, max_tokens=8192, temperature=1.0)

Stream text chunks with automatic model routing. Yields str.

orkestra.MultiProvider(providers)

Combine multiple Provider instances for cross-provider routing.

multi.chat(prompt, *, strategy="cheapest", max_tokens=8192, temperature=1.0)

Generate with strategy-based provider selection. Strategies: "cheapest", "smartest", "balanced".

orkestra.Response

| Field           | Type  | Description                             |
|-----------------|-------|-----------------------------------------|
| text            | str   | Generated response text                 |
| model           | str   | Model that was selected                 |
| provider        | str   | Provider that was used                  |
| cost            | float | Actual cost in dollars                  |
| input_tokens    | int   | Input token count                       |
| output_tokens   | int   | Output token count                      |
| savings         | float | Dollars saved vs base model             |
| savings_percent | float | Percentage saved vs base model          |
| base_model      | str   | Comparison baseline model               |
| base_cost       | float | What it would have cost with base model |
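
The savings fields appear to be derived from `cost` and `base_cost`. The sketch below mirrors the documented fields to make that relationship explicit. `ResponseSketch` is a hypothetical class, not orkestra's own definition, and the formulas are assumptions that happen to match the cost simulation figures.

```python
from dataclasses import dataclass


@dataclass
class ResponseSketch:
    """Illustrative mirror of the documented fields; not orkestra's own class."""
    text: str
    model: str
    provider: str
    cost: float
    input_tokens: int
    output_tokens: int
    base_model: str
    base_cost: float

    @property
    def savings(self) -> float:
        # Assumed: dollars saved relative to the baseline model.
        return self.base_cost - self.cost

    @property
    def savings_percent(self) -> float:
        # Assumed: a zero baseline reports 0% rather than dividing by zero.
        return 100 * self.savings / self.base_cost if self.base_cost else 0.0


r = ResponseSketch("example text", "claude-sonnet-4-5", "anthropic",
                   0.0165, 500, 1000, "claude-opus-4", 0.0825)
print(f"Saved ${r.savings:.6f} ({r.savings_percent:.0f}%)")
```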

License

MIT
