Smart LLM routing across providers - automatically picks the most cost-efficient model for your prompt

orkestra

Smart LLM routing across providers. Automatically selects the most cost-efficient model for your prompt using KNN-based routing.

Install

pip install orkestra

# With provider extras (quoted so shells such as zsh do not expand the brackets)
pip install "orkestra[google]"
pip install "orkestra[anthropic]"
pip install "orkestra[openai]"
pip install "orkestra[all]"

Quick Start

import orkestra as o

provider = o.Provider("google", "YOUR_GEMINI_API_KEY")
response = provider.chat("Explain quantum computing")

print(response.text)
print(f"Provider: {response.provider}")
print(f"Model: {response.model}")
print(f"Cost: ${response.cost:.6f}")
print(f"Saved: {response.savings_percent:.1f}%")

Multi-Provider Routing

Combine multiple providers and let orkestra pick the best one:

import orkestra as o

google = o.Provider("google", "GOOGLE_KEY")
anthropic = o.Provider("anthropic", "ANTHROPIC_KEY")
openai = o.Provider("openai", "OPENAI_KEY")

multi = o.MultiProvider([google, anthropic, openai])

# Route to cheapest option across all providers
response = multi.chat("What is 2+2?", strategy="cheapest")

# Route to smartest option
response = multi.chat("Prove the Riemann hypothesis", strategy="smartest")

# Balanced: prefer mid-tier models, break ties by cost
response = multi.chat("Write a Python function", strategy="balanced")
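
The three strategies can be pictured as different sort keys over the per-provider candidates. The sketch below is an illustration only, not orkestra's actual implementation: the `Candidate` class, the `TIER_RANK` table, the `pick` function, and the tie-breaking rules are all assumptions. The example costs are taken from the Explore output further down.

```python
from dataclasses import dataclass

# Assumed tier ordering for this sketch: higher rank means more capable.
TIER_RANK = {"budget": 0, "balanced": 1, "premium": 2}


@dataclass(frozen=True)
class Candidate:
    """One provider's routed model for a prompt (hypothetical structure)."""
    provider: str
    model: str
    tier: str
    cost: float  # estimated dollars for this call


def pick(candidates, strategy="cheapest"):
    """Choose one candidate according to the named strategy."""
    if strategy == "cheapest":
        return min(candidates, key=lambda c: c.cost)
    if strategy == "smartest":
        # Highest tier first, then lowest cost among equals (assumed tie-break).
        return min(candidates, key=lambda c: (-TIER_RANK[c.tier], c.cost))
    if strategy == "balanced":
        # Prefer the tier closest to "balanced", then lowest cost.
        mid = TIER_RANK["balanced"]
        return min(candidates, key=lambda c: (abs(TIER_RANK[c.tier] - mid), c.cost))
    raise ValueError(f"unknown strategy: {strategy!r}")


candidates = [
    Candidate("google", "gemini-3-flash-preview", "balanced", 0.003250),
    Candidate("anthropic", "claude-sonnet-4-5", "balanced", 0.016500),
    Candidate("openai", "gpt-4o", "balanced", 0.011250),
]
for name in ("cheapest", "smartest", "balanced"):
    choice = pick(candidates, name)
    print(f"{name:<9} -> {choice.provider}/{choice.model}")
```

When every provider routes to the same tier, as here, all three strategies fall back to cost and pick the same candidate.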

Streaming

import orkestra as o

provider = o.Provider("google", "YOUR_KEY")
for chunk in provider.stream_text("Write a poem"):
    print(chunk, end="")
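
If you also need the complete text once streaming ends, one option is to collect the chunks while printing them. The sketch below runs without an API key by using `fake_stream`, a hypothetical stand-in for `provider.stream_text()`. The `consume` helper is likewise an assumption and not part of orkestra.

```python
from typing import Iterable, Iterator


def fake_stream(text: str, size: int = 8) -> Iterator[str]:
    """Stand-in for provider.stream_text(): yields the text in small pieces."""
    for start in range(0, len(text), size):
        yield text[start:start + size]


def consume(chunks: Iterable[str]) -> str:
    """Print chunks as they arrive and return the full text."""
    parts = []
    for chunk in chunks:
        # flush=True makes each chunk appear immediately instead of being buffered.
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts)


full_text = consume(fake_stream("Roses are red, violets are blue."))
```

Since `stream_text` is documented as yielding `str`, the same `consume` helper should accept its output directly.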

Explore

======================================================================
PROVIDER MODEL CATALOG
======================================================================

  GOOGLE
    gemini-2.5-flash-lite        budget     $0.10/$0.40 per 1M tokens
    gemini-3-flash-preview       balanced   $0.50/$3.00 per 1M tokens
    gemini-3-pro-preview         premium    $2.00/$12.00 per 1M tokens

  ANTHROPIC
    claude-haiku-4               budget     $0.80/$4.00 per 1M tokens
    claude-sonnet-4-5            balanced   $3.00/$15.00 per 1M tokens
    claude-opus-4                premium    $15.00/$75.00 per 1M tokens

  OPENAI
    gpt-4o-mini                  budget     $0.15/$0.60 per 1M tokens
    gpt-4o                       balanced   $2.50/$10.00 per 1M tokens
    o3                           premium    $10.00/$40.00 per 1M tokens

======================================================================
ROUTER PREDICTIONS (which model gets picked per prompt)
======================================================================

  [Simple] What is the capital of Japan?
    google       → gemini-3-flash-preview (balanced)
    anthropic    → claude-sonnet-4-5 (balanced)
    openai       → gpt-4o (balanced)

  [Moderate] Explain how a hash table works with collision handling
    google       → gemini-3-flash-preview (balanced)
    anthropic    → claude-sonnet-4-5 (balanced)
    openai       → gpt-4o (balanced)

  [Complex] Implement a B-tree with insert and search in Python. Handle node ...
    google       → gemini-3-pro-preview (premium)
    anthropic    → claude-opus-4 (premium)
    openai       → o3 (premium)

======================================================================
COST SIMULATION (500 input tokens, 1000 output tokens)
======================================================================

  [Simple] What is the capital of Japan?
    google       gemini-3-flash-preview       $0.003250  (saves $0.009750, 75% vs gemini-3-pro-preview)
    anthropic    claude-sonnet-4-5            $0.016500  (saves $0.066000, 80% vs claude-opus-4)
    openai       gpt-4o                       $0.011250  (saves $0.033750, 75% vs o3)

  [Moderate] Explain how a hash table works with collision handling
    google       gemini-3-flash-preview       $0.003250  (saves $0.009750, 75% vs gemini-3-pro-preview)
    anthropic    claude-sonnet-4-5            $0.016500  (saves $0.066000, 80% vs claude-opus-4)
    openai       gpt-4o                       $0.011250  (saves $0.033750, 75% vs o3)

  [Complex] Implement a B-tree with insert and search in Python. Handle node ...
    google       gemini-3-pro-preview         $0.013000  (saves $0.000000, 0% vs gemini-3-pro-preview)
    anthropic    claude-opus-4                $0.082500  (saves $0.000000, 0% vs claude-opus-4)
    openai       o3                           $0.045000  (saves $0.000000, 0% vs o3)

======================================================================
STRATEGY COMPARISON (multi-provider selection)
======================================================================

  [Simple] What is the capital of Japan?
    Per-provider routes: google→gemini-3-flash-preview, anthropic→claude-sonnet-4-5, openai→gpt-4o
    cheapest   → google/gemini-3-flash-preview (balanced) $0.003250
    smartest   → google/gemini-3-flash-preview (balanced) $0.003250
    balanced   → google/gemini-3-flash-preview (balanced) $0.003250

  [Moderate] Explain how a hash table works with collision handling
    Per-provider routes: google→gemini-3-flash-preview, anthropic→claude-sonnet-4-5, openai→gpt-4o
    cheapest   → google/gemini-3-flash-preview (balanced) $0.003250
    smartest   → google/gemini-3-flash-preview (balanced) $0.003250
    balanced   → google/gemini-3-flash-preview (balanced) $0.003250

  [Complex] Implement a B-tree with insert and search in Python. Handle node ...
    Per-provider routes: google→gemini-3-pro-preview, anthropic→claude-opus-4, openai→o3
    cheapest   → google/gemini-3-pro-preview (premium) $0.013000
    smartest   → google/gemini-3-pro-preview (premium) $0.013000
    balanced   → google/gemini-3-pro-preview (premium) $0.013000

======================================================================
PUBLIC API
======================================================================

  import orkestra as o

  # Single provider (routes within one provider's model family)
  provider = o.Provider("google", "API_KEY")
  response = provider.chat("your prompt")
  stream   = provider.stream_text("your prompt")

  # Multi-provider (picks best provider+model using a strategy)
  multi = o.MultiProvider([provider1, provider2])
  response = multi.chat("your prompt", strategy="cheapest")
  response = multi.chat("your prompt", strategy="smartest")
  response = multi.chat("your prompt", strategy="balanced")

  # Response fields
  response.text             # generated text
  response.model            # model used (e.g. "gemini-2.5-flash-lite")
  response.provider         # provider name (e.g. "google")
  response.cost             # total cost in dollars
  response.input_tokens     # input token count
  response.output_tokens    # output token count
  response.savings          # dollars saved vs premium model
  response.savings_percent  # savings as percentage

How It Works

Orkestra uses a KNN (K-Nearest Neighbors) router trained on benchmark query embeddings to predict which model tier will perform best for your specific prompt. Simple queries get routed to cheaper models, complex ones to premium models.

Each call:

1. Embeds your prompt using Longformer (768-dim)
2. Finds the 5 nearest training queries with KNN
3. Routes to the model that performed best on similar queries
4. Calls the selected model via the provider's API
5. Returns the response with cost and savings info

Router models are downloaded automatically on first use and cached in ~/.orkestra/routers/.
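
The embed-then-vote idea behind the routing steps above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the library's router: the two-dimensional vectors stand in for the 768-dimensional Longformer embeddings, the `TRAINING` data is invented, and a simple majority vote is assumed as the rule for "performed best on similar queries".

```python
import math
from collections import Counter

# Toy training set: (embedding, best-performing tier). Hypothetical data;
# the real router works on 768-dimensional Longformer embeddings.
TRAINING = [
    ((0.1, 0.2), "budget"),
    ((0.2, 0.1), "budget"),
    ((0.3, 0.3), "balanced"),
    ((0.4, 0.5), "balanced"),
    ((0.5, 0.4), "balanced"),
    ((0.8, 0.9), "premium"),
    ((0.9, 0.8), "premium"),
    ((0.9, 0.9), "premium"),
]


def route(embedding, training=TRAINING, k=5):
    """Return the tier that wins a majority vote among the k nearest neighbours."""
    nearest = sorted(training, key=lambda item: math.dist(embedding, item[0]))[:k]
    votes = Counter(tier for _, tier in nearest)
    return votes.most_common(1)[0][0]


print(route((0.35, 0.35)))  # balanced
print(route((0.95, 0.95)))  # premium
```

With `k=5`, a query is sent to the premium tier only if most of its neighbours were best served by a premium model, which is how simple prompts end up on cheaper models.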

Supported Providers

Google Gemini

Tier	Model	Input $/1M	Output $/1M
Budget	`gemini-2.5-flash-lite`	$0.10	$0.40
Balanced	`gemini-3-flash-preview`	$0.50	$3.00
Premium	`gemini-3-pro-preview`	$2.00	$12.00

Anthropic Claude

Tier	Model	Input $/1M	Output $/1M
Budget	`claude-haiku-4`	$0.80	$4.00
Balanced	`claude-sonnet-4-5`	$3.00	$15.00
Premium	`claude-opus-4`	$15.00	$75.00

OpenAI

Tier	Model	Input $/1M	Output $/1M
Budget	`gpt-4o-mini`	$0.15	$0.60
Balanced	`gpt-4o`	$2.50	$10.00
Premium	`o3`	$10.00	$40.00
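
The per-million-token prices in the tables above convert into a per-call cost by weighting input and output tokens separately. The sketch below reproduces two figures from the cost simulation (500 input tokens, 1000 output tokens). The `PRICES` dictionary and the `call_cost` function are illustrative names, not part of the orkestra API.

```python
# Prices in dollars per 1M tokens as (input, output), copied from the tables above.
PRICES = {
    "gemini-3-flash-preview": (0.50, 3.00),
    "gemini-3-pro-preview": (2.00, 12.00),
    "claude-sonnet-4-5": (3.00, 15.00),
    "claude-opus-4": (15.00, 75.00),
}


def call_cost(model, input_tokens, output_tokens):
    """Dollar cost of one call, given token counts and per-1M-token prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


flash = call_cost("gemini-3-flash-preview", 500, 1000)
pro = call_cost("gemini-3-pro-preview", 500, 1000)
print(f"flash: ${flash:.6f}")  # flash: $0.003250
print(f"pro:   ${pro:.6f}")    # pro:   $0.013000
```

Output tokens dominate the total in this example because output prices are several times the input prices.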

API Reference

`orkestra.Provider(name, api_key)`

Create a provider with automatic routing.

name: "google", "anthropic", or "openai"
api_key: Your API key for the provider

`provider.chat(prompt, *, max_tokens=8192, temperature=1.0)`

Generate a response with automatic model routing. Returns a Response.

`provider.stream_text(prompt, *, max_tokens=8192, temperature=1.0)`

Stream text chunks with automatic model routing. Yields str.

`orkestra.MultiProvider(providers)`

Combine multiple Provider instances for cross-provider routing.

`multi.chat(prompt, *, strategy="cheapest", max_tokens=8192, temperature=1.0)`

Generate with strategy-based provider selection. Strategies: "cheapest", "smartest", "balanced".

`orkestra.Response`

Field	Type	Description
`text`	`str`	Generated response text
`model`	`str`	Model that was selected
`provider`	`str`	Provider that was used
`cost`	`float`	Actual cost in dollars
`input_tokens`	`int`	Input token count
`output_tokens`	`int`	Output token count
`savings`	`float`	Dollars saved vs base model
`savings_percent`	`float`	Percentage saved vs base model
`base_model`	`str`	Comparison baseline model
`base_cost`	`float`	What it would have cost with base model
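
The savings fields appear to follow directly from `cost` and `base_cost`. This relationship is inferred from the cost simulation output rather than confirmed against the source, so the sketch below states it as an assumption. `ResponseSketch` is a hypothetical stand-in for `orkestra.Response` that mirrors only the cost-related fields.

```python
from dataclasses import dataclass


@dataclass
class ResponseSketch:
    """Hypothetical stand-in mirroring the cost fields of orkestra.Response."""
    cost: float
    base_cost: float

    @property
    def savings(self) -> float:
        # Assumed relation: dollars saved versus the baseline model.
        return self.base_cost - self.cost

    @property
    def savings_percent(self) -> float:
        # Guard against a zero baseline to avoid dividing by zero.
        if self.base_cost == 0:
            return 0.0
        return 100.0 * self.savings / self.base_cost


# claude-sonnet-4-5 versus claude-opus-4, using the simulated costs.
r = ResponseSketch(cost=0.016500, base_cost=0.082500)
print(f"Saved ${r.savings:.6f} ({r.savings_percent:.0f}%)")  # Saved $0.066000 (80%)
```

When the router selects the baseline model itself, `cost` equals `base_cost` and both savings fields come out as zero, matching the 0% rows in the simulation.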

License

MIT

Release history

0.1.1 (Mar 3, 2026)
0.1.0 (Mar 3, 2026)
0.0.1 (Feb 27, 2026), this version

Download files

Source Distribution

orkestra_router-0.0.1.tar.gz (21.8 kB)

Uploaded Feb 27, 2026 Source

Built Distribution

orkestra_router-0.0.1-py3-none-any.whl (17.9 kB)

Uploaded Feb 27, 2026 Python 3

File details

Details for the file orkestra_router-0.0.1.tar.gz.

File metadata

Download URL: orkestra_router-0.0.1.tar.gz
Upload date: Feb 27, 2026
Size: 21.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for orkestra_router-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`9d80abb3f9015f5965bf6af1565101bade265a1de6d7a868aab6785303ff7691`
MD5	`1f558df0c13eee8346a811700db2a2c7`
BLAKE2b-256	`27914ad8a60a7f2bf3f3f8a0af33d63af4b79cf8c90d068c09b239565f507358`

File details

Details for the file orkestra_router-0.0.1-py3-none-any.whl.

File metadata

Download URL: orkestra_router-0.0.1-py3-none-any.whl
Upload date: Feb 27, 2026
Size: 17.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for orkestra_router-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2813d016ca664209540a5c7104894de384f72b61be948d82637108e18921706b`
MD5	`bdb5b322e2a0b2c5de9e2fac423e4909`
BLAKE2b-256	`d3ffb4411e557b3e7c4f907e5cd9c7dd15288d6a7276571d7588ef3c4f4a4008`

orkestra-router 0.0.1
