
Intelligent LLM routing proxy — cost optimization via local proxy

Reason this release was yanked:

Replaced by @robot-resources/router on npm. Run npx robot-resources to install.


Robot Resources


Intelligent LLM cost optimization via local proxy.

Automatically route each LLM request to the cheapest model that can handle it. Capability scores are calibrated from Chatbot Arena Elo ratings.

  • API-key users: 60-90% direct cost savings (82.5% average in a 210-prompt benchmark)
  • Subscription users (e.g., OpenClaw + Claude): the same token budget stretches about 3x further (53.7% average savings from a Haiku/Sonnet/Opus split)

Quick Start

The fastest route is the npm installer, which installs the Router, registers it as an always-on service, and auto-configures MCP:

npx @robot-resources/router

From PyPI (manual setup)

# 1. Install
pip install robot-resources-router

# 2. Set API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."

# 3. Start proxy
rr-router start
# Proxy running on http://localhost:3838

Either way, point your agent at http://localhost:3838 and set the model to "auto".

Why Robot Resources?

| Without RR | With RR |
|---|---|
| Every message uses the same expensive model | Each message is routed to the optimal model |
| "hello" costs the same as "refactor codebase" | Simple tasks use cheap/free models |
| Manual model selection | Automatic task detection |
| No cost visibility | Full routing transparency |

Savings by user type

API-key users (pay per token, all 11 models available):

| Workload | Avg. savings | Typical models |
|---|---|---|
| Simple Q&A | 98% | gemini-2.5-flash-lite, gpt-5.4-nano |
| Creative | 83% | gpt-5.4-mini, gemini-2.5-flash |
| Reasoning | 79% | o4-mini, gemini-2.5-pro |
| Coding | 77% | gpt-5.4-mini, gemini-2.5-flash |
| Analysis | 73% | gpt-5.4-mini, gemini-2.5-pro |

Subscription users (e.g., OpenClaw + Claude, Anthropic models only):

| Complexity | Model selected (share of prompts) | Savings vs. Opus |
|---|---|---|
| Simple prompts | Haiku (41.9%) | 80% |
| Medium prompts | Sonnet (50.5%) | 40% |
| Complex prompts | Opus (7.6%) | 0% |

Token budget multiplier: 3x. Routing simple and medium prompts to cheaper models lets the same subscription handle about 3x as many requests.
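The blended figure is the share-weighted average of the per-tier savings in the table. The sketch below reproduces the 53.7% and converts it into a request multiplier with 1 / (1 - savings), which assumes the subscription budget is consumed in proportion to Opus-equivalent cost. Under that assumption the multiplier comes out near 2.2x, so the 3x figure likely counts subscription usage differently; treat the formula as illustrative rather than as the project's method.

```python
# Share of prompts and savings vs. Opus for each tier, from the table above.
MIX: dict[str, tuple[float, float]] = {
    "haiku": (0.419, 0.80),
    "sonnet": (0.505, 0.40),
    "opus": (0.076, 0.00),
}


def blended_savings(mix: dict[str, tuple[float, float]]) -> float:
    """Share-weighted average savings relative to routing everything to Opus."""
    return sum(share * saving for share, saving in mix.values())


def budget_multiplier(savings: float) -> float:
    """Requests per unit of budget relative to the all-Opus baseline.

    Assumption: budget consumption is proportional to Opus-equivalent cost.
    """
    return 1.0 / (1.0 - savings)


s = blended_savings(MIX)
print(f"blended savings: {s:.1%}")                        # blended savings: 53.7%
print(f"budget multiplier: {budget_multiplier(s):.2f}x")  # budget multiplier: 2.16x
```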

How It Works

Your Agent
    │
    │  POST /v1/chat/completions
    │  model: "auto"
    ▼
┌─────────────────────────────────────┐
│   Robot Resources (localhost:3838)  │
│                                     │
│   1. Detect task type               │
│      → coding, reasoning, analysis  │
│        simple_qa, creative, general │
│                                     │
│   2. Filter capable models          │
│      → capability >= 0.70 threshold │
│                                     │
│   3. Select cheapest                │
│      → lowest cost_per_1k_input     │
│                                     │
│   4. Forward to provider            │
│      → Anthropic, OpenAI, Google    │
└─────────────────────────────────────┘
    │
    ▼
Real LLM Provider (using your API keys)
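Steps 2 and 3 in the diagram reduce to a filter followed by a minimum. A minimal sketch, assuming each model carries a per-task capability score and a cost_per_1k_input price; the Model class, the catalog entries with their prices and scores, and the fallback rule are all hypothetical, not the contents of selector.py or models_db.json:

```python
from dataclasses import dataclass

CAPABILITY_THRESHOLD = 0.70  # threshold shown in the diagram


@dataclass
class Model:
    name: str
    provider: str
    cost_per_1k_input: float        # USD per 1k input tokens (placeholder values below)
    capabilities: dict[str, float]  # task type -> score in [0, 1]


def select_model(models: list[Model], task_type: str,
                 threshold: float = CAPABILITY_THRESHOLD) -> Model:
    """Keep models that clear the threshold for this task, then take the cheapest."""
    capable = [m for m in models if m.capabilities.get(task_type, 0.0) >= threshold]
    if not capable:
        # Hypothetical fallback: use the strongest model when none clears the bar.
        return max(models, key=lambda m: m.capabilities.get(task_type, 0.0))
    return min(capable, key=lambda m: m.cost_per_1k_input)


# Hypothetical catalog: names, prices, and scores are made up for illustration.
CATALOG = [
    Model("big-model", "provider-a", 0.0150, {"coding": 0.95, "simple_qa": 0.97}),
    Model("mid-model", "provider-b", 0.0030, {"coding": 0.85, "simple_qa": 0.93}),
    Model("tiny-model", "provider-c", 0.0001, {"coding": 0.55, "simple_qa": 0.90}),
]

print(select_model(CATALOG, "simple_qa").name)  # tiny-model
print(select_model(CATALOG, "coding").name)     # mid-model
```

Filtering before minimizing cost means a cheap model never wins a task it scores below the threshold on, however large the price gap.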

Installation

From PyPI

pip install robot-resources-router

From Source

git clone https://github.com/robot-resources/robot-resources.git
cd robot-resources/router
pip install -e ".[dev]"

Requirements

  • Python 3.10+
  • API keys for at least one provider (Anthropic, OpenAI, or Google)

Configuration

Environment Variables

# Required: At least one provider
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."

# Optional: Server settings
export ROUTER_PORT=3838              # Default: 3838
export ROUTER_API_KEY="your-key"     # Optional: enable auth on all endpoints
export ROUTER_CORS_ORIGINS=""        # Default: localhost only

Agent Integration

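Point your agent's API base URL at http://localhost:3838/v1 (the OpenAI SDK appends paths such as /chat/completions to base_url, so the /v1 prefix belongs there) and set the model to "auto". Works with any OpenAI-compatible client.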

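from openai import OpenAI

# Provider API keys live in the proxy's environment, so the client key is a placeholder.
# If ROUTER_API_KEY is set, pass that value instead; the SDK sends it as a Bearer token.
client = OpenAI(base_url="http://localhost:3838/v1", api_key="unused")
response = client.chat.completions.create(
    model="auto",  # let the router pick the model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
print(response.model)  # the model the router actually selected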

API Reference

Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | Chat completions (streaming supported) |
| /v1/models | GET | List available models |
| /v1/stats | GET | Cost savings statistics |
| /v1/models/compare | GET | Compare models by task type |
| /v1/config | GET/PATCH | View or update routing config at runtime |
| /health | GET | Health check with component diagnostics |
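/v1/chat/completions supports streaming. Assuming the proxy emits the standard OpenAI server-sent-events format (data: lines carrying chunk JSON, ending with data: [DONE]), a raw client can reassemble the reply as in the sketch below. With the OpenAI SDK, passing stream=True handles this parsing for you. The collect_stream helper and the sample lines are illustrative.

```python
import json
from collections.abc import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Join the delta content of OpenAI-style SSE chunks, stopping at [DONE].

    Assumes a single choice per chunk (n=1).
    """
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choices = json.loads(payload).get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "!"}}]}',
    "",
    "data: [DONE]",
]
print(collect_stream(sample))  # Hello!
```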

Request Format

Standard OpenAI chat completions format:

{
  "model": "auto",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}

Response Format

Standard OpenAI format, plus a routing_info object that records the routing decision:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gemini-2.0-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  },
  "routing_info": {
    "selected_model": "gemini-2.0-flash",
    "original_model": "auto",
    "provider": "google",
    "task_type": "simple_qa",
    "capability_score": 0.92,
    "savings_percent": 96.0,
    "baseline_model": "gpt-4o",
    "reasoning": "Selected gemini-2.0-flash as cheapest capable model..."
  }
}
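routing_info is the only non-standard field, so a client that wants to log routing decisions can read it from the parsed JSON. A minimal sketch; describe_routing is a hypothetical helper, and the sample is trimmed from the response above. With the OpenAI SDK, response.model_dump() should include the extra field, since the SDK's response models keep unknown keys, but verify this against your SDK version.

```python
def describe_routing(response: dict) -> str:
    """Summarize the routing_info object the proxy adds to each completion."""
    info = response.get("routing_info")
    if info is None:
        return "no routing_info in response"
    return (
        f"{info['task_type']} -> {info['selected_model']} ({info['provider']}), "
        f"{info['savings_percent']:.0f}% cheaper than {info['baseline_model']}"
    )


sample = {
    "model": "gemini-2.0-flash",
    "routing_info": {
        "selected_model": "gemini-2.0-flash",
        "original_model": "auto",
        "provider": "google",
        "task_type": "simple_qa",
        "capability_score": 0.92,
        "savings_percent": 96.0,
        "baseline_model": "gpt-4o",
    },
}
print(describe_routing(sample))
# simple_qa -> gemini-2.0-flash (google), 96% cheaper than gpt-4o
```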

Task Types

RR automatically detects 6 task types:

| Task type | Detection keywords | Typical models |
|---|---|---|
| coding | function, code, debug, python, api | claude-sonnet-4-6, gpt-5.4-mini |
| reasoning | explain why, prove, step by step | o3, o4-mini |
| analysis | compare, pros and cons, evaluate | gpt-5.4-mini, gemini-2.5-pro |
| simple_qa | what is, who invented, capital of | gemini-2.5-flash, claude-haiku-4-5 |
| creative | write a story, compose, brainstorm | claude-sonnet-4-6, gpt-5.4 |
| general | (fallback) | cheapest available |
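The keyword column can be read as a simple matcher. The toy version below uses the keywords from the table, counts whole-word or whole-phrase hits per task type, and falls back to general when nothing matches. It is not task_detection.py, which also uses context and, per the project structure, an async LLM classifier for hybrid routing; the scoring rule and tie-breaking by table order are assumptions.

```python
import re

# Keywords from the table above; dict order breaks ties (first listed wins).
TASK_KEYWORDS: dict[str, tuple[str, ...]] = {
    "coding": ("function", "code", "debug", "python", "api"),
    "reasoning": ("explain why", "prove", "step by step"),
    "analysis": ("compare", "pros and cons", "evaluate"),
    "simple_qa": ("what is", "who invented", "capital of"),
    "creative": ("write a story", "compose", "brainstorm"),
}


def detect_task(prompt: str) -> str:
    """Return the task type with the most whole-word keyword hits, else "general"."""
    text = prompt.lower()
    scores = {
        # \b boundaries stop "api" from matching inside words like "capital"
        task: sum(bool(re.search(rf"\b{re.escape(kw)}\b", text)) for kw in keywords)
        for task, keywords in TASK_KEYWORDS.items()
    }
    best = max(scores, key=lambda task: scores[task])
    return best if scores[best] > 0 else "general"


print(detect_task("Debug this Python function"))  # coding
print(detect_task("Hello there!"))                # general
```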

Supported Models

11 models across 3 providers. Routing only considers providers whose API keys are configured:

| Provider | Models |
|---|---|
| OpenAI | gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, o3, o4-mini |
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 |
| Google | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite |
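Which of the 11 models are candidates depends on which provider keys are set. A sketch of that filtering, assuming a provider counts as available when its environment variable is non-empty; routable_models is a hypothetical helper, not part of the package:

```python
import os
from collections.abc import Mapping

# Provider -> environment variable, as listed under Environment Variables.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}

# Model lists copied from the table above.
MODELS_BY_PROVIDER = {
    "openai": ["gpt-5.4", "gpt-5.4-mini", "gpt-5.4-nano", "o3", "o4-mini"],
    "anthropic": ["claude-opus-4-6", "claude-sonnet-4-6", "claude-haiku-4-5-20251001"],
    "google": ["gemini-2.5-pro", "gemini-2.5-flash", "gemini-2.5-flash-lite"],
}


def routable_models(env: Mapping[str, str] = os.environ) -> list[str]:
    """Models from providers whose API key is set and non-empty."""
    return [
        model
        for provider, var in PROVIDER_KEYS.items()
        if env.get(var)
        for model in MODELS_BY_PROVIDER[provider]
    ]


print(routable_models({"ANTHROPIC_API_KEY": "sk-ant-..."}))
# ['claude-opus-4-6', 'claude-sonnet-4-6', 'claude-haiku-4-5-20251001']
```

With only ANTHROPIC_API_KEY set, the candidates are the three Claude models, which matches the subscription scenario in the savings table.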

CLI Commands

rr-router start                # Start the proxy server
rr-router start --port 8080    # Start on custom port
rr-router status               # Check proxy health and config
rr-router report weekly        # Cost savings report (7 days)
rr-router report monthly       # Cost savings report (30 days)
rr-router --version            # Show version

MCP Server

The Router includes an MCP server for AI agent integration:

npx @robot-resources/router-mcp

Available tools: router_get_stats, router_compare_models, router_get_config, router_set_config.

Development

See CONTRIBUTING.md for the full development guide.

git clone https://github.com/robot-resources/robot-resources.git
cd robot-resources/router
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest  # 681 tests

Project Structure

src/robot_resources/
├── cli/                    # CLI entry point (click)
├── config.py               # Centralized settings (pydantic-settings)
├── proxy/
│   ├── server.py           # FastAPI app (auth, CORS, lifespan, middleware)
│   ├── security.py         # Bearer token auth (timing-safe)
│   ├── models.py           # Pydantic models with validators
│   ├── handlers/           # API endpoints (completions, stats, config, compare)
│   └── providers/          # LLM provider clients (Anthropic, OpenAI, Google)
├── routing/
│   ├── task_detection.py   # 6 task types, keyword + context
│   ├── classifier.py       # LLM task classifier (async)
│   ├── router.py           # Hybrid routing with confidence branching
│   ├── selector.py         # Capability filter + cheapest model
│   ├── decision_log.py     # SQLite WAL decision persistence
│   └── models_db.json      # 11 models with capabilities + pricing
└── tracking/
    ├── db.py               # OutcomeDB (SQLite WAL, async, migrations)
    ├── recorder.py         # OutcomeRecorder (routing outcomes)
    ├── calculator.py       # CostCalculator (pricing from models_db)
    └── telemetry.py        # TelemetryReporter (platform API)
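decision_log.py and tracking/db.py are described as SQLite stores in WAL mode. A synchronous sketch of that idea with the standard sqlite3 module (the project's OutcomeDB is described as async); the table schema, function names, and values are hypothetical:

```python
import sqlite3
import tempfile
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS decisions (
    id INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    task_type TEXT NOT NULL,
    selected_model TEXT NOT NULL,
    savings_percent REAL
)
"""


def open_log(path: Path) -> sqlite3.Connection:
    """Open (or create) the decision log with write-ahead logging enabled."""
    conn = sqlite3.connect(path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode != "wal":
        raise RuntimeError(f"could not enable WAL, journal_mode={mode}")
    conn.execute(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, task_type: str, model: str, savings: float) -> None:
    """Append one routing decision; the with-block commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO decisions (task_type, selected_model, savings_percent) VALUES (?, ?, ?)",
            (task_type, model, savings),
        )


with tempfile.TemporaryDirectory() as tmp:
    conn = open_log(Path(tmp) / "decisions.db")
    record(conn, "simple_qa", "model-a", 90.0)  # placeholder values
    record(conn, "coding", "model-b", 70.0)
    avg_savings = conn.execute("SELECT AVG(savings_percent) FROM decisions").fetchone()[0]
    conn.close()

print(avg_savings)  # 80.0
```

WAL mode lets readers, such as a stats query, run while a writer appends decisions, which suits a proxy that logs every request.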

Troubleshooting

Port already in use

lsof -i :3838
rr-router start --port 3839

API key not found

echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
export ANTHROPIC_API_KEY="sk-ant-..."

Model not found

Use model: "auto" for automatic routing. Check /v1/models for available models.

Roadmap

  • Local proxy with task detection routing
  • Real SSE streaming for all 3 providers
  • Hybrid routing (keyword + LLM classifier)
  • MCP server for stats and configuration
  • Production hardening (681 tests, error handling, observability)
  • Outcome-based routing (learning from success/failure)
  • Calibration lab (benchmark-driven model scoring)

License

MIT

Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

For security vulnerabilities, see SECURITY.md.


Download files


Source Distribution

robot_resources_router-2.1.0.tar.gz (65.5 kB)

Uploaded Source

Built Distribution


robot_resources_router-2.1.0-py3-none-any.whl (88.8 kB)

Uploaded Python 3

File details

Details for the file robot_resources_router-2.1.0.tar.gz.

File metadata

  • Download URL: robot_resources_router-2.1.0.tar.gz
  • Size: 65.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for robot_resources_router-2.1.0.tar.gz
Algorithm Hash digest
SHA256 72451211e9df00263fe8e74cb409ff8acaf44efa2ca08290ab002ea1b54881ee
MD5 9b96d3f07a3cb1e52f21d326534beefa
BLAKE2b-256 d6fa0a8d3734130f6aad965295557aafaa8ddcaee102b978ef0136bd4ad589a1


File details

Details for the file robot_resources_router-2.1.0-py3-none-any.whl.

File hashes

Hashes for robot_resources_router-2.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 df3d7d8d69b0b7dd700f4a955ebb5f710ad13d65441ef6713c4dca81ab021fb7
MD5 49f96ad3f97e62dd510688015be24519
BLAKE2b-256 009e8491d59de9ef069bd492fc94f95212ea90cedcfc7960e5a8529eacb69ac6

