Intelligent LLM routing proxy — cost optimization via local proxy

Reason this release was yanked:

Replaced by @robot-resources/router on npm. Run npx robot-resources to install.

Robot Resources

Intelligent LLM cost optimization via local proxy.

Automatically route each LLM request to the cheapest model that can handle it. Capability scores are calibrated from Chatbot Arena Elo ratings.

  • API-key users: 60-90% direct cost savings (82.5% average across a 210-prompt benchmark)
  • Subscription users (e.g., OpenClaw + Claude): about 3x more requests from the same token budget (53.7% average savings with a Haiku/Sonnet/Opus split)

Quick Start

The fastest way is a single command that installs Router, registers it as an always-on service, and auto-configures MCP:

npx @robot-resources/router

From PyPI (manual setup)

# 1. Install
pip install robot-resources-router

# 2. Set API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."

# 3. Start proxy
rr-router start
# Proxy running on http://localhost:3838

Either way, point your agent at http://localhost:3838 (OpenAI SDK clients use http://localhost:3838/v1 as the base URL) and set model: "auto".

Why Robot Resources?

| Without RR | With RR |
|---|---|
| Every message uses same expensive model | Each message routed to optimal model |
| "hello" costs same as "refactor codebase" | Simple tasks use cheap/free models |
| Manual model selection | Automatic task detection |
| No cost visibility | Full routing transparency |

Savings by user type

API-key users (pay per token, all 11 models available):

| Workload | Avg Savings | Typical Model |
|---|---|---|
| Simple Q&A | 98% | gemini-2.5-flash-lite, gpt-5.4-nano |
| Creative | 83% | gpt-5.4-mini, gemini-2.5-flash |
| Reasoning | 79% | o4-mini, gemini-2.5-pro |
| Coding | 77% | gpt-5.4-mini, gemini-2.5-flash |
| Analysis | 73% | gpt-5.4-mini, gemini-2.5-pro |

Subscription users (e.g., OpenClaw + Claude, Anthropic models only):

| Complexity | Model Selected | Savings vs Opus |
|---|---|---|
| Simple prompts | Haiku (41.9%) | 80% |
| Medium prompts | Sonnet (50.5%) | 40% |
| Complex prompts | Opus (7.6%) | 0% |

Token budget multiplier: 3x — your subscription handles 3x more requests through intelligent routing.
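The 53.7% average savings quoted for subscription users is consistent with the table above when each tier's savings figure is weighted by its percentage in parentheses. A minimal check, assuming those percentages are the share of prompts routed to each tier (the source does not label them):

```python
# Routing share and savings vs Opus per tier, copied from the table above.
# Assumption: the parenthesised percentages are the share of prompts per tier.
TIERS = {
    "haiku": (0.419, 80.0),
    "sonnet": (0.505, 40.0),
    "opus": (0.076, 0.0),
}


def weighted_savings(tiers):
    """Return the share-weighted average savings in percent."""
    return sum(share * savings for share, savings in tiers.values())


avg = weighted_savings(TIERS)
print(f"{avg:.1f}% average savings vs sending everything to Opus")
```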

How It Works

Your Agent
    │
    │  POST /v1/chat/completions
    │  model: "auto"
    ▼
┌─────────────────────────────────────┐
│   Robot Resources (localhost:3838)  │
│                                     │
│   1. Detect task type               │
│      → coding, reasoning, analysis  │
│        simple_qa, creative, general │
│                                     │
│   2. Filter capable models          │
│      → capability >= 0.70 threshold │
│                                     │
│   3. Select cheapest                │
│      → lowest cost_per_1k_input     │
│                                     │
│   4. Forward to provider            │
│      → Anthropic, OpenAI, Google    │
└─────────────────────────────────────┘
    │
    ▼
Real LLM Provider (using your API keys)
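Steps 2 and 3 of the diagram can be sketched in a few lines. This is an illustration under stated assumptions, not the project's selector: the `Model` class, the catalogue entries, and every score and price below are hypothetical; only the 0.70 threshold and the `cost_per_1k_input` field name come from the diagram.

```python
from dataclasses import dataclass

CAPABILITY_THRESHOLD = 0.70  # threshold from step 2 of the diagram


@dataclass(frozen=True)
class Model:
    name: str
    capability: dict          # task type -> capability score in [0, 1]
    cost_per_1k_input: float  # price used for the "cheapest" comparison


# Hypothetical catalogue: names, scores, and prices are made up for illustration.
CATALOGUE = [
    Model("big-model", {"coding": 0.95, "simple_qa": 0.97}, 0.0150),
    Model("mid-model", {"coding": 0.82, "simple_qa": 0.90}, 0.0030),
    Model("small-model", {"coding": 0.55, "simple_qa": 0.78}, 0.0002),
]


def select_model(task_type, models=CATALOGUE, threshold=CAPABILITY_THRESHOLD):
    """Keep models at or above the threshold, then return the cheapest one."""
    capable = [m for m in models if m.capability.get(task_type, 0.0) >= threshold]
    if not capable:
        raise LookupError(f"no model meets {threshold} for {task_type!r}")
    return min(capable, key=lambda m: m.cost_per_1k_input)


print(select_model("simple_qa").name)  # small-model
print(select_model("coding").name)     # mid-model
```

In this toy catalogue the small model clears the threshold for simple_qa but not for coding, so the coding request moves up one tier.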

Installation

From PyPI

pip install robot-resources-router

From Source

git clone https://github.com/robot-resources/robot-resources.git
cd robot-resources/router
pip install -e ".[dev]"

Requirements

  • Python 3.10+
  • API keys for at least one provider (Anthropic, OpenAI, or Google)

Configuration

Environment Variables

# Required: At least one provider
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."

# Optional: Server settings
export ROUTER_PORT=3838              # Default: 3838
export ROUTER_API_KEY="your-key"     # Optional: enable auth on all endpoints
export ROUTER_CORS_ORIGINS=""        # Default: localhost only
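On the client side, the same variables can be used to derive the base URL and request headers. The helper below is a hypothetical sketch using only the standard library; sending `ROUTER_API_KEY` as an `Authorization: Bearer` header is an assumption based on the bearer token auth mentioned in the project structure.

```python
import os


def router_settings(env=os.environ):
    """Derive client-side settings from the variables listed above."""
    port = int(env.get("ROUTER_PORT", "3838"))  # documented default
    headers = {"Content-Type": "application/json"}
    api_key = env.get("ROUTER_API_KEY")
    if api_key:
        # Assumption: the key is sent as a standard bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return {"base_url": f"http://localhost:{port}/v1", "headers": headers}


print(router_settings()["base_url"])
```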

Agent Integration

Point your agent's API base URL to http://localhost:3838 (http://localhost:3838/v1 for OpenAI SDK clients, as in the example below) and set model to "auto". This works with any OpenAI-compatible client.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:3838/v1", api_key="unused")
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}],
)
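If the OpenAI SDK is not available, the same request can be made with the standard library alone. A minimal sketch, assuming the default port and no `ROUTER_API_KEY`; `build_chat_request` and `send` are hypothetical helper names:

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:3838/v1"):
    """Build (but do not send) a chat completion request with model "auto"."""
    body = {"model": "auto", "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request to a running proxy and return the decoded JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


request = build_chat_request("Hello!")
print(request.get_method(), request.full_url)
```

With the proxy running, `send(request)["choices"][0]["message"]["content"]` returns the reply text.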

API Reference

Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | Chat completions (streaming supported) |
| /v1/models | GET | List available models |
| /v1/stats | GET | Cost savings statistics |
| /v1/models/compare | GET | Compare models by task type |
| /v1/config | GET/PATCH | View or update routing config at runtime |
| /health | GET | Health check with component diagnostics |

Request Format

Standard OpenAI chat completions format:

{
  "model": "auto",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}

Response Format

Standard OpenAI format plus routing_info:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gemini-2.0-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  },
  "routing_info": {
    "selected_model": "gemini-2.0-flash",
    "original_model": "auto",
    "provider": "google",
    "task_type": "simple_qa",
    "capability_score": 0.92,
    "savings_percent": 96.0,
    "baseline_model": "gpt-4o",
    "reasoning": "Selected gemini-2.0-flash as cheapest capable model..."
  }
}
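A client can read `routing_info` to log which model handled each request. The helper below is a hypothetical sketch that uses only the field names shown in the response above and returns `None` when the block is absent:

```python
def summarize_routing(response):
    """Return a one-line summary of the routing decision, or None if absent."""
    info = response.get("routing_info")
    if not info:
        return None
    return (
        f"{info['task_type']}: {info['selected_model']} via {info['provider']} "
        f"({info['savings_percent']:.0f}% cheaper than {info['baseline_model']})"
    )


# Trimmed copy of the example response above.
sample = {
    "model": "gemini-2.0-flash",
    "routing_info": {
        "selected_model": "gemini-2.0-flash",
        "provider": "google",
        "task_type": "simple_qa",
        "savings_percent": 96.0,
        "baseline_model": "gpt-4o",
    },
}

print(summarize_routing(sample))
# simple_qa: gemini-2.0-flash via google (96% cheaper than gpt-4o)
```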

Task Types

RR automatically detects 6 task types:

| Task Type | Detection Keywords | Typical Models |
|---|---|---|
| coding | function, code, debug, python, api | claude-sonnet-4-6, gpt-5.4-mini |
| reasoning | explain why, prove, step by step | o3, o4-mini |
| analysis | compare, pros and cons, evaluate | gpt-5.4-mini, gemini-2.5-pro |
| simple_qa | what is, who invented, capital of | gemini-2.5-flash, claude-haiku-4-5 |
| creative | write a story, compose, brainstorm | claude-sonnet-4-6, gpt-5.4 |
| general | (fallback) | cheapest available |
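The keyword column above suggests a simple first-pass detector. The sketch below is an assumption about how such matching could work, not the project's detector, which also uses context and an LLM classifier. It counts whole-word keyword hits per task type and falls back to general when nothing matches.

```python
import re

# Keywords copied from the table above. This sketch leaves out the context
# signals and LLM classifier that the real router also uses.
KEYWORDS = {
    "coding": ["function", "code", "debug", "python", "api"],
    "reasoning": ["explain why", "prove", "step by step"],
    "analysis": ["compare", "pros and cons", "evaluate"],
    "simple_qa": ["what is", "who invented", "capital of"],
    "creative": ["write a story", "compose", "brainstorm"],
}


def detect_task_type(prompt):
    """Return the task type with the most keyword hits, or "general"."""
    text = prompt.lower()
    best_type, best_hits = "general", 0
    for task_type, keywords in KEYWORDS.items():
        # Whole-word matching, so "api" does not fire inside "capital".
        hits = sum(
            1 for kw in keywords
            if re.search(rf"\b{re.escape(kw)}\b", text)
        )
        if hits > best_hits:
            best_type, best_hits = task_type, hits
    return best_type


print(detect_task_type("Debug this Python function"))      # coding
print(detect_task_type("What is the capital of France?"))  # simple_qa
print(detect_task_type("hello"))                           # general
```

Plain substring matching would misclassify the second prompt as coding, because "capital" contains "api"; the word-boundary pattern avoids that.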

Supported Models

11 models across 3 supported providers (routes within your available providers):

| Provider | Models |
|---|---|
| OpenAI | gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, o3, o4-mini |
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 |
| Google | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite |

CLI Commands

rr-router start                # Start the proxy server
rr-router start --port 8080    # Start on custom port
rr-router status               # Check proxy health and config
rr-router report weekly        # Cost savings report (7 days)
rr-router report monthly       # Cost savings report (30 days)
rr-router --version            # Show version

MCP Server

The Router includes an MCP server for AI agent integration:

npx @robot-resources/router-mcp

Available tools: router_get_stats, router_compare_models, router_get_config, router_set_config.

Development

See CONTRIBUTING.md for the full development guide.

git clone https://github.com/robot-resources/robot-resources.git
cd robot-resources/router
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest  # 681 tests

Project Structure

src/robot_resources/
├── cli/                    # CLI entry point (click)
├── config.py               # Centralized settings (pydantic-settings)
├── proxy/
│   ├── server.py           # FastAPI app (auth, CORS, lifespan, middleware)
│   ├── security.py         # Bearer token auth (timing-safe)
│   ├── models.py           # Pydantic models with validators
│   ├── handlers/           # API endpoints (completions, stats, config, compare)
│   └── providers/          # LLM provider clients (Anthropic, OpenAI, Google)
├── routing/
│   ├── task_detection.py   # 6 task types, keyword + context
│   ├── classifier.py       # LLM task classifier (async)
│   ├── router.py           # Hybrid routing with confidence branching
│   ├── selector.py         # Capability filter + cheapest model
│   ├── decision_log.py     # SQLite WAL decision persistence
│   └── models_db.json      # 11 models with capabilities + pricing
└── tracking/
    ├── db.py               # OutcomeDB (SQLite WAL, async, migrations)
    ├── recorder.py         # OutcomeRecorder (routing outcomes)
    ├── calculator.py       # CostCalculator (pricing from models_db)
    └── telemetry.py        # TelemetryReporter (platform API)

Troubleshooting

Port already in use

lsof -i :3838
rr-router start --port 3839
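The same check can be scripted where lsof is not available. A minimal standard-library sketch; `port_in_use` and `first_free_port` are hypothetical helpers, not part of rr-router:

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start=3838, attempts=10):
    """Return the first port at or after `start` with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")


print(f"rr-router start --port {first_free_port()}")
```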

API key not found

echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
export ANTHROPIC_API_KEY="sk-ant-..."
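To see at a glance which providers the proxy's environment has keys for, a short script can inspect the three documented variables. A sketch, with `configured_providers` as a hypothetical helper name:

```python
import os

# Provider -> environment variable, as listed under Configuration.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def configured_providers(env=os.environ):
    """Return providers whose API key variable is set and non-empty."""
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


providers = configured_providers()
print("Configured providers:", ", ".join(providers) or "none (set at least one key)")
```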

Model not found

Use model: "auto" for automatic routing. Check /v1/models for available models.

Roadmap

  • Local proxy with task detection routing
  • Real SSE streaming for all 3 providers
  • Hybrid routing (keyword + LLM classifier)
  • MCP server for stats and configuration
  • Production hardening (681 tests, error handling, observability)
  • Outcome-based routing (learning from success/failure)
  • Calibration lab (benchmark-driven model scoring)

License

MIT

Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

For security vulnerabilities, see SECURITY.md.
