Intelligent LLM routing proxy — cost optimization via local proxy

Reason this release was yanked:

Replaced by @robot-resources/router on npm. Run npx robot-resources to install.

Robot Resources

Intelligent LLM cost optimization via local proxy.

Automatically route each LLM request to the cheapest model that can handle it. Capability scores are calibrated from Chatbot Arena Elo ratings.

  • API-key users: 60-90% direct cost savings (benchmarked 82.5% avg, 210 prompts)
  • Subscription users (e.g., OpenClaw + Claude): 3x token budget stretch (53.7% avg savings, Haiku/Sonnet/Opus split)

Quick Start

The fastest way is a single command that installs Router, registers it as an always-on service, and auto-configures MCP:

npx @robot-resources/router

From PyPI (manual setup)

# 1. Install
pip install robot-resources-router

# 2. Set API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."

# 3. Start proxy
rr-router start
# Proxy running on http://localhost:3838

Either way, point your agent to http://localhost:3838 and use model: "auto".

Why Robot Resources?

| Without RR | With RR |
|---|---|
| Every message uses same expensive model | Each message routed to optimal model |
| "hello" costs same as "refactor codebase" | Simple tasks use cheap/free models |
| Manual model selection | Automatic task detection |
| No cost visibility | Full routing transparency |

Savings by user type

API-key users (pay per token, all 11 models available):

| Workload | Avg Savings | Typical Model |
|---|---|---|
| Simple Q&A | 98% | gemini-2.5-flash-lite, gpt-5.4-nano |
| Creative | 83% | gpt-5.4-mini, gemini-2.5-flash |
| Reasoning | 79% | o4-mini, gemini-2.5-pro |
| Coding | 77% | gpt-5.4-mini, gemini-2.5-flash |
| Analysis | 73% | gpt-5.4-mini, gemini-2.5-pro |

Subscription users (e.g., OpenClaw + Claude, Anthropic models only):

| Complexity | Model Selected (share of prompts) | Savings vs Opus |
|---|---|---|
| Simple prompts | Haiku (41.9%) | 80% |
| Medium prompts | Sonnet (50.5%) | 40% |
| Complex prompts | Opus (7.6%) | 0% |

Token budget multiplier: 3x — your subscription handles 3x more requests through intelligent routing.
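
As a quick check on the arithmetic, the 53.7% average quoted for subscription users can be reproduced from the table above by weighting each tier's savings by its share of prompts. This is only a sketch: it assumes the percentages in parentheses are the share of prompts routed to each model.

```python
# Shares and per-tier savings copied from the subscription table above.
# Assumption: the percentage in parentheses is the share of prompts per model.
tiers = {
    "haiku": {"share": 0.419, "savings_vs_opus": 0.80},
    "sonnet": {"share": 0.505, "savings_vs_opus": 0.40},
    "opus": {"share": 0.076, "savings_vs_opus": 0.00},
}


def weighted_savings(tiers):
    """Return the share-weighted average savings as a fraction of all-Opus cost."""
    return sum(t["share"] * t["savings_vs_opus"] for t in tiers.values())


avg = weighted_savings(tiers)
print(f"Average savings vs. all-Opus: {avg:.1%}")  # prints 53.7%
```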

How It Works

Your Agent
    │
    │  POST /v1/chat/completions
    │  model: "auto"
    ▼
┌─────────────────────────────────────┐
│   Robot Resources (localhost:3838)  │
│                                     │
│   1. Detect task type               │
│      → coding, reasoning, analysis  │
│        simple_qa, creative, general │
│                                     │
│   2. Filter capable models          │
│      → capability >= 0.70 threshold │
│                                     │
│   3. Select cheapest                │
│      → lowest cost_per_1k_input     │
│                                     │
│   4. Forward to provider            │
│      → Anthropic, OpenAI, Google    │
└─────────────────────────────────────┘
    │
    ▼
Real LLM Provider (using your API keys)
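
Steps 2 and 3 of the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the code in selector.py: the `Model` class, the catalogue entries, their prices, and their capability scores are all made up for the example. Only the 0.70 threshold and the `cost_per_1k_input` field name come from the diagram.

```python
from dataclasses import dataclass

CAPABILITY_THRESHOLD = 0.70  # threshold shown in the diagram above


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    cost_per_1k_input: float        # USD; illustrative values only
    capabilities: dict[str, float]  # task type -> score in [0, 1]; illustrative


# Hypothetical catalogue; these are NOT the entries in models_db.json.
CATALOGUE = [
    Model("big-model", "provider-a", 0.0150, {"coding": 0.95, "simple_qa": 0.97}),
    Model("mid-model", "provider-b", 0.0030, {"coding": 0.82, "simple_qa": 0.90}),
    Model("small-model", "provider-c", 0.0001, {"coding": 0.55, "simple_qa": 0.78}),
]


def select_model(task_type, models, threshold=CAPABILITY_THRESHOLD):
    """Keep models whose score for the task meets the threshold, then pick the cheapest."""
    capable = [m for m in models if m.capabilities.get(task_type, 0.0) >= threshold]
    if not capable:
        raise LookupError(f"no model meets the threshold for task type {task_type!r}")
    return min(capable, key=lambda m: m.cost_per_1k_input)


print(select_model("coding", CATALOGUE).name)     # mid-model
print(select_model("simple_qa", CATALOGUE).name)  # small-model
```

In this toy catalogue the smallest model is skipped for coding because its score (0.55) falls below the threshold, while it wins for simple Q&A because it is both capable and cheapest.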

Installation

From PyPI

pip install robot-resources-router

From Source

git clone https://github.com/robot-resources/robot-resources.git
cd robot-resources/router
pip install -e ".[dev]"

Requirements

  • Python 3.10+
  • API keys for at least one provider (Anthropic, OpenAI, or Google)

Configuration

Environment Variables

# Required: At least one provider
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."

# Optional: Server settings
export ROUTER_PORT=3838              # Default: 3838
export ROUTER_API_KEY="your-key"     # Optional: enable auth on all endpoints
export ROUTER_CORS_ORIGINS=""        # Default: localhost only
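
Before starting the proxy, it can help to check which providers are actually configured. The helper below is a hypothetical preflight script, not part of the package (which, per the project structure, reads its settings through pydantic-settings). It only inspects the environment variables listed above.

```python
import os

# Environment variable names as documented above.
PROVIDER_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def available_providers(env=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


def router_port(env=None):
    """Return ROUTER_PORT as an int, falling back to the documented default 3838."""
    env = os.environ if env is None else env
    return int(env.get("ROUTER_PORT", "3838"))


providers = available_providers()
if not providers:
    print("No provider API key found; set at least one before starting the proxy.")
else:
    print(f"Providers configured: {', '.join(providers)} (port {router_port()})")
```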

Agent Integration

Point your agent's API base URL to http://localhost:3838 (OpenAI-compatible clients usually need the /v1 suffix, as in the example below) and set the model to "auto". Works with any OpenAI-compatible client.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:3838/v1", api_key="unused")
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}],
)

API Reference

Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | Chat completions (streaming supported) |
| /v1/models | GET | List available models |
| /v1/stats | GET | Cost savings statistics |
| /v1/models/compare | GET | Compare models by task type |
| /v1/config | GET/PATCH | View or update routing config at runtime |
| /health | GET | Health check with component diagnostics |
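
The read-only endpoints can be queried with nothing but the standard library. The sketch below assumes the default port and assumes that, when ROUTER_API_KEY is set, the router expects it as a Bearer token in the Authorization header. The helper names `build_request` and `get_json` are invented for this example, and the shape of each JSON response is not documented here, so the code simply returns whatever the server sends.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3838"  # default port from the configuration section


def build_request(path, api_key=None, base_url=BASE_URL):
    """Build a GET request for a router endpoint such as /health or /v1/stats."""
    req = urllib.request.Request(base_url.rstrip("/") + path, method="GET")
    if api_key:
        # Assumption: auth uses "Authorization: Bearer <ROUTER_API_KEY>".
        req.add_header("Authorization", f"Bearer {api_key}")
    return req


def get_json(path, api_key=None, timeout=5.0):
    """Send the request and decode the JSON body (requires a running proxy)."""
    with urllib.request.urlopen(build_request(path, api_key), timeout=timeout) as resp:
        return json.load(resp)
```

With the proxy running, `get_json("/health")` or `get_json("/v1/stats")` returns the decoded payload; pass `api_key=os.environ["ROUTER_API_KEY"]` (after `import os`) if auth is enabled.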

Request Format

Standard OpenAI chat completions format:

{
  "model": "auto",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}

Response Format

Standard OpenAI format plus routing_info:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  },
  "routing_info": {
    "selected_model": "gemini-2.5-flash",
    "original_model": "auto",
    "provider": "google",
    "task_type": "simple_qa",
    "capability_score": 0.92,
    "savings_percent": 96.0,
    "baseline_model": "gpt-4o",
    "reasoning": "Selected gemini-2.5-flash as cheapest capable model..."
  }
}
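
A client that wants to log routing decisions can read the routing_info object from the decoded response. The sketch below works on a plain dict (for example, the result of `json.loads` on the raw body). The `summarize_routing` helper and the trimmed sample payload are illustrative; the field names are the ones shown in the response above.

```python
import json


def summarize_routing(response):
    """Return a one-line summary of the routing decision in a decoded response."""
    info = response.get("routing_info")
    if info is None:
        # Non-routed responses may lack the extra field.
        return f"{response.get('model', 'unknown')} (no routing_info present)"
    return (
        f"{info['selected_model']} via {info['provider']} "
        f"for {info['task_type']} "
        f"(saved {info['savings_percent']:.0f}% vs {info['baseline_model']})"
    )


# Trimmed, illustrative payload using the documented field names.
sample = json.loads("""
{
  "model": "gemini-2.5-flash",
  "routing_info": {
    "selected_model": "gemini-2.5-flash",
    "provider": "google",
    "task_type": "simple_qa",
    "savings_percent": 96.0,
    "baseline_model": "gpt-4o"
  }
}
""")

print(summarize_routing(sample))
# gemini-2.5-flash via google for simple_qa (saved 96% vs gpt-4o)
```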

Task Types

RR automatically detects 6 task types:

| Task Type | Detection Keywords | Typical Models |
|---|---|---|
| coding | function, code, debug, python, api | claude-sonnet-4-6, gpt-5.4-mini |
| reasoning | explain why, prove, step by step | o3, o4-mini |
| analysis | compare, pros and cons, evaluate | gpt-5.4-mini, gemini-2.5-pro |
| simple_qa | what is, who invented, capital of | gemini-2.5-flash, claude-haiku-4-5 |
| creative | write a story, compose, brainstorm | claude-sonnet-4-6, gpt-5.4 |
| general | (fallback) | cheapest available |
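
To make the keyword side of detection concrete, here is a minimal keyword-only sketch built from the table above. It is not the logic in task_detection.py: the real router is described as hybrid (keywords plus an LLM classifier with confidence branching), and the scoring rule here (count whole-word hits, first task type wins on ties) is an assumption made for the example.

```python
import re

# Keywords copied from the table above; "general" is the fallback.
KEYWORDS = {
    "coding": ("function", "code", "debug", "python", "api"),
    "reasoning": ("explain why", "prove", "step by step"),
    "analysis": ("compare", "pros and cons", "evaluate"),
    "simple_qa": ("what is", "who invented", "capital of"),
    "creative": ("write a story", "compose", "brainstorm"),
}


def count_hits(text, keywords):
    """Count keywords that appear as whole words or phrases in the text."""
    # Word boundaries stop "api" from matching inside "capital".
    return sum(1 for kw in keywords if re.search(rf"\b{re.escape(kw)}\b", text))


def detect_task_type(prompt):
    """Return the task type with the most keyword hits, or "general" if none hit."""
    text = prompt.lower()
    scores = {task: count_hits(text, kws) for task, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)  # ties resolve to the first task type listed
    return best if scores[best] > 0 else "general"


print(detect_task_type("Debug this Python function"))      # coding
print(detect_task_type("What is the capital of France?"))  # simple_qa
print(detect_task_type("Hello there"))                     # general
```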

Supported Models

11 models across 3 supported providers (routes within your available providers):

| Provider | Models |
|---|---|
| OpenAI | gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, o3, o4-mini |
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 |
| Google | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite |

CLI Commands

rr-router start                # Start the proxy server
rr-router start --port 8080    # Start on custom port
rr-router status               # Check proxy health and config
rr-router report weekly        # Cost savings report (7 days)
rr-router report monthly       # Cost savings report (30 days)
rr-router --version            # Show version

MCP Server

The Router includes an MCP server for AI agent integration:

npx @robot-resources/router-mcp

Available tools: router_get_stats, router_compare_models, router_get_config, router_set_config.

Development

See CONTRIBUTING.md for the full development guide.

git clone https://github.com/robot-resources/robot-resources.git
cd robot-resources/router
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest  # 681 tests

Project Structure

src/robot_resources/
├── cli/                    # CLI entry point (click)
├── config.py               # Centralized settings (pydantic-settings)
├── proxy/
│   ├── server.py           # FastAPI app (auth, CORS, lifespan, middleware)
│   ├── security.py         # Bearer token auth (timing-safe)
│   ├── models.py           # Pydantic models with validators
│   ├── handlers/           # API endpoints (completions, stats, config, compare)
│   └── providers/          # LLM provider clients (Anthropic, OpenAI, Google)
├── routing/
│   ├── task_detection.py   # 6 task types, keyword + context
│   ├── classifier.py       # LLM task classifier (async)
│   ├── router.py           # Hybrid routing with confidence branching
│   ├── selector.py         # Capability filter + cheapest model
│   ├── decision_log.py     # SQLite WAL decision persistence
│   └── models_db.json      # 11 models with capabilities + pricing
└── tracking/
    ├── db.py               # OutcomeDB (SQLite WAL, async, migrations)
    ├── recorder.py         # OutcomeRecorder (routing outcomes)
    ├── calculator.py       # CostCalculator (pricing from models_db)
    └── telemetry.py        # TelemetryReporter (platform API)
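
The tree describes security.py as timing-safe Bearer token auth. For readers unfamiliar with the term, the sketch below shows the general technique using the standard library's `hmac.compare_digest`, which compares two values in a way that avoids leaking how many leading characters match. The `is_authorized` function is hypothetical and is not taken from the project's source.

```python
import hmac


def is_authorized(authorization_header, expected_key):
    """Check an "Authorization: Bearer <token>" header against the expected key."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids the early exit of ==, which can leak timing information.
    return hmac.compare_digest(token.strip().encode(), expected_key.encode())


print(is_authorized("Bearer your-key", "your-key"))   # True
print(is_authorized("Bearer wrong-key", "your-key"))  # False
```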

Troubleshooting

Port already in use

lsof -i :3838
rr-router start --port 3839
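
Where lsof is unavailable (for example on Windows), the same check can be done from Python. This helper is an illustrative sketch, not a feature of rr-router: it tries to connect to the port on localhost and treats a successful connection as "in use".

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0


def first_free_port(start=3838, attempts=20):
    """Return the first port at or after `start` with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port found in {start}..{start + attempts - 1}")


print(f"Try: rr-router start --port {first_free_port()}")
```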

API key not found

echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
export ANTHROPIC_API_KEY="sk-ant-..."

Model not found

Use model: "auto" for automatic routing. Check /v1/models for available models.

Roadmap

  • Local proxy with task detection routing
  • Real SSE streaming for all 3 providers
  • Hybrid routing (keyword + LLM classifier)
  • MCP server for stats and configuration
  • Production hardening (681 tests, error handling, observability)
  • Outcome-based routing (learning from success/failure)
  • Calibration lab (benchmark-driven model scoring)

License

MIT

Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

For security vulnerabilities, see SECURITY.md.
