
Intelligent LLM routing proxy: cost optimization via a local proxy

Reason this release was yanked:

Replaced by @robot-resources/router on npm. Run npx robot-resources to install.


Robot Resources

Python 3.10+ | License: MIT

Intelligent LLM cost optimization via a local proxy.

Automatically routes each LLM request to the cheapest model that can handle it. Capability scores are calibrated from Chatbot Arena Elo ratings.

  • API-key users: 60-90% direct cost savings (82.5% average on a 210-prompt benchmark)
  • Subscription users (e.g., OpenClaw + Claude): 3x token budget stretch (53.7% average savings from routing across Haiku, Sonnet, and Opus)

Quick Start

The fastest way to get started. This installs Router, registers it as an always-on service, and auto-configures MCP:

npx @robot-resources/router

From PyPI (manual setup)

# 1. Install
pip install robot-resources-router

# 2. Set API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."

# 3. Start proxy
rr-router start
# Proxy running on http://localhost:3838

Self-hosted Docker

Intended for compliance customers who run the router inside their own VPC. The image is published to GHCR with every PyPI release and supports both amd64 and arm64.

docker run -p 3838:3838 \
  -e RR_API_KEY=rr_live_... \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -e OPENAI_API_KEY=sk-... \
  ghcr.io/robot-resources/router:latest

For a docker-compose setup with a persistent telemetry buffer and resource limits, copy docker-compose.yml into your project. The image:

  • Runs as a non-root user (uid 1000)
  • Uses tini as PID 1 for clean SIGTERM handling
  • Defines a HEALTHCHECK that polls /health every 10s
  • Uses roughly 128 MB of RAM when idle; memory use scales with traffic
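When scripting a container start (for example in CI), it can help to wait until the proxy answers on /health before sending traffic. The following is a minimal standard-library sketch. It assumes only that a healthy proxy returns HTTP 200 from /health and ignores the response body, whose format is not documented here. The helper names `http_ok` and `wait_until_healthy` are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if GET url answers with HTTP 200 (assumed to mean 'healthy')."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        return False


def wait_until_healthy(
    url: str = "http://localhost:3838/health",
    timeout: float = 30.0,
    interval: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll probe(url) until it succeeds (True) or timeout seconds pass (False)."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The `probe` parameter makes the polling loop easy to test without a running container. In a script, call `wait_until_healthy()` after `docker run -d ...` and exit non-zero if it returns False.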

Either way, point your agent to http://localhost:3838 and use model: "auto".

SDK base URL notes (these differ per SDK; read carefully):

  • OpenAI: OPENAI_BASE_URL=http://localhost:3838/v1 (with /v1)
  • Anthropic: ANTHROPIC_BASE_URL=http://localhost:3838 (no /v1: the SDK appends /v1/messages itself)
  • Gemini: route requests through an OpenAI-compatible client, using the model name gemini-2.5-flash
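Because the OpenAI and Anthropic SDKs append different path segments, you may find it convenient to derive both variables from a single host and port. The sketch below does that. The helper name `router_base_urls` is hypothetical; the values mirror the notes above.

```python
def router_base_urls(host: str = "localhost", port: int = 3838) -> dict[str, str]:
    """Return SDK base-URL environment variables pointing at the local proxy."""
    root = f"http://{host}:{port}"
    return {
        # The OpenAI SDK expects the /v1 prefix in its base URL.
        "OPENAI_BASE_URL": f"{root}/v1",
        # The Anthropic SDK appends /v1/messages itself, so no /v1 here.
        "ANTHROPIC_BASE_URL": root,
    }
```

Usage: call `os.environ.update(router_base_urls())` before creating SDK clients. Both official Python SDKs read these variables when no explicit `base_url` is passed.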

Why Robot Resources?

| Without RR | With RR |
|---|---|
| Every message uses the same expensive model | Each message is routed to the optimal model |
| "hello" costs the same as "refactor codebase" | Simple tasks use cheap/free models |
| Manual model selection | Automatic task detection |
| No cost visibility | Full routing transparency |

Savings by user type

API-key users (pay per token, all 11 models available):

| Workload | Avg. savings | Typical models |
|---|---|---|
| Simple Q&A | 98% | gemini-2.5-flash-lite, gpt-5.4-nano |
| Creative | 83% | gpt-5.4-mini, gemini-2.5-flash |
| Reasoning | 79% | o4-mini, gemini-2.5-pro |
| Coding | 77% | gpt-5.4-mini, gemini-2.5-flash |
| Analysis | 73% | gpt-5.4-mini, gemini-2.5-pro |

Subscription users (e.g., OpenClaw + Claude, Anthropic models only):

| Complexity | Model selected (share of prompts) | Savings vs. Opus |
|---|---|---|
| Simple prompts | Haiku (41.9%) | 80% |
| Medium prompts | Sonnet (50.5%) | 40% |
| Complex prompts | Opus (7.6%) | 0% |

Token budget multiplier: 3x. Your subscription handles 3x more requests through intelligent routing.
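The 53.7% average for subscription users follows from the subscription table above. Each model's savings is weighted by its share of prompts. The arithmetic is shown below as a quick check. The final line rests on an added simplifying assumption: that budget consumption is proportional to per-request cost.

```python
# Share of prompts routed to each model and its savings vs. Opus (from the table above).
ROUTING_MIX = {
    "haiku": (0.419, 0.80),
    "sonnet": (0.505, 0.40),
    "opus": (0.076, 0.00),
}

# Expected savings = sum over models of (share of prompts x savings on those prompts).
weighted_savings = sum(share * saving for share, saving in ROUTING_MIX.values())

# Budget multiplier IF spend were exactly proportional to per-request cost (assumption).
cost_multiplier = 1 / (1 - weighted_savings)

print(f"weighted savings: {weighted_savings:.1%}")              # 53.7%
print(f"cost-proportional multiplier: {cost_multiplier:.2f}x")  # 2.16x
```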

How It Works

Your Agent
    │
    │  POST /v1/chat/completions
    │  model: "auto"
    ▼
┌─────────────────────────────────────┐
│   Robot Resources (localhost:3838)  │
│                                     │
│   1. Detect task type               │
│      → coding, reasoning, analysis  │
│        simple_qa, creative, general │
│                                     │
│   2. Filter capable models          │
│      → capability >= 0.70 threshold │
│                                     │
│   3. Select cheapest                │
│      → lowest cost_per_1k_input     │
│                                     │
│   4. Forward to provider            │
│      → Anthropic, OpenAI, Google    │
└─────────────────────────────────────┘
    │
    ▼
Real LLM Provider (using your API keys)
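Steps 2 and 3 in the diagram above (filter by capability, then take the cheapest) can be sketched in a few lines. This is an illustration, not the project's selector.py. The model names, scores, and prices are hypothetical placeholders, and the 0.70 threshold is the one shown in the diagram. Raising LookupError when no model qualifies is this sketch's own choice.

```python
from dataclasses import dataclass


@dataclass
class ModelInfo:
    name: str
    provider: str
    capabilities: dict[str, float]  # task type -> capability score in [0, 1]
    cost_per_1k_input: float        # USD per 1k input tokens


def select_model(models: list[ModelInfo], task_type: str, threshold: float = 0.70) -> ModelInfo:
    """Return the cheapest model whose capability for task_type meets the threshold."""
    capable = [m for m in models if m.capabilities.get(task_type, 0.0) >= threshold]
    if not capable:
        raise LookupError(f"no model reaches {threshold} for task {task_type!r}")
    return min(capable, key=lambda m: m.cost_per_1k_input)


# Hypothetical catalogue for illustration only (not real scores or prices).
CATALOGUE = [
    ModelInfo("big-model", "provider-a", {"coding": 0.95, "simple_qa": 0.95}, 0.0100),
    ModelInfo("mid-model", "provider-b", {"coding": 0.80, "simple_qa": 0.90}, 0.0020),
    ModelInfo("tiny-model", "provider-c", {"coding": 0.50, "simple_qa": 0.85}, 0.0001),
]

print(select_model(CATALOGUE, "coding").name)     # mid-model
print(select_model(CATALOGUE, "simple_qa").name)  # tiny-model
```

The capability filter protects quality: tiny-model is cheapest overall but is skipped for coding because it scores below the threshold.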

Installation

From PyPI

pip install robot-resources-router

From Source

git clone https://github.com/robot-resources/robot-resources.git
cd robot-resources/router
pip install -e ".[dev]"

Requirements

  • Python 3.10+
  • API keys for at least one provider (Anthropic, OpenAI, or Google)

Configuration

Environment Variables

# Required: At least one provider
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."

# Optional: Server settings
export ROUTER_PORT=3838              # Default: 3838
export ROUTER_API_KEY="your-key"     # Optional: enable auth on all endpoints
export ROUTER_CORS_ORIGINS=""        # Default: localhost only
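At least one provider key is required, so a preflight check before `rr-router start` can save a confusing failure. The sketch below reports which providers are configured, using the three variable names listed above. The helper name `configured_providers` is hypothetical.

```python
import os
from collections.abc import Mapping

# Provider -> environment variable, as listed in the configuration above.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return providers whose API key variable is set and not blank."""
    return [provider for provider, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]
```

Usage: if `configured_providers()` returns an empty list, export at least one of the keys before starting the proxy.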

Agent Integration

Point your agent's API base URL at the proxy and use the model "auto". This works with any OpenAI-compatible client; for the OpenAI SDK, include the /v1 suffix:

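from openai import OpenAI

# Point the OpenAI SDK at the local proxy (note the /v1 suffix).
# Provider keys are read by the proxy from its own environment, so api_key is a
# placeholder; if ROUTER_API_KEY is set, pass that value (sent as a Bearer token).
client = OpenAI(base_url="http://localhost:3838/v1", api_key="unused")
response = client.chat.completions.create(
    model="auto",  # let the router choose the model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.model, response.choices[0].message.content)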

API Reference

Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | Chat completions (streaming supported) |
| /v1/models | GET | List available models |
| /v1/stats | GET | Cost savings statistics |
| /v1/models/compare | GET | Compare models by task type |
| /v1/config | GET, PATCH | View or update the routing config at runtime |
| /health | GET | Health check with component diagnostics |

Request Format

Standard OpenAI chat completions format:

{
  "model": "auto",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}
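Clients without an OpenAI SDK can send the same request with the standard library. The following is a minimal urllib sketch; the helper names `build_request` and `call` are hypothetical. Sending the ROUTER_API_KEY value as a standard Bearer token is an assumption, based on the bearer-token auth listed under Project Structure.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:3838"


def build_request(path: str, body: Optional[dict[str, Any]] = None,
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET (no body) or JSON POST request to the local proxy."""
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if api_key:
        # Assumption: ROUTER_API_KEY is checked as a standard Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    # urllib picks POST when data is present and GET otherwise.
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


def call(req: urllib.request.Request, timeout: float = 60.0) -> dict[str, Any]:
    """Send the request and decode the JSON response (requires a running proxy)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


chat = build_request("/v1/chat/completions", body={
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
})
stats = build_request("/v1/stats")
# With the proxy running: print(call(chat)["routing_info"]["selected_model"])
```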

Response Format

Standard OpenAI format plus routing_info:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gemini-2.0-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  },
  "routing_info": {
    "selected_model": "gemini-2.0-flash",
    "original_model": "auto",
    "provider": "google",
    "task_type": "simple_qa",
    "capability_score": 0.92,
    "savings_percent": 96.0,
    "baseline_model": "gpt-4o",
    "reasoning": "Selected gemini-2.0-flash as cheapest capable model..."
  }
}
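The routing_info object can be logged alongside each response to audit routing decisions. The small sketch below pulls the documented fields out of a decoded response. The function name `summarize_routing` and its output format are hypothetical.

```python
from typing import Any


def summarize_routing(response: dict[str, Any]) -> str:
    """One-line summary of the proxy's routing decision, or '' if no routing_info."""
    info = response.get("routing_info")
    if not info:
        return ""  # no routing metadata present in this response
    return (
        f"{info['task_type']}: {info['original_model']} -> {info['selected_model']} "
        f"({info['provider']}, capability {info['capability_score']:.2f}, "
        f"{info['savings_percent']:.1f}% savings vs {info['baseline_model']})"
    )


# Fields copied from the example response above.
sample = {
    "model": "gemini-2.0-flash",
    "routing_info": {
        "selected_model": "gemini-2.0-flash",
        "original_model": "auto",
        "provider": "google",
        "task_type": "simple_qa",
        "capability_score": 0.92,
        "savings_percent": 96.0,
        "baseline_model": "gpt-4o",
    },
}
print(summarize_routing(sample))
# simple_qa: auto -> gemini-2.0-flash (google, capability 0.92, 96.0% savings vs gpt-4o)
```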

Task Types

RR automatically detects 6 task types:

| Task type | Detection keywords | Typical models |
|---|---|---|
| coding | function, code, debug, python, api | claude-sonnet-4-6, gpt-5.4-mini |
| reasoning | explain why, prove, step by step | o3, o4-mini |
| analysis | compare, pros and cons, evaluate | gpt-5.4-mini, gemini-2.5-pro |
| simple_qa | what is, who invented, capital of | gemini-2.5-flash, claude-haiku-4-5 |
| creative | write a story, compose, brainstorm | claude-sonnet-4-6, gpt-5.4 |
| general | (fallback) | cheapest available |
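The keyword half of task detection can be approximated with word-boundary matching over the keywords in the table. This is a simplified sketch only: the real task_detection.py also uses context, the router adds an LLM classifier, and their scoring rules are not documented here. Counting hits per type and breaking ties by table order are assumptions. Word boundaries matter: a plain substring test would find "api" inside "capital".

```python
import re

# Keywords copied from the table above; dict order doubles as tie-break priority (assumption).
TASK_KEYWORDS: dict[str, list[str]] = {
    "coding": ["function", "code", "debug", "python", "api"],
    "reasoning": ["explain why", "prove", "step by step"],
    "analysis": ["compare", "pros and cons", "evaluate"],
    "simple_qa": ["what is", "who invented", "capital of"],
    "creative": ["write a story", "compose", "brainstorm"],
}


def detect_task(prompt: str) -> str:
    """Return the task type with the most keyword hits, or 'general' if none match."""
    text = prompt.lower()
    best_task, best_hits = "general", 0
    for task, keywords in TASK_KEYWORDS.items():
        # \b anchors stop "api" from matching inside words such as "capital".
        hits = sum(1 for kw in keywords if re.search(rf"\b{re.escape(kw)}\b", text))
        if hits > best_hits:  # strict '>' keeps the earlier task on ties
            best_task, best_hits = task, hits
    return best_task


print(detect_task("What is the capital of France?"))  # simple_qa
print(detect_task("Debug this Python function"))      # coding
print(detect_task("hello"))                           # general
```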

Supported Models

11 models across 3 providers. The router only routes to providers for which you have configured an API key:

| Provider | Models |
|---|---|
| OpenAI | gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, o3, o4-mini |
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 |
| Google | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite |
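Restricting routing to configured providers amounts to filtering this table. The sketch below uses the model IDs listed above. The function name `routable_models` is hypothetical, and how the real router maps keys to providers is not shown here.

```python
# Model IDs per provider, copied from the table above.
SUPPORTED_MODELS: dict[str, list[str]] = {
    "openai": ["gpt-5.4", "gpt-5.4-mini", "gpt-5.4-nano", "o3", "o4-mini"],
    "anthropic": ["claude-opus-4-6", "claude-sonnet-4-6", "claude-haiku-4-5-20251001"],
    "google": ["gemini-2.5-pro", "gemini-2.5-flash", "gemini-2.5-flash-lite"],
}


def routable_models(available_providers: set[str]) -> list[str]:
    """Candidate models for routing, limited to providers with configured keys."""
    return [
        model
        for provider, models in SUPPORTED_MODELS.items()
        if provider in available_providers
        for model in models
    ]


print(len(routable_models({"openai", "anthropic", "google"})))  # 11
print(routable_models({"anthropic"}))
```

With only an Anthropic key, the candidate set shrinks to the three Claude models. This matches the "Anthropic models only" case for subscription users described earlier.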

CLI Commands

rr-router start                # Start the proxy server
rr-router start --port 8080    # Start on custom port
rr-router status               # Check proxy health and config
rr-router report weekly        # Cost savings report (7 days)
rr-router report monthly       # Cost savings report (30 days)
rr-router --version            # Show version

MCP Server

The Router includes an MCP server for AI agent integration:

npx @robot-resources/router-mcp

Available tools: router_get_stats, router_compare_models, router_get_config, router_set_config.

Development

See CONTRIBUTING.md for the full development guide.

git clone https://github.com/robot-resources/robot-resources.git
cd robot-resources/router
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest  # 681 tests

Project Structure

src/robot_resources/
├── cli/                    # CLI entry point (click)
├── config.py               # Centralized settings (pydantic-settings)
├── proxy/
│   ├── server.py           # FastAPI app (auth, CORS, lifespan, middleware)
│   ├── security.py         # Bearer token auth (timing-safe)
│   ├── models.py           # Pydantic models with validators
│   ├── handlers/           # API endpoints (completions, stats, config, compare)
│   └── providers/          # LLM provider clients (Anthropic, OpenAI, Google)
├── routing/
│   ├── task_detection.py   # 6 task types, keyword + context
│   ├── classifier.py       # LLM task classifier (async)
│   ├── router.py           # Hybrid routing with confidence branching
│   ├── selector.py         # Capability filter + cheapest model
│   ├── decision_log.py     # SQLite WAL decision persistence
│   └── models_db.json      # 11 models with capabilities + pricing
└── tracking/
    ├── db.py               # OutcomeDB (SQLite WAL, async, migrations)
    ├── recorder.py         # OutcomeRecorder (routing outcomes)
    ├── calculator.py       # CostCalculator (pricing from models_db)
    └── telemetry.py        # TelemetryReporter (platform API)

Troubleshooting

Port already in use

lsof -i :3838                 # See which process is holding port 3838
rr-router start --port 3839   # Or run the proxy on a different port
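The same check can also be scripted in Python. The sketch below treats a port as busy if something accepts a TCP connection on it, which is enough to spot a proxy that is already running. The helper names `port_in_use` and `first_free_port` are hypothetical.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 3838, attempts: int = 20) -> int:
    """First port at or after start with no listener, e.g. for rr-router start --port."""
    for port in range(start, min(start + attempts, 65536)):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")
```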

API key not found

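echo $ANTHROPIC_API_KEY                 # Empty output means the variable is not set in this shell
echo $OPENAI_API_KEY
export ANTHROPIC_API_KEY="sk-ant-..."   # Set it in the shell that starts rr-router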

Model not found

Use model: "auto" for automatic routing. Check /v1/models for available models.

Roadmap

  • Local proxy with task detection routing
  • Real SSE streaming for all 3 providers
  • Hybrid routing (keyword + LLM classifier)
  • MCP server for stats and configuration
  • Production hardening (681 tests, error handling, observability)
  • Outcome-based routing (learning from success/failure)
  • Calibration lab (benchmark-driven model scoring)

License

MIT

Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

For security vulnerabilities, see SECURITY.md.


Download files


Source Distribution

robot_resources_router-2.1.12.tar.gz (84.7 kB)

Uploaded Source

Built Distribution


robot_resources_router-2.1.12-py3-none-any.whl (112.5 kB)

Uploaded Python 3

File details

Details for the file robot_resources_router-2.1.12.tar.gz.

File metadata

  • Download URL: robot_resources_router-2.1.12.tar.gz
  • Size: 84.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for robot_resources_router-2.1.12.tar.gz
Algorithm Hash digest
SHA256 408b77d0ed33b2a6112d4b0dc6b9ca5fa0e2cc31f63d7101fc5201b2c0cb68ab
MD5 c162ccc7cfe846b57846303934fc08f2
BLAKE2b-256 2e66669d137a3961f7db2f8cf08b4df8b84abc81e9b7ff91884e6650857518f4


Provenance

The following attestation bundles were made for robot_resources_router-2.1.12.tar.gz:

Publisher: publish.yml on robot-resources/robot-resources


File details

Details for the file robot_resources_router-2.1.12-py3-none-any.whl.


File hashes

Hashes for robot_resources_router-2.1.12-py3-none-any.whl
Algorithm Hash digest
SHA256 bb54715d75b8118c5462767eec9d7e72946582547e49683d4753befee9f1e158
MD5 264803f3527efe1f6dd2271b11372f5a
BLAKE2b-256 dc5e35a5dc8e81f47391edce089d066dd1626a07e2086ba16d35a11c66761755


Provenance

The following attestation bundles were made for robot_resources_router-2.1.12-py3-none-any.whl:

Publisher: publish.yml on robot-resources/robot-resources

