Intelligent LLM routing proxy — cost optimization via local proxy

Maintainers

manusovi6


Robot Resources

Intelligent LLM cost optimization via local proxy.

Automatically route each LLM request to the cheapest model that can handle it. Capability scores are calibrated from Chatbot Arena Elo ratings.

API-key users: 60-90% direct cost savings (82.5% average on a 210-prompt benchmark)
Subscription users (e.g., OpenClaw + Claude): 3x more requests from the same token budget (53.7% average savings from a Haiku/Sonnet/Opus split)

Quick Start

The fastest way is the npx installer. It installs Router, registers it as an always-on service, and auto-configures MCP:

npx @robot-resources/router

From PyPI (manual setup)

# 1. Install
pip install robot-resources-router

# 2. Set API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."

# 3. Start proxy
rr-router start
# Proxy running on http://localhost:3838

Self-hosted Docker

Use the Docker image when compliance requires running the router inside your own VPC. The image is published to GHCR with every PyPI release and supports multiple architectures (amd64 and arm64).

docker run -p 3838:3838 \
  -e RR_API_KEY=rr_live_... \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -e OPENAI_API_KEY=sk-... \
  ghcr.io/robot-resources/router:latest

For a docker-compose setup with a persistent telemetry buffer and resource limits, copy docker-compose.yml into your project. The image:

Runs as a non-root user (uid 1000)
Uses tini as PID 1 for clean SIGTERM handling
Defines a HEALTHCHECK against /health every 10s
Uses about 128 MB of RAM when idle; memory use scales with traffic

Whichever install method you use, point your agent at http://localhost:3838 and use model: "auto".

SDK base_url notes (the correct value differs per SDK, so read carefully):

OpenAI: OPENAI_BASE_URL=http://localhost:3838/v1 (with /v1)
Anthropic: ANTHROPIC_BASE_URL=http://localhost:3838 (NO /v1 — SDK appends /v1/messages)
Gemini: route through an OpenAI-compatible client (base URL with /v1) using the model name gemini-2.5-flash
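The difference comes from the path each SDK appends to base_url on its own: the OpenAI Python SDK appends `/chat/completions`, while the Anthropic Python SDK appends `/v1/messages`. The sketch below uses a hypothetical `resolve_endpoint` helper (not part of Robot Resources, and a simplification of how the SDKs join URLs) to show the final URL each configuration produces.

```python
# Hypothetical helper: shows the URL each SDK ends up requesting for a base_url.
# The appended paths reflect the OpenAI and Anthropic Python SDK conventions.
SDK_PATHS = {
    "openai": "/chat/completions",  # OpenAI SDK appends this to base_url
    "anthropic": "/v1/messages",    # Anthropic SDK appends this to base_url
}


def resolve_endpoint(sdk: str, base_url: str) -> str:
    """Return the URL the given SDK would POST to for base_url (simplified join)."""
    return base_url.rstrip("/") + SDK_PATHS[sdk]


print(resolve_endpoint("openai", "http://localhost:3838/v1"))
# correct: http://localhost:3838/v1/chat/completions
print(resolve_endpoint("anthropic", "http://localhost:3838"))
# correct: http://localhost:3838/v1/messages
print(resolve_endpoint("anthropic", "http://localhost:3838/v1"))
# wrong: http://localhost:3838/v1/v1/messages
```

Adding /v1 to the Anthropic base URL doubles the version segment, which is why the note above says NO /v1 for that SDK.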

Why Robot Resources?

Without RR	With RR
Every message uses the same expensive model	Each message is routed to the optimal model
"hello" costs the same as "refactor codebase"	Simple tasks use cheap or free models
Manual model selection	Automatic task detection
No cost visibility	Full routing transparency

Savings by user type

API-key users (pay per token, all 11 models available):

Workload	Avg Savings	Typical Model
Simple Q&A	98%	gemini-2.5-flash-lite, gpt-5.4-nano
Creative	83%	gpt-5.4-mini, gemini-2.5-flash
Reasoning	79%	o4-mini, gemini-2.5-pro
Coding	77%	gpt-5.4-mini, gemini-2.5-flash
Analysis	73%	gpt-5.4-mini, gemini-2.5-pro

Subscription users (e.g., OpenClaw + Claude, Anthropic models only):

Complexity	Model Selected (share of prompts)	Savings vs Opus
Simple prompts	Haiku (41.9%)	80%
Medium prompts	Sonnet (50.5%)	40%
Complex prompts	Opus (7.6%)	0%

Token budget multiplier: 3x. With intelligent routing, your subscription handles 3x as many requests.
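The 53.7% average savings quoted for subscription users can be reproduced from the table: weight each tier's savings against Opus by that tier's share of prompts. A minimal sketch using only the numbers from the table:

```python
from __future__ import annotations

# Values copied from the subscription table above.
# Each entry: (share of prompts routed to the model, savings vs. Opus).
SPLIT = {
    "haiku": (0.419, 0.80),
    "sonnet": (0.505, 0.40),
    "opus": (0.076, 0.00),
}


def blended_savings(split: dict[str, tuple[float, float]]) -> float:
    """Share-weighted average of per-tier savings, returned as a fraction."""
    total_share = sum(share for share, _ in split.values())
    weighted = sum(share * saving for share, saving in split.values())
    return weighted / total_share


print(f"{blended_savings(SPLIT):.1%}")  # 53.7%
```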

How It Works

Your Agent
    │
    │  POST /v1/chat/completions
    │  model: "auto"
    ▼
┌─────────────────────────────────────┐
│   Robot Resources (localhost:3838)  │
│                                     │
│   1. Detect task type               │
│      → coding, reasoning, analysis  │
│        simple_qa, creative, general │
│                                     │
│   2. Filter capable models          │
│      → capability >= 0.70 threshold │
│                                     │
│   3. Select cheapest                │
│      → lowest cost_per_1k_input     │
│                                     │
│   4. Forward to provider            │
│      → Anthropic, OpenAI, Google    │
└─────────────────────────────────────┘
    │
    ▼
Real LLM Provider (using your API keys)
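Steps 2 and 3 amount to a filter followed by a minimum. The sketch below assumes a simplified model table with a per-task capability score and a `cost_per_1k_input` price. The model names, scores, and prices are placeholders, not the contents of the real `models_db.json`, and the fallback used when no model clears the threshold is a hypothetical choice; the shipped selector also handles task detection, classifier confidence, and provider availability.

```python
from __future__ import annotations

from dataclasses import dataclass

CAPABILITY_THRESHOLD = 0.70  # threshold from step 2 of the diagram


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_input: float      # placeholder price, not real pricing
    capability: dict[str, float]  # task type -> score in [0, 1]


def select_model(models: list[Model], task_type: str) -> Model:
    """Return the cheapest model whose capability for task_type meets the threshold."""
    capable = [
        m for m in models
        if m.capability.get(task_type, 0.0) >= CAPABILITY_THRESHOLD
    ]
    if not capable:
        # Hypothetical fallback: take the most capable model for this task.
        return max(models, key=lambda m: m.capability.get(task_type, 0.0))
    return min(capable, key=lambda m: m.cost_per_1k_input)


# Placeholder catalogue for illustration only.
MODELS = [
    Model("large-model", 0.0050, {"coding": 0.95, "simple_qa": 0.97}),
    Model("mid-model", 0.0010, {"coding": 0.85, "simple_qa": 0.93}),
    Model("small-model", 0.0001, {"coding": 0.55, "simple_qa": 0.90}),
]

print(select_model(MODELS, "simple_qa").name)  # small-model
print(select_model(MODELS, "coding").name)     # mid-model
```

Picking the minimum price among capable models, rather than the highest score, is what lets "hello" and "refactor codebase" land on different models.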

Installation

From PyPI

pip install robot-resources-router

From Source

git clone https://github.com/robot-resources/robot-resources.git
cd robot-resources/router
pip install -e ".[dev]"

Requirements

Python 3.10+
API keys for at least one provider (Anthropic, OpenAI, or Google)

Configuration

Environment Variables

# Required: At least one provider
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."

# Optional: Server settings
export ROUTER_PORT=3838              # Default: 3838
export ROUTER_API_KEY="your-key"     # Optional: enable auth on all endpoints
export ROUTER_CORS_ORIGINS=""        # Default: localhost only

Agent Integration

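Point your agent's API base URL at http://localhost:3838 and use model "auto". Any OpenAI-compatible client works; the OpenAI SDK needs the /v1 suffix.

from openai import OpenAI

# The proxy calls providers with your own API keys, so api_key is a placeholder.
# If ROUTER_API_KEY is set, pass that value instead (the SDK sends it as a Bearer token).
client = OpenAI(base_url="http://localhost:3838/v1", api_key="unused")

# model="auto" lets the router pick the cheapest capable model.
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)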

API Reference

Endpoints

Endpoint	Method	Description
`/v1/chat/completions`	POST	Chat completions (streaming supported)
`/v1/models`	GET	List available models
`/v1/stats`	GET	Cost savings statistics
`/v1/models/compare`	GET	Compare models by task type
`/v1/config`	GET/PATCH	View or update routing config at runtime
`/health`	GET	Health check with component diagnostics
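The endpoints can be called with any HTTP client. This standard-library sketch builds a GET request for a proxy endpoint and, when a key is supplied, attaches it as a Bearer token (the scheme listed for security.py under Project Structure). The function names are hypothetical, and since the /v1/stats response schema is not documented here, the example only decodes the JSON.

```python
from __future__ import annotations

import json
import os
import urllib.request

BASE_URL = "http://localhost:3838"


def build_get(path: str, api_key: str | None = None) -> urllib.request.Request:
    """Build a GET request to the proxy, adding Bearer auth if a key is given."""
    headers = {"Accept": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(BASE_URL + path, headers=headers, method="GET")


def fetch_json(path: str) -> dict:
    """Send the request to a running proxy and decode the JSON body."""
    req = build_get(path, os.environ.get("ROUTER_API_KEY"))
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


# With the proxy running:
# print(fetch_json("/v1/stats"))
```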

Request Format

Standard OpenAI chat completions format:

{
  "model": "auto",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}
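With "stream": true, the chat completions endpoint streams its reply. Assuming the proxy emits OpenAI-style server-sent events (each event a `data: {json chunk}` line with text in `choices[0].delta.content`, terminated by `data: [DONE]`), the text can be reassembled as below. This is a sketch for raw HTTP clients; the OpenAI SDK does the same parsing for you when you pass stream=True.

```python
import json
from collections.abc import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style SSE lines (assumes n=1)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content
```

Joining the fragments, for example `"".join(iter_deltas(lines))`, gives the full assistant message.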

Response Format

Standard OpenAI format plus routing_info:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gemini-2.0-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  },
  "routing_info": {
    "selected_model": "gemini-2.0-flash",
    "original_model": "auto",
    "provider": "google",
    "task_type": "simple_qa",
    "capability_score": 0.92,
    "savings_percent": 96.0,
    "baseline_model": "gpt-4o",
    "reasoning": "Selected gemini-2.0-flash as cheapest capable model..."
  }
}
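Because routing_info is an extra top-level field, clients that read only the standard fields simply ignore it. A small, hypothetical helper that turns it into a one-line log entry, using the field names from the example response:

```python
def describe_routing(response: dict) -> str:
    """Summarize the routing_info block of a proxy response (hypothetical helper)."""
    info = response.get("routing_info")
    if info is None:
        return "no routing_info in response"
    return (
        f"{info['task_type']} -> {info['selected_model']} ({info['provider']}), "
        f"{info['savings_percent']:.0f}% cheaper than {info['baseline_model']}"
    )
```

With the OpenAI SDK, `response.model_dump()` should include routing_info, since the SDK's response models keep unknown fields; confirm this against the SDK version you use.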

Task Types

RR automatically detects 6 task types:

Task Type	Detection Keywords	Typical Models
`coding`	function, code, debug, python, api	claude-sonnet-4-6, gpt-5.4-mini
`reasoning`	explain why, prove, step by step	o3, o4-mini
`analysis`	compare, pros and cons, evaluate	gpt-5.4-mini, gemini-2.5-pro
`simple_qa`	what is, who invented, capital of	gemini-2.5-flash, claude-haiku-4-5
`creative`	write a story, compose, brainstorm	claude-sonnet-4-6, gpt-5.4
`general`	(fallback)	cheapest available
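A minimal keyword-matching sketch of task detection, using only the sample keywords from the table. The real task_detection.py also uses context, and the router can fall back to an LLM classifier, so this illustrates the idea rather than the shipped rules. The check order (which decides ties) and the word-boundary matching are assumptions; word boundaries keep "api" from matching inside "capital".

```python
from __future__ import annotations

import re

# Sample keywords from the table above; dict order decides ties (an assumption).
TASK_KEYWORDS: dict[str, tuple[str, ...]] = {
    "coding": ("function", "code", "debug", "python", "api"),
    "reasoning": ("explain why", "prove", "step by step"),
    "analysis": ("compare", "pros and cons", "evaluate"),
    "simple_qa": ("what is", "who invented", "capital of"),
    "creative": ("write a story", "compose", "brainstorm"),
}

# One whole-word pattern per task type.
_PATTERNS = {
    task: re.compile(r"\b(?:" + "|".join(map(re.escape, kws)) + r")\b")
    for task, kws in TASK_KEYWORDS.items()
}


def detect_task(prompt: str) -> str:
    """Return the first task type whose keywords appear in the prompt."""
    text = prompt.lower()
    for task, pattern in _PATTERNS.items():
        if pattern.search(text):
            return task
    return "general"  # fallback when nothing matches
```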

Supported Models

11 models across 3 providers. The router only selects models from providers whose API keys you have configured:

Provider	Models
OpenAI	gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, o3, o4-mini
Anthropic	claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001
Google	gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite
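Restricting routing to your available providers can be pictured as a filter on this model list, driven by which API keys are set. The provider-to-variable mapping follows the Configuration section; `routable_models` is a hypothetical sketch, not the router's internal code.

```python
from collections.abc import Mapping

# Model catalogue from the table above, grouped by provider.
MODELS_BY_PROVIDER = {
    "openai": ["gpt-5.4", "gpt-5.4-mini", "gpt-5.4-nano", "o3", "o4-mini"],
    "anthropic": ["claude-opus-4-6", "claude-sonnet-4-6", "claude-haiku-4-5-20251001"],
    "google": ["gemini-2.5-pro", "gemini-2.5-flash", "gemini-2.5-flash-lite"],
}

# Environment variable that enables each provider (see Configuration).
PROVIDER_ENV = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def routable_models(env: Mapping[str, str]) -> list[str]:
    """List the models whose provider has a non-empty API key in env."""
    return [
        model
        for provider, models in MODELS_BY_PROVIDER.items()
        if env.get(PROVIDER_ENV[provider])
        for model in models
    ]
```

For example, with only ANTHROPIC_API_KEY set, the candidate pool is the three Claude models, which matches the subscription scenario above.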

CLI Commands

rr-router start                # Start the proxy server
rr-router start --port 8080    # Start on custom port
rr-router status               # Check proxy health and config
rr-router report weekly        # Cost savings report (7 days)
rr-router report monthly       # Cost savings report (30 days)
rr-router --version            # Show version

MCP Server

The Router includes an MCP server for AI agent integration:

npx @robot-resources/router-mcp

Available tools: router_get_stats, router_compare_models, router_get_config, router_set_config.

Development

See CONTRIBUTING.md for the full development guide.

git clone https://github.com/robot-resources/robot-resources.git
cd robot-resources/router
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest  # 681 tests

Project Structure

src/robot_resources/
├── cli/                    # CLI entry point (click)
├── config.py               # Centralized settings (pydantic-settings)
├── proxy/
│   ├── server.py           # FastAPI app (auth, CORS, lifespan, middleware)
│   ├── security.py         # Bearer token auth (timing-safe)
│   ├── models.py           # Pydantic models with validators
│   ├── handlers/           # API endpoints (completions, stats, config, compare)
│   └── providers/          # LLM provider clients (Anthropic, OpenAI, Google)
├── routing/
│   ├── task_detection.py   # 6 task types, keyword + context
│   ├── classifier.py       # LLM task classifier (async)
│   ├── router.py           # Hybrid routing with confidence branching
│   ├── selector.py         # Capability filter + cheapest model
│   ├── decision_log.py     # SQLite WAL decision persistence
│   └── models_db.json      # 11 models with capabilities + pricing
└── tracking/
    ├── db.py               # OutcomeDB (SQLite WAL, async, migrations)
    ├── recorder.py         # OutcomeRecorder (routing outcomes)
    ├── calculator.py       # CostCalculator (pricing from models_db)
    └── telemetry.py        # TelemetryReporter (platform API)

Troubleshooting

Port already in use

lsof -i :3838
rr-router start --port 3839
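Where lsof is not available (for example on Windows), a similar check can be done from Python by trying to connect to the port. A hypothetical helper using only the standard library socket module; note that a successful connection is a heuristic for "in use", not a guarantee about which process owns the port.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0  # 0 means the connect succeeded


def first_free_port(start: int = 3838, attempts: int = 10) -> int:
    """Return the first port at or after start that nothing is listening on."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")
```

The returned port can then be passed to `rr-router start --port`.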

API key not found

echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
export ANTHROPIC_API_KEY="sk-ant-..."

Model not found

Use model: "auto" for automatic routing. Check /v1/models for available models.

Roadmap

Local proxy with task detection routing
Real SSE streaming for all 3 providers
Hybrid routing (keyword + LLM classifier)
MCP server for stats and configuration
Production hardening (681 tests, error handling, observability)
Outcome-based routing (learning from success/failure)
Calibration lab (benchmark-driven model scoring)

License

MIT

Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

For security vulnerabilities, see SECURITY.md.


Release history

2.99.0 (Apr 27, 2026)

2.1.13, yanked (Apr 23, 2026)