Intelligent LLM routing proxy — cost optimization via local proxy
Reason this release was yanked:
Replaced by @robot-resources/router on npm. Run npx robot-resources to install.
Robot Resources
Intelligent LLM cost optimization via local proxy.
Automatically route each LLM request to the cheapest model that can handle it. Capability scores calibrated from Chatbot Arena ELO ratings.
- API-key users: 60-90% direct cost savings (benchmarked 82.5% avg, 210 prompts)
- Subscription users (e.g., OpenClaw + Claude): 3x token budget stretch (53.7% avg savings, Haiku/Sonnet/Opus split)
Quick Start
The fastest way installs Router, registers it as an always-on service, and auto-configures MCP:
npx @robot-resources/router
From PyPI (manual setup)
# 1. Install
pip install robot-resources-router
# 2. Set API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
# 3. Start proxy
rr-router start
# Proxy running on http://localhost:3838
Either way, point your agent to http://localhost:3838 and use model: "auto".
Why Robot Resources?
| Without RR | With RR |
|---|---|
| Every message uses same expensive model | Each message routed to optimal model |
| "hello" costs same as "refactor codebase" | Simple tasks use cheap/free models |
| Manual model selection | Automatic task detection |
| No cost visibility | Full routing transparency |
Savings by user type
API-key users (pay per token, all 11 models available):
| Workload | Avg Savings | Typical Model |
|---|---|---|
| Simple Q&A | 98% | gemini-2.5-flash-lite, gpt-5.4-nano |
| Creative | 83% | gpt-5.4-mini, gemini-2.5-flash |
| Reasoning | 79% | o4-mini, gemini-2.5-pro |
| Coding | 77% | gpt-5.4-mini, gemini-2.5-flash |
| Analysis | 73% | gpt-5.4-mini, gemini-2.5-pro |
Subscription users (e.g., OpenClaw + Claude, Anthropic models only):
| Complexity | Model Selected | Savings vs Opus |
|---|---|---|
| Simple prompts | Haiku (41.9%) | 80% |
| Medium prompts | Sonnet (50.5%) | 40% |
| Complex prompts | Opus (7.6%) | 0% |
Token budget multiplier: 3x — your subscription handles 3x more requests through intelligent routing.
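The 53.7% average savings quoted for subscription users is consistent with the table above: it is the share-weighted mean of the per-tier savings. A quick check, assuming the percentages in parentheses are each model's share of routed prompts:

```python
# Routing split and per-tier savings, copied from the subscription table.
# Each entry is (share of prompts, savings vs Opus).
SPLIT = {
    "haiku": (0.419, 0.80),
    "sonnet": (0.505, 0.40),
    "opus": (0.076, 0.00),
}


def blended_savings(tiers):
    """Return the share-weighted average of the per-tier savings."""
    return sum(share * savings for share, savings in tiers.values())


average = blended_savings(SPLIT)
print(f"{average:.1%}")  # 53.7%
```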
How It Works
Your Agent
    │
    │  POST /v1/chat/completions
    │  model: "auto"
    ▼
┌─────────────────────────────────────┐
│  Robot Resources (localhost:3838)   │
│                                     │
│  1. Detect task type                │
│     → coding, reasoning, analysis   │
│       simple_qa, creative, general  │
│                                     │
│  2. Filter capable models           │
│     → capability >= 0.70 threshold  │
│                                     │
│  3. Select cheapest                 │
│     → lowest cost_per_1k_input      │
│                                     │
│  4. Forward to provider             │
│     → Anthropic, OpenAI, Google     │
└─────────────────────────────────────┘
    │
    ▼
Real LLM Provider (using your API keys)
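Steps 2 and 3 of the diagram can be sketched in a few lines. This is an illustration only, not the Router's actual selector: the `Model` class, the catalog entries, their scores, and their prices are all hypothetical placeholders, and returning `None` when no model clears the threshold is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    capability: dict[str, float]  # task type -> score in [0, 1]
    cost_per_1k_input: float      # USD, illustrative values only


# Hypothetical catalog: names, scores, and prices are placeholders.
CATALOG = [
    Model("small", {"coding": 0.62, "simple_qa": 0.85}, 0.0001),
    Model("medium", {"coding": 0.78, "simple_qa": 0.92}, 0.0008),
    Model("large", {"coding": 0.95, "simple_qa": 0.97}, 0.0050),
]


def select_model(task_type, catalog=CATALOG, threshold=0.70) -> Optional[Model]:
    """Keep models at or above the capability threshold, then pick the cheapest."""
    capable = [m for m in catalog if m.capability.get(task_type, 0.0) >= threshold]
    if not capable:
        return None  # assumed behaviour; the real fallback is not described here
    return min(capable, key=lambda m: m.cost_per_1k_input)


print(select_model("coding").name)     # medium
print(select_model("simple_qa").name)  # small
```

In this toy catalog, the cheapest model is skipped for coding because its score (0.62) falls below the 0.70 threshold, while simple Q&A is served by the cheapest entry.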
Installation
From PyPI
pip install robot-resources-router
From Source
git clone https://github.com/robot-resources/robot-resources.git
cd robot-resources/router
pip install -e ".[dev]"
Requirements
- Python 3.10+
- API keys for at least one provider (Anthropic, OpenAI, or Google)
Configuration
Environment Variables
# Required: At least one provider
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."
# Optional: Server settings
export ROUTER_PORT=3838 # Default: 3838
export ROUTER_API_KEY="your-key" # Optional: enable auth on all endpoints
export ROUTER_CORS_ORIGINS="" # Default: localhost only
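The variables above can be sanity-checked before starting the proxy. The sketch below uses only the standard library and is not how the Router itself loads settings; `RouterEnv` and `read_router_env` are hypothetical names, and `ROUTER_CORS_ORIGINS` is left out because its value format is not described here.

```python
import os
from dataclasses import dataclass

# Provider name -> environment variable, as listed above.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}


@dataclass(frozen=True)
class RouterEnv:
    providers: tuple[str, ...]  # providers with a non-empty API key
    port: int                   # ROUTER_PORT, defaulting to 3838
    auth_enabled: bool          # True when ROUTER_API_KEY is set


def read_router_env(env=None) -> RouterEnv:
    """Summarise the router-related variables found in a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    providers = tuple(name for name, var in PROVIDER_KEYS.items() if env.get(var))
    if not providers:
        raise ValueError("Set at least one of: " + ", ".join(PROVIDER_KEYS.values()))
    return RouterEnv(
        providers=providers,
        port=int(env.get("ROUTER_PORT", "3838")),
        auth_enabled=bool(env.get("ROUTER_API_KEY")),
    )


print(read_router_env({"OPENAI_API_KEY": "sk-example"}))
```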
Agent Integration
Point your agent's API base URL to http://localhost:3838 and use model auto. Works with any OpenAI-compatible client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3838/v1", api_key="unused")
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}],
)
API Reference
Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | Chat completions (streaming supported) |
| /v1/models | GET | List available models |
| /v1/stats | GET | Cost savings statistics |
| /v1/models/compare | GET | Compare models by task type |
| /v1/config | GET/PATCH | View or update routing config at runtime |
| /health | GET | Health check with component diagnostics |
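For scripting against these endpoints without an SDK, a small standard-library helper is enough. The sketch below only builds the request; `build_request` is a hypothetical helper, and sending the key as an `Authorization: Bearer` header is an assumption based on the bearer-token auth mentioned for `ROUTER_API_KEY`. Response shapes for `/health` and `/v1/stats` are not shown because they are not documented here.

```python
import json
import urllib.request


def build_request(path, api_key=None, base_url="http://localhost:3838", payload=None):
    """Build a GET request (or a JSON POST when payload is given) for the local proxy."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: the proxy expects the ROUTER_API_KEY value as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url + path, data=data, headers=headers)


health = build_request("/health")
chat = build_request(
    "/v1/chat/completions",
    api_key="your-key",
    payload={"model": "auto", "messages": [{"role": "user", "content": "Hello!"}]},
)
print(health.get_method(), health.full_url)  # GET http://localhost:3838/health
print(chat.get_method(), chat.full_url)      # POST http://localhost:3838/v1/chat/completions

# With the proxy running, send a request like this:
#   with urllib.request.urlopen(health) as resp:
#       print(json.load(resp))
```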
Request Format
Standard OpenAI chat completions format:
{
  "model": "auto",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}
Response Format
Standard OpenAI format plus routing_info:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gemini-2.0-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  },
  "routing_info": {
    "selected_model": "gemini-2.0-flash",
    "original_model": "auto",
    "provider": "google",
    "task_type": "simple_qa",
    "capability_score": 0.92,
    "savings_percent": 96.0,
    "baseline_model": "gpt-4o",
    "reasoning": "Selected gemini-2.0-flash as cheapest capable model..."
  }
}
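The `routing_info` object makes it easy to log which model served each request. A minimal sketch, assuming the response has already been parsed into a plain `dict` with the fields shown above; `describe_routing` is a hypothetical helper name.

```python
def describe_routing(response: dict) -> str:
    """Return a one-line summary of the routing decision in a parsed response."""
    info = response.get("routing_info")
    if not info:
        return "no routing_info in response"
    return (
        f"{info['original_model']} -> {info['selected_model']} "
        f"({info['provider']}, task={info['task_type']}, "
        f"saved {info['savings_percent']:.0f}% vs {info['baseline_model']})"
    )


# Trimmed copy of the example response above.
sample = {
    "model": "gemini-2.0-flash",
    "routing_info": {
        "selected_model": "gemini-2.0-flash",
        "original_model": "auto",
        "provider": "google",
        "task_type": "simple_qa",
        "capability_score": 0.92,
        "savings_percent": 96.0,
        "baseline_model": "gpt-4o",
    },
}
print(describe_routing(sample))
# auto -> gemini-2.0-flash (google, task=simple_qa, saved 96% vs gpt-4o)
```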
Task Types
RR automatically detects 6 task types:
| Task Type | Detection Keywords | Typical Models |
|---|---|---|
| coding | function, code, debug, python, api | claude-sonnet-4-6, gpt-5.4-mini |
| reasoning | explain why, prove, step by step | o3, o4-mini |
| analysis | compare, pros and cons, evaluate | gpt-5.4-mini, gemini-2.5-pro |
| simple_qa | what is, who invented, capital of | gemini-2.5-flash, claude-haiku-4-5 |
| creative | write a story, compose, brainstorm | claude-sonnet-4-6, gpt-5.4 |
| general | (fallback) | cheapest available |
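The keyword column can be read as a first-match lookup with `general` as the fallback. The sketch below covers only that keyword step, using the example keywords from the table; it is not the Router's detector, which is described elsewhere in this README as combining keywords, context, and an LLM classifier. The function name and the top-to-bottom match order are assumptions.

```python
import re

# Example keywords from the table above, checked in table order.
KEYWORDS = {
    "coding": ["function", "code", "debug", "python", "api"],
    "reasoning": ["explain why", "prove", "step by step"],
    "analysis": ["compare", "pros and cons", "evaluate"],
    "simple_qa": ["what is", "who invented", "capital of"],
    "creative": ["write a story", "compose", "brainstorm"],
}


def detect_task_type(prompt: str) -> str:
    """Return the first task type whose keyword appears as a whole word or phrase."""
    text = prompt.lower()
    for task_type, keywords in KEYWORDS.items():
        for keyword in keywords:
            # Word boundaries stop "api" from matching inside "capital".
            if re.search(rf"\b{re.escape(keyword)}\b", text):
                return task_type
    return "general"


print(detect_task_type("Debug this Python function"))      # coding
print(detect_task_type("What is the capital of France?"))  # simple_qa
print(detect_task_type("Tell me something interesting"))   # general
```

Plain substring matching would misclassify the second prompt as coding, since "capital" contains "api"; matching on word boundaries avoids that.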
Supported Models
11 models across 3 supported providers (routes within your available providers):
| Provider | Models |
|---|---|
| OpenAI | gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, o3, o4-mini |
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 |
| Google | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite |
CLI Commands
rr-router start # Start the proxy server
rr-router start --port 8080 # Start on custom port
rr-router status # Check proxy health and config
rr-router report weekly # Cost savings report (7 days)
rr-router report monthly # Cost savings report (30 days)
rr-router --version # Show version
MCP Server
The Router includes an MCP server for AI agent integration:
npx @robot-resources/router-mcp
Available tools: router_get_stats, router_compare_models, router_get_config, router_set_config.
Development
See CONTRIBUTING.md for the full development guide.
git clone https://github.com/robot-resources/robot-resources.git
cd robot-resources/router
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest # 681 tests
Project Structure
src/robot_resources/
├── cli/ # CLI entry point (click)
├── config.py # Centralized settings (pydantic-settings)
├── proxy/
│ ├── server.py # FastAPI app (auth, CORS, lifespan, middleware)
│ ├── security.py # Bearer token auth (timing-safe)
│ ├── models.py # Pydantic models with validators
│ ├── handlers/ # API endpoints (completions, stats, config, compare)
│ └── providers/ # LLM provider clients (Anthropic, OpenAI, Google)
├── routing/
│ ├── task_detection.py # 6 task types, keyword + context
│ ├── classifier.py # LLM task classifier (async)
│ ├── router.py # Hybrid routing with confidence branching
│ ├── selector.py # Capability filter + cheapest model
│ ├── decision_log.py # SQLite WAL decision persistence
│ └── models_db.json # 11 models with capabilities + pricing
└── tracking/
├── db.py # OutcomeDB (SQLite WAL, async, migrations)
├── recorder.py # OutcomeRecorder (routing outcomes)
├── calculator.py # CostCalculator (pricing from models_db)
└── telemetry.py # TelemetryReporter (platform API)
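The tree describes `security.py` as timing-safe bearer token auth. As an illustration of what a timing-safe check usually involves (this is not the project's code, and `is_authorized` is a hypothetical name), the token comparison can go through `hmac.compare_digest`, whose running time does not depend on where the two values first differ:

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_key: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header value against the expected key."""
    scheme, _, token = (authorization_header or "").partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison instead of ==, to avoid leaking match length via timing.
    return hmac.compare_digest(token.encode("utf-8"), expected_key.encode("utf-8"))


print(is_authorized("Bearer s3cret", "s3cret"))  # True
print(is_authorized("Bearer wrong", "s3cret"))   # False
```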
Troubleshooting
Port already in use
lsof -i :3838
rr-router start --port 3839
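Where `lsof` is unavailable, the same check can be done from Python. A small sketch using only the standard library; `port_in_use` and `first_free_port` are hypothetical helpers, not part of the `rr-router` CLI.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 3838, attempts: int = 20) -> int:
    """Return the first port at or after start with no listener on it."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")


port = first_free_port()
print(f"rr-router start --port {port}")
```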
API key not found
echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
export ANTHROPIC_API_KEY="sk-ant-..."
Model not found
Use model: "auto" for automatic routing. Check /v1/models for available models.
Roadmap
- Local proxy with task detection routing
- Real SSE streaming for all 3 providers
- Hybrid routing (keyword + LLM classifier)
- MCP server for stats and configuration
- Production hardening (681 tests, error handling, observability)
- Outcome-based routing (learning from success/failure)
- Calibration lab (benchmark-driven model scoring)
License
Contributing
Contributions welcome! Please read CONTRIBUTING.md first.
For security vulnerabilities, see SECURITY.md.