Intelligent LLM routing proxy — cost optimization via local proxy
Reason this release was yanked:
Replaced by @robot-resources/router on npm. Run npx robot-resources to install.
Robot Resources
Intelligent LLM cost optimization via local proxy.
Automatically route each LLM request to the cheapest model that can handle it. 60-90% cost savings with no quality loss.
Quick Start
# 1. Install
pip install robot-resources-router
# 2. Set API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
# 3. Start proxy
rr-router start
# Proxy running on http://localhost:3838
That's it. Point your agent to http://localhost:3838 and use model: "auto".
Why Robot Resources?
| Without RR | With RR |
|---|---|
| Every message uses same expensive model | Each message routed to optimal model |
| "hello" costs same as "refactor codebase" | Simple tasks use cheap/free models |
| Manual model selection | Automatic task detection |
| No cost visibility | Full routing transparency |
Example Savings
Turn 1: "hello" → gemini-1.5-flash-8b $0.0000
Turn 2: "what's 2+2?" → gemini-1.5-flash-8b $0.0000
Turn 3: "refactor this React code" → gpt-4o-mini $0.0002
Turn 4: "thanks, looks good" → gemini-1.5-flash-8b $0.0000
─────────────────────────────────────────────────────────────────────────
Total with RR: $0.0002
Without RR (gpt-4o): $0.0075
Savings: 97%
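The savings figure above can be reproduced from the per-turn costs. The sketch below is only the arithmetic; the helper name `savings_percent` is hypothetical and not part of the package.

```python
def savings_percent(routed_cost: float, baseline_cost: float) -> float:
    """Percentage saved relative to sending every turn to the baseline model."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (1 - routed_cost / baseline_cost) * 100


# Per-turn costs (USD) copied from the example above.
turn_costs = [0.0000, 0.0000, 0.0002, 0.0000]
total_with_rr = sum(turn_costs)

# Baseline total (all four turns on gpt-4o), also from the example above.
total_without_rr = 0.0075

print(f"Savings: {savings_percent(total_with_rr, total_without_rr):.0f}%")
```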
How It Works
Your Agent
│
│ POST /v1/chat/completions
│ model: "auto"
▼
┌─────────────────────────────────────┐
│ Robot Resources (localhost:3838) │
│ │
│ 1. Detect task type │
│ → coding, reasoning, analysis │
│ simple_qa, creative, general │
│ │
│ 2. Filter capable models │
│ → capability >= 0.70 threshold │
│ │
│ 3. Select cheapest │
│ → lowest cost_per_1k_input │
│ │
│ 4. Forward to provider │
│ → Anthropic, OpenAI, Google │
└─────────────────────────────────────┘
│
▼
Real LLM Provider (using your API keys)
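Steps 2 and 3 of the diagram (filter by capability, then pick the cheapest) can be sketched as follows. This is a minimal illustration, not the package's actual selector: the `Model` class, the `select_model` function, and every model name, score, and price below are made up for the example. Only the 0.70 threshold and the `cost_per_1k_input` field name come from the diagram.

```python
from dataclasses import dataclass

CAPABILITY_THRESHOLD = 0.70  # threshold shown in the diagram


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    cost_per_1k_input: float  # USD per 1k input tokens
    capabilities: dict        # task type -> capability score in [0, 1]


# Hypothetical entries; scores and prices are illustrative only.
MODELS = [
    Model("model-small", "provider-a", 0.0001, {"coding": 0.55, "simple_qa": 0.90}),
    Model("model-medium", "provider-b", 0.0010, {"coding": 0.80, "simple_qa": 0.93}),
    Model("model-large", "provider-c", 0.0100, {"coding": 0.95, "simple_qa": 0.97}),
]


def select_model(task_type, models=MODELS, threshold=CAPABILITY_THRESHOLD):
    """Return the cheapest model whose score for task_type meets the threshold."""
    # Step 2: keep only models that are capable enough for this task type.
    capable = [m for m in models if m.capabilities.get(task_type, 0.0) >= threshold]
    if not capable:
        raise LookupError(f"no model meets {threshold} for task type {task_type!r}")
    # Step 3: among the capable models, pick the lowest input cost.
    return min(capable, key=lambda m: m.cost_per_1k_input)


print(select_model("simple_qa").name)  # the small model is capable and cheapest
print(select_model("coding").name)     # the small model is filtered out for coding
```

Filtering before sorting keeps the two concerns separate: the threshold decides which models are acceptable, and cost only breaks ties among acceptable models.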
Installation
From PyPI
pip install robot-resources-router
From Source
git clone https://github.com/your-org/robot-resources.git
cd robot-resources
pip install -e .
Requirements
- Python 3.11+
- API keys for at least one provider (Anthropic, OpenAI, or Google)
Configuration
Environment Variables
# Required: At least one provider
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."
# Optional: Server settings
export RR_PORT=3838 # Default: 3838
export RR_HOST=127.0.0.1 # Default: 127.0.0.1
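To show how these variables fit together, here is a small sketch that reads them with the documented defaults and checks that at least one provider key is present. The function `load_settings` is hypothetical; it illustrates the rules stated above and is not the package's own loader.

```python
import os

PROVIDER_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY")


def load_settings(env=None):
    """Read proxy settings from an environment mapping (os.environ by default)."""
    env = os.environ if env is None else env
    # Record which provider keys are set, without exposing their values.
    providers = [key for key in PROVIDER_KEYS if env.get(key)]
    if not providers:
        raise RuntimeError("Set at least one of: " + ", ".join(PROVIDER_KEYS))
    return {
        "host": env.get("RR_HOST", "127.0.0.1"),  # documented default
        "port": int(env.get("RR_PORT", "3838")),  # documented default
        "providers": providers,
    }


print(load_settings({"OPENAI_API_KEY": "sk-example", "RR_PORT": "8080"}))
```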
OpenClaw Integration
Add to your OpenClaw config:
{
"models": {
"providers": {
"robot-resources": {
"baseUrl": "http://localhost:3838",
"api": "openai-completions"
}
}
},
"agents": {
"defaults": {
"model": {
"primary": "robot-resources/auto"
}
}
}
}
Claude Desktop / Other Agents
Point your agent's API base URL to http://localhost:3838 and use model auto.
Usage
Automatic Routing (Recommended)
Use model: "auto" to let RR choose the optimal model:
curl http://localhost:3838/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Explicit Model
Bypass routing by specifying a model directly:
curl http://localhost:3838/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'
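The same two requests can be made from Python using only the standard library. This is a sketch that assumes the proxy is running at the default address; `build_chat_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(prompt, model="auto", base_url="http://localhost:3838"):
    """Build a POST request for the proxy's chat completions endpoint."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30.0):
    """Send the request and decode the JSON response (needs a running proxy)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


auto_request = build_chat_request("Hello!")                      # automatic routing
explicit_request = build_chat_request("Hello!", model="gpt-4o")  # bypass routing
# response = send(auto_request)  # uncomment once the proxy is running
```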
API Reference
Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | Chat completions (main endpoint) |
| /v1/models | GET | List available models |
| /health | GET | Health check |
Request Format
Standard OpenAI chat completions format:
{
"model": "auto",
"messages": [
{"role": "system", "content": "You are helpful."},
{"role": "user", "content": "Hello!"}
],
"temperature": 0.7,
"max_tokens": 1000,
"stream": false
}
Response Format
Standard OpenAI format plus routing_info:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"model": "gemini-2.0-flash",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 8,
"total_tokens": 18
},
"routing_info": {
"selected_model": "gemini-2.0-flash",
"original_model": "auto",
"provider": "google",
"task_type": "simple_qa",
"capability_score": 0.92,
"savings_percent": 96.0,
"baseline_model": "gpt-4o",
"reasoning": "Selected gemini-2.0-flash as cheapest capable model..."
}
}
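A client can read the routing_info block to log which model handled each request. The sketch below assumes the field names shown in the sample response; `summarize_routing` is a hypothetical helper, and the fallback for a missing routing_info is a defensive assumption, since the README does not say whether the block is present when a model is specified explicitly.

```python
def summarize_routing(response: dict) -> str:
    """Return a one-line summary of the routing decision in a proxy response."""
    info = response.get("routing_info")
    if not info:
        # Assumption: some responses may not carry routing_info.
        return f"{response.get('model', 'unknown')} (no routing_info)"
    return (
        f"{info['original_model']} -> {info['selected_model']} "
        f"[{info['task_type']}, saved {info['savings_percent']:.0f}% "
        f"vs {info['baseline_model']}]"
    )


# Trimmed copy of the sample response above.
sample = {
    "model": "gemini-2.0-flash",
    "routing_info": {
        "selected_model": "gemini-2.0-flash",
        "original_model": "auto",
        "task_type": "simple_qa",
        "savings_percent": 96.0,
        "baseline_model": "gpt-4o",
    },
}
print(summarize_routing(sample))
```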
Task Types
RR automatically detects 6 task types:
| Task Type | Detection Keywords | Typical Models |
|---|---|---|
| coding | function, code, debug, python, api | claude-sonnet-4, gpt-4o-mini |
| reasoning | explain why, prove, step by step | o3-mini, o1-mini |
| analysis | compare, pros and cons, evaluate | gpt-4o-mini, gemini-1.5-pro |
| simple_qa | what is, who invented, capital of | gemini-2.0-flash, claude-3-haiku |
| creative | write a story, compose, brainstorm | claude-sonnet-4, gpt-4o |
| general | (fallback) | cheapest available |
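Keyword-based detection along the lines of this table could look like the sketch below. It is an illustration, not the code in task_detection.py: the keyword lists are copied from the table, while the function name, the matching rule, and the first-match-wins ordering are assumptions.

```python
import re

# Keywords copied from the table; dict order decides which type wins on overlap.
TASK_KEYWORDS = {
    "coding": ("function", "code", "debug", "python", "api"),
    "reasoning": ("explain why", "prove", "step by step"),
    "analysis": ("compare", "pros and cons", "evaluate"),
    "simple_qa": ("what is", "who invented", "capital of"),
    "creative": ("write a story", "compose", "brainstorm"),
}


def detect_task_type(message: str) -> str:
    """Return the first task type with a keyword in the message, else 'general'."""
    text = message.lower()
    for task_type, keywords in TASK_KEYWORDS.items():
        for keyword in keywords:
            # Match on word boundaries so "api" does not fire inside "capital".
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                return task_type
    return "general"  # fallback row of the table


print(detect_task_type("Debug this Python function"))
print(detect_task_type("What is the capital of France?"))
print(detect_task_type("hello"))
```

Word-boundary matching matters here: a plain substring test would classify "capital of France" as coding, because "capital" contains "api".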
Supported Models
14 models across 3 providers:
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, o1, o1-mini, o3-mini |
| Anthropic | claude-opus-4, claude-sonnet-4, claude-3-5-sonnet, claude-3-5-haiku, claude-3-haiku |
| Google | gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash, gemini-1.5-flash-8b |
CLI Commands
# Start the proxy server
rr-router start
# Start on custom port
rr-router start --port 8080
# Check version
rr-router --version
# Get help
rr-router --help
Development
Setup
git clone https://github.com/your-org/robot-resources.git
cd robot-resources
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
Run Tests
pytest # All tests
pytest --cov=robot_resources # With coverage
pytest -v # Verbose
Project Structure
src/robot_resources/
├── cli/ # CLI entry point
├── proxy/
│ ├── server.py # FastAPI app
│ ├── models.py # Pydantic models
│ ├── handlers/ # API endpoints
│ └── providers/ # LLM provider clients
├── routing/
│ ├── task_detection.py # Task type classification
│ ├── selector.py # Model selection logic
│ ├── router.py # Routing pipeline
│ └── models_db.json # Model capabilities database
└── mcp/ # (Future) MCP server for stats
Troubleshooting
Port already in use
# Check what's using port 3838
lsof -i :3838
# Use a different port
rr-router start --port 3839
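Where lsof is not available, the same check can be done from Python with the standard socket module. This is a sketch; `port_in_use` and `first_free_port` are hypothetical helpers, not commands provided by rr-router.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 3838, attempts: int = 10) -> int:
    """Return the first port in [start, start + attempts) with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")


print("3838 in use:", port_in_use(3838))
```

The result of `first_free_port()` can then be passed to `rr-router start --port`.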
API key not found
# Verify keys are set
echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
# Set them
export ANTHROPIC_API_KEY="sk-ant-..."
Model not found
Use model: "auto" for automatic routing. Check /v1/models for available models.
Roadmap
- Phase 1: Local proxy with task detection routing
- Phase 2: Outcome-based routing (learning from success/failure)
- Phase 3: MCP server for stats and configuration
License
MIT
Contributing
Contributions welcome! Please read the contributing guidelines first.
File details
Details for the file robot_resources_router-2.0.0.tar.gz.
File metadata
- Download URL: robot_resources_router-2.0.0.tar.gz
- Upload date:
- Size: 34.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 19cce93d0dfd92bef8588cdf2aeba1d12f9d54c29ea3c600c7c92ccb8f78cca2 |
| MD5 | 6f671140b190c9db01e39b74b924446f |
| BLAKE2b-256 | 7f58a726d9867d6cd5c92521ee29658e00ce9593865f69dd1e357b2d7883f713 |
File details
Details for the file robot_resources_router-2.0.0-py3-none-any.whl.
File metadata
- Download URL: robot_resources_router-2.0.0-py3-none-any.whl
- Upload date:
- Size: 47.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f52b3e88b7a0c05b6a9dd0c8c061e8baebbe66edc08482d5a401c29f3e0b67be |
| MD5 | bf272cbb05ce0e8faca4d433fbcf557b |
| BLAKE2b-256 | 9682e640570ea1deafd587865e1d74e5c7714d230b680e9fa796c0842e123c4e |