Intelligent LLM routing proxy — cost optimization via local proxy

Robot Resources

Intelligent LLM cost optimization via local proxy.

Robot Resources automatically routes each LLM request to the cheapest model that can handle it, targeting 60-90% cost savings without a loss in quality.

Quick Start

# 1. Install
pip install robot-resources-router

# 2. Set API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."

# 3. Start proxy
rr-router start
# Proxy running on http://localhost:3838

That's it. Point your agent to http://localhost:3838 and use model: "auto".

Why Robot Resources?

| Without RR | With RR |
| --- | --- |
| Every message uses the same expensive model | Each message is routed to the optimal model |
| "hello" costs the same as "refactor codebase" | Simple tasks use cheap or free models |
| Manual model selection | Automatic task detection |
| No cost visibility | Full routing transparency |

Example Savings

Turn 1: "hello"                    → gemini-1.5-flash-8b          $0.0000
Turn 2: "what's 2+2?"              → gemini-1.5-flash-8b          $0.0000
Turn 3: "refactor this React code" → gpt-4o-mini                  $0.0002
Turn 4: "thanks, looks good"       → gemini-1.5-flash-8b          $0.0000
─────────────────────────────────────────────────────────────────────────
Total with RR:        $0.0002
Without RR (gpt-4o):  $0.0075
Savings:              97%
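The savings figure follows directly from the two totals. As a quick check, the arithmetic can be reproduced with a few lines of Python; the helper name `savings_percent` is only illustrative and is not part of the package.

```python
def savings_percent(routed_cost: float, baseline_cost: float) -> float:
    """Percentage saved relative to the baseline cost."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (1 - routed_cost / baseline_cost) * 100


# Per-turn costs in USD, copied from the example above.
turn_costs = [0.0000, 0.0000, 0.0002, 0.0000]
total_with_rr = sum(turn_costs)
baseline_total = 0.0075  # all four turns on gpt-4o, as stated above

print(f"{savings_percent(total_with_rr, baseline_total):.0f}%")  # prints 97%
```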

How It Works

Your Agent
    │
    │  POST /v1/chat/completions
    │  model: "auto"
    ▼
┌─────────────────────────────────────┐
│   Robot Resources (localhost:3838)  │
│                                     │
│   1. Detect task type               │
│      → coding, reasoning, analysis, │
│        simple_qa, creative, general │
│                                     │
│   2. Filter capable models          │
│      → capability >= 0.70 threshold │
│                                     │
│   3. Select cheapest                │
│      → lowest cost_per_1k_input     │
│                                     │
│   4. Forward to provider            │
│      → Anthropic, OpenAI, Google    │
└─────────────────────────────────────┘
    │
    ▼
Real LLM Provider (using your API keys)
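Steps 2 and 3 of the diagram (filter by capability, then pick the cheapest) can be sketched in a few lines. This is a minimal illustration, not the code in `routing/selector.py`: the `ModelEntry` class, the catalogue, and every cost and capability number below are made up for the example. Only the 0.70 threshold and the `cost_per_1k_input` field name come from the diagram.

```python
from dataclasses import dataclass

CAPABILITY_THRESHOLD = 0.70  # threshold shown in step 2 of the diagram


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical catalogue record; the real models_db.json schema may differ."""
    name: str
    provider: str
    cost_per_1k_input: float        # USD, illustrative values only
    capabilities: dict[str, float]  # task type -> score between 0 and 1


# Made-up numbers, chosen only to show the selection logic.
CATALOGUE = [
    ModelEntry("gpt-4o", "openai", 0.0025, {"coding": 0.90, "simple_qa": 0.95}),
    ModelEntry("gpt-4o-mini", "openai", 0.00015, {"coding": 0.75, "simple_qa": 0.85}),
    ModelEntry("gemini-1.5-flash-8b", "google", 0.00004, {"coding": 0.55, "simple_qa": 0.80}),
]


def select_model(task_type: str,
                 catalogue: list[ModelEntry] = CATALOGUE,
                 threshold: float = CAPABILITY_THRESHOLD) -> ModelEntry:
    """Keep models at or above the threshold for this task, then take the cheapest."""
    capable = [m for m in catalogue if m.capabilities.get(task_type, 0.0) >= threshold]
    if not capable:
        raise LookupError(f"no model meets the threshold for task type {task_type!r}")
    return min(capable, key=lambda m: m.cost_per_1k_input)


print(select_model("coding").name)     # cheapest model scoring >= 0.70 on coding
print(select_model("simple_qa").name)  # cheapest model scoring >= 0.70 on simple_qa
```

With these illustrative numbers, the cheapest model overall is excluded from coding requests because its score falls below the threshold, which is the behaviour the diagram describes.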

Installation

From PyPI

pip install robot-resources-router

From Source

git clone https://github.com/your-org/robot-resources.git
cd robot-resources
pip install -e .

Requirements

Python 3.11+
API keys for at least one provider (Anthropic, OpenAI, or Google)

Configuration

Environment Variables

# Required: At least one provider
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."

# Optional: Server settings
export RR_PORT=3838              # Default: 3838
export RR_HOST=127.0.0.1         # Default: 127.0.0.1

OpenClaw Integration

Add to your OpenClaw config:

{
  "models": {
    "providers": {
      "robot-resources": {
        "baseUrl": "http://localhost:3838",
        "api": "openai-completions"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "robot-resources/auto"
      }
    }
  }
}

Claude Desktop / Other Agents

Point your agent's API base URL to http://localhost:3838 and set the model to "auto".

Usage

Automatic Routing (Recommended)

Use model: "auto" to let RR choose the optimal model:

curl http://localhost:3838/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Explicit Model

Bypass routing by specifying a model directly:

curl http://localhost:3838/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
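The same two requests can be sent from Python using only the standard library. This is a sketch that assumes the proxy is running on the default address; the helper names `build_chat_request` and `chat` are illustrative and not part of the package.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3838"  # default proxy address from this README


def build_chat_request(prompt: str, model: str = "auto",
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl examples above."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, model: str = "auto") -> dict:
    """Send the request and decode the JSON response (requires a running proxy)."""
    with urllib.request.urlopen(build_chat_request(prompt, model), timeout=30) as resp:
        return json.load(resp)
```

Calling `chat("Hello!")` uses automatic routing, while `chat("Hello!", model="gpt-4o")` names a model explicitly and bypasses routing.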

API Reference

Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| `/v1/chat/completions` | POST | Chat completions (main endpoint) |
| `/v1/models` | GET | List available models |
| `/health` | GET | Health check |
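A script that depends on the proxy may want to confirm it is up before sending traffic. The sketch below polls `/health` and treats an HTTP 200 as healthy. That interpretation is an assumption, since the response body of `/health` is not documented here. The `opener` parameter is included only so the function can be exercised without a running server.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:3838",
               timeout: float = 2.0,
               opener=urllib.request.urlopen) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with opener(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx HTTP error.
        return False
```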

Request Format

Standard OpenAI chat completions format:

{
  "model": "auto",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}

Response Format

Standard OpenAI format plus routing_info:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gemini-2.0-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  },
  "routing_info": {
    "selected_model": "gemini-2.0-flash",
    "original_model": "auto",
    "provider": "google",
    "task_type": "simple_qa",
    "capability_score": 0.92,
    "savings_percent": 96.0,
    "baseline_model": "gpt-4o",
    "reasoning": "Selected gemini-2.0-flash as cheapest capable model..."
  }
}
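Because `routing_info` sits alongside the standard fields, a client can log routing decisions without changing how it reads `choices`. The following sketch summarizes the sample response above. Whether `routing_info` is present when an explicit model is requested is not stated in this README, so the helper treats it as optional.

```python
import json


def summarize_routing(response: dict) -> str:
    """One-line summary of the routing decision, tolerating a missing routing_info."""
    info = response.get("routing_info")
    if info is None:
        return f"{response.get('model', 'unknown')} (no routing_info)"
    return (
        f"{info['task_type']} -> {info['selected_model']} via {info['provider']}, "
        f"{info['savings_percent']:.0f}% cheaper than {info['baseline_model']}"
    )


# Fields copied from the sample response above (abbreviated).
sample = json.loads("""
{
  "model": "gemini-2.0-flash",
  "routing_info": {
    "selected_model": "gemini-2.0-flash",
    "original_model": "auto",
    "provider": "google",
    "task_type": "simple_qa",
    "capability_score": 0.92,
    "savings_percent": 96.0,
    "baseline_model": "gpt-4o"
  }
}
""")

print(summarize_routing(sample))
# prints: simple_qa -> gemini-2.0-flash via google, 96% cheaper than gpt-4o
```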

Task Types

RR automatically detects 6 task types:

| Task Type | Detection Keywords | Typical Models |
| --- | --- | --- |
| `coding` | function, code, debug, python, api | claude-sonnet-4, gpt-4o-mini |
| `reasoning` | explain why, prove, step by step | o3-mini, o1-mini |
| `analysis` | compare, pros and cons, evaluate | gpt-4o-mini, gemini-1.5-pro |
| `simple_qa` | what is, who invented, capital of | gemini-2.0-flash, claude-3-haiku |
| `creative` | write a story, compose, brainstorm | claude-sonnet-4, gpt-4o |
| `general` | (fallback) | cheapest available |
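A keyword-based detector along the lines of this table could look like the sketch below. It is an illustration under stated assumptions, not the contents of `routing/task_detection.py`. The keyword lists are only the examples from the table, the first matching task type wins in table order, and matching is case-insensitive on whole words. Whole-word matching matters: a plain substring test would find "api" inside "capital" and misclassify a geography question as coding.

```python
import re

# Example keywords from the table above; the real detector may use more.
TASK_KEYWORDS = {
    "coding": ["function", "code", "debug", "python", "api"],
    "reasoning": ["explain why", "prove", "step by step"],
    "analysis": ["compare", "pros and cons", "evaluate"],
    "simple_qa": ["what is", "who invented", "capital of"],
    "creative": ["write a story", "compose", "brainstorm"],
}

# One whole-word, case-insensitive pattern per task type.
_PATTERNS = {
    task: re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    for task, keywords in TASK_KEYWORDS.items()
}


def detect_task(prompt: str) -> str:
    """Return the first task type whose keywords appear, else the 'general' fallback."""
    for task, pattern in _PATTERNS.items():  # assumption: table order is the priority
        if pattern.search(prompt):
            return task
    return "general"


print(detect_task("Debug this Python function"))      # coding
print(detect_task("What is the capital of France?"))  # simple_qa
print(detect_task("hello"))                           # general
```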

Supported Models

14 models across 3 providers:

| Provider | Models |
| --- | --- |
| OpenAI | gpt-4o, gpt-4o-mini, o1, o1-mini, o3-mini |
| Anthropic | claude-opus-4, claude-sonnet-4, claude-3-5-sonnet, claude-3-5-haiku, claude-3-haiku |
| Google | gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash, gemini-1.5-flash-8b |

CLI Commands

# Start the proxy server
rr-router start

# Start on custom port
rr-router start --port 8080

# Check version
rr-router --version

# Get help
rr-router --help

Development

Setup

git clone https://github.com/your-org/robot-resources.git
cd robot-resources
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

Run Tests

pytest                          # All tests
pytest --cov=robot_resources    # With coverage
pytest -v                       # Verbose

Project Structure

src/robot_resources/
├── cli/                    # CLI entry point
├── proxy/
│   ├── server.py          # FastAPI app
│   ├── models.py          # Pydantic models
│   ├── handlers/          # API endpoints
│   └── providers/         # LLM provider clients
├── routing/
│   ├── task_detection.py  # Task type classification
│   ├── selector.py        # Model selection logic
│   ├── router.py          # Routing pipeline
│   └── models_db.json     # Model capabilities database
└── mcp/                   # (Future) MCP server for stats

Troubleshooting

Port already in use

# Check what's using port 3838
lsof -i :3838

# Use a different port
rr-router start --port 3839

API key not found

# Verify keys are set
echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY

# Set them
export ANTHROPIC_API_KEY="sk-ant-..."

Model not found

Use model: "auto" for automatic routing, or send GET /v1/models to list the model names the proxy accepts.

Roadmap

Phase 1: Local proxy with task detection routing
Phase 2: Outcome-based routing (learning from success/failure)
Phase 3: MCP server for stats and configuration

License

MIT

Contributing

Contributions welcome! Please read the contributing guidelines first.

Release history

2.99.0 (Apr 27, 2026)
2.1.13 (Apr 23, 2026), yanked