Intelligent LLM routing proxy — cost optimization via local proxy

Reason this release was yanked:

Replaced by @robot-resources/router on npm. Run npx robot-resources to install.

Robot Resources

Intelligent LLM cost optimization via local proxy.

Automatically route each LLM request to the cheapest model that can handle it. The project targets 60-90% cost savings without a loss in quality.

Quick Start

# 1. Install
pip install robot-resources-router

# 2. Set API keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."

# 3. Start proxy
rr-router start
# Proxy running on http://localhost:3838

That's it. Point your agent to http://localhost:3838 and use model: "auto".

Why Robot Resources?

Without RR                                 With RR
-----------------------------------------  ------------------------------------
Every message uses same expensive model    Each message routed to optimal model
"hello" costs same as "refactor codebase"  Simple tasks use cheap/free models
Manual model selection                     Automatic task detection
No cost visibility                         Full routing transparency

Example Savings

Turn 1: "hello"                    → gemini-1.5-flash-8b          $0.0000
Turn 2: "what's 2+2?"              → gemini-1.5-flash-8b          $0.0000
Turn 3: "refactor this React code" → gpt-4o-mini                  $0.0002
Turn 4: "thanks, looks good"       → gemini-1.5-flash-8b          $0.0000
─────────────────────────────────────────────────────────────────────────
Total with RR:        $0.0002
Without RR (gpt-4o):  $0.0075
Savings:              97%
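
The savings figure is the relative difference between the two totals. A minimal sketch of that arithmetic, using the per-turn costs from the example (the function name is illustrative and not part of the package):

```python
def savings_percent(routed_cost: float, baseline_cost: float) -> float:
    """Return the percentage saved relative to the baseline cost."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (baseline_cost - routed_cost) / baseline_cost * 100


# Per-turn costs copied from the example above.
turn_costs = [0.0000, 0.0000, 0.0002, 0.0000]
total_with_rr = sum(turn_costs)
total_without_rr = 0.0075  # all four turns sent to gpt-4o

print(f"{savings_percent(total_with_rr, total_without_rr):.0f}%")  # prints 97%
```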

How It Works

Your Agent
    │
    │  POST /v1/chat/completions
    │  model: "auto"
    ▼
┌─────────────────────────────────────┐
│   Robot Resources (localhost:3838)  │
│                                     │
│   1. Detect task type               │
│      → coding, reasoning, analysis, │
│        simple_qa, creative, general │
│                                     │
│   2. Filter capable models          │
│      → capability >= 0.70 threshold │
│                                     │
│   3. Select cheapest                │
│      → lowest cost_per_1k_input     │
│                                     │
│   4. Forward to provider            │
│      → Anthropic, OpenAI, Google    │
└─────────────────────────────────────┘
    │
    ▼
Real LLM Provider (using your API keys)
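
Steps 2 and 3 of the diagram can be sketched in a few lines. This is an illustration under stated assumptions, not the package's actual selector: the Model class, the catalogue entries, and their scores and prices are all made up. Only the 0.70 threshold and the cost_per_1k_input field name come from the diagram.

```python
from dataclasses import dataclass

CAPABILITY_THRESHOLD = 0.70  # threshold shown in the diagram


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    cost_per_1k_input: float
    capabilities: dict[str, float]  # task type -> score in [0, 1]


# Hypothetical catalogue: names, scores and prices are invented for illustration.
CATALOGUE = [
    Model("model-small", "google", 0.0000, {"simple_qa": 0.90, "coding": 0.55}),
    Model("model-medium", "openai", 0.0002, {"simple_qa": 0.93, "coding": 0.80}),
    Model("model-large", "anthropic", 0.0030, {"simple_qa": 0.97, "coding": 0.95}),
]


def select_model(task_type: str, models: list[Model]) -> Model:
    """Keep models at or above the threshold, then pick the cheapest."""
    capable = [
        m for m in models
        if m.capabilities.get(task_type, 0.0) >= CAPABILITY_THRESHOLD
    ]
    if not capable:
        raise LookupError(f"no model meets the threshold for {task_type!r}")
    return min(capable, key=lambda m: m.cost_per_1k_input)


print(select_model("simple_qa", CATALOGUE).name)  # model-small
print(select_model("coding", CATALOGUE).name)     # model-medium
```

Filtering before sorting means a cheap model is never chosen for a task it scores poorly on, which is the property the diagram describes.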

Installation

From PyPI

pip install robot-resources-router

From Source

git clone https://github.com/your-org/robot-resources.git
cd robot-resources
pip install -e .

Requirements

  • Python 3.11+
  • API keys for at least one provider (Anthropic, OpenAI, or Google)

Configuration

Environment Variables

# Required: At least one provider
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."

# Optional: Server settings
export RR_PORT=3838              # Default: 3838
export RR_HOST=127.0.0.1         # Default: 127.0.0.1
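
A launcher script could validate these variables before starting the proxy. The sketch below is one way to do that and is not how the package itself reads its configuration; load_settings is a hypothetical helper, while the variable names and defaults are the ones listed above.

```python
import os

PROVIDER_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY")


def load_settings(env=os.environ) -> dict:
    """Read server settings and report which provider keys are present."""
    providers = [name for name in PROVIDER_KEYS if env.get(name)]
    if not providers:
        raise RuntimeError("set at least one of: " + ", ".join(PROVIDER_KEYS))
    return {
        "host": env.get("RR_HOST", "127.0.0.1"),
        "port": int(env.get("RR_PORT", "3838")),
        "providers": providers,
    }


# Pass a plain dict instead of os.environ to try it without exporting anything.
settings = load_settings({"OPENAI_API_KEY": "sk-example", "RR_PORT": "8080"})
print(settings)
```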

OpenClaw Integration

Add to your OpenClaw config:

{
  "models": {
    "providers": {
      "robot-resources": {
        "baseUrl": "http://localhost:3838",
        "api": "openai-completions"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "robot-resources/auto"
      }
    }
  }
}

Claude Desktop / Other Agents

Point your agent's API base URL to http://localhost:3838 and set the model to "auto".

Usage

Automatic Routing (Recommended)

Use model: "auto" to let RR choose the optimal model:

curl http://localhost:3838/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
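
The same request can be built from Python with only the standard library. This is a sketch: build_chat_request and send are illustrative helpers, not part of the package, and send only works while the proxy is running.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3838"  # default proxy address from the Quick Start


def build_chat_request(prompt: str, model: str = "auto") -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply; needs a running proxy."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


request = build_chat_request("Hello!")
print(request.get_method(), request.full_url)
```

Passing model="gpt-4o" to build_chat_request gives the explicit-model variant.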

Explicit Model

Bypass routing by specifying a model directly:

curl http://localhost:3838/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

API Reference

Endpoints

Endpoint              Method  Description
--------------------  ------  --------------------------------
/v1/chat/completions  POST    Chat completions (main endpoint)
/v1/models            GET     List available models
/health               GET     Health check
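
A script that depends on the proxy can probe /health before sending traffic. The sketch below assumes that an HTTP 200 means the proxy is ready; the response body of /health is not documented here, so it is ignored. is_healthy is an illustrative helper.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:3838", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status.
        return False


print("proxy is up" if is_healthy() else "proxy is not reachable")
```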

Request Format

Standard OpenAI chat completions format:

{
  "model": "auto",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}

Response Format

Standard OpenAI format, plus a routing_info object that describes the routing decision:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gemini-2.0-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  },
  "routing_info": {
    "selected_model": "gemini-2.0-flash",
    "original_model": "auto",
    "provider": "google",
    "task_type": "simple_qa",
    "capability_score": 0.92,
    "savings_percent": 96.0,
    "baseline_model": "gpt-4o",
    "reasoning": "Selected gemini-2.0-flash as cheapest capable model..."
  }
}
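
A client can read routing_info to log which model served each request. The sketch below parses a trimmed copy of the example response. Whether routing_info is present on explicit-model requests is not stated here, so the helper (a hypothetical name) treats it as optional.

```python
import json

# Trimmed copy of the example response above.
raw = """
{
  "model": "gemini-2.0-flash",
  "usage": {"prompt_tokens": 10, "completion_tokens": 8, "total_tokens": 18},
  "routing_info": {
    "selected_model": "gemini-2.0-flash",
    "original_model": "auto",
    "provider": "google",
    "task_type": "simple_qa",
    "capability_score": 0.92,
    "savings_percent": 96.0,
    "baseline_model": "gpt-4o"
  }
}
"""


def summarize_routing(response: dict) -> str:
    """Format the routing decision, tolerating a missing routing_info."""
    info = response.get("routing_info")
    if info is None:
        return f"{response.get('model', 'unknown')} (no routing_info)"
    return (
        f"{info['task_type']} -> {info['selected_model']} via {info['provider']}, "
        f"{info['savings_percent']:.0f}% cheaper than {info['baseline_model']}"
    )


print(summarize_routing(json.loads(raw)))
```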

Task Types

RR automatically detects 6 task types:

Task Type  Detection Keywords                  Typical Models
---------  ----------------------------------  --------------------------------
coding     function, code, debug, python, api  claude-sonnet-4, gpt-4o-mini
reasoning  explain why, prove, step by step    o3-mini, o1-mini
analysis   compare, pros and cons, evaluate    gpt-4o-mini, gemini-1.5-pro
simple_qa  what is, who invented, capital of   gemini-2.0-flash, claude-3-haiku
creative   write a story, compose, brainstorm  claude-sonnet-4, gpt-4o
general    (fallback)                          cheapest available
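
Keyword-based detection of this kind can be sketched as follows. The keywords are the ones listed above; the matching rule (first task type with a whole-word hit wins, checked in table order) is an assumption and may differ from what task_detection.py actually does.

```python
import re

# Keywords copied from the table above; the matching rule is an assumption.
TASK_KEYWORDS = {
    "coding": ("function", "code", "debug", "python", "api"),
    "reasoning": ("explain why", "prove", "step by step"),
    "analysis": ("compare", "pros and cons", "evaluate"),
    "simple_qa": ("what is", "who invented", "capital of"),
    "creative": ("write a story", "compose", "brainstorm"),
}


def detect_task_type(prompt: str) -> str:
    """Return the first task type with a whole-word keyword match."""
    text = prompt.lower()
    for task_type, keywords in TASK_KEYWORDS.items():
        for keyword in keywords:
            if re.search(rf"\b{re.escape(keyword)}\b", text):
                return task_type
    return "general"  # fallback when nothing matches


for prompt in ("Debug this Python function",
               "What is the capital of France?",
               "thanks, looks good"):
    print(f"{prompt!r} -> {detect_task_type(prompt)}")
```

Matching on word boundaries matters here: with plain substring search, "api" would match inside "capital" and a geography question would be classified as coding.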

Supported Models

14 models across 3 providers:

Provider   Models
---------  ------------------------------------------------------------------------------
OpenAI     gpt-4o, gpt-4o-mini, o1, o1-mini, o3-mini
Anthropic  claude-opus-4, claude-sonnet-4, claude-3-5-sonnet, claude-3-5-haiku, claude-3-haiku
Google     gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash, gemini-1.5-flash-8b
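
When a client passes an explicit model, it can be useful to know which provider key that model needs. A minimal lookup built from the list above is sketched here; the lowercase provider identifiers and the provider_for helper are assumptions for illustration.

```python
# Model names copied from the list above.
MODELS_BY_PROVIDER = {
    "openai": ["gpt-4o", "gpt-4o-mini", "o1", "o1-mini", "o3-mini"],
    "anthropic": [
        "claude-opus-4", "claude-sonnet-4", "claude-3-5-sonnet",
        "claude-3-5-haiku", "claude-3-haiku",
    ],
    "google": [
        "gemini-2.0-flash", "gemini-1.5-pro", "gemini-1.5-flash",
        "gemini-1.5-flash-8b",
    ],
}

# Invert the mapping so a model name resolves to its provider.
PROVIDER_BY_MODEL = {
    model: provider
    for provider, models in MODELS_BY_PROVIDER.items()
    for model in models
}


def provider_for(model: str) -> str:
    """Return the provider for an explicit model name."""
    if model == "auto":
        raise ValueError('"auto" is resolved by the router, not by lookup')
    try:
        return PROVIDER_BY_MODEL[model]
    except KeyError:
        raise ValueError(f"unsupported model: {model}") from None


print(len(PROVIDER_BY_MODEL), provider_for("claude-sonnet-4"))  # 14 anthropic
```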

CLI Commands

# Start the proxy server
rr-router start

# Start on custom port
rr-router start --port 8080

# Check version
rr-router --version

# Get help
rr-router --help

Development

Setup

git clone https://github.com/your-org/robot-resources.git
cd robot-resources
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

Run Tests

pytest                          # All tests
pytest --cov=robot_resources    # With coverage
pytest -v                       # Verbose

Project Structure

src/robot_resources/
├── cli/                    # CLI entry point
├── proxy/
│   ├── server.py          # FastAPI app
│   ├── models.py          # Pydantic models
│   ├── handlers/          # API endpoints
│   └── providers/         # LLM provider clients
├── routing/
│   ├── task_detection.py  # Task type classification
│   ├── selector.py        # Model selection logic
│   ├── router.py          # Routing pipeline
│   └── models_db.json     # Model capabilities database
└── mcp/                   # (Future) MCP server for stats

Troubleshooting

Port already in use

# Check what's using port 3838
lsof -i :3838

# Use a different port
rr-router start --port 3839
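
On systems without lsof, the same check can be done from Python. This sketch tests whether anything accepts TCP connections on a port and suggests the next free one; both helper names are illustrative.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 3838, attempts: int = 20) -> int:
    """Scan upward from start and return the first port nobody listens on."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")


print(f"rr-router start --port {first_free_port()}")
```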

API key not found

# Verify keys are set
echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY

# Set them
export ANTHROPIC_API_KEY="sk-ant-..."

Model not found

If an explicit model name is rejected, query GET /v1/models for the names the proxy accepts, or use model: "auto" for automatic routing.

Roadmap

  • Phase 1: Local proxy with task detection routing
  • Phase 2: Outcome-based routing (learning from success/failure)
  • Phase 3: MCP server for stats and configuration

License

MIT

Contributing

Contributions welcome! Please read the contributing guidelines first.


