SmolRouter

A smart, lightweight proxy for routing AI model requests with performance analytics. Perfect for local LLM enthusiasts who want intelligent routing, real-time monitoring, and seamless model switching.

Quick Start

Using Docker

  1. Build the image:

    docker build -t smolrouter .
    
  2. Run the container:

    docker run -d \
      --name smolrouter \
      --restart unless-stopped \
      -p 1234:1234 \
      -e DEFAULT_UPSTREAM="http://localhost:8000" \
      -e MODEL_MAP='{"gpt-3.5-turbo":"llama3-8b"}' \
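      -v "$(pwd)/routes.yaml:/app/routes.yaml" \
      smolrouter

     Inside the container, localhost refers to the container itself, not the Docker host. If the upstream runs on the host, set DEFAULT_UPSTREAM to an address reachable from the container. Likewise, if the published port does not respond, check that the app binds to 0.0.0.0 rather than the LISTEN_HOST default of 127.0.0.1.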
    

Using Python

  1. Install dependencies:

    pip install -r requirements.txt
    
  2. Run the application:

    export DEFAULT_UPSTREAM="http://localhost:8000"
    export MODEL_MAP='{"gpt-3.5-turbo":"llama3-8b"}'
    python app.py
    

Usage

Point your applications to http://localhost:1234 instead of the OpenAI API:

import openai

client = openai.OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="your-api-key"  # This is passed through to the upstream server
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # This will be rewritten to "llama3-8b"
    messages=[{"role": "user", "content": "Hello!"}]
)
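The rewrite noted in the comment above can be pictured as a dictionary lookup on the request body's model field. The following is a minimal sketch, assuming MODEL_MAP is a flat JSON object of exact model names; `remap_model` is a hypothetical helper for illustration, not SmolRouter's actual code:

```python
import json


def remap_model(payload: dict, model_map: dict) -> dict:
    """Return a copy of payload with its model name remapped, if a mapping exists."""
    rewritten = dict(payload)
    model = rewritten.get("model")
    if model in model_map:
        rewritten["model"] = model_map[model]
    return rewritten


# Same value as the MODEL_MAP environment variable used in the Quick Start.
model_map = json.loads('{"gpt-3.5-turbo":"llama3-8b"}')

request_body = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
}
upstream_body = remap_model(request_body, model_map)
print(upstream_body["model"])
```

Names that are absent from the map pass through unchanged, so clients can mix mapped and unmapped models.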

Core Features

Smart Routing

  • Host-based & Model-based Routing: Route requests from specific IPs or for specific models to different upstream servers.
  • Regex & Exact Matching: Use regex patterns (e.g., "/.*-8b/") or exact model names for flexible routing.
  • Model Overrides: Automatically change model names on the fly for each route.
  • YAML Configuration: Define all routing rules in a simple, human-readable routes.yaml file.

Performance Analytics & Monitoring

  • Interactive Dashboard: A web UI to view real-time and historical request data.
  • Performance Scatter Plots: Visualize token counts vs. response times to compare model performance.
  • Detailed Request Views: Inspect the full request/response transcripts for any logged event.
  • SQLite Backend: All request data is stored in a local SQLite database for persistence.
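To make the persistence idea concrete, here is a rough sketch of logging requests to SQLite and pruning old rows, in the spirit of the MAX_LOG_AGE_DAYS setting described under Configuration. The table name, column names, and helper functions are all hypothetical; SmolRouter's real schema is not documented here and may differ:

```python
import sqlite3
import time

SECONDS_PER_DAY = 86400


def open_log(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create a hypothetical request_log table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS request_log ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "created_at REAL NOT NULL, "
        "model TEXT, "
        "total_tokens INTEGER, "
        "duration_ms REAL)"
    )
    return conn


def log_request(conn, model, total_tokens, duration_ms, created_at=None):
    """Insert one request record and return its row id."""
    created_at = time.time() if created_at is None else created_at
    cur = conn.execute(
        "INSERT INTO request_log (created_at, model, total_tokens, duration_ms) "
        "VALUES (?, ?, ?, ?)",
        (created_at, model, total_tokens, duration_ms),
    )
    conn.commit()
    return cur.lastrowid


def prune_old(conn, max_age_days, now=None):
    """Delete records older than max_age_days and return how many were removed."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    cur = conn.execute("DELETE FROM request_log WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


conn = open_log()
now = time.time()
log_request(conn, "llama3-8b", 120, 850.0, created_at=now - 10 * SECONDS_PER_DAY)
log_request(conn, "llama3-8b", 64, 400.0, created_at=now - 1 * SECONDS_PER_DAY)
deleted = prune_old(conn, max_age_days=7, now=now)
remaining = conn.execute("SELECT COUNT(*) FROM request_log").fetchone()[0]
```

Storing token counts next to durations is what would make a tokens-versus-latency scatter plot a simple query over one table.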

API Compatibility & Content Processing

  • OpenAI & Ollama Support: Acts as a drop-in replacement for both OpenAI and Ollama APIs.
  • Model Mapping: Remap model names using a simple JSON object for legacy or alternative model support.
  • Streaming Support: Full support for streaming responses for both API formats.
  • Content Manipulation:
    • Think-Chain Stripping: Automatically remove <think>...</think> blocks from responses.
    • JSON Markdown Scrubbing: Convert markdown-fenced JSON into pure JSON.

Configuration

Environment Variables

| Variable | Default | Description |
|---|---|---|
| DEFAULT_UPSTREAM | http://localhost:8000 | The default upstream server to use when no routing rules match. |
| ROUTES_CONFIG | routes.yaml | Path to the YAML/JSON file containing smart routing rules. |
| MODEL_MAP | {} | A JSON string for simple, legacy model name remapping. |
| STRIP_THINKING | true | If true, removes <think>...</think> blocks from responses. |
| STRIP_JSON_MARKDOWN | false | If true, converts markdown-fenced JSON blocks to pure JSON. |
| DISABLE_THINKING | false | If true, appends a /no_think marker to prompts to disable thinking. |
| ENABLE_LOGGING | true | If true, enables request logging and the web UI. |
| REQUEST_TIMEOUT | 3000.0 | Timeout in seconds for upstream requests. |
| DB_PATH | requests.db | Path to the SQLite database file. |
| MAX_LOG_AGE_DAYS | 7 | Automatically delete logs older than this many days. |
| LISTEN_HOST | 127.0.0.1 | The host address for the application to bind to. |
| LISTEN_PORT | 1234 | The port for the application to listen on. |
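As an illustration of how these variables and defaults fit together, the sketch below reads them into a plain dictionary. It is an assumption-laden example rather than SmolRouter's loader: `env_bool` and `load_settings` are hypothetical names, and the set of spellings accepted as "true" is a guess.

```python
import json
import os
from typing import Mapping


def env_bool(name: str, default: bool, environ: Mapping[str, str] = os.environ) -> bool:
    """Parse a boolean flag; the accepted truthy spellings are an assumption."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(environ: Mapping[str, str] = os.environ) -> dict:
    """Collect the documented variables, falling back to the documented defaults."""
    return {
        "default_upstream": environ.get("DEFAULT_UPSTREAM", "http://localhost:8000"),
        "routes_config": environ.get("ROUTES_CONFIG", "routes.yaml"),
        "model_map": json.loads(environ.get("MODEL_MAP", "{}")),
        "strip_thinking": env_bool("STRIP_THINKING", True, environ),
        "strip_json_markdown": env_bool("STRIP_JSON_MARKDOWN", False, environ),
        "disable_thinking": env_bool("DISABLE_THINKING", False, environ),
        "enable_logging": env_bool("ENABLE_LOGGING", True, environ),
        "request_timeout": float(environ.get("REQUEST_TIMEOUT", "3000.0")),
        "db_path": environ.get("DB_PATH", "requests.db"),
        "max_log_age_days": int(environ.get("MAX_LOG_AGE_DAYS", "7")),
        "listen_host": environ.get("LISTEN_HOST", "127.0.0.1"),
        "listen_port": int(environ.get("LISTEN_PORT", "1234")),
    }


# Pass a plain dict instead of os.environ to try the loader without side effects.
settings = load_settings(
    {"MODEL_MAP": '{"gpt-3.5-turbo":"llama3-8b"}', "STRIP_THINKING": "false"}
)
```

Note that MODEL_MAP must be valid JSON, which is why the shell examples wrap it in single quotes to protect the inner double quotes.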

Smart Routing (routes.yaml)

Create a routes.yaml file to define your routing logic. The first rule that matches a request is used.

routes:
  # Route requests for small models to a specific GPU server using regex.
  # Note: the unescaped "." in the pattern matches any character, not only a literal dot.
  - match:
      model: "/.*-1.5b/"
    route:
      upstream: "http://gpu-server:8000"

  # Route requests from a specific developer's machine to a dev server
  - match:
      source_host: "10.0.1.100"
    route:
      upstream: "http://dev-server:8000"

  # Route requests for "gpt-4" and override the model name to "claude-3-opus"
  - match:
      model: "gpt-4"
    route:
      upstream: "http://claude-server:8000"
      model: "claude-3-opus"
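The first-match rule has a practical consequence: rule order matters. The sketch below mirrors the three rules above as Python data and resolves a request against them. It rests on assumptions that the source does not confirm: patterns wrapped in slashes are treated as regexes that must match the whole value, every condition under `match` must hold, and `pattern_matches` and `resolve_route` are hypothetical names.

```python
import re
from typing import Optional, Tuple


def pattern_matches(pattern: str, value: Optional[str]) -> bool:
    """Treat "/.../" as a regex (full match assumed); anything else is exact."""
    if value is None:
        return False
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return re.fullmatch(pattern[1:-1], value) is not None
    return pattern == value


def resolve_route(routes, model, source_host, default_upstream) -> Tuple[str, str]:
    """Return (upstream, model) from the first rule whose conditions all match."""
    request = {"model": model, "source_host": source_host}
    for rule in routes:
        conditions = rule.get("match", {})
        if all(pattern_matches(p, request.get(key)) for key, p in conditions.items()):
            route = rule["route"]
            # A rule may override the model name; otherwise keep the requested one.
            return route["upstream"], route.get("model", model)
    return default_upstream, model


# The same rules as the routes.yaml example, written as Python data.
ROUTES = [
    {"match": {"model": "/.*-1.5b/"}, "route": {"upstream": "http://gpu-server:8000"}},
    {"match": {"source_host": "10.0.1.100"}, "route": {"upstream": "http://dev-server:8000"}},
    {
        "match": {"model": "gpt-4"},
        "route": {"upstream": "http://claude-server:8000", "model": "claude-3-opus"},
    },
]
DEFAULT = "http://localhost:8000"

small = resolve_route(ROUTES, "qwen2-1.5b", "10.0.0.5", DEFAULT)
from_dev = resolve_route(ROUTES, "gpt-4", "10.0.1.100", DEFAULT)
gpt4 = resolve_route(ROUTES, "gpt-4", "10.0.0.5", DEFAULT)
fallback = resolve_route(ROUTES, "llama3-8b", "10.0.0.5", DEFAULT)
```

Under these assumptions, a "gpt-4" request from 10.0.1.100 goes to the dev server without a model override, because the source_host rule appears before the gpt-4 rule. Placing more specific rules first avoids this kind of shadowing.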

Web UI & Monitoring

The web UI provides insights into your model usage and performance.

  • Dashboard (/): View the latest request logs and general statistics.
  • Performance (/performance): Analyze model performance with an interactive scatter plot.
  • Request Detail (/request/{id}): See the full transcript of a specific request.

Development

Running Tests

To run the test suite, use pytest:

pip install -r requirements.txt
pytest

Contributing

This project is open source. Please feel free to submit issues and pull requests.

License

This project is licensed under the MIT License. See the LICENSE file for details.
