SmolRouter
A smart, lightweight proxy for routing AI model requests with performance analytics. Perfect for local LLM enthusiasts who want intelligent routing, real-time monitoring, and seamless model switching.
Quick Start
Using Docker
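- Build the image:

  docker build -t smolrouter .

- Run the container:

  docker run -d \
    --name smolrouter \
    --restart unless-stopped \
    -p 1234:1234 \
    -e DEFAULT_UPSTREAM="http://localhost:8000" \
    -e MODEL_MAP='{"gpt-3.5-turbo":"llama3-8b"}' \
    -v ./routes.yaml:/app/routes.yaml \
    smolrouter

  Inside the container, localhost refers to the container itself. If the upstream server runs on the Docker host or on another machine, set DEFAULT_UPSTREAM to an address that is reachable from the container.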
Using Python
- Install dependencies:

  pip install -r requirements.txt

- Run the application:

  export DEFAULT_UPSTREAM="http://localhost:8000"
  export MODEL_MAP='{"gpt-3.5-turbo":"llama3-8b"}'
  python app.py
Usage
Point your applications to http://localhost:1234 instead of the OpenAI API:
import openai

client = openai.OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="your-api-key",  # This is passed through to the upstream server
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # This will be rewritten to "llama3-8b"
    messages=[{"role": "user", "content": "Hello!"}],
)
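To make the rewrite in the comment above concrete, here is a minimal sketch of how a MODEL_MAP value could be applied to a request body. The function name `remap_model` is hypothetical and this is not SmolRouter's actual code; it only illustrates the JSON-object lookup that the MODEL_MAP setting describes.

```python
import json


def remap_model(payload: dict, model_map_json: str) -> dict:
    """Return a copy of the payload with its model name remapped, if mapped."""
    # MODEL_MAP is a JSON object string such as '{"gpt-3.5-turbo":"llama3-8b"}'.
    model_map = json.loads(model_map_json or "{}")
    rewritten = dict(payload)  # shallow copy so the caller's dict is untouched
    model = payload.get("model")
    if model in model_map:
        rewritten["model"] = model_map[model]
    return rewritten


request_body = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
}
forwarded = remap_model(request_body, '{"gpt-3.5-turbo":"llama3-8b"}')
print(forwarded["model"])  # llama3-8b
```

Model names that do not appear in the map pass through unchanged in this sketch.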
Core Features
Smart Routing
- Host-based & Model-based Routing: Route requests from specific IPs or for specific models to different upstream servers.
- Regex & Exact Matching: Use regex patterns (e.g., `"/.*-8b/"`) or exact model names for flexible routing.
- Model Overrides: Automatically change model names on-the-fly for each route.
- YAML Configuration: Define all routing rules in a simple, human-readable `routes.yaml` file.
Performance Analytics & Monitoring
- Interactive Dashboard: A web UI to view real-time and historical request data.
- Performance Scatter Plots: Visualize token counts vs. response times to compare model performance.
- Detailed Request Views: Inspect the full request/response transcripts for any logged event.
- SQLite Backend: All request data is stored in a local SQLite database for persistence.
API Compatibility & Content Processing
- OpenAI & Ollama Support: Acts as a drop-in replacement for both OpenAI and Ollama APIs.
- Model Mapping: Remap model names using a simple JSON object for legacy or alternative model support.
- Streaming Support: Full support for streaming responses for both API formats.
- Content Manipulation:
  - Think-Chain Stripping: Automatically remove `<think>...</think>` blocks from responses.
  - JSON Markdown Scrubbing: Convert markdown-fenced JSON into pure JSON.
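The two content-manipulation steps can be sketched with regular expressions. This is an illustration under stated assumptions, not SmolRouter's implementation: the helper names `strip_thinking` and `scrub_json_markdown` are hypothetical, and the sketch assumes the whole response is a single fenced block when scrubbing.

```python
import json
import re

FENCE = "`" * 3  # a literal markdown code fence, built indirectly for readability

# Non-greedy match so several <think> blocks are removed independently.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)
FENCED_JSON = re.compile(
    r"^\s*" + FENCE + r"(?:json)?\s*\n(.*?)\n\s*" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_thinking(text: str) -> str:
    """Remove every <think>...</think> block and the whitespace after it."""
    return THINK_BLOCK.sub("", text)


def scrub_json_markdown(text: str) -> str:
    """Unwrap a markdown-fenced JSON block; leave anything else unchanged."""
    match = FENCED_JSON.match(text)
    if not match:
        return text
    candidate = match.group(1)
    try:
        json.loads(candidate)  # only unwrap when the body is valid JSON
    except json.JSONDecodeError:
        return text
    return candidate


raw = "<think>plan the answer</think>\n" + FENCE + 'json\n{"ok": true}\n' + FENCE
cleaned = scrub_json_markdown(strip_thinking(raw))
print(cleaned)  # {"ok": true}
```

Validating the candidate with `json.loads` before unwrapping keeps non-JSON fenced blocks, such as code samples, intact.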
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `DEFAULT_UPSTREAM` | `http://localhost:8000` | The default upstream server to use when no routing rules match. |
| `ROUTES_CONFIG` | `routes.yaml` | Path to the YAML/JSON file containing smart routing rules. |
| `MODEL_MAP` | `{}` | A JSON string for simple, legacy model name remapping. |
| `STRIP_THINKING` | `true` | If true, removes `<think>...</think>` blocks from responses. |
| `STRIP_JSON_MARKDOWN` | `false` | If true, converts markdown-fenced JSON blocks to pure JSON. |
| `DISABLE_THINKING` | `false` | If true, appends a `/no_think` marker to prompts to disable thinking. |
| `ENABLE_LOGGING` | `true` | If true, enables request logging and the web UI. |
| `REQUEST_TIMEOUT` | `3000.0` | Timeout in seconds for upstream requests. |
| `DB_PATH` | `requests.db` | Path to the SQLite database file. |
| `MAX_LOG_AGE_DAYS` | `7` | Automatically delete logs older than this many days. |
| `LISTEN_HOST` | `127.0.0.1` | The host address for the application to bind to. |
| `LISTEN_PORT` | `1234` | The port for the application to listen on. |
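As a rough illustration of how these variables and their defaults fit together, the sketch below loads a subset of them into a typed settings object. The `Settings` class, the `load_settings` function, and the set of strings treated as true are assumptions made for this example; SmolRouter's own parsing may differ.

```python
import json
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean flag; the accepted spellings are an assumption."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class Settings:
    default_upstream: str
    routes_config: str
    model_map: dict
    strip_thinking: bool
    strip_json_markdown: bool
    request_timeout: float
    max_log_age_days: int
    listen_host: str
    listen_port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build settings from the environment, using the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        default_upstream=env.get("DEFAULT_UPSTREAM", "http://localhost:8000"),
        routes_config=env.get("ROUTES_CONFIG", "routes.yaml"),
        model_map=json.loads(env.get("MODEL_MAP", "{}")),
        strip_thinking=_env_bool(env, "STRIP_THINKING", True),
        strip_json_markdown=_env_bool(env, "STRIP_JSON_MARKDOWN", False),
        request_timeout=float(env.get("REQUEST_TIMEOUT", "3000.0")),
        max_log_age_days=int(env.get("MAX_LOG_AGE_DAYS", "7")),
        listen_host=env.get("LISTEN_HOST", "127.0.0.1"),
        listen_port=int(env.get("LISTEN_PORT", "1234")),
    )


# Passing a plain dict instead of os.environ keeps the example reproducible.
settings = load_settings({"LISTEN_PORT": "8080", "STRIP_THINKING": "false"})
print(settings.listen_port, settings.strip_thinking)  # 8080 False
```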
Smart Routing (routes.yaml)
Create a routes.yaml file to define your routing logic. The first rule that matches a request is used.
routes:
  # Route requests for small models to a specific GPU server using regex
  - match:
      model: "/.*-1.5b/"
    route:
      upstream: "http://gpu-server:8000"

  # Route requests from a specific developer's machine to a dev server
  - match:
      source_host: "10.0.1.100"
    route:
      upstream: "http://dev-server:8000"

  # Route requests for "gpt-4" and override the model name to "claude-3-opus"
  - match:
      model: "gpt-4"
    route:
      upstream: "http://claude-server:8000"
      model: "claude-3-opus"
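The first-match-wins behavior described above can be sketched as follows. This is a simplified model rather than SmolRouter's code: the `resolve` function is hypothetical, the rules are written as Python dicts mirroring the YAML, and treating a `/.../` pattern as a full-string regex match is an assumption.

```python
import re

# The rules from the routes.yaml example, expressed as Python data.
ROUTES = [
    {"match": {"model": "/.*-1.5b/"},
     "route": {"upstream": "http://gpu-server:8000"}},
    {"match": {"source_host": "10.0.1.100"},
     "route": {"upstream": "http://dev-server:8000"}},
    {"match": {"model": "gpt-4"},
     "route": {"upstream": "http://claude-server:8000", "model": "claude-3-opus"}},
]


def _model_matches(pattern: str, model: str) -> bool:
    """Patterns wrapped in slashes are regexes; anything else is an exact name."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return re.fullmatch(pattern[1:-1], model) is not None
    return pattern == model


def resolve(model: str, source_host: str, routes: list, default_upstream: str):
    """Return (upstream, model) from the first rule whose conditions all hold."""
    for rule in routes:
        match = rule.get("match", {})
        if "model" in match and not _model_matches(match["model"], model):
            continue
        if "source_host" in match and match["source_host"] != source_host:
            continue
        route = rule["route"]
        # A route may override the model name; otherwise keep the requested one.
        return route["upstream"], route.get("model", model)
    return default_upstream, model


DEFAULT = "http://localhost:8000"
print(resolve("gpt-4", "10.0.0.5", ROUTES, DEFAULT))
# ('http://claude-server:8000', 'claude-3-opus')
```

Because evaluation stops at the first match, more specific rules belong above more general ones.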
Web UI & Monitoring
The web UI provides insights into your model usage and performance.
- Dashboard (`/`): View the latest request logs and general statistics.
- Performance (`/performance`): Analyze model performance with an interactive scatter plot.
- Request Detail (`/request/{id}`): See the full transcript of a specific request.
Development
Running Tests
To run the test suite, use pytest:
pip install -r requirements.txt
pytest
Contributing
This project is open source. Please feel free to submit issues and pull requests.
License
This project is licensed under the MIT License. See the LICENSE file for details.