VirtuSoul Router
Intelligent LLM router with ML-based query classification
Route prompts to the right model automatically. Save money on simple queries, use powerful models only when needed.
Quick Start • How It Works • Configuration • Providers • API Reference
What is VirtuSoul Router?
VirtuSoul Router is an open-source, self-hosted LLM proxy that automatically routes your prompts to the right model based on query complexity.
- "What is 2+2?" → routes to a free/cheap model (Llama 3.2, Phi-3)
- "Design a microservices architecture" → routes to a powerful model (Claude 3.5 Sonnet, GPT-4)
- "Prove the halting problem is undecidable" → routes to a reasoning model (o1, Claude 3 Opus)
It's fully OpenAI-compatible. Just change your base_url and you're done. Works with any OpenAI SDK (Python, TypeScript, Go, etc.).
No LLM calls for classification – it uses a local ML model (MiniLM + logistic regression) that classifies in ~15ms on CPU.
Features
- ML-powered smart routing – local classifier, no API calls, ~15ms latency
- OpenAI-compatible API – drop-in replacement, works with any SDK
- Multi-provider support – OpenAI, Anthropic, OpenRouter, Groq, Together, Ollama, Mistral, DeepSeek, Google
- Streaming support – full SSE streaming, just like OpenAI
- 4 complexity tiers – simple, medium, complex, reasoning
- Single process – no database, no Redis, just pip install and go
- Docker ready – pre-built image with model weights included
- Retrainable – add your own training data to improve accuracy
- Optional auth – protect your router with a Bearer token
Quick Start
Install
pip install virtusoul-router
Configure
# Create your config
cp .env.example .env
# Edit .env and set your API keys and model choices
Minimal .env (just OpenAI):
MODEL_NAME=virtusoul-v1
SIMPLE_PROVIDER=openai
SIMPLE_MODEL=gpt-4o-mini
SIMPLE_API_KEY=sk-your-key
MEDIUM_PROVIDER=openai
MEDIUM_MODEL=gpt-4o-mini
MEDIUM_API_KEY=sk-your-key
COMPLEX_PROVIDER=openai
COMPLEX_MODEL=gpt-4o
COMPLEX_API_KEY=sk-your-key
REASONING_PROVIDER=openai
REASONING_MODEL=o1-preview
REASONING_API_KEY=sk-your-key
Run
virtusoul-router
┌────────────────────────────────────────────┐
│  VirtuSoul Router v0.1.0                   │
│  Intelligent LLM Routing · Open Source     │
└────────────────────────────────────────────┘
Model name: virtusoul-v1
Endpoint: http://0.0.0.0:4000/v1/chat/completions
Tiers:
  simple    → openai/gpt-4o-mini
  medium    → openai/gpt-4o-mini
  complex   → openai/gpt-4o
  reasoning → openai/o1-preview
Use
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:4000/v1",
api_key="not-needed", # or your API_KEY if you set one
)
response = client.chat.completions.create(
model="virtusoul-v1",
messages=[{"role": "user", "content": "What is the capital of France?"}],
)
# → Classified as "simple" → routed to gpt-4o-mini
print(response.choices[0].message.content)
Works with any language:
// TypeScript
import OpenAI from "openai";
const client = new OpenAI({ baseURL: "http://localhost:4000/v1", apiKey: "not-needed" });
const res = await client.chat.completions.create({
model: "virtusoul-v1",
messages: [{ role: "user", content: "Design a REST API for a todo app" }],
});
// → Classified as "medium" → routed to gpt-4o-mini
# cURL
curl http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "virtusoul-v1", "messages": [{"role": "user", "content": "Hello!"}]}'
How It Works
Your App (any OpenAI SDK)
        │
        ▼
POST /v1/chat/completions  {"model": "virtusoul-v1", "messages": [...]}
        │
        ▼
┌───────────────────────────────────────────┐
│             VirtuSoul Router              │
│                                           │
│  1. ML classifier (~15ms, local)          │
│     MiniLM embedding → logistic reg.      │
│     → "This is a complex query"           │
│                                           │
│  2. Route to tier                         │
│     complex → anthropic/claude-3.5        │
│                                           │
│  3. Forward request, return response      │
└───────────────────────────────────────────┘
        │
        ▼
OpenAI-format response (same as if you called the model directly)
The classifier uses all-MiniLM-L6-v2 (Apache 2.0, ~80MB) to embed the query, then a logistic regression model trained on 200+ curated examples to predict the tier. No LLM calls, no external APIs – it runs entirely on CPU.
Tier Definitions
| Tier | When It's Used | Example Queries |
|---|---|---|
| simple | Greetings, factual lookups, yes/no, basic math | "What is 2+2?", "Hello", "Capital of France?" |
| medium | Explanations, summaries, comparisons, simple code | "Explain DNS", "Write a Python function", "Compare React vs Vue" |
| complex | Architecture, system design, refactoring, multi-step | "Design a microservices architecture", "Create a CI/CD pipeline" |
| reasoning | Proofs, formal logic, optimization, novel algorithms | "Prove sqrt(2) is irrational", "Design a consensus algorithm" |
Direct Tier Selection
Skip the classifier and pick a tier directly:
# Force complex tier
response = client.chat.completions.create(
model="complex", # or "simple", "medium", "reasoning"
messages=[{"role": "user", "content": "..."}],
)
Configuration
All configuration is via environment variables (.env file).
Server Settings
| Variable | Default | Description |
|---|---|---|
| HOST | 0.0.0.0 | Server bind address |
| PORT | 4000 | Server port |
| MODEL_NAME | virtusoul-v1 | The model name your app sends |
| API_KEY | (none) | Optional Bearer token to protect the router |
| LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
| TIMEOUT | 120 | Request timeout in seconds |
Tier Settings
Each tier has 4 variables: {TIER}_PROVIDER, {TIER}_MODEL, {TIER}_API_KEY, {TIER}_BASE_URL.
| Variable | Required | Description |
|---|---|---|
| SIMPLE_PROVIDER | Yes | Provider name (see Providers) |
| SIMPLE_MODEL | Yes | Model identifier |
| SIMPLE_API_KEY | Yes* | API key (*not needed for Ollama) |
| SIMPLE_BASE_URL | No | Custom base URL (overrides default) |
Same pattern for MEDIUM_*, COMPLEX_*, REASONING_*.
Unconfigured tiers fall back to the medium tier.
Providers
VirtuSoul Router supports these providers out of the box:
| Provider | Value | Default Base URL | Auth | Notes |
|---|---|---|---|---|
| OpenAI | openai | api.openai.com | Bearer | Standard |
| Anthropic | anthropic | api.anthropic.com | x-api-key | Auto-converted to/from OpenAI format |
| OpenRouter | openrouter | openrouter.ai/api | Bearer | Access 200+ models |
| Groq | groq | api.groq.com | Bearer | Ultra-fast inference |
| Together | together | api.together.xyz | Bearer | Open-source models |
| Ollama | ollama | localhost:11434 | None | Local models, no API key needed |
| Mistral | mistral | api.mistral.ai | Bearer | Mistral models |
| DeepSeek | deepseek | api.deepseek.com | Bearer | DeepSeek models |
| Google | google | generativelanguage.googleapis.com | API key | Gemini models (OpenAI compat mode) |
Custom Provider
Any OpenAI-compatible API works. Set provider=custom and provide a BASE_URL:
MEDIUM_PROVIDER=custom
MEDIUM_MODEL=my-model
MEDIUM_API_KEY=my-key
MEDIUM_BASE_URL=https://my-custom-api.com/v1
Example: All Free with Ollama (Local)
SIMPLE_PROVIDER=ollama
SIMPLE_MODEL=llama3.2:3b
MEDIUM_PROVIDER=ollama
MEDIUM_MODEL=llama3.1:8b
COMPLEX_PROVIDER=ollama
COMPLEX_MODEL=llama3.1:70b
REASONING_PROVIDER=ollama
REASONING_MODEL=deepseek-r1:32b
Example: Mix Providers for Best Value
SIMPLE_PROVIDER=openrouter
SIMPLE_MODEL=meta-llama/llama-3.2-3b-instruct:free
SIMPLE_API_KEY=sk-or-...
MEDIUM_PROVIDER=openrouter
MEDIUM_MODEL=openai/gpt-4.1-mini
MEDIUM_API_KEY=sk-or-...
COMPLEX_PROVIDER=anthropic
COMPLEX_MODEL=claude-sonnet-4-20250514
COMPLEX_API_KEY=sk-ant-...
REASONING_PROVIDER=openai
REASONING_MODEL=o4-mini
REASONING_API_KEY=sk-...
Docker
# Build
docker build -t virtusoul-router .
# Run
docker run -p 4000:4000 --env-file .env virtusoul-router
Or with Docker Compose:
# docker-compose.yml
services:
virtusoul-router:
build: .
ports:
- "4000:4000"
env_file:
- .env
restart: unless-stopped
API Reference
POST /v1/chat/completions
OpenAI-compatible chat completions with smart routing.
Request:
{
"model": "virtusoul-v1",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain how DNS works"}
],
"temperature": 0.7,
"max_tokens": 1000,
"stream": false
}
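With "stream": true the router emits OpenAI-style SSE chunks. A minimal client-side parser for that framing (a sketch assuming standard `data: {...}` lines terminated by `data: [DONE]`):

```python
import json

def collect_sse(lines):
    """Join the content deltas from OpenAI-style SSE lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    'data: [DONE]',
]
print(collect_sse(sample))  # Hello!
```

In practice you would iterate over the HTTP response body line by line; the OpenAI SDKs do this parsing for you.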
Model options:
- virtusoul-v1 (or your custom MODEL_NAME) → smart routing via the ML classifier
- simple, medium, complex, reasoning → direct tier selection
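Resolution of the model field presumably works along these lines (an illustrative sketch, not the actual source; the router itself returns a 400 error for unknown names):

```python
TIER_SET = {"simple", "medium", "complex", "reasoning"}

def resolve_model(requested: str, model_name: str = "virtusoul-v1"):
    """Map the request's model field to a routing decision."""
    if requested == model_name:
        return ("classify", None)     # smart routing via the ML classifier
    if requested in TIER_SET:
        return ("direct", requested)  # bypass the classifier
    raise ValueError(
        f"Unknown model '{requested}'. "
        f"Use '{model_name}' or one of: {sorted(TIER_SET)}"
    )

print(resolve_model("virtusoul-v1"))  # ('classify', None)
print(resolve_model("complex"))       # ('direct', 'complex')
```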
Response: Standard OpenAI chat completion format, plus a _virtusoul field with routing metadata:
{
"id": "chatcmpl-abc123",
"choices": [{"message": {"role": "assistant", "content": "..."}}],
"usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175},
"_virtusoul": {
"routed_to": "openai/gpt-4o-mini",
"tier": "medium",
"confidence": 0.92,
"latency_ms": 14.2
}
}
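Because _virtusoul is just an extra top-level key, any plain JSON client can read it without breaking OpenAI-schema parsing; for example:

```python
import json

# Abbreviated response body as documented above.
raw = """{
  "id": "chatcmpl-abc123",
  "choices": [{"message": {"role": "assistant", "content": "..."}}],
  "_virtusoul": {"routed_to": "openai/gpt-4o-mini", "tier": "medium",
                 "confidence": 0.92, "latency_ms": 14.2}
}"""

data = json.loads(raw)
meta = data.get("_virtusoul", {})  # absent if you call a provider directly
print(f'{meta["tier"]} ({meta["confidence"]:.0%}) via {meta["routed_to"]}')
# medium (92%) via openai/gpt-4o-mini
```

Note that typed SDK response objects may drop unknown fields, so you may need to read the raw JSON body to see the metadata; check your client's behavior.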
POST /classify
Classify a query without routing (for testing/debugging).
curl -X POST http://localhost:4000/classify \
-H "Content-Type: application/json" \
-d '{"text": "Design a microservices architecture"}'
{
"tier": "complex",
"confidence": 0.988,
"reasoning": "complex=0.99, medium=0.01, simple=0.00, reasoning=0.00",
"latency_ms": 12.3,
"flagged": false
}
POST /retrain
Retrain the classifier with built-in training data.
curl -X POST http://localhost:4000/retrain
GET /health
Health check.
curl http://localhost:4000/health
How the Classifier Works
The classifier uses a two-stage approach:
- Embedding: The user's query is converted to a 384-dimensional vector using all-MiniLM-L6-v2 (~80MB, Apache 2.0 license)
- Classification: A Logistic Regression model (scikit-learn) predicts the tier from the embedding
The model is pre-trained on 200+ curated examples and achieves ~81% accuracy on cross-validation. It runs entirely on CPU in ~10-20ms.
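The shape of that pipeline can be illustrated with stand-ins (the real system uses all-MiniLM-L6-v2 and scikit-learn; the toy embed function and hand-picked weights below exist only to make the sketch runnable):

```python
import math

TIER_NAMES = ["simple", "medium", "complex", "reasoning"]

def embed(text: str, dim: int = 8) -> list[float]:
    """Stand-in for the MiniLM embedding (the real model emits 384 dims)."""
    vec = [0.0] * dim
    for i, ch in enumerate(text.lower()):
        vec[i % dim] += ord(ch) / 1000.0
    return vec

def predict(vec: list[float], weights: list[list[float]]):
    """Stand-in for LogisticRegression.predict_proba: a linear score per
    tier, softmax to probabilities, then argmax."""
    scores = [sum(w * x for w, x in zip(row, vec)) for row in weights]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(TIER_NAMES)), key=lambda i: probs[i])
    return TIER_NAMES[best], probs[best]

# Dummy coefficients; the real ones come from training on the curated set.
weights = [[0.1] * 8, [0.2] * 8, [0.3] * 8, [0.4] * 8]
tier, confidence = predict(embed("Design a consensus algorithm"), weights)
```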
Retraining
You can retrain the classifier by calling POST /retrain. To add custom training data, you can extend the training_data.py file with your own examples.
Low Confidence Handling
When the classifier's confidence is below 0.60, the response includes "flagged": true. This means the classification is uncertain and you may want to review it.
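One client-side strategy (an assumption about usage, not built-in router behavior) is to pin flagged queries to a safer tier via direct tier selection:

```python
FLAG_THRESHOLD = 0.60  # mirrors the router's flagging cutoff

def pick_model(classification: dict, default: str = "virtusoul-v1") -> str:
    """Send uncertain queries straight to the complex tier."""
    flagged = classification.get("flagged", False)
    confidence = classification.get("confidence", 1.0)
    if flagged or confidence < FLAG_THRESHOLD:
        return "complex"
    return default

# Feed this the JSON returned by POST /classify:
print(pick_model({"tier": "simple", "confidence": 0.41, "flagged": True}))
# complex
print(pick_model({"tier": "medium", "confidence": 0.86, "flagged": False}))
# virtusoul-v1
```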
Test Results
Tested end-to-end on February 16, 2026 with OpenRouter as the provider. All 4 tiers, streaming, direct tier selection, and error handling verified.
Unit Tests
11 passed in 7.98s
✓ test_simple_greeting
✓ test_simple_factual
✓ test_medium_explanation
✓ test_complex_architecture
✓ test_reasoning_proof
✓ test_empty_input
✓ test_confidence_range
✓ test_reasoning_field
✓ test_default_values
✓ test_tier_loading
✓ test_tier_config_is_configured
Live End-to-End Tests
| Test | Query | Classified As | Confidence | Model Used | Result |
|---|---|---|---|---|---|
| Smart → Simple | "Hello! How are you?" | simple | 0.954 | gpt-4.1-nano | ✓ Correct response |
| Smart → Medium | "Explain how DNS works" | medium | 0.857 | gpt-4.1-mini | ✓ Correct response |
| Smart → Complex | "Design a microservices architecture" | complex | 0.971 | claude-sonnet-4 | ✓ Correct response |
| Smart → Reasoning | "Prove sqrt(2) is irrational" | reasoning | 0.665 | o4-mini | ✓ Correct proof |
| Direct Tier | "model": "complex" | – | – | claude-sonnet-4 | ✓ Bypassed classifier |
| Streaming | "Count from 1 to 5" | simple | 0.95 | gpt-4.1-nano | ✓ SSE chunks received |
| Invalid Model | "model": "invalid" | – | – | – | ✓ 400 error with helpful message |
| Health Check | GET /health | – | – | – | ✓ Returns status + tiers |
| Retrain | POST /retrain | – | – | – | ✓ 232 samples, 0.698 CV accuracy |
Classifier Latency
| Metric | Value |
|---|---|
| First request (cold start, model loading) | ~3.3s |
| Subsequent requests | 20-32ms |
| Embedding model size | ~80MB |
License
MIT License – use it however you want, commercially or otherwise.
Dependency Licenses
All dependencies use permissive licenses:
| Component | License |
|---|---|
| sentence-transformers | Apache 2.0 |
| all-MiniLM-L6-v2 (model) | Apache 2.0 |
| scikit-learn | BSD 3-Clause |
| FastAPI | MIT |
| uvicorn | BSD 3-Clause |
| httpx | BSD 3-Clause |
| numpy | BSD 3-Clause |
| pydantic | MIT |
No GPL, no copyleft, no viral licenses. Safe for commercial use.
Contributing
Contributions are welcome! Here's how:
- Fork the repo
- Create a feature branch (git checkout -b feature/my-feature)
- Make your changes
- Run tests (pytest)
- Submit a PR
Ideas for Contributions
- More training data for better classification accuracy
- New provider adapters
- Web dashboard for monitoring
- Custom tier definitions (beyond the 4 defaults)
- Batch API support
- Function calling / tool use passthrough
Acknowledgments
Built with ❤️ by the VirtuSoul team. Inspired by the need for smarter, cost-effective LLM routing.
If VirtuSoul Router saves you money on your LLM bills, give us a ⭐