Intelligent LLM router with ML-based query classification. Route prompts to the right model automatically.

These details have not been verified by PyPI

Project links

Project description

🧠 VirtuSoul Router

Intelligent LLM router with ML-based query classification

Route prompts to the right model automatically. Save money on simple queries, use powerful models only when needed.

Quick Start • How It Works • Configuration • Providers • API Reference

What is VirtuSoul Router?

VirtuSoul Router is an open-source, self-hosted LLM proxy that automatically routes your prompts to the right model based on query complexity.

"What is 2+2?" → routes to a free/cheap model (Llama 3.2, Phi-3)
"Design a microservices architecture" → routes to a powerful model (Claude 3.5 Sonnet, GPT-4)
"Prove the halting problem is undecidable" → routes to a reasoning model (O1, Claude 3 Opus)

It's fully OpenAI-compatible. Just change your base_url and you're done. Works with any OpenAI SDK (Python, TypeScript, Go, etc.).

No LLM calls for classification — uses a local ML model (MiniLM + Logistic Regression) that classifies in ~15ms on CPU.

Features

🧠 ML-powered smart routing — local classifier, no API calls, ~15ms latency
🔌 OpenAI-compatible API — drop-in replacement, works with any SDK
🌐 Multi-provider support — OpenAI, Anthropic, OpenRouter, Groq, Together, Ollama, Mistral, DeepSeek, Google
⚡ Streaming support — full SSE streaming, just like OpenAI
🎯 4 complexity tiers — simple, medium, complex, reasoning
📦 Single process — no database, no Redis, just pip install and go
🐳 Docker ready — pre-built image with model weights included
🔄 Retrainable — add your own training data to improve accuracy
🔑 Optional auth — protect your router with a Bearer token

Quick Start

Install

pip install virtusoul-router

Configure

# Create your config
cp .env.example .env

# Edit .env — set your API keys and model choices

Minimal .env (just OpenAI):

MODEL_NAME=virtusoul-v1

SIMPLE_PROVIDER=openai
SIMPLE_MODEL=gpt-4o-mini
SIMPLE_API_KEY=sk-your-key

MEDIUM_PROVIDER=openai
MEDIUM_MODEL=gpt-4o-mini
MEDIUM_API_KEY=sk-your-key

COMPLEX_PROVIDER=openai
COMPLEX_MODEL=gpt-4o
COMPLEX_API_KEY=sk-your-key

REASONING_PROVIDER=openai
REASONING_MODEL=o1-preview
REASONING_API_KEY=sk-your-key

Run

virtusoul-router

╔══════════════════════════════════════════════╗
║          VirtuSoul Router v0.1.0           ║
║   Intelligent LLM Routing — Open Source      ║
╚══════════════════════════════════════════════╝

  Model name:  virtusoul-v1
  Endpoint:    http://0.0.0.0:4000/v1/chat/completions
  Tiers:
    simple       → openai/gpt-4o-mini
    medium       → openai/gpt-4o-mini
    complex      → openai/gpt-4o
    reasoning    → openai/o1-preview

Use

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key="not-needed",  # or your API_KEY if you set one
)

response = client.chat.completions.create(
    model="virtusoul-v1",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
# → Classified as "simple" → routed to gpt-4o-mini
print(response.choices[0].message.content)

Works with any language:

// TypeScript
import OpenAI from "openai";
const client = new OpenAI({ baseURL: "http://localhost:4000/v1", apiKey: "not-needed" });
const res = await client.chat.completions.create({
  model: "virtusoul-v1",
  messages: [{ role: "user", content: "Design a REST API for a todo app" }],
});
// → Classified as "medium" → routed to gpt-4o-mini

# cURL
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "virtusoul-v1", "messages": [{"role": "user", "content": "Hello!"}]}'

How It Works

Your App (any OpenAI SDK)
    │
    ▼
POST /v1/chat/completions  {"model": "virtusoul-v1", "messages": [...]}
    │
    ▼
┌─────────────────────────────────────────┐
│         VirtuSoul Router                │
│                                         │
│  1. ML Classifier (~15ms, local)        │
│     MiniLM embedding → Logistic Reg.    │
│     → "This is a complex query"         │
│                                         │
│  2. Route to tier                       │
│     complex → anthropic/claude-3.5      │
│                                         │
│  3. Forward request, return response    │
└─────────────────────────────────────────┘
    │
    ▼
OpenAI-format response (same as if you called the model directly)

The classifier uses all-MiniLM-L6-v2 (Apache 2.0, ~80MB) to embed the query, then a Logistic Regression model trained on 200+ curated examples to predict the tier. No LLM calls, no external APIs — runs entirely on CPU.

Tier Definitions

Tier	When It's Used	Example Queries
simple	Greetings, factual lookups, yes/no, basic math	"What is 2+2?", "Hello", "Capital of France?"
medium	Explanations, summaries, comparisons, simple code	"Explain DNS", "Write a Python function", "Compare React vs Vue"
complex	Architecture, system design, refactoring, multi-step	"Design a microservices architecture", "Create a CI/CD pipeline"
reasoning	Proofs, formal logic, optimization, novel algorithms	"Prove sqrt(2) is irrational", "Design a consensus algorithm"

Direct Tier Selection

Skip the classifier and pick a tier directly:

# Force complex tier
response = client.chat.completions.create(
    model="complex",  # or "simple", "medium", "reasoning"
    messages=[{"role": "user", "content": "..."}],
)

Configuration

All configuration is via environment variables (.env file).

Server Settings

Variable	Default	Description
`HOST`	`0.0.0.0`	Server bind address
`PORT`	`4000`	Server port
`MODEL_NAME`	`virtusoul-v1`	The model name your app sends
`API_KEY`	(none)	Optional Bearer token to protect the router
`LOG_LEVEL`	`INFO`	Logging level (DEBUG, INFO, WARNING, ERROR)
`TIMEOUT`	`120`	Request timeout in seconds

Tier Settings

Each tier has 4 variables: {TIER}_PROVIDER, {TIER}_MODEL, {TIER}_API_KEY, {TIER}_BASE_URL.

Variable	Required	Description
`SIMPLE_PROVIDER`	Yes	Provider name (see Providers)
`SIMPLE_MODEL`	Yes	Model identifier
`SIMPLE_API_KEY`	Yes*	API key (*not needed for Ollama)
`SIMPLE_BASE_URL`	No	Custom base URL (overrides default)

Same pattern for MEDIUM_*, COMPLEX_*, REASONING_*.

Unconfigured tiers fall back to the medium tier.

Providers

VirtuSoul Router supports these providers out of the box:

Provider	Value	Default Base URL	Auth	Notes
OpenAI	`openai`	api.openai.com	Bearer	Standard
Anthropic	`anthropic`	api.anthropic.com	x-api-key	Auto-converted to/from OpenAI format
OpenRouter	`openrouter`	openrouter.ai/api	Bearer	Access 200+ models
Groq	`groq`	api.groq.com	Bearer	Ultra-fast inference
Together	`together`	api.together.xyz	Bearer	Open-source models
Ollama	`ollama`	localhost:11434	None	Local models, no API key needed
Mistral	`mistral`	api.mistral.ai	Bearer	Mistral models
DeepSeek	`deepseek`	api.deepseek.com	Bearer	DeepSeek models
Google	`google`	generativelanguage.googleapis.com	API key	Gemini models (OpenAI compat mode)

Custom Provider

Any OpenAI-compatible API works. Set provider=custom and provide a BASE_URL:

MEDIUM_PROVIDER=custom
MEDIUM_MODEL=my-model
MEDIUM_API_KEY=my-key
MEDIUM_BASE_URL=https://my-custom-api.com/v1

Example: All Free with Ollama (Local)

SIMPLE_PROVIDER=ollama
SIMPLE_MODEL=llama3.2:3b

MEDIUM_PROVIDER=ollama
MEDIUM_MODEL=llama3.1:8b

COMPLEX_PROVIDER=ollama
COMPLEX_MODEL=llama3.1:70b

REASONING_PROVIDER=ollama
REASONING_MODEL=deepseek-r1:32b

Example: Mix Providers for Best Value

SIMPLE_PROVIDER=openrouter
SIMPLE_MODEL=meta-llama/llama-3.2-3b-instruct:free
SIMPLE_API_KEY=sk-or-...

MEDIUM_PROVIDER=openrouter
MEDIUM_MODEL=openai/gpt-4.1-mini
MEDIUM_API_KEY=sk-or-...

COMPLEX_PROVIDER=anthropic
COMPLEX_MODEL=claude-sonnet-4-20250514
COMPLEX_API_KEY=sk-ant-...

REASONING_PROVIDER=openai
REASONING_MODEL=o4-mini
REASONING_API_KEY=sk-...

Docker

# Build
docker build -t virtusoul-router .

# Run
docker run -p 4000:4000 --env-file .env virtusoul-router

Or with Docker Compose:

# docker-compose.yml
services:
  virtusoul-router:
    build: .
    ports:
      - "4000:4000"
    env_file:
      - .env
    restart: unless-stopped

API Reference

`POST /v1/chat/completions`

OpenAI-compatible chat completions with smart routing.

Request:

{
  "model": "virtusoul-v1",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain how DNS works"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}

Model options:

virtusoul-v1 (or your custom MODEL_NAME) — smart routing via ML classifier
simple, medium, complex, reasoning — direct tier selection

Response: Standard OpenAI chat completion format, plus a _virtusoul field with routing metadata:

{
  "id": "chatcmpl-abc123",
  "choices": [{"message": {"role": "assistant", "content": "..."}}],
  "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175},
  "_virtusoul": {
    "routed_to": "openai/gpt-4o-mini",
    "tier": "medium",
    "confidence": 0.92,
    "latency_ms": 14.2
  }
}

`POST /classify`

Classify a query without routing (for testing/debugging).

curl -X POST http://localhost:4000/classify \
  -H "Content-Type: application/json" \
  -d '{"text": "Design a microservices architecture"}'

{
  "tier": "complex",
  "confidence": 0.988,
  "reasoning": "complex=0.99, medium=0.01, simple=0.00, reasoning=0.00",
  "latency_ms": 12.3,
  "flagged": false
}

`POST /retrain`

Retrain the classifier with built-in training data.

curl -X POST http://localhost:4000/retrain

`GET /health`

Health check.

curl http://localhost:4000/health

How the Classifier Works

The classifier uses a two-stage approach:

Embedding: The user's query is converted to a 384-dimensional vector using all-MiniLM-L6-v2 (~80MB, Apache 2.0 license)
Classification: A Logistic Regression model (scikit-learn) predicts the tier from the embedding

The model is pre-trained on 200+ curated examples and achieves ~81% accuracy on cross-validation. It runs entirely on CPU in ~10-20ms.

Retraining

You can retrain the classifier by calling POST /retrain. To add custom training data, you can extend the training_data.py file with your own examples.

Low Confidence Handling

When the classifier's confidence is below 0.60, the response includes "flagged": true. This means the classification is uncertain and you may want to review it.

Test Results

Tested end-to-end on February 16, 2026 with OpenRouter as the provider. All 4 tiers, streaming, direct tier selection, and error handling verified.

Unit Tests

11 passed in 7.98s
  ✓ test_simple_greeting
  ✓ test_simple_factual
  ✓ test_medium_explanation
  ✓ test_complex_architecture
  ✓ test_reasoning_proof
  ✓ test_empty_input
  ✓ test_confidence_range
  ✓ test_reasoning_field
  ✓ test_default_values
  ✓ test_tier_loading
  ✓ test_tier_config_is_configured

Live End-to-End Tests

Test	Query	Classified As	Confidence	Model Used	Result
Smart → Simple	"Hello! How are you?"	simple	0.954	gpt-4.1-nano	✅ Correct response
Smart → Medium	"Explain how DNS works"	medium	0.857	gpt-4.1-mini	✅ Correct response
Smart → Complex	"Design a microservices architecture"	complex	0.971	claude-sonnet-4	✅ Correct response
Smart → Reasoning	"Prove sqrt(2) is irrational"	reasoning	0.665	o4-mini	✅ Correct proof
Direct Tier	`"model": "complex"`	—	—	claude-sonnet-4	✅ Bypassed classifier
Streaming	"Count from 1 to 5"	simple	0.95	gpt-4.1-nano	✅ SSE chunks received
Invalid Model	`"model": "invalid"`	—	—	—	✅ 400 error with helpful message
Health Check	`GET /health`	—	—	—	✅ Returns status + tiers
Retrain	`POST /retrain`	—	—	—	✅ 232 samples, 0.698 CV accuracy

Classifier Latency

Metric	Value
First request (cold start, model loading)	~3.3s
Subsequent requests	20-32ms
Embedding model size	~80MB

License

MIT License — use it however you want, commercially or otherwise.

Dependency Licenses

All dependencies use permissive licenses:

Component	License
sentence-transformers	Apache 2.0
all-MiniLM-L6-v2 (model)	Apache 2.0
scikit-learn	BSD 3-Clause
FastAPI	MIT
uvicorn	BSD 3-Clause
httpx	BSD 3-Clause
numpy	BSD 3-Clause
pydantic	MIT

No GPL, no copyleft, no viral licenses. Safe for commercial use.

Contributing

Contributions are welcome! Here's how:

Fork the repo
Create a feature branch (git checkout -b feature/my-feature)
Make your changes
Run tests (pytest)
Submit a PR

Ideas for Contributions

More training data for better classification accuracy
New provider adapters
Web dashboard for monitoring
Custom tier definitions (beyond the 4 defaults)
Batch API support
Function calling / tool use passthrough

Acknowledgments

Built with ❤️ by the VirtuSoul team. Inspired by the need for smarter, cost-effective LLM routing.

_{If VirtuSoul Router saves you money on your LLM bills, give us a ⭐}

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.1.2

Feb 16, 2026

0.1.1

Feb 16, 2026

This version

0.1.0

Feb 16, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

virtusoul_router-0.1.0.tar.gz (27.7 kB view details)

Uploaded Feb 16, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

virtusoul_router-0.1.0-py3-none-any.whl (23.4 kB view details)

Uploaded Feb 16, 2026 Python 3

File details

Details for the file virtusoul_router-0.1.0.tar.gz.

File metadata

Download URL: virtusoul_router-0.1.0.tar.gz
Upload date: Feb 16, 2026
Size: 27.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: curl/8.5.0

File hashes

Hashes for virtusoul_router-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`87d8ff1e6df594beb15a831d4216d2c513591cddf4f696fea2d25ffec4c2d79b`
MD5	`0fc91b34b21e570e03262f6c2e9e5519`
BLAKE2b-256	`7bae51ab810c88a0444275745accc66610f0bd11b59ae35acea9aa29c3c38353`

See more details on using hashes here.

File details

Details for the file virtusoul_router-0.1.0-py3-none-any.whl.

File metadata

Download URL: virtusoul_router-0.1.0-py3-none-any.whl
Upload date: Feb 16, 2026
Size: 23.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for virtusoul_router-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`03536fe66023c431f862e0739e4fd584d0ab6dc86ceca742182cf8ce0b8e7409`
MD5	`44d3637c3f4177a4e0592d2e1b782eb9`
BLAKE2b-256	`fc883882660eeb7f5a397b8389b703f0a9b97c73ab04c2b5f2e0002468e43635`

See more details on using hashes here.

virtusoul-router 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

🧠 VirtuSoul Router

What is VirtuSoul Router?

Features

Quick Start

Install

Configure

Run

Use

How It Works

Tier Definitions

Direct Tier Selection

Configuration

Server Settings

Tier Settings

Providers

Custom Provider

Example: All Free with Ollama (Local)

Example: Mix Providers for Best Value

Docker

API Reference

POST /v1/chat/completions

POST /classify

POST /retrain

GET /health

How the Classifier Works

Retraining

Low Confidence Handling

Test Results

Unit Tests

Live End-to-End Tests

Classifier Latency

License

Dependency Licenses

Contributing

Ideas for Contributions

Acknowledgments

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`POST /v1/chat/completions`

`POST /classify`

`POST /retrain`

`GET /health`