Local-first OpenAI-compatible runtime for hybrid LLM routing, verification-aware escalation, and privacy-aware local execution.
RouteLabs Router
RouteLabs Router is a local-first runtime that sits between your app and local/cloud LLMs.
It is designed to feel like a practical gateway, not just a routing idea:
- one OpenAI-compatible endpoint
- local-first execution with cloud fallback
- verification-aware escalation
- privacy-aware local preference
- startup checks, model visibility, and request-level performance traces
It gives applications one endpoint that can decide:
- when to stay local
- when to use the cloud
- when privacy should override convenience
- which provider and model should handle the request
- why that decision was made
- when verification forced an escalation
- when privacy detection forced local execution
The goal is simple: route each step to the cheapest, fastest, safest model that can still be trusted.
Who This Is For
This repo is mainly for:
- AI app builders
- local-first power users
- agent and workflow developers
- teams experimenting with privacy-aware and cost-aware inference
If you want a polished end-user chat app, this is not that. If you want a runtime and routing layer you can plug into your own tools, this is exactly that.
What This Is
Think of RouteLabs as:
- a local runtime/server you run on your machine
- a Python client you can call from your app
- an OpenAI-compatible endpoint you can place in front of existing clients
It is not primarily:
- a browser extension
- a desktop UI
- a plugin marketplace product
Those may come later, but the current product is a runtime + middleware + API.
60-Second Quickstart
Install from PyPI, start the runtime, and send one request:
pip install routelabs-router
export OPENAI_API_KEY=your_api_key_here # optional, enables cloud execution
router start --reload
Then in another terminal:
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages":[{"role":"user","content":"Summarize this in one sentence: RouteLabs Router chooses between local and cloud models based on privacy, cost, latency, and task complexity."}],
"private":false
}'
Install
Recommended user install
pip install routelabs-router
router start
Contributor install
Clone the repo and install from source:
git clone https://github.com/routelabsai/router.git
cd router
conda create -n routelabs-router python=3.11 -y
conda activate routelabs-router
python -m pip install --upgrade pip setuptools wheel
pip install -e '.[dev]'
router start --reload
Why Use This
Most teams today have one of these problems:
- Ollama runs local models well, but it does not decide when a task should stay local versus escalate
- cloud gateways like LiteLLM and OpenRouter route across hosted APIs, but they are not built around local-first policy decisions
- chat apps can call models, but they usually hide the execution logic instead of exposing it
RouteLabs Router is the layer above those tools.
It is for teams who want:
- one API for hybrid local + cloud inference
- OpenAI-compatible model discovery for existing SDKs and UIs
- live Ollama model discovery
- embeddings support for retrieval and RAG-style workflows
- tool-calling support for agent workflows
- OpenAI-style streaming responses for chat completions
- structured output and common OpenAI request-field passthrough
- verification-aware escalation instead of naive “hard task -> expensive model”
- transparent routing decisions
- privacy-aware defaults
- automatic local preference for obvious sensitive or code-like content
- automatic local-to-cloud fallback when a provider is unavailable
- cost and latency visibility
- token-speed visibility for chat requests
- provider and model selection that can evolve over time
- a foundation for agentic step-level routing later
How You Use It
There are three practical ways to adopt RouteLabs today.
1. As a local runtime/server
Run:
router start --reload
Then point your tools to http://127.0.0.1:8000.
2. As a Python library client
Use the built-in client:
from routelabs_router import RouteLabsClient
client = RouteLabsClient("http://127.0.0.1:8000")
print(client.route("Summarize a short product description"))
3. As an OpenAI-compatible endpoint
If you already have code using an OpenAI-style client, point it at RouteLabs via base_url.
Use model="route-auto" when you want RouteLabs to choose the concrete backend model for each request.
That is one of the easiest ways to adopt it without rewriting your app.
What It Looks Like
app / agent / extension
|
v
RouteLabs Router
|
+--> policy + task complexity
+--> privacy constraints
+--> provider selection
+--> verification hooks
|
+--> Ollama
+--> llama.cpp
+--> cloud provider
Quick Demo
Once the server is running, you can inspect decisions directly:
curl -X POST http://127.0.0.1:8000/v1/route \
-H "Content-Type: application/json" \
-d '{"task":"summarize a short product description","private":false}'
Expected shape:
{
"target": "local",
"provider": "ollama",
"model": "qwen3:4b",
"reason": "task is suitable for local-first execution",
"complexity": "medium",
"verify": true,
"provider_available": true,
"provider_status": "ready",
"fallback_available": false,
"fallback_status": "not_configured"
}
What this tells you:
- the router chose local
- it selected ollama
- it picked a model
- it marked the request as worth verification
- it reports whether the planned provider is actually reachable right now
- it reports whether cloud fallback is available if the local route fails
/v1/route is a planning endpoint, not an execution endpoint. It tells you what RouteLabs would try first and whether that path currently looks available.
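Because /v1/route only plans, it is cheap to call from code before committing to execution. Here is a minimal Python sketch, assuming the default port, that checks the availability fields from the expected shape above:
import requests
# Ask the planning endpoint what RouteLabs would try first, without executing anything.
plan = requests.post(
    "http://127.0.0.1:8000/v1/route",
    json={"task": "summarize a short product description", "private": False},
    timeout=10,
).json()
print(f"planned: {plan['target']} via {plan['provider']} ({plan['model']}): {plan['reason']}")
# Both availability flags appear in the expected shape documented above.
if not plan["provider_available"] and not plan["fallback_available"]:
    print("no usable provider path right now; run `router doctor` for next steps")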
And you can send an OpenAI-style chat request:
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages":[{"role":"user","content":"Summarize this in one sentence: RouteLabs Router chooses between local and cloud models based on privacy, cost, latency, and task complexity."}],
"private":false
}'
If Ollama is running locally, that request executes against your configured local model.
If OPENAI_API_KEY is set, high-complexity requests can route through the configured OpenAI-compatible cloud provider.
The response includes a trace showing the initial route, verification result, and any escalation.
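To inspect that trace from code, you can read it off the raw JSON response. A minimal sketch using requests; it assumes the trace is carried under a top-level trace key, so print the full payload first if your version differs:
import json
import requests
resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Summarize RouteLabs Router in one sentence."}],
        "private": False,
    },
    timeout=120,
)
resp.raise_for_status()
data = resp.json()
print(data["choices"][0]["message"]["content"])
# Routing trace: initial route, verification result, and any escalation.
print(json.dumps(data.get("trace"), indent=2))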
Positioning
| Tool | Core strength | What it does not solve |
|---|---|---|
| Ollama | Great local model runtime and API | Hybrid routing and policy decisions |
| LiteLLM | Cloud API normalization and routing | Local-first execution strategy |
| OpenRouter | Hosted provider access and fallback | On-device privacy-aware control plane |
| RouteLabs Router | Verification-aware local-first runtime with hybrid routing | Early-stage policy and provider coverage |
MVP scope
The first version focuses on a narrow but useful slice:
- OpenAI-compatible chat-style request handling
- local/cloud routing decisions
- adapter-based execution
- verification-aware fallback hooks
- structured telemetry showing why a route was chosen
This repository intentionally starts small. It is a control-plane foundation, not a full chat app.
Use Cases
- Local-first copilots that should only escalate when a task gets difficult
- Privacy-sensitive workflows where private data should never leave the device
- Browser or desktop assistants that need one middleware layer above multiple runtimes
- Agent systems that want future step-level routing instead of a single fixed model
Current Status
This is an early but usable product foundation. The repository already includes:
- project docs
- roadmap
- contribution guide
- Python project metadata
- FastAPI server and CLI
- YAML config loading
- route inspection endpoint
- OpenAI-style /v1/chat/completions endpoint
- OpenAI-style /v1/embeddings endpoint
- OpenAI-compatible /v1/models discovery endpoint
- tool-call passthrough for OpenAI-style clients
- OpenAI-style SSE streaming on /v1/chat/completions
- structured-output passthrough and JSON-mode support
- real local execution through Ollama
- generic OpenAI-compatible cloud execution
- first verification-aware escalation loop
- automatic fallback from local provider failures to the cloud when policy allows it
- stats endpoint for local/cloud/escalation visibility
- runtime doctor and model inventory CLI commands
- simple estimated cost savings in stats
- latency and token-speed metrics in stats and logs
- heuristic privacy detection for email/identifier/code-like content
- recent route logs for per-request inspection
- test coverage for routing and API behavior
- example config profiles
- example curl flows
Still early:
- verifiers are heuristic and limited
- cost and latency dashboards are not implemented yet
- privacy detection is heuristic rather than model-based
- learning from user corrections is still future work
More Docs
- Architecture: docs/ARCHITECTURE.md
- Changelog: CHANGELOG.md
- Roadmap: ROADMAP.md
- Contributor guide: CONTRIBUTING.md
- Release guide: docs/release/README.md
- PyPI trusted publishing: docs/release/trusted-publishing.md
Setup And Usage
Prerequisites
- Python 3.11+
- conda recommended for the smoothest setup on macOS
Install from PyPI
conda create -n routelabs-router python=3.11 -y
conda activate routelabs-router
python -m pip install --upgrade pip
pip install routelabs-router
Install from source
Use this path if you want to contribute or modify the router itself.
git clone https://github.com/routelabsai/router.git
cd router
conda create -n routelabs-router python=3.11 -y
conda activate routelabs-router
python -m pip install --upgrade pip setuptools wheel
pip install -e '.[dev]'
Configure cloud execution
If you want cloud-routed requests to execute instead of returning a configuration error, set:
export OPENAI_API_KEY=your_api_key_here
The default cloud adapter uses the OpenAI-compatible endpoint configured in config/router.yaml.
Why conda is the recommended path
During validation we hit two common issues that conda + Python 3.11 resolved cleanly:
- Python 3.9.7 was too old for this project
- older packaging tooling made editable installs unreliable
If you see requires a different Python: 3.9.7 not in '>=3.11', create the conda environment above and retry.
Run tests
pytest
Optional profile configs
The repo includes starter profiles in config/profiles/:
- balanced.yaml
- local-first.yaml
- openclaw.yaml
- privacy-first.yaml
- unsloth-local.yaml
Use one as your active config by copying or merging it into config/router.yaml.
Start the runtime
router start --reload
On startup, RouteLabs now prints a quick readiness summary so users can immediately see:
- whether the local provider is reachable
- whether cloud fallback is configured
- whether the runtime is ok, degraded, or error
- what to do next if no provider path is available
Typical first-run warnings include:
- start Ollama with ollama serve for local execution
- set OPENAI_API_KEY to enable cloud fallback and escalation
For explicit host or port overrides:
router start --host 0.0.0.0 --port 8000 --reload
Inspect a routing decision
router route --task "summarize a short product description" --private false
Run environment checks
router doctor
This shows:
- local and cloud provider readiness
- configured chat and embedding models
- installed Ollama models when RouteLabs can detect them
- missing configured local models
- the next setup action if something is unavailable
List visible models
router models
This shows:
- virtual models like route-auto
- configured local and cloud models
- installed Ollama models discovered live
- whether each model is installed, configured, or not_configured
Test the API
Health check:
curl http://127.0.0.1:8000/healthz
Expected shape:
{
"status": "ok",
"providers": {
"ollama": {
"available": true,
"status": "ready"
},
"openai-compatible": {
"available": false,
"status": "not_configured"
}
}
}
Health status semantics:
- ok: the local-first path is available
- degraded: local is unavailable, but cloud execution is still possible
- error: neither local nor cloud execution is currently usable
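Those three states make /healthz a natural startup gate for client code. A minimal Python sketch, assuming the default port; the warn-or-abort policy is just one reasonable choice:
import requests
health = requests.get("http://127.0.0.1:8000/healthz", timeout=5).json()
status = health["status"]
if status == "error":
    raise RuntimeError("no provider path available; run `router doctor` for next steps")
if status == "degraded":
    print("warning: local provider unreachable, requests will run in the cloud")
# Per-provider readiness, matching the expected shape above.
for name, provider in health.get("providers", {}).items():
    print(f"{name}: available={provider['available']} status={provider['status']}")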
Route inspection:
curl -X POST http://127.0.0.1:8000/v1/route \
-H "Content-Type: application/json" \
-d '{"task":"summarize a short product description","private":false}'
Stats endpoint:
curl http://127.0.0.1:8000/v1/stats
It includes:
- chat vs embeddings request counts
- average total latency
- average chat latency
- average embeddings latency
- average local vs cloud latency
- average completion token speed for chat requests
Recent route logs:
curl http://127.0.0.1:8000/v1/logs
Each log entry includes:
- request kind
- total request latency
- completion tokens per second when available
- per-attempt timing in the trace
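A small Python sketch that pulls both endpoints for a quick look. Beyond the endpoint paths, the exact key names are version-dependent, so this just pretty-prints whatever comes back:
import json
import requests
BASE = "http://127.0.0.1:8000"
# Aggregate stats: request counts, latency averages, token speed.
print(json.dumps(requests.get(f"{BASE}/v1/stats", timeout=10).json(), indent=2))
# Recent per-request route logs: kind, latency, per-attempt trace timing.
print(json.dumps(requests.get(f"{BASE}/v1/logs", timeout=10).json(), indent=2))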
Model discovery:
curl http://127.0.0.1:8000/v1/models
Ecosystem workflows:
- OpenClaw: examples/openclaw.md
- Unsloth: examples/unsloth.md
Embeddings:
curl -X POST http://127.0.0.1:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input":"RouteLabs Router chooses between local and cloud models based on privacy and task complexity.",
"private":false
}'
If local embeddings fail and cloud embeddings are not configured, RouteLabs now returns a clearer configuration error instead of a misleading “provider does not support embeddings” message.
Python client
You can also call the router from Python:
from routelabs_router import RouteLabsClient
client = RouteLabsClient("http://127.0.0.1:8000")
route = client.route("Summarize a short product description")
chat = client.chat(
[
{
"role": "user",
"content": "Summarize this in one sentence: RouteLabs Router chooses between local and cloud models based on privacy, cost, latency, and task complexity.",
}
]
)
embeddings = client.embeddings(
"RouteLabs Router chooses between local and cloud models based on privacy and task complexity."
)
stats = client.stats()
logs = client.logs()
There is also a runnable example in examples/python-client.py.
OpenAI-compatible drop-in example
If you already use the OpenAI Python SDK, you can point it at RouteLabs:
from openai import OpenAI
client = OpenAI(
base_url="http://127.0.0.1:8000/v1",
api_key="not-needed-for-local-dev",
)
response = client.chat.completions.create(
model="route-auto",
messages=[
{
"role": "user",
"content": "Summarize this in one sentence: RouteLabs Router chooses between local and cloud models based on privacy, cost, latency, and task complexity.",
}
],
)
See examples/openai-compatible-client.py.
You may need to install the OpenAI SDK separately:
pip install openai
For a multi-step tool-calling example, see examples/agent-loop.md.
The stats response includes simple estimated fields such as:
- estimated_total_cost_usd
- estimated_baseline_cloud_cost_usd
- estimated_cost_saved_usd
- estimated_cloud_requests_avoided
OpenAI-style chat completion:
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages":[{"role":"user","content":"Summarize this in one sentence: RouteLabs Router chooses between local and cloud models based on privacy, cost, latency, and task complexity."}],
"private":false
}'
If Ollama is running locally, the chat endpoint will execute against your configured local model.
If OPENAI_API_KEY is set, high-complexity tasks can execute through the configured OpenAI-compatible cloud provider. If it is not set, cloud-routed chat requests return a clear configuration error.
If the local provider is unavailable and the request is not forced to stay private, RouteLabs can now fall back to the cloud automatically and record that decision in the trace.
The stats endpoint gives a simple first pass at the eventual cost/latency visibility story by showing how many requests stayed local, how many escalated, and how often verification failed.
It also includes a lightweight savings estimate based on configurable per-request local and cloud cost assumptions.
The logs endpoint exposes recent request-level decisions so users can inspect privacy detection, verification, escalation, final route choice, and estimated per-request cost directly.
Tool calling
RouteLabs now passes through OpenAI-style tools and tool_choice fields, which makes it more usable for agent loops and function-calling workflows.
Example:
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model":"route-auto",
"messages":[{"role":"user","content":"What is the weather in Chicago?"}],
"tools":[
{
"type":"function",
"function":{
"name":"get_weather",
"description":"Get weather for a city",
"parameters":{
"type":"object",
"properties":{"city":{"type":"string"}},
"required":["city"]
}
}
}
]
}'
If the model decides to call a tool, the response returns OpenAI-style tool_calls in the assistant message.
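A sketch of the consuming side of that loop, using the OpenAI Python SDK pointed at RouteLabs. The tool_calls handling follows the standard OpenAI response shape; get_weather and its canned result are hypothetical stand-ins:
import json
from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed-for-local-dev")
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What is the weather in Chicago?"}]
response = client.chat.completions.create(model="route-auto", messages=messages, tools=tools)
message = response.choices[0].message
# If the model requested a tool, run it and send the result back for a final answer.
if message.tool_calls:
    messages.append(message)
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        result = f"Sunny in {args['city']}"  # fake tool result for the sketch
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    followup = client.chat.completions.create(model="route-auto", messages=messages, tools=tools)
    print(followup.choices[0].message.content)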
Streaming
RouteLabs now supports OpenAI-style streaming on /v1/chat/completions when stream=true.
Example:
curl -N -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model":"route-auto",
"stream":true,
"messages":[{"role":"user","content":"Summarize RouteLabs Router in one sentence."}]
}'
This currently exposes an OpenAI-style SSE stream from the RouteLabs API layer so existing clients can consume streamed chunks normally.
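The same stream is consumable from the OpenAI Python SDK. A minimal sketch, assuming the server is on the default port:
from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed-for-local-dev")
# stream=True yields OpenAI-style chunks; print content deltas as they arrive.
stream = client.chat.completions.create(
    model="route-auto",
    messages=[{"role": "user", "content": "Summarize RouteLabs Router in one sentence."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()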
Structured outputs and passthrough
RouteLabs now passes through several common OpenAI chat request fields so existing clients can switch over with fewer changes:
- response_format
- temperature
- top_p
- max_tokens
- stop
- seed
- frequency_penalty
- presence_penalty
For local Ollama execution, OpenAI-style structured output requests are mapped into Ollama-compatible JSON mode or JSON-schema mode where possible.
Example:
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model":"route-auto",
"messages":[{"role":"user","content":"Return a JSON object with keys title and summary for RouteLabs Router."}],
"response_format":{"type":"json_object"},
"temperature":0.2,
"max_tokens":120
}'
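On the client side, JSON mode pairs naturally with json.loads on the returned content. A minimal sketch with the OpenAI SDK; the title and summary keys simply mirror the prompt above:
import json
from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed-for-local-dev")
response = client.chat.completions.create(
    model="route-auto",
    messages=[{"role": "user", "content": "Return a JSON object with keys title and summary for RouteLabs Router."}],
    response_format={"type": "json_object"},  # passed through to the backend
    temperature=0.2,
    max_tokens=120,
)
# JSON mode constrains the output to a single JSON object, so this parse should succeed.
doc = json.loads(response.choices[0].message.content)
print(doc["title"], "-", doc["summary"])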
Existing tool compatibility
RouteLabs now exposes the OpenAI-style endpoints many existing tools check first:
- /v1/chat/completions
- /v1/embeddings
- /v1/models
That makes it easier to place RouteLabs in front of:
- OpenAI Python SDK clients
- LangChain ChatOpenAI clients configured with base_url
- Open WebUI connections that validate providers through /models
Privacy-aware behavior
The router can now automatically prefer local execution for requests that look like:
- emails or phone-like identifiers
- SSN-like or account-like identifiers
- secret-like tokens
- code-like content
This first version uses lightweight heuristics so it is easy to run locally.
For a more advanced future detector, the project can integrate a model such as openai/privacy-filter.
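For intuition, a detector in this spirit can be a handful of regexes plus an any-match rule. The sketch below is purely illustrative and is not RouteLabs' actual implementation; every pattern here is an assumption:
import re
# Illustrative patterns only; NOT the project's real heuristics.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "secret_like": re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*\S+"),
    "code_like": re.compile(r"```|\bdef \w+\(|\breturn\b"),
}
def looks_private(text: str) -> bool:
    """Return True if any sensitive-looking pattern matches, forcing local routing."""
    return any(p.search(text) for p in PATTERNS.values())
print(looks_private("my key is api_key=sk-abc123"))  # True
print(looks_private("summarize this blog post"))     # False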
Run with Ollama
Start Ollama, make sure the configured model exists, then run the server:
ollama serve
ollama pull qwen3:4b
router start --reload
The default local provider configuration lives in config/router.yaml.
Hybrid mode example
With both Ollama and OPENAI_API_KEY configured:
- simple tasks usually run locally
- private tasks prefer local execution
- high-complexity tasks can route to the cloud
Example cloud-leaning route check:
curl -X POST http://127.0.0.1:8000/v1/route \
-H "Content-Type: application/json" \
-d '{"task":"design architecture for a multi-step agent","private":false}'
More examples
- curl walkthrough: examples/curl-quickstart.md
- product framing and common scenarios: examples/use-cases.md
- agent loop walkthrough: examples/agent-loop.md
Example Routing Philosophy
- send simple, low-risk tasks to local models first
- prefer local execution when privacy rules require it
- escalate to stronger models when verification or confidence checks fail
- keep the decision trace visible so routing can be audited and improved
Near-Term Roadmap
- richer verification strategies beyond heuristics
- policy packs for privacy and cost controls
- better task classification and prompt-shape heuristics
- latency-aware telemetry and routing feedback loops
- benchmark harness for local vs cloud trade-off analysis
More detail lives in ROADMAP.md.
License
This project uses the MIT License. See LICENSE.