RouteLabs Router
A local-first inference control plane for hybrid LLM routing.
RouteLabs Router is a local-first runtime that sits between your app and local/cloud LLMs.
It gives applications one endpoint that can decide:
- when to stay local
- when to use the cloud
- when privacy should override convenience
- which provider and model should handle the request
- why that decision was made
- when verification forced an escalation
- when privacy detection forced local execution
The goal is simple: route each step to the cheapest, fastest, safest model that can still be trusted.
Who This Is For
This repo is mainly for:
- AI app builders
- local-first power users
- agent and workflow developers
- teams experimenting with privacy-aware and cost-aware inference
If you want a polished end-user chat app, this is not that. If you want a runtime and routing layer you can plug into your own tools, this is exactly that.
What This Is
Think of RouteLabs as:
- a local runtime/server you run on your machine
- a Python client you can call from your app
- an OpenAI-compatible endpoint you can place in front of existing clients
It is not primarily:
- a browser extension
- a desktop UI
- a plugin marketplace product
Those may come later, but the current product is a runtime + middleware + API.
60-Second Quickstart
Install from PyPI, start the runtime, and send one request:
pip install routelabs-router
export OPENAI_API_KEY=your_api_key_here # optional, enables cloud execution
router start --reload
Then in another terminal:
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages":[{"role":"user","content":"Summarize this in one sentence: RouteLabs Router chooses between local and cloud models based on privacy, cost, latency, and task complexity."}],
"private":false
}'
Install
Recommended user install
pip install routelabs-router
router start
Contributor install
Clone the repo and install from source:
git clone https://github.com/routelabsai/router.git
cd router
conda create -n routelabs-router python=3.11 -y
conda activate routelabs-router
python -m pip install --upgrade pip setuptools wheel
pip install -e '.[dev]'
router start --reload
Why Use This
Most teams today have one of these problems:
- Ollama runs local models well, but it does not decide when a task should stay local versus escalate
- cloud gateways like LiteLLM and OpenRouter route across hosted APIs, but they are not built around local-first policy decisions
- chat apps can call models, but they usually hide the execution logic instead of exposing it
RouteLabs Router is the layer above those tools.
It is for teams who want:
- one API for hybrid local + cloud inference
- verification-aware escalation instead of naive “hard task -> expensive model”
- transparent routing decisions
- privacy-aware defaults
- automatic local preference for obvious sensitive or code-like content
- cost and latency visibility
- provider and model selection that can evolve over time
- a foundation for agentic step-level routing later
For the longer-term product thesis, see docs/VISION.md.
How You Use It
There are three practical ways to adopt RouteLabs today.
1. As a local runtime/server
Run:
router start --reload
Then point your tools to http://127.0.0.1:8000.
2. As a Python library client
Use the built-in client:
from routelabs_router import RouteLabsClient
client = RouteLabsClient("http://127.0.0.1:8000")
print(client.route("Summarize a short product description"))
3. As an OpenAI-compatible endpoint
If you already have code using an OpenAI-style client, point it at RouteLabs via base_url.
That is one of the easiest ways to adopt it without rewriting your app.
What It Looks Like
app / agent / extension
|
v
RouteLabs Router
|
+--> policy + task complexity
+--> privacy constraints
+--> provider selection
+--> verification hooks
|
+--> Ollama
+--> llama.cpp
+--> cloud provider
Quick Demo
Once the server is running, you can inspect decisions directly:
curl -X POST http://127.0.0.1:8000/v1/route \
-H "Content-Type: application/json" \
-d '{"task":"summarize a short product description","private":false}'
Expected shape:
{
"target": "local",
"provider": "ollama",
"model": "qwen3:4b",
"reason": "task is suitable for local-first execution",
"complexity": "medium",
"verify": true
}
What this tells you:
- the router chose local
- it selected ollama
- it picked a model
- it marked the request as worth verification
And you can send an OpenAI-style chat request:
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages":[{"role":"user","content":"Summarize this in one sentence: RouteLabs Router chooses between local and cloud models based on privacy, cost, latency, and task complexity."}],
"private":false
}'
If Ollama is running locally, that request executes against your configured local model.
If OPENAI_API_KEY is set, high-complexity requests can route through the configured OpenAI-compatible cloud provider.
The response includes a trace showing the initial route, verification result, and any escalation.
Positioning
| Tool | Core strength | What it does not solve |
|---|---|---|
| Ollama | Great local model runtime and API | Hybrid routing and policy decisions |
| LiteLLM | Cloud API normalization and routing | Local-first execution strategy |
| OpenRouter | Hosted provider access and fallback | On-device privacy-aware control plane |
| RouteLabs Router | Verification-aware local-first runtime with hybrid routing | Early-stage policy and provider coverage |
MVP scope
The first version focuses on a narrow but useful slice:
- OpenAI-compatible chat-style request handling
- local/cloud routing decisions
- adapter-based execution
- verification-aware fallback hooks
- structured telemetry showing why a route was chosen
This repository intentionally starts small. It is a control-plane foundation, not a full chat app.
Use Cases
- Local-first copilots that should only escalate when a task gets difficult
- Privacy-sensitive workflows where private data should never leave the device
- Browser or desktop assistants that need one middleware layer above multiple runtimes
- Agent systems that want future step-level routing instead of a single fixed model
Current Status
This is an early but usable product foundation. The repository already includes:
- project docs
- roadmap
- contribution guide
- Python project metadata
- FastAPI server and CLI
- YAML config loading
- route inspection endpoint
- OpenAI-style /v1/chat/completions endpoint
- real local execution through Ollama
- generic OpenAI-compatible cloud execution
- first verification-aware escalation loop
- stats endpoint for local/cloud/escalation visibility
- simple estimated cost savings in stats
- heuristic privacy detection for email/identifier/code-like content
- recent route logs for per-request inspection
- test coverage for routing and API behavior
- example config profiles
- example curl flows
Still early:
- verifiers are simple heuristics for now
- cost and latency dashboards are not implemented yet
- privacy detection is heuristic rather than model-based
- learning from user corrections is still future work
More Docs
- Product vision: docs/VISION.md
- Architecture: docs/ARCHITECTURE.md
- Roadmap: ROADMAP.md
- Contributor guide: CONTRIBUTING.md
- Release guide: docs/RELEASE.md
- PyPI trusted publishing: docs/TRUSTED_PUBLISHING.md
Setup And Usage
Prerequisites
- Python 3.11+
- conda recommended for the smoothest setup on macOS
Install from PyPI
conda create -n routelabs-router python=3.11 -y
conda activate routelabs-router
python -m pip install --upgrade pip
pip install routelabs-router
Install from source
Use this path if you want to contribute or modify the router itself.
git clone https://github.com/routelabsai/router.git
cd router
conda create -n routelabs-router python=3.11 -y
conda activate routelabs-router
python -m pip install --upgrade pip setuptools wheel
pip install -e '.[dev]'
Configure cloud execution
If you want cloud-routed requests to execute instead of returning a configuration error, set:
export OPENAI_API_KEY=your_api_key_here
The default cloud adapter uses the OpenAI-compatible endpoint configured in config/router.yaml.
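As a rough illustration of that local/cloud split, a config might look like the fragment below. The keys and values here are hypothetical assumptions for illustration only; the real schema lives in config/router.yaml and may differ.

```yaml
# Hypothetical sketch only -- check config/router.yaml for the actual schema.
providers:
  local:
    adapter: ollama
    base_url: http://127.0.0.1:11434
    model: qwen3:4b
  cloud:
    adapter: openai-compatible
    base_url: https://api.openai.com/v1
    model: gpt-4o-mini   # example model name; configure your own
```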
Why conda is the recommended path
During validation we hit two common issues that conda + Python 3.11 resolved cleanly:
- Python 3.9.7 was too old for this project
- older packaging tooling made editable installs unreliable
If you see requires a different Python: 3.9.7 not in '>=3.11', create the conda environment above and retry.
Run tests
pytest
Optional profile configs
The repo includes starter profiles in config/profiles/:
- balanced.yaml
- local-first.yaml
- privacy-first.yaml
Use one as your active config by copying or merging it into config/router.yaml.
Start the runtime
router start --reload
For explicit host or port overrides:
router start --host 0.0.0.0 --port 8000 --reload
Inspect a routing decision
router route --task "summarize a short product description" --private false
Test the API
Health check:
curl http://127.0.0.1:8000/healthz
Route inspection:
curl -X POST http://127.0.0.1:8000/v1/route \
-H "Content-Type: application/json" \
-d '{"task":"summarize a short product description","private":false}'
Stats endpoint:
curl http://127.0.0.1:8000/v1/stats
Recent route logs:
curl http://127.0.0.1:8000/v1/logs
Python client
You can also call the router from Python:
from routelabs_router import RouteLabsClient
client = RouteLabsClient("http://127.0.0.1:8000")
route = client.route("Summarize a short product description")
chat = client.chat(
[
{
"role": "user",
"content": "Summarize this in one sentence: RouteLabs Router chooses between local and cloud models based on privacy, cost, latency, and task complexity.",
}
]
)
stats = client.stats()
logs = client.logs()
There is also a runnable example in examples/python-client.py.
OpenAI-compatible drop-in example
If you already use the OpenAI Python SDK, you can point it at RouteLabs:
from openai import OpenAI
client = OpenAI(
base_url="http://127.0.0.1:8000/v1",
api_key="not-needed-for-local-dev",
)
response = client.chat.completions.create(
model="route-auto",
messages=[
{
"role": "user",
"content": "Summarize this in one sentence: RouteLabs Router chooses between local and cloud models based on privacy, cost, latency, and task complexity.",
}
],
)
See examples/openai-compatible-client.py.
You may need to install the OpenAI SDK separately:
pip install openai
The stats response includes simple estimated fields such as:
- estimated_total_cost_usd
- estimated_baseline_cloud_cost_usd
- estimated_cost_saved_usd
- estimated_cloud_requests_avoided
OpenAI-style chat completion:
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages":[{"role":"user","content":"Summarize this in one sentence: RouteLabs Router chooses between local and cloud models based on privacy, cost, latency, and task complexity."}],
"private":false
}'
If Ollama is running locally, the chat endpoint will execute against your configured local model.
If OPENAI_API_KEY is set, high-complexity tasks can execute through the configured OpenAI-compatible cloud provider. If it is not set, cloud-routed chat requests return a clear configuration error.
The stats endpoint is a first pass at cost and latency visibility: it shows how many requests stayed local, how many escalated, and how often verification failed.
It also includes a lightweight savings estimate based on configurable per-request local and cloud cost assumptions.
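The savings math is simple: compare actual spend against an all-cloud baseline. The sketch below shows that arithmetic; the per-request cost constants are illustrative assumptions, not RouteLabs defaults:

```python
# Sketch of an all-cloud-baseline savings estimate, in the spirit of the
# stats fields above. Cost constants here are illustrative assumptions.
def estimate_savings(local_requests: int, cloud_requests: int,
                     local_cost_usd: float = 0.0001,
                     cloud_cost_usd: float = 0.01) -> dict:
    """Compare actual spend against routing every request to the cloud."""
    total = local_requests + cloud_requests
    actual = local_requests * local_cost_usd + cloud_requests * cloud_cost_usd
    baseline = total * cloud_cost_usd  # cost if everything had gone to the cloud
    return {
        "estimated_total_cost_usd": round(actual, 4),
        "estimated_baseline_cloud_cost_usd": round(baseline, 4),
        "estimated_cost_saved_usd": round(baseline - actual, 4),
        "estimated_cloud_requests_avoided": local_requests,
    }

print(estimate_savings(local_requests=90, cloud_requests=10))
```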
The logs endpoint exposes recent request-level decisions so users can inspect privacy detection, verification, escalation, final route choice, and estimated per-request cost directly.
Privacy-aware behavior
The router can automatically prefer local execution for requests that appear to contain:
- emails or phone-like identifiers
- SSN-like or account-like identifiers
- secret-like tokens
- code-like content
This first version uses lightweight heuristics so it is easy to run locally.
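Heuristics of that kind can be as simple as a handful of regular expressions. The patterns below are an illustrative sketch in the same spirit; the project's actual detection rules may differ:

```python
import re

# Illustrative privacy heuristics; the router's real patterns may differ.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "secret_like": re.compile(r"\b(?:sk|api|token)[-_][A-Za-z0-9]{16,}\b"),
    "code_like": re.compile(r"\bdef \w+\(|\bimport \w+", re.MULTILINE),
}

def looks_private(text: str) -> list[str]:
    """Return the heuristic categories that match the text."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

print(looks_private("Contact alice@example.com about SSN 123-45-6789"))
```

A request matching any category would be pinned to local execution regardless of its complexity score.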
For a more advanced future detector, the project can integrate a model such as openai/privacy-filter.
Run with Ollama
Start Ollama, make sure the configured model exists, then run the server:
ollama serve
ollama pull qwen3:4b
router start --reload
The default local provider configuration lives in config/router.yaml.
Hybrid mode example
With both Ollama and OPENAI_API_KEY configured:
- simple tasks usually run locally
- private tasks prefer local execution
- high-complexity tasks can route to the cloud
Example cloud-leaning route check:
curl -X POST http://127.0.0.1:8000/v1/route \
-H "Content-Type: application/json" \
-d '{"task":"design architecture for a multi-step agent","private":false}'
More examples
- curl walkthrough: examples/curl-quickstart.md
- product framing and common scenarios: examples/use-cases.md
Example Routing Philosophy
- send simple, low-risk tasks to local models first
- prefer local execution when privacy rules require it
- escalate to stronger models when verification or confidence checks fail
- keep the decision trace visible so routing can be audited and improved
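The escalation part of that philosophy can be sketched as a loop over a cost-ordered model ladder. Everything here is a stand-in for illustration: the ladder names, the `execute` and `verify` callables, and the trace shape are assumptions, not the project's real internals:

```python
# Sketch of a verification-aware escalation loop (illustrative only).
LADDER = ["local/qwen3:4b", "cloud/gpt-4o-mini"]  # cheapest first; names illustrative

def run_with_escalation(task: str, execute, verify) -> dict:
    """Try each model in cost order until an answer passes verification."""
    trace = []
    for model in LADDER:
        answer = execute(model, task)
        ok = verify(task, answer)
        trace.append({"model": model, "verified": ok})
        if ok:
            return {"answer": answer, "model": model, "trace": trace}
    # Nothing passed verification: return the last attempt, flagged.
    return {"answer": answer, "model": model, "trace": trace, "unverified": True}

# Toy stand-ins: the "local" answer fails verification, forcing escalation.
result = run_with_escalation(
    "hard task",
    execute=lambda model, task: f"{model} answer",
    verify=lambda task, answer: answer.startswith("cloud"),
)
print(result["model"])
```

Keeping the full trace, rather than just the final answer, is what makes the decision auditable afterwards.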
Near-Term Roadmap
- richer verification strategies beyond heuristics
- policy packs for privacy and cost controls
- better task classification and prompt-shape heuristics
- latency-aware telemetry and routing feedback loops
- benchmark harness for local vs cloud trade-off analysis
More detail lives in ROADMAP.md.
License
This project is released under the MIT License. See LICENSE.