Local-first LLM routing gateway — use model='neuralbroker' and it routes intelligently between local Ollama, discovered subscriptions (Claude Pro, Codex), and paid API fallbacks.
NeuralBroker is a local-first LLM routing daemon that sits between your AI tools (Claude Code, Cursor, Codex, Cline) and your models. It exposes a single OpenAI-compatible endpoint and a virtual model called neuralbroker — tools talk to that name, and NeuralBroker silently picks the best backend for every request.
The idea is simple: why pay per token on the API when you already pay $20/month for Claude Pro? NeuralBroker discovers your existing subscriptions, uses them for hard tasks, and sends easy tasks to your local GPU for free.
How it works
Your IDE / Tool
      │  model: "neuralbroker"
      ▼
┌────────────────────────────────────┐
│            NeuralBroker            │
│                                    │
│ 1. Score prompt (15 dims, <1ms)    │
│ 2. Classify → SIMPLE / MEDIUM /    │
│               COMPLEX / REASONING  │
│ 3. Pick backend:                   │
│    SIMPLE/MEDIUM → Local Ollama    │
│    COMPLEX/REASONING →             │
│      ① Discovered subscription     │  ← Claude Pro / Codex / ChatGPT
│      ② Paid API key fallback       │  ← Groq / OpenAI / Anthropic
└────────────────────────────────────┘
      │
      ▼
Best model for the job
(you never choose manually again)
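The backend-selection step in the diagram can be sketched in a few lines of Python. This is only an illustration of the ordering described above, not NeuralBroker's actual code: the `Tier` enum, the `pick_backend` function, its parameters, and the backend labels are all hypothetical names chosen for this example.

```python
from enum import Enum


class Tier(Enum):
    SIMPLE = "SIMPLE"
    MEDIUM = "MEDIUM"
    COMPLEX = "COMPLEX"
    REASONING = "REASONING"


def pick_backend(tier, local_available=True, subscription=None, api_key_provider=None):
    """Return a backend label for a tier (hypothetical helper).

    Only the ordering (local, then subscription, then paid API) comes
    from the README; the names and labels are assumptions.
    """
    # SIMPLE and MEDIUM requests stay on the local model when one is available.
    if tier in (Tier.SIMPLE, Tier.MEDIUM) and local_available:
        return "local-ollama"
    # Harder requests, or requests without a local model, prefer a discovered subscription.
    if subscription is not None:
        return f"subscription:{subscription}"
    # A paid API key is the next choice.
    if api_key_provider is not None:
        return f"api:{api_key_provider}"
    # Assumption: with nothing else configured, use the local model if it exists.
    if local_available:
        return "local-ollama"
    raise RuntimeError("no backend available")
```

For example, `pick_backend(Tier.COMPLEX, subscription="claude-pro", api_key_provider="groq")` returns the subscription label, because a discovered subscription is preferred over a paid key.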
The 3-Tier Cost Strategy
| Task Tier | Example | Backend | Your Cost |
|---|---|---|---|
| `SIMPLE` | "What is the capital of France?" | Local Ollama (llama3.2:1b) | $0.00 |
| `MEDIUM` | "Write a short cover letter" | Local Ollama (qwen2.5:7b) | $0.00 |
| `COMPLEX` | "Refactor this 500-line module" | Claude Pro subscription | $0.00 (already paying) |
| `REASONING` | "Prove this math theorem step by step" | Claude Pro subscription | $0.00 (already paying) |
| Fallback | No local + no subscription | Groq/OpenAI API | ~$0.002 |
Quick Start
pip install neuralbrok
neuralbrok setup # Detect your GPU and generate config
neuralbrok start # Start the gateway on http://localhost:8000
Point any OpenAI-compatible tool to http://localhost:8000/v1 with model=neuralbroker and you're done.
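For tools you script yourself, the same call can be made from Python with only the standard library. This is a minimal sketch: `build_chat_request` and `ask` are hypothetical helper names, and `ask` assumes the gateway returns the standard OpenAI chat completion response shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # default address from the Quick Start


def build_chat_request(prompt, base_url=BASE_URL):
    """Build (but do not send) a chat completion request for the virtual model."""
    body = {
        "model": "neuralbroker",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request; this needs a running gateway."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        reply = json.load(resp)
    # Assumes the standard OpenAI chat completion response shape.
    return reply["choices"][0]["message"]["content"]
```

Keeping request construction separate from sending makes the payload easy to inspect without a gateway running.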
Features
🧠 Intelligent Routing (No Config Required)
- 15-dimension prompt scoring classifies every request in under 1ms — no external LLM needed for routing decisions
- NeuralFit hardware scoring picks the best local model for your specific GPU and VRAM capacity
- Virtual model name: set `model=neuralbroker` once, never touch it again
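To give a feel for how a prompt can be classified without calling an LLM, here is a toy keyword-and-length heuristic. It is not NeuralBroker's scorer: the README does not list the real 15 dimensions, so the keyword lists, weights, thresholds, and function names below are all assumptions made up for this sketch.

```python
# Hypothetical keyword lists; the README does not enumerate the real 15 dimensions.
REASONING_WORDS = ("prove", "derive", "theorem", "step by step")
CODE_WORDS = ("refactor", "debug", "implement", "module")
WRITING_WORDS = ("write", "draft", "summarize")


def _has_any(text, words):
    """1.0 if any keyword occurs in the text, else 0.0."""
    return float(any(w in text for w in words))


def score_prompt(prompt):
    """Weighted sum of four cheap text features (weights are illustrative)."""
    text = prompt.lower()
    length = min(len(text.split()) / 200, 1.0)  # longer prompts score higher
    return (
        0.1 * length
        + 0.2 * _has_any(text, WRITING_WORDS)
        + 0.4 * _has_any(text, CODE_WORDS)
        + 0.4 * _has_any(text, REASONING_WORDS)
    )


def classify(prompt):
    """Map a prompt to one of the four tiers (thresholds are illustrative)."""
    if _has_any(prompt.lower(), REASONING_WORDS):
        return "REASONING"
    score = score_prompt(prompt)
    if score >= 0.4:
        return "COMPLEX"
    if score >= 0.2:
        return "MEDIUM"
    return "SIMPLE"
```

Every feature here is a substring check or a word count, which is why this style of scoring can run without any model in the loop.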
💸 Subscription Inheritance
- Auto-discovers Claude Code OAuth sessions, Codex auth, and env-based API keys on startup
- Inherited subscriptions are treated as zero marginal cost — they're preferred over paid API keys for high-tier tasks
- Works with: Claude Pro/Max, GitHub Copilot (Codex), ChatGPT Plus
🖥️ Local-First
- Ollama and llama.cpp supported out of the box
- VRAM-aware: automatically avoids routing to local when VRAM is critically low
- Models are ranked by NeuralFit composite score (quality, speed, context fit, hardware fit)
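The ranking and the VRAM guard can be pictured as follows. This is a sketch under stated assumptions, not the NeuralFit implementation: the equal weighting, the `hardware_fit` formula, the `min_free_gb` threshold, and the per-model numbers are invented for illustration. Only the four factor names come from the list above.

```python
from dataclasses import dataclass


@dataclass
class LocalModel:
    name: str
    vram_gb: float      # memory the model needs (illustrative)
    quality: float      # sub-scores in [0, 1] (illustrative)
    speed: float
    context_fit: float


def composite_score(model, free_vram_gb):
    """Equal-weight average of quality, speed, context fit, and hardware fit."""
    if free_vram_gb <= 0:
        hardware_fit = 0.0
    else:
        # More headroom left after loading the model means a better fit.
        hardware_fit = max(0.0, 1.0 - model.vram_gb / free_vram_gb)
    return (model.quality + model.speed + model.context_fit + hardware_fit) / 4


def rank_models(models, free_vram_gb, min_free_gb=1.0):
    """Return the models that fit, best first; an empty list means skip local."""
    if free_vram_gb < min_free_gb:  # VRAM critically low: do not route locally
        return []
    fitting = [m for m in models if m.vram_gb <= free_vram_gb]
    return sorted(fitting, key=lambda m: composite_score(m, free_vram_gb), reverse=True)
```

With these made-up numbers, a larger model wins when there is room for it, and a smaller one takes over as free VRAM shrinks.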
🔌 One-Command IDE Integration
neuralbrok setup claude-code # Wires NeuralBroker into Claude Code
neuralbrok setup cursor # Wires NeuralBroker into Cursor
neuralbrok setup codex # Wires NeuralBroker into Codex CLI
neuralbrok setup cline # Wires NeuralBroker into Cline (VS Code)
Supports 20+ tools: Claude Code, Cursor, Cline, GitHub Copilot, Gemini CLI, OpenCode, Warp, Codex, Amp, Kimi Code, Firebender, Windsurf, and more.
📡 MCP Server
NeuralBroker ships with an MCP server that exposes routing intelligence directly to Claude Code and Cursor:
neuralbrok mcp # Start MCP server on stdio
Available MCP tools:
- `nb_route_preview`: Preview routing tier for any prompt
- `nb_get_active_auth`: See which subscriptions are currently discovered
Configuration
NeuralBroker auto-detects your hardware and generates a config on first run. The config lives at ~/.neuralbrok/config.yaml.
local_nodes:
  - name: local
    runtime: ollama
    host: localhost:11434
routing:
  default_mode: smart  # smart | cost | speed | fallback
  # Optional: Specify which models are allowed for smart mode
  # allowed_models:
  #   - qwen2.5:7b
  #   - llama3.2:3b
  # Optional: Cloud fallback models (Ollama pull tags)
  # ollama_cloud_models:
  #   - claude-sonnet-4-5
Routing Modes
| Mode | Behavior |
|---|---|
| `smart` | 15-dim scoring decides local vs cloud per-request (default) |
| `cost` | Always prefer cheapest backend |
| `speed` | Always prefer lowest-latency backend |
| `fallback` | Try local first; spill to cloud only on failure |
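The four modes amount to four different selection rules over the same set of backends. The sketch below illustrates that idea only: the `Backend` fields, the cost and latency figures, and the `needs_cloud` flag (a stand-in for the per-request scoring decision) are hypothetical, and "failure" is modelled simply as a backend being marked unhealthy.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    is_local: bool
    cost_per_request: float  # illustrative USD figure
    latency_ms: float        # illustrative latency
    healthy: bool = True


def choose(mode, backends, needs_cloud=False):
    """Pick a backend for one request under the given routing mode."""
    usable = [b for b in backends if b.healthy]
    if not usable:
        raise RuntimeError("no healthy backend")
    if mode == "cost":
        return min(usable, key=lambda b: b.cost_per_request)
    if mode == "speed":
        return min(usable, key=lambda b: b.latency_ms)
    local = [b for b in usable if b.is_local]
    cloud = [b for b in usable if not b.is_local]
    if mode == "fallback":
        # Local first; cloud only when no local backend is healthy.
        return (local or cloud)[0]
    if mode == "smart":
        # needs_cloud stands in for the per-request scoring decision.
        preferred = cloud if needs_cloud else local
        return (preferred or usable)[0]
    raise ValueError(f"unknown mode: {mode}")
```

Note that `cost` and `speed` ignore prompt difficulty entirely, while `smart` is the only mode that looks at the request itself.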
Subscription Discovery
NeuralBroker automatically scans for auth on startup. View what it found:
curl http://localhost:8000/nb/discovered
To disable auto-discovery:
NB_DISABLE_AUTO_DISCOVERY=1 neuralbrok start
API Reference
NeuralBroker is fully OpenAI-compatible.
# Chat completions — use "neuralbroker" to activate smart routing
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "neuralbroker",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
# List available models
curl http://localhost:8000/v1/models
# Check routing stats
curl http://localhost:8000/nb/stats
# Last 500 routing decisions
curl http://localhost:8000/nb/routing-log
# Live hardware info
curl http://localhost:8000/nb/hardware
# Change routing mode at runtime
curl -X POST http://localhost:8000/nb/mode \
  -H "Content-Type: application/json" \
  -d '{"mode": "speed"}'
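The management endpoints can also be driven from a script. The sketch below wraps the mode change in a small Python helper that rejects unknown mode names before any request is made. The helper names are hypothetical, and `get_json` assumes the `/nb/*` endpoints return JSON, which the README does not state explicitly.

```python
import json
import urllib.request

# The four mode names listed under Routing Modes.
VALID_MODES = {"smart", "cost", "speed", "fallback"}


def build_mode_request(mode, base_url="http://localhost:8000"):
    """Build the POST /nb/mode request shown in the curl example."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return urllib.request.Request(
        f"{base_url}/nb/mode",
        data=json.dumps({"mode": mode}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_json(path, base_url="http://localhost:8000"):
    """GET an endpoint such as /nb/stats; assumes the body is JSON."""
    with urllib.request.urlopen(f"{base_url}{path}") as resp:
        return json.load(resp)
```

With a gateway running, `urllib.request.urlopen(build_mode_request("speed"))` would send the same request as the curl command above.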
Supported Providers
| Local | Cloud (API Key) | Cloud (Subscription Auto-Discovered) |
|---|---|---|
| Ollama | Groq | Claude Pro / Max (Claude Code) |
| llama.cpp | Together AI | GitHub Copilot (Codex) |
| LM Studio | OpenAI | ChatGPT Plus |
| | Anthropic API | |
| | Gemini | |
| | Mistral | |
| | Perplexity | |
| | DeepSeek | |
| | + 15 more | |
Observability
- Dashboard: `http://localhost:8000/dashboard` (live routing log, VRAM gauge, per-provider stats)
- Prometheus: `http://localhost:8000/metrics`
- Grafana: Pre-built dashboards in `grafana/`
Security Note
NeuralBroker inherits auth tokens from tools already installed and authenticated on your machine. It never sends your credentials to external services — tokens are used directly against their respective provider APIs. You remain in full control.
License
MIT © NeuralBroker contributors