Local-first LLM routing gateway — use model='neuralbroker' and it routes intelligently between local Ollama, discovered subscriptions (Claude Pro, Codex), and paid API fallbacks.

NeuralBroker

The intelligent LLM gateway that makes your $20/mo subscriptions work everywhere.

Requires Python 3.10+. License: MIT.


NeuralBroker is a local-first LLM routing daemon that sits between your AI tools (Claude Code, Cursor, Codex, Cline) and your models. It exposes a single OpenAI-compatible endpoint and a virtual model called neuralbroker — tools talk to that name, and NeuralBroker silently picks the best backend for every request.

The idea is simple: why pay per token on the API when you already pay $20/month for Claude Pro? NeuralBroker discovers your existing subscriptions, uses them for hard tasks, and sends easy tasks to your local GPU at no extra cost.


How it works

Your IDE / Tool
       │  model: "neuralbroker"
       ▼
 ┌─────────────────────────────────┐
 │         NeuralBroker            │
 │                                 │
 │  1. Score prompt (15 dims, <1ms)│
 │  2. Classify → SIMPLE/MEDIUM/   │
 │                COMPLEX/REASONING│
 │  3. Pick backend:               │
 │     SIMPLE/MEDIUM → Local Ollama│
 │     COMPLEX/REASONING →         │
 │       ① Discovered subscription │  ← Claude Pro / Codex / ChatGPT
 │       ② Paid API key fallback   │  ← Groq / OpenAI / Anthropic
 └─────────────────────────────────┘
       │
       ▼
 Best model for the job
 (you never choose manually again)
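
The backend choice in step 3 can be sketched in a few lines of Python. This is only an illustration of the flow in the diagram, not NeuralBroker's actual code: the function name `pick_backend`, its parameters, and the returned labels are hypothetical, and the final "fall back to local" branch is an assumption the diagram does not cover.

```python
from typing import Sequence

LOCAL_TIERS = {"SIMPLE", "MEDIUM"}
CLOUD_TIERS = {"COMPLEX", "REASONING"}


def pick_backend(
    tier: str,
    local_available: bool,
    subscriptions: Sequence[str],
    api_key_providers: Sequence[str],
) -> str:
    """Return a label for the backend that should serve a request (hypothetical)."""
    if tier not in LOCAL_TIERS | CLOUD_TIERS:
        raise ValueError(f"unknown tier: {tier}")

    # SIMPLE/MEDIUM requests stay on the local runtime when one is available.
    if tier in LOCAL_TIERS and local_available:
        return "local:ollama"

    # Otherwise prefer a discovered subscription over a paid API key.
    if subscriptions:
        return f"subscription:{subscriptions[0]}"
    if api_key_providers:
        return f"api:{api_key_providers[0]}"

    # Assumption: with no cloud backend at all, use local if it exists.
    if local_available:
        return "local:ollama"
    raise RuntimeError("no backend available")
```

For example, `pick_backend("COMPLEX", True, ["claude-pro"], ["groq"])` selects the subscription, while the same call with an empty subscription list selects the Groq API key.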

The 3-Tier Cost Strategy

| Task Tier | Example | Backend | Your Cost |
|-----------|---------|---------|-----------|
| SIMPLE | "What is the capital of France?" | Local Ollama (llama3.2:1b) | $0.00 |
| MEDIUM | "Write a short cover letter" | Local Ollama (qwen2.5:7b) | $0.00 |
| COMPLEX | "Refactor this 500-line module" | Claude Pro subscription | $0.00 (already paying) |
| REASONING | "Prove this math theorem step by step" | Claude Pro subscription | $0.00 (already paying) |
| Fallback | No local + no subscription | Groq/OpenAI API | ~$0.002 |

Quick Start

pip install neuralbrok
neuralbrok setup    # Detect your GPU and generate config
neuralbrok start    # Start the gateway on http://localhost:8000

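Point any OpenAI-compatible tool at http://localhost:8000/v1 with model=neuralbroker and you're done. Note the spelling: the package and CLI are named neuralbrok, while the virtual model is named neuralbroker.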
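
If you would rather call the gateway from a script than from an IDE, a minimal sketch using only the Python standard library could look like the following. The helper names `build_chat_request` and `chat` are illustrative, and the response parsing assumes the gateway returns the standard OpenAI chat-completions shape, which the "OpenAI-compatible" claim suggests but this README does not spell out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /chat/completions using the virtual model name."""
    payload = {
        "model": "neuralbroker",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt and return the reply text (assumes OpenAI response shape)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the gateway running, `chat("Hello!")` should return the reply text from whichever backend was selected.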


Features

🧠 Intelligent Routing (No Config Required)

  • 15-dimension prompt scoring classifies every request in under 1ms — no external LLM needed for routing decisions
  • NeuralFit hardware scoring picks the best local model for your specific GPU and VRAM capacity
  • Virtual model name — set model=neuralbroker once, never touch it again

💸 Subscription Inheritance

  • Auto-discovers Claude Code OAuth sessions, Codex auth, and env-based API keys on startup
  • Inherited subscriptions are treated as zero marginal cost — they're preferred over paid API keys for high-tier tasks
  • Works with: Claude Pro/Max, GitHub Copilot (Codex), ChatGPT Plus

🖥️ Local-First

  • Ollama and llama.cpp supported out of the box
  • VRAM-aware: automatically avoids routing to local when VRAM is critically low
  • Models are ranked by NeuralFit composite score (quality, speed, context fit, hardware fit)
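
To make the VRAM check and the composite ranking concrete, here is a rough sketch. It is not the NeuralFit algorithm: the `LocalModel` fields, the weights, and the 1 GB headroom are all invented for illustration, and the real scoring may combine the four factors quite differently.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class LocalModel:
    name: str
    vram_gb: float       # memory needed to load the model (illustrative value)
    quality: float       # each sub-score is assumed to lie in [0, 1]
    speed: float
    context_fit: float
    hardware_fit: float


# Hypothetical weights; the real NeuralFit weighting is not documented here.
WEIGHTS = {"quality": 0.4, "speed": 0.2, "context_fit": 0.2, "hardware_fit": 0.2}


def composite_score(model: LocalModel) -> float:
    """Weighted sum of the four sub-scores."""
    return sum(weight * getattr(model, field) for field, weight in WEIGHTS.items())


def rank_local_models(
    models: Sequence[LocalModel], free_vram_gb: float, headroom_gb: float = 1.0
) -> List[LocalModel]:
    """Drop models that do not fit in free VRAM, then sort best-first."""
    usable = [m for m in models if m.vram_gb + headroom_gb <= free_vram_gb]
    return sorted(usable, key=composite_score, reverse=True)
```

An empty result would correspond to the "VRAM is critically low" case, where the request is routed away from local.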

🔌 One-Command IDE Integration

neuralbrok setup claude-code   # Wires NeuralBroker into Claude Code
neuralbrok setup cursor        # Wires NeuralBroker into Cursor
neuralbrok setup codex         # Wires NeuralBroker into Codex CLI
neuralbrok setup cline         # Wires NeuralBroker into Cline (VS Code)

Supports 20+ tools: Claude Code, Cursor, Cline, GitHub Copilot, Gemini CLI, OpenCode, Warp, Codex, Amp, Kimi Code, Firebender, Windsurf, and more.

📡 MCP Server

NeuralBroker ships with an MCP server that exposes routing intelligence directly to Claude Code and Cursor:

neuralbrok mcp   # Start MCP server on stdio

Available MCP tools:

  • nb_route_preview — Preview routing tier for any prompt
  • nb_get_active_auth — See which subscriptions are currently discovered
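
MCP clients such as Claude Code normally issue these calls for you. For reference, a tool invocation is a JSON-RPC 2.0 `tools/call` message sent as one line of JSON over stdio, after the usual MCP `initialize` handshake. The sketch below only builds such a message; the `prompt` argument name is an assumption, since this README does not document the tools' input schemas.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)


def tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 message that invokes an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def encode_line(message: Dict[str, Any]) -> str:
    """Serialize for the stdio transport: one JSON object per line."""
    return json.dumps(message) + "\n"


# "prompt" is a hypothetical argument name for nb_route_preview.
preview = tool_call("nb_route_preview", {"prompt": "Refactor this 500-line module"})
```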

Configuration

NeuralBroker auto-detects your hardware and generates a config on first run. The config lives at ~/.neuralbrok/config.yaml.

local_nodes:
  - name: local
    runtime: ollama
    host: localhost:11434

routing:
  default_mode: smart   # smart | cost | speed | fallback

# Optional: Specify which models are allowed for smart mode
# allowed_models:
#   - qwen2.5:7b
#   - llama3.2:3b

# Optional: Cloud fallback models (Ollama pull tags)
# ollama_cloud_models:
#   - claude-sonnet-4-5

Routing Modes

| Mode | Behavior |
|------|----------|
| smart | 15-dim scoring decides local vs cloud per-request (default) |
| cost | Always prefer cheapest backend |
| speed | Always prefer lowest-latency backend |
| fallback | Try local first; spill to cloud only on failure |
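
One way to picture the four modes is as different orderings over the same list of candidate backends. The sketch below is a simplified model, not the real router: the `Backend` fields and the `order_backends` function are hypothetical, and `smart` is reduced here to "local for SIMPLE/MEDIUM, cloud otherwise".

```python
from dataclasses import dataclass
from typing import List, Sequence

MODES = ("smart", "cost", "speed", "fallback")


@dataclass(frozen=True)
class Backend:
    name: str
    is_local: bool
    cost_per_request: float  # illustrative figure in USD
    latency_ms: float        # illustrative figure


def order_backends(
    mode: str, backends: Sequence[Backend], tier: str = "SIMPLE"
) -> List[Backend]:
    """Return backends in the order they would be tried under the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "cost":
        return sorted(backends, key=lambda b: b.cost_per_request)
    if mode == "speed":
        return sorted(backends, key=lambda b: b.latency_ms)
    if mode == "fallback":
        # Local backends first; cloud is reached only if local attempts fail.
        return sorted(backends, key=lambda b: not b.is_local)
    # smart: the prompt tier decides whether local or cloud comes first.
    prefer_local = tier in {"SIMPLE", "MEDIUM"}
    return sorted(backends, key=lambda b: b.is_local != prefer_local)
```

Because `sorted` is stable, backends that tie on the sort key keep their configured order.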

Subscription Discovery

NeuralBroker automatically scans for auth on startup. View what it found:

curl http://localhost:8000/nb/discovered

To disable auto-discovery:

NB_DISABLE_AUTO_DISCOVERY=1 neuralbrok start
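
The env-based part of discovery can be pictured roughly as follows. This is a sketch under assumptions: the variable names are the ones the OpenAI, Anthropic, and Groq SDKs conventionally read, but which variables NeuralBroker actually scans is not stated in this README, and `discover_env_keys` is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional

# Conventional SDK variable names; the list NeuralBroker checks may differ.
ENV_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
}


def discover_env_keys(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return provider names whose API key variable is set and non-empty."""
    env = os.environ if environ is None else environ
    # Honor the documented opt-out switch.
    if env.get("NB_DISABLE_AUTO_DISCOVERY") == "1":
        return []
    # Report provider names only; never return or log the key values.
    return [provider for provider, var in ENV_KEYS.items() if env.get(var)]
```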

API Reference

NeuralBroker exposes OpenAI-compatible endpoints under /v1, plus its own management endpoints under /nb.

# Chat completions — use "neuralbroker" to activate smart routing
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "neuralbroker",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# List available models
curl http://localhost:8000/v1/models

# Check routing stats
curl http://localhost:8000/nb/stats

# Last 500 routing decisions
curl http://localhost:8000/nb/routing-log

# Live hardware info
curl http://localhost:8000/nb/hardware

# Change routing mode at runtime
curl -X POST http://localhost:8000/nb/mode \
  -H "Content-Type: application/json" \
  -d '{"mode": "speed"}'
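
The management endpoints can also be scripted. The following sketch mirrors the mode-switch curl call using the Python standard library and validates the mode name before sending. The helper names are illustrative, and `get_json` assumes the /nb endpoints return JSON, which this README does not confirm.

```python
import json
import urllib.request
from typing import Any

BASE = "http://localhost:8000"
VALID_MODES = {"smart", "cost", "speed", "fallback"}


def set_mode_request(mode: str, base: str = BASE) -> urllib.request.Request:
    """Build the POST /nb/mode request, rejecting unknown mode names early."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return urllib.request.Request(
        f"{base}/nb/mode",
        data=json.dumps({"mode": mode}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_json(path: str, base: str = BASE) -> Any:
    """GET an endpoint such as /nb/stats (assumes a JSON response body)."""
    with urllib.request.urlopen(f"{base}{path}", timeout=10) as resp:
        return json.load(resp)
```

To apply a change, pass the request to `urllib.request.urlopen`, for example `urllib.request.urlopen(set_mode_request("speed"))`.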

Supported Providers

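| Local | Cloud (API Key) | Cloud (Subscription Auto-Discovered) |
|-------|-----------------|--------------------------------------|
| Ollama | Groq | Claude Pro / Max (Claude Code) |
| llama.cpp | Together AI | GitHub Copilot (Codex) |
| LM Studio | OpenAI | ChatGPT Plus |
| | Anthropic API | |
| | Gemini | |
| | Mistral | |
| | Perplexity | |
| | DeepSeek | |
| | + 15 more | |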

Observability

  • Dashboard: http://localhost:8000/dashboard — Live routing log, VRAM gauge, per-provider stats
  • Prometheus: http://localhost:8000/metrics
  • Grafana: Pre-built dashboards in grafana/

Security Note

NeuralBroker inherits auth tokens from tools already installed and authenticated on your machine. Each token is sent only to the API of the provider that issued it, never to any other service. You remain in full control.


License

MIT © NeuralBroker contributors
