Local-first LLM routing gateway: set model='neuralbroker' and each request is routed between local Ollama, discovered subscriptions (Claude Pro, Codex), and paid API fallbacks.

NeuralBroker

The intelligent LLM gateway that makes your $20/mo subscriptions work everywhere.

Python 3.10+ · License: MIT


NeuralBroker is a local-first LLM routing daemon that sits between your AI tools (Claude Code, Cursor, Codex, Cline) and your models. It exposes a single OpenAI-compatible endpoint and a virtual model called neuralbroker — tools talk to that name, and NeuralBroker silently picks the best backend for every request.

The idea is simple: why pay per token on the API when you already pay $20/month for Claude Pro? NeuralBroker discovers your existing subscriptions, uses them for hard tasks, and sends easy tasks to your local GPU for free.


How it works

Your IDE / Tool
       │  model: "neuralbroker"
       ▼
 ┌─────────────────────────────────┐
 │         NeuralBroker            │
 │                                 │
 │  1. Score prompt (15 dims, <1ms)│
 │  2. Classify → SIMPLE/MEDIUM/   │
 │                COMPLEX/REASONING│
 │  3. Pick backend:               │
 │     SIMPLE/MEDIUM → Local Ollama│
 │     COMPLEX/REASONING →         │
 │       ① Discovered subscription │  ← Claude Pro / Codex / ChatGPT
 │       ② Paid API key fallback   │  ← Groq / OpenAI / Anthropic
 └─────────────────────────────────┘
       │
       ▼
 Best model for the job
 (you never choose manually again)
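
Step 3 of the diagram can be sketched in a few lines of Python. This is a minimal illustration, not NeuralBroker's actual code: the names Tier and pick_backend, the returned labels, and the behavior when a low-tier request finds no local runtime are all assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    SIMPLE = "SIMPLE"
    MEDIUM = "MEDIUM"
    COMPLEX = "COMPLEX"
    REASONING = "REASONING"


def pick_backend(tier: Tier, local_available: bool,
                 subscription: Optional[str] = None,
                 api_provider: Optional[str] = None) -> str:
    """Hypothetical helper mirroring step 3 of the diagram."""
    # SIMPLE and MEDIUM prompts stay on the local runtime when one is up.
    if tier in (Tier.SIMPLE, Tier.MEDIUM) and local_available:
        return "local:ollama"
    # Otherwise prefer a discovered subscription (zero marginal cost).
    if subscription is not None:
        return f"subscription:{subscription}"
    # Then fall back to a paid API key.
    if api_provider is not None:
        return f"api:{api_provider}"
    # Assumption: with nothing usable, the request fails loudly.
    raise RuntimeError("no backend available for this request")
```

In this sketch a COMPLEX prompt with a discovered subscription never reaches the paid API, which matches the ordering ① then ② in the diagram.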

The 3-Tier Cost Strategy

| Task Tier | Example | Backend | Your Cost |
|-----------|---------|---------|-----------|
| SIMPLE | "What is the capital of France?" | Local Ollama (llama3.2:1b) | $0.00 |
| MEDIUM | "Write a short cover letter" | Local Ollama (qwen2.5:7b) | $0.00 |
| COMPLEX | "Refactor this 500-line module" | Claude Pro subscription | $0.00 (already paying) |
| REASONING | "Prove this math theorem step by step" | Claude Pro subscription | $0.00 (already paying) |
| Fallback | No local + no subscription | Groq/OpenAI API | ~$0.002 |

Quick Start

pip install neuralbrok
neuralbrok setup    # Detect your GPU and generate config
neuralbrok start    # Start the gateway on http://localhost:8000

Point any OpenAI-compatible tool to http://localhost:8000/v1 with model=neuralbroker and you're done.
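
For tools you script yourself, the same call can be made from Python with only the standard library. This is a sketch under two assumptions: the gateway is running on the default address shown above, and the response follows the usual OpenAI chat-completions shape (choices[0].message.content). The function names build_chat_request and ask are made up for the example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # default address from Quick Start


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a chat-completions request that targets the virtual model."""
    body = json.dumps({
        "model": "neuralbroker",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt and return the reply text (assumes an OpenAI-style response)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With the gateway running, ask("Hello!") should return the reply text from whichever backend was chosen for that request.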


Features

🧠 Intelligent Routing (No Config Required)

  • 15-dimension prompt scoring classifies every request in under 1ms — no external LLM needed for routing decisions
  • NeuralFit hardware scoring picks the best local model for your specific GPU and VRAM capacity
  • Virtual model name — set model=neuralbroker once, never touch it again

💸 Subscription Inheritance

  • Auto-discovers Claude Code OAuth sessions, Codex auth, and env-based API keys on startup
  • Inherited subscriptions are treated as zero marginal cost — they're preferred over paid API keys for high-tier tasks
  • Works with: Claude Pro/Max, GitHub Copilot (Codex), ChatGPT Plus

🖥️ Local-First

  • Ollama and llama.cpp supported out of the box
  • VRAM-aware: automatically avoids routing to local when VRAM is critically low
  • Models are ranked by NeuralFit composite score (quality, speed, context fit, hardware fit)
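
To make the ranking idea concrete, here is a rough sketch of a composite score over the four factors named above, combined with a VRAM filter. It is not the NeuralFit formula: the equal weights, the 0 to 1 score range, the Candidate fields, and the simple "needs at most the free VRAM" check are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    quality: float        # each factor is assumed to be scored in [0, 1]
    speed: float
    context_fit: float
    hardware_fit: float
    vram_needed_gb: float


# Assumed equal weights; the real weighting is not documented here.
WEIGHTS = {"quality": 0.25, "speed": 0.25, "context_fit": 0.25, "hardware_fit": 0.25}


def composite(c: Candidate) -> float:
    """Weighted sum of the four factors."""
    return sum(weight * getattr(c, factor) for factor, weight in WEIGHTS.items())


def rank_local_models(candidates: Sequence[Candidate], free_vram_gb: float) -> List[Candidate]:
    """Drop models that do not fit in free VRAM, then sort best first."""
    fitting = [c for c in candidates if c.vram_needed_gb <= free_vram_gb]
    return sorted(fitting, key=composite, reverse=True)
```

An empty result stands in for the "VRAM is critically low" case: with no local model that fits, a router built this way would send the request to a non-local backend instead.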

🔌 One-Command IDE Integration

neuralbrok setup claude-code   # Wires NeuralBroker into Claude Code
neuralbrok setup cursor        # Wires NeuralBroker into Cursor
neuralbrok setup codex         # Wires NeuralBroker into Codex CLI
neuralbrok setup cline         # Wires NeuralBroker into Cline (VS Code)

Supports 20+ tools: Claude Code, Cursor, Cline, GitHub Copilot, Gemini CLI, OpenCode, Warp, Codex, Amp, Kimi Code, Firebender, Windsurf, and more.

📡 MCP Server

NeuralBroker ships with an MCP server that exposes routing intelligence directly to Claude Code and Cursor:

neuralbrok mcp   # Start MCP server on stdio

Available MCP tools:

  • nb_route_preview — Preview routing tier for any prompt
  • nb_get_active_auth — See which subscriptions are currently discovered

Configuration

NeuralBroker auto-detects your hardware and generates a config on first run. The config lives at ~/.neuralbrok/config.yaml.

local_nodes:
  - name: local
    runtime: ollama
    host: localhost:11434

routing:
  default_mode: smart   # smart | cost | speed | fallback

# Optional: Specify which models are allowed for smart mode
# allowed_models:
#   - qwen2.5:7b
#   - llama3.2:3b

# Optional: Cloud fallback models (Ollama pull tags)
# ollama_cloud_models:
#   - claude-sonnet-4-5

Routing Modes

| Mode | Behavior |
|------|----------|
| smart | 15-dim scoring decides local vs cloud per-request (default) |
| cost | Always prefer cheapest backend |
| speed | Always prefer lowest-latency backend |
| fallback | Try local first; spill to cloud only on failure |
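
The four modes can be read as four selection rules over the same set of backends. The sketch below illustrates that reading and is not the gateway's implementation: the Backend fields, the healthy flag used to model "failure", and the smart_pick callback standing in for the prompt scorer are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Backend:
    name: str
    is_local: bool
    cost_per_request: float
    latency_ms: float
    healthy: bool = True


def choose(mode: str, backends: Sequence[Backend],
           smart_pick: Optional[Callable[[Sequence[Backend]], Backend]] = None) -> Backend:
    """Pick one backend according to the routing mode."""
    live = [b for b in backends if b.healthy]
    if not live:
        raise RuntimeError("no healthy backend")
    if mode == "cost":
        return min(live, key=lambda b: b.cost_per_request)
    if mode == "speed":
        return min(live, key=lambda b: b.latency_ms)
    if mode == "fallback":
        # Local first; cloud only when no local backend is healthy.
        local = [b for b in live if b.is_local]
        return (local or live)[0]
    if mode == "smart":
        # Assumption: per-request scoring is delegated to a callback.
        if smart_pick is None:
            raise ValueError("smart mode needs a scoring callback")
        return smart_pick(live)
    raise ValueError(f"unknown mode: {mode}")
```

Keeping the modes as plain branches over one filtered list makes the difference between them easy to see: only the ordering key changes, except in smart mode, where the decision depends on the prompt itself.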

Subscription Discovery

NeuralBroker automatically scans for auth on startup. View what it found:

curl http://localhost:8000/nb/discovered

To disable auto-discovery:

NB_DISABLE_AUTO_DISCOVERY=1 neuralbrok start

API Reference

NeuralBroker is fully OpenAI-compatible.

# Chat completions — use "neuralbroker" to activate smart routing
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "neuralbroker",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# List available models
curl http://localhost:8000/v1/models

# Check routing stats
curl http://localhost:8000/nb/stats

# Last 500 routing decisions
curl http://localhost:8000/nb/routing-log

# Live hardware info
curl http://localhost:8000/nb/hardware

# Change routing mode at runtime
curl -X POST http://localhost:8000/nb/mode \
  -H "Content-Type: application/json" \
  -d '{"mode": "speed"}'
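
The /nb/* endpoints above can also be driven from a script. The following sketch uses only the Python standard library; the helper names are invented for the example, and it assumes these endpoints return JSON, which the text above does not state.

```python
import json
import urllib.request
from typing import Any, Optional

GATEWAY = "http://localhost:8000"
VALID_MODES = {"smart", "cost", "speed", "fallback"}  # from the Routing Modes table


def build_request(path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """GET when there is no payload, JSON POST otherwise."""
    if payload is None:
        return urllib.request.Request(GATEWAY + path, method="GET")
    return urllib.request.Request(
        GATEWAY + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_mode_request(mode: str) -> urllib.request.Request:
    """Build the POST /nb/mode request, rejecting unknown modes early."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown routing mode: {mode}")
    return build_request("/nb/mode", {"mode": mode})


def send(req: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and parse the body as JSON (assumed response format)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, send(set_mode_request("speed")) would switch the running gateway to speed mode, and send(build_request("/nb/stats")) would fetch the routing stats.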

Supported Providers

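| Local | Cloud (API Key) | Cloud (Subscription Auto-Discovered) |
|-------|-----------------|--------------------------------------|
| Ollama | Groq | Claude Pro / Max (Claude Code) |
| llama.cpp | Together AI | GitHub Copilot (Codex) |
| LM Studio | OpenAI | ChatGPT Plus |
| | Anthropic API | |
| | Gemini | |
| | Mistral | |
| | Perplexity | |
| | DeepSeek | |
| | + 15 more | |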

Observability

  • Dashboard: http://localhost:8000/dashboard — Live routing log, VRAM gauge, per-provider stats
  • Prometheus: http://localhost:8000/metrics
  • Grafana: Pre-built dashboards in grafana/

Security Note

NeuralBroker inherits auth tokens from tools already installed and authenticated on your machine. It never sends your credentials to external services — tokens are used directly against their respective provider APIs. You remain in full control.


License

MIT © NeuralBroker contributors
