Local-first LLM routing gateway — use model='neuralbroker' and it routes intelligently between local Ollama, discovered subscriptions (Claude Pro, Codex), and paid API fallbacks.

NeuralBroker

The intelligent LLM gateway that makes your $20/mo subscriptions work everywhere.

Python 3.10+ | License: MIT


NeuralBroker is a local-first LLM routing daemon that sits between your AI tools (Claude Code, Cursor, Codex, Cline) and your models. It exposes a single OpenAI-compatible endpoint and a virtual model called neuralbroker — tools talk to that name, and NeuralBroker silently picks the best backend for every request.

The idea is simple: why pay per token on the API when you already pay $20/month for Claude Pro? NeuralBroker discovers your existing subscriptions, uses them for hard tasks, and sends easy tasks to your local GPU at no extra cost.


How it works

Your IDE / Tool
       │  model: "neuralbroker"
       ▼
 ┌─────────────────────────────────┐
 │         NeuralBroker            │
 │                                 │
 │  1. Score prompt (15 dims, <1ms)│
 │  2. Classify → SIMPLE/MEDIUM/   │
 │                COMPLEX/REASONING│
 │  3. Pick backend:               │
 │     SIMPLE/MEDIUM → Local Ollama│
 │     COMPLEX/REASONING →         │
 │       ① Discovered subscription │  ← Claude Pro / Codex / ChatGPT
 │       ② Paid API key fallback   │  ← Groq / OpenAI / Anthropic
 └─────────────────────────────────┘
       │
       ▼
 Best model for the job
 (you never choose manually again)
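
The backend choice in step 3 can be sketched in a few lines of Python. This is an illustrative sketch only, not NeuralBroker's actual code: the name `pick_backend`, the backend labels, and the availability flags are assumptions made for this example. What happens to a SIMPLE or MEDIUM prompt when no local node is usable is also an assumption (here it falls through to the cloud options).

```python
from enum import Enum


class Tier(Enum):
    SIMPLE = "SIMPLE"
    MEDIUM = "MEDIUM"
    COMPLEX = "COMPLEX"
    REASONING = "REASONING"


def pick_backend(tier: Tier, local_ok: bool, subscription_ok: bool, api_key_ok: bool) -> str:
    """Return a backend label for an already-classified prompt.

    The labels "local", "subscription", and "paid_api" are hypothetical.
    """
    # Low tiers stay on the local node whenever one is usable.
    if tier in (Tier.SIMPLE, Tier.MEDIUM) and local_ok:
        return "local"
    # High tiers (or low tiers without a local node, by assumption)
    # prefer a discovered subscription over a paid API key.
    if subscription_ok:
        return "subscription"
    if api_key_ok:
        return "paid_api"
    # Assumption: with no cloud option at all, use local if it exists.
    if local_ok:
        return "local"
    raise RuntimeError("no backend available")
```

With everything available, `pick_backend(Tier.REASONING, True, True, True)` returns `"subscription"`, while a SIMPLE prompt returns `"local"`.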

The 3-Tier Cost Strategy

Task Tier   Example                                  Backend                      Your Cost
---------   --------------------------------------   --------------------------   ----------------------
SIMPLE      "What is the capital of France?"         Local Ollama (llama3.2:1b)   $0.00
MEDIUM      "Write a short cover letter"             Local Ollama (qwen2.5:7b)    $0.00
COMPLEX     "Refactor this 500-line module"          Claude Pro subscription      $0.00 (already paying)
REASONING   "Prove this math theorem step by step"   Claude Pro subscription      $0.00 (already paying)
Fallback    No local + no subscription               Groq/OpenAI API              ~$0.002

Quick Start

pip install neuralbrok
neuralbrok setup    # Detect your GPU and generate config
neuralbrok start    # Start the gateway on http://localhost:8000

Point any OpenAI-compatible tool to http://localhost:8000/v1 with model=neuralbroker and you're done.
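
If you would rather call the gateway from a script than from an IDE, a minimal standard-library sketch might look like the following. The helper names `build_chat_request` and `chat` are made up for this example. The response parsing assumes the gateway returns the usual OpenAI chat-completion shape, and no Authorization header is sent because the curl examples in this README do not use one.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # gateway address from the Quick Start


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the virtual model."""
    body = {
        "model": "neuralbroker",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant text.

    Assumes an OpenAI-style response: choices[0].message.content.
    """
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With the gateway running, `print(chat("Hello!"))` should print the reply from whichever backend was selected.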


Features

🧠 Intelligent Routing (No Config Required)

  • 15-dimension prompt scoring classifies every request in under 1ms — no external LLM needed for routing decisions
  • NeuralFit hardware scoring picks the best local model for your specific GPU and VRAM capacity
  • Virtual model name — set model=neuralbroker once, never touch it again

💸 Subscription Inheritance

  • Auto-discovers Claude Code OAuth sessions, Codex auth, and env-based API keys on startup
  • Inherited subscriptions are treated as zero marginal cost — they're preferred over paid API keys for high-tier tasks
  • Works with: Claude Pro/Max, GitHub Copilot (Codex), ChatGPT Plus

🖥️ Local-First

  • Ollama and llama.cpp supported out of the box
  • VRAM-aware: automatically avoids routing to local when VRAM is critically low
  • Models are ranked by NeuralFit composite score (quality, speed, context fit, hardware fit)
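
To make the VRAM check and composite ranking above concrete, here is a toy sketch. It is not the NeuralFit algorithm: the `Candidate` fields, the equal weights, the `headroom_gb` margin, and the 0..1 score scale are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    vram_gb: float       # estimated VRAM needed to load the model
    quality: float       # all four scores are assumed normalized to 0..1
    speed: float
    context_fit: float
    hardware_fit: float


# Hypothetical equal weights; the real weighting is not documented here.
WEIGHTS = {"quality": 0.25, "speed": 0.25, "context_fit": 0.25, "hardware_fit": 0.25}


def composite(c: Candidate) -> float:
    """Weighted sum of the four component scores."""
    return sum(getattr(c, field) * weight for field, weight in WEIGHTS.items())


def rank_local(candidates, free_vram_gb: float, headroom_gb: float = 1.0):
    """Drop models that would not fit in free VRAM, then sort best-first."""
    fitting = [c for c in candidates if c.vram_gb + headroom_gb <= free_vram_gb]
    return sorted(fitting, key=composite, reverse=True)
```

An empty result from `rank_local` corresponds to the "VRAM critically low" case, where a router would skip local nodes entirely.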

🔌 One-Command IDE Integration

neuralbrok setup claude-code   # Wires NeuralBroker into Claude Code
neuralbrok setup cursor        # Wires NeuralBroker into Cursor
neuralbrok setup codex         # Wires NeuralBroker into Codex CLI
neuralbrok setup cline         # Wires NeuralBroker into Cline (VS Code)

Supports 20+ tools: Claude Code, Cursor, Cline, GitHub Copilot, Gemini CLI, OpenCode, Warp, Codex, Amp, Kimi Code, Firebender, Windsurf, and more.

📡 MCP Server

NeuralBroker ships with an MCP server that exposes routing intelligence directly to Claude Code and Cursor:

neuralbrok mcp   # Start MCP server on stdio

Available MCP tools:

  • nb_route_preview — Preview routing tier for any prompt
  • nb_get_active_auth — See which subscriptions are currently discovered
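
MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` message. The sketch below only builds such a message. The `prompt` argument name in the usage line is an assumption, since the tool's input schema is not documented here, and a real client must complete the MCP `initialize` handshake before calling any tool.

```python
import json


def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical arguments: the real input schema may use different keys.
preview = tools_call(1, "nb_route_preview", {"prompt": "Refactor this module"})
```

In practice an MCP-aware client such as Claude Code or Cursor builds these messages for you, so this is mainly useful for debugging the stdio stream.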

Configuration

NeuralBroker auto-detects your hardware and generates a config on first run. The config lives at ~/.neuralbrok/config.yaml.

local_nodes:
  - name: local
    runtime: ollama
    host: localhost:11434

routing:
  default_mode: smart   # smart | cost | speed | fallback

# Optional: Specify which models are allowed for smart mode
# allowed_models:
#   - qwen2.5:7b
#   - llama3.2:3b

# Optional: Cloud fallback models (Ollama pull tags)
# ollama_cloud_models:
#   - claude-sonnet-4-5

Routing Modes

Mode       Behavior
--------   ---------------------------------------------------------------
smart      15-dim scoring decides local vs cloud per-request (default)
cost       Always prefer cheapest backend
speed      Always prefer lowest-latency backend
fallback   Try local first; spill to cloud only on failure
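
One way to picture the four modes is as different selection rules over the same list of backends. The sketch below is an assumption-laden illustration, not the gateway's implementation: the `Backend` fields, the `wants_cloud` flag (standing in for the result of prompt scoring), and the cost and latency figures used with it are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    is_local: bool
    cost_per_request: float  # estimated USD
    latency_ms: float        # recent average
    healthy: bool = True


def choose(mode: str, backends, wants_cloud: bool = False) -> Backend:
    """Pick a backend according to the routing mode."""
    live = [b for b in backends if b.healthy]
    if not live:
        raise RuntimeError("no healthy backend")
    if mode == "cost":
        return min(live, key=lambda b: b.cost_per_request)
    if mode == "speed":
        return min(live, key=lambda b: b.latency_ms)
    if mode == "fallback":
        # Local first; cloud only when no local backend is healthy.
        local = [b for b in live if b.is_local]
        return (local or live)[0]
    if mode == "smart":
        # wants_cloud stands in for the per-request scoring decision.
        preferred = [b for b in live if b.is_local != wants_cloud]
        return (preferred or live)[0]
    raise ValueError(f"unknown mode: {mode!r}")
```

The same pair of backends can therefore yield different choices: a free but slower local node wins under `cost`, while a faster cloud backend wins under `speed`.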

Subscription Discovery

NeuralBroker automatically scans for auth on startup. View what it found:

curl http://localhost:8000/nb/discovered

To disable auto-discovery:

NB_DISABLE_AUTO_DISCOVERY=1 neuralbrok start

API Reference

NeuralBroker exposes OpenAI-compatible endpoints under /v1, plus NeuralBroker-specific endpoints under /nb.

# Chat completions — use "neuralbroker" to activate smart routing
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "neuralbroker",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# List available models
curl http://localhost:8000/v1/models

# Check routing stats
curl http://localhost:8000/nb/stats

# Last 500 routing decisions
curl http://localhost:8000/nb/routing-log

# Live hardware info
curl http://localhost:8000/nb/hardware

# Change routing mode at runtime
curl -X POST http://localhost:8000/nb/mode \
  -H "Content-Type: application/json" \
  -d '{"mode": "speed"}'
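
The /nb endpoints can also be scripted. Below is a small standard-library sketch. The helper names `mode_request` and `get_json` are made up for this example, and it assumes the /nb endpoints return JSON, which this README does not state explicitly.

```python
import json
import urllib.request

GATEWAY = "http://localhost:8000"
MODES = {"smart", "cost", "speed", "fallback"}  # modes listed under Routing Modes


def mode_request(mode: str, gateway: str = GATEWAY) -> urllib.request.Request:
    """Build (but do not send) the POST that switches the routing mode."""
    if mode not in MODES:
        raise ValueError(f"unknown routing mode: {mode!r}")
    return urllib.request.Request(
        f"{gateway}/nb/mode",
        data=json.dumps({"mode": mode}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_json(path: str, gateway: str = GATEWAY):
    """GET an endpoint such as /nb/stats and decode the body as JSON (assumed)."""
    with urllib.request.urlopen(f"{gateway}{path}", timeout=10) as resp:
        return json.load(resp)
```

With the gateway running, `urllib.request.urlopen(mode_request("speed"))` switches the mode, and `get_json("/nb/stats")` fetches the current statistics.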

Supported Providers

Local       Cloud (API Key)   Cloud (Subscription Auto-Discovered)
---------   ---------------   ------------------------------------
Ollama      Groq              Claude Pro / Max (Claude Code)
llama.cpp   Together AI       GitHub Copilot (Codex)
LM Studio   OpenAI            ChatGPT Plus
            Anthropic API
            Gemini
            Mistral
            Perplexity
            DeepSeek
            + 15 more

Observability

  • Dashboard: http://localhost:8000/dashboard — Live routing log, VRAM gauge, per-provider stats
  • Prometheus: http://localhost:8000/metrics
  • Grafana: Pre-built dashboards in grafana/

Security Note

NeuralBroker inherits auth tokens from tools already installed and authenticated on your machine. It never sends your credentials to third parties: each token is used only against the API of the provider that issued it. You remain in full control.


License

MIT © NeuralBroker contributors
