Local-first LLM routing gateway — use model='neuralbroker' and it routes intelligently between local Ollama, discovered subscriptions (Claude Pro, Codex), and paid API fallbacks.

Maintainers

khan-ash

Project description

NeuralBroker

The intelligent LLM gateway that makes your $20/mo subscriptions work everywhere.

NeuralBroker is a local-first LLM routing daemon that sits between your AI tools (Claude Code, Cursor, Codex, Cline) and your models. It exposes a single OpenAI-compatible endpoint and a virtual model called neuralbroker — tools talk to that name, and NeuralBroker silently picks the best backend for every request.

The idea is simple: Why pay per-token on the API when you already pay $20/month for Claude Pro? NeuralBroker discovers your existing subscriptions, uses them for hard tasks, and sends easy tasks to your local GPU for free.

How it works

Your IDE / Tool
       │  model: "neuralbroker"
       ▼
 ┌─────────────────────────────────┐
 │         NeuralBroker            │
 │                                 │
 │  1. Score prompt (15 dims, <1ms)│
 │  2. Classify → SIMPLE/MEDIUM/   │
 │                COMPLEX/REASONING│
 │  3. Pick backend:               │
 │     SIMPLE/MEDIUM → Local Ollama│
 │     COMPLEX/REASONING →         │
 │       ① Discovered subscription │  ← Claude Pro / Codex / ChatGPT
 │       ② Paid API key fallback   │  ← Groq / OpenAI / Anthropic
 └─────────────────────────────────┘
       │
       ▼
 Best model for the job
 (you never choose manually again)
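The backend-selection step in the diagram can be sketched in a few lines of Python. This is only an illustration of the order described above, not NeuralBroker's actual code: the `Tier` enum, the `pick_backend` function, and the backend labels are hypothetical names, and the last-resort use of a local model for a hard task is an assumption.

```python
from enum import Enum


class Tier(Enum):
    SIMPLE = "SIMPLE"
    MEDIUM = "MEDIUM"
    COMPLEX = "COMPLEX"
    REASONING = "REASONING"


# Tiers the diagram sends to the local Ollama backend.
LOCAL_TIERS = {Tier.SIMPLE, Tier.MEDIUM}


def pick_backend(tier, local_available, subscriptions, api_keys):
    """Return a label for the backend a request of this tier would use.

    Hypothetical helper: mirrors the order in the diagram
    (local -> discovered subscription -> paid API key).
    """
    if tier in LOCAL_TIERS and local_available:
        return "local:ollama"
    if subscriptions:  # (1) discovered subscription
        return f"subscription:{subscriptions[0]}"
    if api_keys:  # (2) paid API key fallback
        return f"api:{api_keys[0]}"
    if local_available:  # assumption: use local as a last resort
        return "local:ollama"
    raise RuntimeError("no backend available")
```

For example, `pick_backend(Tier.REASONING, True, ["claude-pro"], ["groq"])` selects the subscription, while the same call with `Tier.SIMPLE` stays local.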

The 3-Tier Cost Strategy

| Task Tier | Example | Backend | Your Cost |
|---|---|---|---|
| `SIMPLE` | "What is the capital of France?" | Local Ollama (llama3.2:1b) | $0.00 |
| `MEDIUM` | "Write a short cover letter" | Local Ollama (qwen2.5:7b) | $0.00 |
| `COMPLEX` | "Refactor this 500-line module" | Claude Pro subscription | $0.00 (already paying) |
| `REASONING` | "Prove this math theorem step by step" | Claude Pro subscription | $0.00 (already paying) |
| Fallback | No local + no subscription | Groq/OpenAI API | ~$0.002 |

Quick Start

pip install neuralbrok
neuralbrok setup    # Detect your GPU and generate config
neuralbrok start    # Start the gateway on http://localhost:8000

Point any OpenAI-compatible tool to http://localhost:8000/v1 with model=neuralbroker and you're done.
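For a quick check without an IDE, the same request can be built from Python with only the standard library. This is a minimal sketch: `build_chat_request` and `send` are hypothetical helper names, and the response parsing in the usage comment assumes the standard OpenAI chat-completions response shape, which this README does not spell out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # gateway address from the Quick Start


def build_chat_request(prompt, base_url=BASE_URL):
    """Build (but do not send) a chat-completions request for the virtual model."""
    payload = {
        "model": "neuralbroker",  # virtual model name; the gateway picks the backend
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the gateway running:
#   reply = send(build_chat_request("Hello!"))
#   print(reply["choices"][0]["message"]["content"])  # assumes OpenAI response shape
```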

Features

🧠 Intelligent Routing (No Config Required)

- 15-dimension prompt scoring classifies every request in under 1ms, with no external LLM needed for routing decisions
- NeuralFit hardware scoring picks the best local model for your specific GPU and VRAM capacity
- Virtual model name: set model=neuralbroker once, never touch it again

💸 Subscription Inheritance

- Auto-discovers Claude Code OAuth sessions, Codex auth, and env-based API keys on startup
- Inherited subscriptions are treated as zero marginal cost, so they are preferred over paid API keys for high-tier tasks
- Works with: Claude Pro/Max, GitHub Copilot (Codex), ChatGPT Plus

🖥️ Local-First

- Ollama and llama.cpp supported out of the box
- VRAM-aware: automatically avoids routing to local models when VRAM is critically low
- Models are ranked by NeuralFit composite score (quality, speed, context fit, hardware fit)
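To make the ranking idea concrete, here is a rough sketch of a composite score over the four factors named above, combined with a VRAM check. The actual NeuralFit formula is not documented in this README, so everything here is an assumption for illustration: the `ModelScore` fields, the equal `WEIGHTS`, the sample numbers, and the rule that a model is skipped when it needs more VRAM than is free.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelScore:
    name: str
    quality: float        # each factor normalized to 0..1 (assumed)
    speed: float
    context_fit: float
    hardware_fit: float
    vram_needed_gb: float


# Hypothetical equal weighting; the real weights are not documented here.
WEIGHTS = {"quality": 0.25, "speed": 0.25, "context_fit": 0.25, "hardware_fit": 0.25}


def composite(model):
    """Weighted sum of the four factors."""
    return sum(weight * getattr(model, factor) for factor, weight in WEIGHTS.items())


def rank_local_models(models, free_vram_gb):
    """Best-first list of models that fit in free VRAM.

    An empty result means no local model fits, so the request
    would have to be routed away from local.
    """
    usable = [m for m in models if m.vram_needed_gb <= free_vram_gb]
    return sorted(usable, key=composite, reverse=True)


models = [
    ModelScore("qwen2.5:7b", 0.8, 0.5, 0.7, 0.6, vram_needed_gb=6.0),
    ModelScore("llama3.2:1b", 0.4, 0.9, 0.6, 0.9, vram_needed_gb=1.5),
]
```

With these made-up numbers, `rank_local_models(models, 2.0)` leaves only `llama3.2:1b`, and `rank_local_models(models, 0.5)` returns an empty list.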

🔌 One-Command IDE Integration

neuralbrok setup claude-code   # Wires NeuralBroker into Claude Code
neuralbrok setup cursor        # Wires NeuralBroker into Cursor
neuralbrok setup codex         # Wires NeuralBroker into Codex CLI
neuralbrok setup cline         # Wires NeuralBroker into Cline (VS Code)

Supports 20+ tools: Claude Code, Cursor, Cline, GitHub Copilot, Gemini CLI, OpenCode, Warp, Codex, Amp, Kimi Code, Firebender, Windsurf, and more.

📡 MCP Server

NeuralBroker ships with an MCP server that exposes routing intelligence directly to Claude Code and Cursor:

neuralbrok mcp   # Start MCP server on stdio

Available MCP tools:

- nb_route_preview: Preview the routing tier for any prompt
- nb_get_active_auth: See which subscriptions are currently discovered

Configuration

NeuralBroker auto-detects your hardware and generates a config on first run. The config lives at ~/.neuralbrok/config.yaml.

local_nodes:
  - name: local
    runtime: ollama
    host: localhost:11434

routing:
  default_mode: smart   # smart | cost | speed | fallback

# Optional: Specify which models are allowed for smart mode
# allowed_models:
#   - qwen2.5:7b
#   - llama3.2:3b

# Optional: Cloud fallback models (Ollama pull tags)
# ollama_cloud_models:
#   - claude-sonnet-4-5

Routing Modes

| Mode | Behavior |
|---|---|
| `smart` | 15-dim scoring decides local vs cloud per-request (default) |
| `cost` | Always prefer cheapest backend |
| `speed` | Always prefer lowest-latency backend |
| `fallback` | Try local first; spill to cloud only on failure |

Subscription Discovery

On startup, NeuralBroker scans for existing auth (Claude Code OAuth sessions, Codex auth, and env-based API keys). To see what it found:

curl http://localhost:8000/nb/discovered

To disable auto-discovery:

NB_DISABLE_AUTO_DISCOVERY=1 neuralbrok start

API Reference

NeuralBroker serves OpenAI-compatible endpoints under /v1, plus NeuralBroker-specific endpoints under /nb.

# Chat completions — use "neuralbroker" to activate smart routing
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "neuralbroker",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# List available models
curl http://localhost:8000/v1/models

# Check routing stats
curl http://localhost:8000/nb/stats

# Last 500 routing decisions
curl http://localhost:8000/nb/routing-log

# Live hardware info
curl http://localhost:8000/nb/hardware

# Change routing mode at runtime
curl -X POST http://localhost:8000/nb/mode \
  -H "Content-Type: application/json" \
  -d '{"mode": "speed"}'
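The runtime mode switch can also be scripted. The sketch below builds the same POST as the last curl command and rejects mode names that are not among the four listed under Routing Modes. `build_mode_request` is a hypothetical helper, and the client-side validation is an added convenience rather than documented server behavior; the response format of /nb/mode is not described here, so the sketch does not parse it.

```python
import json
import urllib.request

# The four modes listed under "Routing Modes".
ROUTING_MODES = ("smart", "cost", "speed", "fallback")


def build_mode_request(mode, base_url="http://localhost:8000"):
    """Build (but do not send) the POST that changes the routing mode."""
    if mode not in ROUTING_MODES:
        raise ValueError(f"unknown routing mode {mode!r}; expected one of {ROUTING_MODES}")
    return urllib.request.Request(
        f"{base_url}/nb/mode",
        data=json.dumps({"mode": mode}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage, with the gateway running:
#   urllib.request.urlopen(build_mode_request("speed"), timeout=10)
```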

Supported Providers

| Local | Cloud (API Key) | Cloud (Subscription Auto-Discovered) |
|---|---|---|
| Ollama | Groq | Claude Pro / Max (Claude Code) |
| llama.cpp | Together AI | GitHub Copilot (Codex) |
| LM Studio | OpenAI | ChatGPT Plus |
| | Anthropic API | |
| | Gemini | |
| | Mistral | |
| | Perplexity | |
| | DeepSeek | |
| | + 15 more | |

Observability

- Dashboard: http://localhost:8000/dashboard (live routing log, VRAM gauge, per-provider stats)
- Prometheus: http://localhost:8000/metrics
- Grafana: Pre-built dashboards in grafana/

Security Note

NeuralBroker inherits auth tokens from tools already installed and authenticated on your machine. It never sends your credentials to external services — tokens are used directly against their respective provider APIs. You remain in full control.

License

MIT © NeuralBroker contributors

Release history

3.0.2

May 15, 2026

3.0.1

May 4, 2026

3.0.0

May 4, 2026

2.1.2

May 1, 2026

2.1.1

May 1, 2026

2.1.0

May 1, 2026

2.0.9

May 1, 2026

2.0.7

May 1, 2026

2.0.6

May 1, 2026

2.0.5

May 1, 2026

2.0.4

May 1, 2026

2.0.0

May 1, 2026

This version

0.9.2

May 15, 2026

0.9.0

May 5, 2026

0.8.3

Apr 29, 2026

0.8.2

Apr 29, 2026

0.8.1

Apr 29, 2026

0.8.0

Apr 29, 2026

0.7.5

Apr 26, 2026

0.7.4

Apr 26, 2026

0.7.3

Apr 25, 2026

0.7.2

Apr 25, 2026

0.7.1

Apr 25, 2026

0.7.0

Apr 25, 2026

0.6.7

Apr 25, 2026

0.6.6

Apr 25, 2026

0.6.5

Apr 25, 2026

0.6.4

Apr 25, 2026

0.6.3

Apr 25, 2026

0.6.2

Apr 25, 2026

0.6.1

Apr 25, 2026

0.6.0

Apr 25, 2026

0.5.3

Apr 24, 2026

0.5.2

Apr 24, 2026

0.5.0

Apr 24, 2026

0.4.4

Apr 24, 2026

0.4.3

Apr 24, 2026

0.4.2

Apr 24, 2026

0.4.0

Apr 23, 2026

Download files

Source Distribution

neuralbrok-0.9.2.tar.gz (134.8 kB view details)

Uploaded May 15, 2026 Source

Built Distribution

neuralbrok-0.9.2-py3-none-any.whl (150.7 kB view details)

Uploaded May 15, 2026 Python 3

File details

Details for the file neuralbrok-0.9.2.tar.gz.

File metadata

Download URL: neuralbrok-0.9.2.tar.gz
Upload date: May 15, 2026
Size: 134.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for neuralbrok-0.9.2.tar.gz
Algorithm	Hash digest
SHA256	`83b572e0d232b80d2d7d957cfd4532c628deb6c89b87f005a513748e9080e5ba`
MD5	`7dfb7d788f399f8cddd2f38dc8ceccb8`
BLAKE2b-256	`36267f58a6022579f8143e077c9e55834eea7baa04934b6883be119ad94c4781`
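A downloaded archive can be checked against the SHA256 digest above with Python's standard `hashlib` module. This is a generic sketch, not something shipped with the package: `sha256_of` and `verify` are hypothetical helper names, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib

# SHA256 digest listed above for neuralbrok-0.9.2.tar.gz
EXPECTED_SHA256 = "83b572e0d232b80d2d7d957cfd4532c628deb6c89b87f005a513748e9080e5ba"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Usage:
#   verify("neuralbrok-0.9.2.tar.gz")
```

The same function works for the wheel if the wheel's own SHA256 digest is passed as `expected`.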

Provenance

The following attestation bundles were made for neuralbrok-0.9.2.tar.gz:

Publisher: pypi-publish.yml on khan-sha/neuralbroker

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: neuralbrok-0.9.2.tar.gz
- Subject digest: 83b572e0d232b80d2d7d957cfd4532c628deb6c89b87f005a513748e9080e5ba
- Sigstore transparency entry: 1543825625
- Sigstore integration time: May 15, 2026
Source repository:
- Permalink: khan-sha/neuralbroker@9cc151cf316ed700fe9c408e5e602f6b757df1d8
- Branch / Tag: refs/tags/v0.9.2
- Owner: https://github.com/khan-sha
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@9cc151cf316ed700fe9c408e5e602f6b757df1d8
- Trigger Event: push

File details

Details for the file neuralbrok-0.9.2-py3-none-any.whl.

File metadata

Download URL: neuralbrok-0.9.2-py3-none-any.whl
Upload date: May 15, 2026
Size: 150.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for neuralbrok-0.9.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`03c81c81b0d9bc08621d95e8a69e8e53563be6700cefb5c8a2d1245bf930ea38`
MD5	`b8f95f29eaf7b52d455fa31ceec2f754`
BLAKE2b-256	`9e2dee247054594726772b3a8f22c1086db5959b77bbeac03f409a915e64a6d7`

Provenance

The following attestation bundles were made for neuralbrok-0.9.2-py3-none-any.whl:

Publisher: pypi-publish.yml on khan-sha/neuralbroker

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: neuralbrok-0.9.2-py3-none-any.whl
- Subject digest: 03c81c81b0d9bc08621d95e8a69e8e53563be6700cefb5c8a2d1245bf930ea38
- Sigstore transparency entry: 1543825728
- Sigstore integration time: May 15, 2026
Source repository:
- Permalink: khan-sha/neuralbroker@9cc151cf316ed700fe9c408e5e602f6b757df1d8
- Branch / Tag: refs/tags/v0.9.2
- Owner: https://github.com/khan-sha
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@9cc151cf316ed700fe9c408e5e602f6b757df1d8
- Trigger Event: push

neuralbrok 0.9.2
