Privacy-aware, local-first router across CLI coding agents (Codex, Claude Code) and local LLMs (Ollama).
Switchboard
A privacy-aware, local-first router across your CLI coding agents and local LLMs.
Why this matters now. In mid-2026, employers started rationing AI: Uber capped engineers at $1,500/month per AI coding tool after burning its 2026 AI budget in four months (Bloomberg); Microsoft is moving its Experiences + Devices org off Claude Code to GitHub Copilot CLI (Windows Central). A spend cap is a blunt instrument: it throttles your best engineers and does nothing about proprietary code leaving for third-party models. The structural fix is routing, not rationing — a thin local-first layer that sends only what's worth it to a premium model, keeps sensitive work on-device, and compresses context.
Switchboard is a reference implementation of that pattern. On a 100-case benchmark it kept 62% of requests off premium agents (38% premium usage) at near-premium quality, full coverage, and zero measured privacy leaks (see the benchmark below). It is not (yet) an enterprise product — it's the smallest honest proof that the pattern works, with a reproducible benchmark to back it.
Switchboard routes prompts to the right model while preserving context through local semantic memory and context compression, keeping sensitive work local, and reducing unnecessary premium-model usage.
It's built for the single-workstation setup where the scarce resources aren't dollars-per-token but subscription quota, privacy, and a pile of heterogeneous agent interfaces.
What it does
- Routes across local Ollama models, the Codex CLI, and Claude Code — deterministic rules first, with optional tiny learned classifiers for recall.
- Private mode — a deterministic keyword/PII/secret-format floor blocks sensitive prompts from ever reaching a subscription backend, even on fallback.
- Grounds answers with deterministic tools (time/date, safe calculator, unit conversion, keyless live stock & news) instead of letting a model guess.
- Carries context across backend switches: recent user, assistant, and tool turns are assembled into one redacted session prompt.
- Compresses long context with a Headroom-inspired layer; the model-boundary pass only summarizes recent conversation, while trusted facts, retrieved memory, and the current request survive intact.
- Remembers across backends via local embedding-based semantic memory, with SQLite search available for direct memory lookup.
- Explains every decision and records metadata-only telemetry (no prompt/response bodies).
- Ships its own evaluation — a 100-case quality benchmark, a local LLM-as-judge, and a multi-run statistical harness.
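The "safe calculator" idea in the tools bullet can be illustrated with a short sketch. This is not Switchboard's actual tool: the function name `safe_eval`, the operator whitelist, and the exponent cap are assumptions made for illustration. It shows the general technique of walking a parsed syntax tree and rejecting anything that is not plain arithmetic, so `eval()` is never called.

```python
import ast
import operator

# Whitelisted operators (an illustrative choice); anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str) -> float:
    """Evaluate plain arithmetic by walking the AST instead of calling eval()."""

    def walk(node: ast.AST) -> float:
        # Only int and float literals are allowed (bool and str are rejected).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            left, right = walk(node.left), walk(node.right)
            # Hypothetical guard against huge exponents such as 9 ** 9 ** 9.
            if isinstance(node.op, ast.Pow) and abs(right) > 64:
                raise ValueError("exponent too large")
            return _BINARY[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Names, calls, attributes, subscripts, etc. all end up here.
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval").body)
```

Because function calls and attribute access are never evaluated, an input such as `__import__('os').system('ls')` raises `ValueError` instead of executing.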
How it works
```text
UI / CLI ──► Session manager (shared history across all backends)
                  │
                  ▼
      Capability detector (regex) ◄──► deterministic tools
                  │   (learned tool dispatcher recovers misses; tool verifies)
                  ▼
      Privacy floor (keywords + PII + secret formats — a match is FINAL)
                  │   (learned sensitivity escalator may only ADD protection)
                  ▼
      Deterministic policy  ← always wins; unknown ⇒ local
                  │   (learned router supplies recall: tool / local / coding / reasoning)
                  ▼
      Context builder + redaction ◄── semantic memory
                  │
                  ▼
      Compression (metadata + history-only context pass)
                  │
                  ▼
      Ollama (default) │ Codex (coding) │ Claude Code (reasoning)
                  │
                  ▼
      Response sanitizer ─► metadata-only telemetry
```
The organizing invariant: deterministic policy always precedes and overrides the learned components. Privacy, tool grounding, forced selection, and fallback keep working even when the local model runtime — and therefore every learned component — is down.
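The ordering described above can be sketched in a few lines of Python. Everything here is illustrative rather than Switchboard's real code: the backend labels, the `SENSITIVE` and `RULES` patterns, and the `route` signature are assumptions. The sketch only demonstrates the invariant that the privacy floor is final, deterministic rules beat the learned router, and anything unknown (or any learned-component failure) lands on the local backend.

```python
import re
from typing import Callable, Optional

LOCAL, CODING, REASONING = "ollama", "codex", "claude-code"

# Hypothetical patterns; a real privacy floor would cover far more formats.
SENSITIVE = re.compile(
    r"(?i)\b(password|api[_-]?key|ssn)\b|-----BEGIN [A-Z ]*PRIVATE KEY-----"
)
RULES = [
    (re.compile(r"(?i)\b(refactor|unit tests?|stack trace|compile)\b"), CODING),
    (re.compile(r"(?i)\b(prove|trade-?offs?|step by step)\b"), REASONING),
]

LearnedRouter = Callable[[str], Optional[str]]


def route(prompt: str, learned: Optional[LearnedRouter] = None) -> tuple[str, str]:
    """Return (backend, reason) for a prompt."""
    # 1. Privacy floor: a match is final and nothing later can override it.
    if SENSITIVE.search(prompt):
        return LOCAL, "privacy floor matched"
    # 2. Deterministic rules take precedence over the learned router.
    for pattern, backend in RULES:
        if pattern.search(prompt):
            return backend, "deterministic rule"
    # 3. The learned router only adds recall for prompts no rule claimed.
    #    Any failure (for example, the local runtime being down) is ignored.
    if learned is not None:
        try:
            guess = learned(prompt)
        except Exception:
            guess = None
        if guess in (CODING, REASONING):
            return guess, "learned router"
    # 4. Unknown prompts stay local.
    return LOCAL, "default: local-first"
```

Note that a sensitive prompt stays local even when both a coding rule and the learned router would have sent it to a subscription backend, because step 1 returns before either is consulted.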
Get started (60 seconds)
```bash
pip install switchboard-local

# point it at a local model runtime (install Ollama, then pull a small model)
ollama pull llama3.2:3b

# sanity-check your setup
switchboard doctor

# ask: Switchboard routes it, grounds it, and tells you why
switchboard ask "summarize this error log and suggest a fix"

# see the routing decision without running anything
switchboard route "refactor the auth module and add tests"

# prefer your browser? launch the local web UI, then open http://127.0.0.1:8080/ui
switchboard ui
```
Requires Python 3.11+. Codex / Claude Code backends are optional — without them, everything routes locally. See docs/usage.md.
Context, memory, and tokens
Switchboard has two user-facing CLI surfaces:
- switchboard route ... previews the same core backend decision without calling a model.
- The web UI, bare switchboard ask ..., and switchboard ask --backend auto ... use the stateful core workflow: shared sessions, model switching, semantic-memory retrieval, context-boundary compression, and backend telemetry all run on the same path.
Example stateful CLI session:
```bash
switchboard ask --backend auto --new-session "Remember: prefer local models for private notes."
switchboard ask --backend auto --session <session_id> --memory "What should you remember?"
```
Long prompts and long sessions record token estimates and savings metadata. The request-level pass can shorten an oversized raw prompt; the context-boundary pass then compresses only <recent_conversation>. The <trusted_facts>, <long_term_memory>, and <current_user_request> blocks are protected from that second pass so grounding and intent are not traded away for token budget.
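A minimal sketch of the second pass follows, assuming the session prompt is plain text with the tagged blocks named above. The function name `compress_recent_conversation` and the `keep_last` parameter are hypothetical, and dropping older turns is only a stand-in for whatever summarization the real layer performs. The point is the scoping: only the `<recent_conversation>` block is rewritten and every other block passes through byte-for-byte.

```python
import re

_RECENT = re.compile(
    r"(<recent_conversation>)(.*?)(</recent_conversation>)", re.DOTALL
)


def compress_recent_conversation(prompt: str, keep_last: int = 2) -> str:
    """Shorten only the <recent_conversation> block; leave other blocks untouched."""

    def shorten(match: re.Match[str]) -> str:
        turns = [line for line in match.group(2).splitlines() if line.strip()]
        if len(turns) <= keep_last:
            return match.group(0)  # already short enough, keep as-is
        dropped = len(turns) - keep_last
        # Stand-in for summarization: note how many turns were omitted.
        kept = [f"[{dropped} earlier turns omitted]"] + turns[-keep_last:]
        return match.group(1) + "\n" + "\n".join(kept) + "\n" + match.group(3)

    return _RECENT.sub(shorten, prompt)
```

Scoping the substitution to one tagged block is what keeps `<trusted_facts>`, `<long_term_memory>`, and `<current_user_request>` out of reach of the token budget.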
Memory is local. switchboard memory add stores the item in SQLite and, when semantic_memory_enabled is on and Ollama can serve nomic-embed-text, indexes an embedding for cross-backend retrieval. switchboard memory search works as local text search even when embeddings are unavailable.
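The store-then-index behaviour with a text-search fallback could look roughly like the sketch below. The table layout and the function names (`open_store`, `add_memory`, `search_memory`) are assumptions for illustration, not Switchboard's schema or API. The `embed` argument stands in for a call to a local embedding model such as nomic-embed-text; passing `None`, or an embedder that raises, exercises the fallback path.

```python
import json
import math
import sqlite3
from typing import Callable, Optional, Sequence

Embedder = Callable[[str], Sequence[float]]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory "
        "(id INTEGER PRIMARY KEY, text TEXT NOT NULL, embedding TEXT)"
    )
    return conn


def add_memory(conn: sqlite3.Connection, text: str, embed: Optional[Embedder] = None) -> None:
    vector = None
    if embed is not None:
        try:
            vector = json.dumps(list(embed(text)))
        except Exception:
            vector = None  # embeddings unavailable: still store the text
    conn.execute("INSERT INTO memory (text, embedding) VALUES (?, ?)", (text, vector))
    conn.commit()


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_memory(
    conn: sqlite3.Connection, query: str, embed: Optional[Embedder] = None, top_k: int = 3
) -> list[str]:
    if embed is not None:
        try:
            q = list(embed(query))
            rows = conn.execute(
                "SELECT text, embedding FROM memory WHERE embedding IS NOT NULL"
            ).fetchall()
            ranked = sorted(rows, key=lambda r: cosine(q, json.loads(r[1])), reverse=True)
            if ranked:
                return [text for text, _ in ranked[:top_k]]
        except Exception:
            pass  # fall through to plain text search
    rows = conn.execute(
        "SELECT text FROM memory WHERE text LIKE ? LIMIT ?", (f"%{query}%", top_k)
    ).fetchall()
    return [text for (text,) in rows]
```

Storing the text unconditionally and treating the embedding as optional is what lets search keep working when the embedding model is not available.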
Details: docs/context-memory-compression.md.
Proof
A 100-case benchmark across five task categories (coding, reasoning, summarization, private, grounding), run on real backends and judged by a local model, over multiple independent runs (means shown; full per-condition numbers, confidence intervals, and significance tests are in the paper):
| Policy | Quality (1–5) | Premium usage | Privacy leaks | Answered |
|---|---|---|---|---|
| always-local | 3.4 | 0% | 0 | 100% |
| rules | 3.8 | 27% | 0 | 100% |
| hybrid | 3.9 | 28% | 0 | 100% |
| learned | 4.1 | 38% | 0 | 100% |
| always-premium | 4.6 | 100% | 0 | 61%¹ |
¹ The "just use the premium agent for everything" baseline must block every sensitive prompt to stay leak-free, so its coverage collapses — exactly the gap Switchboard closes. Zero measured leaks in every condition and every run.
These numbers come from a real-backend benchmark whose full harness travels with the paper's reproduction bundle on Zenodo.
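The headline "62% of requests kept off premium agents" follows directly from the premium-usage column. The snippet below just reproduces that arithmetic from the table's mean values; the helper names are made up for this illustration, and the quality gap is a simple difference of means, not a significance test.

```python
# Mean values copied from the benchmark table above.
premium_usage = {"always-local": 0, "rules": 27, "hybrid": 28, "learned": 38, "always-premium": 100}
quality = {"always-local": 3.4, "rules": 3.8, "hybrid": 3.9, "learned": 4.1, "always-premium": 4.6}


def off_premium_share(policy: str) -> int:
    """Percentage of requests that never reached a premium agent."""
    return 100 - premium_usage[policy]


def quality_gap_to_premium(policy: str) -> float:
    """Difference in mean judged quality versus the always-premium baseline."""
    return round(quality["always-premium"] - quality[policy], 1)


print(off_premium_share("learned"))       # 62
print(quality_gap_to_premium("learned"))  # 0.5
```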
Privacy
Switchboard is local-first and privacy-aware by construction:
- The deterministic privacy floor runs before any non-local routing; a positive verdict is final and cannot be overridden by a learned component or by prompt wording.
- Secret-format detection (cloud keys, JWTs, PEM blocks, env credentials) shares its patterns with context redaction, so the routing boundary and the redactor can't drift apart.
- Metadata-only telemetry — prompt and response bodies are not stored by default.
- Semantic-memory embeddings and the eval judge run locally.
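One way to keep the routing boundary and the redactor from drifting apart is to have both read from a single pattern table, as in this sketch. The patterns are simplified illustrations (an AWS-style access key ID, a JWT-shaped token, a PEM block, and an env-style credential line), and the names `SECRET_PATTERNS`, `contains_secret`, and `redact` are assumptions, not Switchboard's identifiers.

```python
import re

# One shared table: the routing check and the redactor both iterate over it.
SECRET_PATTERNS: dict[str, re.Pattern[str]] = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    "pem_block": re.compile(
        r"-----BEGIN [A-Z ]+-----.*?-----END [A-Z ]+-----", re.DOTALL
    ),
    "env_credential": re.compile(
        r"(?im)^[A-Z0-9_]*(?:SECRET|TOKEN|PASSWORD|API_KEY)[A-Z0-9_]*\s*=\s*\S+"
    ),
}


def contains_secret(text: str) -> bool:
    """Routing-side check: any match keeps the prompt on the local backend."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS.values())


def redact(text: str) -> str:
    """Context-side redaction using exactly the same patterns."""
    for name, pattern in SECRET_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text
```

With a shared table, adding a new secret format updates detection and redaction in the same change, so there is no window in which one recognizes a format the other misses.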
Switchboard deliberately does not resell API access, scrape web UIs, or bypass provider limits — subscription CLIs are invoked exactly as the authenticated user could invoke them, in read-only sandbox modes. See SECURITY.md and docs/privacy.md.
What's inside
- Deterministic router — keyword rules; unknown prompts default local-first.
- Learned router / tool dispatcher / sensitivity escalator — tiny softmax classifiers over a locally-computed embedding (~50 ms, pure-Python inference), each retrainable in seconds from your own thumbs-down corrections behind golden-accuracy gates. They fail closed to the deterministic path.
- Tools — time/date with timezones, safe abstract-syntax-tree calculator, unit conversion, keyless live stock quotes & news.
- Compression — structure-aware, deterministic, dependency-free; preserves task header, code blocks, tracebacks, and grounded facts.
- Semantic memory — nomic-embed-text embeddings, cosine retrieval, local memory commands, and SQLite text-search fallback for direct search.
- Evaluation — mock evals (CI), real-backend smoke suite, 100-case quality benchmark, adversarial tester/developer dogfooding loop.
Configuration
Settings live in config/personal.yaml (ships with safe local-first defaults —
see config/personal.example.yaml). Highlights:
```yaml
preferences:
  router_mode: "learned"        # rules | llm | hybrid | learned
  private_mode: true            # block sensitive prompts from non-local backends
  allow_cloud: false
  compression_enabled: true
  compression_threshold_tokens: 1000
  semantic_memory_enabled: true
  semantic_memory_top_k: 3
  finance_provider: "yahoo"
  news_provider: "google_news_rss"
```
Provider API keys are referenced by environment-variable name (e.g.
OPENAI_API_KEY), never inline. See docs/overrides.md.
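Referencing a key by environment-variable name means the config file only ever holds the name, and the value is looked up at call time. The helper below is a hypothetical illustration of that lookup, not Switchboard's loader; `resolve_api_key` and its `environ` parameter are invented for this sketch.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    env_var_name: str, environ: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Look up a key by variable name; return None if it is unset or blank."""
    value = environ.get(env_var_name, "").strip()
    return value or None


# The config stores only the name; the secret itself never touches the file.
key = resolve_api_key("OPENAI_API_KEY")
print("key configured" if key else "no key set: provider stays disabled")
```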
The paper
Switchboard is described in a preprint — "Privacy-Aware Hybrid Routing Across Heterogeneous AI Agents." The manuscript, the multi-run benchmark harness, the statistical-aggregation and figure scripts, and the per-case records are archived together as a reproduction bundle on Zenodo: 10.5281/zenodo.20836918.
This repository ships only the software. It deliberately does not carry the paper's experiment-running or figure-generation tooling — that lives with the archival record so the code stays focused on the router itself.
Development
```bash
make install   # .venv + editable install with dev extras
make check     # ruff + mypy + the full test suite
```
See CONTRIBUTING.md. Issues and PRs welcome — please preserve the privacy invariant described there.
Citing Switchboard
A preprint is available on Zenodo with a citable DOI — 10.5281/zenodo.20836918. See CITATION.cff for machine-readable metadata.
V. Gupta, "Switchboard: Privacy-Aware Hybrid Routing Across Heterogeneous AI Agents," Zenodo, 2026, doi:10.5281/zenodo.20836918.
License
MIT © 2026 Vinay Gupta