switchboard-local

Privacy-aware, local-first router across CLI coding agents (Codex, Claude Code) and local LLMs (Ollama).

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

vingugup

These details have not been verified by PyPI

Project description

62% fewer premium-agent calls · 4.1/5 quality vs 4.6/5 always-premium · 0 benchmark leaks observed

Python 3.11+

Switchboard automatic routing demo

One session, three backends: local by default, Codex for code, Claude Code for reasoning.

Switchboard wraps the CLI tools you already use — no separate service, no proxy, no resold API access — and routes each prompt with deterministic rules before any learned classifier runs.

In its 100-case benchmark, Switchboard kept 62% of requests off premium agents while reaching 4.1/5 quality against a 4.6/5 always-premium baseline, with 100% answered and no benchmark leaks observed. See Evaluation for the numbers and reproduction bundle.

Use it when you want to:

Spend premium agent quota where it matters instead of sending every prompt to the most expensive backend.
Keep sensitive prompts local with a deterministic privacy floor that learned routing cannot override.
Switch backends mid-session without losing context — shared session history, semantic memory, and redaction travel with you across Ollama, Codex, and Claude Code.

What it does

Routes across local Ollama models, the Codex CLI, and Claude Code — deterministic rules first, with optional tiny learned classifiers for recall.
Private mode — a deterministic keyword/PII/secret-format floor blocks sensitive prompts from ever reaching a subscription backend, even on fallback.
Grounds answers with deterministic tools (time/date, safe calculator, unit conversion, keyless live stock & news) instead of letting a model guess.
Carries context across backend switches: recent user, assistant, and tool turns are assembled into one redacted session prompt.
Compresses long context with a Headroom-inspired layer; the model-boundary pass only summarizes recent conversation, while trusted facts, retrieved memory, and the current request survive intact.
Remembers across backends via local embedding-based semantic memory, with SQLite search available for direct memory lookup.
Explains every decision and records metadata-only telemetry (no prompt/response bodies).
Ships its own evaluation — a 100-case quality benchmark, a local LLM-as-judge, and a multi-run statistical harness.
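
The "safe calculator" tool mentioned above can be pictured with a small sketch. This is not Switchboard's implementation. It only illustrates the general technique of walking a Python abstract syntax tree and allowing a fixed whitelist of arithmetic nodes instead of calling eval(). The names safe_eval, _eval_node, and _MAX_EXPONENT are hypothetical.

```python
import ast
import operator

# Whitelisted operators; any node type outside these tables is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_MAX_EXPONENT = 64  # hypothetical guard against very large powers


def _eval_node(node):
    # Plain int/float literals only; bool, str, and None are refused.
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        left = _eval_node(node.left)
        right = _eval_node(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > _MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _BINARY[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_eval_node(node.operand))
    # Names, calls, attributes, subscripts, etc. all land here.
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def safe_eval(expression):
    """Evaluate a plain arithmetic expression without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _eval_node(tree.body)
```

Because function calls and names are never evaluated, an input such as `__import__('os')` is rejected with ValueError rather than executed.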

How it works

  UI / CLI  ──►  Session manager (shared history across all backends)
                      │
                      ▼
              Capability detector (regex) ◄──► deterministic tools
                      │  (learned tool dispatcher recovers misses; tool verifies)
                      ▼
              Privacy floor  (keywords + PII + secret formats — a match is FINAL)
                      │  (learned sensitivity escalator may only ADD protection)
                      ▼
              Deterministic policy   ← always wins; unknown ⇒ local
                      │  (learned router supplies recall: tool / local / coding / reasoning)
                      ▼
              Context builder + redaction ◄── semantic memory
                      │
                      ▼
              Compression (metadata + history-only context pass)
                      │
                      ▼
        Ollama (default) │ Codex (coding) │ Claude Code (reasoning)
                      │
                      ▼
              Response sanitizer ─► metadata-only telemetry

The organizing invariant: deterministic policy always precedes and overrides the learned components. Privacy, tool grounding, forced selection, and fallback keep working even when the local model runtime — and therefore every learned component — is down.
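
The ordering in the diagram can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the project's code: the regular expressions, the Decision type, the route function, and the label-to-backend mapping are all hypothetical. What it tries to show is the invariant itself: the privacy floor and the deterministic rules run first, the learned router is consulted only when they are silent, and any failure falls back to local.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword rules, for illustration only.
SENSITIVE = re.compile(r"(?i)\b(password|ssn|api[_ ]?key)\b")
CODING = re.compile(r"(?i)\b(refactor|traceback|unit tests?)\b")
REASONING = re.compile(r"(?i)\b(prove|trade-?offs?)\b")

# Hypothetical mapping from learned labels to backends.
LEARNED_TO_BACKEND = {"coding": "codex", "reasoning": "claude_code", "local": "ollama"}


@dataclass(frozen=True)
class Decision:
    backend: str
    reason: str


def route(prompt, learned_router=None):
    # 1. Privacy floor: a match is final; nothing below can override it.
    if SENSITIVE.search(prompt):
        return Decision("ollama", "privacy floor matched")
    # 2. Deterministic policy: explicit rules win over any classifier.
    if CODING.search(prompt):
        return Decision("codex", "deterministic coding rule")
    if REASONING.search(prompt):
        return Decision("claude_code", "deterministic reasoning rule")
    # 3. Learned router supplies recall only when the rules are silent.
    if learned_router is not None:
        try:
            label = learned_router(prompt)
        except Exception:
            label = None  # learned component is down: fail closed
        if label in LEARNED_TO_BACKEND:
            return Decision(LEARNED_TO_BACKEND[label], f"learned router: {label}")
    # 4. Unknown prompts stay local.
    return Decision("ollama", "unknown prompt, default local")
```

Passing learned_router=None models the case where the local model runtime is unavailable: routing still works, it just loses the extra recall.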

Get started

pip install switchboard-local

# point it at a local model runtime (install Ollama, then pull a small model)
ollama pull llama3.2:3b

# sanity-check your setup
switchboard doctor

# ask — Switchboard routes it, grounds it, and tells you why
switchboard ask "summarize this error log and suggest a fix"

# see the routing decision without running anything
switchboard route "refactor the auth module and add tests"

# prefer your browser? launch the local web UI, then open http://127.0.0.1:8080/ui
switchboard ui

Requires Python 3.11+. Codex / Claude Code backends are optional — without them, everything routes locally. See docs/usage.md.

Context, memory, and tokens

Switchboard has two user-facing paths:

switchboard route ... previews the backend decision that the stateful path would make, without calling a model.
The web UI, bare switchboard ask ..., and switchboard ask --backend auto ... use the stateful core workflow: shared sessions, model switching, semantic-memory retrieval, context-boundary compression, and backend telemetry all run on the same path.

Example stateful CLI session:

switchboard ask --backend auto --new-session "Remember: prefer local models for private notes."
switchboard ask --backend auto --session <session_id> --memory "What should you remember?"

Long prompts and long sessions record token estimates and savings metadata. The request-level pass can shorten an oversized raw prompt; the context-boundary pass then compresses only <recent_conversation>. The <trusted_facts>, <long_term_memory>, and <current_user_request> blocks are protected from that second pass so grounding and intent are not traded away for token budget.
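
A rough sketch of the second pass, assuming the assembled prompt uses the block tags named above on their own lines. The function compress_recent and its keep_last parameter are hypothetical, and simple truncation stands in for whatever summarization Switchboard actually applies. The point is only that the rewrite is scoped to <recent_conversation> and leaves every other block byte-for-byte intact.

```python
import re

# Matches only the recent-conversation block; other blocks are never touched.
RECENT = re.compile(
    r"(<recent_conversation>\n)(.*?)(\n</recent_conversation>)", re.DOTALL
)


def compress_recent(prompt, keep_last=2):
    """Shorten the recent-conversation block, keeping the last few turns."""

    def _shrink(match):
        turns = match.group(2).splitlines()
        if len(turns) <= keep_last:
            return match.group(0)  # already short enough
        dropped = len(turns) - keep_last
        kept = [f"[{dropped} earlier turns omitted]"] + turns[-keep_last:]
        return match.group(1) + "\n".join(kept) + match.group(3)

    return RECENT.sub(_shrink, prompt)
```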

Memory is local. switchboard memory add stores the item in SQLite and, when semantic_memory_enabled is on and Ollama can serve nomic-embed-text, indexes an embedding for cross-backend retrieval. switchboard memory search works as local text search even when embeddings are unavailable.
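
The two retrieval modes can be sketched as follows. This is an illustration under assumptions, not the project's schema or API: the memory table, the in-memory index dictionary, and the open_memory, add, and search functions are hypothetical, and embed stands for any callable that returns a vector (in Switchboard's case that would come from nomic-embed-text via Ollama). The sketch ranks by cosine similarity when an embedding is available and otherwise falls back to SQLite text search.

```python
import math
import sqlite3


def open_memory():
    # In-memory database for the example; a real store would live on disk.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE memory (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")
    return conn


def add(conn, text):
    return conn.execute("INSERT INTO memory (text) VALUES (?)", (text,)).lastrowid


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(conn, query, embed=None, index=None, top_k=3):
    """index maps row id -> embedding vector (hypothetical layout)."""
    if embed is not None and index:
        try:
            query_vec = embed(query)
        except Exception:
            query_vec = None  # embedding backend unavailable
        if query_vec is not None:
            ranked = sorted(
                index, key=lambda rid: cosine(query_vec, index[rid]), reverse=True
            )[:top_k]
            return [
                conn.execute("SELECT text FROM memory WHERE id = ?", (rid,)).fetchone()[0]
                for rid in ranked
            ]
    # Fallback: plain substring search, no embeddings required.
    rows = conn.execute(
        "SELECT text FROM memory WHERE text LIKE ? ORDER BY id LIMIT ?",
        (f"%{query}%", top_k),
    )
    return [row[0] for row in rows]
```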

Details: docs/context-memory-compression.md.

Evaluation

The benchmark covers 100 cases across five task categories (coding, reasoning, summarization, private, grounding). It runs on real backends, is judged by a local model, and is repeated over multiple independent runs. The table shows means; full per-condition numbers, confidence intervals, and significance tests are in the paper.

Policy	Quality (1–5)	Premium usage	Answered
always-local	3.4	0%	100%
rules	3.8	27%	100%
hybrid	3.9	28%	100%
learned	4.1	38%	100%
always-premium	4.6	100%	61%¹

¹ The "just use the premium agent for everything" baseline must block every sensitive prompt to stay leak-free, so its coverage collapses. That is exactly the gap Switchboard closes. No benchmark leaks were observed in any condition or run.

These numbers come from a real-backend benchmark whose full harness travels with the paper's reproduction bundle on Zenodo.
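
The headline figures can be re-derived from the table. The snippet below assumes that the "62% fewer premium-agent calls" claim refers to the learned row, which is the row whose 4.1/5 quality matches the headline. The dictionary simply restates the table means, and the helper names are hypothetical.

```python
# (quality on a 1-5 scale, share of requests sent to premium agents)
RESULTS = {
    "always-local": (3.4, 0.00),
    "rules": (3.8, 0.27),
    "hybrid": (3.9, 0.28),
    "learned": (4.1, 0.38),
    "always-premium": (4.6, 1.00),
}


def premium_avoided(policy):
    """Share of requests kept off premium agents."""
    return 1.0 - RESULTS[policy][1]


def quality_gap(policy):
    """Quality points given up relative to the always-premium baseline."""
    return RESULTS["always-premium"][0] - RESULTS[policy][0]


for name in RESULTS:
    print(f"{name:15s} avoided={premium_avoided(name):.0%} gap={quality_gap(name):.1f}")
```

Under that reading, the learned policy keeps 1 − 0.38 = 62% of requests off premium agents at a cost of 4.6 − 4.1 = 0.5 quality points.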

Context: why this exists (Uber, Microsoft, 2026)

Some employers have begun rationing AI coding-tool spend: Uber reportedly capped engineers at $1,500/month per AI tool after burning its 2026 AI budget in four months (Bloomberg); Microsoft's Experiences + Devices org reportedly moved off Claude Code to GitHub Copilot CLI (Windows Central).

A spend cap controls the invoice, but it does not decide which work actually needs a premium model or which prompts should never leave the machine. A better pattern is routing, not blanket rationing: decide request by request what belongs local, what needs a coding agent, and what is worth premium reasoning.

Switchboard is a reference implementation of that pattern for a single workstation. It is not yet an enterprise product; it is the smallest honest proof that local-first routing can work, with a reproducible benchmark to back it.

Privacy

Switchboard is local-first and privacy-aware by construction:

The deterministic privacy floor runs before any non-local routing; a positive verdict is final and cannot be overridden by a learned component or by prompt wording.
Secret-format detection (cloud keys, JWTs, PEM blocks, env credentials) shares its patterns with context redaction, so the routing boundary and the redactor can't drift apart.
Metadata-only telemetry — prompt and response bodies are not stored by default.
Semantic-memory embeddings and the eval judge run locally.

Switchboard deliberately does not resell API access, scrape web UIs, or bypass provider limits — subscription CLIs are invoked exactly as the authenticated user could invoke them, in read-only sandbox modes. See SECURITY.md and docs/privacy.md.
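
One way to picture the shared-pattern idea is a single pattern table consumed by both the routing check and the redactor. The sketch below is illustrative only: the README does not show the project's real pattern set, so the four regular expressions, the table name SECRET_PATTERNS, and the functions find_secrets, must_stay_local, and redact are assumptions.

```python
import re

# Illustrative patterns only; not the project's actual pattern set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    "pem_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "env_credential": re.compile(
        r"(?im)^\s*[A-Z0-9_]*(SECRET|TOKEN|PASSWORD)[A-Z0-9_]*\s*=\s*\S+"
    ),
}


def find_secrets(text):
    """Names of every pattern that matches, in sorted order."""
    return sorted(name for name, pat in SECRET_PATTERNS.items() if pat.search(text))


def must_stay_local(prompt):
    # Routing boundary: any match pins the prompt to the local backend.
    return bool(find_secrets(prompt))


def redact(text):
    # Redactor: reuses the same table, so the two checks cannot drift apart.
    for name, pat in SECRET_PATTERNS.items():
        text = pat.sub(f"[REDACTED:{name}]", text)
    return text
```

Since both functions read the same table, adding a pattern updates the routing floor and the redactor at once.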

What's inside

Deterministic router — keyword rules; unknown prompts default local-first.
Learned router / tool dispatcher / sensitivity escalator — tiny softmax classifiers over a locally-computed embedding (~50 ms, pure-Python inference), each retrainable in seconds from your own thumbs-down corrections behind golden-accuracy gates. They fail closed to the deterministic path.
Tools — time/date with timezones, safe abstract-syntax-tree calculator, unit conversion, keyless live stock quotes & news.
Compression — structure-aware, deterministic, dependency-free; preserves task header, code blocks, tracebacks, and grounded facts.
Semantic memory — nomic-embed-text embeddings, cosine retrieval, local memory commands, and SQLite text-search fallback for direct search.
Evaluation — mock evals (CI), real-backend smoke suite, 100-case quality benchmark, adversarial tester/developer dogfooding loop.
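
The "tiny softmax classifier over an embedding" idea, including the fail-closed behaviour, can be sketched in pure Python. Everything here is an assumption made for illustration: the classify function, the weight layout (one row per label), the min_confidence threshold, and the convention that returning None means "defer to the deterministic path". The label names are taken from the diagram in How it works.

```python
import math

LABELS = ("tool", "local", "coding", "reasoning")


def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding, weights, bias, min_confidence=0.6):
    """Return a label, or None to defer to the deterministic path."""
    # Fail closed on a missing embedding or a shape mismatch.
    if embedding is None or len(weights) != len(LABELS) or len(bias) != len(LABELS):
        return None
    if any(len(row) != len(embedding) for row in weights):
        return None
    logits = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Low-confidence predictions are treated the same as no prediction.
    return LABELS[best] if probs[best] >= min_confidence else None
```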

Configuration

Settings live in config/personal.yaml (ships with safe local-first defaults — see config/personal.example.yaml). Highlights:

preferences:
  router_mode: "learned"      # rules | llm | hybrid | learned
  private_mode: true          # block sensitive prompts from non-local backends
  allow_cloud: false
  compression_enabled: true
  compression_threshold_tokens: 1000
  semantic_memory_enabled: true
  semantic_memory_top_k: 3
  claude_code_web_search: true  # allow Claude Code WebSearch for live-data fallback
  finance_provider: "yahoo"
  news_provider: "google_news_rss"

Provider API keys are referenced by environment-variable name (e.g. OPENAI_API_KEY), never inline. See docs/overrides.md.
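
A minimal sketch of the "by name, never inline" rule, assuming a config value that is supposed to hold an environment-variable name. The function resolve_api_key and the ENV_NAME pattern are hypothetical; only os.environ is real. The sketch rejects anything that does not look like a variable name, so a pasted key is refused rather than silently used.

```python
import os
import re

# Conventional shape of an environment-variable name (assumption).
ENV_NAME = re.compile(r"[A-Z_][A-Z0-9_]*")


def resolve_api_key(reference, environ=None):
    """Look up a key by variable name; return None when it is unset."""
    if not ENV_NAME.fullmatch(reference):
        # Do not echo the value: it may be a real secret pasted inline.
        raise ValueError("expected an environment-variable name, not an inline key")
    environ = os.environ if environ is None else environ
    value = environ.get(reference, "").strip()
    return value or None
```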

The paper

Switchboard is described in a preprint — "Privacy-Aware Hybrid Routing Across Heterogeneous AI Agents." The manuscript, the multi-run benchmark harness, the statistical-aggregation and figure scripts, and the per-case records are archived together as a reproduction bundle on Zenodo: 10.5281/zenodo.20836918.

This repository ships only the software. It deliberately does not carry the paper's experiment-running or figure-generation tooling — that lives with the archival record so the code stays focused on the router itself.

Development

make install     # .venv + editable install with dev extras
make check       # ruff + mypy + the full test suite

See CONTRIBUTING.md. Issues and PRs welcome — please preserve the privacy invariant described there.

Citing Switchboard

A preprint is available on Zenodo with a citable DOI — 10.5281/zenodo.20836918. See CITATION.cff for machine-readable metadata.

V. Gupta, "Switchboard: Privacy-Aware Hybrid Routing Across Heterogeneous AI Agents," Zenodo, 2026, doi:10.5281/zenodo.20836918.

License

Project details

Release history

Version	Released
0.2.2	Jun 29, 2026
0.2.1 (this version)	Jun 29, 2026
0.2.0	Jun 29, 2026
0.1.1	Jun 24, 2026
0.1.0	Jun 24, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

switchboard_local-0.2.1.tar.gz (432.0 kB)

Uploaded Jun 29, 2026 Source

Built Distribution

switchboard_local-0.2.1-py3-none-any.whl (393.6 kB)

Uploaded Jun 29, 2026 Python 3

File details

Details for the file switchboard_local-0.2.1.tar.gz.

File metadata

Download URL: switchboard_local-0.2.1.tar.gz
Upload date: Jun 29, 2026
Size: 432.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for switchboard_local-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`7b991752eff82271dbd78eb2c770d99e6de7bf1d24e643e6c4723ea6266af662`
MD5	`9f3a8c927a655315c2b61ef5419d7d36`
BLAKE2b-256	`34e57d438a4b6125217fd6bf6e12cd11e69871a787af6c7e6113b8538c8ecad3`
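
A downloaded file can be checked against the published SHA256 digest with the standard library. The helper names sha256_of and matches_published are hypothetical; hashlib and hmac.compare_digest are standard-library calls.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 of a file, read in chunks so large files stay cheap."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    # Normalise case and whitespace before comparing the two hex strings.
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For the source distribution, expected_hex would be the SHA256 value from the table above, and path the location of the downloaded switchboard_local-0.2.1.tar.gz.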

Provenance

The following attestation bundles were made for switchboard_local-0.2.1.tar.gz:

Publisher: release.yml on aivinay/switchboard

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: switchboard_local-0.2.1.tar.gz
- Subject digest: 7b991752eff82271dbd78eb2c770d99e6de7bf1d24e643e6c4723ea6266af662
- Sigstore transparency entry: 2013066185
- Sigstore integration time: Jun 29, 2026
Source repository:
- Permalink: aivinay/switchboard@18c91edabf6b90e75aad4dbeb2de4e65fb71195f
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/aivinay
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@18c91edabf6b90e75aad4dbeb2de4e65fb71195f
- Trigger Event: release

File details

Details for the file switchboard_local-0.2.1-py3-none-any.whl.

File metadata

Download URL: switchboard_local-0.2.1-py3-none-any.whl
Upload date: Jun 29, 2026
Size: 393.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for switchboard_local-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e9b93bccd2205d37f5c5e1294c2597ed2cba50e8823f7b1c027923b30688e26d`
MD5	`1224872eab96e4421f854a00f2211aab`
BLAKE2b-256	`5950bbabd4fd858a405a7193da100084ba07af53f2b8e82473d16cbf514995f1`

Provenance

The following attestation bundles were made for switchboard_local-0.2.1-py3-none-any.whl:

Publisher: release.yml on aivinay/switchboard

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: switchboard_local-0.2.1-py3-none-any.whl
- Subject digest: e9b93bccd2205d37f5c5e1294c2597ed2cba50e8823f7b1c027923b30688e26d
- Sigstore transparency entry: 2013066243
- Sigstore integration time: Jun 29, 2026
Source repository:
- Permalink: aivinay/switchboard@18c91edabf6b90e75aad4dbeb2de4e65fb71195f
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/aivinay
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@18c91edabf6b90e75aad4dbeb2de4e65fb71195f
- Trigger Event: release

switchboard-local 0.2.1
