Privacy-aware, local-first router across CLI coding agents (Codex, Claude Code) and local LLMs (Ollama).

Switchboard

A privacy-aware, local-first router across your CLI coding agents and local LLMs.

Switchboard routes prompts to the right model while preserving context through local semantic memory and context compression, keeping sensitive work local, and reducing unnecessary premium-model usage.

It's built for the single-workstation setup where the scarce resources aren't dollars-per-token but subscription quota, privacy, and a pile of heterogeneous agent interfaces.

What it does

  • Routes across local Ollama models, the Codex CLI, and Claude Code — deterministic rules first, with optional tiny learned classifiers for recall.
  • Private mode — a deterministic keyword/PII/secret-format floor blocks sensitive prompts from ever reaching a subscription backend, even on fallback.
  • Grounds answers with deterministic tools (time/date, safe calculator, unit conversion, keyless live stock & news) instead of letting a model guess.
  • Carries context across backend switches: recent user, assistant, and tool turns are assembled into one redacted session prompt.
  • Compresses long context with a Headroom-inspired layer; the model-boundary pass only summarizes recent conversation, while trusted facts, retrieved memory, and the current request survive intact.
  • Remembers across backends via local embedding-based semantic memory, with SQLite search available for direct memory lookup.
  • Explains every decision and records metadata-only telemetry (no prompt/response bodies).
  • Ships its own evaluation — a 100-case quality benchmark, a local LLM-as-judge, and a multi-run statistical harness.
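
As an illustration of what "metadata-only" can mean in practice, here is a minimal sketch of a telemetry record that keeps sizes and routing facts but never the text. The `TelemetryRecord` fields and the `make_record` helper are hypothetical names chosen for this example, not Switchboard's actual schema.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TelemetryRecord:
    """Hypothetical record shape: routing metadata only, no prompt or response text."""
    backend: str
    reason: str
    prompt_chars: int
    response_chars: int
    latency_ms: float


def make_record(prompt: str, response: str, backend: str,
                reason: str, latency_ms: float) -> TelemetryRecord:
    # Only lengths are derived from the bodies; the bodies themselves are dropped.
    return TelemetryRecord(
        backend=backend,
        reason=reason,
        prompt_chars=len(prompt),
        response_chars=len(response),
        latency_ms=latency_ms,
    )


record = make_record(
    prompt="my API key is hunter2",
    response="Noted.",
    backend="ollama",
    reason="privacy floor matched",
    latency_ms=41.5,
)
print(asdict(record))
```

Because the record type has no field that can hold a body, a later code change cannot start logging prompts without also changing the schema.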

How it works

  UI / CLI  ──►  Session manager (shared history across all backends)
                      │
                      ▼
              Capability detector (regex) ◄──► deterministic tools
                      │  (learned tool dispatcher recovers misses; tool verifies)
                      ▼
              Privacy floor  (keywords + PII + secret formats — a match is FINAL)
                      │  (learned sensitivity escalator may only ADD protection)
                      ▼
              Deterministic policy   ← always wins; unknown ⇒ local
                      │  (learned router supplies recall: tool / local / coding / reasoning)
                      ▼
              Context builder + redaction ◄── semantic memory
                      │
                      ▼
              Compression (metadata + history-only context pass)
                      │
                      ▼
        Ollama (default) │ Codex (coding) │ Claude Code (reasoning)
                      │
                      ▼
              Response sanitizer ─► metadata-only telemetry

The organizing invariant: deterministic policy always precedes and overrides the learned components. Privacy, tool grounding, forced selection, and fallback keep working even when the local model runtime — and therefore every learned component — is down.
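
The ordering in the diagram can be sketched in a few lines. This is a simplified illustration, not the project's router: the keyword lists, the `RULES` and `BACKENDS` tables, and the `route` signature are assumptions made for the example, and the tool-dispatch stage is left out.

```python
from typing import Callable, Optional

# Hypothetical keyword rules; the real rule set is not shown in this README.
RULES = {
    "coding": ("refactor", "unit test", "stack trace"),
    "reasoning": ("prove", "trade-off", "compare"),
}
BACKENDS = {"coding": "codex", "reasoning": "claude-code", "local": "ollama"}


def deterministic_label(prompt: str) -> Optional[str]:
    """Return a label when a keyword rule matches, else None (unknown)."""
    text = prompt.lower()
    for label, keywords in RULES.items():
        if any(keyword in text for keyword in keywords):
            return label
    return None


def route(prompt: str, is_sensitive: bool,
          learned: Optional[Callable[[str], str]] = None) -> str:
    # 1. Privacy floor: a match is final, nothing below can override it.
    if is_sensitive:
        return BACKENDS["local"]
    # 2. Deterministic policy: a rule match always wins over the learned router.
    label = deterministic_label(prompt)
    if label is not None:
        return BACKENDS[label]
    # 3. Learned router only supplies recall for prompts the rules did not match.
    #    Any failure or unknown label falls back to local.
    if learned is not None:
        try:
            return BACKENDS.get(learned(prompt), BACKENDS["local"])
        except Exception:
            pass
    return BACKENDS["local"]


print(route("refactor the auth module and add tests", is_sensitive=False))
print(route("refactor the auth module and add tests", is_sensitive=True))
```

The learned component is an optional argument on purpose: when it is missing or raises, the function still returns a valid backend, which mirrors the "keeps working when the local runtime is down" property described above.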

Get started (60 seconds)

PyPI release is pending. Until the first release is published:

pip install "git+https://github.com/aivinay/switchboard.git"

After the PyPI release:

pip install switchboard-local

With either install, finish the setup and try it:

# point it at a local model runtime (install Ollama, then pull a small model)
ollama pull llama3.2:3b
ollama pull llama3.2:3b

# sanity-check your setup
switchboard doctor

# ask — Switchboard routes it, grounds it, and tells you why
switchboard ask "summarize this error log and suggest a fix"

# see the routing decision without running anything
switchboard route "refactor the auth module and add tests"

# prefer your browser? launch the local web UI, then open http://127.0.0.1:8080/ui
switchboard ui

Requires Python 3.11+. Codex / Claude Code backends are optional — without them, everything routes locally. See docs/usage.md.

Context, memory, and tokens

Switchboard has two user-facing workflows:

  • switchboard route ... and bare switchboard ask ... use the personal local-first route/call workflow.
  • The web UI and switchboard ask --backend auto ... use the stateful core workflow: shared sessions, model switching, semantic-memory retrieval, context-boundary compression, and backend telemetry all run on the same path.

Example stateful CLI session:

switchboard ask --backend auto --new-session "Remember: prefer local models for private notes."
switchboard ask --backend auto --session <session_id> --memory "What should you remember?"

Long prompts and long sessions record token estimates and savings metadata. The request-level pass can shorten an oversized raw prompt; the context-boundary pass then compresses only <recent_conversation>. The <trusted_facts>, <long_term_memory>, and <current_user_request> blocks are protected from that second pass so grounding and intent are not traded away for token budget.
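
A minimal sketch of the context-boundary idea follows: only the `<recent_conversation>` block is rewritten, and every other block passes through untouched. The token estimate (about four characters per token), the `compress_history` stand-in that keeps the last few turns, and the function names are assumptions for illustration. The project's actual compressor is structure-aware and is not reproduced here.

```python
import re

# Matches the one block that the second pass is allowed to rewrite.
RECENT = re.compile(
    r"(<recent_conversation>)(.*?)(</recent_conversation>)", re.DOTALL
)


def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly four characters per token.
    return max(1, len(text) // 4)


def compress_history(history: str, keep_last: int = 2) -> str:
    """Stand-in summarizer: keep the last turns and note how many were dropped."""
    turns = [line for line in history.strip().splitlines() if line.strip()]
    if len(turns) <= keep_last:
        return "\n".join(turns)
    dropped = len(turns) - keep_last
    return "\n".join([f"[{dropped} earlier turns omitted]"] + turns[-keep_last:])


def compress_context(prompt: str, threshold_tokens: int = 1000) -> str:
    """Rewrite only <recent_conversation>; other blocks are never touched."""
    if estimate_tokens(prompt) <= threshold_tokens:
        return prompt
    return RECENT.sub(
        lambda m: f"{m.group(1)}\n{compress_history(m.group(2))}\n{m.group(3)}",
        prompt,
    )


prompt = (
    "<trusted_facts>today is Monday</trusted_facts>\n"
    "<recent_conversation>\n"
    "user: a1\nassistant: b1\nuser: a2\nassistant: b2\nuser: a3\n"
    "</recent_conversation>\n"
    "<current_user_request>what next?</current_user_request>"
)
print(compress_context(prompt, threshold_tokens=1))
```

Scoping the substitution to a single tagged block is what makes the protection structural: the trusted facts and the current request are outside the regex match, so no summarizer bug can shorten them.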

Memory is local. switchboard memory add stores the item in SQLite and, when semantic_memory_enabled is on and Ollama can serve nomic-embed-text, indexes an embedding for cross-backend retrieval. switchboard memory search works as local text search even when embeddings are unavailable.
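
The two lookup paths can be sketched as follows: every item lands in SQLite, an embedding is indexed only when one is available, and text search keeps working without embeddings. The `MemoryStore` class, its table layout, and the two-dimensional toy vectors are hypothetical. In Switchboard the vectors would come from `nomic-embed-text` via Ollama, which this sketch does not call.

```python
import math
import sqlite3
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryStore:
    """Hypothetical store: SQLite rows plus an optional in-process vector index."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memory (id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
        )
        self.vectors: dict[int, list[float]] = {}

    def add(self, text: str, embedding: Optional[Sequence[float]] = None) -> int:
        cur = self.db.execute("INSERT INTO memory (text) VALUES (?)", (text,))
        self.db.commit()
        if embedding is not None:  # index only when an embedding could be computed
            self.vectors[cur.lastrowid] = list(embedding)
        return cur.lastrowid

    def search_text(self, query: str, limit: int = 3) -> list[str]:
        # Plain substring search: works even when no embeddings exist.
        rows = self.db.execute(
            "SELECT text FROM memory WHERE text LIKE ? ORDER BY id LIMIT ?",
            (f"%{query}%", limit),
        )
        return [row[0] for row in rows]

    def search_semantic(self, query_embedding: Sequence[float], top_k: int = 3) -> list[str]:
        ranked = sorted(
            self.vectors.items(),
            key=lambda item: cosine(query_embedding, item[1]),
            reverse=True,
        )
        return [
            self.db.execute("SELECT text FROM memory WHERE id = ?", (row_id,)).fetchone()[0]
            for row_id, _ in ranked[:top_k]
        ]


store = MemoryStore()
store.add("prefer local models for private notes", [1.0, 0.0])
store.add("never deploy on Fridays", [0.0, 1.0])
store.add("item stored while embeddings were unavailable")
print(store.search_text("local"))
print(store.search_semantic([0.9, 0.1], top_k=1))
```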

Details: docs/context-memory-compression.md.

Proof

The benchmark covers 100 cases across five task categories (coding, reasoning, summarization, private, grounding). It runs on real backends, is judged by a local model, and is repeated over multiple independent runs. The table shows means; full per-condition numbers, confidence intervals, and significance tests are in the paper.

Policy          Quality (1–5)  Premium usage  Privacy leaks  Answered
always-local    3.4            0%             0              100%
rules           3.8            27%            0              100%
hybrid          3.9            28%            0              100%
learned         4.1            38%            0              100%
always-premium  4.6            100%           0              61%¹

¹ The "just use the premium agent for everything" baseline must block every sensitive prompt to stay leak-free, so its coverage collapses — exactly the gap Switchboard closes. Zero measured leaks in every condition and every run.

These numbers come from a real-backend benchmark whose full harness travels with the paper's reproduction bundle on Zenodo.

Privacy

Switchboard is local-first and privacy-aware by construction:

  • The deterministic privacy floor runs before any non-local routing; a positive verdict is final and cannot be overridden by a learned component or by prompt wording.
  • Secret-format detection (cloud keys, JWTs, PEM blocks, env credentials) shares its patterns with context redaction, so the routing boundary and the redactor can't drift apart.
  • Metadata-only telemetry — prompt and response bodies are not stored by default.
  • Semantic-memory embeddings and the eval judge run locally.
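
One way to keep the routing boundary and the redactor from drifting apart is to have both read a single pattern table. The sketch below assumes that design. The four regular expressions are simplified illustrations, not Switchboard's shipped patterns, and `is_sensitive` and `redact` are hypothetical names.

```python
import re

# Single source of truth (assumption): both functions below read this table.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    "pem_block": re.compile(
        r"-----BEGIN [A-Z ]+-----.*?-----END [A-Z ]+-----", re.DOTALL
    ),
    "env_credential": re.compile(
        r"\b[A-Z][A-Z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD)=\S+"
    ),
}


def is_sensitive(text: str) -> bool:
    """Routing-side check: any pattern match keeps the prompt local."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS.values())


def redact(text: str) -> str:
    """Context-side pass: replace every match with a labeled placeholder."""
    for name, pattern in SECRET_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text


sample = "deploy with AKIAIOSFODNN7EXAMPLE and DB_PASSWORD=hunter2"
print(is_sensitive(sample))
print(redact(sample))
```

With one table, adding a new secret format updates the privacy floor and the redactor in the same change, so a string the router treats as sensitive is always one the redactor rewrites.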

Switchboard deliberately does not resell API access, scrape web UIs, or bypass provider limits — subscription CLIs are invoked exactly as the authenticated user could invoke them, in read-only sandbox modes. See SECURITY.md and docs/privacy.md.

What's inside

  • Deterministic router — keyword rules; unknown prompts default local-first.
  • Learned router / tool dispatcher / sensitivity escalator — tiny softmax classifiers over a locally-computed embedding (~50 ms, pure-Python inference), each retrainable in seconds from your own thumbs-down corrections behind golden-accuracy gates. They fail closed to the deterministic path.
  • Tools — time/date with timezones, safe abstract-syntax-tree calculator, unit conversion, keyless live stock quotes & news.
  • Compression — structure-aware, deterministic, dependency-free; preserves task header, code blocks, tracebacks, and grounded facts.
  • Semantic memory: nomic-embed-text embeddings, cosine retrieval, local memory commands, and SQLite text-search fallback for direct search.
  • Evaluation — mock evals (CI), real-backend smoke suite, 100-case quality benchmark, adversarial tester/developer dogfooding loop.
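
To show what a "safe abstract-syntax-tree calculator" can look like, here is a minimal sketch that parses an expression with Python's `ast` module and evaluates only whitelisted node types. The supported operators and the exponent cap are assumptions for this example, and `safe_eval` is a hypothetical name rather than the tool Switchboard ships.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval(node: ast.AST):
    # Numeric literals only (bool is excluded even though it subclasses int).
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        left, right = _eval(node.left), _eval(node.right)
        # Arbitrary cap (assumption) to avoid very large exponent results.
        if isinstance(node.op, ast.Pow) and abs(right) > 100:
            raise ValueError("exponent too large")
        return _BINARY[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_eval(node.operand))
    # Names, calls, attributes, subscripts, and everything else land here.
    raise ValueError(f"unsupported expression: {type(node).__name__}")


def safe_eval(expression: str):
    """Evaluate arithmetic without eval(); raises ValueError on anything else."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


print(safe_eval("2 + 3 * 4"))
print(safe_eval("-(10 / 4)"))
```

Walking the tree instead of calling `eval` means function calls and attribute access never execute: they are rejected as unsupported node types before any evaluation happens. Malformed input raises `SyntaxError` from `ast.parse`, which a caller would handle separately.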

Configuration

Settings live in config/personal.yaml (ships with safe local-first defaults — see config/personal.example.yaml). Highlights:

preferences:
  router_mode: "learned"      # rules | llm | hybrid | learned
  private_mode: true          # block sensitive prompts from non-local backends
  allow_cloud: false
  compression_enabled: true
  compression_threshold_tokens: 1000
  semantic_memory_enabled: true
  semantic_memory_top_k: 3
  finance_provider: "yahoo"
  news_provider: "google_news_rss"

Provider API keys are referenced by environment-variable name (e.g. OPENAI_API_KEY), never inline. See docs/overrides.md.
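
Referencing a key by variable name means the config file holds only the name and the secret is read from the environment at call time. A minimal sketch, assuming a hypothetical `api_key_env` config field and a hypothetical `resolve_api_key` helper (neither is confirmed by this README):

```python
import os


def resolve_api_key(env_var_name: str) -> str:
    """Look up a secret by the environment-variable name stored in config."""
    value = os.environ.get(env_var_name)
    if not value:
        raise RuntimeError(f"environment variable {env_var_name} is not set")
    return value


# Hypothetical config fragment: the file stores the variable's name, never its value.
provider_config = {"api_key_env": "SWITCHBOARD_DEMO_KEY"}

os.environ["SWITCHBOARD_DEMO_KEY"] = "demo-value"  # set here only for the demo
print(resolve_api_key(provider_config["api_key_env"]) == "demo-value")
```

Keeping the value out of the YAML file means the config can be committed or shared without leaking credentials.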

The paper

Switchboard is described in a preprint — "Privacy-Aware Hybrid Routing Across Heterogeneous AI Agents on a Single Workstation." The manuscript, the multi-run benchmark harness, the statistical-aggregation and figure scripts, and the per-case records are archived together as a reproduction bundle on Zenodo: 10.5281/zenodo.20789935.

This repository ships only the software. It deliberately does not carry the paper's experiment-running or figure-generation tooling — that lives with the archival record so the code stays focused on the router itself.

Development

make install     # .venv + editable install with dev extras
make check       # ruff + mypy + the full test suite

See CONTRIBUTING.md. Issues and PRs welcome — please preserve the privacy invariant described there.

Citing Switchboard

A preprint is available on Zenodo with a citable DOI — 10.5281/zenodo.20789935. See CITATION.cff for machine-readable metadata.

V. Gupta, "Switchboard: Privacy-Aware Hybrid Routing Across Heterogeneous AI Agents on a Single Workstation," Zenodo, 2026, doi:10.5281/zenodo.20789935.

License

MIT © 2026 Vinay Gupta
