

ollama-handoff

An MCP server that offloads cheap work from your cloud LLM agent to a local Ollama model.


Your frontier model (Claude, GPT, etc.) is brilliant and metered. A lot of the work it gets handed — summarizing a log, drafting a commit message, pulling every URL out of a file, a quick first-pass code review — doesn't need frontier reasoning at all. ollama-handoff exposes your local Ollama instance as a handful of purpose-built MCP tools, so your agent can route that work to a model on your own GPU — at zero cloud cost — and spend its (paid) reasoning budget on the things that actually need it.

This isn't a generic "wrap the Ollama API" server. Each tool ships with a baked-in system prompt and a description written for the calling agent, so the agent knows when to hand off and gets a tuned result back without re-stating instructions every call.


Why you'd want this

  • 💸 Spend less. Routine offloads run locally and bill nothing.
  • Keep the big model focused. Summaries, extractions, and drafts don't eat its context or your budget.
  • 🧠 Tuned, not raw. summarize_local, code_review_local, draft_commit_message_local, and extract_local come with summarizer, reviewer, commit-writer, and extractor system prompts already dialed in.
  • 🔌 Drop-in. One MCP registration; works with Claude Code, Claude Desktop, Cursor, and any MCP client.
  • 🪶 Tiny & auditable. Two dependencies (mcp, httpx), fully typed, unit-tested, no telemetry.

Requirements

  • Ollama running locally (ollama serve) with at least one model pulled, e.g. ollama pull qwen2.5-coder:14b.
  • Python 3.11+ (or just uvx, which manages it for you).
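Before registering the server, it can help to confirm that Ollama is reachable and the default model is pulled. A minimal standard-library sketch, assuming Ollama's GET /api/tags endpoint, which lists local models under a models key, each with a name field. The helper names are hypothetical.

```python
import json
import urllib.request


def local_model_names(tags_response: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]


def has_model(tags_response: dict, wanted: str) -> bool:
    """True if `wanted` is pulled; a bare name like 'llama3' also matches 'llama3:latest'."""
    names = local_model_names(tags_response)
    return wanted in names or f"{wanted}:latest" in names


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """GET /api/tags from a running Ollama server (raises OSError if it is unreachable)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def check(model: str = "qwen2.5-coder:14b", base_url: str = "http://localhost:11434") -> bool:
    """Report whether Ollama answers and `model` is available locally."""
    try:
        tags = fetch_tags(base_url)
    except OSError as exc:  # URLError and timeouts are OSError subclasses
        print(f"Ollama not reachable at {base_url}: {exc}")
        return False
    found = has_model(tags, model)
    print(f"{model} pulled: {found}")
    return found
```

Calling check() with Ollama running prints whether the model is available; if it is missing, ollama pull fixes it.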

Install

The fastest path is uv — no manual venv needed. Run straight from the repo:

uvx --from git+https://github.com/Michael-WhiteCapData/ollama-handoff ollama-handoff

📦 Version 0.1.0 is also published on PyPI, so uvx ollama-handoff and pip install ollama-handoff should work directly.

Claude Code

claude mcp add ollama-handoff -- uvx --from git+https://github.com/Michael-WhiteCapData/ollama-handoff ollama-handoff

Claude Desktop / Cursor (MCP config block)

{
  "mcpServers": {
    "ollama-handoff": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/Michael-WhiteCapData/ollama-handoff",
        "ollama-handoff"
      ],
      "env": {
        "OLLAMA_DEFAULT_MODEL": "qwen2.5-coder:14b"
      }
    }
  }
}

Tools

| Tool | What it does | When the agent should reach for it |
|---|---|---|
| ask_local | One-shot prompt to the local model | Any handoff that doesn't need frontier reasoning |
| chat_local | Multi-turn local chat | Handoffs needing more than one turn of context |
| summarize_local | Structured summary (headline + bullets) | Long files, logs, transcripts, docs |
| code_review_local | Quick first-pass review of a diff/code | Cheap pre-filter before a deep review |
| draft_commit_message_local | Conventional commit message from a diff | Routine commits |
| extract_local | Pull structured items from unstructured text | URLs, function names, error codes, TODOs |
| list_models | List locally available Ollama models | Discovery / choosing a model |
| server_info | Report the effective configuration | Debugging setup |
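Local models occasionally drift from a requested format, so a caller may want to sanity-check a drafted commit message before using it. A minimal sketch of a header check against the Conventional Commits shape type(scope)!: description. The type list and the is_conventional_header helper are assumptions for illustration; ollama-handoff is not documented to perform this check.

```python
import re

# The spec defines feat and fix; the other types follow the widely used
# Angular convention. This list is an assumption, not part of the server.
TYPES = ("feat", "fix", "docs", "style", "refactor", "perf", "test",
         "build", "ci", "chore", "revert")

HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"  # commit type
    r"(?:\((?P<scope>[^()\s]+)\))?"         # optional (scope)
    r"(?P<breaking>!)?"                      # optional breaking-change marker
    r": (?P<description>\S.*)$"              # colon, space, non-empty description
)


def is_conventional_header(message: str) -> bool:
    """Check only the first line (the header) of a commit message."""
    lines = message.strip().splitlines()
    return bool(lines) and HEADER_RE.match(lines[0]) is not None
```

If the check fails, the agent can retry the handoff or fix the header itself; either is cheaper than drafting from scratch.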

Configuration

All configuration is via environment variables set in your MCP registration:

| Variable | Default | Description |
|---|---|---|
| OLLAMA_URL | http://localhost:11434 | Base URL of the Ollama server |
| OLLAMA_DEFAULT_MODEL | qwen2.5-coder:14b | Default model for handoffs |
| OLLAMA_NUM_CTX | 32768 | Context window, in tokens |
| OLLAMA_KEEP_ALIVE | 30m | How long to keep the model resident in VRAM |
| OLLAMA_TIMEOUT_S | 600 | Per-request timeout, in seconds |
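The table maps naturally onto a small typed config object. A minimal sketch of reading these variables with the documented defaults; the HandoffConfig class, load_config function, trailing-slash trimming, and float timeout are illustrative choices, not the server's actual code.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class HandoffConfig:
    url: str
    default_model: str
    num_ctx: int
    keep_alive: str  # passed to Ollama as-is, e.g. "30m"
    timeout_s: float


def load_config(env: Mapping[str, str] = os.environ) -> HandoffConfig:
    """Read the documented variables, falling back to the documented defaults."""
    return HandoffConfig(
        url=env.get("OLLAMA_URL", "http://localhost:11434").rstrip("/"),
        default_model=env.get("OLLAMA_DEFAULT_MODEL", "qwen2.5-coder:14b"),
        num_ctx=int(env.get("OLLAMA_NUM_CTX", "32768")),
        keep_alive=env.get("OLLAMA_KEEP_ALIVE", "30m"),
        timeout_s=float(env.get("OLLAMA_TIMEOUT_S", "600")),
    )
```

Accepting the environment as a parameter keeps the loader testable without touching os.environ; server_info is the tool to confirm what the running server actually resolved.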

Example

Once registered, you don't call the tools yourself — your agent does. A typical exchange:

You: Summarize the errors in build.log and draft a commit for the staged fix.

Agent: (calls summarize_local(build.log, focus="errors and stack traces") and draft_commit_message_local(git diff --staged) — both run on your GPU, nothing billed) → returns the summary + commit message.
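Under the hood, an MCP client invokes a tool with a JSON-RPC 2.0 tools/call request carrying the tool name and an arguments object. A sketch of what the agent's summarize call might look like on the wire. The argument names text and focus are assumptions (only focus appears in the example above); each tool's real input schema is reported by tools/list.

```python
import json
from itertools import count

_ids = count(1)  # each request in a session needs a fresh id


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical argument names; the server's input schema is authoritative.
request = tools_call(
    "summarize_local",
    {"text": "...contents of build.log...", "focus": "errors and stack traces"},
)
print(request)
```

Your MCP client builds these messages for you; the sketch only shows what crosses the boundary between agent and server.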

Development

git clone https://github.com/Michael-WhiteCapData/ollama-handoff
cd ollama-handoff
uv venv                       # uv pip needs a virtual environment to install into
source .venv/bin/activate
uv pip install -e ".[dev]"
ruff check .
pytest          # tests use httpx.MockTransport; no running Ollama required

See CONTRIBUTING.md. Contributions welcome — especially new specialized handoff tools.

License

MIT © Michael Tierney



Download files


Source Distribution

ollama_handoff-0.1.0.tar.gz (12.0 kB)

Built Distribution


ollama_handoff-0.1.0-py3-none-any.whl (10.6 kB, Python 3)

File details

Details for the file ollama_handoff-0.1.0.tar.gz.

File metadata

  • Download URL: ollama_handoff-0.1.0.tar.gz
  • Size: 12.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.19

File hashes

Hashes for ollama_handoff-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 05364cf691ccce26f55d5dbf4995c97e698e362dfbe603d119141e7202a0d6c8 |
| MD5 | 7c6a756c25fe4c0a022efd26d1b5b4a3 |
| BLAKE2b-256 | e1c65db3c11ea683aa5ea1ed48b46b8a7f7f9907359090d200a3dd2e08267c3c |


File details

Details for the file ollama_handoff-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ollama_handoff-0.1.0-py3-none-any.whl
  • Size: 10.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.19

File hashes

Hashes for ollama_handoff-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | c4b5f94caabc018d27394221873205399cd58b403563fb52d40a17f7bc8d562f |
| MD5 | 717655d70eca1d3e8241434099954284 |
| BLAKE2b-256 | 00b8643342dce45b6fef791cbaabc80f260e0b41d1d0dfe0f4bf6fff7a44bdb2 |

