
ollama-handoff

An MCP server that offloads cheap work from your cloud LLM agent to a local Ollama model.


Your frontier model (Claude, GPT, etc.) is brilliant and metered. A lot of the work it gets handed — summarizing a log, drafting a commit message, pulling every URL out of a file, a quick first-pass code review — doesn't need frontier reasoning at all. ollama-handoff exposes your local Ollama instance as a handful of purpose-built MCP tools, so your agent can route that work to a model on your own GPU — at zero cloud cost — and spend its (paid) reasoning budget on the things that actually need it.

This isn't a generic "wrap the Ollama API" server. Each tool ships with a baked-in system prompt and a description written for the calling agent, so the agent knows when to hand off and gets a tuned result back without re-stating instructions every call.


Why you'd want this

  • 💸 Spend less. Routine offloads run locally and bill nothing.
  • Keep the big model focused. Summaries, extractions, and drafts don't eat its context or your budget.
  • 🧠 Tuned, not raw. summarize_local, code_review_local, draft_commit_message_local, and extract_local come with reviewer/summarizer/extractor system prompts already dialed in.
  • 🔌 Drop-in. One MCP registration; works with Claude Code, Claude Desktop, Cursor, and any MCP client.
  • 🪶 Tiny & auditable. Two dependencies (mcp, httpx), fully typed, unit-tested, no telemetry.

Requirements

  • Ollama running locally (ollama serve) with at least one model pulled, e.g. ollama pull qwen2.5-coder:14b.
  • Python 3.11+ (or just uvx, which manages it for you).
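
As a quick preflight for the requirements above, the following minimal sketch asks Ollama which models are pulled. It assumes the server listens on the default URL and uses Ollama's `GET /api/tags` endpoint. The helper names `model_names` and `fetch_model_names` are illustrative and not part of ollama-handoff.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama address


def model_names(tags_response: dict) -> list[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [m["name"] for m in tags_response.get("models", [])]


def fetch_model_names(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Ask a running Ollama server which models are available locally.

    Raises OSError (for example urllib.error.URLError) if the server is not reachable.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Calling `fetch_model_names()` with `ollama serve` running should return a list that includes the model you pulled; an empty list suggests that no model has been pulled yet.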

Install

The fastest path is uv — no manual venv needed:

uvx ollama-handoff          # run directly
# or
pip install ollama-handoff  # then run: ollama-handoff

Claude Code

claude mcp add ollama-handoff -- uvx ollama-handoff

Claude Desktop / Cursor (MCP config block)

{
  "mcpServers": {
    "ollama-handoff": {
      "command": "uvx",
      "args": ["ollama-handoff"],
      "env": {
        "OLLAMA_DEFAULT_MODEL": "qwen2.5-coder:14b"
      }
    }
  }
}
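
If your client config already lists other servers, the block above has to be merged in rather than pasted over the file. Here is a small sketch of that merge, assuming the config is a JSON object with an `mcpServers` key as shown. The function name `add_server` and the `other-server` entry are hypothetical.

```python
import json
from copy import deepcopy

# The registration entry from the config block above.
HANDOFF_ENTRY = {
    "command": "uvx",
    "args": ["ollama-handoff"],
    "env": {"OLLAMA_DEFAULT_MODEL": "qwen2.5-coder:14b"},
}


def add_server(config: dict, name: str = "ollama-handoff", entry: dict = HANDOFF_ENTRY) -> dict:
    """Return a copy of config with entry registered under mcpServers[name].

    Existing servers are preserved; the input dict is not modified.
    """
    merged = deepcopy(config)
    merged.setdefault("mcpServers", {})[name] = deepcopy(entry)
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "npx", "args": ["other"]}}}
print(json.dumps(add_server(existing), indent=2))
```

Returning a copy instead of mutating in place makes it easy to diff the result against the original before writing it back to disk.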

Tools

| Tool | What it does | When the agent should reach for it |
| --- | --- | --- |
| ask_local | One-shot prompt to the local model | Any handoff that doesn't need frontier reasoning |
| chat_local | Multi-turn local chat | Handoffs needing more than one turn of context |
| summarize_local | Structured summary (headline + bullets) | Long files, logs, transcripts, docs |
| code_review_local | Quick first-pass review of a diff/code | Cheap pre-filter before a deep review |
| draft_commit_message_local | Conventional commit message from a diff | Routine commits |
| extract_local | Pull structured items from unstructured text | URLs, function names, error codes, TODOs |
| list_models | List locally available Ollama models | Discovery / choosing a model |
| server_info | Report the effective configuration | Debugging setup |

Configuration

All configuration is via environment variables set in your MCP registration:

| Variable | Default | Description |
| --- | --- | --- |
| OLLAMA_URL | http://localhost:11434 | Base URL of the Ollama server |
| OLLAMA_DEFAULT_MODEL | qwen2.5-coder:14b | Default model for handoffs |
| OLLAMA_NUM_CTX | 32768 | Context window in tokens |
| OLLAMA_KEEP_ALIVE | 30m | How long to keep the model resident in VRAM |
| OLLAMA_TIMEOUT_S | 600 | Per-request timeout, seconds |

Example

Once registered, you don't call the tools yourself — your agent does. A typical exchange:

You: Summarize the errors in build.log and draft a commit for the staged fix.

Agent: (calls summarize_local(build.log, focus="errors and stack traces") and draft_commit_message_local(git diff --staged) — both run on your GPU, nothing billed) → returns the summary + commit message.

Development

git clone https://github.com/Michael-WhiteCapData/ollama-handoff
cd ollama-handoff
uv pip install -e ".[dev]"
ruff check .
pytest          # tests use httpx.MockTransport — no running Ollama required

See CONTRIBUTING.md. Contributions welcome — especially new specialized handoff tools.

License

MIT © Michael Tierney
