
Local Ollama-backed MCP server — 45 tools, smart buffer, execution filters, persistent scratchpad, settings GUI


localthink-mcp

Local LLM context compression for Claude Code. Offloads large file queries and document processing to Ollama so they never burn Claude's context window.

v0.1.0 benchmarked at ~30× token savings on 16 KB file queries. v1.1 adds 13 new tools covering every major token-waste pattern. v1.2 adds pre-injection: local_improve_prompt and local_preplan run locally before Claude sees the task — sharpening prompts and scaffolding plans so Claude executes rather than guesses. v2.1 adds smart buffer, execution filters, session scratchpad, persistent notes, response refinement, and a disk-backed result cache — 14 new tools, 45 total.


Quick start

# 1. Pull a model (once)
ollama pull qwen2.5:14b-instruct-q4_K_M

# 2. Register with Claude Code
claude mcp add localthink -- uvx localthink-mcp

# 3. Verify
claude mcp list   # localthink → Connected

Requirements


All 45 tools

v0.1.0 — Core compression

Tool When to use
local_answer(file_path, question) Query a large file without loading it into context
local_summarize(text, focus?) Compress a large text blob already in context
local_extract(text, query) Pull only the cited passages you need from a document

v1.1 — New routes

File operations

Tool What it does
local_shrink_file(file_path, focus?) Read a file → return compressed content (not an answer). Hold the compressed version in context for repeated reference.
local_batch_answer(file_paths, question) Answer one question across many files in a single call. No files enter Claude's context.
local_scan_dir(dir_path, pattern, question?, max_files?) Walk a directory, summarize or query every matching file. Glob pattern support (**/*.ts, config/*.yaml).
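
As a rough illustration of the glob-plus-cap behaviour described for local_scan_dir, the file selection step might look like the sketch below. This is not the package's actual code: the helper name matching_files is hypothetical, and the default cap of 20 is taken from the documented LOCALTHINK_MAX_SCAN_FILES default.

```python
from pathlib import Path


def matching_files(dir_path: str, pattern: str, max_files: int = 20) -> list[Path]:
    """Return up to max_files regular files under dir_path that match a glob pattern."""
    root = Path(dir_path)
    found: list[Path] = []
    # sorted() gives a stable order, so the cap always keeps the same files.
    for path in sorted(root.glob(pattern)):
        if path.is_file():
            found.append(path)
            if len(found) >= max_files:
                break
    return found
```

A pattern such as "**/*.ts" recurses into subdirectories, while "config/*.yaml" only matches files directly inside config/.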

Composition (fewer round-trips)

Tool What it does
local_pipeline(text, steps) Chain summarize → extract → answer in one call. Up to 5 steps. Eliminates back-and-forth for predictable multi-stage workflows.
local_auto(input, question?) Meta-tool: detects file path vs text, picks the right op, handles large docs with auto extract-then-answer. Zero decision overhead.

Stateful document chat

Tool What it does
local_chat(document, message, history?) Multi-turn Q&A. Document is compressed on first call and stays with Ollama. Claude holds only conversation history — the original doc never enters Claude's window.

Semantic & structural

Tool What it does
local_grep_semantic(file_path, meaning, max_results?) Find passages matching a concept, not a literal string. "Find where rate limiting is enforced" works even if the word "rate" isn't there.
local_outline(text) Structural table of contents with line ranges — no content returned. Use before local_extract to find the right section.
local_code_surface(file_path) Public API skeleton. Python: pure AST (no Ollama, instant). Other languages: fast LLM. Typically 5-10% of original size.
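
To make the "pure AST" route for Python concrete, here is a minimal sketch of how a public API skeleton can be built with the standard ast module alone, with no model call. The function name public_surface and the "leading underscore means private" rule are assumptions for illustration. The real local_code_surface output format may differ.

```python
import ast


def public_surface(source: str) -> list[str]:
    """List signatures of top-level public functions, classes, and public methods."""
    lines: list[str] = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not node.name.startswith("_"):
                # ast.unparse turns the arguments node back into source text.
                lines.append(f"def {node.name}({ast.unparse(node.args)})")
        elif isinstance(node, ast.ClassDef) and not node.name.startswith("_"):
            lines.append(f"class {node.name}")
            for item in node.body:
                is_func = isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                if is_func and not item.name.startswith("_"):
                    lines.append(f"    def {item.name}({ast.unparse(item.args)})")
    return lines
```

Because only names and signatures are kept, the result is usually a small fraction of the source, which is in line with the 5-10% figure quoted above.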

Analysis / meta

Tool What it does
local_classify(text) Classify content type + recommend the best tool. Returns JSON. Use for programmatic routing in hooks/scripts.
local_audit(file_path, checklist) Checklist-based file audit: PASS / FAIL / PARTIAL / N/A per item. File never enters Claude's context.
local_models() List local Ollama models and show current DEFAULT / FAST model config.

v1.2 — Pre-injection (run before Claude thinks)

These tools run a local model pass before Claude engages with a task. Claude never sees the raw input — only the pre-processed output. Eliminates waste at the source rather than compressing after the fact.

Tool What it does
local_improve_prompt(prompt, context?) Rewrite a vague or rough prompt into a clear, specific, unambiguous version. Claude receives only the sharpened result. Uses the fast model — minimal overhead.
local_preplan(task, context?, depth?) Generate a structured implementation plan (goal / assumptions / ordered steps / risks / open questions) via local model. Claude executes the scaffold rather than planning from scratch. depth: "quick" (3-5 steps), "standard" (default), "detailed" (sub-bullets + rationale).

local_improve_prompt example:

"make the auth faster"
→ local_improve_prompt(prompt, context="Next.js, JWT, DB bottleneck suspected")
→ "Optimise JWT validation latency in src/auth/middleware.ts — profile the verify()
   hot path, remove redundant DB round-trips, target p95 < 5 ms."
→ Feed that to Claude as the actual task

local_preplan example:

plan = local_preplan(
  task="add rate limiting to the API",
  context="Express.js, Redis available, routes in src/routes/",
  depth="standard"
)
# Returns: Goal / Assumptions / Steps with file paths / Risks / Open questions
# Then: "Execute this plan: <plan>"

v1.1 expansion — high-context compression + smart reading

High-context compression

Tool What it does
local_compress_log(file_path, level?, since?) Compress a log file to its essential signal. Groups repeated errors with counts, extracts key events, surfaces anomalies. Optional level (ERROR/WARN) and timestamp-prefix filters. Turns 5 MB logs into ~500-token summaries.
local_compress_stack_trace(text) Distil a stack trace (+ source context) to: root cause, failure point, 3-5 key frames, fix hint. Eliminates framework boilerplate that inflates traces to thousands of tokens.
local_compress_data(data, keep_fields?, question?) Compress JSON objects, CSV exports, and API responses. Strips nulls, samples large arrays, keeps IDs/status codes. REST responses commonly shrink 20:1.
local_session_compress(file_path) Recursive meta-tool. Compress a saved Claude conversation transcript to a re-entry briefing: context, decisions, current state, open items, constraints. The transcript never enters Claude's context.
local_prompt_compress(text) Compress a long CLAUDE.md or system prompt to its minimal directive set. Preserves every unique rule; removes duplicates and verbose prose.
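
The "strips nulls, samples large arrays" step described for local_compress_data can be approximated without any model at all. The sketch below is an assumption about how such a pre-pass could work, not the tool's implementation. The name compact and the max_items parameter are hypothetical.

```python
def compact(value, max_items: int = 3):
    """Recursively drop dict keys whose value is None and truncate long lists."""
    if isinstance(value, dict):
        return {k: compact(v, max_items) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        kept = [compact(v, max_items) for v in value[:max_items]]
        omitted = len(value) - len(kept)
        if omitted > 0:
            # Leave a marker so the reader knows the array was sampled.
            kept.append(f"... {omitted} more items omitted")
        return kept
    return value


payload = {"id": 1, "note": None, "rows": [1, 2, 3, 4, 5]}
print(compact(payload, max_items=2))
```

A structural pass like this is cheap and deterministic. A local model would then only be needed for the parts that require judgement, such as answering the optional question.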

Smart reading (avoid loading files at all)

Tool What it does
local_symbols(file_path) Full symbol table: every definition with type, line number, and one-line description. Replaces "read file to see what's in it."
local_find_impl(file_path, spec) Natural-language code search inside a file. Returns the complete matching logical unit with line numbers. E.g. spec="where JWT token is verified".
local_strip_to_skeleton(file_path) All function bodies → ..., everything else preserved (docstrings, decorators, type annotations, comments). Typically 30-50% of original.
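
For Python sources, the "function bodies → ..." idea behind local_strip_to_skeleton can be sketched with the standard ast module. This is an illustrative approximation only. The names BodyStripper and skeleton are hypothetical, and unlike the tool as described, this version loses comments, because the Python parser discards them.

```python
import ast


class BodyStripper(ast.NodeTransformer):
    """Replace every function body with `...`, keeping the docstring if present."""

    def visit_FunctionDef(self, node):
        new_body = []
        if ast.get_docstring(node) is not None:
            new_body.append(node.body[0])  # the docstring expression itself
        new_body.append(ast.Expr(value=ast.Constant(value=...)))
        node.body = new_body
        return node

    # Async functions get the same treatment.
    visit_AsyncFunctionDef = visit_FunctionDef


def skeleton(source: str) -> str:
    """Return source with all function bodies stripped (requires Python 3.9+)."""
    tree = BodyStripper().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

Decorators, type annotations, class structure, and module-level statements survive the round trip, so the output is still valid Python.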

Format transformation

Tool What it does
local_translate(text, target_format) Convert formats without loading source into context: json↔yaml↔toml, csv→markdown_table, code→pseudocode, sql→english, env→json.
local_schema_infer(data) Sample data → compact JSON Schema (draft-07). API samples are often 100:1 data-to-schema ratio.
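
To show why sample data collapses so well into a schema, here is a minimal structural inference sketch. It is not how local_schema_infer is implemented (that tool uses a local model). The function name infer_schema is hypothetical, and it assumes arrays are homogeneous and every observed key is required.

```python
def infer_schema(value):
    """Infer a minimal JSON Schema fragment from one sample value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        # Assumption: arrays are homogeneous, so the first element describes all items.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
            "required": sorted(value),
        }
    raise TypeError(f"not a JSON value: {type(value).__name__}")
```

An array of a thousand similar records produces the same schema as an array of one, which is where the large data-to-schema ratios come from.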

Temporal & multi-file diff

Tool What it does
local_timeline(text) Chronological event sequence from logs, changelogs, git log, or incident reports. Deduplicates repeated events.
local_diff_files(path_a, path_b, focus?) Diff two files by path — neither file loaded into context. Counterpart to local_diff which takes in-context text.

v2.1 — Smart buffer, execution filters, scratchpad, notes, cache

Smart Buffer (raw output triage)

Tool What it does
local_gate(raw_output) Triage any raw output (test results, build logs, lint dumps) into Pattern + Anomalies + Signal. Always fits in budget. Use before injecting any raw tool output into context.
local_slice(file_path, offset_lines) Read a window of lines from a file at an offset. On-demand raw access when local_gate identifies a region worth inspecting.
local_diff_semantic(before, after) Meaning-level diff — noise (whitespace, formatting, minor rewording) suppressed. Only semantic changes surface.
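
The windowed read that local_slice performs can be pictured with the sketch below. The helper name read_window and the window parameter are assumptions. The documented signature only names file_path and offset_lines, so the real window size and indexing convention may differ.

```python
from itertools import islice


def read_window(file_path: str, offset_lines: int, window: int = 40) -> str:
    """Return `window` lines starting at the zero-based line `offset_lines`."""
    with open(file_path, encoding="utf-8", errors="replace") as handle:
        # islice streams the file, so lines outside the window are never stored.
        return "".join(islice(handle, offset_lines, offset_lines + window))
```

Streaming matters here because the files worth slicing are typically the multi-megabyte logs that local_gate has already triaged.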

Execution Filters (project tools → local LLM)

Tool What it does
local_run_tests() Run the project test suite. Returns only {failed, delta, pointer}. Nothing else enters context.
local_run_lint() Run the linter. Violations grouped by rule; passing rules suppressed.
local_run_build() Run the build. Returns root cause + affected symbols only.

Session Scratchpad (stateful decisions)

Tool What it does
local_memo_write(section, content) Write to a named scratchpad section: decisions, assumptions, pitfalls, open_questions. Auto-compacts beyond threshold.
local_memo_read() Read the full scratchpad as a distilled summary. Restore context mid-session without re-reading files.
local_memo_checkpoint() Freeze scratchpad into a RESUME_PROMPT string. Paste after /clear to continue with full context.

Persistent Notes (cross-session knowledge)

Tool What it does
local_note_write(category, content) Write a permanent note to disk (architecture, gotcha, pattern). Survives /clear and new sessions.
local_note_search(query) Full-text search across all persisted notes. Run at session start to surface relevant prior knowledge.

Response Quality & Cache

Tool What it does
local_refine(prompt, draft, instructions?) Post-process an LLM draft through a refinement pass. Optional instructions target tone, brevity, or accuracy.
local_cache_stats() Show cache hit/miss counts, entry count, and total disk usage.
local_cache_clear() Evict all cached results.
local_config() Open the settings GUI — configure all 18 settings across Ollama, Timeouts, Limits, Cache, and Memo. Saves to ~/.localthink-mcp/config.json and hot-reloads the running server.

Decision guide

Situation Tool
File > 5 KB, one specific question local_answer
File > 5 KB, need to reference it multiple times local_shrink_file
Text already in context, want to compress it local_summarize
"Find me the part about X" local_extract
Need to outline a doc before extracting local_outline → local_extract
Want to know what's in a code file local_symbols
Want to understand a code file's structure local_code_surface
Want the full file but bodies stripped local_strip_to_skeleton
"Find the function that does X" local_find_impl
Multi-step process on the same document local_pipeline
Unsure which tool to use local_auto
Multiple questions about the same large doc local_chat
Same question across 5+ files local_batch_answer
Understand what's in a directory local_scan_dir
"Find where X is handled" (concept search) local_grep_semantic
Security or quality checklist local_audit
Unsure of content type before processing local_classify
Large log file local_compress_log
Stack trace + source context local_compress_stack_trace
JSON / CSV / API response payload local_compress_data
Session too long, need to restart local_session_compress
CLAUDE.md grown too large local_prompt_compress
Need JSON as YAML (or any format swap) local_translate
Need a schema for sample data local_schema_infer
Need a timeline from a log or changelog local_timeline
Compare two files without loading them local_diff_files
Compare two in-context text blobs local_diff
Prompt is vague — sharpen before sending to Claude local_improve_prompt
Task is large — plan locally before Claude touches it local_preplan
Raw test/build/lint output about to enter context local_gate
local_gate flagged a specific region worth reading local_slice
Two text blobs — want only the meaningful diff local_diff_semantic
Run tests without dumping output into context local_run_tests
Run lint without dumping output into context local_run_lint
Run build without dumping output into context local_run_build
Want to record a decision or assumption mid-session local_memo_write
Resuming work, need to restore session context local_memo_read
About to /clear — want to resume with full context local_memo_checkpoint
Want to save a pattern or gotcha for future sessions local_note_write
Starting a session — check for relevant prior notes local_note_search
LLM draft needs a quality pass local_refine
Check or clear the result cache local_cache_stats / local_cache_clear
Change any setting via GUI local_config

local_pipeline examples

# Extract auth sections, then summarize for security review
local_pipeline(text=big_doc, steps=[
    {"op": "extract",   "query": "authentication and authorization"},
    {"op": "summarize", "focus": "security risks and gotchas"},
])

# Answer a question after narrowing to the relevant section
local_pipeline(text=api_docs, steps=[
    {"op": "extract",  "query": "rate limiting"},
    {"op": "answer",   "question": "what headers control retry behaviour?"},
])
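
Conceptually, a pipeline like the ones above is a fold: each step's output becomes the next step's input. The sketch below shows that control flow under stated assumptions. run_pipeline and the stand-in ops are hypothetical and do not call Ollama, and the step cap mirrors the documented LOCALTHINK_MAX_PIPELINE_STEPS default of 5.

```python
MAX_STEPS = 5  # mirrors the documented LOCALTHINK_MAX_PIPELINE_STEPS default


def run_pipeline(text: str, steps: list[dict], ops: dict) -> str:
    """Feed text through each step in order; each output is the next input."""
    if len(steps) > MAX_STEPS:
        raise ValueError(f"pipeline has {len(steps)} steps; limit is {MAX_STEPS}")
    for step in steps:
        # Everything except "op" is passed to the operation as keyword arguments.
        params = {k: v for k, v in step.items() if k != "op"}
        text = ops[step["op"]](text, **params)
    return text


# Stand-in operations so the sketch runs without a model.
demo_ops = {
    "extract": lambda text, query: "\n".join(
        line for line in text.splitlines() if query in line
    ),
    "summarize": lambda text, focus: text.upper(),
}

print(run_pipeline(
    "auth: jwt\nui: dark mode",
    [{"op": "extract", "query": "auth"}, {"op": "summarize", "focus": "risks"}],
    demo_ops,
))
```

Only the final text is returned, which is why intermediate results never need to pass through Claude's context.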

local_chat example

# Turn 1 — document is compressed automatically
r = local_chat(full_doc, "What does this library do?", "")
# r["doc"]     = compressed version (hold this)
# r["history"] = conversation so far (hold this)
# r["answer"]  = the answer

# Turn 2 — pass compressed doc + history back
r = local_chat(r["doc"], "How do I configure auth?", r["history"])

# Turn 3
r = local_chat(r["doc"], "Show me the relevant config keys", r["history"])

Configuration

The easiest way to configure LocalThink is to call local_config from Claude Code — it opens a GUI that covers every setting below.

Settings are saved to ~/.localthink-mcp/config.json and applied automatically on the next server start.

Ollama

Env var Default Recommended
OLLAMA_BASE_URL http://localhost:11434 Change only if Ollama runs on a remote machine or non-default port
OLLAMA_MODEL qwen2.5:14b-instruct-q4_K_M Match your VRAM tier — see SETUP.md for the full table
OLLAMA_FAST_MODEL (same as MODEL) One tier smaller than the default (e.g. qwen2.5:7b if default is 14b). Used by classify, outline, translate, schema_infer
OLLAMA_TINY_MODEL (same as FAST) qwen2.5:3b or smaller. Used by trivial ops on small inputs
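
The defaults in the table cascade: the fast model falls back to the main model, and the tiny model falls back to the fast one. A minimal sketch of that resolution order follows. The function name resolve_models is hypothetical and the server's actual lookup code may differ.

```python
import os


def resolve_models(env=os.environ) -> dict:
    """Resolve the three model tiers using the documented fallback chain."""
    default = env.get("OLLAMA_MODEL") or "qwen2.5:14b-instruct-q4_K_M"
    fast = env.get("OLLAMA_FAST_MODEL") or default  # falls back to the main model
    tiny = env.get("OLLAMA_TINY_MODEL") or fast     # falls back to the fast model
    return {"default": default, "fast": fast, "tiny": tiny}


print(resolve_models({"OLLAMA_MODEL": "qwen2.5:14b", "OLLAMA_FAST_MODEL": "qwen2.5:7b"}))
```

With only OLLAMA_MODEL set, all three tiers use the same model, so setting the smaller tiers is purely an optimisation.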

Timeouts

Env var Default Recommended
LOCALTHINK_TIMEOUT 360 360 for 14b models · 600 for 32b+ · 120 for 7b on fast GPU
LOCALTHINK_FAST_TIMEOUT 180 60-180; fast model calls should be quick
LOCALTHINK_TINY_TIMEOUT 60 Rarely needs changing
LOCALTHINK_HEALTH_TIMEOUT 2 Leave at 2 — this is just an Ollama ping
LOCALTHINK_CODE_SURFACE_TIMEOUT 600 Increase to 900 for large TS/Go/Rust files on slow hardware

Limits

Env var Default Recommended
LOCALTHINK_MAX_FILE_BYTES 200000 200000 (~200 KB) is right for most codebases · increase to 500000 for monorepos with giant files
LOCALTHINK_MAX_PIPELINE_STEPS 5 Leave at 5 unless you're building complex custom pipelines
LOCALTHINK_MAX_SCAN_FILES 20 Increase to 50-100 for large directory scans; watch memory
LOCALTHINK_CLASSIFY_SAMPLE 8000 8000 chars is enough for most inputs — rarely needs changing
LOCALTHINK_MAX_CONCURRENCY 4 1-2 on low VRAM · 4 default · 6-8 if Ollama handles parallel slots well

Cache

Env var Default Recommended
LOCALTHINK_CACHE_DIR ~/.cache/localthink-mcp Change if the default drive is low on space
LOCALTHINK_CACHE_TTL_DAYS 30 7 if disk space is tight · 90 if you want long-lived results across projects

Memo / Notes

Env var Default Recommended
LOCALTHINK_MEMO_DIR ~/.localthink-mcp Point to a synced folder (Dropbox, OneDrive) to share notes across machines
LOCALTHINK_COMPACT_THRESHOLD 3000 1500 for faster reads · 5000 to preserve more raw content before auto-compact

Example: 3-tier model setup

{
  "mcpServers": {
    "localthink": {
      "env": {
        "OLLAMA_MODEL":      "qwen2.5:14b-instruct-q4_K_M",
        "OLLAMA_FAST_MODEL": "qwen2.5:7b-instruct-q4_K_M",
        "OLLAMA_TINY_MODEL": "qwen2.5:3b"
      }
    }
  }
}

Install options

uvx (recommended — zero setup)

claude mcp add localthink -- uvx localthink-mcp

pip

pip install localthink-mcp
claude mcp add localthink -- localthink-mcp

Windows — if uvx isn't on Claude's PATH

claude mcp add --transport stdio localthink -- cmd /c uvx localthink-mcp

Security

  • Local only — runs as a stdio child process, never exposed to the network.
  • local_answer / local_shrink_file / local_audit read any path your shell can access. Same trust level as Claude's built-in Read tool.
  • Ollama has no auth by default. Don't expose port 11434 to the internet.
  • No data leaves your machine. All inference is local.

Troubleshooting

[localthink] Ollama is not running

ollama serve
curl http://localhost:11434/api/tags
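
The same check can be scripted. The sketch below pings Ollama's /api/tags endpoint with a short timeout, mirroring the curl command above and the 2-second LOCALTHINK_HEALTH_TIMEOUT default. The function name ollama_is_up is hypothetical and this is not the server's own health check.

```python
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if Ollama answers GET /api/tags with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_up())
```

If this prints False while ollama serve is running, check that OLLAMA_BASE_URL points at the right host and port.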

Slow responses

Switch to a smaller model or set a fast model:

OLLAMA_MODEL=qwen2.5:7b-instruct claude

Windows: uvx not found

Install uv, then retry. Alternatively, use the cmd /c uvx fallback shown under Install options.


License

MIT © 2026 H3xabah

