MCP server bridging Claude to local MLX LM (and any OpenAI-compatible backend)
Project description
mlx-mcp-server
A Model Context Protocol (MCP) server that gives Claude Code and Claude Desktop a set of tools to talk to a locally-running LLM. Optimised for oMLX and MLX LM on Apple Silicon, with support for any OpenAI-compatible backend (Ollama, LM Studio, etc.).
The idea: Claude stays Claude. Your local model becomes a tool Claude can call — fast, private, free, and clearly labelled 🏠 LOCAL in every response.
How it works
You (in Claude Code or Claude Desktop)
│
▼
Claude (Sonnet / your tier) ← still the primary AI
│
│ calls MCP tools when useful
▼
mlx-mcp-server (subprocess) ← this repo
│
│ HTTP POST /v1/chat/completions
▼
Your local LLM backend ← oMLX · MLX LM · Ollama · LM Studio
│
▼
Response with 🏠 LOCAL badge ← so you always know which model answered
Claude Code spawns mlx-mcp-server as a background subprocess at startup. The server sits idle until you — or Claude — explicitly invoke one of its tools. Nothing is routed automatically; you're always talking to real Claude unless a tool is called.
Quick install
macOS with Homebrew Python — use
uv(pip is blocked by PEP 668):uv tool install mlx-mcp-serverOther environments:
pip install mlx-mcp-server
oMLX (recommended on Apple Silicon)
# Add to Claude Code
mlx-mcp-server install --claude-code \
--base-url http://localhost:8000 \
--api-key YOUR_OMLX_KEY \
--model "Qwen3.6-35B-A3B-4bit"
# Add to Claude Desktop
mlx-mcp-server install \
--base-url http://localhost:8000 \
--api-key YOUR_OMLX_KEY \
--model "Qwen3.6-35B-A3B-4bit"
MLX LM
# Start the server first
mlx_lm.server --model mlx-community/Mistral-7B-Instruct-v0.3-4bit
# Then install (no API key needed, model auto-detected)
mlx-mcp-server install --claude-code --base-url http://localhost:8080
Ollama
ollama serve && ollama pull mistral
mlx-mcp-server install --claude-code \
--base-url http://localhost:11434 \
--model mistral
Restart Claude Code / Claude Desktop after installing.
Tools
These are the MCP tools Claude can call. You can invoke them directly by name in conversation, or ask Claude to use the local model for a specific task.
chat
Send a message to your local LLM and get a response.
# In Claude Code — just say it:
"Use the local model to write a SQL migration for adding a users table"
"Ask the local model to summarise this error log"
"Use local: write boilerplate for a new Go HTTP handler"
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
message |
string | required | The prompt to send |
system_prompt |
string | "" |
Optional system prompt (overrides default) |
temperature |
float | 0.7 |
Sampling temperature |
max_tokens |
int | 512 |
Max response tokens |
top_p |
float | 1.0 |
Nucleus sampling |
top_k |
int | 0 |
Top-k sampling (0 = disabled) |
Response format:
🏠 LOCAL · Qwen3.6-35B-A3B-4bit
[model response here]
---
Tokens: 12 prompt + 48 completion = 60 total | 1.24s
quick_test
Run a predefined diagnostic prompt to benchmark your model and verify it's working.
quick_test math # 347 × 28 — tests reasoning + speed
quick_test hello # intro prompt — tests personality / identity
quick_test creative # two-sentence robot story — tests creativity
quick_test code_review # Python snippet review — tests code understanding
Response format:
Test: math
Prompt: What is 347 × 28? Show your working.
Response:
347 × 28 = 9,716 [working shown]
---
🏠 LOCAL · Qwen3.6-35B-A3B-4bit · 54.7 tok/s · 312 tokens · 5.71s
health_check
Verify your LLM backend is reachable and report what's loaded.
health_check
Response (oMLX):
{
"status": "ok",
"url": "http://localhost:8000",
"models_loaded": "1/2"
}
Response (unreachable):
{
"status": "unreachable",
"url": "http://localhost:8000",
"hint": "Make sure your LLM backend is running at http://localhost:8000."
}
list_models
List the models available on your backend.
list models
Response:
- Qwen3.5-27B-4bit
- Qwen3.6-35B-A3B-4bit
Configuration
Set via environment variables, or use the install command to write them automatically.
| Variable | Default | Description |
|---|---|---|
MLX_BASE_URL |
http://localhost:8080 |
Backend URL |
MLX_DEFAULT_MODEL |
"" |
Model name. If empty, auto-detected from /v1/models on first call |
MLX_API_KEY |
"" |
API key for secured backends (e.g. oMLX) |
MLX_TIMEOUT |
30 |
Request timeout in seconds |
Auto-detection
When MLX_DEFAULT_MODEL is not set, the server queries /v1/models on the first chat call and uses whatever model the backend reports. The result is cached for the session. This works well for single-model backends (MLX LM, Ollama). For oMLX with multiple configured models, set MLX_DEFAULT_MODEL explicitly since oMLX lists all configured models, not just the loaded one.
Install command reference
mlx-mcp-server install [options]
| Flag | Description |
|---|---|
--claude-code |
Target Claude Code (~/.claude/settings.json) instead of Claude Desktop |
--base-url URL |
Backend URL (default: http://localhost:8080) |
--model NAME |
Model name — optional, auto-detected if omitted |
--api-key KEY |
API key for secured backends |
--dry-run |
Print the config that would be written without touching any files |
Preview before writing:
mlx-mcp-server install --claude-code \
--base-url http://localhost:8000 \
--api-key mykey \
--model "Qwen3.6-35B-A3B-4bit" \
--dry-run
Update model without changing other settings (just re-run with new --model):
mlx-mcp-server install --claude-code \
--base-url http://localhost:8000 \
--api-key mykey \
--model "Qwen3.5-27B-4bit"
Manual config
If you prefer to edit the config file directly:
Claude Desktop — ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
Claude Code — ~/.claude/settings.json
{
"mcpServers": {
"mlx": {
"command": "mlx-mcp-server",
"env": {
"MLX_BASE_URL": "http://localhost:8000",
"MLX_DEFAULT_MODEL": "Qwen3.6-35B-A3B-4bit",
"MLX_API_KEY": "your-key-here"
}
}
}
}
Supported backends
| Backend | Platform | Default port | Notes |
|---|---|---|---|
| oMLX | macOS (Apple Silicon) | 8000 | Requires API key + explicit model name |
| MLX LM | macOS (Apple Silicon) | 8080 | No auth needed, model auto-detected |
| Ollama | macOS / Linux / Windows | 11434 | Set MLX_DEFAULT_MODEL to model name |
| LM Studio | macOS / Windows | 1234 | Enable "Local Server" in LM Studio |
oMLX-specific notes
oMLX is a native macOS GUI for running MLX models on Apple Silicon. A few quirks to know:
- Port: listens on
127.0.0.1:8000(not 8080) - API key required: set one in oMLX settings and pass it via
--api-key - Model field required: oMLX returns 422 if
modelis omitted from requests — always setMLX_DEFAULT_MODEL /healthendpoint: unauthenticated, returns engine pool info —health_checkuses this first- MoE models:
Qwen3.6-35B-A3B-4bitactivates only ~3B params per token — 5–6× faster than dense 27B models at the same quality level
Requirements
- Python 3.11+
- A running OpenAI-compatible LLM backend
Development
git clone https://github.com/deresolution20/mlx-mcp-server
cd mlx-mcp-server
# Install with dev dependencies
uv sync --dev
# Run tests
uv run pytest tests/ -v
# Install locally for testing
uv tool uninstall mlx-mcp-server 2>/dev/null
uv tool install . --no-cache
License
MIT
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file mlx_mcp_server-0.2.0.tar.gz.
File metadata
- Download URL: mlx_mcp_server-0.2.0.tar.gz
- Upload date:
- Size: 91.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
18953e27646855bd5ae5909127f5b69f0f8e7fbbc06f0ae8cc204d8789aab3ca
|
|
| MD5 |
d1b9b6298a8ad700cbeef6db47db0544
|
|
| BLAKE2b-256 |
de03aa1738d91ebca4eb0a9bf0d81442c017625f7f703ce76b9cfc3e1a1a5ea7
|
Provenance
The following attestation bundles were made for mlx_mcp_server-0.2.0.tar.gz:
Publisher:
publish.yml on deresolution20/mlx-mcp-server
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
mlx_mcp_server-0.2.0.tar.gz -
Subject digest:
18953e27646855bd5ae5909127f5b69f0f8e7fbbc06f0ae8cc204d8789aab3ca - Sigstore transparency entry: 1767198209
- Sigstore integration time:
-
Permalink:
deresolution20/mlx-mcp-server@9e08db7f3653eb1db5272ca24f8f18a8fbdfad38 -
Branch / Tag:
refs/tags/v0.2.0 - Owner: https://github.com/deresolution20
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@9e08db7f3653eb1db5272ca24f8f18a8fbdfad38 -
Trigger Event:
push
-
Statement type:
File details
Details for the file mlx_mcp_server-0.2.0-py3-none-any.whl.
File metadata
- Download URL: mlx_mcp_server-0.2.0-py3-none-any.whl
- Upload date:
- Size: 20.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fcf0f1af8fc6f26299b41ad166accbbf7d3791a4652f1266da19bea567632c95
|
|
| MD5 |
821b8df79ff56da10025e195a3b5d15b
|
|
| BLAKE2b-256 |
91b8b5c64d05ba1564607d21d9fb22358bcb463ca3e7e1986c89cc07b09afd0d
|
Provenance
The following attestation bundles were made for mlx_mcp_server-0.2.0-py3-none-any.whl:
Publisher:
publish.yml on deresolution20/mlx-mcp-server
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
mlx_mcp_server-0.2.0-py3-none-any.whl -
Subject digest:
fcf0f1af8fc6f26299b41ad166accbbf7d3791a4652f1266da19bea567632c95 - Sigstore transparency entry: 1767198627
- Sigstore integration time:
-
Permalink:
deresolution20/mlx-mcp-server@9e08db7f3653eb1db5272ca24f8f18a8fbdfad38 -
Branch / Tag:
refs/tags/v0.2.0 - Owner: https://github.com/deresolution20
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@9e08db7f3653eb1db5272ca24f8f18a8fbdfad38 -
Trigger Event:
push
-
Statement type: