MCP server for accessing Ollama models on a remote GPU server via Model Context Protocol
ollama-almasrv-mcp
MCP (Model Context Protocol) server for accessing Ollama models on a remote GPU server.
Designed for ALMASRV (NVIDIA RTX PRO 6000 96GB VRAM) but works with any Ollama instance behind a compatible HTTP gateway.
Features
- 6 MCP tools: chat, think, embed, similarity, models, health
- 10 models: 4 local (llama3.3:70b, qwen3:32b, llama3.2:3b, mxbai-embed-large) + 6 cloud
- 1024-dim embeddings compatible with SQL Server 2025 VECTOR(1024)
- Thinking models with reasoning traces (qwen3:32b, kimi-k2-thinking, glm-5, kimi-k2.5, minimax-m2.5)
- Configurable endpoints via environment variables
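To illustrate the VECTOR(1024) compatibility mentioned above, here is a minimal sketch of preparing an embedding for storage. It assumes the embedding arrives as a plain list of numbers and that the database accepts a JSON array string for a vector column; the helper name `to_vector_literal` is hypothetical and not part of this package.

```python
import json

EMBED_DIM = 1024  # dimension stated in the feature list


def to_vector_literal(embedding, dim=EMBED_DIM):
    """Serialize an embedding as a JSON array string, checking its length first."""
    if len(embedding) != dim:
        raise ValueError(f"expected {dim} values, got {len(embedding)}")
    # Coerce to float so integers and numpy scalars serialize uniformly.
    return json.dumps([float(x) for x in embedding])


# Example: a zero vector of the expected size (placeholder data, not a real embedding).
literal = to_vector_literal([0.0] * EMBED_DIM)
```

Checking the length before writing catches a mismatched embedding model early, rather than at insert time.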
Quick Install
pip install ollama-almasrv-mcp
Setup with Claude Code
# Add to Claude Code (user-level, available everywhere)
claude mcp add --scope user ollama-almasrv -- ollama-almasrv-mcp
# Or with custom server URLs
claude mcp add --scope user \
-e ALMASRV_GATEWAY_URL=http://your-server:8030 \
-e ALMASRV_EMBED_URL=http://your-server:8031 \
ollama-almasrv -- ollama-almasrv-mcp
Available Tools
| Tool | Description |
|---|---|
| ollama_chat | Chat with any Ollama model (default: qwen3:32b) |
| ollama_think | Chat with thinking models that return reasoning traces |
| ollama_embed | Generate 1024-dim embedding vectors |
| ollama_similarity | Calculate cosine similarity between two texts |
| ollama_models | List all available models |
| ollama_health | Check gateway and embedding service health |
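For reference, cosine similarity between two embedding vectors can be sketched as follows. This is the standard formula written in plain Python, not the server's actual implementation; the function name `cosine_similarity` is chosen here for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Orthogonal vectors score 0, and vectors pointing in the same direction score 1 regardless of their magnitude.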
Configuration
| Environment Variable | Default | Description |
|---|---|---|
| ALMASRV_GATEWAY_URL | http://192.168.50.78:8030 | Ollama gateway (chat/think/models) |
| ALMASRV_EMBED_URL | http://192.168.50.78:8031 | Embedding service (embed/similarity) |
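A client or wrapper script could resolve these settings along the following lines. This is a sketch only: the variable names and defaults come from the table above, while the function `load_endpoints` and the trailing-slash handling are assumptions, not the package's actual code.

```python
import os

# Defaults as listed in the configuration table.
DEFAULT_GATEWAY_URL = "http://192.168.50.78:8030"
DEFAULT_EMBED_URL = "http://192.168.50.78:8031"


def load_endpoints(env=None):
    """Read both service URLs from the environment, falling back to the defaults."""
    env = os.environ if env is None else env
    return {
        # Strip a trailing slash so paths can be appended consistently.
        "gateway": env.get("ALMASRV_GATEWAY_URL", DEFAULT_GATEWAY_URL).rstrip("/"),
        "embed": env.get("ALMASRV_EMBED_URL", DEFAULT_EMBED_URL).rstrip("/"),
    }
```

Passing a dictionary instead of relying on `os.environ` makes the lookup easy to exercise in tests.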
Requirements
- Python >= 3.10
- A running Ollama instance with a compatible HTTP gateway
- Gateway endpoints: /chat, /models, /health
- Embedding endpoints: /embeddings, /similarity, /health
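Before registering the server, it can help to confirm that both services respond. The sketch below probes the /health path listed above using only the standard library. It assumes that an HTTP 200 response means the service is healthy, which the gateway may define differently; `health_url` and `is_healthy` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def health_url(base_url):
    """Build the health-check URL for a service base URL."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url, timeout=5.0):
    """Return True if the service answers /health with HTTP 200 (assumed meaning)."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, refused connection, timeout, HTTP error, or malformed URL.
        return False
```

For example, `is_healthy("http://your-server:8030")` would check the gateway and `is_healthy("http://your-server:8031")` the embedding service.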
License
MIT