Multi-server MCP client for LLM tool orchestration
Casual MCP
Casual MCP is a Python framework for building, evaluating, and serving LLMs with tool-calling capabilities using Model Context Protocol (MCP). It includes:
- A multi-server MCP client using FastMCP
- Provider support for OpenAI and Ollama (powered by casual-llm)
- A recursive tool-calling chat loop
- Usage statistics tracking (tokens, tool calls, LLM calls)
- System prompt templating with Jinja2
- A basic API exposing a chat endpoint
Features
- Plug-and-play multi-server tool orchestration
- OpenAI and Ollama LLM providers (via casual-llm)
- Usage statistics tracking (tokens, tool calls, LLM calls)
- Prompt templating with Jinja2
- Configurable via JSON
- CLI and API access
- Extensible architecture
Installation
Uv
uv add casual-mcp
Pip
pip install casual-mcp
Or for development:
git clone https://github.com/casualgenius/casual-mcp.git
cd casual-mcp
uv sync --group dev
System Prompt Templates
System prompts are defined as Jinja2 templates in the prompt-templates/ directory.
They are used in the config file to specify a system prompt to use per model.
This allows you to define custom prompts for each model, which is useful when using models that do not natively support tools. Templates are passed the tool list in the tools variable.
# prompt-templates/example_prompt.j2
Here is a list of functions in JSON format that you can invoke:
[
{% for tool in tools %}
{
"name": "{{ tool.name }}",
"description": "{{ tool.description }}",
"parameters": {
{% for param_name, param in tool.inputSchema.items() %}
"{{ param_name }}": {
"description": "{{ param.description }}",
"type": "{{ param.type }}"{% if param.default is defined %},
"default": "{{ param.default }}"{% endif %}
}{% if not loop.last %},{% endif %}
{% endfor %}
}
}{% if not loop.last %},{% endif %}
{% endfor %}
]
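Casual MCP renders these templates for you at chat time, but you can preview the output by rendering the template with Jinja2 directly. A minimal sketch, assuming a single hypothetical tool shaped like the objects the template expects:
# Standalone sketch: render the template yourself to see what the model receives.
# The tool below is a hypothetical stand-in for what the MCP client would provide.
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("prompt-templates"))
template = env.get_template("example_prompt.j2")

tools = [
    {
        "name": "get_time",
        "description": "Get the current time for a city",
        "inputSchema": {
            "city": {"description": "The city name", "type": "string"},
        },
    },
]

print(template.render(tools=tools))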
Configuration File (casual_mcp_config.json)
See the Programmatic Usage section to build configs and messages with typed models.
The CLI and API can be configured using a casual_mcp_config.json file that defines:
- Available models and their providers
- Available MCP tool servers
- Optional tool namespacing behavior
Example
{
"models": {
"gpt-4.1": {
"provider": "openai",
"model": "gpt-4.1"
},
"lm-qwen-3": {
"provider": "openai",
"endpoint": "http://localhost:1234/v1",
"model": "qwen3-8b",
"template": "lm-studio-native-tools"
},
"ollama-qwen": {
"provider": "ollama",
"endpoint": "http://localhost:11434",
"model": "qwen2.5:7b-instruct"
}
},
"servers": {
"time": {
"command": "python",
"args": ["mcp-servers/time/server.py"]
},
"weather": {
"url": "http://localhost:5050/mcp"
}
}
}
models
Each model has:
- provider: "openai" or "ollama"
- model: the model name (e.g., gpt-4.1, qwen2.5:7b-instruct)
- endpoint: optional custom endpoint
  - For OpenAI: custom OpenAI-compatible backends (e.g., LM Studio at http://localhost:1234/v1)
  - For Ollama: defaults to http://localhost:11434 if not specified
- template: optional Jinja2 template name for custom system prompt formatting (useful for models without native tool support)
servers
Servers can either be local (over stdio) or remote.
Local config:
- command: the command to run the server, e.g. python, npm
- args: the arguments to pass to the server as a list, e.g. ["time/server.py"]
- Optional: env for subprocess environment variables, system_prompt to override the server prompt
Remote config:
- url: the URL of the MCP server
- Optional: transport: the transport type, one of http, sse, streamable-http. Defaults to http
Environment Variables
- OPENAI_API_KEY: required when using the openai provider (can be any string when using local OpenAI-compatible APIs)
- TOOL_RESULT_FORMAT: adjusts the format of tool results returned to the LLM
  - Options: result, function_result, function_args_result
  - Default: result
- MCP_TOOL_CACHE_TTL: tool cache TTL in seconds (default: 30, set to 0 for indefinite caching)
- LOG_LEVEL: logging level (default: INFO)
You can set them using export or by creating a .env file.
CLI Reference
casual-mcp serve
Start the API server.
Options:
- --host: host to bind (default 0.0.0.0)
- --port: port to serve on (default 8000)
casual-mcp servers
Loads the config and outputs the list of MCP servers you have configured.
Example Output
$ casual-mcp servers
Name      Type    Command / Url                   Env
math      local   mcp-servers/math/server.py
time      local   mcp-servers/time-v2/server.py
weather   local   mcp-servers/weather/server.py
words     remote  https://localhost:3000/mcp
casual-mcp models
Loads the config and outputs the list of models you have configured.
Example Output
$ casual-mcp models
Name           Provider  Model                     Endpoint
lm-phi-4-mini  openai    phi-4-mini-instruct       http://kovacs:1234/v1
lm-hermes-3    openai    hermes-3-llama-3.2-3b     http://kovacs:1234/v1
lm-groq        openai    llama-3-groq-8b-tool-use  http://kovacs:1234/v1
gpt-4o-mini    openai    gpt-4o-mini
gpt-4.1-nano   openai    gpt-4.1-nano
gpt-4.1-mini   openai    gpt-4.1-mini
gpt-4.1        openai    gpt-4.1
Programmatic Usage
You can import and use the core framework in your own Python code.
Exposed Interfaces
McpToolChat
Orchestrates LLM interaction with tools using a recursive loop.
Accepts any provider that implements the LLMProvider protocol from casual-llm. This means you can use casual-llm's built-in providers (OpenAI, Ollama) or create your own custom provider.
from casual_llm import LLMProvider, SystemMessage, UserMessage
from casual_mcp import McpToolChat
from casual_mcp.tool_cache import ToolCache
# provider can be any object implementing the LLMProvider protocol
# mcp_client is the multi-server FastMCP client (see load_mcp_client below)
tool_cache = ToolCache(mcp_client)
chat = McpToolChat(mcp_client, provider, system_prompt, tool_cache=tool_cache)
# The generate method takes a user prompt
response = await chat.generate("What time is it in London?")
# Generate method with session
response = await chat.generate("What time is it in London?", "my-session-id")
# The chat method takes a list of chat messages
# Note: if the messages include a system message, there is no need to pass a system prompt to the constructor
chat = McpToolChat(mcp_client, provider, tool_cache=tool_cache)
messages = [
SystemMessage(content="You are a cool dude who likes to help the user"),
UserMessage(content="What time is it in London?")
]
response = await chat.chat(messages)
# Get usage statistics from the last call
stats = chat.get_stats()
if stats:
print(f"Tokens used: {stats.tokens.total_tokens}")
print(f"Tool calls: {stats.tool_calls.total}")
print(f"LLM calls: {stats.llm_calls}")
Usage Statistics
After calling chat() or generate(), you can retrieve usage statistics via get_stats():
response = await chat.chat(messages)
stats = chat.get_stats()
# Token usage (accumulated across all LLM calls in the agentic loop)
stats.tokens.prompt_tokens # Input tokens
stats.tokens.completion_tokens # Output tokens
stats.tokens.total_tokens # Total (computed)
# Tool call stats
stats.tool_calls.by_tool # Dict of tool name -> call count, e.g. {"math_add": 2}
stats.tool_calls.by_server # Dict of server name -> call count, e.g. {"math": 2}
stats.tool_calls.total # Total tool calls (computed)
# LLM call count
stats.llm_calls # Number of LLM calls made (1 = no tools, 2+ = tool loop)
Stats are reset at the start of each new chat() or generate() call. Returns None if no calls have been made yet.
ProviderFactory
Instantiates LLM providers (from casual-llm) based on the selected model config.
from casual_mcp import ProviderFactory, load_config

config = load_config("casual_mcp_config.json")
provider_factory = ProviderFactory()
provider = provider_factory.get_provider("lm-qwen-3", config.models["lm-qwen-3"])
The factory returns an LLMProvider from casual-llm that can be used with McpToolChat.
Note: Tool catalogues are cached to avoid repeated ListTools calls. The cache refreshes every 30 seconds by default. Override this with the MCP_TOOL_CACHE_TTL environment variable (set to 0 or a negative value to cache indefinitely).
load_config
Loads your casual_mcp_config.json into a validated config object.
from casual_mcp import load_config
config = load_config("casual_mcp_config.json")
load_mcp_client
Creates a multi-server FastMCP client from the config object.
from casual_mcp import load_mcp_client
mcp_client = load_mcp_client(config)
Model and Server Configs
Exported from casual_mcp.models:
- StdioServerConfig
- RemoteServerConfig
- OpenAIModelConfig
- OllamaModelConfig
- ChatStats
- TokenUsageStats
- ToolCallStats
Use these types to build valid configs:
from casual_mcp.models import OpenAIModelConfig, OllamaModelConfig, StdioServerConfig
openai_model = OpenAIModelConfig(provider="openai", model="gpt-4.1")
ollama_model = OllamaModelConfig(provider="ollama", model="qwen2.5:7b-instruct", endpoint="http://localhost:11434")
server = StdioServerConfig(command="python", args=["time/server.py"])
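The optional server fields described in the Configuration File section (env, transport, system_prompt) can be set the same way. A sketch, assuming the typed configs accept those fields under the documented names; the WEATHER_API_KEY value is only a placeholder:
from casual_mcp.models import RemoteServerConfig, StdioServerConfig

# Local server run over stdio, with extra environment variables for the subprocess
weather_server = StdioServerConfig(
    command="python",
    args=["mcp-servers/weather/server.py"],
    env={"WEATHER_API_KEY": "your-key-here"},
)

# Remote server with an explicit transport (defaults to http when omitted)
words_server = RemoteServerConfig(url="http://localhost:3000/mcp", transport="sse")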
Chat Messages
Exported from casual_llm (re-exported from casual_mcp.models for backwards compatibility):
- AssistantMessage
- SystemMessage
- ToolResultMessage
- UserMessage
- ChatMessage
Use these types to build message chains:
from casual_llm import SystemMessage, UserMessage
messages = [
SystemMessage(content="You are a friendly tool calling assistant."),
UserMessage(content="What is the time?")
]
Example
from casual_llm import SystemMessage, UserMessage
from casual_mcp import McpToolChat, ProviderFactory, load_config, load_mcp_client
model = "gpt-4.1-nano"
messages = [
SystemMessage(content="""You are a tool calling assistant.
You have access to up-to-date information through the tools.
Respond naturally and confidently, as if you already know all the facts."""),
UserMessage(content="Will I need to take my umbrella to London today?")
]
# Load the Config from the File
config = load_config("casual_mcp_config.json")
# Setup the MCP Client
mcp_client = load_mcp_client(config)
# Get the Provider for the Model
provider_factory = ProviderFactory()
provider = provider_factory.get_provider(model, config.models[model])
# Perform the Chat and Tool calling
chat = McpToolChat(mcp_client, provider)
response_messages = await chat.chat(messages)
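The last entry in the returned list is the assistant's final reply (see Response Structure below), so the quickest way to inspect the result is:
# Print the assistant's final answer
print(response_messages[-1].content)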
๐๏ธ Architecture Overview
Casual MCP orchestrates a flow between LLMs and MCP tool servers:
- MCP Client connects to multiple tool servers (local via stdio or remote via HTTP/SSE)
- Tool Cache fetches and caches available tools from all connected servers
- Tool Conversion converts MCP tools to casual-llm's Tool format automatically
- ProviderFactory creates LLM providers from casual-llm based on model config
- McpToolChat orchestrates the recursive loop:
  - Sends messages + tools to the LLM provider
  - The LLM returns a response (potentially with tool calls)
  - Executes tool calls via the MCP client
  - Feeds results back to the LLM
  - Repeats until the LLM provides a final answer
+-------------+      +------------+      +----------------+
| MCP Servers |----->| Tool Cache |----->| Tool Converter |
+-------------+      +------------+      +----------------+
       |                                          |
       v                                          v
   +----------------------------------------------------+
   |                  McpToolChat Loop                   |
   |                                                     |
   |    LLM ---> Tool Calls ---> MCP                     |
   |     ^                        |                      |
   |     +------- Results <-------+                      |
   +----------------------------------------------------+
Tool Conversion
MCP tools are automatically converted from MCP's format to casual-llm's Tool format using the convert_tools module. This happens transparently in McpToolChat.chat() via tools_from_mcp().
Response Structure
The chat() and generate() methods return a list of ChatMessage objects (from casual-llm):
response_messages = await chat.chat(messages)
# Returns: list[ChatMessage]
# Each message can be:
# - AssistantMessage: LLM's response (content + optional tool_calls)
# - ToolResultMessage: Result from tool execution
# Access the final response:
final_answer = response_messages[-1].content
# Check for tool calls in any message:
for msg in response_messages:
if hasattr(msg, 'tool_calls') and msg.tool_calls:
# Message contains tool calls
for tool_call in msg.tool_calls:
print(f"Called: {tool_call.function.name}")
Common Patterns
Using Templates for Models Without Native Tool Support
Some models don't natively support tool calling. Use Jinja2 templates to format tools in the system prompt:
{
"models": {
"custom-model": {
"provider": "ollama",
"model": "some-model:7b",
"template": "custom-tool-format"
}
}
}
Create prompt-templates/custom-tool-format.j2:
You are a helpful assistant with access to these tools:
{% for tool in tools %}
- {{ tool.name }}: {{ tool.description }}
Parameters: {{ tool.inputSchema.properties | tojson }}
{% endfor %}
To use a tool, respond with JSON: {"tool": "tool_name", "args": {...}}
Formatting Tool Results
Control how tool results are presented to the LLM using TOOL_RESULT_FORMAT:
# Just the raw result
export TOOL_RESULT_FORMAT=result
# Function name → result
export TOOL_RESULT_FORMAT=function_result
# Example: "get_weather → Temperature: 72°F"
# Function with args → result
export TOOL_RESULT_FORMAT=function_args_result
# Example: "get_weather(location='London') → Temperature: 15°C"
Session Management
Important: Sessions are for testing/development only. In production, manage sessions in your own application.
Sessions are stored in-memory and cleared on server restart:
# Using sessions for development/testing
response = await chat.generate("What's the weather?", session_id="test-123")
response = await chat.generate("How about tomorrow?", session_id="test-123")
# For production: manage your own message history
messages = []
messages.append(UserMessage(content="What's the weather?"))
response_msgs = await chat.chat(messages)
messages.extend(response_msgs)
# Next turn
messages.append(UserMessage(content="How about tomorrow?"))
response_msgs = await chat.chat(messages)
Troubleshooting
Tool Not Found
If you see errors about tools not being found:
- Check MCP servers are running: casual-mcp servers
- List available tools: casual-mcp tools
- Check the tool cache TTL: tools are cached for 30 seconds by default. Wait or restart if you just added a server.
- Verify the server config: ensure command, args, or url are correct in your config
Provider Initialization Issues
OpenAI Provider:
# Ensure API key is set (even for local APIs)
export OPENAI_API_KEY=your-key-here
# For local OpenAI-compatible APIs (LM Studio, etc):
export OPENAI_API_KEY=dummy-key # Can be any string
Ollama Provider:
# Check Ollama is running
curl http://localhost:11434/api/version
# Ensure model is pulled
ollama pull qwen2.5:7b-instruct
Cache Refresh Behavior
Tools are cached with a 30-second TTL by default. If you add/remove MCP servers:
- Option 1: Wait 30 seconds for automatic refresh
- Option 2: Restart the application
- Option 3: Set MCP_TOOL_CACHE_TTL=0 for indefinite caching (refresh only on restart)
- Option 4: Set a shorter TTL like MCP_TOOL_CACHE_TTL=5 for 5-second refresh
Common Configuration Errors
// Wrong: missing required fields
{
"models": {
"my-model": {
"provider": "openai"
// Missing "model" field!
}
}
}
// Correct
{
"models": {
"my-model": {
"provider": "openai",
"model": "gpt-4.1"
}
}
}
// Wrong: invalid provider
{
"models": {
"my-model": {
"provider": "anthropic", // Not supported!
"model": "claude-3"
}
}
}
// Correct: supported providers
{
"models": {
"openai-model": {
"provider": "openai",
"model": "gpt-4.1"
},
"ollama-model": {
"provider": "ollama",
"model": "qwen2.5:7b"
}
}
}
API Usage
Start the API Server
casual-mcp serve --host 0.0.0.0 --port 8000
Chat
Endpoint: POST /chat
Request Body:
- model: the LLM model to use
- messages: list of chat messages (system, assistant, user, etc.) that you pass to the API, allowing you to keep your own chat session in the client calling the API
- include_stats: (optional, default: false) include usage statistics in the response
Example:
{
"model": "gpt-4.1-nano",
"messages": [
{
"role": "user",
"content": "can you explain what the word consistent means?"
}
],
"include_stats": true
}
Response with stats:
{
"messages": [...],
"response": "Consistent means...",
"stats": {
"tokens": {
"prompt_tokens": 150,
"completion_tokens": 75,
"total_tokens": 225
},
"tool_calls": {
"by_tool": {"words_define": 1},
"by_server": {"words": 1},
"total": 1
},
"llm_calls": 2
}
}
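For reference, the endpoint can also be called from Python. A minimal sketch, assuming the server is running locally on the default port; httpx is used only for illustration and is not a casual-mcp dependency:
import httpx

payload = {
    "model": "gpt-4.1-nano",
    "messages": [
        {"role": "user", "content": "can you explain what the word consistent means?"}
    ],
    "include_stats": True,
}

# POST to the chat endpoint and print the final answer
resp = httpx.post("http://localhost:8000/chat", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["response"])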
Generate
The generate endpoint allows you to send a user prompt as a string.
It also supports sessions, which keep a record of all messages in the session and feed them back to the LLM for context. Sessions are stored in memory, so they are cleared when the server is restarted.
Endpoint: POST /generate
Request Body:
- model: the LLM model to use
- prompt: the user prompt
- session_id: an optional ID that stores all the messages from the session and provides them back to the LLM for context
- include_stats: (optional, default: false) include usage statistics in the response
Example:
{
"session_id": "my-session",
"model": "gpt-4o-mini",
"prompt": "can you explain what the word consistent means?",
"include_stats": true
}
Get Session
Get all the messages from a session
Endpoint: GET /generate/session/{session_id}
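As with /chat, these endpoints can be exercised with any HTTP client. A sketch assuming a local server on the default port, and assuming the /generate response mirrors the /chat response shape shown above; httpx is used only for illustration:
import httpx

base = "http://localhost:8000"

# Send a prompt, keeping context under a session ID
resp = httpx.post(
    f"{base}/generate",
    json={
        "session_id": "my-session",
        "model": "gpt-4o-mini",
        "prompt": "can you explain what the word consistent means?",
    },
    timeout=60,
)
print(resp.json()["response"])

# Retrieve the full message history for the session
history = httpx.get(f"{base}/generate/session/my-session").json()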
License
This software is released under the MIT License