Enhanced Langfuse MCP server with training data extraction for fine-tuning and reinforcement learning. Supports LangGraph node filtering and multiple output formats.
Langfuse MCP Better (Model Context Protocol)
An enhanced Model Context Protocol (MCP) server for Langfuse with powerful training data extraction capabilities. This fork adds specialized tools for extracting LLM training data from LangGraph applications, supporting fine-tuning and reinforcement learning workflows.
What's New in Better?
- 🎯 Training Data Extraction: Extract LLM interactions filtered by LangGraph node hierarchy
- 🔄 Multiple Output Formats: OpenAI, Anthropic, generic prompt/completion, and DPO formats
- 🎨 Smart Filtering: Filter by node name, node path, model, and time range
- 📊 Rich Metadata: Token usage, model parameters, timestamps, and node information
- 🚀 Production Ready: Full test coverage and comprehensive documentation
Based on the excellent langfuse-mcp by Aviv Sinai.
Quick Start
Installation
Install via pip or uvx:
# Using pip
pip install langfuse-mcp-better
# Using uvx (recommended)
uvx langfuse-mcp-better --public-key YOUR_KEY --secret-key YOUR_SECRET --host https://cloud.langfuse.com
Cursor IDE Integration
For Cursor IDE, add the following MCP server entry (replace the placeholders with your credentials):
{
  "mcpServers": {
    "langfuse-better": {
      "command": "uvx",
      "args": ["langfuse-mcp-better", "--public-key", "YOUR_KEY", "--secret-key", "YOUR_SECRET", "--host", "https://cloud.langfuse.com"]
    }
  }
}
💡 Note: In Cursor IDE, the most reliable setup is to configure the server manually in .cursor/mcp.json. See the "Configuration with MCP clients" section below for details.
Features
- Integration with Langfuse for trace and observation data
- Tool suite for AI agents to query trace data
- Exception and error tracking capabilities
- Session and user activity monitoring
- Training data extraction for fine-tuning and reinforcement learning
- LangGraph node hierarchy filtering
- Multiple output formats (OpenAI, Anthropic, generic, DPO)
- Rich metadata including token usage and model parameters
Available Tools
The MCP server provides the following tools for AI agents:
Core Tools
- fetch_traces: Find traces based on criteria like user ID, session ID, etc.
- fetch_trace: Get a specific trace by ID
- fetch_observations: Get observations filtered by type
- fetch_observation: Get a specific observation by ID
- fetch_sessions: List sessions in the current project
- get_session_details: Get detailed information about a session
- get_user_sessions: Get all sessions for a user
Exception & Error Tools
- find_exceptions: Find exceptions and errors in traces
- find_exceptions_in_file: Find exceptions in a specific file
- get_exception_details: Get detailed information about an exception
- get_error_count: Get the count of errors
Training Data Tools
- fetch_llm_training_data: [NEW] Extract LLM training data from LangGraph nodes for fine-tuning and reinforcement learning. Supports multiple output formats (OpenAI, Anthropic, generic, DPO) and filtering by node hierarchy.
Utility Tools
- get_data_schema: Get schema information for the data structures
Setup
Install uv
First, make sure uv is installed. For installation instructions, see the uv installation docs.
If you already have an older version of uv installed, you might need to update it with uv self update.
Installation from PyPI
Requirement: The server depends on the Langfuse Python SDK v3. Installation automatically pulls in langfuse>=3.0.0 and requires Python 3.10–3.13.
# Using pip
pip install langfuse-mcp-better
# Using uv
uv pip install langfuse-mcp-better
Development Installation
If you're iterating on this repository, install the local checkout:
# from the repo root
uv pip install --editable .
Recommended local environment
For development we suggest creating an isolated environment pinned to Python 3.11 (the version used in CI):
uv venv --python 3.11 .venv
source .venv/bin/activate # On Windows use: .venv\Scripts\activate
uv pip install --python .venv/bin/python -e .
All subsequent examples assume the virtual environment is activated.
Obtain Langfuse credentials
You'll need your Langfuse credentials:
- Public key
- Secret key
- Host URL (usually https://cloud.langfuse.com or your self-hosted URL)
You can store these in a local .env file instead of passing CLI flags each time:
LANGFUSE_PUBLIC_KEY=your_public_key
LANGFUSE_SECRET_KEY=your_secret_key
LANGFUSE_HOST=https://cloud.langfuse.com
When present, the MCP server reads these values automatically. CLI arguments still override the environment if provided.
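The precedence described above (CLI flags first, then environment values) can be sketched as follows. This is a minimal illustration, not the server's actual code: the function name `resolve_credentials` is hypothetical, and falling back to the Cloud URL when no host is given is an assumption based on the note that `--host` can be omitted for the default Cloud endpoint.

```python
import argparse


def resolve_credentials(argv, env):
    """Return Langfuse settings, preferring CLI flags over environment values.

    Hypothetical helper for illustration; the real server may resolve
    its settings differently.
    """
    parser = argparse.ArgumentParser(prog="langfuse-mcp-better")
    parser.add_argument("--public-key", default=None)
    parser.add_argument("--secret-key", default=None)
    parser.add_argument("--host", default=None)
    args = parser.parse_args(argv)

    return {
        # A flag wins when present; otherwise fall back to the environment.
        "public_key": args.public_key or env.get("LANGFUSE_PUBLIC_KEY"),
        "secret_key": args.secret_key or env.get("LANGFUSE_SECRET_KEY"),
        # Assumption: the Cloud endpoint is the default when nothing is set.
        "host": args.host or env.get("LANGFUSE_HOST", "https://cloud.langfuse.com"),
    }


# Values as they might be loaded from a .env file (placeholders only).
example_env = {
    "LANGFUSE_PUBLIC_KEY": "pk-from-env",
    "LANGFUSE_SECRET_KEY": "sk-from-env",
    "LANGFUSE_HOST": "https://cloud.langfuse.com",
}

# Only --host is passed, so the keys come from the environment.
settings = resolve_credentials(
    ["--host", "https://langfuse.example.internal"], example_env
)
print(settings["host"])
```

Passing the environment in as a plain dictionary keeps the sketch easy to test without touching real credentials.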
Running the Server
Run the server using uvx or the installed command:
# Using uvx (no installation needed)
uvx langfuse-mcp-better --public-key YOUR_KEY --secret-key YOUR_SECRET --host https://cloud.langfuse.com
# Using the installed command
langfuse-mcp-better --public-key YOUR_KEY --secret-key YOUR_SECRET --host https://cloud.langfuse.com
# Backward compatible command also available
langfuse-mcp --public-key YOUR_KEY --secret-key YOUR_SECRET --host https://cloud.langfuse.com
Local checkout tip: During development, run uv run python -m langfuse_mcp ... to execute the code in your working tree.
The server writes diagnostic logs to /tmp/langfuse_mcp.log. Remove the --host switch if you are targeting the default Cloud endpoint.
Use --log-level (e.g., --log-level DEBUG) and --log-to-console to control verbosity during debugging.
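As a rough picture of what these two switches control, the sketch below wires a log level and an optional console handler with Python's standard logging module. The function name `configure_logging` and the logger name are hypothetical; only the default log path and the flag meanings come from the text above. Console output goes to stderr here because an MCP stdio server needs stdout for protocol messages.

```python
import logging
import os
import sys
import tempfile


def configure_logging(log_level="INFO", log_to_console=False,
                      log_file="/tmp/langfuse_mcp.log"):
    """Hypothetical sketch of --log-level and --log-to-console handling."""
    level = getattr(logging, log_level.upper(), None)
    if not isinstance(level, int):
        raise ValueError(f"unknown log level: {log_level}")

    logger = logging.getLogger("langfuse_mcp_demo")  # hypothetical logger name
    logger.setLevel(level)
    # Drop handlers from earlier calls so repeated setup does not duplicate output.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = [logging.FileHandler(log_file, encoding="utf-8")]
    if log_to_console:
        # stderr keeps stdout free for the MCP stdio transport.
        handlers.append(logging.StreamHandler(sys.stderr))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


# Demo: write to a temporary file instead of /tmp/langfuse_mcp.log.
demo_log = os.path.join(tempfile.mkdtemp(), "langfuse_mcp.log")
logger = configure_logging("DEBUG", log_to_console=True, log_file=demo_log)
logger.debug("server starting")
```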
Run with Docker
Option 1: Pull from GitHub Container Registry (Recommended)
Pull and run the pre-built image:
docker pull ghcr.io/avivsinai/langfuse-mcp:latest
docker run --rm -i \
-e LANGFUSE_PUBLIC_KEY=YOUR_PUBLIC_KEY \
-e LANGFUSE_SECRET_KEY=YOUR_SECRET_KEY \
-e LANGFUSE_HOST=https://cloud.langfuse.com \
-e LANGFUSE_MCP_LOG_FILE=/logs/langfuse_mcp.log \
-v "$(pwd)/logs:/logs" \
ghcr.io/avivsinai/langfuse-mcp:latest
Available tags:
- latest: Most recent release
- v0.2.0: Specific version
- 0.2: Major.minor version
Option 2: Build from source
Build the image from the repository root so the container installs the current checkout instead of the latest PyPI release:
docker build -t langfuse-logs-mcp .
docker run --rm -i \
-e LANGFUSE_PUBLIC_KEY=YOUR_PUBLIC_KEY \
-e LANGFUSE_SECRET_KEY=YOUR_SECRET_KEY \
-e LANGFUSE_HOST=https://cloud.langfuse.com \
-e LANGFUSE_MCP_LOG_FILE=/logs/langfuse_mcp.log \
-v "$(pwd)/logs:/logs" \
langfuse-logs-mcp
Why no -t? Allocating a pseudo-TTY can interfere with MCP stdio clients. Use -i only so the server communicates over plain stdin/stdout.
The Dockerfile copies the local source tree and installs it with pip install ., so the container runs the code in your current checkout. This is essential while testing features that have not shipped on PyPI.
Configuration with MCP clients
Configure for Cursor
Create a .cursor/mcp.json file in your project root:
{
  "mcpServers": {
    "langfuse-better": {
      "command": "uvx",
      "args": ["langfuse-mcp-better", "--public-key", "YOUR_KEY", "--secret-key", "YOUR_SECRET", "--host", "https://cloud.langfuse.com"]
    }
  }
}
Configure for Claude Desktop
Add to your Claude settings:
{
  "command": ["uvx"],
  "args": ["langfuse-mcp-better"],
  "type": "stdio",
  "env": {
    "LANGFUSE_PUBLIC_KEY": "YOUR_KEY",
    "LANGFUSE_SECRET_KEY": "YOUR_SECRET",
    "LANGFUSE_HOST": "https://cloud.langfuse.com"
  }
}
Output Modes
Each tool supports different output modes to control the level of detail in responses:
- compact (default): Returns a summary with large values truncated
- full_json_string: Returns the complete data as a JSON string
- full_json_file: Saves the complete data to a file and returns a summary with file information
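The three modes can be pictured with the sketch below. It is an illustration under stated assumptions, not the server's implementation: the helper names, the 50-character truncation threshold, the dump file name, and the shape of the returned summary are all hypothetical.

```python
import json
import os
import tempfile

MAX_VALUE_CHARS = 50  # hypothetical truncation threshold


def truncate_large_values(value, max_chars=MAX_VALUE_CHARS):
    """Recursively shorten long strings so a summary stays small."""
    if isinstance(value, dict):
        return {k: truncate_large_values(v, max_chars) for k, v in value.items()}
    if isinstance(value, list):
        return [truncate_large_values(v, max_chars) for v in value]
    if isinstance(value, str) and len(value) > max_chars:
        return value[:max_chars] + "..."
    return value


def render_output(data, output_mode="compact", dump_dir=None):
    """Return `data` in one of the three documented output modes."""
    if output_mode == "compact":
        return truncate_large_values(data)
    if output_mode == "full_json_string":
        return json.dumps(data)
    if output_mode == "full_json_file":
        dump_dir = dump_dir or tempfile.mkdtemp()
        path = os.path.join(dump_dir, "training_data.json")  # hypothetical name
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
        # Return a small summary plus the location of the complete data.
        return {"file_path": path, "summary": truncate_large_values(data)}
    raise ValueError(f"unknown output_mode: {output_mode}")
```

The file mode is the natural choice for large extractions, since the response stays small while the complete data remains available on disk.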
Using the Training Data Tool
The fetch_llm_training_data tool is specifically designed for extracting training data from LangGraph applications. It provides powerful filtering and formatting capabilities for machine learning workflows.
Key Features
- 🚀 Automatic Pagination: Request any amount of data (1000, 10000+) - pagination is handled automatically
- 🔍 Smart Filtering:
  - ls_model_name: Partial matching (case-insensitive); "Qwen3_235B" matches all variants
  - langgraph_node and agent_name: Exact matching for precision
  - At least one filter is required
- Multiple Output Formats: Support for OpenAI, Anthropic, generic, and DPO formats
- Rich Metadata: Includes token usage, model parameters, timestamps, and node information
- Time-based Queries: Extract data from specific time ranges
- Flexible Combinations: Combine multiple filters for precise data extraction
- Transparent: Shows pages_fetched and total_raw_observations in metadata
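The filtering rules listed above can be sketched as a small predicate. The function name `matches_filters` is hypothetical, and treating "partial matching" as a case-insensitive substring test is an assumption about how the tool behaves.

```python
def matches_filters(metadata, langgraph_node=None, agent_name=None,
                    ls_model_name=None):
    """Return True when an observation's metadata passes every given filter.

    Hypothetical sketch: langgraph_node and agent_name are compared exactly,
    ls_model_name is a case-insensitive substring match (an assumption).
    """
    if langgraph_node is None and agent_name is None and ls_model_name is None:
        raise ValueError("at least one filter is required")

    if langgraph_node is not None and metadata.get("langgraph_node") != langgraph_node:
        return False
    if agent_name is not None and metadata.get("agent_name") != agent_name:
        return False
    if ls_model_name is not None:
        actual = str(metadata.get("ls_model_name") or "")
        if ls_model_name.lower() not in actual.lower():
            return False
    return True


example_metadata = {
    "langgraph_node": "reasoning_node",
    "agent_name": "supervisor",
    "ls_model_name": "Qwen3_235B_A22B_Instruct_2507_Beijing",
}
print(matches_filters(example_metadata, ls_model_name="qwen3_235b"))
print(matches_filters(example_metadata, langgraph_node="reasoning"))
```

Filters combine with AND semantics in this sketch, so adding a filter can only narrow the result set.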
Output Formats
OpenAI Format (output_format="openai")
Perfect for OpenAI fine-tuning:
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is AI?"},
    {"role": "assistant", "content": "AI is artificial intelligence..."}
  ],
  "metadata": {
    "model": "gpt-4",
    "usage": {"total_tokens": 150},
    "langgraph_node": "llm_call",
    "agent_name": "supervisor",
    "ls_model_name": "gpt-4-turbo"
  }
}
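Fine-tuning datasets are commonly uploaded as JSONL, with one JSON object per line. The sketch below turns samples in the shape shown above into such lines. The helper name `to_jsonl` is hypothetical, and dropping the metadata block is an assumption that the upload should contain only the conversation; keep it if your pipeline needs it.

```python
import json


def to_jsonl(samples):
    """Serialize OpenAI-format samples as JSONL, one conversation per line.

    Hypothetical helper: only the "messages" key is kept; "metadata" is
    dropped on the assumption that it is not wanted in the upload file.
    """
    return "".join(
        json.dumps({"messages": sample["messages"]}, ensure_ascii=False) + "\n"
        for sample in samples
    )


example_samples = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": "What is AI?"},
            {"role": "assistant", "content": "AI is artificial intelligence..."},
        ],
        "metadata": {"model": "gpt-4", "usage": {"total_tokens": 150}},
    }
]

print(to_jsonl(example_samples), end="")
```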
Anthropic Format (output_format="anthropic")
Optimized for Claude fine-tuning:
{
  "system": "You are a helpful assistant",
  "messages": [
    {"role": "user", "content": "What is AI?"},
    {"role": "assistant", "content": "AI is artificial intelligence..."}
  ],
  "metadata": {...}
}
Generic Format (output_format="generic")
Simple prompt/completion pairs:
{
  "prompt": "What is AI?",
  "completion": "AI is artificial intelligence...",
  "metadata": {...}
}
DPO Format (output_format="dpo")
For Direct Preference Optimization:
{
  "prompt": "What is AI?",
  "chosen": "AI is artificial intelligence...",
  "rejected": null,
  "metadata": {
    "_note": "rejected field is null - add negative samples for DPO training"
  }
}
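To make the relationship between the four formats concrete, the sketch below derives each of them from one list of chat messages. The function name `to_training_sample` is hypothetical, and the mapping rules (last user message as the prompt, last assistant message as the completion) are assumptions chosen to reproduce the examples above, not a description of the tool's internals.

```python
def to_training_sample(messages, output_format="openai", metadata=None):
    """Convert chat messages into one of the four documented sample shapes."""
    metadata = metadata or {}
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    user_parts = [m["content"] for m in messages if m["role"] == "user"]
    assistant_parts = [m["content"] for m in messages if m["role"] == "assistant"]
    # Assumption: the last user turn is the prompt, the last assistant turn the answer.
    prompt = user_parts[-1] if user_parts else ""
    completion = assistant_parts[-1] if assistant_parts else ""

    if output_format == "openai":
        return {"messages": list(messages), "metadata": metadata}
    if output_format == "anthropic":
        # The system prompt moves to a top-level field.
        return {
            "system": "\n".join(system_parts),
            "messages": [m for m in messages if m["role"] != "system"],
            "metadata": metadata,
        }
    if output_format == "generic":
        return {"prompt": prompt, "completion": completion, "metadata": metadata}
    if output_format == "dpo":
        # No negative sample is available, so "rejected" stays empty.
        return {"prompt": prompt, "chosen": completion, "rejected": None,
                "metadata": metadata}
    raise ValueError(f"unknown output_format: {output_format}")


example_messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is AI?"},
    {"role": "assistant", "content": "AI is artificial intelligence..."},
]
print(to_training_sample(example_messages, "generic"))
```

Because the DPO sample carries no rejected answer, a separate step has to supply negative samples before the data is usable for preference training.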
Automatic Pagination
No more API limit errors! The tool automatically handles pagination for large data requests:
# Request 5000 samples - no problem!
fetch_llm_training_data(
    age=10080,
    ls_model_name="gpt-4-turbo",
    limit=5000,  # Automatically fetches across multiple API calls
    output_format="openai"
)
# The tool will:
# 1. Break this into 50 API calls (100 items each)
# 2. Automatically fetch all pages
# 3. Aggregate and return all 5000 samples
# 4. Show metadata: pages_fetched=50, total_raw_observations=5000
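The paging loop described in the comments above can be sketched as follows. It is a simplified illustration: `fetch_all` and `make_fake_source` are hypothetical names, the fake source stands in for the Langfuse API, and filtering of raw observations is left out, so every fetched row counts as a sample here.

```python
def fetch_all(fetch_page, limit, page_size=100):
    """Collect up to `limit` items by requesting pages of `page_size` items."""
    items = []
    pages_fetched = 0
    page = 1
    while len(items) < limit:
        batch = fetch_page(page=page, limit=page_size)
        pages_fetched += 1
        items.extend(batch)
        if len(batch) < page_size:
            break  # A short page means the source has no more data.
        page += 1
    return {
        "items": items[:limit],
        "pages_fetched": pages_fetched,
        "total_raw_observations": len(items),
    }


def make_fake_source(total):
    """Build a stand-in for the API that serves `total` numbered rows."""
    def fetch_page(page, limit):
        start = (page - 1) * limit
        return list(range(start, min(start + limit, total)))
    return fetch_page


result = fetch_all(make_fake_source(total=12000), limit=5000)
print(result["pages_fetched"], len(result["items"]))
```

With 100 items per page, a request for 5000 samples takes 50 calls when enough data exists; the loop stops earlier when the source runs dry.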
Usage Examples
Extract all LLM calls from a specific LangGraph node
# Get 1000 LLM interactions from the "agent_llm" node in the last 24 hours
fetch_llm_training_data(
    age=1440,  # 24 hours in minutes
    langgraph_node="agent_llm",
    limit=1000,  # Default: will auto-paginate if needed
    output_format="openai"
)
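The age parameter in these examples is given in minutes (1440 for 24 hours, 10080 for 7 days, 43200 for 30 days). The sketch below shows the arithmetic and how such a value maps to a time window ending now. The helper names `days_to_age` and `age_to_window` are hypothetical, and using a UTC window is an assumption.

```python
from datetime import datetime, timedelta, timezone

MINUTES_PER_DAY = 24 * 60


def days_to_age(days):
    """Convert a number of days into the minute-based age value."""
    return days * MINUTES_PER_DAY


def age_to_window(age_minutes, now=None):
    """Return (start, end) for a look-back window of `age_minutes`."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(minutes=age_minutes), now


print(days_to_age(1), days_to_age(7), days_to_age(30))
start, end = age_to_window(days_to_age(1))
print(end - start)
```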
Filter by agent name
# Get 5000 LLM calls from the "supervisor" agent in the last week
fetch_llm_training_data(
    age=10080,  # 7 days
    agent_name="supervisor",
    limit=5000,  # Automatically handles pagination
    output_format="generic"
)
Filter by model name (partial matching)
# Extract 10,000 Qwen model calls using partial name
# "Qwen3_235B" will match all variants like:
# - Qwen3_235B_A22B_Instruct_2507
# - Qwen3_235B_A22B_Instruct_2507_ShenZhen
# - Qwen3_235B_A22B_Instruct_2507_Beijing
fetch_llm_training_data(
    age=43200,  # 30 days
    ls_model_name="Qwen3_235B",  # Partial name - matches all variants!
    limit=10000,  # Large scale - automatically paginated
    output_format="openai"
)
Combine multiple filters
# Extract data with specific node and model combination
fetch_llm_training_data(
    age=10080,
    langgraph_node="reasoning_node",
    ls_model_name="gpt-4-turbo",
    output_format="openai"
)
Save complete data to file
# Extract data and save to file for offline processing
fetch_llm_training_data(
    age=10080,
    agent_name="supervisor",
    output_format="openai",
    output_mode="full_json_file"  # Saves to configured dump directory
)
LangGraph Integration
The tool expects LangGraph applications to include specific metadata in their observations:
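# In your LangGraph application, add metadata to track nodes
from langfuse import Langfuse

# Reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST from the environment
langfuse = Langfuse()

# Placeholder values; in a real graph these come from the node's LLM call
messages = [{"role": "user", "content": "What is AI?"}]
response = "AI is artificial intelligence..."

# When creating observations, include the required metadata fields.
# The method for creating a generation differs between Langfuse SDK major
# versions; adapt this call to the SDK version your application uses.
generation = langfuse.generation(
    name="llm_call",
    input=messages,
    output=response,
    metadata={
        "langgraph_node": "reasoning_node",  # Required for filtering by node
        "agent_name": "supervisor",          # Required for filtering by agent
        "ls_model_name": "gpt-4-turbo"       # Required for filtering by model
    }
)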
Metadata Fields
When include_metadata=True (default), each training sample includes:
- observation_id: Unique identifier for the observation
- trace_id: Parent trace ID for tracing back to the original request
- timestamp: When the LLM call was made
- model: LLM model used (e.g., "gpt-4", "claude-3-opus")
- model_parameters: Model configuration (temperature, max_tokens, etc.)
- usage: Token usage statistics (prompt_tokens, completion_tokens, total_tokens)
- langgraph_node: LangGraph node name (for node-based filtering)
- agent_name: Agent name (for agent-based filtering)
- ls_model_name: LangSmith model name (for model-based filtering)
This metadata is valuable for:
- Filtering and analyzing training data
- Cost analysis and optimization
- Understanding model performance across different nodes and agents
- Reproducibility and debugging
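As one example of the cost analysis mentioned above, the sketch below sums token usage per LangGraph node across extracted samples. The function name `total_tokens_by_node` is hypothetical; the field names follow the metadata list, and samples without usage data are counted as zero tokens.

```python
from collections import defaultdict


def total_tokens_by_node(samples):
    """Sum usage.total_tokens per langgraph_node across training samples."""
    totals = defaultdict(int)
    for sample in samples:
        metadata = sample.get("metadata") or {}
        node = metadata.get("langgraph_node", "unknown")
        usage = metadata.get("usage") or {}
        totals[node] += usage.get("total_tokens", 0)
    return dict(totals)


example_samples = [
    {"metadata": {"langgraph_node": "llm_call", "usage": {"total_tokens": 150}}},
    {"metadata": {"langgraph_node": "llm_call", "usage": {"total_tokens": 50}}},
    {"metadata": {"langgraph_node": "reasoning_node", "usage": {"total_tokens": 300}}},
    {"metadata": {"agent_name": "supervisor"}},  # no node, no usage
]
print(total_tokens_by_node(example_samples))
```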
Development
Clone the repository
git clone https://github.com/futumaster/langfuse-mcp-better.git
cd langfuse-mcp-better
Create a virtual environment and install dependencies
uv venv --python 3.11 .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv pip install --python .venv/bin/python -e ".[dev]"
Set up environment variables
export LANGFUSE_SECRET_KEY="your-secret-key"
export LANGFUSE_PUBLIC_KEY="your-public-key"
export LANGFUSE_HOST="https://cloud.langfuse.com" # Or your self-hosted URL
Testing
Run the unit test suite (mirrors CI):
pytest
To run the demo client:
uv run examples/langfuse_client_demo.py --public-key YOUR_PUBLIC_KEY --secret-key YOUR_SECRET_KEY
Version Management
This project uses dynamic versioning based on Git tags:
- The version is automatically determined from git tags using uv-dynamic-versioning
- To create a new release:
  - Tag your commit with git tag v0.1.2 (following semantic versioning)
  - Push the tag with git push --tags
  - Create a GitHub release from the tag
- The GitHub workflow will automatically build and publish the package with the correct version to PyPI
For a detailed history of changes, please see the CHANGELOG.md file.
Langfuse 3.x migration notes
- The MCP server now uses the Langfuse Python SDK v3 resource clients (langfuse.api.trace.list, langfuse.api.observations.get_many, etc.) and must currently run on Python 3.10–3.13 because the upstream SDK still relies on Pydantic v1 internals.
- Unit tests use a v3-style fake client that fails if legacy fetch_* helpers are invoked, helping catch regressions early.
- Tool responses now include pagination metadata when the Langfuse API returns cursors, while retaining the existing MCP interface.
- Diagnostic logs continue to stream to /tmp/langfuse_mcp.log; this is useful when verifying the upgraded integration against a live Langfuse deployment.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Cache Management
We use the cachetools library to implement efficient caching with proper size limits:
- Uses cachetools.LRUCache for better reliability
- Configurable cache size via the CACHE_SIZE constant
- Automatically evicts the least recently used items when caches exceed their size limits
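The eviction behavior described above can be illustrated without extra dependencies. The sketch below is not the project's code and does not use cachetools: it reimplements least-recently-used eviction with the standard library's OrderedDict purely for demonstration. The class name `TinyLRUCache` and the CACHE_SIZE value of 2 are hypothetical.

```python
from collections import OrderedDict

CACHE_SIZE = 2  # hypothetical value, kept tiny so eviction is easy to see


class TinyLRUCache:
    """Stdlib stand-in that mimics least-recently-used eviction."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # Reading marks the key as recently used.
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.maxsize:
            self._items.popitem(last=False)  # Drop the least recently used key.

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)


cache = TinyLRUCache(maxsize=CACHE_SIZE)
cache.put("trace-a", {"id": "a"})
cache.put("trace-b", {"id": "b"})
cache.get("trace-a")               # trace-a is now the most recently used
cache.put("trace-c", {"id": "c"})  # exceeds the limit, so trace-b is evicted
print("trace-b" in cache, "trace-a" in cache)
```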