
Enhanced Langfuse MCP server with training data extraction for fine-tuning and reinforcement learning. Supports LangGraph node filtering and multiple output formats.


Langfuse MCP Better (Model Context Protocol)

Python 3.10-3.13 · License: MIT · Based on langfuse-mcp

An enhanced Model Context Protocol (MCP) server for Langfuse with powerful training data extraction capabilities. This fork adds specialized tools for extracting LLM training data from LangGraph applications, supporting fine-tuning and reinforcement learning workflows.

What's New in Better?

  • 🎯 Training Data Extraction: Extract LLM interactions filtered by LangGraph node hierarchy
  • 🔄 Multiple Output Formats: OpenAI, Anthropic, generic prompt/completion, and DPO formats
  • 🎨 Smart Filtering: Filter by node name, node path, model, and time range
  • 📊 Rich Metadata: Token usage, model parameters, timestamps, and node information
  • 🚀 Production Ready: Full test coverage and comprehensive documentation

Based on the excellent langfuse-mcp by Aviv Sinai.

Quick Start

Installation

Install with pip, or run it directly with uvx:

# Using pip
pip install langfuse-mcp-better

# Using uvx (recommended)
uvx langfuse-mcp-better --public-key YOUR_KEY --secret-key YOUR_SECRET --host https://cloud.langfuse.com

Cursor IDE Integration

For Cursor IDE, add the following server entry (replace the placeholders with your credentials):

{
  "mcpServers": {
    "langfuse-better": {
      "command": "uvx",
      "args": ["langfuse-mcp-better", "--public-key", "YOUR_KEY", "--secret-key", "YOUR_SECRET", "--host", "https://cloud.langfuse.com"]
    }
  }
}

💡 Note: If a Cursor deeplink does not work for you, add this configuration manually to .cursor/mcp.json. See the Configuration section below for details.

Features

  • Integration with Langfuse for trace and observation data
  • Tool suite for AI agents to query trace data
  • Exception and error tracking capabilities
  • Session and user activity monitoring
  • Training data extraction for fine-tuning and reinforcement learning
    • LangGraph node hierarchy filtering
    • Multiple output formats (OpenAI, Anthropic, generic, DPO)
    • Rich metadata including token usage and model parameters

Available Tools

The MCP server provides the following tools for AI agents:

Core Tools

  • fetch_traces - Find traces based on criteria like user ID, session ID, etc.
  • fetch_trace - Get a specific trace by ID
  • fetch_observations - Get observations filtered by type
  • fetch_observation - Get a specific observation by ID
  • fetch_sessions - List sessions in the current project
  • get_session_details - Get detailed information about a session
  • get_user_sessions - Get all sessions for a user

Exception & Error Tools

  • find_exceptions - Find exceptions and errors in traces
  • find_exceptions_in_file - Find exceptions in a specific file
  • get_exception_details - Get detailed information about an exception
  • get_error_count - Get the count of errors

Training Data Tools

  • fetch_llm_training_data - [NEW] Extract LLM training data from LangGraph nodes for fine-tuning and reinforcement learning. Supports multiple output formats (OpenAI, Anthropic, generic, DPO) and filtering by node hierarchy.

Utility Tools

  • get_data_schema - Get schema information for the data structures

Setup

Install uv

First, make sure uv is installed. For installation instructions, see the uv installation docs.

If you already have an older version of uv installed, you might need to update it with uv self update.

Installation from PyPI

Requirement: The server depends on the Langfuse Python SDK v3. Installations automatically pull langfuse>=3.0.0 and require Python 3.10–3.13.

# Using pip
pip install langfuse-mcp-better

# Using uv
uv pip install langfuse-mcp-better

Development Installation

If you're iterating on this repository, install the local checkout:

# from the repo root
uv pip install --editable .

Recommended local environment

For development we suggest creating an isolated environment pinned to Python 3.11 (the version used in CI):

uv venv --python 3.11 .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
uv pip install --python .venv/bin/python -e .

All subsequent examples assume the virtual environment is activated.

Obtain Langfuse credentials

You'll need your Langfuse public key, secret key, and host URL. You can store these in a local .env file instead of passing CLI flags each time:

LANGFUSE_PUBLIC_KEY=your_public_key
LANGFUSE_SECRET_KEY=your_secret_key
LANGFUSE_HOST=https://cloud.langfuse.com

When present, the MCP server reads these values automatically. CLI arguments still override the environment if provided.
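The precedence rule (a CLI flag wins, otherwise the environment or .env value is used) can be pictured with the small helper below. `resolve_setting` is a hypothetical name used only for illustration; it is a sketch of the documented behavior, not the server's actual code.

```python
import os
from typing import Optional


def resolve_setting(cli_value: Optional[str], env_name: str,
                    default: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: a CLI value wins, else the environment, else the default."""
    if cli_value is not None:
        return cli_value
    return os.environ.get(env_name, default)


if __name__ == "__main__":
    os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"
    # No --host flag: the value from the environment (or .env) is used.
    print(resolve_setting(None, "LANGFUSE_HOST"))
    # --host given on the command line: it overrides the environment.
    print(resolve_setting("https://langfuse.example.com", "LANGFUSE_HOST"))
```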

Running the Server

Run the server using uvx or the installed command:

# Using uvx (no installation needed)
uvx langfuse-mcp-better --public-key YOUR_KEY --secret-key YOUR_SECRET --host https://cloud.langfuse.com

# Using the installed command
langfuse-mcp-better --public-key YOUR_KEY --secret-key YOUR_SECRET --host https://cloud.langfuse.com

# Backward compatible command also available
langfuse-mcp --public-key YOUR_KEY --secret-key YOUR_SECRET --host https://cloud.langfuse.com

Local checkout tip: During development run uv run python -m langfuse_mcp ... to execute the code in your working tree.

The server writes diagnostic logs to /tmp/langfuse_mcp.log. Remove the --host switch if you are targeting the default Cloud endpoint. Use --log-level (e.g., --log-level DEBUG) and --log-to-console to control verbosity during debugging.

Run with Docker

Option 1: Pull from GitHub Container Registry (Recommended)

Pull and run the pre-built image:

docker pull ghcr.io/avivsinai/langfuse-mcp:latest
docker run --rm -i \
  -e LANGFUSE_PUBLIC_KEY=YOUR_PUBLIC_KEY \
  -e LANGFUSE_SECRET_KEY=YOUR_SECRET_KEY \
  -e LANGFUSE_HOST=https://cloud.langfuse.com \
  -e LANGFUSE_MCP_LOG_FILE=/logs/langfuse_mcp.log \
  -v "$(pwd)/logs:/logs" \
  ghcr.io/avivsinai/langfuse-mcp:latest

Available tags:

  • latest - Most recent release
  • v0.2.0 - Specific version
  • 0.2 - Major.minor version

Option 2: Build from source

Build the image from the repository root so the container installs the current checkout instead of the latest PyPI release:

docker build -t langfuse-logs-mcp .
docker run --rm -i \
  -e LANGFUSE_PUBLIC_KEY=YOUR_PUBLIC_KEY \
  -e LANGFUSE_SECRET_KEY=YOUR_SECRET_KEY \
  -e LANGFUSE_HOST=https://cloud.langfuse.com \
  -e LANGFUSE_MCP_LOG_FILE=/logs/langfuse_mcp.log \
  -v "$(pwd)/logs:/logs" \
  langfuse-logs-mcp

Why no -t? Allocating a pseudo-TTY can interfere with MCP stdio clients. Use -i only so the server communicates over plain stdin/stdout.

The Dockerfile copies the local source tree and installs it with pip install ., so the container always runs your working copy, which is essential when testing features that have not shipped on PyPI.

Configuration with MCP clients

Configure for Cursor

Create a .cursor/mcp.json file in your project root:

{
  "mcpServers": {
    "langfuse-better": {
      "command": "uvx",
      "args": ["langfuse-mcp-better", "--public-key", "YOUR_KEY", "--secret-key", "YOUR_SECRET", "--host", "https://cloud.langfuse.com"]
    }
  }
}
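If you prefer not to paste keys into the file by hand, a small script can generate the same .cursor/mcp.json structure from your LANGFUSE_* environment variables. This is a minimal sketch; `build_cursor_config` is a hypothetical helper, and the placeholder fallbacks are assumptions for when the variables are unset.

```python
import json
import os


def build_cursor_config(public_key: str, secret_key: str,
                        host: str = "https://cloud.langfuse.com") -> dict:
    """Build the mcpServers structure used in .cursor/mcp.json."""
    return {
        "mcpServers": {
            "langfuse-better": {
                "command": "uvx",
                "args": [
                    "langfuse-mcp-better",
                    "--public-key", public_key,
                    "--secret-key", secret_key,
                    "--host", host,
                ],
            }
        }
    }


if __name__ == "__main__":
    config = build_cursor_config(
        os.environ.get("LANGFUSE_PUBLIC_KEY", "YOUR_KEY"),
        os.environ.get("LANGFUSE_SECRET_KEY", "YOUR_SECRET"),
        os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com"),
    )
    print(json.dumps(config, indent=2))  # redirect into .cursor/mcp.json
```

Redirecting the output (for example `python make_cursor_config.py > .cursor/mcp.json`, where the script name is hypothetical) writes real secrets to disk, so keep that file out of version control.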

Configure for Claude Desktop

Add the following to your Claude Desktop configuration file (claude_desktop_config.json):

{
  "mcpServers": {
    "langfuse-better": {
      "command": "uvx",
      "args": ["langfuse-mcp-better"],
      "env": {
        "LANGFUSE_PUBLIC_KEY": "YOUR_KEY",
        "LANGFUSE_SECRET_KEY": "YOUR_SECRET",
        "LANGFUSE_HOST": "https://cloud.langfuse.com"
      }
    }
  }
}

Output Modes

Each tool supports different output modes to control the level of detail in responses:

  • compact (default): Returns a summary with large values truncated
  • full_json_string: Returns the complete data as a JSON string
  • full_json_file: Saves the complete data to a file and returns a summary with file information

Using the Training Data Tool

The fetch_llm_training_data tool is specifically designed for extracting training data from LangGraph applications. It provides powerful filtering and formatting capabilities for machine learning workflows.

Key Features

  • 🚀 Automatic Pagination: Request any amount of data (1000, 10000+) - pagination is handled automatically
  • 🔍 Smart Filtering:
    • ls_model_name: Partial matching (case-insensitive) - "Qwen3_235B" matches all variants
    • langgraph_node and agent_name: Exact matching for precision
    • At least one filter required
  • Multiple Output Formats: Support for OpenAI, Anthropic, generic, and DPO formats
  • Rich Metadata: Includes token usage, model parameters, timestamps, and node information
  • Time-based Queries: Extract data from specific time ranges
  • Flexible Combinations: Combine multiple filters for precise data extraction
  • Transparent: Shows pages_fetched and total_raw_observations in metadata
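The filter rules above can be restated as a single predicate over an observation's metadata. This is a sketch of the documented semantics, not the tool's code: `matches_filters` is a hypothetical name, and it assumes that "partial matching" means a case-insensitive substring test and that combined filters are ANDed together.

```python
from typing import Optional


def matches_filters(metadata: dict,
                    langgraph_node: Optional[str] = None,
                    agent_name: Optional[str] = None,
                    ls_model_name: Optional[str] = None) -> bool:
    """Hypothetical restatement of the fetch_llm_training_data filter rules."""
    if langgraph_node is None and agent_name is None and ls_model_name is None:
        raise ValueError("At least one filter is required")
    # Exact matching for node and agent names.
    if langgraph_node is not None and metadata.get("langgraph_node") != langgraph_node:
        return False
    if agent_name is not None and metadata.get("agent_name") != agent_name:
        return False
    # Partial, case-insensitive matching for the model name (assumed substring test).
    if ls_model_name is not None:
        model = metadata.get("ls_model_name") or ""
        if ls_model_name.lower() not in model.lower():
            return False
    return True


if __name__ == "__main__":
    meta = {"langgraph_node": "agent_llm", "agent_name": "supervisor",
            "ls_model_name": "Qwen3_235B_A22B_Instruct_2507_Beijing"}
    print(matches_filters(meta, ls_model_name="qwen3_235b"))  # True
    print(matches_filters(meta, langgraph_node="agent"))      # False: exact match only
```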

Output Formats

OpenAI Format (output_format="openai")

Perfect for OpenAI fine-tuning:

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is AI?"},
    {"role": "assistant", "content": "AI is artificial intelligence..."}
  ],
  "metadata": {
    "model": "gpt-4",
    "usage": {"total_tokens": 150},
    "langgraph_node": "llm_call",
    "agent_name": "supervisor",
    "ls_model_name": "gpt-4-turbo"
  }
}
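OpenAI chat fine-tuning files are JSONL, one training example per line. A minimal sketch for turning samples in the format above into JSONL text follows; `to_openai_jsonl` is a hypothetical helper, and it drops `metadata` by default on the assumption that you only want the messages in the upload file.

```python
import json
from typing import Iterable


def to_openai_jsonl(samples: Iterable[dict], keep_metadata: bool = False) -> str:
    """Serialize OpenAI-format samples as JSONL: one JSON object per line."""
    lines = []
    for sample in samples:
        record = {"messages": sample["messages"]}
        if keep_metadata and "metadata" in sample:
            record["metadata"] = sample["metadata"]
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    samples = [{
        "messages": [
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": "What is AI?"},
            {"role": "assistant", "content": "AI is artificial intelligence..."},
        ],
        "metadata": {"model": "gpt-4", "langgraph_node": "llm_call"},
    }]
    print(to_openai_jsonl(samples), end="")
```

To produce a file, write the returned string with `open("train.jsonl", "w", encoding="utf-8")`; the file name is only an example.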

Anthropic Format (output_format="anthropic")

Optimized for Claude fine-tuning:

{
  "system": "You are a helpful assistant",
  "messages": [
    {"role": "user", "content": "What is AI?"},
    {"role": "assistant", "content": "AI is artificial intelligence..."}
  ],
  "metadata": {...}
}
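The main structural difference from the OpenAI format is that the system prompt moves out of `messages` into a top-level `system` field. As a sketch of that mapping (`openai_to_anthropic` is a hypothetical helper, and it assumes message `content` values are plain strings), an OpenAI-format sample can be converted like this:

```python
def openai_to_anthropic(sample: dict) -> dict:
    """Convert an OpenAI-format sample to the Anthropic-format layout."""
    messages = sample["messages"]
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    record: dict = {}
    if system_parts:
        # The system prompt lives outside "messages" in the Anthropic layout.
        record["system"] = "\n\n".join(system_parts)
    record["messages"] = [m for m in messages if m["role"] != "system"]
    if "metadata" in sample:
        record["metadata"] = sample["metadata"]
    return record


if __name__ == "__main__":
    sample = {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": "What is AI?"},
            {"role": "assistant", "content": "AI is artificial intelligence..."},
        ],
        "metadata": {"model": "gpt-4"},
    }
    print(openai_to_anthropic(sample))
```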

Generic Format (output_format="generic")

Simple prompt/completion pairs:

{
  "prompt": "What is AI?",
  "completion": "AI is artificial intelligence...",
  "metadata": {...}
}

DPO Format (output_format="dpo")

For Direct Preference Optimization:

{
  "prompt": "What is AI?",
  "chosen": "AI is artificial intelligence...",
  "rejected": null,
  "metadata": {
    "_note": "rejected field is null - add negative samples for DPO training"
  }
}
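Because `rejected` is null in the exported samples, you have to supply negative responses yourself before DPO training. One possible approach, sketched below under stated assumptions, is to look up a weaker or incorrect answer per prompt and drop samples with no usable negative. `attach_rejected` and the `get_rejected` lookup are hypothetical; where the negatives come from (another model, human labels) is up to you.

```python
from typing import Callable, Optional


def attach_rejected(samples: list[dict],
                    get_rejected: Callable[[str], Optional[str]]) -> list[dict]:
    """Fill "rejected" for each DPO sample; skip samples without a distinct negative."""
    paired = []
    for sample in samples:
        rejected = sample.get("rejected") or get_rejected(sample["prompt"])
        if rejected is None or rejected == sample["chosen"]:
            continue  # DPO needs a negative that differs from the chosen answer
        paired.append({**sample, "rejected": rejected})  # copy, don't mutate input
    return paired


if __name__ == "__main__":
    samples = [{"prompt": "What is AI?",
                "chosen": "AI is artificial intelligence...",
                "rejected": None, "metadata": {}}]
    negatives = {"What is AI?": "AI is a kind of robot."}  # hypothetical source
    print(attach_rejected(samples, negatives.get))
```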

Automatic Pagination

No more API limit errors! The tool automatically handles pagination for large data requests:

# Request 5000 samples - no problem!
fetch_llm_training_data(
    age=10080,
    ls_model_name="gpt-4-turbo",
    limit=5000,  # Automatically fetches across multiple API calls
    output_format="openai"
)

# The tool will:
# 1. Break this into 50 API calls (100 items each)
# 2. Automatically fetch all pages
# 3. Aggregate and return all 5000 samples
# 4. Show metadata: pages_fetched=50, total_raw_observations=5000
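The pagination steps above amount to a page loop that stops once enough items are collected or a short page signals the end of the data. In this sketch `fetch_page` is a hypothetical stand-in for one paginated Langfuse API request (page number and page size in, list of observations out); the tool's actual request parameters and stop condition may differ.

```python
from typing import Callable

PAGE_SIZE = 100  # page size used in the example above


def fetch_all(fetch_page: Callable[[int, int], list], limit: int,
              page_size: int = PAGE_SIZE) -> tuple[list, int]:
    """Collect up to `limit` items by requesting pages 1, 2, 3, ..."""
    items: list = []
    pages_fetched = 0
    page = 1
    while len(items) < limit:
        batch = fetch_page(page, page_size)
        pages_fetched += 1
        items.extend(batch)
        if len(batch) < page_size:
            break  # a short page means there is no more data
        page += 1
    return items[:limit], pages_fetched


if __name__ == "__main__":
    data = list(range(5000))  # stand-in for 5000 stored observations

    def fake_fetch_page(page: int, size: int) -> list:
        start = (page - 1) * size
        return data[start:start + size]

    items, pages = fetch_all(fake_fetch_page, limit=5000)
    print(len(items), pages)  # 5000 50
```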

Usage Examples

Extract all LLM calls from a specific LangGraph node

# Get 1000 LLM interactions from the "agent_llm" node in the last 24 hours
fetch_llm_training_data(
    age=1440,  # 24 hours in minutes
    langgraph_node="agent_llm",
    limit=1000,  # Default: will auto-paginate if needed
    output_format="openai"
)

Filter by agent name

# Get 5000 LLM calls from the "supervisor" agent in the last week
fetch_llm_training_data(
    age=10080,  # 7 days
    agent_name="supervisor",
    limit=5000,  # Automatically handles pagination
    output_format="generic"
)

Filter by model name (partial matching)

# Extract 10,000 Qwen model calls using partial name
# "Qwen3_235B" will match all variants like:
#   - Qwen3_235B_A22B_Instruct_2507
#   - Qwen3_235B_A22B_Instruct_2507_ShenZhen
#   - Qwen3_235B_A22B_Instruct_2507_Beijing
fetch_llm_training_data(
    age=43200,  # 30 days
    ls_model_name="Qwen3_235B",  # Partial name - matches all variants!
    limit=10000,  # Large scale - automatically paginated
    output_format="openai"
)

Combine multiple filters

# Extract data with specific node and model combination
fetch_llm_training_data(
    age=10080,
    langgraph_node="reasoning_node",
    ls_model_name="gpt-4-turbo",
    output_format="openai"
)

Save complete data to file

# Extract data and save to file for offline processing
fetch_llm_training_data(
    age=10080,
    agent_name="supervisor",
    output_format="openai",
    output_mode="full_json_file"  # Saves to configured dump directory
)

LangGraph Integration

The tool expects LangGraph applications to include specific metadata in their observations:

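# In your LangGraph application, attach the metadata fields the filters rely on
# (uses the Langfuse Python SDK v3 API)
from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST

messages = [{"role": "user", "content": "What is AI?"}]  # example input

with langfuse.start_as_current_generation(
    name="llm_call",
    model="gpt-4-turbo",
    input=messages,
    metadata={
        "langgraph_node": "reasoning_node",  # Required for filtering by node
        "agent_name": "supervisor",          # Required for filtering by agent
        "ls_model_name": "gpt-4-turbo",      # Required for filtering by model
    },
) as generation:
    response = "AI is artificial intelligence..."  # replace with your LLM call
    generation.update(output=response)

langfuse.flush()  # send buffered events before the process exits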

Metadata Fields

When include_metadata=True (default), each training sample includes:

  • observation_id: Unique identifier for the observation
  • trace_id: Parent trace ID for tracing back to original request
  • timestamp: When the LLM call was made
  • model: LLM model used (e.g., "gpt-4", "claude-3-opus")
  • model_parameters: Model configuration (temperature, max_tokens, etc.)
  • usage: Token usage statistics (prompt_tokens, completion_tokens, total_tokens)
  • langgraph_node: LangGraph node name (for node-based filtering)
  • agent_name: Agent name (for agent-based filtering)
  • ls_model_name: LangSmith model name (for model-based filtering)

This metadata is valuable for:

  • Filtering and analyzing training data
  • Cost analysis and optimization
  • Understanding model performance across different nodes and agents
  • Reproducibility and debugging
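For cost analysis, the per-sample metadata can be aggregated directly. This minimal sketch sums `usage.total_tokens` per `langgraph_node`, assuming samples shaped like the OpenAI-format output shown earlier (metadata nested under `metadata`); `tokens_by_node` is a hypothetical helper, and samples without a node are grouped under `<unknown>`.

```python
from collections import defaultdict


def tokens_by_node(samples: list[dict]) -> dict[str, int]:
    """Sum usage.total_tokens per LangGraph node across training samples."""
    totals: dict[str, int] = defaultdict(int)
    for sample in samples:
        meta = sample.get("metadata") or {}
        node = meta.get("langgraph_node") or "<unknown>"
        usage = meta.get("usage") or {}
        totals[node] += usage.get("total_tokens") or 0
    return dict(totals)


if __name__ == "__main__":
    samples = [
        {"metadata": {"langgraph_node": "llm_call", "usage": {"total_tokens": 150}}},
        {"metadata": {"langgraph_node": "llm_call", "usage": {"total_tokens": 50}}},
        {"metadata": {"langgraph_node": "reasoning_node", "usage": {"total_tokens": 300}}},
    ]
    print(tokens_by_node(samples))
```

Multiplying these totals by your provider's per-token prices gives a rough cost per node.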

Development

Clone the repository

git clone https://github.com/futumaster/langfuse-mcp-better.git
cd langfuse-mcp-better

Create a virtual environment and install dependencies

uv venv --python 3.11 .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install --python .venv/bin/python -e ".[dev]"

Set up environment variables

export LANGFUSE_SECRET_KEY="your-secret-key"
export LANGFUSE_PUBLIC_KEY="your-public-key"
export LANGFUSE_HOST="https://cloud.langfuse.com"  # Or your self-hosted URL

Testing

Run the unit test suite (mirrors CI):

pytest

To run the demo client:

uv run examples/langfuse_client_demo.py --public-key YOUR_PUBLIC_KEY --secret-key YOUR_SECRET_KEY

Version Management

This project uses dynamic versioning based on Git tags:

  1. The version is automatically determined from git tags using uv-dynamic-versioning
  2. To create a new release:
    • Tag your commit with git tag v0.1.2 (following semantic versioning)
    • Push the tag with git push --tags
    • Create a GitHub release from the tag
  3. The GitHub workflow will automatically build and publish the package with the correct version to PyPI

For a detailed history of changes, please see the CHANGELOG.md file.

Langfuse 3.x migration notes

  • The MCP server now uses the Langfuse Python SDK v3 resource clients (langfuse.api.trace.list, langfuse.api.observations.get_many, etc.) and must currently run on Python 3.10–3.13 because the upstream SDK still relies on Pydantic v1 internals.
  • Unit tests use a v3-style fake client that fails if legacy fetch_* helpers are invoked, helping catch regressions early.
  • Tool responses now include pagination metadata when the Langfuse API returns cursors, while retaining the existing MCP interface.
  • Diagnostic logs continue to stream to /tmp/langfuse_mcp.log; this is useful when verifying the upgraded integration against a live Langfuse deployment.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Cache Management

We use the cachetools library to implement efficient caching with proper size limits:

  • Uses cachetools.LRUCache for better reliability
  • Configurable cache size via the CACHE_SIZE constant
  • Automatically evicts the least recently used items when caches exceed their size limits
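The project relies on `cachetools.LRUCache`; to show the eviction policy itself without that dependency, here is a stdlib-only sketch built on `collections.OrderedDict`. It is an illustration of LRU eviction, not the project's cache code, and the `CACHE_SIZE` value here is a hypothetical placeholder.

```python
from collections import OrderedDict
from typing import Any, Hashable

CACHE_SIZE = 3  # hypothetical value; the project's CACHE_SIZE may differ


class LRUCache:
    """Minimal least-recently-used cache with a fixed maximum size."""

    def __init__(self, maxsize: int = CACHE_SIZE) -> None:
        self.maxsize = maxsize
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def __setitem__(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __contains__(self, key: Hashable) -> bool:
        return key in self._data

    def __len__(self) -> int:
        return len(self._data)


if __name__ == "__main__":
    cache = LRUCache(maxsize=2)
    cache["trace-1"] = "..."
    cache["trace-2"] = "..."
    cache.get("trace-1")      # touch trace-1 so trace-2 becomes least recent
    cache["trace-3"] = "..."  # exceeds maxsize: trace-2 is evicted
    print("trace-2" in cache)  # False
```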



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


langfuse_mcp_better-1.2.1-py3-none-any.whl (30.3 kB)

Uploaded Python 3

File hashes

Hashes for langfuse_mcp_better-1.2.1-py3-none-any.whl:

Algorithm    Hash digest
SHA256       99f9aca006a8baea76e283b2e2fd9aac7aaf22407351eaccc551360558c90ca5
MD5          20d2b92ccdc4983794c369e712cc6c25
BLAKE2b-256  d60aeab8a96da0822fcfe0784d3b1e37d25ea5440ae6a57213c899bd4eb0d08b

