Langfuse MCP server with built-in analytics, multi-project routing, and Google OAuth. Token percentiles, accuracy metrics, failure detection, cost breakdowns, session analytics, latency analysis, context breach scanning — plus a hosted-remote deployment story.
Langfuse MCP Server
Model Context Protocol server for Langfuse observability. Query traces, analyze accuracy, detect failures, track costs, debug latency, manage prompts and datasets.
56 tools across data access and analytics. Multi-project support so one instance can serve several Langfuse projects. Works with Claude Code, Codex CLI, Cursor, and any MCP-compatible client.
Why this MCP server?
Comparison with official Langfuse MCP (as of March 2026):
| Capability | This server | Official Langfuse MCP |
|---|---|---|
| Traces & Observations | Yes | No |
| Sessions & Users | Yes | No |
| Exception Tracking | Yes | No |
| Prompt Management | Yes | Yes |
| Dataset Management | Yes | No |
| Annotation Queues | Yes | No |
| Scores v2 API | Yes | No |
| Score Write-back | Yes | No |
| Multi-project support | Yes | No |
| Accuracy Metrics | Yes | No |
| Failure Detection | Yes | No |
| Token Percentiles | Yes | No |
| Cost Breakdown | Yes | No |
| Latency Analysis | Yes | No |
| Session Analytics | Yes | No |
| Context Breach Scanning | Yes | No |
| User Group Aggregation | Yes | No |
The official MCP focuses on prompt management. This server provides a full observability and analytics toolkit — traces, observations, sessions, scores, exceptions, prompts, datasets, annotation queues, plus 9 built-in analytics tools that compute insights server-side and return LLM-sized summaries. Multi-project routing lets a single instance serve several Langfuse projects behind one connector URL.
Quick Start
1. Get your API keys
- Langfuse Cloud: cloud.langfuse.com → Settings → API Keys
- Self-hosted: Your Langfuse instance → Settings → API Keys. Set `LANGFUSE_HOST` to your instance URL (e.g., `https://langfuse.yourcompany.com`)
2. Add the MCP server
Claude Code
```bash
claude mcp add \
  -e LANGFUSE_PUBLIC_KEY=pk-lf-... \
  -e LANGFUSE_SECRET_KEY=sk-lf-... \
  -e LANGFUSE_HOST=https://cloud.langfuse.com \
  --scope project \
  langfuse-mcp -- uvx langfuse-mcp-server
```
Codex CLI
```bash
codex mcp add langfuse-mcp \
  --env LANGFUSE_PUBLIC_KEY=pk-lf-... \
  --env LANGFUSE_SECRET_KEY=sk-lf-... \
  --env LANGFUSE_HOST=https://cloud.langfuse.com \
  -- uvx langfuse-mcp-server
```
Cursor
Add to `.cursor/mcp.json`:
```json
{
  "mcpServers": {
    "langfuse-mcp": {
      "command": "uvx",
      "args": ["langfuse-mcp-server"],
      "env": {
        "LANGFUSE_PUBLIC_KEY": "pk-lf-...",
        "LANGFUSE_SECRET_KEY": "sk-lf-...",
        "LANGFUSE_HOST": "https://cloud.langfuse.com"
      }
    }
  }
}
```
3. Verify
Restart your CLI, then test with `/mcp` (Claude Code) or `codex mcp list` (Codex).
Manual install (alternative to uvx)
```bash
pip install langfuse-mcp-server
langfuse-mcp-server
```
Hosting as a remote service
Run as a long-lived HTTP service so multiple users connect to a single instance — required for Claude.ai custom Connectors, and useful for team-wide access without distributing Langfuse API keys per user.
Enabled via env vars; no code changes.
Minimum setup
```bash
MCP_TRANSPORT=streamable-http
MCP_BASE_URL=https://mcp.yourcompany.com
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://your-langfuse-instance.example
```
Without OAuth env vars, the endpoint is unauthenticated — suitable only for local testing. See Google OAuth setup below for production.
Docker
A production-ready Dockerfile is checked into the repo (non-root user, pinned base, .dockerignore to prevent secret leakage). Each tagged release auto-publishes a multi-arch image to GitHub Container Registry via .github/workflows/docker-publish.yml.
Pull the published image:
```bash
docker pull ghcr.io/drishtantkaushal/langfusemcp:latest
```
Or build from source:
```bash
docker build -t langfuse-mcp .
```
Run (all secrets injected via `-e`, never baked into the image):
```bash
docker run -d \
  --name langfuse-mcp \
  --restart unless-stopped \
  -p 8000:8000 \
  -e MCP_TRANSPORT=streamable-http \
  -e MCP_BASE_URL=https://mcp.yourcompany.com \
  -e LANGFUSE_PUBLIC_KEY=pk-lf-... \
  -e LANGFUSE_SECRET_KEY=sk-lf-... \
  -e LANGFUSE_HOST=https://cloud.langfuse.com \
  -e GOOGLE_CLIENT_ID=... \
  -e GOOGLE_CLIENT_SECRET=... \
  -e ALLOWED_EMAIL_DOMAINS=yourcompany.com \
  ghcr.io/drishtantkaushal/langfusemcp:latest
```
Reverse proxy
Terminate TLS in front (nginx, Caddy, Cloudflare). The MCP endpoint is at `/mcp/` (note the trailing slash). Because responses stream, the proxy must:
- Disable response buffering (nginx: `proxy_buffering off;`)
- Allow a read timeout of at least 5 minutes; some analytics queries legitimately run several minutes
- Speak HTTP/1.1 with keepalive to the upstream
Google OAuth
In your Google Cloud project:
1. APIs & Services → OAuth consent screen
   - User type: Internal (restricts sign-in to your Google Workspace domain)
   - Scopes: `openid`, `https://www.googleapis.com/auth/userinfo.email`
2. Credentials → Create OAuth client ID → Web application
   - Authorized redirect URI: `https://{your-base-url}/auth/callback`
3. Copy the Client ID and Client Secret
Set:
```bash
GOOGLE_CLIENT_ID=....apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=GOCSPX-...
```
OAuth activates when GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, and MCP_BASE_URL are all set. With an Internal consent screen, Google rejects non-Workspace sign-ins at the identity layer — the server never sees those attempts.
Optional email allowlist
For narrower control than "anyone in the Workspace":
```bash
# either, or both
ALLOWED_EMAIL_DOMAINS=yourcompany.com
ALLOWED_EMAILS=alice@yourcompany.com,bob@yourcompany.com
```
When set, every tool call verifies the caller's `email_verified` claim and checks membership before proceeding. When unset, the server trusts whatever the OAuth provider returns.
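A minimal sketch of that membership check, assuming illustrative helper names (not the server's actual internals):

```python
def parse_csv_env(value: str) -> set[str]:
    """Split a comma-separated env var into a lowercase set."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}

def is_caller_allowed(
    email: str,
    email_verified: bool,
    allowed_emails: set[str],
    allowed_domains: set[str],
) -> bool:
    # With an allowlist active, unverified emails are always rejected.
    if (allowed_emails or allowed_domains) and not email_verified:
        return False
    # No allowlist configured: trust the OAuth provider's claims.
    if not allowed_emails and not allowed_domains:
        return True
    email = email.lower()
    domain = email.rsplit("@", 1)[-1]
    return email in allowed_emails or domain in allowed_domains
```

`ALLOWED_EMAILS` and `ALLOWED_EMAIL_DOMAINS` would each be fed through `parse_csv_env` before the check.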
Adding to Claude.ai
Once hosted at `https://mcp.yourcompany.com`:
1. Claude.ai → Settings → Connectors → Add custom connector
2. Remote MCP server URL: `https://mcp.yourcompany.com/mcp/`
3. Leave the OAuth Client ID / Secret fields empty; the server uses Dynamic Client Registration, and those fields are for a different deployment pattern.
4. Click Add → Google sign-in popup → done.
Verifying the deploy
Auth enabled, expect 401:
```bash
curl -i -X POST https://mcp.yourcompany.com/mcp/ \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"curl","version":"0"}}}'
```
OAuth metadata endpoint returns JSON (used by Claude.ai to auto-register):
```bash
curl https://mcp.yourcompany.com/.well-known/oauth-authorization-server
```
Liveness/readiness probe — unauthenticated `GET /health` returns HTTP 200 with `{"status": "ok"}`, suitable for Kubernetes probes:
```bash
curl -i https://mcp.yourcompany.com/health
```
Multi-project support
A single server instance can route to multiple Langfuse projects. Every tool accepts an optional `project` argument; when omitted, the server-configured default is used. Call `list_projects` to discover what's available.
Configuring projects
Declare each project via indexed env vars. Project names are data, not part of variable names — use whatever scheme you like.
```bash
LANGFUSE_PROJECT_1_NAME=production
LANGFUSE_PROJECT_1_PUBLIC_KEY=pk-lf-...
LANGFUSE_PROJECT_1_SECRET_KEY=sk-lf-...
LANGFUSE_PROJECT_1_HOST=https://cloud.langfuse.com

LANGFUSE_PROJECT_2_NAME=staging
LANGFUSE_PROJECT_2_PUBLIC_KEY=pk-lf-...
LANGFUSE_PROJECT_2_SECRET_KEY=sk-lf-...
LANGFUSE_PROJECT_2_HOST=https://cloud.langfuse.com

LANGFUSE_DEFAULT_PROJECT=production
```
Usage from the client
Claude: "Show me failing traces in production today."
→ `fetch_traces(project="production", ...)` routed to project 1's credentials.

Claude: "Compare that with staging."
→ `fetch_traces(project="staging", ...)` routed to project 2's credentials.
Each project has its own cache, rate limiter, and connection pool. Claude.ai sees one connector; users authenticate once via OAuth and can query any configured project within the session.
Single-project (legacy) mode
If `LANGFUSE_PROJECT_1_NAME` is not set, the server falls back to the legacy `LANGFUSE_PUBLIC_KEY` / `LANGFUSE_SECRET_KEY` / `LANGFUSE_HOST` vars and registers them as a project called `default`. Existing deployments keep working without changes.
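The indexed-variable scheme and the legacy fallback can be sketched as follows; this is an illustrative reading of the configuration rules above, not the server's actual loader:

```python
def load_projects(env: dict[str, str]) -> dict[str, dict[str, str]]:
    """Collect LANGFUSE_PROJECT_{N}_* vars into per-project configs."""
    projects: dict[str, dict[str, str]] = {}
    n = 1
    while f"LANGFUSE_PROJECT_{n}_NAME" in env:
        name = env[f"LANGFUSE_PROJECT_{n}_NAME"]
        projects[name] = {
            "public_key": env[f"LANGFUSE_PROJECT_{n}_PUBLIC_KEY"],
            "secret_key": env[f"LANGFUSE_PROJECT_{n}_SECRET_KEY"],
            "host": env.get(f"LANGFUSE_PROJECT_{n}_HOST", "https://cloud.langfuse.com"),
        }
        n += 1
    if not projects and "LANGFUSE_PUBLIC_KEY" in env:
        # Legacy single-project fallback, registered as "default".
        projects["default"] = {
            "public_key": env["LANGFUSE_PUBLIC_KEY"],
            "secret_key": env["LANGFUSE_SECRET_KEY"],
            "host": env.get("LANGFUSE_HOST", "https://cloud.langfuse.com"),
        }
    return projects
```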
Configuration
| Env Variable | Default | Description |
|---|---|---|
| `LANGFUSE_PUBLIC_KEY` | (required) | Langfuse public API key |
| `LANGFUSE_SECRET_KEY` | (required) | Langfuse secret API key |
| `LANGFUSE_HOST` | `https://cloud.langfuse.com` | Langfuse instance URL (cloud or self-hosted) |
| `LANGFUSE_INTERNAL_DOMAINS` | `""` | Comma-separated internal domains to exclude from analytics (e.g., `mycompany.com,test.com`). Applies when using `group_by='domain'`. |
| `LANGFUSE_MCP_READ_ONLY` | `false` | Disable write operations (`score_traces`, `create_dataset`, etc.) |
| `LANGFUSE_PAGE_LIMIT` | `100` | Traces per API page |
| `LANGFUSE_PROJECT_{N}_NAME` | (unset) | Multi-project: name for project N (e.g. `production`). See Multi-project support. |
| `LANGFUSE_PROJECT_{N}_PUBLIC_KEY` | (unset) | Public key for project N. |
| `LANGFUSE_PROJECT_{N}_SECRET_KEY` | (unset) | Secret key for project N. |
| `LANGFUSE_PROJECT_{N}_HOST` | `https://cloud.langfuse.com` | Host URL for project N. |
| `LANGFUSE_DEFAULT_PROJECT` | first configured | Default project name used when a tool call omits `project`. |
| `MCP_TRANSPORT` | `stdio` | `stdio` or `streamable-http`. HTTP mode listens on a port instead of stdin/stdout. See Hosting as a remote service. |
| `MCP_HOST` | `0.0.0.0` | Bind address when `MCP_TRANSPORT=streamable-http`. |
| `MCP_PORT` | `8000` | Port when `MCP_TRANSPORT=streamable-http`. |
| `MCP_BASE_URL` | (unset) | Public base URL of the hosted server. Required for Google OAuth. |
| `GOOGLE_CLIENT_ID` | (unset) | Google OAuth client ID. OAuth activates when all three Google vars are set. |
| `GOOGLE_CLIENT_SECRET` | (unset) | Google OAuth client secret. |
| `ALLOWED_EMAILS` | (unset) | Comma-separated emails allowed to call tools. Requires OAuth. |
| `ALLOWED_EMAIL_DOMAINS` | (unset) | Comma-separated email domains allowed to call tools. Requires OAuth. |
Tools
Analytics (9 tools)
Tools that compute insights server-side and return compact summaries. These go beyond raw data access — they aggregate, detect patterns, and compute statistics so the LLM can reason over results without hitting context window limits.
| Tool | Description | Key Parameters |
|---|---|---|
| `aggregate_by_group` | Aggregate trace metrics by user group. Returns per-group: trace count, unique sessions, unique users, accuracy rate, average latency, total cost. | `group_by` (name/userId/domain/tag), `time_range`, `top_n`, `exclude_internal` |
| `compute_accuracy` | Compute accuracy from feedback scores. Accuracy = correct / (correct + incorrect). Supports grouping and time bucketing for trend analysis. | `group_by`, `bucket_by` (week/day), `score_name`, `time_range` |
| `detect_failures` | Detect LLM output quality failures using pattern matching ("unable to", "I can't", etc.) and negative feedback scores. NOT Python exceptions; use `find_exceptions` for those. | `group_by`, `include_examples`, `max_examples`, `time_range` |
| `compute_token_percentiles` | Compute token usage percentiles (TP50/TP90/TP95/TP99) at trace level. Fetches generation observations for accurate per-trace token counts. | `group_by`, `percentiles`, `time_range` |
| `detect_context_breaches` | Scan for traces where any single generation exceeds a token threshold. Catches context window overflow causing degraded LLM performance or silent truncation. | `threshold` (default 256000), `check_per_generation`, `time_range` |
| `analyze_sessions` | Analyze multi-turn session behavior. Returns session count, depth distribution (single vs multi-turn), engagement metrics, and session-level cost/latency. | `group_by`, `time_range` |
| `estimate_costs` | Compute cost breakdown using Langfuse's built-in `totalCost` field (model-aware, computed by Langfuse). Groups by user, agent, or time bucket. | `group_by`, `bucket_by` (week/day), `time_range` |
| `analyze_latency` | Analyze latency distribution at trace level and optionally per LLM generation. Identifies which model is the bottleneck. | `group_by`, `percentiles`, `include_per_generation`, `time_range` |
| `score_traces` | Write scores back to Langfuse. Use after analysis to annotate traces with findings: tag failures for review, mark high-quality traces for dataset creation. | `trace_ids`, `score_name`, `score_value`, `comment` |
Data Access
Full Langfuse API coverage for querying and managing your observability data.
Traces
| Tool | Description |
|---|---|
| `fetch_traces` | List traces with filters: user ID, name, tags, time range, ordering. Returns paginated results. |
| `fetch_trace` | Get a single trace by ID with full details including all observations (spans, generations, events). |
| `diff_traces` | Compare two traces side-by-side (name, user, latency, cost, tags, release, version). |
Observations
| Tool | Description |
|---|---|
| `fetch_observations` | List observations with filters: trace ID, type (GENERATION/SPAN/EVENT), name, time range. |
| `fetch_observation` | Get a single observation by ID. Returns input/output, token usage, model, latency, and cost. |
Sessions
| Tool | Description |
|---|---|
| `fetch_sessions` | List sessions with optional time filters. |
| `get_session_details` | Get full details of a session including all its traces. |
| `get_user_sessions` | Get sessions for a specific user. Fetches the user's traces and extracts unique sessions. |
Errors
| Tool | Description |
|---|---|
| `find_exceptions` | Find observations with error status. For LLM output quality issues, use `detect_failures` instead. |
| `get_exception_details` | Get full error details for a trace; returns all observations with error status highlighted. |
| `get_error_count` | Get total error count within a time period. |
Scores
| Tool | Description |
|---|---|
| `fetch_scores` | List scores/evaluations with filters: trace ID, score name, time range. |
| `list_scores_v2` | v2 Scores API with richer filters (session ID, dataset run ID, queue ID, config ID, operator/value, etc.). |
| `get_score_v2` | Get a single score by ID via the v2 Scores API. |
Prompts
| Tool | Description |
|---|---|
| `list_prompts` | List all prompts in the project with optional name filter. |
| `get_prompt` | Fetch a specific prompt by name, version, or label. |
| `get_prompt_unresolved` | Fetch a prompt with placeholders/dependencies intact (debugging prompt composition). |
| `create_text_prompt` | Create a new text prompt version with optional labels and model config. |
| `create_chat_prompt` | Create a new chat prompt version with message array and optional config. |
| `update_prompt_labels` | Update labels for a specific prompt version (e.g., promote to "production"). |
Datasets
| Tool | Description |
|---|---|
| `list_datasets` | List all datasets in the project. |
| `get_dataset` | Get metadata for a specific dataset. |
| `list_dataset_items` | List items in a dataset with pagination. |
| `get_dataset_item` | Get a single dataset item by ID. |
| `create_dataset` | Create a new dataset with optional description and metadata. |
| `create_dataset_item` | Create or upsert a dataset item. Supports linking to source traces. |
| `delete_dataset_item` | Delete a dataset item by ID. |
Annotation Queues
| Tool | Description |
|---|---|
| `list_annotation_queues` | List all annotation queues in the project. |
| `create_annotation_queue` | Create a new annotation queue with attached score configs. |
| `get_annotation_queue` | Get a queue by ID. |
| `list_annotation_queue_items` | List items in a queue (optionally filtered by status). |
| `get_annotation_queue_item` | Get a queue item by ID. |
| `create_annotation_queue_item` | Add a trace or observation to a queue for review. |
| `update_annotation_queue_item` | Change a queue item's status (PENDING / COMPLETED). |
| `delete_annotation_queue_item` | Remove an item from a queue. |
| `create_annotation_queue_assignment` | Assign a reviewer to a queue. |
| `delete_annotation_queue_assignment` | Remove a reviewer from a queue. |
Metrics
| Tool | Description |
|---|---|
| `get_daily_metrics` | Langfuse's pre-aggregated daily rollup (trace count, cost, tokens per day). Faster than per-trace aggregation for long windows. |
Users
| Tool | Description |
|---|---|
| `list_users` | Top users by trace count over a time window (defaults to last 30 days). Wraps the Langfuse /metrics query API. |
Comments
| Tool | Description |
|---|---|
| `list_comments` | List comments attached to traces/observations/sessions/prompts, with filters. |
| `get_comment` | Get a single comment by ID. |
| `create_comment` | Create a markdown comment on a trace/observation/session/prompt. |
Models
| Tool | Description |
|---|---|
| `list_models` | List model definitions in Langfuse's models registry (pricing + tokenizer config). |
| `get_model` | Get a single model definition by ID. |
Projects
| Tool | Description |
|---|---|
| `list_projects` | Discovery: returns the list of configured Langfuse projects and the default project. |
Schema
| Tool | Description |
|---|---|
| `get_data_schema` | Get the data schema for the Langfuse project: available fields and types for traces, observations, scores, sessions. |
Sample Questions
Once connected, ask your AI assistant questions like these:
Agent & Pipeline Health
- "Which agents failed the most this week?"
- "What's the failure rate by agent name?"
- "Which agent has the worst accuracy?"
- "Show me the top 5 agents by trace volume"
- "Are any agents consistently slower than others?"
- "Compare all agents by accuracy, latency, and cost"
Accuracy & Quality
- "What's our overall accuracy this week?"
- "What's the accuracy trend by week for the last 30 days?"
- "Compare accuracy across different agents"
- "What's the daily accuracy breakdown?"
- "Which users are getting the worst accuracy?"
- "What percentage of traces have feedback scores?"
Failures & Debugging
- "Show me failure examples from today"
- "What are the most common failure patterns?"
- "Which users are seeing the most failures?"
- "What's the failure rate by agent?"
- "Are failures increasing or decreasing this week vs last?"
- "Show me traces where the LLM said 'unable to' or 'I can't'"
Token Usage
- "What are the P90 and P99 token usage stats?"
- "Which agents consume the most tokens?"
- "Compare token usage across user groups"
- "Are any users hitting unusually high token counts?"
Context Window Breaches
- "Are any generations exceeding the 128K context window?"
- "Show me traces with token usage above 200K per generation"
- "What's the breach severity distribution?"
- "Which users trigger the most context window breaches?"
Sessions & Engagement
- "What's our multi-turn rate?"
- "How deep are sessions on average?"
- "Which users have the deepest sessions?"
- "How many single-turn vs multi-turn sessions this week?"
- "What's the average session cost?"
Cost
- "How much are we spending per day this week?"
- "What's the weekly cost trend for the last 30 days?"
- "Which agent is the most expensive?"
- "Which users are costing the most?"
- "What's the average cost per trace?"
Latency
- "What's the P95 latency?"
- "Is latency getting worse over time?"
- "Which model is the slowest?"
- "Compare latency across agents"
- "Show me per-generation latency breakdown by model"
- "Which users are experiencing the highest latency?"
Annotation & Write-back
- "Score all failing traces from today with 'needs-review'"
- "Tag these trace IDs as 'high-quality' for dataset creation"
- "Mark trace abc-123 with a score of 0 and comment 'hallucinated output'"
Lookups & Exploration
- "Fetch the last 20 traces"
- "Show me trace abc-123 with all its observations"
- "List sessions for user alice@example.com"
- "What errors happened in the last 24 hours?"
- "How many errors occurred this week?"
- "Show me all prompts in the project"
- "List all datasets"
- "What fields are available on traces and observations?"
Grouping Options
The `group_by` parameter controls how traces are segmented in analytics tools:
| Value | What it groups by | When to use |
|---|---|---|
| `name` | Trace/agent name (default) | Compare performance across different agents or pipelines |
| `userId` | Per-user breakdown | Identify users with issues or high usage |
| `domain` | Email domain extracted from userId | Multi-tenant apps where users have email-based IDs (e.g., user@acme.com → acme.com) |
| `tag` | Trace tags | Compare across tagged environments, versions, or experiments |
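A sketch of how these grouping keys could be derived from a trace; field names follow Langfuse's trace schema, and the `tag` case is omitted because one trace can carry several tags (and so fall into several groups):

```python
def group_key(trace: dict, group_by: str) -> str:
    """Derive the grouping key for a trace (illustrative sketch)."""
    if group_by == "name":
        return trace.get("name") or "(unnamed)"
    if group_by == "userId":
        return trace.get("userId") or "(no user)"
    if group_by == "domain":
        # e.g. user@acme.com -> acme.com
        user = trace.get("userId") or ""
        return user.rsplit("@", 1)[-1] if "@" in user else "(no domain)"
    raise ValueError(f"unsupported group_by: {group_by}")
```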
Selective Tool Loading
Load only the tool groups you need to reduce token overhead:
```bash
# Only load traces and analytics tools
LANGFUSE_TOOLS=traces,analytics langfuse-mcp-server

# Only load prompts and datasets
LANGFUSE_TOOLS=prompts,datasets langfuse-mcp-server

# In Claude Code
claude mcp add \
  -e LANGFUSE_PUBLIC_KEY=pk-lf-... \
  -e LANGFUSE_SECRET_KEY=sk-lf-... \
  -e LANGFUSE_TOOLS=traces,observations,analytics \
  langfuse-mcp -- uvx langfuse-mcp-server
```
Available groups:
| Group | Tools | Count |
|---|---|---|
| `traces` | `fetch_traces`, `fetch_trace` | 2 |
| `observations` | `fetch_observations`, `fetch_observation` | 2 |
| `sessions` | `fetch_sessions`, `get_session_details`, `get_user_sessions` | 3 |
| `errors` | `find_exceptions`, `get_exception_details`, `get_error_count` | 3 |
| `scores` | `fetch_scores` | 1 |
| `prompts` | `list_prompts`, `get_prompt`, `create_text_prompt`, `create_chat_prompt`, `update_prompt_labels` | 5 |
| `datasets` | `list_datasets`, `get_dataset`, `list_dataset_items`, `get_dataset_item`, `create_dataset`, `create_dataset_item`, `delete_dataset_item` | 7 |
| `annotation_queues` | All 10 annotation queue tools | 10 |
| `metrics` | `get_daily_metrics` | 1 |
| `users` | `list_users` | 1 |
| `comments` | `list_comments`, `get_comment`, `create_comment` | 3 |
| `models` | `list_models`, `get_model` | 2 |
| `projects` | `list_projects` | 1 |
| `schema` | `get_data_schema` | 1 |
| `analytics` | All 9 analytics tools | 9 |
If `LANGFUSE_TOOLS` is not set, all 56 tools are loaded.
Read-Only Mode
Disable write operations (`score_traces`, `create_dataset`, `create_dataset_item`, `delete_dataset_item`, `create_text_prompt`, `create_chat_prompt`):
```bash
LANGFUSE_MCP_READ_ONLY=true
```
How it Compares
vs Official Langfuse MCP
| Capability | This server | Official Langfuse MCP |
|---|---|---|
| Traces & Observations | Yes | No |
| Sessions & Users | Yes | No |
| Exception Tracking | Yes | No |
| Prompt Management | Yes | Yes |
| Dataset Management | Yes | No |
| Score Write-back | Yes | No |
| Selective Tool Loading | Yes | No |
| Accuracy Metrics | Yes | No |
| Failure Detection | Yes | No |
| Token Percentiles | Yes | No |
| Cost Breakdown | Yes | No |
| Latency Analysis | Yes | No |
| Session Analytics | Yes | No |
| Context Breach Scanning | Yes | No |
| User Group Aggregation | Yes | No |
The official Langfuse MCP (5 tools) focuses on prompt management. This server provides full observability coverage plus 9 analytics tools.
vs Other Langfuse MCP Implementations
| Capability | This server | Others |
|---|---|---|
| Data access (traces, observations, sessions) | Yes | Yes |
| Prompt & dataset management | Yes | Yes |
| Exception tracking | Yes | Yes |
| Annotation queues | Yes | Partial |
| Selective tool loading | Yes | Yes |
| Multi-project support | Yes | No |
| Accuracy metrics | Yes | No |
| LLM failure detection | Yes | No |
| Token percentiles (TP50/P90/P95/P99) | Yes | No |
| Cost breakdown by group/time | Yes | No |
| Latency analysis with per-model breakdown | Yes | No |
| Multi-turn session analytics | Yes | No |
| Context window breach scanning | Yes | No |
| User/tenant group aggregation | Yes | No |
| Score write-back | Yes | No |
Other implementations provide data access (fetching raw traces, observations, sessions) using synchronous HTTP clients. This server adds a compute layer — analytics tools that aggregate, detect patterns, and compute statistics server-side — plus an async architecture that's fundamentally faster.
| Architecture | This server | Others |
|---|---|---|
| Async HTTP client | Yes (httpx.AsyncClient) | No (sync requests/httpx) |
| Concurrent observation fetching | Yes (asyncio.gather) | No (sequential per-trace) |
| TTL caching | Yes (live 5min, historical 1hr) | No |
| Adaptive rate limiting | Yes (token bucket, 429 backoff) | No (fixed sleep) |
| Batch observation queries | Yes (with auto-fallback) | No (N+1 per-trace) |
| Claude Code sub-agent | Yes (.claude/agents/) | No |
vs Platform-Embedded AI (Braintrust Loop, LangSmith Insights, Arize Alyx)
| Capability | This server | Platform AI assistants |
|---|---|---|
| Open source | Yes | No |
| Works with any MCP client | Yes | Platform-locked |
| Self-hosted Langfuse support | Yes | N/A |
| Real-time conversational | Yes | Varies (some batch-only) |
| Custom grouping/segmentation | Yes | Limited |
| Write-back to Langfuse | Yes | Platform-specific |
| Free | Yes | Paid tiers |
Architecture
Why async httpx instead of the Langfuse SDK?
The Langfuse Python SDK is excellent for writing traces (it batches and sends asynchronously in the background). But for reading traces at scale — which is what an analytics MCP server does — the SDK has a limitation: its read API is synchronous, built on the requests library.
This server uses httpx.AsyncClient instead, which enables:
- Concurrent observation fetching: fetch observations for 100 traces simultaneously via `asyncio.gather`, not one-by-one
- Non-blocking pagination: paginate through thousands of traces without blocking the event loop
- Rate-limited concurrency: `asyncio.Semaphore` plus a token bucket controls throughput without `time.sleep()` blocking
Measured impact: analyze_latency with per-generation breakdown dropped from 110s to 20s (5.4x faster) on a self-hosted instance with 2.4M daily observations.
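The concurrent-fetching pattern described above can be sketched as follows; `fetch_trace_observations` is a hypothetical stand-in for the server's HTTP call, not its real API:

```python
import asyncio

async def fetch_observations_for(
    trace_ids: list[str],
    fetch_trace_observations,  # async callable: (trace_id) -> list[dict]
    max_concurrency: int = 10,
) -> dict[str, list[dict]]:
    """Bounded fan-out: fetch many traces' observations concurrently,
    with a semaphore capping in-flight requests."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(trace_id: str):
        async with sem:
            return trace_id, await fetch_trace_observations(trace_id)

    results = await asyncio.gather(*(one(t) for t in trace_ids))
    return dict(results)
```

The semaphore is what lets this coexist with a rate limiter: raising `max_concurrency` increases parallelism without ever exceeding the cap.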
Caching strategy
Two-tier in-memory TTL cache using `cachetools.TTLCache`:
| Data age | TTL | Rationale |
|---|---|---|
| Today's data | 5 minutes | Still changing, short cache |
| Historical data (before today) | 1 hour | Won't change, cache aggressively |
The cache operates at the API page level. If you call `aggregate_by_group` then `compute_accuracy` for the same time range, the second call hits cache for all trace pages; only scores are fetched fresh.
Configure via `LANGFUSE_CACHE_TTL` and `LANGFUSE_CACHE_TTL_HISTORICAL` (seconds).
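A stdlib-only sketch of the two-tier policy (the server itself uses `cachetools.TTLCache`; the class and TTL defaults here are illustrative):

```python
import time

class TwoTierCache:
    """Minimal sketch: short TTL for still-changing data, long TTL for
    historical data that won't change."""

    def __init__(self, live_ttl: float = 300.0, historical_ttl: float = 3600.0):
        self.ttls = {"live": live_ttl, "historical": historical_ttl}
        self._store: dict[tuple, tuple[float, object]] = {}

    def get(self, key, is_historical: bool):
        tier = "historical" if is_historical else "live"
        entry = self._store.get((tier, key))
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Expired: drop and report a miss.
            del self._store[(tier, key)]
            return None
        return value

    def put(self, key, value, is_historical: bool):
        tier = "historical" if is_historical else "live"
        self._store[(tier, key)] = (time.monotonic() + self.ttls[tier], value)
```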
Rate limiting
A global token bucket rate limiter respects Langfuse API limits:
| Instance type | Default RPM | Behavior |
|---|---|---|
| Self-hosted | Unlimited (0) | No artificial throttling. Full speed, limited only by your server. |
| Langfuse Cloud (Hobby) | 30 req/min | Conservative default for Hobby tier |
| Langfuse Cloud (Pro/Team) | Set `LANGFUSE_RATE_LIMIT_RPM=1000` | Higher throughput for paid plans |
On HTTP 429 responses, the limiter automatically halves the RPM and reads the `Retry-After` header. This means the server adapts to any rate limit, cloud or self-hosted, without manual configuration.
Observation fetching: batch vs concurrent
Analytics tools that need per-generation data (token percentiles, context breaches, latency breakdown) face the N+1 problem: one API call per trace to fetch its observations.
This server uses a two-step strategy:
1. Try batch fetch: fetch ALL observations for the time range in one paginated call, grouping by `traceId` in memory
2. If volume is too high (>5000 pages / 500K+ observations): fall back to concurrent per-trace fetch using `asyncio.gather` with semaphore-controlled concurrency
This means the server handles both small projects (batch is faster) and large-scale deployments (concurrent targeted fetching avoids downloading millions of irrelevant observations).
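The two steps can be sketched in isolation; the helper names and the `5000`-page threshold are taken from the description above, but the real server's internals may differ:

```python
from collections import defaultdict

def group_by_trace(observations: list[dict]) -> dict[str, list[dict]]:
    """Step 1: group a batch-fetched observation list by traceId in memory."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for obs in observations:
        grouped[obs["traceId"]].append(obs)
    return dict(grouped)

def use_batch_fetch(estimated_pages: int, page_cap: int = 5000) -> bool:
    """Step 2 threshold: batch when the range fits under the page cap,
    otherwise fall back to concurrent per-trace fetching."""
    return estimated_pages <= page_cap
```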
Context isolation via sub-agent
The server ships with a Claude Code custom agent at .claude/agents/langfuse-analyst.md. When a user asks a Langfuse-related question, Claude Code can delegate to this agent, which:
- Only loads Langfuse MCP tools (not other tools in the session)
- Has a specialized system prompt with tool taxonomy and workflow patterns
- Runs in an isolated context window, keeping the main conversation clean
- Returns a summary to the parent conversation
This prevents 33 tool schemas (~5000 tokens) from polluting every conversation.
Contributing
See CONTRIBUTING.md for development setup, code style guidelines, and areas for contribution.
Security
See SECURITY.md for the security policy, vulnerability reporting, and API key handling.
Code of Conduct
See CODE_OF_CONDUCT.md.
License
MIT — see LICENSE.