
Dunetrace MCP server — expose agent signals to Claude Code, Cursor, and Codex


Dunetrace MCP Server

Query agent signals, run details, and health scores directly from Claude Code, Cursor, Codex, or any MCP-compatible client — without leaving your editor.


What it is

The MCP server wraps the Dunetrace Customer API in the Model Context Protocol. Your editor's assistant (or any MCP-capable LLM client) can call it as a tool, so you can ask things like:

  • "Is my research-agent healthy?"
  • "What failed in the last 24 hours?"
  • "Show me signal #42 — what happened and how do I fix it?"
  • "Is the TOOL_LOOP I'm seeing systemic or a one-off?"
  • "Walk me through run abc123 step by step."

All data is read-only. Only hashed or structural metadata is exposed (hashes, token counts, latencies, step counts, run and signal metadata); raw prompts, tool arguments, and model outputs never leave your process.


Prerequisites

  • Dunetrace backend running (docker compose up -d)
  • Python 3.11+
  • The Customer API accessible at http://localhost:8002 (or set DUNETRACE_API_URL)

Install

pip install dunetrace-mcp

Or install from source (for development):

cd packages/mcp-server
pip install -e .

Client setup

Claude Code

Add to ~/.claude.json:

{
  "mcpServers": {
    "dunetrace": {
      "command": "dunetrace-mcp",
      "env": {
        "DUNETRACE_API_URL": "http://localhost:8002",
        "DUNETRACE_API_KEY": "dt_dev_test"
      }
    }
  }
}

Restart Claude Code. The dunetrace server will appear in the MCP tools list.

Cursor

Create .cursor/mcp.json in your project root (or use the global ~/.cursor/mcp.json):

{
  "mcpServers": {
    "dunetrace": {
      "command": "dunetrace-mcp",
      "env": {
        "DUNETRACE_API_URL": "http://localhost:8002",
        "DUNETRACE_API_KEY": "dt_dev_test"
      }
    }
  }
}
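
~/.claude.json may already hold other Claude Code settings, so rather than overwriting the file you can merge the entry in. A minimal sketch; `add_dunetrace_server` is a hypothetical helper, not part of dunetrace-mcp, and it works the same way for .cursor/mcp.json.

```python
import copy
import json
from pathlib import Path

# The same server entry as the JSON above (local-dev defaults).
DUNETRACE_ENTRY = {
    "command": "dunetrace-mcp",
    "env": {
        "DUNETRACE_API_URL": "http://localhost:8002",
        "DUNETRACE_API_KEY": "dt_dev_test",
    },
}


def add_dunetrace_server(config_path: Path, entry: dict = DUNETRACE_ENTRY) -> dict:
    """Add or replace the "dunetrace" server in an MCP config, keeping every other key."""
    config: dict = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    # Only mcpServers["dunetrace"] is touched; other servers and settings are preserved.
    config.setdefault("mcpServers", {})["dunetrace"] = copy.deepcopy(entry)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, call add_dunetrace_server(Path.home() / ".claude.json") for Claude Code or add_dunetrace_server(Path(".cursor/mcp.json")) for Cursor. The file is rewritten with 2-space indentation, so keep a backup if its formatting matters to you.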

Codex / SSE clients

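Run the server in SSE mode (it listens on port 8000 by default). No client config passes environment variables in this mode, so export DUNETRACE_API_URL and DUNETRACE_API_KEY in the shell first if the defaults don't apply: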

python -c "from dunetrace_mcp.server import mcp; mcp.run(transport='sse')"

Point your client's tool endpoint at http://localhost:8000/sse.

Manual test (stdio)

dunetrace-mcp

The server speaks MCP over stdin/stdout. You can pipe JSON-RPC messages manually or use the MCP Inspector.
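
For manual testing, the messages for one tool call can be generated as below. The method names (initialize, notifications/initialized, tools/list, tools/call) come from the MCP specification; the protocol version string and client name are assumptions, so match the version to what your server supports.

```python
import json

# Assumed MCP protocol revision; adjust to the version your server negotiates.
PROTOCOL_VERSION = "2024-11-05"


def stdio_session(tool_name: str, arguments: dict) -> str:
    """Build newline-delimited JSON-RPC messages for a one-off tool call over stdio."""
    messages = [
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {},
                "clientInfo": {"name": "manual-test", "version": "0.0.1"},
            },
        },
        # A notification has no id: it tells the server the handshake is done.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
        {
            "jsonrpc": "2.0",
            "id": 3,
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
        },
    ]
    # The stdio transport frames each message as a single line of JSON.
    return "\n".join(json.dumps(m) for m in messages) + "\n"


if __name__ == "__main__":
    print(stdio_session("list_agents", {}), end="")
```

Save it as, say, mcp_msgs.py (a hypothetical name) and run python mcp_msgs.py | dunetrace-mcp. If responses are cut off when the pipe closes, the MCP Inspector is the more reliable route.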


Environment variables

Variable            Default                  Description
DUNETRACE_API_URL   http://localhost:8002    Customer API base URL
DUNETRACE_API_KEY   dt_dev_test              Bearer token (auth header)

For production, set DUNETRACE_API_KEY to your real API key.


Tools

list_agents

List all monitored agents with their run counts, signal counts, and failure type breakdown.

No arguments.

Example output:

AGENT                                RUNS  SIGS CRIT HIGH  LAST SEEN
───────────────────────────────────────────────────────────────────────────────
research-agent                        129    55    0   46  6h ago
                                      TOOL_LOOP×46, STEP_COUNT_INFLATION×8
billing-agent                          36    34    0   33  10h ago
                                      TOOL_LOOP×33

get_agent_signals

Get recent failure signals for a specific agent, with titles, explanations, and the top fix suggestion for each.

Arguments:

Argument   Type     Default    Description
agent_id   string   required   Agent ID (from list_agents)
limit      int      20         Max signals to return (max 100)
severity   string   (none)     Filter: CRITICAL, HIGH, MEDIUM, or LOW

Example output:

🟠 [HIGH] TOOL_LOOP  conf=90%  step=7  6h ago
   Tool loop detected: `web_search` called 6× in steps 2–7
   What: The agent called web_search 6 times with identical args.
   Fix:  Deduplicate `web_search` calls — identical args hash seen 6×

get_signal_detail

Full detail for a specific signal: complete evidence dict, impact statement, and all suggested fixes with code snippets.

Arguments:

Argument    Type     Default    Description
signal_id   int      required   Integer signal ID (visible in search_signals output)
agent_id    string   (none)     Agent ID (optional; omit to search all agents)

Example output:

🟠 Signal #495
Type:      TOOL_LOOP
Severity:  HIGH  confidence=90%
Agent:     research-agent  vabcd1234
Run:       019e217d-bd24-…
Step:      7
Detected:  2026-05-13 13:19 UTC  (6h ago)

What happened:
  The agent called `web_search` 6 times in steps 2–7 with identical
  arguments every time. It is not tracking which queries it has tried.

Why it matters:
  Looping agents burn tokens without producing value. A 5-step loop at
  typical gpt-4o pricing costs $0.15–$0.30 with nothing to show for it.

Evidence (hashed/structural data):
  tool: web_search
  count: 6
  args_identical: True
  args_hashes: ['ffa8f58f', 'ffa8f58f', …+4 more]

Suggested fixes (2):
  1. Deduplicate `web_search` calls — identical args hash seen 6×
     ```python
     seen = set()
     if args not in seen:
         seen.add(args)
         call_tool(args)
     ```
  2. Set a hard step limit as a circuit breaker

Privacy note: The args_hashes field contains SHA-256 hashes of the original tool arguments — the raw arguments never leave your agent process.
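
Fix 1's snippet above is abbreviated: if args is a dict, `args not in seen` raises TypeError (dicts are unhashable), and a skipped call leaves the agent with no result to act on. A minimal sketch that keys on a SHA-256 of canonical JSON and returns the cached result for repeats. args_fingerprint and DedupingToolCaller are hypothetical names, and this hashing is not claimed to match how Dunetrace computes args_hashes.

```python
import hashlib
import json
from typing import Any, Callable


def args_fingerprint(args: dict[str, Any]) -> str:
    """SHA-256 of the arguments as canonical JSON (sorted keys, no whitespace)."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DedupingToolCaller:
    """Wraps a tool function and skips calls whose arguments were already tried."""

    def __init__(self, tool: Callable[..., Any]):
        self._tool = tool
        self._results: dict[str, Any] = {}

    def __call__(self, **args: Any) -> Any:
        key = args_fingerprint(args)
        if key not in self._results:
            self._results[key] = self._tool(**args)
        # Identical args return the cached result instead of re-running the tool.
        return self._results[key]
```

Returning the cached result keeps the agent's control flow unchanged; pair it with the step limit from fix 2, since a model can also loop on slightly different arguments.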


get_agent_health

Health score (0–100) and per-component breakdown for an agent.

Arguments:

Argument   Type     Default    Description
agent_id   string   required   Agent ID

Scoring components:

Component          Max points   Measures
failure_rate       40           % of runs that triggered any signal
loop_avoidance     25           % of runs without a tool loop
token_efficiency   20           Avg prompt tokens vs. per-agent baseline
latency            15           Avg LLM latency vs. per-agent baseline

Requires ≥3 runs for a score. Token and latency components return a neutral default score until ≥30 runs accumulate a baseline.

Example output:

🔴 Health score for research-agent: 41/100
   Sample runs:     24
   Baseline ready:  no (need ≥30 runs for token/latency)

Component breakdown:
  failure_rate          7/40  (current: 83.3 % runs with failures)
  loop_avoidance        4/25  (current: 83.3 % runs with loops)
  token_efficiency     15/20
  latency              15/15  (current: 3005.0 avg LLM latency ms)

get_run_detail

Full detail for a specific run: metadata, detected signals with fixes, and a step-by-step event timeline.

Arguments:

Argument   Type     Default    Description
run_id     string   required   Run UUID
agent_id   string   (none)     Optional; not used for the lookup, reserved for future use

Example output:

Run: 019e217d-bd24-7d72-a8be-4715c2dcf385
Agent:    research-agent  vabcd1234
Started:  2026-05-13 13:19 UTC  (6h ago)
Duration: 5.6s
Steps:    8
Exit:     run.completed

Signals (1):
  🟠 TOOL_LOOP  [HIGH]  conf=90%  step=7
     Tool loop detected: `web_search` called 6× in steps 2–7
     Fix: Deduplicate `web_search` calls — identical args hash seen 6×

Event timeline (18 events):
  [  0]    +0.0s  run.started
  [  1]    +0.0s  llm.called           model=gpt-4o-mini  p=512 c=98  800ms
  [  2]    +2.8s  tool.called          tool=web_search  ok=True  200ms
  [  3]    +2.8s  tool.called          tool=web_search  ok=True  200ms
  …
  [  8]    +3.1s  run.completed        final_answer

Event timeline is capped at 40 entries; longer runs show a count of remaining events.
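
A sketch of the 40-entry cap, assuming each event is a dict with hypothetical offset_s and type keys. The row layout copies the example above; the wording of the overflow line is an assumption.

```python
MAX_TIMELINE_EVENTS = 40  # cap stated above


def render_timeline(events: list[dict], cap: int = MAX_TIMELINE_EVENTS) -> list[str]:
    """Format up to `cap` events, then one line counting the rest."""
    lines = [f"Event timeline ({len(events)} events):"]
    for i, ev in enumerate(events[:cap]):
        # Index right-aligned in 3 columns, signed offset in 7, as in the example.
        lines.append(f"  [{i:3d}] {ev['offset_s']:+7.1f}s  {ev['type']}")
    remaining = len(events) - cap
    if remaining > 0:
        lines.append(f"  … {remaining} more events")
    return lines
```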


search_signals

Search signals across all agents with combined filters. Useful for cross-agent audits or time-bounded investigations.

Arguments:

Argument       Type     Default   Description
severity       string   (none)    Filter: CRITICAL, HIGH, MEDIUM, or LOW
failure_type   string   (none)    Detector name, e.g. TOOL_LOOP, COST_SPIKE, CONTEXT_BLOAT
since_hours    int      (none)    Only signals from the last N hours
agent_id       string   (none)    Restrict to one agent; searches all agents if omitted
limit          int      30        Max signals to return (max 200)

Example calls:

# All CRITICAL signals in the past 24 hours
search_signals(severity="CRITICAL", since_hours=24)

# All TOOL_LOOP signals for one agent
search_signals(failure_type="TOOL_LOOP", agent_id="research-agent")

Example output:

Signals (3 shown, 6 matched):

🟠     6h ago  [HIGH    ]  TOOL_LOOP                       agent=research-agent
   id=495  run=019e217d-bd2…  conf=90%
   Tool loop detected: `web_search` called 6× in steps 2–7

get_agent_patterns

Analyze failure patterns for an agent: systemic vs. one-off classification, daily signal trend, failure rates by type, and input hashes that consistently trigger failures.

Arguments:

Argument   Type     Default    Description
agent_id   string   required   Agent ID

Systemic classification: a failure is marked SYSTEMIC when it has appeared in a high proportion of runs over an extended window. A ⚠ Occasional label means isolated incidents.

Input patterns: when the same input hash (a structural fingerprint of the user query) reliably triggers a specific failure type, it appears in the "Input patterns" section. Only patterns with a hit rate ≥50% are shown; lower rates are treated as noise.

Example output:

Failure patterns for: research-agent

Systemic patterns:
  🚨 SYSTEMIC  TOOL_LOOP  12/12 runs (100%)
            first seen 5d ago  last seen 6h ago

Daily signal counts (last 7 days):
  FAILURE TYPE                    05-07  05-08  05-09  05-12  05-13
  ─────────────────────────────────────────────────────────────────
  TOOL_LOOP                           1      2      1      5      5

Failure rate by type:
  TOOL_LOOP     ████████████████████  100%  (5/5 runs on 2026-05-13)

Input patterns that reliably trigger failures (rate ≥ 50%):
  hash=e47617d3  TOOL_LOOP  38/39 runs (97%)
    → This input hash consistently causes this failure.
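
The hit-rate rule can be sketched as follows: for each input hash, count how many of its runs produced each failure type and keep pairs at or above 50%. The (input_hash, failure_types) run shape and the function name are hypothetical; the 38/39 case from the example comes out at 97%.

```python
from collections import Counter
from collections.abc import Iterable

MIN_HIT_RATE = 0.5  # the ≥50% threshold described above


def input_patterns(
    runs: Iterable[tuple[str, set[str]]], min_rate: float = MIN_HIT_RATE
) -> list[tuple[str, str, int, int, float]]:
    """Return (input_hash, failure_type, hits, runs, rate) for reliable triggers.

    Each run is (input_hash, failure types detected in that run).
    """
    runs_per_hash: Counter[str] = Counter()
    hits: Counter[tuple[str, str]] = Counter()
    for input_hash, failure_types in runs:
        runs_per_hash[input_hash] += 1
        for failure_type in set(failure_types):  # count each type once per run
            hits[(input_hash, failure_type)] += 1
    patterns = []
    for (input_hash, failure_type), n_hit in hits.items():
        total = runs_per_hash[input_hash]
        rate = n_hit / total
        if rate >= min_rate:
            patterns.append((input_hash, failure_type, n_hit, total, rate))
    return sorted(patterns, key=lambda p: (-p[4], -p[3]))  # most reliable first
```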

summarize_agent

One-shot diagnosis of an agent. Combines health score, failure breakdown, recent signals with their fixes, and health component bars. Start here before diving deeper.

Arguments:

Argument   Type     Default    Description
agent_id   string   required   Agent ID

Example output:

═══ Agent summary: research-agent ═══

Health score:  🔴 41/100
Total runs:    129
Total signals: 55
Last seen:     6h ago

Failure breakdown:
  TOOL_LOOP                             46 signals  (36% of runs)
  STEP_COUNT_INFLATION                   8 signals  (6% of runs)

Most recent signals:
  🟠 TOOL_LOOP  conf=90%  6h ago  run=019e217d…
     The agent called `web_search` 6 times with identical args.
     Impact: Looping agents burn tokens without producing value.
     Fix: Deduplicate `web_search` calls — identical args hash seen 6×

Health components:
  failure_rate         ███░░░░░░░░░░░░░░░░░  7/40
  loop_avoidance       ███░░░░░░░░░░░░░░░░░  4/25
  token_efficiency     ███████████████░░░░░  15/20
  latency              ████████████████████  15/15
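
The component bars are 20 characters wide. Truncating (rather than rounding) the filled share reproduces every bar above; for example, 7/40 works out to 3.5 blocks and is drawn as 3. This rule is inferred from the output, not taken from the source, and the helper names are hypothetical.

```python
BAR_WIDTH = 20


def component_bar(points: int, max_points: int, width: int = BAR_WIDTH) -> str:
    """Fixed-width bar: filled blocks for the earned share, empty blocks for the rest."""
    filled = int(width * points / max_points)  # truncates, so 7/40 -> 3 blocks
    return "█" * filled + "░" * (width - filled)


def component_line(name: str, points: int, max_points: int) -> str:
    # Name padded to 20 columns, then the bar and the raw score.
    return f"  {name:<20} {component_bar(points, max_points)}  {points}/{max_points}"
```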

get_agent_runs

List recent runs for an agent with durations and signal status.

Arguments:

Argument   Type     Default    Description
agent_id   string   required   Agent ID
limit      int      20         Max runs to return (max 100)

Example output:

Recent runs for: research-agent

RUN ID       STARTED                  DUR  STEPS SIGS  STATUS
──────────────────────────────────────────────────────────────────────
019e217d-bd2 6h ago                   5.6s     8  🔴 1
019e2163-a89 6h ago                   4.7s     8  🔴 1
019e2163-66f 6h ago                   4.8s     4  ✅  0

get_instrumentation_guide

Get a quick-start code snippet for instrumenting an agent with Dunetrace. Works for Python, LangChain/LangGraph, TypeScript, and plain tool-call tracking.

Arguments:

Argument    Type     Default    Description
framework   string   required   Framework name: python, langchain, langgraph, typescript, or tools

Aliases accepted: lc, lc-graph, lc_graph, langgraph, ts, js, javascript, node, tracking, tool_calls (and more).
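
One way a client could normalize these names before calling the tool is sketched below. The mapping is inferred from the alias list above (for instance, that node selects the TypeScript guide and tracking the tool-call guide) and is hypothetical; the server's own table accepts more aliases.

```python
CANONICAL = {"python", "langchain", "langgraph", "typescript", "tools"}

# Assumed alias mapping, inferred from the names listed above; the server's
# actual table contains more entries.
ALIASES = {
    "lc": "langchain",
    "lc-graph": "langgraph",
    "lc_graph": "langgraph",
    "ts": "typescript",
    "js": "typescript",
    "javascript": "typescript",
    "node": "typescript",
    "tracking": "tools",
    "tool_calls": "tools",
}


def normalize_framework(name: str) -> str:
    """Map a user-supplied framework name to a canonical guide name."""
    key = name.strip().lower()
    if key in CANONICAL:
        return key
    if key in ALIASES:
        return ALIASES[key]
    options = ", ".join(sorted(CANONICAL))
    raise ValueError(f"unknown framework {name!r}; expected one of: {options}")
```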


Typical workflows

Triage an alert

You:   I got a Slack alert for TOOL_LOOP on research-agent. What's happening?

Agent: [calls summarize_agent("research-agent")]
       Health is 41/100. TOOL_LOOP is systemic — 46 signals across 36% of
       runs. The fix is to deduplicate web_search calls (identical args hash
       seen 6× per run). Signal #495 is the most recent. Want the code?

You:   Yes, show me signal #495.

Agent: [calls get_signal_detail(495, "research-agent")]
       Here's the evidence and fix code…

Investigate a specific run from Slack

The Slack alert includes a "View Run" link: http://localhost:3000/runs/<run_id>. You can also pass the run ID directly:

You:   Check run 019e217d-bd24-7d72-a8be-4715c2dcf385

Agent: [calls get_run_detail("019e217d-…")]
       Duration 5.6s, 8 steps. TOOL_LOOP at step 7 — web_search called
       6× with identical args. Fix: add a dedup set.
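
If you paste a "View Run" link instead of a bare ID, a small helper can pull the run UUID out of the /runs/<run_id> path. This hypothetical run_id_from_link assumes the link format shown above and uses Python's uuid module to reject malformed IDs.

```python
import uuid
from urllib.parse import urlparse


def run_id_from_link(link_or_id: str) -> str:
    """Accept a bare run ID or a .../runs/<run_id> link and return the run UUID."""
    text = link_or_id.strip()
    if "://" in text:
        parts = [p for p in urlparse(text).path.split("/") if p]
        if len(parts) < 2 or parts[-2] != "runs":
            raise ValueError(f"not a run link: {link_or_id!r}")
        text = parts[-1]
    # uuid.UUID raises ValueError on malformed input; str() gives lowercase, hyphenated form.
    return str(uuid.UUID(text))
```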

Cross-agent audit

You:   Are there any CRITICAL signals in the last 24 hours?

Agent: [calls search_signals(severity="CRITICAL", since_hours=24)]
       2 CRITICAL signals: PROMPT_INJECTION_SIGNAL on billing-agent (2h ago)
       and COST_SPIKE on data-agent (5h ago). Want details on either?

Before a deploy

You:   Is research-agent stable enough to deploy to production?

Agent: [calls get_agent_patterns("research-agent")]
       TOOL_LOOP is systemic — 100% of runs in the last 7 days.
       Recommending you fix the dedup issue before deploying.

Privacy

All data served by the MCP tools comes from the Dunetrace Customer API, which stores only hashed or structural metadata:

  • Tool arguments → SHA-256 hash (shown as args_hashes)
  • LLM prompts and outputs → SHA-256 hash only (the raw text is never stored)
  • Token counts, latency, step counts → stored as plain numbers
  • Run and signal metadata → stored as plain text

The evidence dict in signal responses contains the hashed fingerprints the detector used — not the original content.


Tests

cd packages/mcp-server
python -m pytest tests/ -v

83 tests, all offline — no running stack required.


Source

packages/mcp-server/

dunetrace_mcp/
  __init__.py
  client.py      # thin httpx wrapper around the Customer API
  server.py      # FastMCP server with 10 tools + 6 doc resources
tests/
  test_tools.py  # 83 unit tests (all offline)
pyproject.toml
README.md


Download files


Source Distribution

dunetrace_mcp-0.1.0.tar.gz (26.8 kB)

Uploaded Source

Built Distribution


dunetrace_mcp-0.1.0-py3-none-any.whl (18.3 kB)

Uploaded Python 3

File details

Details for the file dunetrace_mcp-0.1.0.tar.gz.

File metadata

  • Download URL: dunetrace_mcp-0.1.0.tar.gz
  • Size: 26.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for dunetrace_mcp-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        73114e32bda8f5cb49db5d95533453723e9184719a84df2ad50cc2036a3205d5
MD5           14b94dd567525f62e8c791e85539b91b
BLAKE2b-256   684a8fc321e9263bc5ccb3fc9ace48414bd7623e6dbe847a5b7e4afe62c25f8b


File details

Details for the file dunetrace_mcp-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: dunetrace_mcp-0.1.0-py3-none-any.whl
  • Size: 18.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for dunetrace_mcp-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        66b1623131feb033e74353cbb0da51ce0db6a984359e787a75b7c791373975bf
MD5           58df4cac61b3d98f11e590b1c7fe60ae
BLAKE2b-256   a86b7fa2851e6bdc9d3af22adf009f2b1155501e2af887e43eeb0d3b1e22b5f9

