Model Context Protocol server for Opik (Comet's LLM observability platform).

opik-mcp

Model Context Protocol server for Opik + Ollie. Plug your AI host (Claude Code, Cursor, VS Code Copilot, MCP Inspector) directly into your Opik workspace — read traces, log scores, save prompt versions, and ask Ollie investigative questions, all from the chat.

Built for LLM engineers who already run Opik and want to drive it from the same AI assistant they code with.

You:    "Why did the experiment 'gpt-4o-rerank-v3' regress on factuality?"
Claude: → ask_ollie → reads experiment + traces → "Three traces failed because…"

You:    "Score trace 7f2e… 0.9 on helpfulness with reason 'great recovery'."
Claude: → write(score.create) → done

Install

opik-mcp is a Python package (requires Python 3.13+). The recommended way to run it is uvx, which fetches and runs the latest published version on demand — no global install, no virtualenv juggling.

Install uv once:

curl -LsSf https://astral.sh/uv/install.sh | sh   # macOS / Linux
# or: brew install uv

You'll need two things from your Opik workspace:

  • OPIK_API_KEY: get it from comet.com/api/my/settings/.
  • COMET_WORKSPACE: your workspace name (lowercase, as it appears in the URL). For example, https://www.comet.com/acme-ai/... gives COMET_WORKSPACE=acme-ai. Required for ask_ollie; optional but recommended everywhere else (used for scoping and analytics).
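
A quick preflight check can catch a missing variable before a host launches the server. This is a minimal sketch, not part of opik-mcp: `missing_settings` is a hypothetical helper, and it treats COMET_WORKSPACE as required only when you plan to use ask_ollie, as described above.

```python
import os
from typing import Mapping


def missing_settings(env: Mapping[str, str], need_ollie: bool = True) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    required = ["OPIK_API_KEY"]
    if need_ollie:
        # Per the note above, the workspace is only mandatory for ask_ollie.
        required.append("COMET_WORKSPACE")
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    problems = missing_settings(os.environ)
    print("missing:", ", ".join(problems) if problems else "none")
```

Keep in mind that most hosts read these values from the env block of their MCP config, not from your shell, so a check against `os.environ` only tells you about the shell you run it in.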

Pre-release note: opik-mcp (Python) is not yet published to PyPI. Until the first PyPI release lands, replace uvx opik-mcp in any snippet below with: uvx --from git+https://github.com/comet-ml/opik-mcp.git opik-mcp

Claude Code

Add the server with one command:

claude mcp add --transport stdio opik-mcp \
  --env OPIK_API_KEY=<your-key> \
  --env COMET_WORKSPACE=<your-workspace> \
  -- uvx opik-mcp

Or edit ~/.claude.json directly:

{
  "mcpServers": {
    "opik-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["opik-mcp"],
      "env": {
        "OPIK_API_KEY": "<your-key>",
        "COMET_WORKSPACE": "<your-workspace>"
      }
    }
  }
}

Restart Claude Code. Verify with /mcp: opik-mcp should appear as connected. Then, in the chat, ask "list my Opik projects"; Claude will call the list tool and you'll see your workspace's projects.

Cursor

Edit ~/.cursor/mcp.json (global) or .cursor/mcp.json (project), or open Cmd+Shift+J → Features → Model Context Protocol:

{
  "mcpServers": {
    "opik-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["opik-mcp"],
      "env": {
        "OPIK_API_KEY": "<your-key>",
        "COMET_WORKSPACE": "<your-workspace>"
      }
    }
  }
}

Reload Cursor; the green dot next to opik-mcp in the MCP panel confirms the connection. Ask in chat: "list my Opik projects".

Cursor 60s timeout. Cursor enforces a hard tool-call timeout that doesn't reset on progress notifications. Long ask_ollie turns will fail on Cursor. See Known host limits.

VS Code Copilot

Add the server to .vscode/mcp.json in your workspace (or to your User Settings JSON):

{
  "servers": {
    "opik-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["opik-mcp"],
      "env": {
        "OPIK_API_KEY": "<your-key>",
        "COMET_WORKSPACE": "<your-workspace>"
      }
    }
  }
}

Reload the window; the Copilot Chat MCP indicator shows opik-mcp once the server is reachable. Ask in chat: "list my Opik projects".
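
The three host configs above differ only in their root key ("mcpServers" for Claude Code and Cursor, "servers" for VS Code). If you manage several hosts, a small generator can keep them in sync. The sketch below is illustrative only: `host_config` and the host labels in `ROOT_KEY` are hypothetical names, and it builds a fresh document rather than merging into an existing file.

```python
import json

# Root key per host, taken from the snippets above. The host labels are made up.
ROOT_KEY = {"claude-code": "mcpServers", "cursor": "mcpServers", "vscode": "servers"}


def host_config(host: str, api_key: str, workspace: str) -> dict:
    """Build the JSON document for one host's MCP config file."""
    entry = {
        "type": "stdio",
        "command": "uvx",
        "args": ["opik-mcp"],
        "env": {"OPIK_API_KEY": api_key, "COMET_WORKSPACE": workspace},
    }
    return {ROOT_KEY[host]: {"opik-mcp": entry}}


print(json.dumps(host_config("vscode", "<your-key>", "<your-workspace>"), indent=2))
```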

MCP Inspector (manual testing)

OPIK_API_KEY=<your-key> COMET_WORKSPACE=<your-workspace> \
  npx @modelcontextprotocol/inspector uvx opik-mcp

Self-hosted Opik

Add COMET_URL_OVERRIDE (and OPIK_URL if Opik lives at a non-default path) to the same env block in your host config:

{
  "mcpServers": {
    "opik-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["opik-mcp"],
      "env": {
        "OPIK_API_KEY": "<your-key>",
        "COMET_URL_OVERRIDE": "https://opik.your-company.com",
        "OPIK_MCP_ANALYTICS_SOURCE": ""
      }
    }
  }
}

ask_ollie and run_experiment are available on Comet Cloud only — on self-hosted those calls will fail at dispatch, so use read / list / write directly. Setting OPIK_MCP_ANALYTICS_SOURCE="" opts your install out of the cloud-Comet source label on telemetry events.


Tools

opik-mcp exposes a small, outcome-oriented surface — six tools that cover the full lifecycle (read → annotate → curate → author → iterate).

| Tool | Purpose |
| --- | --- |
| read | Universal read by id / name / opik:// URI |
| list | Universal list with optional name filter + pagination |
| ask_ollie | Investigate / synthesize via the Opik in-product assistant |
| write | Universal write: log traces/spans, score, comment, save prompts, manage test suites & experiments |
| schema | Introspect write-operation schemas (used by the LLM to construct valid payloads) |
| run_experiment | Run an evaluation experiment end-to-end via Ollie |

read

One tool for any "show me X" question. Takes an entity_type plus an id (UUID or, for nameable types, a name) or a full opik:// URI. Composite reads (trace, prompt) inline their children so a single call returns the full picture.

Supported entities: project, trace, span, test_suite, experiment, prompt. Name-based lookup is available for project, experiment, prompt, test_suite (slower — two API calls — and may return multiple matches).

read(entity_type="trace", id="7f2e3c8a-…")
read(entity_type="project", id="demo")          # name lookup
read(entity_type="trace", id="opik://traces/7f2e3c8a-…")
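
If you build or validate opik:// URIs in your own tooling, a small parser helps. The only form shown in this README is opik://traces/<id>, so the general shape assumed below (collection as the host part, id as the path) is an assumption, and `parse_opik_uri` is a hypothetical helper rather than something opik-mcp exports.

```python
from urllib.parse import urlparse


def parse_opik_uri(uri: str) -> tuple[str, str]:
    """Split 'opik://<collection>/<id>' into (collection, id)."""
    parts = urlparse(uri)
    if parts.scheme != "opik" or not parts.netloc:
        raise ValueError(f"not an opik:// URI: {uri!r}")
    entity_id = parts.path.lstrip("/")
    if not entity_id:
        raise ValueError(f"missing id in {uri!r}")
    return parts.netloc, entity_id


print(parse_opik_uri("opik://traces/abc-123"))
```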

list

Browse a collection with optional name filter and pagination. Project-scoped types (trace, test_suite_item, prompt_version) require their parent UUID.

list(entity_type="experiment", page=1, size=25)
list(entity_type="experiment", name="rerank")          # name substring filter
list(entity_type="trace", project_id="<project-uuid>") # traces of one project
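
Walking a whole collection means requesting page after page. The sketch below shows one way to do that, assuming 1-indexed pages (as in the example above) and assuming a page shorter than `size` marks the end. The response shape of the real tool is not documented here, so `list_page` is a hypothetical callable that returns a plain Python list.

```python
from typing import Callable, Iterator


def iter_all(list_page: Callable[[int, int], list], size: int = 25) -> Iterator:
    """Yield every item by requesting pages 1, 2, ... until a short page."""
    page = 1
    while True:
        items = list_page(page, size)
        yield from items
        if len(items) < size:
            return
        page += 1


# Stand-in for a real list call: 60 fake experiments served in pages.
fake = [f"exp-{i}" for i in range(60)]


def fake_list_page(page: int, size: int) -> list:
    start = (page - 1) * size
    return fake[start:start + size]


print(len(list(iter_all(fake_list_page))))
```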

ask_ollie

For investigative questions, cross-entity synthesis, or anything that needs Opik domain expertise. Ollie has direct read access to your workspace and can execute writes (scores, comments, test-suite items, prompt versions) mid-stream when asked.

ask_ollie(query="Why are spans in project 'demo' slower this week than last?")
ask_ollie(query="Compare experiments A and B on factuality. Score the bottom 5 traces of A 0.2 with reason.")

Returns the assistant's final text plus a thread_id. Pass it back on follow-ups to preserve context — Ollie has no memory across threads.
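
When you script follow-up questions, the thread_id has to be carried from one reply to the next call. Below is a minimal sketch of that bookkeeping. Everything about the call shape is an assumption: `ask` is a hypothetical callable standing in for however your client invokes ask_ollie, the reply is assumed to be a dict with "text" and "thread_id" keys, and the parameter is assumed to be named thread_id.

```python
from typing import Callable, Optional


class OllieThread:
    """Remember the thread_id from each reply and send it on the next turn."""

    def __init__(self, ask: Callable[..., dict]) -> None:
        self._ask = ask
        self.thread_id: Optional[str] = None

    def send(self, query: str) -> str:
        kwargs = {"query": query}
        if self.thread_id is not None:
            # Follow-up turn: reuse the thread so context is preserved.
            kwargs["thread_id"] = self.thread_id
        reply = self._ask(**kwargs)
        self.thread_id = reply["thread_id"]
        return reply["text"]
```

Starting a new `OllieThread` starts a new conversation, which matches the note that Ollie has no memory across threads.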

YOLO mode (default). Writes Ollie performs mid-stream execute without a per-action confirmation. Each auto-approval is logged as a JSON audit row on the opik_mcp.audit Python logger. To require confirmation instead, set OPIK_MCP_AUTO_APPROVE=disabled — Ollie's confirm requests then surface as typed errors you can manually re-issue.
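
If you run the server inside your own Python process (for example in a test harness), you can collect those audit rows with a standard logging handler. This is a sketch under assumptions: the logger name comes from the paragraph above, but the log level and the row fields are not documented here, so the handler accepts any JSON object and the demo row is invented.

```python
import json
import logging


class AuditCollector(logging.Handler):
    """Collect audit rows, parsing each log message as a JSON object."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[dict] = []

    def emit(self, record: logging.LogRecord) -> None:
        try:
            row = json.loads(record.getMessage())
        except json.JSONDecodeError:
            return  # ignore anything that is not JSON
        if isinstance(row, dict):
            self.rows.append(row)


audit_logger = logging.getLogger("opik_mcp.audit")
audit_logger.setLevel(logging.INFO)  # assumption: rows are logged at INFO or above
collector = AuditCollector()
audit_logger.addHandler(collector)

# Simulated row; these field names are made up for the demo.
audit_logger.info(json.dumps({"action": "score.create", "approved": True}))
print(collector.rows)
```

When a host launches the server as a subprocess, the same rows go wherever that process sends its logs, so this in-process handler will not see them.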

Available on Comet Cloud only.

write

Universal write dispatcher. Pass operation + data and the dispatcher validates the payload, applies the right REST verb, and returns the backend response.

Operations:

| Operation | What it does |
| --- | --- |
| trace.create | Log a single trace (or a batch). Parent for spans / scores / comments. |
| trace.update | Finalize or amend an existing trace. |
| span.create | Log a span on an existing trace (or a batch). |
| score.create | Attach a numeric feedback score to a trace, span, or thread. |
| comment.create | Attach a free-text comment to a trace, span, or thread. |
| prompt_version.save | Save a new prompt version (creates the prompt by name if missing). |
| test_suite.create | Create an evaluation test suite. |
| test_suite_item.upsert | Upsert items into a test suite (always the envelope shape). |
| experiment.create | Create an experiment scoped to a test suite. |
| experiment_item.create | Attach trace + dataset_item rows to an experiment. |

write(operation="score.create", data={
  "target": "trace",
  "target_id": "7f2e3c8a-…",
  "name": "helpfulness",
  "value": 0.9,
  "reason": "great recovery"
})
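
If you assemble score payloads in code, a small builder can reject obvious mistakes before the call. The sketch below mirrors the fields in the example above and the targets listed for score.create (trace, span, thread). It is not the server's validator: `score_payload` is a hypothetical helper, and treating reason as optional is an assumption. Use schema(operation="score.create") for the authoritative shape.

```python
from typing import Optional

VALID_TARGETS = {"trace", "span", "thread"}


def score_payload(target: str, target_id: str, name: str,
                  value: float, reason: Optional[str] = None) -> dict:
    """Build the data dict for a score.create write."""
    if target not in VALID_TARGETS:
        raise ValueError(f"target must be one of {sorted(VALID_TARGETS)}, got {target!r}")
    data = {"target": target, "target_id": target_id, "name": name, "value": float(value)}
    if reason is not None:
        data["reason"] = reason  # assumption: reason may be omitted
    return data


print(score_payload("trace", "<trace-id>", "helpfulness", 0.9, "great recovery"))
```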

schema

Inspect the exact JSON shape and required fields of any write operation before you call it — useful when you're not sure what data should look like. Returns the schema, OAuth scope, and one validated example. Pure lookup, no backend call.

schema(operation="score.create")
schema(operation="prompt_version.save")

run_experiment

Run an evaluation experiment end-to-end via Ollie. Takes a single experiment_config dict that mirrors Opik's experiment shape (prompt, test suite, scorers); Ollie executes the run and writes results back as an Opik experiment.

run_experiment(experiment_config={
  "test_suite_name": "qa-eval-v2",
  "prompt_name": "welcome-msg",
  # … see `schema(operation="experiment.create")` for the full shape
})

Available on Comet Cloud only.


Configuration

Every setting is an environment variable. Required ones are marked in the Notes column.

Identity / endpoint

| Variable | Default | Notes |
| --- | --- | --- |
| OPIK_API_KEY | | Required for ask_ollie and any authenticated read/write. |
| COMET_WORKSPACE | | Workspace name. Required for ask_ollie. |
| COMET_WORKSPACE_ID | | Optional workspace UUID. Stamped into analytics events when set so BI can join on a stable id rather than the (mutable) workspace name. |
| COMET_URL_OVERRIDE | https://www.comet.com | Set to your self-hosted Comet host, or https://dev.comet.com for staging. |
| OPIK_URL | derived from COMET_URL_OVERRIDE + /opik/api | Override only if Opik lives on a different host/path than the Comet UI. |
| OPIK_DEFAULT_PROJECT_NAME | unset | When set, the per-session instructions blob tells the LLM to pass this as project_name on every tool call unless the user names a different project. |
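
To see which Opik endpoint a given env block resolves to, the documented defaults can be written out as a function. This is a sketch of the rule stated in the table (OPIK_URL wins, otherwise COMET_URL_OVERRIDE + /opik/api), not the server's actual code. `resolve_opik_url` is a hypothetical name, and the trailing-slash handling is an assumption.

```python
from typing import Mapping

DEFAULT_COMET_URL = "https://www.comet.com"


def resolve_opik_url(env: Mapping[str, str]) -> str:
    """Return the Opik API base URL implied by the given environment."""
    explicit = env.get("OPIK_URL", "").strip()
    if explicit:
        return explicit.rstrip("/")
    base = env.get("COMET_URL_OVERRIDE", "").strip() or DEFAULT_COMET_URL
    # Assumption: a trailing slash on the base is dropped before appending.
    return base.rstrip("/") + "/opik/api"


print(resolve_opik_url({"COMET_URL_OVERRIDE": "https://opik.your-company.com"}))
```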

Server / transport

| Variable | Default | Notes |
| --- | --- | --- |
| OPIK_MCP_TRANSPORT | stdio | stdio for host-launched, streamable-http to listen on a port. |
| OPIK_MCP_HOST | 127.0.0.1 | uvicorn bind host (streamable-http only). |
| OPIK_MCP_PORT | 8080 | uvicorn bind port (streamable-http only). |
| OPIK_MCP_RELOAD | false | true to enable uvicorn --reload (dev only). |
| OPIK_MCP_DEV_TOKEN | dev-token-123 | Bearer token the HTTP transport requires. |
| OPIK_MCP_LOG_LEVEL | INFO | stderr logger threshold. |

Ollie / long calls

| Variable | Default | Notes |
| --- | --- | --- |
| OPIK_MCP_AUTO_APPROVE | enabled | disabled to surface Ollie's confirm requests as typed errors instead of auto-approving. |
| OPIK_MCP_POD_READY_TIMEOUT_S | 120 | Ollie pod cold-start poll cap. |
| OPIK_MCP_POD_READY_INTERVAL_S | 2 | Cold-start poll interval. |
| OPIK_MCP_HEARTBEAT_INTERVAL_S | 15.0 | Watchdog cadence: emits a notifications/progress tick when the pod is silent, keeping host timeouts at bay. |
| OPIK_MCP_STREAM_IDLE_TIMEOUT_S | 300.0 | Hard ceiling on pod silence before ask_ollie aborts. 0 disables (debug only). |
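
The heartbeat and idle-timeout settings interact: ticks go out while the pod is quiet, until the silence hits the ceiling. One way to picture the decision at each watchdog wake-up is the pure function below. It is a sketch of the behaviour described in the table, not the server's implementation; `watchdog_step` and its return labels are hypothetical.

```python
def watchdog_step(since_last_event_s: float, since_last_tick_s: float,
                  heartbeat_interval_s: float = 15.0,
                  idle_timeout_s: float = 300.0) -> str:
    """Decide what a watchdog would do given how long the pod has been silent."""
    # An idle timeout of 0 disables the ceiling, as noted in the table.
    if idle_timeout_s > 0 and since_last_event_s >= idle_timeout_s:
        return "abort"
    if since_last_tick_s >= heartbeat_interval_s:
        return "heartbeat"
    return "wait"


print(watchdog_step(since_last_event_s=45.0, since_last_tick_s=15.0))
```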

Telemetry

Anonymous usage events (event type + timing only — no query content). A SHA-256 digest of your API key is included so support can find your account; the raw key never leaves the process. Opt out: OPIK_MCP_ANALYTICS_ENABLED=false.

| Variable | Default | Notes |
| --- | --- | --- |
| OPIK_MCP_ANALYTICS_ENABLED | true | Set to false to disable all telemetry. |
| OPIK_MCP_ANALYTICS_URL | https://stats.comet.com/notify/event/ | Override for staging. |
| OPIK_MCP_ANALYTICS_ENVIRONMENT | prod | Tag on every event (prod / staging / dev). |
| OPIK_MCP_ANALYTICS_SOURCE | comet.com | Receiver uses this to mark on_prem=False. On-prem installs should override to "" or their own domain. |
| OPIK_MCP_ANALYTICS_CONNECT_TIMEOUT_S | 5.0 | HTTP connect timeout. |
| OPIK_MCP_ANALYTICS_TOTAL_TIMEOUT_S | 10.0 | HTTP total request timeout. |
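
For a sense of what a SHA-256 digest of an API key looks like, here is a minimal sketch using the standard library. The exact input encoding and whether any salt or prefix is applied are not stated in this README, so plain UTF-8 with no salt is an assumption, and `api_key_digest` is a hypothetical helper.

```python
import hashlib


def api_key_digest(api_key: str) -> str:
    """Return the hex SHA-256 digest of the key (assumes UTF-8, no salt)."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


digest = api_key_digest("example-key-not-real")
print(len(digest), digest[:12] + "...")
```

The digest is a fixed 64 hex characters regardless of key length, and the key cannot be read back out of it.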

Known host limits

The MCP spec lets hosts reset their tool-call timeout on notifications/progress, and opik-mcp emits one per Ollie SSE event plus a 15-second watchdog heartbeat. Reality is uneven:

  • Claude Code: no documented tool-call timeout; heartbeat keeps the call alive until message_end. Recommended.
  • Cursor: hard 60s timeout that does not reset on progress (upstream bug). Long Ollie turns will fail. Keep ask_ollie queries focused.
  • MCP Inspector: MAX_TOTAL_TIMEOUT bounds total duration (default 60s). Raise it in the Inspector UI for long operations.

If a call gets stuck, set OPIK_MCP_LOG_LEVEL=DEBUG — heartbeat failures (usually host disconnects) are logged on opik_mcp.ask_ollie at debug level.


Troubleshooting

OPIK_API_KEY is required to use ask_ollie — the var isn't reaching the server process. In Claude Code / Cursor / VS Code, env vars only apply when inside the env block of the MCP server config, not your shell. Restart the host after editing.

ask_ollie returns "pod not ready" after 2 minutes — the Ollie pod cold-start exceeded OPIK_MCP_POD_READY_TIMEOUT_S. Retry — the second call usually hits a warm pod.
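
The cold-start wait behind this error can be pictured as a poll loop bounded by OPIK_MCP_POD_READY_TIMEOUT_S and paced by OPIK_MCP_POD_READY_INTERVAL_S. The sketch below uses the documented defaults (120 and 2). It is an illustration rather than the server's code: `wait_until_ready` and `is_ready` are hypothetical names, and the clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool],
                     timeout_s: float = 120.0,
                     interval_s: float = 2.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_ready until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False  # caller reports "pod not ready"
        sleep(interval_s)
```

A retry after a timeout simply starts a fresh loop, which is why the second call often succeeds once the pod has finished warming up.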

ask_ollie / run_experiment fails with a dispatch error on self-hosted Opik — those tools are available on Comet Cloud only. Use read / list / write directly on self-hosted.

Cursor call times out at 60s — Cursor's known bug, not opik-mcp. Either shorten the Ollie query, or run the same operation on Claude Code which has no hard cap.


Development

git clone git@github.com:comet-ml/opik-mcp.git
cd opik-mcp
make install        # uv sync --extra dev
make check          # lint + typecheck + test
make run-dev        # uvicorn with --reload + DEBUG logs
make inspect        # MCP Inspector against the running server

Common targets:

| Target | What it does |
| --- | --- |
| make install | uv sync --extra dev |
| make run | Run the MCP server (stdio by default). |
| make run-dev | Run with DEBUG logging + uvicorn --reload. |
| make dev | Run via mcp dev (Inspector dev-mode wrapper). |
| make inspect | Launch MCP Inspector against a running server. |
| make test | uv run pytest -q. |
| make test-live | Live end-to-end against dev.comet.com (set OPIK_API_KEY + COMET_WORKSPACE). |
| make lint | ruff check + format check. |
| make format | ruff format + ruff check --fix. |
| make typecheck | mypy. |
| make check | lint + typecheck + test. |

Repo layout:

opik-mcp/
├── src/opik_mcp/        ← server, tools, ask_ollie, analytics
├── tests/               ← pytest suites
├── scripts/             ← live-BE smoke + MCP-session smoke
├── legacy/typescript/   ← deprecated v2 TS server
├── pyproject.toml
└── Makefile

Get help


Upgrading from v2? The legacy TypeScript server still ships on npm as opik-mcp@^2 (npx -y opik-mcp); source is preserved under legacy/typescript/. See legacy/typescript/DEPRECATED.md for the support policy.


License

Apache-2.0.
