Local MCP server that makes performance-profiler output (NVIDIA ncu, AMD rocprof, Linux perf, Apple Metal) token-efficient for LLM coding agents.
perfdigest
Local MCP server that makes performance-profiler output token-efficient for LLM coding agents. It sits between the profiler and the agent: reads the report from disk and returns a small, structured, numeric digest the agent can act on, while keeping a pointer back to the raw report for lazy expansion.
It is a translator/router, not a judge — interpretation ("is this kernel memory-bound?") is the model's job. perfdigest only provides efficient, deterministic access to clean numeric metrics across vendors and languages.
Two operations — keep them separate
perfdigest deliberately splits what an agent does with a profiler into two tiers:
- Digest (read a report): universal, available on every machine. A report's origin is irrelevant to digesting it. An NVIDIA `.ncu-rep` (or its CSV export) captured on a CI/remote GPU runner can be pulled to a Mac and digested there; that is how you do CUDA work on a Mac via CI. Tier-1 is never gated by local hardware, only by whether a reader's dependency imports.
- Capture (produce a report): platform-verified. You cannot run `ncu` on a Mac, Metal on Linux, or hardware PMU counters under WSL2. `platform_capabilities` and `suggest_profile_command` gate this so the agent never spends context on a capture that can't run here, and redirect it to "capture elsewhere, digest here."
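The capture gate described above can be sketched roughly as follows. This is only an illustration, not perfdigest's actual implementation: `CAPTURE_OS`, `can_capture_here` (as a standalone function) and `capture_advice` are hypothetical names, and the OS sets are copied from the Backends table in this README.

```python
import platform
from typing import Optional

# Capture OS per backend, taken from the Backends table in this README.
# nsight_csv is omitted: it is an export of ncu and has no capture step of its own.
CAPTURE_OS = {
    "nsight": {"Linux", "Windows"},
    "rocm": {"Linux", "Windows"},
    "linux_perf": {"Linux"},
    "metal": {"Darwin"},  # platform.system() reports macOS as "Darwin"
}


def can_capture_here(backend: str, system: Optional[str] = None) -> bool:
    """Hypothetical helper: True if the backend's profiler can run on `system`."""
    system = system or platform.system()
    return system in CAPTURE_OS.get(backend, set())


def capture_advice(backend: str, system: Optional[str] = None) -> str:
    """Hypothetical helper: either allow capture or redirect to the tier-1 path."""
    if can_capture_here(backend, system):
        return "capture here"
    return "capture elsewhere, digest here"


print(capture_advice("nsight", "Darwin"))
print(capture_advice("linux_perf", "Linux"))
```

Note that this sketch only looks at the OS name. It would not detect WSL2, which reports itself as Linux but cannot expose hardware PMU counters, so a real gate needs additional checks.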
Backends (v1.0.0)
| Backend | format | Domain | Capture tool | Capture OS | Digest anywhere |
|---|---|---|---|---|---|
| nsight | ncu-rep | gpu_kernel | NVIDIA ncu | Linux, Windows | needs ncu-report wheel |
| nsight_csv | ncu-csv | gpu_kernel | (export of ncu) | n/a | ✅ pure Python |
| rocm | rocprof-csv | gpu_kernel | AMD rocprof | Linux, Windows | ✅ pure Python |
| linux_perf | perf-stat-json, perf-report | cpu_function | Linux perf | Linux | ✅ pure Python |
| metal | metal-trace | gpu_pass | Apple xctrace | macOS | ✅ pure Python |
GPU backends share one vocabulary (compute_pct_peak, dram_pct_peak,
l2_hit_rate, achieved_occupancy, …); the CPU backend introduces a CPU
vocabulary (ipc, cache_miss_rate, llc_miss_rate, branch_mispredict_rate,
self_pct, …). Future: Go, Java, and other perf-critical runtimes.
Two load-bearing invariants (read before running)
- Suppress profiler stdout: write to a file, e.g. `ncu --set full -o report.ncu-rep ./app`. If the profiler prints its summary to stdout, that raw table enters the agent's context before perfdigest runs, defeating the entire purpose.
- `None` means "not measured in this export", NEVER zero. A metric the export does not contain is returned as `not_available_in_this_export`, not a fake `0.0`. A genuine `0.0` (e.g. zero branch divergence) is preserved. Silently returning zero = lying to the model = the worst bug this tool can have.
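A minimal sketch of the second invariant, assuming a parsed export row is a plain dict. `digest_metrics` and the sample values are hypothetical and only illustrate the rule; the sentinel string is the one named above.

```python
NOT_AVAILABLE = "not_available_in_this_export"


def digest_metrics(export_row: dict, wanted: list) -> dict:
    """Hypothetical digest step: copy requested metrics, never inventing zeros."""
    digest = {}
    for name in wanted:
        value = export_row.get(name)  # None when the export lacks the metric
        # Test against None explicitly: `value or NOT_AVAILABLE` would
        # wrongly replace a genuine 0.0 with the sentinel.
        digest[name] = NOT_AVAILABLE if value is None else value
    return digest


# Illustrative values only: a measured zero, a measured non-zero, and a gap.
row = {"branch_mispredict_rate": 0.0, "ipc": 1.8}
result = digest_metrics(row, ["branch_mispredict_rate", "ipc", "llc_miss_rate"])
print(result)
```

The measured `0.0` survives unchanged, while the metric that was never in the export comes back as the sentinel rather than as a number the model might reason about.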
Tools
Tier 1 — digest (any backend, any host):
- `list_kernels(report_ref, format)` → `[{name, index, duration_us, domain}]`
- `get_metrics(report_ref, format, kernel, metrics=None)` → compact digest (`metrics=None` → the backend's default core set)
- `expand(report_ref, format, kernel, section)` → raw vendor metrics (the safety valve)
Tier 2 — capture advisory (platform-verified):
- `platform_capabilities()` → machine identity + `can_digest` (universal) vs `can_capture_here` (gated)
- `suggest_profile_command(backend, target)` → the correct, platform-aware invocation, or a refusal that redirects to the tier-1 path
format is mandatory — the agent passes what it produced; a path says where
a file is, not what format it is.
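For orientation, here is roughly what a client sends when it invokes one of these tools. MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The argument values below are assumptions for illustration: that `report_ref` is a file path, that `kernel` accepts a kernel name, and the kernel name itself is made up. The `format` value comes from the Backends table.

```python
import json


def tools_call(request_id: int, tool: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


request = tools_call(
    1,
    "get_metrics",
    {
        "report_ref": "report.ncu-rep",  # assumed to be a path to the report
        "format": "ncu-rep",             # mandatory: never inferred from the path
        "kernel": "my_kernel",           # hypothetical kernel name
        # "metrics" omitted, so the backend's default core set is requested
    },
)
print(json.dumps(request, indent=2))
```

In practice an MCP client such as Claude Code or Codex builds this message itself; the sketch only shows where `format` travels alongside `report_ref`.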
Install & connect
uvx perfdigest-mcp # run from PyPI (downloadable)
uv tool install "perfdigest-mcp[nvidia]" # + NVIDIA native binary reader (Linux/Windows)
PyPI/install name is `perfdigest-mcp`; the command and import package are `perfdigest` (e.g. `uvx perfdigest-mcp`, `import perfdigest`).
Claude Code and OpenAI Codex setup (both stdio MCP): see docs/clients.md.
Status
v1.0.0 — multi-backend registry; NVIDIA (native + CSV), AMD HIP, Linux perf (C++/Rust), and Apple Metal adapters; platform capability gating; cross-client config. Validated on the Linux/macOS/Windows CI matrix (pure-Python readers run hardware-free against committed fixtures). Real binary-capture tests are fixture-gated and skip without the device.
Related / similar projects
perfdigest was authored independently; these adjacent MCP servers occupy a nearby space and likely work well for their narrower scope. The differences are the reason perfdigest exists:
- nsys-mcp (NVIDIA Nsight Systems) — profiles binaries and aggregates trace timeline stats. perfdigest targets Nsight Compute per-kernel counters, is read-only (does not run the profiler), and spans multiple vendors.
- pprof-analyzer-mcp / Profiler-MCP — Go (and Go/Python/Java) CPU/memory profiles, often rendering flamegraphs. perfdigest is a numeric digester focused on token/context efficiency rather than visualization.
What is distinct here: the multi-backend matrix (NVIDIA + AMD HIP + CPU perf +
Metal under one neutral contract), the token-efficiency thesis with the
None≠0.0 honesty rule, and the read/capture split with platform capability
gating (digest anywhere, capture only where supported).
License