Skip to main content

Printf-style debugging for Metal compute shaders

Project description

metal-debug

Printf-style debugging for Metal compute shaders. No Xcode GPU debugger, no buffer dumps, no guessing.

Add #include "metal_debug.h" to your shader, drop in a debug buffer, see what every thread computed.

#include "metal_debug.h"

kernel void my_kernel(
    device float *A       [[buffer(0)]],
    device float *B       [[buffer(1)]],
    device float *C       [[buffer(2)]],
    device uint  *dbg_buf [[buffer(30)]],
    uint id [[thread_position_in_grid]]
) {
    float a = A[id], b = B[id];

    dbg_printf(dbg_buf, id, 0, a);       // log input A
    dbg_printf(dbg_buf, id, 1, b);       // log input B

    float result = a * b;
    dbg_watch_nan(dbg_buf, id, 2, result);  // only logs if NaN/Inf
    dbg_assert(dbg_buf, id, 3, result > 0); // GPU-side assertion

    C[id] = result;
}

Host side (ObjC):

MetalDebugSession *dbg = [[MetalDebugSession alloc] initWithDevice:device maxEntries:4096];
[encoder setBuffer:dbg.buffer offset:0 atIndex:30];
// ... dispatch ...
[dbg dump];

Output:

[metal-debug] 24 entries
  thread[0] 0: 3.5
  thread[0] 1: 2.0
  thread[1] 0: 1.2
  thread[1] 1: -0.5
  thread[1] 3: ASSERTION FAILED

Features

Feature GPU API Description
Printf dbg_printf(buf, tid, tag, val) Log float/int/uint/half/vec values
Conditional dbg_printf_if(buf, cond, tid, tag, val) Only log when condition is true
NaN watchpoint dbg_watch_nan(buf, tid, tag, val) Log only NaN/Inf values
Range watchpoint dbg_watch_range(buf, tid, tag, val, lo, hi) Log values outside range
Assertions dbg_assert(buf, tid, tag, cond) Record assertion failures
Breakpoints dbg_break(buf, tid, tag, cond) Set flag for host to detect
Stats dbg_stats(buf, tag, val) Cross-thread min/max/mean/count
Histogram dbg_histogram(buf, tag, val, lo, hi) Value distribution with bar chart
Named tags Preprocessor or host-side "loss" instead of tag=42
2D grid view Host-side Display values as threadgroup grid
Diff mode Host-side Compare two kernel runs
Zero-overhead disable #define METAL_DEBUG_DISABLE Compiles out all debug calls

How it works

  1. GPU side: metal_debug.h is a single header. Debug calls write (thread_id, tag, type, value) entries into a device buffer using atomic counters.

  2. Host side: MetalDebugSession allocates the buffer, binds it at slot 30, and reads/formats entries after execution.

  3. No recompilation needed when changing buffer size — max_entries is stored in the buffer itself and read by the GPU at runtime.

Build & test

git clone <this repo>
cd metal-debug
make test    # compiles + runs 9 test kernels on your GPU

Integration

ObjC / C++

Copy src/metal_debug.h into your project. Link runtime/MetalDebugSession.{h,m} into your app.

#import "MetalDebugSession.h"

MetalDebugSession *dbg = [[MetalDebugSession alloc]
    initWithDevice:device maxEntries:4096];
[dbg setName:@"loss" forTag:0];

[encoder setBuffer:dbg.buffer offset:0 atIndex:30];
// dispatch kernel...

[dbg dump];                    // all entries, sorted by thread
[dbg dumpTag:0];               // filter by tag
[dbg dumpGrid:0 width:8 height:8]; // 2D threadgroup view
[dbg dumpStats:0];             // min/max/mean
[dbg dumpHistogram:0 lo:0 hi:1]; // value distribution

if ([dbg breakpointHit])
    [dbg dumpBreakpoint];      // what went wrong

[dbg reset];                   // reuse for next dispatch

Swift

import Metal

let dbg = MetalDebugSession(device: device, maxEntries: 4096)
encoder.setBuffer(dbg.buffer, offset: 0, index: 30)
// dispatch kernel...
dbg.dump()

See examples/SwiftDemo/ for a complete Swift example.

Python (PyTorch MPS / Triton)

from metal_debug import MetalDebugSession

dbg = MetalDebugSession(max_entries=4096)
# pass dbg.tensor as buffer(30) to your Metal/Triton kernel
torch.mps.synchronize()
dbg.dump()

Interactive TUI

Explore debug traces interactively — filter, navigate, see grid views and stats live:

pip install textual

# Launch with demo data
python python/tui.py --demo

# Launch with a debug buffer dump
python python/tui.py trace.bin

Or from Python after a kernel dispatch:

dbg.explore(grid_width=8, grid_height=8)

Keyboard shortcuts:

Key Action
↑/↓ Navigate entries
g Show 2D grid for selected tag
a Show assertions only
b Jump to breakpoint thread
c Clear filters
m Toggle mouse (enable copy/paste)
escape Focus table from filter input
q Quit

Source preprocessor

Auto-inject the debug buffer parameter into kernel signatures and use string tags:

python3 src/metal_debug_preprocess.py my_kernel.metal -o my_kernel_debug.metal
xcrun metal -I path/to/metal-debug/src -o out.metallib my_kernel_debug.metal

Before:

kernel void foo(device float *A [[buffer(0)]], uint id [[thread_position_in_grid]]) {
    dbg(id, "value", A[id]);
}

After preprocessing:

kernel void foo(device float *A [[buffer(0)]], uint id [[thread_position_in_grid]],
                device uint *_dbg_buf [[buffer(30)]]) {
    dbg(id, 47248/*value*/, A[id]);
}

Convenience macros

If you use DBG_PARAM in your kernel signature, the short macros work:

kernel void my_kernel(device float *A [[buffer(0)]], DBG_PARAM,
                      uint id [[thread_position_in_grid]]) {
    dbg(id, 0, A[id]);              // printf
    dbg_if(id == 0, id, 1, A[id]);  // conditional
    dbg_nan(id, 2, A[id]);          // NaN watchpoint
    dbg_check(id, 3, A[id] > 0);   // assertion
    dbg_stat(0, A[id]);             // stats accumulator
    dbg_hist(0, A[id], 0, 100);     // histogram
    dbg_brk(id, 4, A[id] < 0);     // breakpoint
}

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

metal_debug-0.1.0.tar.gz (8.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

metal_debug-0.1.0-py3-none-any.whl (8.7 kB view details)

Uploaded Python 3

File details

Details for the file metal_debug-0.1.0.tar.gz.

File metadata

  • Download URL: metal_debug-0.1.0.tar.gz
  • Upload date:
  • Size: 8.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for metal_debug-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c7e088fea081a31a193abcf7c24d8706ece8352cc7fc90b917318e656c40ab96
MD5 ac889d93be84ee73453dfd199e532728
BLAKE2b-256 c5242d8aa2d05141c18af942b8d6d8c94f07e75ca8c1c28165b9c9059b0127ad

See more details on using hashes here.

File details

Details for the file metal_debug-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: metal_debug-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for metal_debug-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 12eecaa3201712ac8ec55af0d322ac424103ed981334d61b21f6da217d7887aa
MD5 937c2b5cb7baf038db58d4aaf31d9bcc
BLAKE2b-256 13a6e9c3cf2f391bc5d4b963bcc4b3e3d7daebcdca10ed6ce37e6a9c69eefa3a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page