Printf-style debugging for Metal compute shaders
Project description
metal-debug
Printf-style debugging for Metal compute shaders. No Xcode GPU debugger, no buffer dumps, no guessing.
Add #include "metal_debug.h" to your shader, drop in a debug buffer, see what every thread computed.
#include "metal_debug.h"
kernel void my_kernel(
device float *A [[buffer(0)]],
device float *B [[buffer(1)]],
device float *C [[buffer(2)]],
device uint *dbg_buf [[buffer(30)]],
uint id [[thread_position_in_grid]]
) {
float a = A[id], b = B[id];
dbg_printf(dbg_buf, id, 0, a); // log input A
dbg_printf(dbg_buf, id, 1, b); // log input B
float result = a * b;
dbg_watch_nan(dbg_buf, id, 2, result); // only logs if NaN/Inf
dbg_assert(dbg_buf, id, 3, result > 0); // GPU-side assertion
C[id] = result;
}
Host side (ObjC):
MetalDebugSession *dbg = [[MetalDebugSession alloc] initWithDevice:device maxEntries:4096];
[encoder setBuffer:dbg.buffer offset:0 atIndex:30];
// ... dispatch ...
[dbg dump];
Output:
[metal-debug] 24 entries
thread[0] 0: 3.5
thread[0] 1: 2.0
thread[1] 0: 1.2
thread[1] 1: -0.5
thread[1] 3: ASSERTION FAILED
Features
| Feature | GPU API | Description |
|---|---|---|
| Printf | dbg_printf(buf, tid, tag, val) |
Log float/int/uint/half/vec values |
| Conditional | dbg_printf_if(buf, cond, tid, tag, val) |
Only log when condition is true |
| NaN watchpoint | dbg_watch_nan(buf, tid, tag, val) |
Log only NaN/Inf values |
| Range watchpoint | dbg_watch_range(buf, tid, tag, val, lo, hi) |
Log values outside range |
| Assertions | dbg_assert(buf, tid, tag, cond) |
Record assertion failures |
| Breakpoints | dbg_break(buf, tid, tag, cond) |
Set flag for host to detect |
| Stats | dbg_stats(buf, tag, val) |
Cross-thread min/max/mean/count |
| Histogram | dbg_histogram(buf, tag, val, lo, hi) |
Value distribution with bar chart |
| Named tags | Preprocessor or host-side | "loss" instead of tag=42 |
| 2D grid view | Host-side | Display values as threadgroup grid |
| Diff mode | Host-side | Compare two kernel runs |
| Zero-overhead disable | #define METAL_DEBUG_DISABLE |
Compiles out all debug calls |
How it works
-
GPU side:
metal_debug.his a single header. Debug calls write(thread_id, tag, type, value)entries into a device buffer using atomic counters. -
Host side:
MetalDebugSessionallocates the buffer, binds it at slot 30, and reads/formats entries after execution. -
No recompilation needed when changing buffer size —
max_entriesis stored in the buffer itself and read by the GPU at runtime.
Build & test
git clone <this repo>
cd metal-debug
make test # compiles + runs 9 test kernels on your GPU
Integration
ObjC / C++
Copy src/metal_debug.h into your project. Link runtime/MetalDebugSession.{h,m} into your app.
#import "MetalDebugSession.h"
MetalDebugSession *dbg = [[MetalDebugSession alloc]
initWithDevice:device maxEntries:4096];
[dbg setName:@"loss" forTag:0];
[encoder setBuffer:dbg.buffer offset:0 atIndex:30];
// dispatch kernel...
[dbg dump]; // all entries, sorted by thread
[dbg dumpTag:0]; // filter by tag
[dbg dumpGrid:0 width:8 height:8]; // 2D threadgroup view
[dbg dumpStats:0]; // min/max/mean
[dbg dumpHistogram:0 lo:0 hi:1]; // value distribution
if ([dbg breakpointHit])
[dbg dumpBreakpoint]; // what went wrong
[dbg reset]; // reuse for next dispatch
Swift
import Metal
let dbg = MetalDebugSession(device: device, maxEntries: 4096)
encoder.setBuffer(dbg.buffer, offset: 0, index: 30)
// dispatch kernel...
dbg.dump()
See examples/SwiftDemo/ for a complete Swift example.
Python (PyTorch MPS / Triton)
from metal_debug import MetalDebugSession
dbg = MetalDebugSession(max_entries=4096)
# pass dbg.tensor as buffer(30) to your Metal/Triton kernel
torch.mps.synchronize()
dbg.dump()
Interactive TUI
Explore debug traces interactively — filter, navigate, see grid views and stats live:
pip install textual
# Launch with demo data
python python/tui.py --demo
# Launch with a debug buffer dump
python python/tui.py trace.bin
Or from Python after a kernel dispatch:
dbg.explore(grid_width=8, grid_height=8)
Keyboard shortcuts:
| Key | Action |
|---|---|
↑/↓ |
Navigate entries |
g |
Show 2D grid for selected tag |
a |
Show assertions only |
b |
Jump to breakpoint thread |
c |
Clear filters |
m |
Toggle mouse (enable copy/paste) |
escape |
Focus table from filter input |
q |
Quit |
Source preprocessor
Auto-inject the debug buffer parameter into kernel signatures and use string tags:
python3 src/metal_debug_preprocess.py my_kernel.metal -o my_kernel_debug.metal
xcrun metal -I path/to/metal-debug/src -o out.metallib my_kernel_debug.metal
Before:
kernel void foo(device float *A [[buffer(0)]], uint id [[thread_position_in_grid]]) {
dbg(id, "value", A[id]);
}
After preprocessing:
kernel void foo(device float *A [[buffer(0)]], uint id [[thread_position_in_grid]],
device uint *_dbg_buf [[buffer(30)]]) {
dbg(id, 47248/*value*/, A[id]);
}
Convenience macros
If you use DBG_PARAM in your kernel signature, the short macros work:
kernel void my_kernel(device float *A [[buffer(0)]], DBG_PARAM,
uint id [[thread_position_in_grid]]) {
dbg(id, 0, A[id]); // printf
dbg_if(id == 0, id, 1, A[id]); // conditional
dbg_nan(id, 2, A[id]); // NaN watchpoint
dbg_check(id, 3, A[id] > 0); // assertion
dbg_stat(0, A[id]); // stats accumulator
dbg_hist(0, A[id], 0, 100); // histogram
dbg_brk(id, 4, A[id] < 0); // breakpoint
}
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file metal_debug-0.1.0.tar.gz.
File metadata
- Download URL: metal_debug-0.1.0.tar.gz
- Upload date:
- Size: 8.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c7e088fea081a31a193abcf7c24d8706ece8352cc7fc90b917318e656c40ab96
|
|
| MD5 |
ac889d93be84ee73453dfd199e532728
|
|
| BLAKE2b-256 |
c5242d8aa2d05141c18af942b8d6d8c94f07e75ca8c1c28165b9c9059b0127ad
|
File details
Details for the file metal_debug-0.1.0-py3-none-any.whl.
File metadata
- Download URL: metal_debug-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
12eecaa3201712ac8ec55af0d322ac424103ed981334d61b21f6da217d7887aa
|
|
| MD5 |
937c2b5cb7baf038db58d4aaf31d9bcc
|
|
| BLAKE2b-256 |
13a6e9c3cf2f391bc5d4b963bcc4b3e3d7daebcdca10ed6ce37e6a9c69eefa3a
|