Last released Mar 14, 2026
Stream transformer blocks layer-by-layer from disk to GPU with a pipelined prefetch queue (disk → CPU RAM → pinned RAM → GPU).
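The pipelined prefetch idea in this description can be illustrated with a minimal sketch, assuming one worker thread per stage and bounded queues between stages. This is not the package's API: `stream_layers`, `load`, `pin`, `to_device`, and `depth` are hypothetical names, and the stage callables stand in for the real disk read, pinned-memory copy, and host-to-device transfer.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def _stage(fn, inbox, outbox):
    """Apply fn to each item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(fn(item))


def stream_layers(layer_ids, load, pin, to_device, depth=2):
    """Yield layers in order while later layers are prefetched in the background.

    load, pin and to_device are hypothetical placeholders for the
    disk -> CPU RAM, CPU RAM -> pinned RAM and pinned RAM -> GPU steps.
    Each queue holds at most `depth` items, which bounds memory per stage.
    """
    q_ids, q_cpu, q_pinned, q_dev = (queue.Queue(maxsize=depth) for _ in range(4))
    stages = [(load, q_ids, q_cpu), (pin, q_cpu, q_pinned), (to_device, q_pinned, q_dev)]
    for fn, inbox, outbox in stages:
        threading.Thread(target=_stage, args=(fn, inbox, outbox), daemon=True).start()

    def feed():
        for layer_id in layer_ids:
            q_ids.put(layer_id)
        q_ids.put(_DONE)

    threading.Thread(target=feed, daemon=True).start()

    # One thread per stage and FIFO queues keep the layers in their original order.
    while True:
        layer = q_dev.get()
        if layer is _DONE:
            return
        yield layer


# Usage with stand-in stages that only tag where the layer currently "lives".
layers = list(
    stream_layers(
        range(4),
        load=lambda i: ("cpu", i),
        pin=lambda x: ("pinned", x[1]),
        to_device=lambda x: ("gpu", x[1]),
    )
)
```

Because every queue is bounded, a slow consumer applies backpressure to the earlier stages instead of letting loaded layers accumulate in memory.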
Last released Mar 13, 2026
Lightweight runtime correctness checker for custom CUDA/Triton kernels via statistical sampling and outlier-aware comparison.
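A sampling-based check with an outlier budget might look roughly like the sketch below. It is an illustration under stated assumptions, not the package's API: `sampled_close` and all of its parameters are hypothetical, and plain Python sequences stand in for the kernel output and the reference output.

```python
import math
import random


def sampled_close(candidate, reference, n_samples=256, rtol=1e-3, atol=1e-5,
                  max_outlier_frac=0.01, seed=0):
    """Compare a random sample of elements and tolerate a small share of outliers.

    Hypothetical helper: an element is an outlier when it differs from the
    reference by more than atol + rtol * |reference|. The check passes when
    the outlier share of the sample is at most max_outlier_frac.
    """
    if len(candidate) != len(reference):
        raise ValueError("candidate and reference must have the same length")
    n = len(reference)
    if n == 0:
        return True

    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    indices = rng.sample(range(n), min(n_samples, n))

    outliers = 0
    for i in indices:
        c, r = candidate[i], reference[i]
        if math.isfinite(c) and math.isfinite(r):
            bad = abs(c - r) > atol + rtol * abs(r)
        else:
            # NaN never equals itself, so any NaN counts as a mismatch;
            # infinities only match an identical infinity.
            bad = c != r
        outliers += bad
    return outliers <= max_outlier_frac * len(indices)


# Usage: a reference, an exact copy, and a copy shifted by a constant offset.
reference = [float(i) for i in range(1000)]
exact_ok = sampled_close(reference, reference)
shifted_ok = sampled_close([x + 1.0 for x in reference], reference)
```

Sampling keeps the cost independent of tensor size, at the price of possibly missing rare errors; the outlier budget trades strictness for robustness to isolated numerical noise.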