Last released May 2, 2026
Serving LLMs at Scale
Last released Sep 1, 2024
Forward-only flash-attn with CUDA 12.4