Last released Oct 9, 2023
A fast gradient checkpointing strategy for training with memory-efficient attention (e.g., FlashAttention).
Supported by