Last released Jun 6, 2025
A package for massive serving
Last released Oct 9, 2023
A fast gradient checkpointing strategy for training with memory-efficient attention (e.g., FlashAttention).
Supported by