Last released Apr 11, 2024
A framework for efficient fault tolerance in large-scale distributed training using pipeline templates.
Last released Apr 10, 2024
A versatile model toolkit for distributed training.