:robot: MegaBlocks
MegaBlocks is a lightweight library for mixture-of-experts (MoE) training. The core of the system is efficient "dropless-MoE" (dMoE, paper) and standard MoE layers.
MegaBlocks is built on top of Megatron-LM, where we support data, expert, and pipeline parallel training of MoEs. We're working on adding support for more frameworks.
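As a rough illustration, a minimal sketch of constructing a standalone dMoE layer. The module and argument names used here (`dMoE`, `Arguments`, `moe_num_experts`, `moe_top_k`) reflect one reading of the library's layout and may differ across versions; treat this as a sketch, not the definitive API.

```python
# Minimal sketch: a standalone dMoE layer (names are assumptions and
# may vary by MegaBlocks version).
import torch
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

args = Arguments(
    hidden_size=1024,      # model (token) dimension
    ffn_hidden_size=4096,  # per-expert feed-forward dimension
    moe_num_experts=8,     # number of experts
    moe_top_k=1,           # experts activated per token
)

# The block-sparse kernels currently target A100 GPUs and half precision.
layer = dMoE(args).cuda().half()
x = torch.randn(1, 512, 1024, device="cuda", dtype=torch.half)
out = layer(x)  # depending on version, this may return (output, bias)
```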
:rocket: Performance
MegaBlocks dMoEs outperform MoEs trained with Tutel by up to 40% compared to Tutel's best performing `capacity_factor` configuration. MegaBlocks dMoEs use a reformulation of MoEs in terms of block-sparse operations, which allows us to avoid token dropping without sacrificing hardware efficiency. In addition to being faster, MegaBlocks simplifies MoE training by removing the `capacity_factor` hyperparameter altogether. Compared to dense Transformers trained with Megatron-LM, MegaBlocks dMoEs can accelerate training by as much as 2.4x. Check out our paper for more details!
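To make the `capacity_factor` tradeoff concrete, here is a small self-contained illustration in plain PyTorch. This is not MegaBlocks code; all names and numbers are for exposition only.

```python
# Illustration only (not MegaBlocks code): how a fixed capacity_factor
# forces capacity-based MoEs to drop tokens under imbalanced routing.
import torch

num_tokens, num_experts, capacity_factor = 4096, 8, 1.25
# Each expert can process at most `capacity` tokens per batch.
capacity = int(capacity_factor * num_tokens / num_experts)

# Simulate an imbalanced top-1 routing decision.
probs = torch.tensor([0.30, 0.20, 0.10, 0.10, 0.10, 0.10, 0.05, 0.05])
assignments = torch.multinomial(probs, num_tokens, replacement=True)
counts = torch.bincount(assignments, minlength=num_experts)

# Tokens routed beyond an expert's capacity are silently dropped.
dropped = (counts - capacity).clamp(min=0).sum().item()
print(f"capacity per expert: {capacity}, tokens dropped: {dropped}")

# A dMoE instead sizes each expert's block-sparse computation to the
# actual `counts`, so no tokens are dropped and no padding compute is
# wasted on underloaded experts.
```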
:building_construction: Installation
We recommend using NGC's `nvcr.io/nvidia/pytorch:23.01-py3` PyTorch container. The Dockerfile builds on this image with additional dependencies. To build the image, run `docker build . -t megablocks-dev` and then `bash docker.sh` to launch the container.
Note that the block-sparse kernels used to implement dMoE are currently limited to A100 GPUs.
:steam_locomotive: Usage
We provide scripts for pre-training Transformer MoE and dMoE language models under the top-level directory. The quickest way to get started is to use one of the experiment launch scripts. These scripts require a dataset in Megatron-LM's format, which can be created by following their instructions.
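For orientation, here is a hypothetical sketch of where an MoE layer slots into a Transformer block when pre-training a language model. This is plain PyTorch rather than the library's actual training code: `moe_ffn` and `TransformerBlock` are placeholder names, and `moe_ffn` stands in for a MegaBlocks MoE/dMoE module (or any hidden-to-hidden layer).

```python
# Hypothetical sketch: an MoE layer replacing the dense feed-forward
# network in a pre-norm Transformer block. Names are illustrative.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int, moe_ffn: nn.Module):
        super().__init__()
        self.norm1 = nn.LayerNorm(hidden_size)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(hidden_size)
        # The MoE/dMoE layer takes the place of the dense FFN. A MegaBlocks
        # layer that returns (output, bias) would need a thin wrapper here.
        self.ffn = moe_ffn

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        return x + self.ffn(self.norm2(x))

# Usage with a dense FFN stand-in (swap in an MoE/dMoE module instead):
ffn = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
block = TransformerBlock(hidden_size=1024, num_heads=16, moe_ffn=ffn)
y = block(torch.randn(2, 128, 1024))
```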
:writing_hand: Citation
@article{megablocks-arxiv,
  author  = {Trevor Gale and Deepak Narayanan and Cliff Young and Matei Zaharia},
  title   = {MegaBlocks: Efficient Sparse Training with Mixture-of-Experts},
  journal = {CoRR},
  volume  = {abs/2211.15841},
  year    = {2022},
}