
:robot: MegaBlocks

MegaBlocks is a lightweight library for mixture-of-experts (MoE) training. The core of the system is its efficient "dropless-MoE" (dMoE; see the paper cited below) and standard MoE layers.

MegaBlocks is integrated with Megatron-LM, where we support data-, expert-, and pipeline-parallel training of MoEs. Stay tuned for tighter integration with Databricks libraries and tools!

:rocket: Performance

[Figure: MegaBlocks performance comparison]

MegaBlocks dMoEs outperform MoEs trained with Tutel by up to 40% relative to Tutel's best-performing capacity_factor configuration. MegaBlocks dMoEs use a reformulation of MoEs in terms of block-sparse operations, which allows us to avoid token dropping without sacrificing hardware efficiency. In addition to being faster, MegaBlocks simplifies MoE training by removing the capacity_factor hyperparameter altogether. Compared to dense Transformers trained with Megatron-LM, MegaBlocks dMoEs can accelerate training by as much as 2.4x. Check out our paper for more details!
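For intuition, the computation that the block-sparse kernels perform is equivalent to a per-expert loop over however many tokens each expert receives. The sketch below is a conceptual reference in plain PyTorch, assuming simple ReLU-MLP experts and a top-k softmax router; it is not the MegaBlocks implementation, which fuses this work into block-sparse matrix multiplications.

# Conceptual sketch, not the MegaBlocks kernels: a "dropless" MoE forward pass
# written as a per-expert loop. Every routed token is processed, so there is no
# capacity_factor and no token dropping; MegaBlocks computes the same math with
# block-sparse matrix multiplications for hardware efficiency.
import torch

def dropless_moe_reference(x, router_logits, w1, w2, top_k=2):
    """x: [tokens, hidden]; router_logits: [tokens, experts];
    w1: [experts, hidden, ffn]; w2: [experts, ffn, hidden]."""
    num_experts = w1.shape[0]
    # Route each token to its top-k experts.
    weights, experts = torch.topk(torch.softmax(router_logits, dim=-1), top_k)
    out = torch.zeros_like(x)
    for e in range(num_experts):
        token_idx, k_idx = torch.where(experts == e)
        if token_idx.numel() == 0:
            continue  # this expert received no tokens in this batch
        # Process however many tokens landed on this expert (variable-sized GEMM).
        h = torch.relu(x[token_idx] @ w1[e])
        out[token_idx] += weights[token_idx, k_idx, None] * (h @ w2[e])
    return out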

:building_construction: Installation

NOTE: This assumes you have numpy and torch installed.

Training models with Megatron-LM: We recommend using NGC's nvcr.io/nvidia/pytorch:23.09-py3 PyTorch container. The Dockerfile builds on this image with additional dependencies. To build the image, run docker build . -t megablocks-dev, then run bash docker.sh to launch the container. Once inside the container, install MegaBlocks by running pip install . from the repository root. See Usage for instructions on training MoEs with MegaBlocks + Megatron-LM.

Using MegaBlocks in other packages: To install the MegaBlocks package for use in other frameworks, run pip install megablocks. For example, Mixtral-8x7B can be run with vLLM + MegaBlocks using this installation method.
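As a quick illustration of direct use, the sketch below constructs a dMoE layer from the pip-installed package. It assumes the megablocks.layers API (Arguments, dMoE) and common field names; exact defaults, dtypes, and return values vary across versions, so treat it as a starting point rather than a definitive recipe.

# Minimal sketch of building a standalone dMoE layer; field names and return
# values are assumptions that may differ across MegaBlocks versions.
import torch
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

args = Arguments(
    hidden_size=1024,      # token embedding dimension
    ffn_hidden_size=4096,  # per-expert feed-forward dimension
    moe_num_experts=8,     # total number of experts
    moe_top_k=2,           # experts activated per token
)
layer = dMoE(args)  # the block-sparse kernels expect a CUDA device at runtime

x = torch.randn(2048, 1, 1024)  # e.g. [sequence, batch, hidden] in Megatron layout
out = layer(x)  # some versions return (output, bias) for Megatron-LM compatibility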

Extras: MegaBlocks has optional dependencies that enable additional features.

Installing megablocks[gg] enables dMoE computation with grouped GEMM. This feature is enabled by setting the mlp_impl argument to grouped (a configuration sketch follows the Extras below). This is currently our recommended path for Hopper-generation GPUs.

Installing megablocks[dev] allows you to contribute to MegaBlocks and test locally. Installing megablocks[testing] allows you to test via GitHub Actions. If you've installed megablocks[dev], you can run pre-commit install to configure the pre-commit hook to automatically format the code.

MegaBlocks can be installed with all dependencies (except for testing) via the megablocks[all] package.
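The configuration sketch referenced above shows where mlp_impl fits; aside from that field, the values are illustrative placeholders, and the Arguments field names are assumed to match the installed version.

# Sketch: selecting the grouped-GEMM MLP path after pip install megablocks[gg].
# All values other than mlp_impl are illustrative placeholders.
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

args = Arguments(
    hidden_size=4096,
    ffn_hidden_size=14336,
    moe_num_experts=8,
    moe_top_k=2,
    mlp_impl="grouped",  # use grouped GEMM instead of the default block-sparse path
)
layer = dMoE(args)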

:steam_locomotive: Usage

We provide scripts for pre-training Transformer MoE and dMoE language models under the top-level directory. The quickest way to get started is to use one of the experiment launch scripts. These scripts require a dataset in Megatron-LM's format, which can be created by following their instructions.

:writing_hand: Citation

@article{megablocks,
  title={{MegaBlocks: Efficient Sparse Training with Mixture-of-Experts}},
  author={Trevor Gale and Deepak Narayanan and Cliff Young and Matei Zaharia},
  journal={Proceedings of Machine Learning and Systems},
  volume={5},
  year={2023}
}


