SparseMixer Algorithm

Project description

PyTorch GitHub

SparseMixer

Sparse Backpropagation for Mixture-of-Expert Training

Mixture-of-Experts | SparseMixer | How to Use? | Examples | Citation | License

SparseMixer, a scalable gradient estimator, bridges the gap between backpropagation and sparse expert routing.

What is Mixture-of-Experts?

The significant success of large-scale pre-training across various applications has underscored the need for scalable models that are economically feasible. Recent advances in sparsely activated networks, prominently known as Mixture-of-Experts (MoE), have attracted widespread interest. Unlike traditional networks that densely activate all modules for every input, MoE selectively activates only a subset of modules for each input through a process called expert routing, leading to notable efficiency gains.
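To make expert routing concrete, below is a minimal sketch of top-1 routing in the style of Switch Transformer. The module and all names in it are illustrative assumptions, not part of the sparsemixer package.

# Minimal, illustrative sketch of top-1 expert routing (Switch Transformer
# style); names are assumptions for illustration, not the sparsemixer API.
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    def __init__(self, d_model, n_experts):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # produces routing logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):                              # x: (tokens, d_model)
        probs = torch.softmax(self.router(x), dim=-1)  # (tokens, n_experts)
        gate, idx = probs.max(dim=-1)                  # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():                             # only the chosen expert runs on these tokens
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

Note that only the probability of the selected expert receives a gradient here; the discrete routing decision itself (the argmax) is non-differentiable, which is exactly the gap discussed next.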

Numerous methods have emerged to bridge discrete decisions and backpropagation, most of them based on Straight-Through (ST). Unfortunately, all existing ST estimators are incompatible with MoE, since they require activating all experts for gradient computation, thereby eliminating the efficiency improvements of MoE. Consequently, typical MoE training strategically neglects the gradient computation for routing, trading certain training signals for sparse computation. Despite the scalability brought by sparse computation, this trade-off may result in slow convergence and improperly trained models.
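For intuition, the sketch below applies the standard Straight-Through trick to the routing decision (the function name and shapes are illustrative assumptions, not the sparsemixer implementation): the forward pass uses the hard one-hot choice, while the backward pass flows through the soft probabilities, which is only well-defined if every expert's output has been computed.

# Illustrative sketch (not the sparsemixer implementation): the Straight-Through
# trick on the routing decision. The backward pass flows through the softmax
# probabilities, so the gradient touches every expert's output -- forcing dense
# computation and defeating the purpose of sparse routing.
import torch
import torch.nn.functional as F

def st_route(logits, expert_outputs):
    # logits: (tokens, n_experts); expert_outputs: (tokens, n_experts, d_model)
    probs = F.softmax(logits, dim=-1)
    hard = F.one_hot(probs.argmax(dim=-1), logits.shape[-1]).float()
    mix = hard + probs - probs.detach()  # forward: hard one-hot; backward: soft probs
    return torch.einsum('te,ted->td', mix, expert_outputs)

Switch Transformer-style training, for example, avoids this dense computation by backpropagating only through the gate value of the selected expert and neglecting the gradient of the discrete routing decision itself.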

Backpropagation Made Sparse

We propose SparseMixer, a scalable gradient estimator that bridges the gap between backpropagation and sparse expert routing. Grounded in a numerical ODE framework, SparseMixer harnesses the mid-point method, a second-order ODE solver, to deliver precise gradient approximations with negligible computational overhead. Applied to Switch Transformer on both pre-training and machine translation tasks, SparseMixer showcases considerable performance gains, accelerating training convergence by up to 2x.
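For readers unfamiliar with the term, the mid-point method is the classic second-order Runge-Kutta scheme; the toy below illustrates it on an ordinary ODE, dy/dt = -y. It is not the SparseMixer estimator itself, which applies the same second-order idea to approximating the routing gradient.

# The mid-point method (second-order Runge-Kutta): evaluate the slope at the
# middle of a step rather than at its start, gaining an order of accuracy for
# one extra function evaluation. Toy example only, not the SparseMixer estimator.
def midpoint_step(f, t, y, h):
    k1 = f(t, y)                        # slope at the start of the step
    k2 = f(t + h / 2, y + h / 2 * k1)   # slope at the estimated midpoint
    return y + h * k2                   # second-order accurate update

y, t, h = 1.0, 0.0, 0.1
for _ in range(10):
    y = midpoint_step(lambda t, y: -y, t, y, h)
    t += h
print(y)  # ~0.36854, vs. the exact value exp(-1) ~ 0.36788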

How to use?

sparsemixer can be installed via pip:

pip install sparsemixer

Examples

Please check the example folder for a working example.

Citation

Please cite the following papers if you find our model useful. Thanks!

Liyuan Liu, Jianfeng Gao, and Weizhu Chen (2023). Sparse Backpropagation for MoE Training. ArXiv, abs/2310.00811.

@inproceedings{liu2023sparse,
  title={Sparse Backpropagation for MoE Training},
  author = {Liu, Liyuan and Gao, Jianfeng and Chen, Weizhu},
  booktitle = {arXiv:2310.00811 [cs]},
  year={2023}
}

Liyuan Liu, Chengyu Dong, Xiaodong Liu, Bin Yu, and Jianfeng Gao (2023). Bridging Discrete and Backpropagation: Straight-Through and Beyond. ArXiv, abs/2304.08612.

@inproceedings{liu2023bridging,
  title={Bridging Discrete and Backpropagation: Straight-Through and Beyond},
  author = {Liu, Liyuan and Dong, Chengyu and Liu, Xiaodong and Yu, Bin and Gao, Jianfeng},
  booktitle = {arXiv:2304.08612 [cs]},
  year={2023}
}

Download files

Download the file for your platform.

Source Distribution

  • sparsemixer-0.0.0.tar.gz (4.3 kB, Source)

Built Distribution

  • sparsemixer-0.0.0-py2.py3-none-any.whl (4.8 kB, Python 2 / Python 3)

File details

Details for the file sparsemixer-0.0.0.tar.gz.

File metadata

  • Download URL: sparsemixer-0.0.0.tar.gz
  • Upload date:
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.12

File hashes

Hashes for sparsemixer-0.0.0.tar.gz:

  • SHA256: f2dd1948ee1a6fb489b3a3d7c94c2dfffe87bf28a21a25e92aaba8cc7343dcdd
  • MD5: 958f2c2581c373ab6988538305c246c0
  • BLAKE2b-256: 01ec3abe7d838d6aef2495d137666dae2a1ceec3b71f7a708e912073d2930bc4

File details

Details for the file sparsemixer-0.0.0-py2.py3-none-any.whl.

File hashes

Hashes for sparsemixer-0.0.0-py2.py3-none-any.whl:

  • SHA256: 56b60c288528cdc5d8a251a55333ea96733acf3835441656cb985998d6aa3ff4
  • MD5: 826d9bc04ddf5710558feb4da7b72e65
  • BLAKE2b-256: dc1f8596bcf5a70d772c3e71fee83f0968b378012ec047fa6fa95a9a859d0766
