SparseMixer Algorithm
Project description
SparseMixer
Sparse Backpropagation for Mixture-of-Expert Training
Mixture-of-Experts • SparseMixer • How to Use? • Examples • Citation • License
sparsemixer, a scalable gradient estimator, bridges the gap between backpropagation and sparse expert routing.
What is Mixture-of-Experts?
The significant success of large-scale pre-training across various applications has underscored the need for scalable models that are economically feasible. Recent advances in sparsely activated networks, prominently known as Mixture-of-Experts (MoE), have attracted widespread interest. Unlike traditional networks that densely activate all modules for every input, MoE selectively activates only a subset of modules for each input through a process called expert routing, leading to notable efficiency gains.

Numerous methods have emerged to bridge discrete decisions and backpropagation, and most of them are based on Straight-Through (ST) estimators. Unfortunately, existing ST estimators are incompatible with MoE, since they require activating all experts to compute gradients, thereby eliminating the efficiency improvements of MoE. Consequently, typical MoE training strategically neglects the gradient computation for routing, trading part of the training signal for sparse computation, as illustrated in the sketch below. Despite the scalability brought by sparse computation, this trade-off may result in slow convergence and improperly trained models.
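To make the trade-off concrete, here is a minimal sketch (our illustration, not part of the `sparsemixer` package) of Switch-style top-1 routing in PyTorch. Because `argmax` is non-differentiable, the only gradient the router receives comes from scaling the chosen expert's output by its gate probability; the discrete routing decision itself gets no training signal.

```python
# Minimal sketch of Switch-style top-1 MoE routing (illustrative only,
# not the sparsemixer API).
import torch
import torch.nn as nn


class Top1MoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Routing probabilities over experts for each token.
        probs = torch.softmax(self.router(x), dim=-1)
        # Discrete expert choice: argmax is non-differentiable, so the
        # routing decision itself contributes no gradient.
        expert_idx = probs.argmax(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Scaling by the gate probability is the only path through
                # which the router receives any gradient signal.
                gate = probs[mask, i].unsqueeze(-1)
                out[mask] = gate * expert(x[mask])
        return out
```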
Backpropagation Made Sparse
We propose SparseMixer, a scalable gradient estimator that bridges the gap between backpropagation and sparse expert routing. Grounded in a numerical ODE framework, SparseMixer harnesses the mid-point method, a second-order ODE solver, to deliver precise gradient approximations with negligible computational overhead. Applied to Switch Transformer on both pre-training and machine translation tasks, SparseMixer delivers considerable performance gains, accelerating training convergence by up to 2x.
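As a rough intuition for why a second-order scheme helps, the snippet below (our illustration, unrelated to the package internals) compares a first-order (Euler) and a mid-point estimate of the change f(1) - f(0) using only derivative evaluations; the mid-point rule lands much closer to the exact value.

```python
# First-order (Euler) vs. second-order (mid-point) estimates of f(1) - f(0)
# from derivative evaluations; a toy illustration, not SparseMixer itself.
import math


def f(x: float) -> float:
    return math.exp(x)


def df(x: float) -> float:
    return math.exp(x)


true_change = f(1.0) - f(0.0)   # exact: e - 1 ≈ 1.7183
euler = df(0.0) * 1.0           # first-order: f'(0) * h ≈ 1.0000
midpoint = df(0.5) * 1.0        # second-order: f'(0.5) * h ≈ 1.6487

print(f"exact={true_change:.4f} euler={euler:.4f} midpoint={midpoint:.4f}")
```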
How to use?
sparsemixer can be installed via pip:

pip install sparsemixer
Examples
Please check the example folder for a working example.
Citation
Please cite the following papers if you find our model useful. Thanks!
Liyuan Liu, Jianfeng Gao, and Weizhu Chen (2023). Sparse Backpropagation for MoE Training. ArXiv, abs/2310.00811.
@inproceedings{liu2023sparse,
title={Sparse Backpropagation for MoE Training},
author = {Liu, Liyuan and Gao, Jianfeng and Chen, Weizhu},
booktitle = {arXiv:2310.00811 [cs]},
year={2023}
}
Liyuan Liu, Chengyu Dong, Xiaodong Liu, Bin Yu, and Jianfeng Gao (2023). Bridging Discrete and Backpropagation: Straight-Through and Beyond. ArXiv, abs/2304.08612.
@inproceedings{liu2023bridging,
title={Bridging Discrete and Backpropagation: Straight-Through and Beyond},
author = {Liu, Liyuan and Dong, Chengyu and Liu, Xiaodong and Yu, Bin and Gao, Jianfeng},
booktitle = {arXiv:2304.08612 [cs]},
year={2023}
}
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file sparsemixer-0.0.0.tar.gz.
File metadata
- Download URL: sparsemixer-0.0.0.tar.gz
- Upload date:
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | f2dd1948ee1a6fb489b3a3d7c94c2dfffe87bf28a21a25e92aaba8cc7343dcdd
MD5 | 958f2c2581c373ab6988538305c246c0
BLAKE2b-256 | 01ec3abe7d838d6aef2495d137666dae2a1ceec3b71f7a708e912073d2930bc4
File details
Details for the file sparsemixer-0.0.0-py2.py3-none-any.whl.
File metadata
- Download URL: sparsemixer-0.0.0-py2.py3-none-any.whl
- Upload date:
- Size: 4.8 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 56b60c288528cdc5d8a251a55333ea96733acf3835441656cb985998d6aa3ff4
MD5 | 826d9bc04ddf5710558feb4da7b72e65
BLAKE2b-256 | dc1f8596bcf5a70d772c3e71fee83f0968b378012ec047fa6fa95a9a859d0766