SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
SonicMoE is a simple but blazing-fast Mixture-of-Experts (MoE) implementation optimized for NVIDIA Hopper and Blackwell architecture GPUs. It leverages CuTeDSL and Triton to deliver state-of-the-art performance through IO-aware optimizations. The figures in the repository give an overview of activation memory usage and training throughput on Hopper GPUs (H100) and Blackwell GPUs (B300). The current version of SonicMoE builds on the Grouped GEMM kernels from the QuACK library, which is itself built on CUTLASS.
News
- 04/19/2026: We release SonicMoE with Blackwell (SM100) support, built on QuACK's Grouped GEMM kernels.
📦 Installation
Prerequisites
- NVIDIA Hopper GPUs (H100, H200, etc.) or Blackwell GPUs (GB200, B200, B300, etc.)
- CUDA 12.9+ (13.0+ for B300 GPUs)
- Python 3.12+ recommended
- PyTorch 2.7+ (2.9.1 recommended)
B300 users: please manually upgrade Triton to 3.6.0 after installing PyTorch.
Install from pip
pip install sonic-moe
Install from Source
# Clone the repository
git clone https://github.com/Dao-AILab/sonic-moe.git
cd sonic-moe
# Install dependencies
pip install -r requirements.txt
# Install SonicMoE
pip install -e .
🎯 Quick Start
Basic Usage
import torch
from sonicmoe import MoE, KernelBackendMoE
from sonicmoe.enums import ActivationType
# Create MoE layer
moe = MoE(
    num_experts=128,                            # number of experts
    num_experts_per_tok=8,                      # top-k experts per token
    hidden_size=4096,                           # hidden dimension
    intermediate_size=1536,                     # expert intermediate size
    activation_function=ActivationType.SWIGLU,  # SwiGLU activation
    add_bias=False,                             # add bias to linear layers
    std=0.02,                                   # weight initialization std
).to(device="cuda", dtype=torch.bfloat16)
# Forward pass
x = torch.randn(32768, 4096, device="cuda", dtype=torch.bfloat16)
output, aux_loss = moe(x, kernel_backend_moe=KernelBackendMoE.sonicmoe)
🧪 Testing
Run the test suite to verify correctness:
make test
Example usage
- SonicMoE with TC top-K routing (softmax-over-topk, or softmax(topk(logits))) and interleaved weight layout format for up-proj weights:

  python benchmarks/moe-cute.py --thiek 32768,4096,1024,128,8 --activation swiglu
- SonicMoE with Qwen3-style routing (topk-over-softmax, or topk(softmax(logits))) with top-k probability renormalization and interleaved weight layout format for up-proj weights:

  python benchmarks/moe-cute.py --thiek 32768,4096,1024,128,8 --topk_over_softmax --norm_topk_probs
- SonicMoE with token rounding routing (SwiGLU activation) and interleaved weight layout format for up-proj weights:

  python benchmarks/moe-token-rounding.py --routing nr --thiekq 16384,4096,1024,256,8,128
- SonicMoE with concatenated weight layout format for up-proj weights.

  By default, SonicMoE expects w1 (the gated up-projection weights) in interleaved format: [gate_0, up_0, gate_1, up_1, ...]. HuggingFace models (Qwen3, Mixtral, DeepSeek, etc.) store gate_up_proj in concatenated format: [gate_0, gate_1, ..., gate_{I-1}, up_0, up_1, ..., up_{I-1}].

  # Concatenated weight layout format with TC top-K routing
  python benchmarks/moe-cute.py --thiek 32768,4096,1024,128,8 --concat_layout
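The two routing variants above differ only in whether top-k selection happens before or after the softmax. A minimal pure-Python sketch of both (illustrative only; the function names are ours, and SonicMoE's actual kernels perform this on-GPU):

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def topk_indices(xs, k):
    # indices of the k largest entries
    return sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)[:k]

def softmax_over_topk(logits, k):
    # TC top-K style: select the top-k logits first, then softmax over them,
    # so the selected probabilities always sum to 1
    idx = topk_indices(logits, k)
    probs = softmax([logits[i] for i in idx])
    return list(zip(idx, probs))

def topk_over_softmax(logits, k, renormalize=True):
    # Qwen3 style: softmax over all logits, then keep the top-k probabilities;
    # without renormalization they sum to less than 1
    probs = softmax(logits)
    idx = topk_indices(probs, k)
    sel = [probs[i] for i in idx]
    if renormalize:
        s = sum(sel)
        sel = [p / s for p in sel]
    return list(zip(idx, sel))
```

With renormalization enabled (the `--norm_topk_probs` flag), both variants produce expert weights that sum to 1, but the actual values differ because the softmax denominator covers a different set of logits.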
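If you prefer the interleaved layout, a HuggingFace-style concatenated gate_up_proj can be re-ordered once at load time. A row-level sketch (the function name and list-of-rows representation are ours; in practice this would be a tensor reshape/permute on the weight matrix):

```python
def concat_to_interleaved(rows):
    """Reorder [gate_0..gate_{I-1}, up_0..up_{I-1}] into
    [gate_0, up_0, gate_1, up_1, ...]."""
    assert len(rows) % 2 == 0, "expected an even number of rows (gate + up)"
    half = len(rows) // 2
    gate, up = rows[:half], rows[half:]
    out = []
    for g, u in zip(gate, up):
        out.append(g)  # gate_i
        out.append(u)  # up_i
    return out

# Tiny example with labeled rows standing in for weight rows:
# concat_to_interleaved(["g0", "g1", "u0", "u1"]) -> ["g0", "u0", "g1", "u1"]
```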
🤝 Contributing
We welcome contributions! Please feel free to submit issues, feature requests, or pull requests.
📄 License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
📚 Citation
If you use SonicMoE in your research, please cite:
@misc{guo2025sonicmoeacceleratingmoeio,
  title={SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations},
  author={Wentao Guo and Mayank Mishra and Xinle Cheng and Ion Stoica and Tri Dao},
  year={2025},
  eprint={2512.14080},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2512.14080},
}