The one-stop solution to easily integrate MoE & MoD layers into custom PyTorch code.
PyTorch Mixtures
A plug-and-play module for Mixture-of-Experts and Mixture-of-Depths in PyTorch. Your one-stop solution for inserting MoE/MoD layers into custom neural networks effortlessly!
Features/Todo
- Mixture of Experts
  - Top-k Routing
  - Expert Choice Routing
  - Router z-loss
  - Load-balancing loss (both auxiliary losses are sketched right after this list)
  - Testing of all MoE protocols: finished
- Mixture of Depths
  - Capacity-based routing around the attention layer (a conceptual sketch follows the usage examples below)
  - Testing of the MoD protocol: finished
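For reference, the router z-loss and load-balancing loss listed above follow the standard formulations from the MoE literature. The sketch below is written in plain PyTorch to show what each term computes; the helper names are illustrative only and are not necessarily the ones exposed by pytorch-mixtures.

# Plain-PyTorch sketch of the two auxiliary losses listed above (standard
# formulations from the MoE literature); the helper names are illustrative
# and not necessarily the ones exposed by pytorch-mixtures.
import torch


def router_z_loss(router_logits):
    # router_logits: [num_tokens, num_experts]
    # Penalizes large router logits to keep the gating softmax numerically stable.
    z = torch.logsumexp(router_logits, dim=-1)  # [num_tokens]
    return (z ** 2).mean()


def load_balancing_loss(router_probs, expert_mask):
    # router_probs: [num_tokens, num_experts], softmax over router logits
    # expert_mask:  [num_tokens, num_experts], one-hot token-to-expert assignment
    num_experts = router_probs.shape[-1]
    fraction_routed = expert_mask.float().mean(dim=0)  # f_i: fraction of tokens sent to expert i
    mean_prob = router_probs.mean(dim=0)               # P_i: mean router probability for expert i
    # Encourages a uniform spread of tokens across experts; equals 1.0 at perfect balance.
    return num_experts * torch.sum(fraction_routed * mean_prob)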
Installation
Simply running pip3 install pytorch-mixtures will install this package. Note that torch and einops must be pre-installed as dependencies. If you would like to build this package from source, run the following commands:
git clone https://github.com/jaisidhsingh/pytorch-mixtures.git
cd pytorch-mixtures
pip3 install .
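To verify the installation, you can try importing one of the routers used in the examples below (an optional, quick sanity check):
python3 -c "from pytorch_mixtures.routing import ExpertChoiceRouter; print('ok')"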
Usage
pytorch-mixtures is designed to integrate effortlessly into your existing code for any neural network of your choice. For example:
from pytorch_mixtures.routing import ExpertChoiceRouter
from pytorch_mixtures.moe_layer import MoELayer
import torch
import torch.nn as nn
# define some config
BATCH_SIZE = 16
SEQ_LEN = 128
DIM = 768
NUM_EXPERTS = 8
CAPACITY_FACTOR = 1.25
# first initialize the router
router = ExpertChoiceRouter(dim=DIM, num_experts=NUM_EXPERTS)
# choose the experts you want: pytorch-mixtures just needs a list of `nn.Module` experts
# e.g., here the experts are just linear layers
experts = [nn.Linear(DIM, DIM) for _ in range(NUM_EXPERTS)]

# supply the router and experts to the MoELayer for modularity
moe = MoELayer(
    num_experts=NUM_EXPERTS,
    router=router,
    experts=experts,
    capacity_factor=CAPACITY_FACTOR
)

# initialize some test input
x = torch.randn(BATCH_SIZE, SEQ_LEN, DIM)

# pass through the MoE layer
moe_output = moe(x)  # shape: [BATCH_SIZE, SEQ_LEN, DIM]
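Since the MoE layer behaves like any other nn.Module, it drops straight into a normal training loop. Here is a minimal sketch reusing moe and x from above, assuming the layer returns a single output tensor as shown in the example:

# Minimal training-step sketch reusing moe and x from above. Assumes moe(x)
# returns one tensor of shape [BATCH_SIZE, SEQ_LEN, DIM], as in the example;
# adapt this if your configuration also returns auxiliary losses.
optimizer = torch.optim.AdamW(moe.parameters(), lr=1e-4)

target = torch.randn(BATCH_SIZE, SEQ_LEN, DIM)  # dummy regression target
loss = nn.functional.mse_loss(moe(x), target)

optimizer.zero_grad()
loss.backward()
optimizer.step()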
You can also use this easily within your own nn.Module classes:
from pytorch_mixtures.routing import ExpertChoiceRouter
from pytorch_mixtures.moe_layer import MoELayer
from pytorch_mixtures.utils import MHSA  # multi-head self-attention layer provided for convenience

import torch
import torch.nn as nn


class CustomMoEAttentionBlock(nn.Module):
    def __init__(self, dim, num_heads, num_experts, capacity_factor, experts):
        super().__init__()
        self.attn = MHSA(dim, num_heads)
        self.router = ExpertChoiceRouter(dim, num_experts)
        self.moe = MoELayer(
            num_experts=num_experts,
            router=self.router,
            experts=experts,
            capacity_factor=capacity_factor
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        x = self.norm1(self.attn(x) + x)
        x = self.norm2(self.moe(x) + x)
        return x
experts = [nn.Linear(768, 768) for _ in range(8)]

my_block = CustomMoEAttentionBlock(
    dim=768,
    num_heads=8,
    num_experts=8,
    capacity_factor=1.25,
    experts=experts
)

# some test input
x = torch.randn(16, 128, 768)
output = my_block(x)  # shape: [16, 128, 768]
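The Mixture-of-Depths feature listed above has no usage example in this README. As a rough illustration of what capacity-based routing around an attention layer means, here is a conceptual sketch in plain PyTorch; the NaiveMoDWrapper name and its interface are illustrative only and are not part of the pytorch-mixtures API.

# Conceptual sketch of capacity-based Mixture-of-Depths routing in plain PyTorch:
# only the top-k highest-scoring tokens pass through the (expensive) block, the
# rest skip it via the residual path. Not the pytorch-mixtures API.
import torch
import torch.nn as nn


class NaiveMoDWrapper(nn.Module):
    def __init__(self, block, dim, capacity_factor=0.5):
        super().__init__()
        self.block = block                      # any [B, N, D] -> [B, N, D] module
        self.router = nn.Linear(dim, 1)         # per-token scalar routing score
        self.capacity_factor = capacity_factor  # fraction of tokens that get processed

    def forward(self, x):
        B, N, D = x.shape
        k = max(1, int(self.capacity_factor * N))
        scores = self.router(x).squeeze(-1)                 # [B, N]
        topk_scores, topk_idx = scores.topk(k, dim=-1)      # [B, k]
        idx = topk_idx.unsqueeze(-1).expand(-1, -1, D)      # [B, k, D]
        selected = torch.gather(x, 1, idx)                  # tokens routed into the block
        # scale by the router score so the routing decision receives gradient
        update = self.block(selected) * torch.sigmoid(topk_scores).unsqueeze(-1)
        # residual update only at the selected positions; other tokens pass through unchanged
        return x.scatter_add(1, idx, update)


# e.g. wrap the block defined above so roughly half the tokens go through it
mod_block = NaiveMoDWrapper(my_block, dim=768, capacity_factor=0.5)
y = mod_block(torch.randn(16, 128, 768))  # shape: [16, 128, 768]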
Citation
If you found this package useful, please cite it in your work:
@misc{JaisidhSingh2024,
  author = {Singh, Jaisidh},
  title = {pytorch-mixtures},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/jaisidhsingh/pytorch-mixtures}},
}