FMS Acceleration Plugin for Mixture-of-Experts
This library contains plugins to accelerate finetuning with the following optimizations:
- Expert-Parallel MoE with Triton kernels from ScatterMoE, and some extracted from megablocks.
- Megablocks kernels for `gather` and `scatter`.
Plugins
| Plugin | Description | Depends | Loading | Augmentation | Callbacks |
|---|---|---|---|---|---|
| scattermoe | MoE Expert Parallel with Triton Kernels from scattermoe (& megablocks) | ScatterMoE / extracted kernels from megablocks | ✅ | ✅ | |
Adding New Models
Our ScatterMoE implementation is a module swap; to add new models, update the specifications in `scattermoe_constants.py`.
- See the code documentation within to understand how to add new models.
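A module swap needs to know, for each architecture, which submodules hold the router and the expert weights so the plugin can replace the right block. The sketch below is purely illustrative: the key and field names are assumptions, not the actual schema of `scattermoe_constants.py` — consult that file's documentation for the real format.

```python
# Hypothetical sketch of a module-swap spec. All names below are
# illustrative assumptions, NOT the real scattermoe_constants.py schema.

# Map a model class to the attribute paths of its MoE components, so a
# swap routine knows which submodule to replace with the ScatterMoE one.
EXAMPLE_SPEC = {
    "MixtralForCausalLM": {
        "moe_module": "block_sparse_moe",      # attribute name of the MoE block
        "router": "gate",                      # linear layer producing routing logits
        "experts": "experts",                  # container of expert FFNs
        "expert_weights": ["w1", "w2", "w3"],  # per-expert projection names
    }
}

def lookup_spec(model_cls_name: str) -> dict:
    """Return the swap spec for a model class, or raise if unsupported."""
    try:
        return EXAMPLE_SPEC[model_cls_name]
    except KeyError:
        raise ValueError(f"no ScatterMoE spec registered for {model_cls_name}")
```

Adding a new model would then amount to registering one more entry of this shape for its architecture.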
Using ScatterMoE Saved Checkpoints
ScatterMoE checkpoints are saved using `torch.distributed.checkpoint` (DCP), which by default uses `StateDictType.SHARDED_STATE_DICT`:
- `DTensor`s have limited support for full state dicts.
- Sharded state dicts are extremely efficient and require little comms overhead when saving.
We provide a script to recover the original checkpoint:
- currently the script is only tested in the case where DCP has saved the model in a single node.
If the checkpoint is stored in `hf/checkpoint-10`, call the following to have the converted checkpoint written into `output_dir`:

```
python -m fms_acceleration_moe.utils.checkpoint_utils \
    hf/checkpoint-10 output_dir \
    mistralai/Mixtral-8x7B-Instruct-v0.1
```
Code Extracted from Megablocks
Notes on code extraction:
- We have only extracted two `autograd` functions, `GatherOp` and `ScatterOp`,
- and the associated Triton kernels from `backend/kernels.py`, mostly `_padded_copy`.
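For readers unfamiliar with these ops: gather/scatter reorder tokens so that each expert processes a contiguous block, then restore the original order afterwards. The pure-PyTorch reference below shows only the semantics — the extracted megablocks kernels implement a fused, padded version of this in Triton, and none of the function names here come from that code.

```python
# Pure-PyTorch reference for the MoE gather/scatter token permutation.
# Illustrative only; the extracted Triton kernels fuse and pad this.
import torch

def gather_by_expert(tokens: torch.Tensor, expert_ids: torch.Tensor):
    """Sort tokens so each expert's tokens are contiguous.

    Returns the permuted tokens and the permutation indices needed to
    scatter the expert outputs back into the original token order.
    """
    perm = torch.argsort(expert_ids, stable=True)
    return tokens[perm], perm

def scatter_back(expert_out: torch.Tensor, perm: torch.Tensor):
    """Undo the gather permutation, restoring original token order."""
    out = torch.empty_like(expert_out)
    out[perm] = expert_out
    return out

tokens = torch.randn(6, 4)
expert_ids = torch.tensor([2, 0, 1, 0, 2, 1])
gathered, perm = gather_by_expert(tokens, expert_ids)
# identity "expert computation" here, just to show the round trip
restored = scatter_back(gathered, perm)
assert torch.equal(restored, tokens)
```

After the gather, each expert's rows sit in one contiguous slice, so the per-expert matmuls can run as dense blocks.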
Running Benchmarks
Run the below in the top-level directory of this repo:
- The `scattermoe` dep is not included by default, so the `-x` switch installs it.
- Consider disabling the `torch` memory logging to see improved speeds.
```
tox -e run-benches \
    -x testenv:run-benches.setenv+="MEMORY_LOGGING=nvidia" \
    -- \
    "1 2 4" 128 benchmark_outputs scenarios-moe.yaml accelerated-moe-full
```
or run the larger Mixtral-8x7B bench:
```
tox ... \
    8 128 benchmark_outputs scenarios-moe.yaml accelerated-moe-full-mixtral
```
NOTE: if a `FileNotFoundError` on the Triton cache is observed, similar to issues like these:
then tox is somehow causing problems with Triton and multiprocessing (there is some race condition).
The workaround is to first activate the tox env and run the script manually in bash:
```
# if FileNotFoundError in the triton cache is observed
# - then activate the env and run the script manually
source .tox/run-benches/bin/activate
bash scripts/run_benchmarks.sh \
    ....
```
Triton Kernel Dependencies
The Triton kernels in `scattermoe_utils` were copied from kernel-hyperdrive, which is a fork of cute kernels.
Known Issues
These are some known issues not yet resolved:
- We should eventually remove the dependency on the external `kernel-hyperdrive` repository.
- We currently support loading only sharded `safetensors` (non-GGUF) MoE checkpoints. This is a reasonable assumption since MoE checkpoints are typically above the size limit that prevents them from being saved into a single checkpoint file.
- When used together with FSDP, FSDP's `clip_grad_norm` will not compute properly for ScatterMoE; see issue here.
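On the last point, the usual shape of a workaround is to compute the global gradient norm by hand: sum squared local gradient norms, all-reduce the sum across ranks, then take the square root. The sketch below demonstrates only that arithmetic in a single process against PyTorch's own `clip_grad_norm_`; it is not the fix from the linked issue, and handling actual DTensor shards would need the commented all-reduce step.

```python
# Hedged single-process sketch of computing a global grad norm manually,
# the kind of workaround needed when FSDP's clip_grad_norm_ cannot handle
# expert-parallel (DTensor) parameters. Illustrative, not the actual fix.
import torch

def manual_grad_norm(params):
    # sum of squared gradient entries over all (local) parameters
    sq = sum((p.grad.detach() ** 2).sum() for p in params if p.grad is not None)
    # in a real expert-parallel run, the squared sum would be combined
    # across ranks before the sqrt, e.g.:
    # torch.distributed.all_reduce(sq, op=torch.distributed.ReduceOp.SUM)
    return sq.sqrt()

model = torch.nn.Linear(3, 3)
model(torch.randn(2, 3)).sum().backward()

# clip_grad_norm_ returns the total norm it measured; with a huge
# max_norm no gradient is actually rescaled, so the two should agree
expected = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1e9)
assert torch.allclose(manual_grad_norm(model.parameters()), expected)
```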
File details
Details for the file `fms_acceleration_moe-0.4.6-py3-none-any.whl`.
- Download URL: fms_acceleration_moe-0.4.6-py3-none-any.whl
- Upload date:
- Size: 51.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8ff199666431a76244dbfc47e05ee3ed31177dcb60163b2c2da4f0b78bf15354 |
| MD5 | 64537f544be61eac2eec26f649e8caf5 |
| BLAKE2b-256 | cc171d0568956b7cfb80b468bbb8e574815dd70cf3662e4eef1dd9bc2abe074c |
Provenance
The following attestation bundles were made for fms_acceleration_moe-0.4.6-py3-none-any.whl:

Publisher: build-and-publish.yml on foundation-model-stack/fms-acceleration
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fms_acceleration_moe-0.4.6-py3-none-any.whl
- Subject digest: 8ff199666431a76244dbfc47e05ee3ed31177dcb60163b2c2da4f0b78bf15354
- Sigstore transparency entry: 872121119
- Sigstore integration time:
- Permalink: foundation-model-stack/fms-acceleration@b93674fe66135ef78f7f2c3d0a69bc65ee53c63e
- Branch / Tag: refs/tags/v0.6.4
- Owner: https://github.com/foundation-model-stack
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build-and-publish.yml@b93674fe66135ef78f7f2c3d0a69bc65ee53c63e
- Trigger Event: release