MSLK Library

MSLK (Meta Superintelligence Labs Kernels, formerly known as FBGEMM GenAI) is a collection of high-performance kernels and optimizations built on top of PyTorch primitives for GenAI training and inference.

Installation

# Install MSLK for CUDA
pip install mslk-cuda==1.0.0
# Install MSLK for ROCm
pip install mslk-rocm==1.0.0
# Install a nightly CUDA version
pip install --pre mslk --index-url https://download.pytorch.org/whl/nightly/cu128
# Install a nightly ROCm version
pip install --pre mslk --index-url https://download.pytorch.org/whl/nightly/rocm7.1/
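
After running one of the commands above, it can help to confirm which MSLK distribution actually landed in the environment. The following is a minimal sketch using only the standard library. The candidate distribution names are taken from the pip commands and wheel file names on this page and are assumptions about how the package is registered, and `installed_mslk` is a hypothetical helper, not part of MSLK.

```python
from importlib.metadata import PackageNotFoundError, version

# Candidate distribution names, assumed from the pip commands and wheel
# file names on this page. Adjust if your index uses a different name.
CANDIDATES = ("mslk-cuda", "mslk-rocm", "mslk", "mslk-cuda-nightly")


def installed_mslk(candidates=CANDIDATES):
    """Return {distribution_name: version} for every candidate that is installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed under this name; try the next candidate.
            continue
    return found


if __name__ == "__main__":
    found = installed_mslk()
    if found:
        for name, ver in found.items():
            print(f"{name}=={ver}")
    else:
        print("No MSLK distribution found in this environment.")
```

An empty result only means none of the assumed names is installed; it does not say anything about whether the compiled kernels load correctly.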

Release Compatibility Table

MSLK is released in accordance with the PyTorch release schedule. A given MSLK release is not guaranteed to work with PyTorch releases older than the one it corresponds to.

| MSLK Release | Corresponding PyTorch Release | Supported Python Versions | Supported CUDA Versions | Supported CUDA Architectures | Supported ROCm Versions | Supported ROCm Architectures |
|---|---|---|---|---|---|---|
| 1.1.0 | 2.11.x | 3.10, 3.11, 3.12, 3.13, 3.14 | 12.6, 12.8, 12.9, 13.0 | 8.0, 9.0a, 10.0a, 12.0a | 7.0, 7.1 | gfx908, gfx90a, gfx942, gfx950 |
| 1.0.0 | 2.10.x | 3.10, 3.11, 3.12, 3.13, 3.14 | 12.6, 12.8, 12.9, 13.0 | 8.0, 9.0a, 10.0a, 12.0a | 7.1, 7.2 | gfx908, gfx90a, gfx942, gfx950 |

Note that the supported CUDA/ROCm architectures apply to the compiled C++ kernels. Some kernels (e.g. CUTLASS/CK) are specific to certain architectures, while Python JIT DSL-based kernels (e.g. Triton) may work on a wider variety of architectures.
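
The compatibility rules above can be checked mechanically before installing. The sketch below copies the version columns of the table into a dictionary and flags mismatches. `COMPAT` and `check` are hypothetical names for illustration, not an MSLK API. A PyTorch version newer than the corresponding release is treated as "no problem found", which reflects only that the text makes no statement about it; it is not a guarantee.

```python
# Version columns copied from the release compatibility table above.
PYTHONS = {"3.10", "3.11", "3.12", "3.13", "3.14"}
CUDAS = {"12.6", "12.8", "12.9", "13.0"}
COMPAT = {
    "1.1.0": {"torch": (2, 11), "python": PYTHONS, "cuda": CUDAS, "rocm": {"7.0", "7.1"}},
    "1.0.0": {"torch": (2, 10), "python": PYTHONS, "cuda": CUDAS, "rocm": {"7.1", "7.2"}},
}


def _major_minor(version_string):
    """Parse strings such as '2.10.0+cu128' or '3.12.3' into (major, minor)."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def _short(version_string):
    """Normalize a version string to 'major.minor'."""
    return "{}.{}".format(*_major_minor(version_string))


def check(mslk_release, torch_version, python_version, cuda=None, rocm=None):
    """Return a list of mismatches against the table; empty means none found."""
    row = COMPAT[mslk_release]
    problems = []
    # Only PyTorch releases older than the corresponding one are flagged.
    if _major_minor(torch_version) < row["torch"]:
        problems.append(f"PyTorch {torch_version} is older than the corresponding release")
    if _short(python_version) not in row["python"]:
        problems.append(f"Python {python_version} is not listed")
    if cuda is not None and _short(cuda) not in row["cuda"]:
        problems.append(f"CUDA {cuda} is not listed")
    if rocm is not None and _short(rocm) not in row["rocm"]:
        problems.append(f"ROCm {rocm} is not listed")
    return problems


if __name__ == "__main__":
    print(check("1.0.0", "2.10.0+cu128", "3.12.3", cuda="12.8"))
    print(check("1.1.0", "2.10.0", "3.9.18", rocm="7.2"))
```

In a live environment the arguments could come from `torch.__version__`, `platform.python_version()`, and `torch.version.cuda`. The GPU architecture columns are deliberately left out, since matching them depends on how each kernel was compiled.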

Running Benchmarks

python bench/gemm/gemm_bench.py --M 4096 --N 4096 --K 4096
python bench/quantize/quantize_bench.py --M 4096 --K 4096
python bench/conv/conv_bench.py
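
This page does not describe what the benchmark scripts print. If you end up with a measured time per GEMM call, the usual convention of counting one multiply and one add per inner-product term (2·M·N·K operations) converts it into throughput. The helpers below are a hypothetical sketch of that arithmetic and are not part of MSLK or its benchmark scripts.

```python
def gemm_flops(m, n, k):
    """Operation count for C[m, n] = A[m, k] @ B[k, n]: one multiply and one add per term."""
    return 2 * m * n * k


def tflops(m, n, k, seconds):
    """Throughput in TFLOP/s for one GEMM of the given shape that took `seconds`."""
    return gemm_flops(m, n, k) / seconds / 1e12


if __name__ == "__main__":
    # Shape from the gemm_bench.py command above; the 1 ms timing is a made-up placeholder.
    print(gemm_flops(4096, 4096, 4096))
    print(round(tflops(4096, 4096, 4096, 1e-3), 2))
```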

Running Tests

pytest test/gemm/gemm_test.py
pytest test/quantize/fp8_quantize_correctness_test.py
pytest test/conv/conv_test.py

Build From Source

We only support building on Linux. See the release compatibility table above for supported versions of Python, CUDA, and ROCm.

# Clone repo
git clone https://github.com/meta-pytorch/MSLK
cd MSLK
git submodule sync
git submodule update --init --recursive
# Build and install
# The script will create a conda environment and install the required dependencies.
# The conda environment will look something like: build-py3.14-torchnightly-cuda12.9.1
./ci/integration/mslk_oss_build.bash
# After the initial environment setup, you can activate the environment and iterate faster:
conda activate build-py3.14-torchnightly-cuda12.9.1
python setup.py install

Python-Only Build

If you don't need the C++/CUDA kernels (e.g. for testing Python-only changes), you can install MSLK in Python-only mode by setting the MSLK_PYTHON_ONLY environment variable. This skips the C++/CUDA compilation entirely.

MSLK_PYTHON_ONLY=1 pip install -e .
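
When the install is driven from a Python script (for example in a test harness) rather than a shell, the same environment variable can be passed through `subprocess`. This is a sketch under that assumption. The helper names are hypothetical, and only the `MSLK_PYTHON_ONLY` variable and the `pip install -e .` command come from the text above.

```python
import os
import subprocess
import sys


def python_only_install_command(source_dir="."):
    """Build the pip command and environment for an editable Python-only install."""
    # Use the current interpreter's pip so the install targets this environment.
    cmd = [sys.executable, "-m", "pip", "install", "-e", source_dir]
    # Copy the current environment and add the flag described above.
    env = dict(os.environ, MSLK_PYTHON_ONLY="1")
    return cmd, env


def python_only_install(source_dir="."):
    """Run the install; raises CalledProcessError if pip exits with an error."""
    cmd, env = python_only_install_command(source_dir)
    subprocess.run(cmd, env=env, check=True)
```

Calling `python_only_install()` from the root of the cloned repository is intended to be equivalent to the shell command above.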

Join the MSLK community

For questions, support, news updates, or feature requests, please feel free to:

For contributions, please see the CONTRIBUTING file for ways to help out.

License

MSLK is BSD licensed, as found in the LICENSE file.

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

mslk_cuda_nightly-2026.4.20-cp314-cp314-manylinux_2_28_x86_64.whl (46.5 MB)
CPython 3.14, manylinux (glibc 2.28+), x86-64

mslk_cuda_nightly-2026.4.20-cp313-cp313-manylinux_2_28_x86_64.whl (46.5 MB)
CPython 3.13, manylinux (glibc 2.28+), x86-64

mslk_cuda_nightly-2026.4.20-cp312-cp312-manylinux_2_28_x86_64.whl (48.0 MB)
CPython 3.12, manylinux (glibc 2.28+), x86-64

mslk_cuda_nightly-2026.4.20-cp311-cp311-manylinux_2_28_x86_64.whl (46.5 MB)
CPython 3.11, manylinux (glibc 2.28+), x86-64

mslk_cuda_nightly-2026.4.20-cp310-cp310-manylinux_2_28_x86_64.whl (48.0 MB)
CPython 3.10, manylinux (glibc 2.28+), x86-64

File details

Details for the file mslk_cuda_nightly-2026.4.20-cp314-cp314-manylinux_2_28_x86_64.whl.

File hashes

Hashes for mslk_cuda_nightly-2026.4.20-cp314-cp314-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 2e86b3b3ef78e95afa55cc2551b1a7212665662bad787d5768c40c10e1fa2afd
MD5 bf1d173f547cb4e8f7b3c87c60671f0a
BLAKE2b-256 332deb6c9b8e4275a1b277accf7d5f0f54e575d708eed4595fbd47b53aec090d

File details

Details for the file mslk_cuda_nightly-2026.4.20-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for mslk_cuda_nightly-2026.4.20-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 827d52e36627ebb435a2d9ca5cfcbf2362f0ca5e8b726ae26080858c557ffc5c
MD5 d4ea0f426a15cee351942ffc1816b057
BLAKE2b-256 99d5bedba89246abca49cf645197188e8cbbf483abffe3a89b452a2d53ff48d9

File details

Details for the file mslk_cuda_nightly-2026.4.20-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for mslk_cuda_nightly-2026.4.20-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 47a4fd887539181b30cd1954983f7ee9092fa8b05d0c6c92b04fbd1667c5d54f
MD5 def55c3bda7f997c37c6c0588459405e
BLAKE2b-256 f4ceffb28c4d0209c241a8cc1fb46c3cbb7f89aca4a4fe851dcaa8cfcc75c702

File details

Details for the file mslk_cuda_nightly-2026.4.20-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for mslk_cuda_nightly-2026.4.20-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 b755687afbadc47c85d5d70e4ac63659cddd8f272d83614d8997cc1385b6b068
MD5 a0234c4a43a0815aac65a221d57510ae
BLAKE2b-256 f7f3d5c47dc9d2725c1620476d9b0748edd802f9aa08a1de003f8d2bee36d80b

File details

Details for the file mslk_cuda_nightly-2026.4.20-cp310-cp310-manylinux_2_28_x86_64.whl.

File hashes

Hashes for mslk_cuda_nightly-2026.4.20-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 6195dc1f762bdd6440ee85f59bbd878b59e80ff979b76cbd2bf91f9e24d5218f
MD5 f874dbead5ded855920aaedbab7d053c
BLAKE2b-256 2aeafa4d9fa01e870ef53f53dbff74009a9842fbfd0d3303c88d065695d763c2
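
The SHA256 digests listed above can be used to check a downloaded wheel before installing it. A minimal sketch with the standard library follows. `sha256_of` and `verify` are hypothetical helper names, and the file path is assumed to point at a wheel you have already downloaded.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 matches the expected digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify("mslk_cuda_nightly-2026.4.20-cp314-cp314-manylinux_2_28_x86_64.whl", "2e86b3b3ef78e95afa55cc2551b1a7212665662bad787d5768c40c10e1fa2afd")` should return True for an intact copy of the CPython 3.14 wheel.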