MSLK Library

MSLK (Meta Superintelligence Labs Kernels, formerly known as FBGEMM GenAI) is a collection of high-performance kernels and optimizations built on top of PyTorch primitives for GenAI training and inference.

Installation

# Install MSLK for CUDA
pip install mslk-cuda==1.0.0
# Install MSLK for ROCm
pip install mslk-rocm==1.0.0
# Install a nightly CUDA version
pip install --pre mslk --index-url https://download.pytorch.org/whl/nightly/cu128
# Install a nightly ROCm version
pip install --pre mslk --index-url https://download.pytorch.org/whl/nightly/rocm7.1/
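
After running one of the commands above, it can help to confirm which distribution actually ended up in the environment. The following is a minimal sketch using only the Python standard library. It assumes the installed distribution name matches one of the names used in the pip commands above (`mslk-cuda`, `mslk-rocm`, `mslk`); nightly wheels may be published under a different name, in which case pass that name explicitly.

```python
from importlib import metadata

# Distribution names taken from the pip commands above. This list is an
# assumption: adjust it if your wheel is published under another name.
CANDIDATE_DISTRIBUTIONS = ("mslk-cuda", "mslk-rocm", "mslk")


def installed_mslk_distributions(candidates=CANDIDATE_DISTRIBUTIONS):
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This candidate is not installed in the current environment.
            continue
    return found


if __name__ == "__main__":
    installed = installed_mslk_distributions()
    if installed:
        for name, version in installed.items():
            print(f"{name}=={version}")
    else:
        print("No MSLK distribution found under the candidate names.")
```

The check reads package metadata only, so it does not import the library or touch the GPU.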

Release Compatibility Table

MSLK is released in accordance with the PyTorch release schedule. An MSLK release is not guaranteed to work with PyTorch releases older than the one it corresponds to.

| MSLK Release | Corresponding PyTorch Release | Supported Python Versions | Supported CUDA Versions | Supported CUDA Architectures | Supported ROCm Versions | Supported ROCm Architectures |
|---|---|---|---|---|---|---|
| 1.1.0 | 2.11.x | 3.10, 3.11, 3.12, 3.13, 3.14 | 12.6, 12.8, 12.9, 13.0 | 8.0, 9.0a, 10.0a, 12.0a | 7.0, 7.1 | gfx908, gfx90a, gfx942, gfx950 |
| 1.0.0 | 2.10.x | 3.10, 3.11, 3.12, 3.13, 3.14 | 12.6, 12.8, 12.9, 13.0 | 8.0, 9.0a, 10.0a, 12.0a | 7.1, 7.2 | gfx908, gfx90a, gfx942, gfx950 |

Note that the supported CUDA/ROCm architectures refer to the compiled C++ kernels. In addition, some kernels (e.g. CUTLASS/CK) are specific to certain architectures. Kernels written in Python JIT DSLs (e.g. Triton) may work on a wider variety of architectures.
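
The PyTorch pairing in the table can be checked with a few lines of code. The sketch below hard-codes the two rows shown above and reports whether a given PyTorch version string is older than the release an MSLK version corresponds to. The function names are illustrative and are not part of MSLK. The table only states that older PyTorch releases carry no guarantee; it says nothing about newer ones, so the check covers the lower bound only.

```python
# MSLK release -> (major, minor) of the corresponding PyTorch release,
# copied from the compatibility table above.
CORRESPONDING_TORCH = {
    "1.1.0": (2, 11),
    "1.0.0": (2, 10),
}


def torch_major_minor(version: str) -> tuple:
    """Parse '2.10.1+cu128' or '2.11.0.dev20260414' into (major, minor)."""
    core = version.split("+", 1)[0]  # drop a local suffix such as '+cu128'
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def meets_minimum_torch(mslk_release: str, torch_version: str) -> bool:
    """True if torch_version is not older than the release MSLK corresponds to."""
    return torch_major_minor(torch_version) >= CORRESPONDING_TORCH[mslk_release]


if __name__ == "__main__":
    for mslk, torch in [("1.0.0", "2.10.1+cu128"), ("1.1.0", "2.10.0")]:
        status = "ok" if meets_minimum_torch(mslk, torch) else "older than supported"
        print(f"MSLK {mslk} with PyTorch {torch}: {status}")
```

In a real environment the version string would typically come from `torch.__version__`.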

Running Benchmarks

python bench/gemm/gemm_bench.py --M 4096 --N 4096 --K 4096
python bench/quantize/quantize_bench.py --M 4096 --K 4096
python bench/conv/conv_bench.py
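
To benchmark several GEMM shapes in one go, the command above can be wrapped in a small driver. This is a sketch, not part of the repository: it assumes it is run from the repository root, that `gemm_bench.py` accepts arbitrary positive integers for the `--M`, `--N`, and `--K` flags shown above, and the helper names are hypothetical. By default it only prints the commands; pass `dry_run=False` to execute them.

```python
import subprocess
import sys

# Script path taken from the command above (relative to the repository root).
GEMM_BENCH = "bench/gemm/gemm_bench.py"


def gemm_bench_command(m, n, k, python=sys.executable):
    """Build the argv list for one GEMM benchmark invocation."""
    return [python, GEMM_BENCH, "--M", str(m), "--N", str(n), "--K", str(k)]


def run_sweep(shapes, dry_run=True):
    """Print (or, with dry_run=False, execute) one benchmark command per shape."""
    commands = [gemm_bench_command(m, n, k) for (m, n, k) in shapes]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sweep at the first failing benchmark.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_sweep([(1024, 1024, 1024), (4096, 4096, 4096)])
```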

Running Tests

pytest test/gemm/gemm_test.py
pytest test/quantize/fp8_quantize_correctness_test.py
pytest test/conv/conv_test.py

Build From Source

We only support building on Linux. See the release compatibility table above for supported versions of Python, CUDA, and ROCm.

# Clone repo
git clone https://github.com/meta-pytorch/MSLK
cd MSLK
git submodule sync
git submodule update --init --recursive
# Build and install
# The script will create a conda environment and install the required dependencies.
# The conda environment will look something like: build-py3.14-torchnightly-cuda12.9.1
./ci/integration/mslk_oss_build.bash
# After the initial environment setup, you can activate the environment and iterate faster:
conda activate build-py3.14-torchnightly-cuda12.9.1
python setup.py install

Python-Only Build

If you don't need the C++/CUDA kernels (e.g. for testing Python-only changes), you can install MSLK in Python-only mode by setting the MSLK_PYTHON_ONLY environment variable. This skips the C++/CUDA compilation entirely.

MSLK_PYTHON_ONLY=1 pip install -e .

Join the MSLK community

For questions, support, news updates, or feature requests, please feel free to:

For contributions, please see the CONTRIBUTING file for ways to help out.

License

MSLK is BSD licensed, as found in the LICENSE file.
