Skip to main content

No project description provided

Project description

MSLK Logo

MSLK Library

MSLK (Meta Superintelligence Labs Kernels, formerly known as FBGEMM GenAI) is a collection of high-performance kernels and optimizations built on top of PyTorch primitives for GenAI training and inference.

Installation

# Install MSLK for CUDA
pip install mslk-cuda==1.0.0
# Install MSLK for ROCm
pip install mslk-rocm==1.0.0
# Install a nightly CUDA version
pip install --pre mslk --index-url https://download.pytorch.org/whl/nightly/cu128
# Install a nightly ROCm version
pip install --pre mslk --index-url https://download.pytorch.org/whl/nightly/rocm7.1/

Release Compatibility Table

MSLK is released in accordance to the PyTorch release schedule, and each release has no guarantee to work in conjunction with PyTorch releases that are older than the one that the MSLK release corresponds to.

MSLK Release Corresponding PyTorch Release Supported Python Versions Supported CUDA Versions Supported CUDA Architectures Supported ROCm Versions Supported ROCm Architectures
1.1.0 2.11.x 3.10, 3.11, 3.12, 3.13, 3.14 12.6, 12.8, 12.9, 13.0 8.0, 9.0a, 10.0a, 12.0a 7.0, 7.1 gfx908, gfx90a, gfx942, gfx950
1.0.0 2.10.x 3.10, 3.11, 3.12, 3.13, 3.14 12.6, 12.8, 12.9, 13.0 8.0, 9.0a, 10.0a, 12.0a 7.1, 7.2 gfx908, gfx90a, gfx942, gfx950

Note that the supported CUDA/ROCm Architectures refer to compiled C++ kernels. In addition, some kernels (e.g. CUTLASS/CK) would be specific to certain architectures. Python JIT DSL based kernels (e.g. Triton) would potentially work on wider variety of architectures.

Running Benchmarks

python bench/gemm/gemm_bench.py --M 4096 --N 4096 --K 4096
python bench/quantize/quantize_bench.py --M 4096 --K 4096
python bench/conv/conv_bench.py

Running Tests

pytest test/gemm/gemm_test.py
pytest test/quantize/fp8_quantize_correctness_test.py
pytest test/conv/conv_test.py

Build From Source

We only support building on Linux. See the release compatibility table above for supported versions of Python, CUDA, ROCm.

# Clone repo
git clone https://github.com/meta-pytorch/MSLK
cd MSLK
git submodule sync
git submodule update --init --recursive
# Build and install
# The script will create a conda environment and install the required dependencies.
# The conda environment will look something like: build-py3.14-torchnightly-cuda12.9.1
./ci/integration/mslk_oss_build.bash
# After the initial environment setup, you can activate the environment and iterate faster:
conda activate build-py3.14-torchnightly-cuda12.9.1
python setup.py install

Python-Only Build

If you don't need the C++/CUDA kernels (e.g. for testing Python only changes), you can install MSLK in Python-only mode by setting the MSLK_PYTHON_ONLY environment variable. This skips the C++/CUDA compilation entirely.

MSLK_PYTHON_ONLY=1 pip install -e .

Join the MSLK community

For questions, support, news updates, or feature requests, please feel free to:

For contributions, please see the CONTRIBUTING file for ways to help out.

License

MSLK is BSD licensed, as found in the LICENSE file.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

mslk_cuda_nightly-2026.4.1-cp314-cp314-manylinux_2_28_x86_64.whl (46.0 MB view details)

Uploaded CPython 3.14manylinux: glibc 2.28+ x86-64

mslk_cuda_nightly-2026.4.1-cp313-cp313-manylinux_2_28_x86_64.whl (46.0 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.28+ x86-64

mslk_cuda_nightly-2026.4.1-cp312-cp312-manylinux_2_28_x86_64.whl (46.0 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.28+ x86-64

mslk_cuda_nightly-2026.4.1-cp311-cp311-manylinux_2_28_x86_64.whl (47.6 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.28+ x86-64

mslk_cuda_nightly-2026.4.1-cp310-cp310-manylinux_2_28_x86_64.whl (46.0 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.28+ x86-64

File details

Details for the file mslk_cuda_nightly-2026.4.1-cp314-cp314-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for mslk_cuda_nightly-2026.4.1-cp314-cp314-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 52cfb1c2fe29fcab9ef5b2f18a60163d9ba5578923a727466e1387738ce57b64
MD5 7a949764591d623bc22d52d86672cb0f
BLAKE2b-256 79fa19aee65732ad1fc306b1096414b7f2dbadfff4c8bdcc664f60dc99010a3e

See more details on using hashes here.

File details

Details for the file mslk_cuda_nightly-2026.4.1-cp313-cp313-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for mslk_cuda_nightly-2026.4.1-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 09ea872b1e2c053d20737a5031e0d6c70f4eed5802e671d809b3f8478717c4ab
MD5 75b389b4a59e38c31e3689ff7c86f31a
BLAKE2b-256 16b04e669b133cf8bac6230fbb9000f988acfcf2f87b0b514aa4298caabfac51

See more details on using hashes here.

File details

Details for the file mslk_cuda_nightly-2026.4.1-cp312-cp312-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for mslk_cuda_nightly-2026.4.1-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 a46977e1d4fd3fd181d4ff99ad8cf1f1b81dd820cbc3279abbc03920ea26ed83
MD5 d67bfac12892cd96d35d9c38552612b7
BLAKE2b-256 7e543405f5b803542709ab00342a7962290b9c2a36444a46326cf460db914956

See more details on using hashes here.

File details

Details for the file mslk_cuda_nightly-2026.4.1-cp311-cp311-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for mslk_cuda_nightly-2026.4.1-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 144bd2cbf7fef903d5a8711a75eb579612e339d14e68c0f56673ce38e2b3ee47
MD5 fb8255900cc06901df782507beee7429
BLAKE2b-256 254ab40e205573caf57a7690635b0e74571092f412a60f0b1a6515c3ffdad7c1

See more details on using hashes here.

File details

Details for the file mslk_cuda_nightly-2026.4.1-cp310-cp310-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for mslk_cuda_nightly-2026.4.1-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 6d7ae977dfa458d70070a599f4ee8053560f455374cf623b15b5da63ff8ccf85
MD5 7d0fe231ac43d8191ef0ee463cefa145
BLAKE2b-256 a423ceace304d7303057a94063f7616643be0454d212de376e485d28577b6e9b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page