MSLK Library

MSLK (Meta Superintelligence Labs Kernels, formerly known as FBGEMM GenAI) is a collection of high-performance kernels and optimizations built on top of PyTorch primitives for GenAI training and inference.

Installation

# Install MSLK for CUDA
pip install mslk-cuda==1.0.0
# Install MSLK for ROCm
pip install mslk-rocm==1.0.0
# Install a nightly CUDA version
pip install --pre mslk --index-url https://download.pytorch.org/whl/nightly/cu128
# Install a nightly ROCm version
pip install --pre mslk --index-url https://download.pytorch.org/whl/nightly/rocm7.1/
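After running one of the commands above, it can be useful to confirm which MSLK distribution actually landed in the active environment. The following is a minimal sketch using only the standard library; the distribution names are taken from the pip commands and wheel file names on this page, and the helper name `installed_mslk_distributions` is hypothetical, not part of MSLK.

```python
from importlib import metadata

# Distribution names as they appear in the pip commands and wheel file names
# on this page. Treat the list as an assumption, not an official registry.
CANDIDATE_DISTRIBUTIONS = ("mslk-cuda", "mslk-rocm", "mslk", "mslk-cuda-nightly")


def installed_mslk_distributions(names=CANDIDATE_DISTRIBUTIONS) -> dict[str, str]:
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; skip it.
            continue
    return found
```

Calling `installed_mslk_distributions()` returns an empty dict when none of the candidates is installed, which usually means the install targeted a different environment.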

Release Compatibility Table

MSLK is released in accordance with the PyTorch release schedule. An MSLK release is not guaranteed to work with PyTorch releases older than the one it corresponds to.

| MSLK Release | Corresponding PyTorch Release | Supported Python Versions | Supported CUDA Versions | Supported CUDA Architectures | Supported ROCm Versions | Supported ROCm Architectures |
|---|---|---|---|---|---|---|
| 1.1.0 | 2.11.x | 3.10, 3.11, 3.12, 3.13, 3.14 | 12.6, 12.8, 12.9, 13.0 | 8.0, 9.0a, 10.0a, 12.0a | 7.0, 7.1 | gfx908, gfx90a, gfx942, gfx950 |
| 1.0.0 | 2.10.x | 3.10, 3.11, 3.12, 3.13, 3.14 | 12.6, 12.8, 12.9, 13.0 | 8.0, 9.0a, 10.0a, 12.0a | 7.1, 7.2 | gfx908, gfx90a, gfx942, gfx950 |

Note that the supported CUDA/ROCm architectures refer to the compiled C++ kernels. In addition, some kernels (e.g. CUTLASS/CK) are specific to certain architectures. Python JIT DSL-based kernels (e.g. Triton) may work on a wider variety of architectures.
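The compatibility table above can also be encoded as data and queried from a script. This is a sketch under stated assumptions: the `Release` class, the `COMPAT` mapping, and `releases_for` are hypothetical names, the values are copied from the table, and the PyTorch check is a strict match on the major.minor version, which is stricter than the wording of the compatibility statement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    pytorch: str              # corresponding PyTorch major.minor
    python: tuple[str, ...]   # supported Python versions
    cuda: tuple[str, ...]     # supported CUDA versions
    rocm: tuple[str, ...]     # supported ROCm versions


PYTHON = ("3.10", "3.11", "3.12", "3.13", "3.14")
CUDA = ("12.6", "12.8", "12.9", "13.0")

# Rows copied from the release compatibility table.
COMPAT = {
    "1.1.0": Release("2.11", PYTHON, CUDA, ("7.0", "7.1")),
    "1.0.0": Release("2.10", PYTHON, CUDA, ("7.1", "7.2")),
}


def releases_for(python: str, pytorch: str, cuda=None, rocm=None) -> list[str]:
    """Return the MSLK releases whose table row matches the given environment."""
    # Reduce e.g. "2.10.0" to "2.10" before comparing.
    pytorch_minor = ".".join(pytorch.split(".")[:2])
    matches = []
    for version, row in COMPAT.items():
        if python not in row.python or pytorch_minor != row.pytorch:
            continue
        if cuda is not None and cuda not in row.cuda:
            continue
        if rocm is not None and rocm not in row.rocm:
            continue
        matches.append(version)
    return matches
```

For example, `releases_for("3.12", "2.10.0", cuda="12.8")` selects the 1.0.0 row. GPU architecture is deliberately left out, since support there depends on the kernel type as noted above.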

Running Benchmarks

python bench/gemm/gemm_bench.py --M 4096 --N 4096 --K 4096
python bench/quantize/quantize_bench.py --M 4096 --K 4096
python bench/conv/conv_bench.py
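When interpreting GEMM timings for the shapes passed above, a common convention is to count 2 * M * N * K floating-point operations for a dense (M x K) by (K x N) product. The sketch below applies that convention; it does not describe the output format of `gemm_bench.py`, which is not shown here, and the function names and timing value are hypothetical.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations for a dense GEMM: one multiply and one add per term."""
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Convert a measured runtime in seconds into achieved TFLOP/s."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return gemm_flops(m, n, k) / seconds / 1e12


# Hypothetical measurement: a 4096 x 4096 x 4096 GEMM that took 1 ms.
example = achieved_tflops(4096, 4096, 4096, 1e-3)
```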

Running Tests

pytest test/gemm/gemm_test.py
pytest test/quantize/fp8_quantize_correctness_test.py
pytest test/conv/conv_test.py

Build From Source

We only support building on Linux. See the release compatibility table above for supported versions of Python, CUDA, and ROCm.

# Clone repo
git clone https://github.com/meta-pytorch/MSLK
cd MSLK
git submodule sync
git submodule update --init --recursive
# Build and install
# The script will create a conda environment and install the required dependencies.
# The conda environment will look something like: build-py3.14-torchnightly-cuda12.9.1
./ci/integration/mslk_oss_build.bash
# After the initial environment setup, you can activate the environment and iterate faster:
conda activate build-py3.14-torchnightly-cuda12.9.1
python setup.py install

Python-Only Build

If you don't need the C++/CUDA kernels (e.g. for testing Python-only changes), you can install MSLK in Python-only mode by setting the MSLK_PYTHON_ONLY environment variable. This skips the C++/CUDA compilation entirely.

MSLK_PYTHON_ONLY=1 pip install -e .
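For readers curious how a build script can react to a switch like this, the following is a generic sketch of reading a boolean environment flag. It is not MSLK's actual build logic: only `MSLK_PYTHON_ONLY=1` is documented above, and the set of accepted truthy spellings and the helper name `python_only_requested` are assumptions made for illustration.

```python
import os

# Assumed truthy spellings; only "1" appears in the documentation above.
TRUTHY = {"1", "true", "yes", "on"}


def python_only_requested(environ=None) -> bool:
    """Return True when MSLK_PYTHON_ONLY is set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get("MSLK_PYTHON_ONLY", "").strip().lower() in TRUTHY
```

Passing a mapping explicitly, as in `python_only_requested({"MSLK_PYTHON_ONLY": "1"})`, makes the check easy to exercise without touching the real environment.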

Join the MSLK community

For questions, support, news updates, or feature requests, please feel free to:

For contributions, please see the CONTRIBUTING file for ways to help out.

License

MSLK is BSD licensed, as found in the LICENSE file.

Source Distributions

No source distribution files are available for this release.

Built Distributions

mslk_cuda_nightly-2026.4.7-cp314-cp314-manylinux_2_28_x86_64.whl (47.6 MB)
CPython 3.14, manylinux: glibc 2.28+, x86-64

mslk_cuda_nightly-2026.4.7-cp313-cp313-manylinux_2_28_x86_64.whl (47.6 MB)
CPython 3.13, manylinux: glibc 2.28+, x86-64

mslk_cuda_nightly-2026.4.7-cp312-cp312-manylinux_2_28_x86_64.whl (46.0 MB)
CPython 3.12, manylinux: glibc 2.28+, x86-64

mslk_cuda_nightly-2026.4.7-cp311-cp311-manylinux_2_28_x86_64.whl (46.0 MB)
CPython 3.11, manylinux: glibc 2.28+, x86-64

mslk_cuda_nightly-2026.4.7-cp310-cp310-manylinux_2_28_x86_64.whl (47.6 MB)
CPython 3.10, manylinux: glibc 2.28+, x86-64

File hashes

Hashes for mslk_cuda_nightly-2026.4.7-cp314-cp314-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 665abefc1cc7001c61a1ce26e5fec35fdb9a43ee58193fb8c638330a8519325a
MD5 cc0fbf75836e4b24f52b413eb67bab3b
BLAKE2b-256 881a6d81ca74bd3253fc7bd75ee5b98e17a22a518c44f9b9d2a919fa9f1afe24

Hashes for mslk_cuda_nightly-2026.4.7-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 ee414befd70d2b92b244f05bb577a955fea9d9efe23cdf4f2357d1368fe381f2
MD5 c606358d8599e72bf40f09189a62be68
BLAKE2b-256 829d0a3e6117e19b1d8ff2529af8bf6975966eec9664a67758cfc7e839a231b1

Hashes for mslk_cuda_nightly-2026.4.7-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 50fd04c6149c768edec2ab1ac5330bc4484ffd03d382688650e85b42b8f84b51
MD5 b90b147fe990c092458f16d32d92e26f
BLAKE2b-256 5f5792bb054d96f29a103e07543cbfc6e4e05107fad9ce97f27ada3501d68cc1

Hashes for mslk_cuda_nightly-2026.4.7-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 989e23b9177c77cf9a7b291bfa8fee8b736cb0be8153c360e645565f158cd7cb
MD5 293ea6d121d52050e24b55ed9f9a9871
BLAKE2b-256 73c8f81e362fa80af131e8fab5797ce23965702034077fa5b8198978c7565528

Hashes for mslk_cuda_nightly-2026.4.7-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 00a0a33532ece05c7d65ad87868d17ebaef42889642ba2060a712ca27979d747
MD5 e295c3f343f7c71f8b3b7d3aa56bd316
BLAKE2b-256 0473ce76aaa71c811eebbff5ad5f2c277eff6fa267e8c06f6775768eaae38aef
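
The SHA256 and MD5 digests listed above can be checked against a downloaded wheel before installing it. Below is a minimal sketch using Python's standard `hashlib`; the helper names `file_digest` and `matches_published_digest` are hypothetical. It is intended for the `sha256` and `md5` rows only: the BLAKE2b-256 row uses a 32-byte digest, which `hashlib.new("blake2b")` does not produce by default.

```python
import hashlib
import hmac
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    hasher = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_digest(path: Path, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with a published value, ignoring case and surrounding whitespace."""
    actual = file_digest(path, algorithm)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

To use it, pass the path of the downloaded `.whl` file together with the matching SHA256 value from the list above; a `False` result means the file should not be installed.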
