Unapologetically SM120-only CuTe DSL kernels for NVFP4 GEMM and MoE.
b12x
b12x is an unapologetically SM120-only CuTe DSL kernel library for Blackwell-class NVFP4 GEMM and routed Mixture-of-Experts inference.
The project is intentionally narrow. It is not a generic CUDA kernel collection for LLM inference. It is a focused package for shipping a small number of high-performance kernels and the runtime glue needed to launch them cleanly from PyTorch and sglang/vLLM.
What It Includes
- Clean, standalone CuTe DSL dense NVFP4 GEMM kernels
- A fused tensor-parallel MoE path for NVFP4 expert weights
- CuTe DSL FP4 packing, scaling, and quantization helpers
- Torch reference implementations for correctness checking
- Lightweight integration surfaces for PyTorch and sglang
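This page does not document the FP4 packing and quantization helpers. To show roughly what NVFP4 quantization involves, here is a hypothetical pure-Python reference. It uses 16-element blocks, each with its own scale, rounds values to the FP4 E2M1 grid, and packs them two per byte. Several simplifications are assumptions, not b12x behavior. The block scale stays a Python float, whereas real NVFP4 stores it as FP8 E4M3 and also applies a per-tensor FP32 scale. The low nibble holds the even-indexed element. Ties round toward the smaller magnitude. None of the function names come from b12x.

```python
# Hypothetical NVFP4-style block quantization reference (not the b12x API).
# Assumptions: 16-element blocks, per-block scale kept as a Python float
# (real NVFP4 rounds it to FP8 E4M3 and adds a per-tensor FP32 scale),
# low nibble = even-indexed element, ties round toward the smaller magnitude.

BLOCK = 16
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # magnitudes of codes 0..7
E2M1_MAX = 6.0


def encode_e2m1(x: float) -> int:
    """Round x to the nearest E2M1 value and return its 4-bit code (bit 3 = sign)."""
    sign = 0x8 if x < 0 else 0x0
    mag = min(abs(x), E2M1_MAX)  # saturate out-of-range values to +/-6
    # min() returns the first minimal index, so ties go to the smaller magnitude.
    code = min(range(len(E2M1_GRID)), key=lambda i: abs(E2M1_GRID[i] - mag))
    return sign | code


def decode_e2m1(code: int) -> float:
    mag = E2M1_GRID[code & 0x7]
    return -mag if code & 0x8 else mag


def quantize_block(block):
    """Return (packed bytes, scale) for one block of 16 floats."""
    assert len(block) == BLOCK
    amax = max(abs(v) for v in block)
    scale = amax / E2M1_MAX if amax > 0 else 1.0  # map the block max onto 6.0
    codes = [encode_e2m1(v / scale) for v in block]
    # Two 4-bit codes per byte: element 2i in the low nibble, 2i+1 in the high nibble.
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, BLOCK, 2))
    return packed, scale


def dequantize_block(packed: bytes, scale: float):
    """Unpack 8 bytes back into 16 floats and apply the block scale."""
    out = []
    for byte in packed:
        out.append(decode_e2m1(byte & 0xF) * scale)
        out.append(decode_e2m1(byte >> 4) * scale)
    return out
```

A block whose values already sit on the scaled E2M1 grid round-trips exactly. Any other block picks up rounding error, which is what a Torch reference path is useful for measuring against a fused kernel.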
What It Does Not Try To Be
- Multi-architecture
- Backward-compatible with pre-SM120 GPUs
- A model-serving framework
- A wrapper around inherited FlashInfer runtime code
Requirements
- NVIDIA Blackwell SM120 GPU
- CUDA 13 toolchain
- CUDA 13 PyTorch build, `torch>=2.10.0`
- `nvidia-cutlass-dsl[cu13]==4.4.1`
Package Layout
- `b12x.gemm` - Dense NVFP4 GEMM kernels
- `b12x.integration` - Public runtime entrypoints such as `b12x_moe_fp4`
- `b12x.moe.fused` - The fused MoE kernel, scheduler, and reference paths
- `b12x.quant` - Expert-weight quantization helpers
- `b12x.sglang` - sglang integration shims
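This page does not give the signature of `b12x_moe_fp4`, so no call to it is shown here. Instead, here is a hypothetical pure-Python sketch of the routed top-k semantics that a fused MoE path and its scheduler generally have to implement. Each token picks its experts, tokens are regrouped by expert, and the weighted expert outputs are summed per token. Softmax followed by renormalizing over the top-k is one common routing convention and is an assumption here. Tokens are scalars and experts are plain callables only to keep the arithmetic visible. None of these names belong to `b12x.moe.fused`.

```python
import math
from collections import defaultdict


def route_topk(router_logits, top_k):
    """Per token: softmax over experts, keep the top_k, renormalize their weights."""
    routes = []
    for logits in router_logits:
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:top_k]
        norm = sum(probs[e] for e in top)
        routes.append([(e, probs[e] / norm) for e in top])
    return routes


def group_by_expert(routes):
    """Invert token->expert routes into per-expert work lists of (token, weight)."""
    work = defaultdict(list)
    for token, picks in enumerate(routes):
        for expert, weight in picks:
            work[expert].append((token, weight))
    return dict(sorted(work.items()))


def moe_reference(tokens, router_logits, experts, top_k):
    """Weighted sum of expert outputs per token, processed expert by expert."""
    out = [0.0] * len(tokens)
    for expert, items in group_by_expert(route_topk(router_logits, top_k)).items():
        for token, weight in items:
            out[token] += weight * experts[expert](tokens[token])
    return out
```

Grouping by expert is what lets an implementation apply each expert's weights once to a contiguous batch of its tokens, rather than once per token.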
The published wheel only contains the b12x package tree. Benchmarks, experiments, and tests remain in the source repo.
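A wheel is a zip archive, so the claim above can be checked with the standard library. The sketch below is a hypothetical helper, not shipped by b12x. It lists the top-level entries of a wheel and flags anything other than the `b12x` package and the standard `*.dist-info` metadata directory.

```python
import zipfile


def top_level_entries(wheel_path):
    """Sorted top-level names inside a wheel (zip archive)."""
    with zipfile.ZipFile(wheel_path) as zf:
        return sorted({name.split("/", 1)[0] for name in zf.namelist()})


def unexpected_entries(wheel_path, package="b12x"):
    """Top-level entries other than the package itself and its .dist-info directory."""
    return [
        name
        for name in top_level_entries(wheel_path)
        if name != package and not name.endswith(".dist-info")
    ]
```

If the statement above holds, `unexpected_entries("b12x-0.1.0-py3-none-any.whl")` should return an empty list, meaning no benchmark, experiment, or test directories were shipped.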
File details
Details for the file b12x-0.1.0-py3-none-any.whl.
File metadata
- Download URL: b12x-0.1.0-py3-none-any.whl
- Size: 68.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.7 (`uv publish`, Arch Linux)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `d62124f2ee6d5ecc9e37b0ac3a8dd4f6bf86a4cc70e9f07b88eaad781da7de42` |
| MD5 | `ed1209d36acb1d73a4f8af4aaf4a2900` |
| BLAKE2b-256 | `49231a0ecd22ff6e7fe8e48324b4dbbe5dea393b1edc53ff94c34c47cf82f099` |
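A downloaded wheel can be checked against the published SHA256 digest before installation. The sketch below uses only `hashlib` and reads the file in chunks. The helper names are hypothetical; the expected digest is the SHA256 value listed for `b12x-0.1.0-py3-none-any.whl`.

```python
import hashlib

# Published SHA256 digest for b12x-0.1.0-py3-none-any.whl.
EXPECTED_SHA256 = "d62124f2ee6d5ecc9e37b0ac3a8dd4f6bf86a4cc70e9f07b88eaad781da7de42"


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 of a file, read in chunks so large files do not load into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

pip can also enforce the digest at install time. Put `b12x==0.1.0 --hash=sha256:<digest>` in a requirements file and install with `pip install --require-hashes -r requirements.txt`.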