
Unapologetically SM120-only CuTe DSL kernels for NVFP4 GEMM and MoE.


b12x

b12x is an unapologetically SM120-only CuTe DSL kernel library for Blackwell-class NVFP4 GEMM and routed Mixture-of-Experts inference.

The project is intentionally narrow. It is not a generic CUDA kernel collection for LLM inference. It is a focused package for shipping a small number of high-performance kernels and the runtime glue needed to launch them cleanly from PyTorch and sglang/vLLM.

What It Includes

  • Clean, standalone CuTe DSL dense NVFP4 GEMM kernels
  • A fused tensor-parallel MoE path for NVFP4 expert weights
  • CuTe DSL FP4 packing, scaling, and quantization helpers
  • Torch reference implementations for correctness checking
  • Lightweight integration surfaces for PyTorch and sglang

What It Does Not Try To Be

  • Multi-architecture
  • Backward-compatible with pre-SM120 GPUs
  • A model-serving framework
  • A wrapper around inherited FlashInfer runtime code

Requirements

  • NVIDIA Blackwell SM120 GPU
  • CUDA 13 toolchain
  • CUDA 13 PyTorch build, torch>=2.10.0
  • nvidia-cutlass-dsl[cu13]==4.4.1
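Because b12x targets a single architecture and pins its toolchain, a pre-flight check against these requirements can turn a confusing launch failure into a clear message. The sketch below is an assumption about how one might write such a check, not something b12x ships. It treats SM120 as compute capability (12, 0) and compares version strings numerically. With PyTorch installed, the inputs would come from torch.cuda.get_device_capability(), torch.__version__, torch.version.cuda, and importlib.metadata.version("nvidia-cutlass-dsl").

```python
import re


def parse_version(text):
    """Leading numeric release components, padded to three: '2.10.0+cu130' -> (2, 10, 0).

    Pre-release and local suffixes (a0, +cu130, .post1) are ignored in this sketch.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * (3 - len(parts))


def check_requirements(capability, torch_version, cuda_version, cutlass_dsl_version):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(capability) != (12, 0):
        problems.append(f"need an SM120 GPU, found sm_{capability[0]}{capability[1]}")
    if parse_version(torch_version) < (2, 10, 0):
        problems.append(f"need torch>=2.10.0, found {torch_version}")
    if parse_version(cuda_version)[0] != 13:
        problems.append(f"need a CUDA 13 PyTorch build, found CUDA {cuda_version}")
    if parse_version(cutlass_dsl_version) != (4, 4, 1):
        problems.append(f"need nvidia-cutlass-dsl==4.4.1, found {cutlass_dsl_version}")
    return problems
```

Returning a list instead of raising on the first mismatch reports every unmet requirement in one pass.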

Package Layout

  • b12x.gemm
    • Dense NVFP4 GEMM kernels
  • b12x.integration
    • Public runtime entrypoints such as b12x_moe_fp4
  • b12x.moe.fused
    • The fused MoE kernel, scheduler, and reference paths
  • b12x.quant
    • Expert-weight quantization helpers
  • b12x.sglang
    • sglang integration shims
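For orientation, a routed MoE layer computes, for each token, a weighted sum over a few selected experts. The plain-Python sketch below is a generic top-k routing reference of the kind used for correctness checking. It is not the b12x Torch reference path. The softmax-then-top-k ordering with renormalized weights is an assumption, since models differ on this detail, and experts are modeled as arbitrary callables.

```python
import math


def route_topk(logits, k):
    """Softmax over router logits, keep the k most probable experts, renormalize their weights."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # sorted() is stable, so equal probabilities keep ascending expert order.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_reference(tokens, router_logits, experts, k):
    """tokens: list of vectors; router_logits: one logit list per token; experts: callables."""
    outputs = []
    for x, logits in zip(tokens, router_logits):
        acc = [0.0] * len(x)
        for expert_id, weight in route_topk(logits, k):
            y = experts[expert_id](x)
            acc = [a + weight * v for a, v in zip(acc, y)]
        outputs.append(acc)
    return outputs
```

A fused kernel computes the same result without materializing each expert's per-token output separately, which is why a slow, obviously correct loop like this is useful as a comparison target.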

The published wheel only contains the b12x package tree. Benchmarks, experiments, and tests remain in the source repo.



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


b12x-0.1.0-py3-none-any.whl (68.4 kB, Python 3)

File details

Details for the file b12x-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: b12x-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 68.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.7 (uv publish, Arch Linux)

File hashes

Hashes for b12x-0.1.0-py3-none-any.whl
  • SHA256: d62124f2ee6d5ecc9e37b0ac3a8dd4f6bf86a4cc70e9f07b88eaad781da7de42
  • MD5: ed1209d36acb1d73a4f8af4aaf4a2900
  • BLAKE2b-256: 49231a0ecd22ff6e7fe8e48324b4dbbe5dea393b1edc53ff94c34c47cf82f099
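To check a downloaded wheel against the published SHA256 digest for b12x-0.1.0-py3-none-any.whl, hash the file in chunks and compare. The minimal standard-library sketch below uses hypothetical helper names. pip can also enforce digests itself through its hash-checking mode (--require-hashes).

```python
import hashlib
import hmac

# SHA256 digest published for b12x-0.1.0-py3-none-any.whl
EXPECTED_SHA256 = "d62124f2ee6d5ecc9e37b0ac3a8dd4f6bf86a4cc70e9f07b88eaad781da7de42"


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the expected hex digest."""
    return hmac.compare_digest(sha256_of(path), expected.lower())
```

Usage: verify_wheel("b12x-0.1.0-py3-none-any.whl") returns True only for an unmodified copy of the published wheel.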
