Kernel Library for SGLang

These details have not been verified by PyPI

Project links

Environment
- GPU :: NVIDIA CUDA
License
- OSI Approved :: Apache Software License
Programming Language
- Python :: 3

Project description

sglang-kernel (formerly sgl-kernel)

Kernel Library for LLM inference engines

sglang-kernel provides optimized compute primitives for LLM inference engines, enabling efficient inference for large language models and vision-language models through custom kernel operations. The source tree remains under the sgl-kernel/ directory and the Python import path remains sgl_kernel.

Installation

Requires torch == 2.11.0

# Latest version
pip3 install sglang-kernel --upgrade
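
Because the package pins an exact torch version, it can help to check the installed torch before upgrading. The following is a minimal sketch, not part of sglang-kernel: the helper names are hypothetical, and it assumes a local build suffix such as +cu128 should be ignored when comparing against the pin.

```python
from importlib import metadata

REQUIRED_TORCH = "2.11.0"  # pin stated in this README


def matches_pin(installed: str, required: str = REQUIRED_TORCH) -> bool:
    """Return True if `installed` equals `required`, ignoring a local suffix such as '+cu128'."""
    public_version = installed.split("+", 1)[0]
    return public_version == required


def installed_torch_version():
    """Return the installed torch version string, or None if torch is not installed."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torch_version()
    if found is None:
        print("torch is not installed")
    elif matches_pin(found):
        print(f"torch {found} satisfies the pin")
    else:
        print(f"torch {found} does not match required {REQUIRED_TORCH}")
```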

Building from Source

Requires

- CMake ≥ 3.31
- Python ≥ 3.10
- scikit-build-core
- ninja (optional)

Use Makefile to build from the sgl-kernel source tree

make build

Limit build resource usage (CPU / parallelism)

By default, make build uses all available CPU cores. You can override build parallelism and NVCC compile threads:

# Limit parallel jobs (controls both make and cmake parallelism)
make build MAX_JOBS=2

# Additionally limit NVCC internal threads (reduces CPU and peak memory)
make build MAX_JOBS=2 CMAKE_ARGS="-DSGL_KERNEL_COMPILE_THREADS=1"
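
When the build is driven from a script, the same two knobs can be assembled programmatically. This is a sketch under assumptions: the helper names are hypothetical, and the "half the cores" rule is only an illustrative heuristic, not a project recommendation.

```python
import os


def make_build_command(max_jobs: int, nvcc_threads=None) -> list:
    """Build the argv for `make build` using the MAX_JOBS and CMAKE_ARGS knobs shown above."""
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    argv = ["make", "build", f"MAX_JOBS={max_jobs}"]
    if nvcc_threads is not None:
        # The shell quotes around the CMAKE_ARGS value are not needed in an argv list.
        argv.append(f"CMAKE_ARGS=-DSGL_KERNEL_COMPILE_THREADS={nvcc_threads}")
    return argv


def half_the_cores() -> int:
    """Hypothetical heuristic: use half of the visible cores, but at least one."""
    return max(1, (os.cpu_count() or 1) // 2)


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to subprocess.run to execute.
    print(" ".join(make_build_command(half_the_cores(), nvcc_threads=1)))
```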

Contribution

Steps to add a new kernel:

1. Implement the kernel in csrc
2. Expose the interface in include/sgl_kernel_ops.h
3. Create torch extension in csrc/common_extension.cc
4. Update CMakeLists.txt to include new CUDA source
5. Expose Python interface in python
6. Add test and benchmark

Development Tips

When creating torch extensions, add the function definition with m.def and the device binding with m.impl. For how to write the schema, see the Schema reference.

// We need def with schema here for torch.compile
m.def(
 "bmm_fp8(Tensor A, Tensor B, Tensor! D, Tensor A_scale, Tensor B_scale, Tensor workspace_buffer, "
 "int cublas_handle) -> ()");
m.impl("bmm_fp8", torch::kCUDA, &bmm_fp8);
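
To make the parts of the schema string easier to see, here is a toy splitter for this one string. It is not PyTorch's schema parser, the function name is hypothetical, and reading a trailing ! on a type as "this argument is written in place" is an assumption made for illustration.

```python
import re

SCHEMA = (
    "bmm_fp8(Tensor A, Tensor B, Tensor! D, Tensor A_scale, Tensor B_scale, "
    "Tensor workspace_buffer, int cublas_handle) -> ()"
)


def split_schema(schema: str):
    """Split 'name(args) -> ret' into (name, [(type, arg_name, is_mutated)], ret)."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*->\s*(.+?)\s*", schema)
    if match is None:
        raise ValueError(f"unrecognized schema: {schema!r}")
    name, arg_text, ret = match.groups()
    args = []
    for piece in filter(None, (p.strip() for p in arg_text.split(","))):
        type_name, arg_name = piece.rsplit(" ", 1)
        # Assumption: a trailing '!' on the type marks an argument written in place.
        args.append((type_name.rstrip("!"), arg_name, type_name.endswith("!")))
    return name, args, ret


if __name__ == "__main__":
    op_name, op_args, op_ret = split_schema(SCHEMA)
    print(op_name, op_ret)
    for type_name, arg_name, mutated in op_args:
        print(f"  {type_name:8s} {arg_name}{'  (mutated)' if mutated else ''}")
```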

Adapting C++ Native Types for Torch Compatibility

Third-party C++ libraries often use int and float, but PyTorch bindings require int64_t and double due to Python's type mapping.

Use make_pytorch_shim from sgl_kernel_torch_shim.h to handle conversions automatically:

// Add type conversion for int -> int64_t
template <>
struct pytorch_library_compatible_type<int> {
  using type = int64_t;
  static int convert_from_type(int64_t arg) {
    TORCH_CHECK(arg <= std::numeric_limits<int>::max(), "value too large");
    TORCH_CHECK(arg >= std::numeric_limits<int>::min(), "value too small");
    return static_cast<int>(arg);
  }
};

// Wrap your function
m.impl("fwd", torch::kCUDA, make_pytorch_shim(&mha_fwd));
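
The range check in the specialization above can be mirrored in a few lines of Python, which may help when reasoning about which values a caller is allowed to pass. This is only an illustration of the check, assuming a 32-bit C++ int; the function name is hypothetical and the real conversion happens in the C++ shim.

```python
INT_MIN = -(2**31)  # assumes a 32-bit C++ int
INT_MAX = 2**31 - 1


def convert_from_int64(arg: int) -> int:
    """Mirror the shim's range check before narrowing int64_t to int."""
    if arg > INT_MAX:
        raise ValueError("value too large")
    if arg < INT_MIN:
        raise ValueError("value too small")
    return arg
```

Values inside the range pass through unchanged; anything outside raises instead of silently wrapping.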

Testing & Benchmarking

Add pytest tests in tests/. If you need to skip a test, use @pytest.mark.skipif:

@pytest.mark.skipif(
    skip_condition, reason="Nvfp4 Requires compute capability of 10 or above."
)
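
One way to compute skip_condition for the decorator above is a small pure function over the device's compute capability. This is a sketch: the function name is hypothetical, and it assumes the capability arrives as a (major, minor) tuple, which is the shape returned by torch.cuda.get_device_capability().

```python
def nvfp4_unsupported(capability) -> bool:
    """Return True when the (major, minor) compute capability is below 10.0."""
    major, _minor = capability
    return major < 10


# Typical wiring (not executed here, since it needs torch and a GPU):
#   skip_condition = nvfp4_unsupported(torch.cuda.get_device_capability())
```

Keeping the decision in a plain function makes it testable on machines without a GPU.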

Add benchmarks using triton benchmark in benchmark/

We recommend using triton.testing.do_bench_cudagraph for kernel benchmarking.

Compared to triton.testing.do_bench, do_bench_cudagraph provides:
- Reduced CPU overhead impact for more accurate kernel performance measurements
- Incorporation of PDL (Programmatic Dependent Launch) effects into individual kernel results
- More realistic performance data on PDL-supported architectures (SM >= 90)

Run test suite

Kernel Size Analysis

Analyze CUDA kernel sizes in compiled wheel files to identify oversized kernels and template-instantiation bloat.

This tool requires cubloaty (install with pip install cubloaty).

# Install cubloaty
pip install cubloaty

# Analyze a wheel file
python analyze_whl_kernel_sizes.py path/to/sglang_kernel-*.whl

# Custom output file
python analyze_whl_kernel_sizes.py path/to/sglang_kernel-*.whl --output my_analysis.txt

The tool generates a text report with:
- Kernel groups (by name prefix)
- Individual kernel sizes (sorted by size)

Use this to identify large kernels and potential template instantiation bloat.
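
The two views in the report (groups by name prefix, and individual kernels sorted by size) can be sketched as follows. This is not the code of analyze_whl_kernel_sizes.py: the function names are hypothetical, the kernel names and sizes are made up, and splitting the prefix at the first underscore is an assumption.

```python
from collections import defaultdict


def group_by_prefix(kernel_sizes: dict, separator: str = "_") -> list:
    """Sum sizes per name prefix and return (prefix, total_bytes), largest first."""
    totals = defaultdict(int)
    for name, size in kernel_sizes.items():
        totals[name.split(separator, 1)[0]] += size
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def largest_kernels(kernel_sizes: dict, limit: int = 10) -> list:
    """Return up to `limit` (name, bytes) pairs, largest first."""
    ranked = sorted(kernel_sizes.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Made-up kernel names and sizes, for illustration only.
SAMPLE = {
    "gemm_fp8_sm90": 4_000_000,
    "gemm_bf16_sm90": 2_500_000,
    "attn_fwd": 3_000_000,
    "norm_rms": 200_000,
}

if __name__ == "__main__":
    for prefix, total in group_by_prefix(SAMPLE):
        print(f"{prefix:6s} {total / 1e6:6.2f} MB")
```

A prefix whose total is large relative to its kernel count is a reasonable first place to look for template-instantiation bloat.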

FAQ

Q: Segmentation fault with CUDA 12.6
A: Update ptxas to 12.8 (reference: segment fault error).

Release history

- 0.4.4: Jun 17, 2026
- 0.4.3: May 26, 2026 (this version)
- 0.4.2.post2: May 15, 2026
- 0.4.2.post1: May 5, 2026
- 0.4.2: Apr 30, 2026
- 0.4.1.post1: Apr 26, 2026
- 0.4.1: Apr 3, 2026
- 0.4.0: Mar 12, 2026

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

- sglang_kernel-0.4.3-cp310-abi3-manylinux2014_x86_64.whl (322.7 MB), uploaded May 26, 2026, CPython 3.10+
- sglang_kernel-0.4.3-cp310-abi3-manylinux2014_aarch64.whl (189.7 MB), uploaded May 26, 2026, CPython 3.10+
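
The wheel file names above follow the standard layout of distribution, version, Python tag, ABI tag, and platform tag. A minimal sketch of splitting such a name is shown below; the function name is hypothetical, and the optional build tag is handled only in the simplest way.

```python
def parse_wheel_name(filename: str) -> dict:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of fields in {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "distribution": parts[0],
        "version": parts[1],
        "build": parts[2] if len(parts) == 6 else None,  # optional build tag
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


if __name__ == "__main__":
    print(parse_wheel_name("sglang_kernel-0.4.3-cp310-abi3-manylinux2014_x86_64.whl"))
```

The cp310 Python tag combined with the abi3 ABI tag is what the page summarizes as "CPython 3.10+".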

File details

Details for the file sglang_kernel-0.4.3-cp310-abi3-manylinux2014_x86_64.whl.

File metadata

Download URL: sglang_kernel-0.4.3-cp310-abi3-manylinux2014_x86_64.whl
Upload date: May 26, 2026
Size: 322.7 MB
Tags: CPython 3.10+
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for sglang_kernel-0.4.3-cp310-abi3-manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`636a2b05fdef7cc0f7d9af98a6eec0d3470845d6ea32251c52d359d877893c87`
MD5	`affb36e48a3d6d0daeb9b632811451d6`
BLAKE2b-256	`b906423a0f8f97d0fdfa12a0157b95de2d2c96281acb249c5b8204e534a8cd39`

File details

Details for the file sglang_kernel-0.4.3-cp310-abi3-manylinux2014_aarch64.whl.

File metadata

Download URL: sglang_kernel-0.4.3-cp310-abi3-manylinux2014_aarch64.whl
Upload date: May 26, 2026
Size: 189.7 MB
Tags: CPython 3.10+
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for sglang_kernel-0.4.3-cp310-abi3-manylinux2014_aarch64.whl
Algorithm	Hash digest
SHA256	`14c6eabd7aae400d165158cdff90e58f1e505576939436d99d0045e1d0cc5881`
MD5	`809f40f54d80327cc4f79e05224320e1`
BLAKE2b-256	`4778f4341bef15b52da0c388f95c3b87b9e490698abae6e5d78effdfcab98648`
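
A downloaded wheel can be checked against the published SHA256 digest with the standard library. The sketch below uses hypothetical helper names and reads the file in chunks, since the wheels are several hundred megabytes.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with a published hex digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published_hash(path_to_wheel, digest) should return True when digest is the SHA256 value listed for that exact file.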
