Kernel Library for SGLang

Environment
- GPU :: NVIDIA CUDA
License
- OSI Approved :: Apache Software License
Programming Language
- Python :: 3

SGL Kernel

Installation

For CUDA 11.8:

pip3 install sgl-kernel -i https://docs.sglang.ai/whl/cu118

For CUDA 12.1 or CUDA 12.4:

pip3 install sgl-kernel
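The two cases above can be captured in a small script. This is a minimal sketch and not part of sgl-kernel: pip_install_command is a hypothetical helper, and it only knows the CUDA versions named in this section.

```python
# Hypothetical helper: maps a CUDA version string to the install command
# documented above. Versions not listed in this README are rejected.
CU118_INDEX = "https://docs.sglang.ai/whl/cu118"


def pip_install_command(cuda_version: str) -> list:
    """Return the documented pip command (as an argv list) for a CUDA version."""
    if cuda_version == "11.8":
        return ["pip3", "install", "sgl-kernel", "-i", CU118_INDEX]
    if cuda_version in ("12.1", "12.4"):
        return ["pip3", "install", "sgl-kernel"]
    raise ValueError(f"no documented wheel for CUDA {cuda_version}")
```

The returned list can be passed to subprocess.run(cmd, check=True) or joined with spaces for display.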

Build from source

Development build:

make build

Note:

The sgl-kernel is rapidly evolving. If you experience a compilation failure, try using make rebuild.

Build with ccache

# or `yum install -y ccache`.
apt-get install -y ccache
# Building with ccache is enabled when ccache is installed and CCACHE_DIR is set.
export CCACHE_DIR=/path/to/your/ccache/dir
export CCACHE_BACKEND=""
export CCACHE_KEEP_LOCAL_STORAGE="TRUE"
unset CCACHE_READONLY
python -m uv build --wheel -Cbuild-dir=build --color=always .
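Before starting a long build, it can help to confirm that both conditions from the comment above hold. The sketch below only mirrors the stated rule (ccache installed and CCACHE_DIR set); the real check lives in the build scripts, and ccache_ready is a hypothetical name.

```python
import os
import shutil


def ccache_ready(env=None):
    """Mirror the rule stated above: a ccache binary on PATH and CCACHE_DIR set.

    `env` defaults to the current process environment; pass a dict to test
    other configurations.
    """
    env = os.environ if env is None else env
    has_binary = shutil.which("ccache", path=env.get("PATH")) is not None
    has_dir = bool(env.get("CCACHE_DIR"))
    return has_binary and has_dir
```

Calling ccache_ready() with no arguments inspects the current shell environment.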

Configuring CMake Build Options

CMake options can be configured by adding -Ccmake.define.<option>=<value> to the uv build flags. For example, to enable building FP4 kernels, use:

python -m uv build --wheel -Cbuild-dir=build -Ccmake.define.SGL_KERNEL_ENABLE_FP4=1 --color=always .

See CMakeLists.txt for more options.
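When several options are needed, the flags can be assembled programmatically. The following sketch builds the same command line shown above; uv_build_command is a hypothetical helper, and the only option name it is exercised with is SGL_KERNEL_ENABLE_FP4 from this section.

```python
def uv_build_command(cmake_defines=None, build_dir="build"):
    """Compose the uv build argv, adding one -Ccmake.define flag per option."""
    cmd = ["python", "-m", "uv", "build", "--wheel", f"-Cbuild-dir={build_dir}"]
    for name, value in (cmake_defines or {}).items():
        cmd.append(f"-Ccmake.define.{name}={value}")
    cmd += ["--color=always", "."]
    return cmd


# Reproduces the FP4 example above.
fp4_cmd = uv_build_command({"SGL_KERNEL_ENABLE_FP4": 1})
```

The list can be handed to subprocess.run(fp4_cmd, check=True) from the sgl-kernel source directory.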

Parallel Build

We highly recommend building sgl-kernel with Ninja, which parallelizes the build automatically. If you build sgl-kernel with CMake instead, set CMAKE_BUILD_PARALLEL_LEVEL to enable a parallel build, for example:

CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) python -m uv build --wheel -Cbuild-dir=build --color=always .
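If the build is launched from a Python script rather than a shell, the same variable can be set on the child environment. This is a rough sketch with a hypothetical helper name; note that os.cpu_count() reports all logical CPUs, whereas nproc honors CPU affinity, so the two can differ inside containers.

```python
import os


def parallel_build_env(base_env=None, jobs=None):
    """Return a copy of the environment with CMAKE_BUILD_PARALLEL_LEVEL set.

    `jobs` defaults to the number of logical CPUs (at least 1).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CMAKE_BUILD_PARALLEL_LEVEL"] = str(jobs or os.cpu_count() or 1)
    return env
```

Pass the result as the env argument of subprocess.run when invoking the build command.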

Developer Guide

Development Environment Setup

Use Docker to set up the development environment. See Docker setup guide.

Create and enter development container:

docker run -itd --shm-size 32g --gpus all -v $HOME/.cache:/root/.cache --ipc=host --name sglang_zhyncs lmsysorg/sglang:dev /bin/zsh
docker exec -it sglang_zhyncs /bin/zsh

Project Structure

Dependencies

Third-party libraries:

FlashAttention FYI

FA3 can fail when there is not enough shared memory for some shapes, such as a larger hidden_dim or other special cases. Right now, FA3 is supported on sm80/sm87 and sm86/sm89.

The main difference between sm80/sm87 and sm86/sm89 is the shared memory size. For more information, see https://docs.nvidia.com/cuda/cuda-c-programming-guide/#shared-memory-8-x.

For sgl-kernel, FA3 can currently be built on sm80/sm86/sm89/sm90a. That means you can use FA3 on A100 (tested), A*0, L20 (tested), L40, L40s, and 3090 (tested).
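The architecture list above can be turned into a quick check. This is a sketch under stated assumptions: fa3_buildable is a hypothetical helper, the set is read directly from the paragraph above, and the shared memory figures are the per-block maximums given in the linked CUDA C Programming Guide section.

```python
# Compute capabilities listed above as buildable for FA3 in sgl-kernel
# (sm80, sm86, sm89, sm90a), written as (major, minor) pairs.
FA3_BUILD_ARCHS = {(8, 0), (8, 6), (8, 9), (9, 0)}

# Maximum shared memory per thread block in KB for the two 8.x groups,
# per the CUDA C Programming Guide section linked above.
MAX_SMEM_PER_BLOCK_KB = {(8, 0): 163, (8, 7): 163, (8, 6): 99, (8, 9): 99}


def fa3_buildable(capability):
    """Return True if the (major, minor) capability is in the list above."""
    return tuple(capability) in FA3_BUILD_ARCHS


# With PyTorch and a GPU available, the capability could be obtained via
# torch.cuda.get_device_capability(), which returns a (major, minor) tuple.
```

The smaller per-block limit on sm86/sm89 is why some shapes that work on A100 may fail on those GPUs.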

Kernel Development

Steps to add a new kernel:

1. Implement the kernel in csrc
2. Expose the interface in include/sgl_kernel_ops.h
3. Create torch extension in csrc/common_extension.cc
4. Update CMakeLists.txt to include new CUDA source
5. Expose Python interface in python

Development Tips

When implementing kernels in csrc, only define pure CUDA files and C++ interfaces. If you need to use torch::Tensor, include <torch/all.h> instead of <torch/extension.h>. Including <torch/extension.h> causes compilation errors when building with SABI (the Python stable ABI).
When creating torch extensions, add the function definition with m.def and the device binding with m.impl:

torch.compile needs m.def with a schema, which helps it capture the custom kernel automatically. Reference: How to add FakeTensor

How to write schema: Schema reference

// We need def with schema here for torch.compile
m.def(
 "bmm_fp8(Tensor A, Tensor B, Tensor! D, Tensor A_scale, Tensor B_scale, Tensor workspace_buffer, int "
 "cublas_handle, int cuda_stream) -> ()");
m.impl("bmm_fp8", torch::kCUDA, &bmm_fp8);

When exposing Python interfaces, avoid using kwargs in C++ interface kernels.

Avoid this:

torch.ops.sgl_kernel.apply_rope_pos_ids_cos_sin_cache.default(
    q=query.view(query.shape[0], -1, head_size),
    k=key.view(key.shape[0], -1, head_size),
    q_rope=query.view(query.shape[0], -1, head_size),
    k_rope=key.view(key.shape[0], -1, head_size),
    cos_sin_cache=cos_sin_cache,
    pos_ids=positions.long(),
    interleave=(not is_neox),
    cuda_stream=get_cuda_stream(),
)

Use this instead:

torch.ops.sgl_kernel.apply_rope_pos_ids_cos_sin_cache.default(
    query.view(query.shape[0], -1, head_size),
    key.view(key.shape[0], -1, head_size),
    query.view(query.shape[0], -1, head_size),
    key.view(key.shape[0], -1, head_size),
    cos_sin_cache,
    positions.long(),
    (not is_neox),
    get_cuda_stream(),
)
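If a Python-level wrapper wants keyword arguments for readability, it can still forward them positionally. The sketch below is one way to do that and is not taken from sgl-kernel: call_op and to_positional are hypothetical helpers, and the argument order is copied from the keyword names in the first example above.

```python
# Argument order taken from the keyword names in the "Avoid this" example.
ROPE_ARG_ORDER = (
    "q", "k", "q_rope", "k_rope",
    "cos_sin_cache", "pos_ids", "interleave", "cuda_stream",
)


def to_positional(arg_order, **kwargs):
    """Reorder keyword arguments into a positional tuple following arg_order."""
    missing = [name for name in arg_order if name not in kwargs]
    extra = [name for name in kwargs if name not in arg_order]
    if missing or extra:
        raise TypeError(f"missing={missing} unexpected={extra}")
    return tuple(kwargs[name] for name in arg_order)


def call_op(op, arg_order, **kwargs):
    """Call `op` with positional arguments only, never with kwargs."""
    return op(*to_positional(arg_order, **kwargs))
```

Here op would be something like torch.ops.sgl_kernel.apply_rope_pos_ids_cos_sin_cache.default; any callable works for testing.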

Integrating Third-Party Libraries with Data Type Conversion

When integrating new third-party libraries like flash-attention, you may encounter data type compatibility issues between the C++ interface and PyTorch bindings. For example, the third-party code might use float or int types, while PyTorch requires double and int64_t.

The reason we need double and int64_t in torch binding is that TORCH_LIBRARY handles the Python-to-C++ conversion process. Python's float data type actually corresponds to double in C++, while Python's int corresponds to int64_t in C++.
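The float/double correspondence can be observed from Python itself. The following small illustration uses only the standard struct module: a Python float survives a round trip through a 64-bit double unchanged, but loses precision through a 32-bit float. The roundtrip helper is a name introduced here for illustration.

```python
import struct


def roundtrip(value, fmt):
    """Pack `value` with the given struct format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


x = 0.1
same_as_double = roundtrip(x, "<d") == x  # True: Python float is a C double
same_as_float = roundtrip(x, "<f") == x   # False: 32-bit float drops precision
```

This is why a binding that declared its parameter as float would silently change the values passed from Python.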

To address this issue, we provide the make_pytorch_shim function in sgl_kernel_torch_shim that handles data type conversions automatically.

When you need to support new data type conversions, you can easily add conversion functions like this:

// Map `int` -> `int64_t`
template <>
struct pytorch_library_compatible_type<int> {
  using type = int64_t;
  static int convert_from_type(int64_t arg) {
    TORCH_CHECK(arg <= std::numeric_limits<int>::max(), "int64_t value is too large to be converted to int");
    TORCH_CHECK(arg >= std::numeric_limits<int>::min(), "int64_t value is too small to be converted to int");
    return static_cast<int>(arg);
  }
};
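For readers more comfortable in Python, the range check in the conversion above behaves roughly like the sketch below. It assumes a 32-bit C int, which holds on the common Linux x86_64 targets but is an assumption here; convert_from_int64 is a hypothetical name that mirrors convert_from_type.

```python
# Assumption: C `int` is 32 bits wide.
INT_MIN = -(2**31)
INT_MAX = 2**31 - 1


def convert_from_int64(arg: int) -> int:
    """Reject values that would not fit in a 32-bit int, else pass through."""
    if arg > INT_MAX:
        raise OverflowError("int64_t value is too large to be converted to int")
    if arg < INT_MIN:
        raise OverflowError("int64_t value is too small to be converted to int")
    return arg
```

The C++ version raises through TORCH_CHECK instead of OverflowError, so the error surfaces in Python as a RuntimeError.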

To use this with your library functions, simply wrap them with make_pytorch_shim:

/*
 * From flash-attention
 */
 m.impl("fwd", torch::kCUDA, make_pytorch_shim(&mha_fwd));

Testing & Benchmarking

Add pytest tests in tests/. If you need to skip a test, use @pytest.mark.skipif:

@pytest.mark.skipif(
    skip_condition, reason="Nvfp4 Requires compute capability of 10 or above."
)

Add benchmarks using triton benchmark in benchmark/
Run test suite
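A fuller version of the skipif decorator shown above might look like the following. This is only a sketch: compute_capability and test_nvfp4_placeholder are hypothetical names, the threshold of 10 comes from the reason string above, and the test body is a stand-in for a real kernel test.

```python
import pytest


def compute_capability():
    """Return (major, minor) for GPU 0, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


capability = compute_capability()
# Skip when there is no usable GPU or its major capability is below 10.
skip_condition = capability is None or capability[0] < 10


@pytest.mark.skipif(
    skip_condition, reason="Nvfp4 Requires compute capability of 10 or above."
)
def test_nvfp4_placeholder():
    # Stand-in body; a real test would call the kernel and compare outputs.
    assert capability[0] >= 10
```

Evaluating the condition once at import time keeps collection fast and makes the skip reason visible in the pytest summary.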

FAQ

If you encounter this error while compiling with ccache:

ImportError: /usr/local/lib/python3.10/dist-packages/sgl_kernel/common_ops.abi3.so: undefined symbol: _ZN3c108ListType3getERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4Type24SingletonOrSharedTypePtrIS9_EE

modify the last build command as follows to resolve it:

python3 -m uv build --wheel -Cbuild-dir=build . --color=always --no-build-isolation

Release new version

Update version in pyproject.toml and version.py
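Because the version lives in two files, a quick consistency check before tagging can catch a missed update. The sketch below assumes pyproject.toml contains a line of the form version = "X" and version.py a line of the form __version__ = "X"; both patterns and all function names are assumptions, not confirmed by this README.

```python
import re


def pyproject_version(text):
    """Extract the first `version = "..."` line from pyproject.toml text."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', text, re.MULTILINE)
    return match.group(1) if match else None


def module_version(text):
    """Extract `__version__ = "..."` from version.py text (assumed layout)."""
    match = re.search(r'^__version__\s*=\s*"([^"]+)"', text, re.MULTILINE)
    return match.group(1) if match else None


def versions_match(pyproject_text, version_py_text):
    """True only when both versions are found and identical."""
    a = pyproject_version(pyproject_text)
    b = module_version(version_py_text)
    return a is not None and a == b
```

Read the two files with pathlib.Path(...).read_text() and pass their contents to versions_match.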

Release history

This version

0.2.3

Jul 5, 2025

0.2.2

Jul 3, 2025

0.2.1

Jul 1, 2025

0.2.0

Jun 24, 2025

0.1.9

Jun 16, 2025

0.1.8.post2

Jun 15, 2025

0.1.8.post1

Jun 13, 2025

0.1.8

Jun 12, 2025

0.1.7

Jun 8, 2025

0.1.6.post1

Jun 7, 2025

0.1.6

Jun 7, 2025

0.1.5

May 31, 2025

0.1.4

May 22, 2025

0.1.3

May 17, 2025

0.1.2.post1

May 11, 2025

0.1.2

May 8, 2025

0.1.1

Apr 30, 2025

0.1.0

Apr 23, 2025

0.0.9.post2

Apr 18, 2025

0.0.9.post1

Apr 15, 2025

0.0.9

Apr 15, 2025

0.0.8.post3

Apr 12, 2025

0.0.8.post2

Apr 12, 2025

0.0.8.post1

Apr 11, 2025

0.0.8

Apr 5, 2025

0.0.7

Apr 3, 2025

0.0.6

Mar 31, 2025

0.0.5.post4

Mar 29, 2025

0.0.5.post3

Mar 17, 2025

0.0.5.post2

Mar 16, 2025

0.0.5.post1

Mar 14, 2025

0.0.5

Mar 13, 2025

0.0.4.post3

Mar 12, 2025

0.0.4.post2

Mar 11, 2025

0.0.4.post1

Mar 10, 2025

0.0.4

Mar 9, 2025

0.0.3.post7

Mar 7, 2025

0.0.3.post6

Feb 14, 2025

0.0.3.post5

Feb 12, 2025

0.0.3.post4

Feb 12, 2025

0.0.3.post3

Feb 9, 2025

0.0.3.post2

Feb 7, 2025

0.0.3.post1

Jan 30, 2025

0.0.3

Jan 27, 2025

0.0.2.post20

Jan 27, 2025

0.0.2.post19

Jan 27, 2025

0.0.2.post18

Jan 26, 2025

0.0.2.post17

Jan 25, 2025

0.0.2.post16

Jan 23, 2025

0.0.2.post15

Jan 20, 2025

0.0.2.post14

Jan 15, 2025

0.0.2.post13

Jan 15, 2025

0.0.2.post12

Jan 13, 2025

0.0.2.post11

Jan 5, 2025

0.0.2.post10

Dec 27, 2024

0.0.2.post9

Dec 26, 2024

0.0.2.post8

Dec 25, 2024

0.0.2.post7

Dec 15, 2024

0.0.2.post6

Dec 25, 2024

0.0.2.post5

Dec 15, 2024

0.0.2.post4

Dec 12, 2024

0.0.2.post3

Dec 12, 2024

0.0.2.post2

Dec 12, 2024

0.0.2.post1

Dec 12, 2024

0.0.2

Dec 12, 2024

0.0.1

Dec 1, 2024

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

sgl_kernel-0.2.3-cp39-abi3-manylinux2014_x86_64.whl (281.0 MB)

Uploaded Jul 5, 2025 CPython 3.9+

File details

Details for the file sgl_kernel-0.2.3-cp39-abi3-manylinux2014_x86_64.whl.

File metadata

Download URL: sgl_kernel-0.2.3-cp39-abi3-manylinux2014_x86_64.whl
Upload date: Jul 5, 2025
Size: 281.0 MB
Tags: CPython 3.9+
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.22

File hashes

Hashes for sgl_kernel-0.2.3-cp39-abi3-manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`34b5ddf2bd987bc2880e8095762c843b4e068488faf07f6058f9f278d5e7be6c`
MD5	`c34a10a1a52e00928b3fc37f2dbb8537`
BLAKE2b-256	`cecfd7de3b8cb55164beb920cabf8a73a38f13cb47b176a079cd4f9ca32dd621`
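A downloaded wheel can be checked against the SHA256 digest listed above with the standard hashlib module. This is a minimal sketch; the function names are introduced here for illustration, and the file is read in chunks because the wheel is large.

```python
import hashlib

# SHA256 digest published above for sgl_kernel-0.2.3-cp39-abi3-manylinux2014_x86_64.whl
EXPECTED_SHA256 = "34b5ddf2bd987bc2880e8095762c843b4e068488faf07f6058f9f278d5e7be6c"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected
```

Call matches_published_hash with the path to the downloaded wheel; a False result means the file is corrupt or is not the 0.2.3 wheel.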
