
TileLang-based kernels.


Tile Kernels

Optimized GPU kernels for LLM operations, built with TileLang. TileLang is a domain-specific language for expressing high-performance GPU kernels in Python, featuring easy migration, agile development, and automatic optimization.

Most kernels in this project approach the hardware limits of compute throughput and memory bandwidth. Some are already used in internal training and inference scenarios. However, they do not yet represent best practices, and we are actively working to improve code quality and documentation.

Features

  • Gating — Top-k expert selection and scoring for Mixture of Experts routing
  • MoE Routing — Token-to-expert mapping, fused expansion/reduction, and weight normalization
  • Quantization — Per-token, per-block, and per-channel FP8/FP4/E5M6 casting with fused SwiGLU+quantization ops
  • Transpose — Batched transpose operations
  • Engram — Engram gating kernels with fused RMSNorm, forward/backward passes, and weight-gradient reduction
  • Manifold HyperConnection — Hyper-connection kernels including Sinkhorn normalization and mix splitting/application
  • Modeling — High-level torch.autograd.Function wrappers composing low-level kernels into trainable layers (engram gate, mHC pipeline)
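To make the routing features concrete, here is a minimal NumPy sketch of top-k expert gating, the operation a fused gating kernel accelerates: softmax over expert logits, keep the k best experts per token, and renormalize their scores into routing weights. The function name and shapes are illustrative only, not this package's API (the real kernels operate on GPU torch tensors).

```python
import numpy as np

def topk_gating(logits: np.ndarray, k: int):
    """Reference top-k expert gating (NumPy sketch, not the package API)."""
    # Softmax over the expert dimension, stabilized by subtracting the max
    z = logits - logits.max(axis=-1, keepdims=True)
    scores = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Indices of the k highest-scoring experts per token (descending)
    idx = np.argsort(scores, axis=-1)[:, ::-1][:, :k]
    weights = np.take_along_axis(scores, idx, axis=-1)
    # Renormalize the kept scores so each token's weights sum to 1
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights, idx

# Example: 4 tokens routed across 8 experts, top-2 per token
logits = np.random.randn(4, 8)
w, idx = topk_gating(logits, k=2)
print(w.shape, idx.shape)  # (4, 2) (4, 2)
```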

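Per-token quantization can be sketched the same way: each token (row) gets its own scale so that its largest absolute value maps onto the FP8 dynamic range. This simulates the semantics only; the real kernels emit actual FP8 tensors on GPU, and the function name here is an assumption for illustration (448 is the largest finite value of the e4m3 format).

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 e4m3

def quantize_per_token(x: np.ndarray):
    """Per-token FP8-style quantization (NumPy simulation, illustrative only)."""
    # One scale per token: map the row's max magnitude to the FP8 range
    amax = np.abs(x).max(axis=-1, keepdims=True)
    scale = amax / FP8_E4M3_MAX
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    # Scaled values now fit in [-448, 448]; clip guards against rounding
    q = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

x = np.random.randn(4, 16).astype(np.float32)
q, s = quantize_per_token(x)
# Dequantization q * s recovers x up to float precision
```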
Requirements

  • Python 3.10 or higher
  • PyTorch 2.10 or higher
  • TileLang 0.1.9 or higher
  • NVIDIA SM90 or SM100 architecture GPU
  • CUDA Toolkit 13.1 or higher

Installation

Install a local development version

pip install -e ".[dev]"

Install a release version

pip install tile-kernels

Testing

Run the tests with pytest:

Run a single test file

pytest tests/transpose/test_transpose.py -n 4 # Correctness only with 4 workers
pytest tests/transpose/test_transpose.py --run-benchmark # Correctness + Benchmarking

Pressure test

TK_FULL_TEST=1 pytest -n 4 --count 2

Project Structure

tile_kernels/
├── moe/        # Mixture of Experts routing related kernels
├── quant/      # FP8/FP4/E5M6 quantization
├── transpose/  # Batched transpose
├── engram/     # Engram gating kernels
├── mhc/        # Manifold HyperConnection kernels
├── modeling/   # High-level autograd modeling layers (engram, mHC)
├── torch/      # PyTorch reference implementations
└── testing/    # Test and benchmark utilities

Acknowledgement

This project is built on TileLang. Thanks and respect to the developers!

License

This code repository is released under the MIT License.

Citation

@misc{tilekernels,
      title={TileKernels},
      author={Xiangwen Wang and Chenhao Xu and Huanqi Cao and Rui Tian and Weilin Zhao and Kuai Yu and Chenggang Zhao},
      year={2026},
      publisher={GitHub},
      howpublished={\url{https://github.com/deepseek-ai/TileKernels}},
}
