TileLang-based kernels.
Project description
Tile Kernels
Optimized GPU kernels for LLM operations, built with TileLang. TileLang is a domain-specific language for expressing high-performance GPU kernels in Python, featuring easy migration, agile development, and automatic optimization.
Most kernels in this project approach the hardware limits of compute throughput and memory bandwidth. Some of them are already used in internal training and inference scenarios. However, they do not yet represent best practices, and we are actively working on improving code quality and documentation.
Features
- Gating — Top-k expert selection and scoring for Mixture of Experts routing
- MoE Routing — Token-to-expert mapping, fused expansion/reduction and weight normalization
- Quantization — Per-token, per-block, and per-channel FP8/FP4/E5M6 casting with fused SwiGLU+quantization ops (see the sketch after this list)
- Transpose — Batched transpose operations
- Engram — Engram gating kernels with fused RMSNorm, forward/backward passes and weight gradient reduction
- Manifold HyperConnection — Hyper-connection kernels including Sinkhorn normalization and mix splitting/application
- Modeling — High-level `torch.autograd.Function` wrappers composing low-level kernels into trainable layers (engram gate, mHC pipeline)
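To make the per-token quantization path concrete, here is a minimal eager-PyTorch reference sketch, assuming the FP8 E4M3 format; the function name and amax-based scaling scheme are illustrative, not the package's API.

```python
import torch

FP8_E4M3_MAX = 448.0  # largest finite value of torch.float8_e4m3fn

def per_token_cast_to_fp8_ref(x: torch.Tensor):
    """Reference per-token FP8 cast: one scale per row (token)."""
    amax = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-4)
    scale = amax.float() / FP8_E4M3_MAX
    q = (x / scale).to(torch.float8_e4m3fn)
    return q, scale  # FP8 payload plus float32 scales for dequantization

x = torch.randn(16, 128)
q, scale = per_token_cast_to_fp8_ref(x)
dequant = q.float() * scale  # approximate round-trip
```

The kernels under quant/ fuse the scaling and cast (and, for the SwiGLU variant, the activation itself) into a single pass over the data, avoiding the extra memory round-trips this eager reference incurs.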
Requirements
- Python 3.10 or higher
- PyTorch 2.10 or higher
- TileLang 0.1.9 or higher
- NVIDIA GPU with SM90 (Hopper) or SM100 (Blackwell) architecture
- CUDA Toolkit 13.1 or higher
Installation
Install a local development version
pip install -e ".[dev]"
Install a release version
pip install tile-kernels
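Either way, a quick smoke test is importing the package; this assumes only the tile_kernels package name shown in the project structure below.

```python
# Minimal smoke test: the package should import and resolve to the installed path.
import tile_kernels
print(tile_kernels.__file__)
```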
Testing
Run the tests with pytest.
Run a single test file
pytest tests/transpose/test_transpose.py -n 4 # Correctness only with 4 workers
pytest tests/transpose/test_transpose.py --run-benchmark # Correctness + Benchmarking
Stress test
TK_FULL_TEST=1 pytest -n 4 --count 2
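For orientation, a correctness test in this layout typically checks a kernel against the eager PyTorch references kept under tile_kernels/torch/. The sketch below is hypothetical: the import path and entry-point name are assumptions, not the package's documented API.

```python
import pytest
import torch

@pytest.mark.parametrize("shape", [(8, 256, 512), (4, 1024, 1024)])
def test_batched_transpose_matches_torch(shape):
    # Hypothetical entry point; the real module layout may differ.
    from tile_kernels.transpose import batched_transpose

    x = torch.randn(*shape, device="cuda", dtype=torch.bfloat16)
    ref = x.transpose(-1, -2).contiguous()  # eager PyTorch reference
    out = batched_transpose(x)
    torch.testing.assert_close(out, ref)
```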
Project Structure
tile_kernels/
├── moe/ # Mixture-of-Experts routing kernels
├── quant/ # FP8/FP4/E5M6 quantization
├── transpose/ # Batched transpose
├── engram/ # Engram gating kernels
├── mhc/ # Manifold HyperConnection kernels
├── modeling/ # High-level autograd modeling layers (engram, mHC)
├── torch/ # PyTorch reference implementations
└── testing/ # Test and benchmark utilities
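The Sinkhorn normalization named in the Manifold HyperConnection entry is, in general, an iterative row/column rescaling that drives a positive matrix toward a doubly stochastic one. The sketch below shows the generic algorithm in PyTorch; it illustrates the technique only and is not the mhc/ kernel's implementation.

```python
import torch

def sinkhorn(logits: torch.Tensor, n_iters: int = 10, eps: float = 1e-9) -> torch.Tensor:
    """Alternately renormalize rows and columns so both sum to 1."""
    p = logits.exp()  # strictly positive starting matrix
    for _ in range(n_iters):
        p = p / (p.sum(dim=-1, keepdim=True) + eps)  # rows sum to 1
        p = p / (p.sum(dim=-2, keepdim=True) + eps)  # columns sum to 1
    return p

p = sinkhorn(torch.randn(4, 4))
print(p.sum(dim=-1), p.sum(dim=-2))  # both approach all-ones
```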
Acknowledgement
This project is built on TileLang. Thanks and respect to the developers!
License
This code repository is released under the MIT License.
Citation
@misc{tilekernels,
  title = {TileKernels},
  author = {Xiangwen Wang and Chenhao Xu and Huanqi Cao and Rui Tian and Weilin Zhao and Kuai Yu and Chenggang Zhao},
  year = {2026},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/deepseek-ai/TileKernels}},
}
File details
Details for the file tile_kernels-1.0.0.tar.gz.
File metadata
- Download URL: tile_kernels-1.0.0.tar.gz
- Upload date:
- Size: 107.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 259b7d70219cce6afa868f724c3032936c6ae331239285b1111554c4baa46eb3 |
| MD5 | 33c6624ec64040effdac0b5d86f782ae |
| BLAKE2b-256 | 1770f0f62438b89c96bfe36d6ed95bfcc101b75df12f771a800cb6feb0fa9337 |
File details
Details for the file tile_kernels-1.0.0-py3-none-any.whl.
File metadata
- Download URL: tile_kernels-1.0.0-py3-none-any.whl
- Upload date:
- Size: 120.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9f1e09c1bccde32f3189d1ff32ba9725f86fc04ac1d0ac93284e16fcab1a5666 |
| MD5 | ce14f95f37ce6879c91998c305235acb |
| BLAKE2b-256 | d3747511421ab2a2f292ca2c147a7af78eaec90e39bd6d6703283a4ba8fd7dfc |