
TileLang-based kernels.


Tile Kernels

Optimized GPU kernels for LLM operations, built with TileLang. TileLang is a domain-specific language for expressing high-performance GPU kernels in Python, featuring easy migration, agile development, and automatic optimization.

Most kernels in this project approach the hardware limits of compute throughput and memory bandwidth. Some are already used in internal training and inference scenarios. However, they do not yet represent best practices, and we are actively working to improve code quality and documentation.

Features

  • Gating — Top-k expert selection and scoring for Mixture of Experts routing
  • MoE Routing — Token-to-expert mapping, fused expansion/reduction, and weight normalization
  • Quantization — Per-token, per-block, and per-channel FP8/FP4/E5M6 casting with fused SwiGLU+quantization ops
  • Transpose — Batched transpose operations
  • Engram — Engram gating kernels with fused RMSNorm, forward/backward passes, and weight-gradient reduction
  • Manifold HyperConnection — Hyper-connection kernels including Sinkhorn normalization and mix splitting/application
  • Modeling — High-level torch.autograd.Function wrappers composing low-level kernels into trainable layers (engram gate, mHC pipeline)
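To make the routing features concrete, here is a minimal NumPy sketch of top-k expert gating, the operation a fused gating kernel accelerates: softmax over expert logits, keep the k best experts per token, and renormalize their scores into routing weights. The function name and shapes are illustrative only, not this package's API (the real kernels operate on GPU torch tensors).

```python
import numpy as np

def topk_gating(logits: np.ndarray, k: int):
    """Reference top-k expert gating (NumPy sketch, not the package API)."""
    # Softmax over the expert dimension, stabilized by subtracting the max
    z = logits - logits.max(axis=-1, keepdims=True)
    scores = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Indices of the k highest-scoring experts per token (descending)
    idx = np.argsort(scores, axis=-1)[:, ::-1][:, :k]
    weights = np.take_along_axis(scores, idx, axis=-1)
    # Renormalize the kept scores so each token's weights sum to 1
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights, idx

# Example: 4 tokens routed across 8 experts, top-2 per token
logits = np.random.randn(4, 8)
w, idx = topk_gating(logits, k=2)
print(w.shape, idx.shape)  # (4, 2) (4, 2)
```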

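Per-token quantization can be sketched the same way: each token (row) gets its own scale so that its largest absolute value maps onto the FP8 dynamic range. This simulates the semantics only; the real kernels emit actual FP8 tensors on GPU, and the function name here is an assumption for illustration (448 is the largest finite value of the e4m3 format).

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 e4m3

def quantize_per_token(x: np.ndarray):
    """Per-token FP8-style quantization (NumPy simulation, illustrative only)."""
    # One scale per token: map the row's max magnitude to the FP8 range
    amax = np.abs(x).max(axis=-1, keepdims=True)
    scale = amax / FP8_E4M3_MAX
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    # Scaled values now fit in [-448, 448]; clip guards against rounding
    q = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

x = np.random.randn(4, 16).astype(np.float32)
q, s = quantize_per_token(x)
# Dequantization q * s recovers x up to float precision
```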
Requirements

  • Python 3.10 or higher
  • PyTorch 2.10 or higher
  • TileLang 0.1.9 or higher
  • NVIDIA SM90 or SM100 architecture GPU
  • CUDA Toolkit 13.1 or higher

Installation

Install a local development version

pip install -e ".[dev]"

Install a release version

pip install tile-kernels

Testing

Run the tests with pytest:

Run a single test file

pytest tests/transpose/test_transpose.py -n 4 # Correctness only with 4 workers
pytest tests/transpose/test_transpose.py --run-benchmark # Correctness + Benchmarking

Pressure test

TK_FULL_TEST=1 pytest -n 4 --count 2

Project Structure

tile_kernels/
├── moe/        # Mixture of Experts routing related kernels
├── quant/      # FP8/FP4/E5M6 quantization
├── transpose/  # Batched transpose
├── engram/     # Engram gating kernels
├── mhc/        # Manifold HyperConnection kernels
├── modeling/   # High-level autograd modeling layers (engram, mHC)
├── torch/      # PyTorch reference implementations
└── testing/    # Test and benchmark utilities

Acknowledgement

This project is built on TileLang. Thanks and respect to the developers!

License

This code repository is released under the MIT License.

Citation

@misc{tilekernels,
      title={TileKernels},
      author={Xiangwen Wang and Chenhao Xu and Huanqi Cao and Rui Tian and Weilin Zhao and Kuai Yu and Chenggang Zhao},
      year={2026},
      publisher={GitHub},
      howpublished={\url{https://github.com/deepseek-ai/TileKernels}},
}
