TileOPs kernels for efficient inference

These details have not been verified by PyPI

Project description

TileOPs (TOP)

TileOPs (TOP) is a collection of high-performance machine learning operators built on top of TileLang. It offers efficient, modular, and composable implementations optimized for AI workloads.

Note: TileOPs is still under rapid development.

Figure: DeepSeek-V3.2-Exp DeepSeek Sparse Attention (DSA) performance on H800 SXM

📦 Installation

Requirements

Python 3.8+
PyTorch >= 2.1
GLIBCXX_3.4.32
TileLang
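
The requirements above can be checked before installing. The following is a minimal sketch and not part of TileOPs: the helper names (`parse_version`, `meets_minimum`, `library_mentions`, `check_environment`) are hypothetical, and it assumes TileLang is importable as `tilelang`. The GLIBCXX check is a heuristic that searches a libstdc++ file for the version tag, similar to piping `strings` into `grep`; the location of that file varies by system and is left to the caller.

```python
import importlib.util
import re
import sys


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    # "2.1.0+cu121" -> (2, 1, 0); anything after the numeric prefix is ignored.
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def meets_minimum(version, minimum):
    """Compare two version strings by their numeric prefixes."""
    return parse_version(version) >= parse_version(minimum)


def library_mentions(path, version_tag):
    """Heuristic: does the shared library file contain the NUL-terminated tag?"""
    with open(path, "rb") as handle:
        return version_tag.encode() + b"\x00" in handle.read()


def check_environment():
    """Report which of the listed requirements appear to be satisfied."""
    report = {"python": sys.version_info >= (3, 8)}
    # Assumption: the import names are `torch` and `tilelang`.
    for module in ("torch", "tilelang"):
        report[module] = importlib.util.find_spec(module) is not None
    if report["torch"]:
        import torch
        report["torch>=2.1"] = meets_minimum(torch.__version__, "2.1")
    return report


if __name__ == "__main__":
    print(check_environment())
```

To check the C++ runtime, call `library_mentions(path, "GLIBCXX_3.4.32")` with the path to the `libstdc++.so.6` that your Python process actually loads.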

Method 1: Install with Pip

pip install tileops

Method 2: Install from source (editable mode for development)

git clone https://github.com/tile-ai/TileOPs
cd TileOPs
pip install -e '.[dev]' -v # remove -e option if you don't want to install in editable mode, -v for verbose output

🚀 Quick Usage

Sparse MLA

import torch
from top import SparseMLAKernel

batch_size = 1
seq_len = 1024
seq_len_kv = 2048
q_start_index_s = 1024
n_heads = 128
head_dim = 512
tail_dim = 64
topk = 2048
kv_stride = 1
kv_group = 1
sm_scale = None

sparse_mla = SparseMLAKernel(
    batch=batch_size,
    seq_len=seq_len,
    seq_len_kv=seq_len_kv,
    q_start_index_s=q_start_index_s,
    heads=n_heads,
    dim=head_dim,
    tail_dim=tail_dim,
    topk=topk,
    kv_stride=kv_stride,
    kv_group=kv_group,
    sm_scale=sm_scale,
    is_casual=True,
    dtype=torch.bfloat16,
    device='cuda',
)

# Evaluate the Sparse MLA kernel performance
sparse_mla.check()
latency = sparse_mla.profile()
print(f"Latency: {latency:.4f} ms")
print("fwd tflops =",
      (batch_size * seq_len * (head_dim + tail_dim + head_dim) * topk * 2 * n_heads) / (latency * 1e-3) / 1e12)
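
The throughput expression above reads as follows: for every query token and head, each of the `topk` selected keys costs a score product over `head_dim + tail_dim` channels plus an output product over `head_dim` channels, at 2 FLOPs per multiply-add. The sketch below repeats that arithmetic as reusable functions. The function names are hypothetical, and the 2.0 ms latency is a placeholder input, not a measured result.

```python
def sparse_mla_fwd_flops(batch, seq_len, heads, head_dim, tail_dim, topk):
    """FLOP count used in the example above for one forward pass."""
    qk_channels = head_dim + tail_dim  # channels in the Q.K^T score product
    pv_channels = head_dim             # channels in the P.V output product
    # 2 FLOPs (multiply + add) per channel, per selected key, per head, per token.
    return batch * seq_len * heads * topk * 2 * (qk_channels + pv_channels)


def tflops(flops, latency_ms):
    """Convert a FLOP count and a latency in milliseconds to TFLOPS."""
    return flops / (latency_ms * 1e-3) / 1e12


flops = sparse_mla_fwd_flops(
    batch=1, seq_len=1024, heads=128, head_dim=512, tail_dim=64, topk=2048
)
print(flops)  # 584115552256

# Placeholder latency purely to show the unit conversion.
print(f"{tflops(flops, latency_ms=2.0):.2f} TFLOPS at a placeholder 2.0 ms")
```

With the real value returned by `sparse_mla.profile()` passed as `latency_ms`, `tflops` gives the same number as the one-line expression in the example.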

MLA

import torch
from top import MLAKernel

device = "cuda"
dtype = torch.float16

batch = 128
heads = 64
kv_heads = 1
kv_ctx = 8192
dim = 512
pe_dim = 64

# Query input: [batch, heads, dim]
q = torch.randn(batch, heads, dim, device=device, dtype=dtype)

# Query positional encoding: [batch, heads, pe_dim]
q_pe = torch.randn(batch, heads, pe_dim, device=device, dtype=dtype)

# KV cache input: [batch, kv_ctx, kv_heads, dim]
kv = torch.randn(batch, kv_ctx, kv_heads, dim, device=device, dtype=dtype)

# KV positional encoding: [batch, kv_ctx, kv_heads, pe_dim]
k_pe = torch.randn(batch, kv_ctx, kv_heads, pe_dim, device=device, dtype=dtype)

# Use MLA kernel
block_N = 64
block_H = 64
num_split = 1

mla = MLAKernel(batch, heads, kv_heads, kv_ctx, dim, pe_dim, block_N, block_H, num_split)

out = mla(q, q_pe, kv, k_pe)
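
The shapes in this example fix how much GPU memory the inputs need before the kernel runs. As a rough sketch (the helper name `tensor_bytes` is hypothetical, and the count covers only the four input tensors at 2 bytes per float16 element, ignoring the output and any workspace the kernel may allocate):

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=2):
    """Size in bytes of a dense tensor; 2 bytes per element for float16."""
    return prod(shape) * bytes_per_element


batch, heads, kv_heads, kv_ctx, dim, pe_dim = 128, 64, 1, 8192, 512, 64

# Same shapes as the tensors created in the MLA example above.
shapes = {
    "q": (batch, heads, dim),
    "q_pe": (batch, heads, pe_dim),
    "kv": (batch, kv_ctx, kv_heads, dim),
    "k_pe": (batch, kv_ctx, kv_heads, pe_dim),
}

sizes = {name: tensor_bytes(shape) for name, shape in shapes.items()}
for name, size in sizes.items():
    print(f"{name:5s} {size / 2**20:8.1f} MiB")

total = sum(sizes.values())
print(f"total {total / 2**30:.2f} GiB")
```

The KV cache dominates: `kv` alone is 1 GiB at these settings, so reducing `batch` or `kv_ctx` is the quickest way to fit the example on a smaller GPU.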

Acknowledgments

Project details

Release history

0.0.1.dev2 (pre-release, this version): Oct 9, 2025
0.0.1.dev1 (pre-release): Oct 9, 2025
0.0.1.dev0 (pre-release): Jul 16, 2025

Download files

Source Distribution

tileops-0.0.1.dev2.tar.gz (42.4 kB)

Uploaded Oct 9, 2025 Source

Built Distribution

tileops-0.0.1.dev2-py3-none-any.whl (47.2 kB)

Uploaded Oct 9, 2025 Python 3

File details

Details for the file tileops-0.0.1.dev2.tar.gz.

File metadata

Download URL: tileops-0.0.1.dev2.tar.gz
Upload date: Oct 9, 2025
Size: 42.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.18

File hashes

Hashes for tileops-0.0.1.dev2.tar.gz
Algorithm	Hash digest
SHA256	`30428e26a53b249d2ce35692ac32746b7ad8dc60b653ed8f63b3a7ae5c1fdbb3`
MD5	`091e11e142af28da67bc129ebea67818`
BLAKE2b-256	`9ed5b331a02afac98ad19a02232a9866e608c607fc29ecd899a0b40da223e4ee`

File details

Details for the file tileops-0.0.1.dev2-py3-none-any.whl.

File metadata

Download URL: tileops-0.0.1.dev2-py3-none-any.whl
Upload date: Oct 9, 2025
Size: 47.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.18

File hashes

Hashes for tileops-0.0.1.dev2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3bcfd562bc95466ab4855aa7afff363cdd2f6ab381abe24f8deea91dd3986d5f`
MD5	`7366b29b5ab3764ab33e7937caf430b6`
BLAKE2b-256	`882cdd2b61f917ea3df1bf4df2347fc17404fd0823495b1fec4bd87e72b3cfbb`
