TileRT: Tile-Based Runtime for Ultra-Low-Latency LLM Inference.

These details have not been verified by PyPI

Operating System
- OS Independent
Programming Language
- Python :: 3.11
- Python :: 3.12
Topic
- Software Development :: Libraries

Project description

TileRT: Tile-Based Runtime for Ultra-Low-Latency LLM Inference

TileRT serves large language models (LLMs) in ultra-low-latency scenarios — pushing the latency limits of hundred-billion-parameter models to millisecond-level time per output token (TPOT) without compromising model size or quality. Its tile-level runtime engine decomposes LLM operators into fine-grained tile tasks and dynamically overlaps computation, I/O, and communication across multiple GPUs.
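To make the idea of tile tasks concrete, here is a toy sketch in plain Python. It is not TileRT code and does not reflect TileRT's API or scheduler; the names `tile_tasks` and `matmul_tiled` are hypothetical. It only shows how one operator (a matrix multiply) can be split into independent output tiles, which is the kind of unit a tile-level runtime could schedule and overlap with I/O or communication.

```python
# Toy illustration only: NOT TileRT code, and not its API.
from itertools import product


def tile_tasks(m, n, tile):
    """Yield (row_start, row_end, col_start, col_end) for each output tile."""
    for r, c in product(range(0, m, tile), range(0, n, tile)):
        yield r, min(r + tile, m), c, min(c + tile, n)


def matmul_tiled(a, b, tile=2):
    """Compute a @ b one output tile at a time using plain Python lists."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0] * n for _ in range(m)]
    for r0, r1, c0, c1 in tile_tasks(m, n, tile):
        # Each tile depends only on rows r0:r1 of a and columns c0:c1 of b,
        # so tiles are independent units of work.
        for i in range(r0, r1):
            for j in range(c0, c1):
                out[i][j] = sum(a[i][p] * b[p][j] for p in range(k))
    return out


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[1, 0], [0, 1], [1, 1]]
result = matmul_tiled(a, b)
print(result)
```

Here the 3×2 output is covered by two tiles; a real runtime would pick tile shapes to match the hardware and would run tiles concurrently rather than in a Python loop.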

The current preview supports DeepSeek-V3.2 and GLM-5 on 8× NVIDIA B200. For full usage, examples, and news, see the GitHub repository.

Figure: GLM-5.1-FP8 token generation speed with TileRT v0.1.4. Output length 1K, input length 1K–192K. Bars compare TileRT without MTP, with MTP at average acceptance length 3.2, and the peak under best-case MTP acceptance.

Installation

The official tilert==0.1.4 wheel on PyPI was compiled against the following stack. Treat these as hard requirements, not lower bounds.

Component	Pinned version
NVIDIA driver	Supports CUDA 13.2 runtime
Operating System	Linux x86_64, glibc ≥ 2.28 (manylinux_2_28)
Python	3.12
PyTorch	`torch==2.11.0+cu130`
`transformers`	`4.46.3`
`tokenizers`	`0.20.3`
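Because the pins are exact, it can help to compare an environment against them before installing. The following is a minimal sketch using only the standard library; the helper names `installed_versions` and `check_environment` are hypothetical and not part of TileRT. It assumes the installed distribution metadata reports the same version strings as the table (for example `2.11.0+cu130` for PyTorch), and it does not check the NVIDIA driver.

```python
import platform
import sys
from importlib import metadata

# Pins copied from the table above.
PINNED = {
    "torch": "2.11.0+cu130",
    "transformers": "4.46.3",
    "tokenizers": "0.20.3",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def check_environment(installed, python_version, machine, system, libc):
    """Return a list of mismatches; an empty list means every pin matches."""
    problems = []
    if tuple(python_version[:2]) != (3, 12):
        problems.append(f"Python 3.12 required, found {python_version[0]}.{python_version[1]}")
    if (system, machine) != ("Linux", "x86_64"):
        problems.append(f"Linux x86_64 required, found {system} {machine}")
    lib, ver = libc
    if lib != "glibc" or tuple(int(p) for p in ver.split(".")[:2]) < (2, 28):
        problems.append(f"glibc >= 2.28 required, found {lib or 'unknown'} {ver}")
    for name, want in PINNED.items():
        got = installed.get(name)
        if got != want:
            problems.append(f"{name}=={want} required, found {got}")
    return problems


report = check_environment(
    installed_versions(PINNED),
    sys.version_info,
    platform.machine(),
    platform.system(),
    platform.libc_ver(),
)
print("OK" if not report else "\n".join(report))
```

Running this inside the official image should print `OK` if the image matches the table; on a host with drifted versions it lists each mismatch on its own line.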

Recommended: pre-built Docker image

The pinned environment is preinstalled in our official image — the recommended way to run TileRT, avoiding version drift on the host. The image is mirrored to two registries; pull from whichever is reachable:

docker pull ghcr.io/tile-ai/tilert:cu132-latest   # GitHub Container Registry
docker pull tileai/tilert:cu132-latest            # Docker Hub

Launch a container with all 8 GPUs attached, then install the wheel inside:

docker run --rm -it --gpus all --ipc=host \
    -v "$PWD":/workspace -w /workspace \
    ghcr.io/tile-ai/tilert:cu132-latest

# Install from PyPI:
pip install tilert==0.1.4

# Or pin the exact wheel from the GitHub Release page (same artifact,
# useful when PyPI is unreachable):
pip install https://github.com/tile-ai/TileRT/releases/download/v0.1.4/tilert-0.1.4-cp312-cp312-manylinux_2_28_x86_64.whl

Verify the install:

python -c "import tilert, torch; print('tilert', tilert.__version__, '/ torch', torch.__version__, '/ cuda', torch.version.cuda)"
# Expected: tilert 0.1.4 / torch 2.11.0+cu130 / cuda 13.0

Documentation

For weight conversion, the generation CLI, the programmatic API, Multi-Token Prediction (MTP), and the latest benchmarks, see the TileRT GitHub repository.

Release history

Version	Release date
0.1.4 (this version)	Jun 2, 2026
0.1.3	Feb 14, 2026
0.1.2a1 (pre-release)	Jan 26, 2026
0.1.1	Dec 23, 2025
0.1.0a2 (pre-release)	Nov 24, 2025
0.1.0a1 (pre-release)	Nov 21, 2025
0.0.0.dev0 (pre-release)	Nov 12, 2025

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

tilert-0.1.4-cp312-cp312-manylinux_2_28_x86_64.whl (6.3 MB)

Uploaded Jun 2, 2026. CPython 3.12, manylinux: glibc 2.28+ x86-64
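The wheel file name encodes which interpreter and platform it targets. As a rough sketch, the name can be split into its tags following the standard wheel naming scheme (`{distribution}-{version}[-{build}]-{python}-{abi}-{platform}.whl`). The helpers `parse_wheel_filename` and `matches_interpreter` are hypothetical names written for this example; for production use, the `packaging` library provides a more complete parser.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its tags, ignoring an optional build tag."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return WheelName(*parts)


def matches_interpreter(wheel: WheelName, version_info) -> bool:
    """True if the wheel's CPython tag equals the given (major, minor) version."""
    return wheel.python_tag == f"cp{version_info[0]}{version_info[1]}"


wheel = parse_wheel_filename("tilert-0.1.4-cp312-cp312-manylinux_2_28_x86_64.whl")
print(wheel)
print(matches_interpreter(wheel, (3, 12)))
```

The only file published for this release carries the `cp312` tag, so by this check it matches CPython 3.12 and no other minor version.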

File details

Details for the file tilert-0.1.4-cp312-cp312-manylinux_2_28_x86_64.whl.

File metadata

Download URL: tilert-0.1.4-cp312-cp312-manylinux_2_28_x86_64.whl
Upload date: Jun 2, 2026
Size: 6.3 MB
Tags: CPython 3.12, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.18

File hashes

Hashes for tilert-0.1.4-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`ac5919e59d0639e2160cc5b3e2004adb2da26bd6ede29b58d2b29c890736c057`
MD5	`c5220de7fe00487d1afa99a3390552ef`
BLAKE2b-256	`bbcd2d5c6f33e5a9f9219c59fe89580de4b73a845e6c97e09478fc43135a1dfd`
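A downloaded wheel can be checked against the SHA256 digest above before installing it, which is useful when the file comes from a mirror or the GitHub Release page. This is a minimal sketch using `hashlib`; the function names `sha256_of` and `verify_wheel` are hypothetical, and the path in the usage comment assumes the wheel sits in the current directory.

```python
import hashlib

# SHA256 digest copied from the table above.
EXPECTED_SHA256 = "ac5919e59d0639e2160cc5b3e2004adb2da26bd6ede29b58d2b29c890736c057"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Usage (assumes the wheel was downloaded to the current directory):
# verify_wheel("tilert-0.1.4-cp312-cp312-manylinux_2_28_x86_64.whl")
```

If `verify_wheel` returns `False`, the file differs from the one published on PyPI and should not be installed.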
