CUDA Tile Compiler

cuTile Python

cuTile Python is a programming language for NVIDIA GPUs. The official documentation can be found on docs.nvidia.com, or built from the sources in the docs folder.

Example

# This example uses CuPy, which can be installed via `pip install cupy-cuda13x`.
# Make sure CUDA Toolkit 13.1+ is installed: https://developer.nvidia.com/cuda-downloads

import cuda.tile as ct
import cupy
import numpy as np

TILE_SIZE = 16

# cuTile kernel for adding two dense vectors. It runs in parallel on the GPU.
@ct.kernel
def vector_add_kernel(a, b, result):
    block_id = ct.bid(0)
    a_tile = ct.load(a, index=(block_id,), shape=(TILE_SIZE,))
    b_tile = ct.load(b, index=(block_id,), shape=(TILE_SIZE,))
    result_tile = a_tile + b_tile
    ct.store(result, index=(block_id,), tile=result_tile)

# Generate input arrays
rng = cupy.random.default_rng()
a = rng.random(128)
b = rng.random(128)
expected = cupy.asnumpy(a) + cupy.asnumpy(b)

# Allocate an output array and launch the kernel
result = cupy.zeros_like(a)
grid = (ct.cdiv(a.shape[0], TILE_SIZE), 1, 1)
ct.launch(cupy.cuda.get_current_stream(), grid, vector_add_kernel, (a, b, result))

# Verify the results
result_np = cupy.asnumpy(result)
np.testing.assert_array_almost_equal(result_np, expected)
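
To make the work decomposition in the kernel above easier to follow, here is a minimal CPU-only sketch of the same idea in plain Python. It assumes that `index=(block_id,)` selects the `block_id`-th tile of `TILE_SIZE` elements, which is a reading of the example rather than a statement of the cuTile specification. The `cdiv` helper is a local stand-in for `ct.cdiv`, and the clamping of the last tile is a choice made for this emulation only; the text above does not say how cuTile treats partial tiles.

```python
import random

TILE_SIZE = 16


def cdiv(n, d):
    """Ceiling division: number of tiles of size d needed to cover n elements."""
    return -(-n // d)


def vector_add_tiled(a, b, tile_size=TILE_SIZE):
    """Emulate the kernel on the CPU: one loop iteration per GPU block."""
    result = [0.0] * len(a)
    for block_id in range(cdiv(len(a), tile_size)):
        # Block `block_id` owns elements [start, stop) of the vectors.
        start = block_id * tile_size
        stop = min(start + tile_size, len(a))  # clamp the last, possibly partial, tile
        for i in range(start, stop):
            result[i] = a[i] + b[i]
    return result


a = [random.random() for _ in range(128)]
b = [random.random() for _ in range(128)]
expected = [x + y for x, y in zip(a, b)]
assert vector_add_tiled(a, b) == expected
```

On the GPU the outer loop disappears: each block id runs in parallel and handles exactly one tile.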

More examples can be found at Samples and TileGym.

System Requirements

cuTile Python generates kernels based on Tile IR, which requires NVIDIA Driver r580 or later to run. Furthermore, the tileiras compiler supports only Blackwell GPUs in the 13.1 release; this restriction will be removed in coming versions. Check out the prerequisites for the full list of requirements.
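
The driver and GPU requirements above can be checked from a script. The following is a rough sketch, not an official check: it assumes `nvidia-smi` is on the PATH and supports the `driver_version` and `compute_cap` query fields, and it assumes that Blackwell GPUs report a compute capability of 10.x or higher, which is not stated in this document. The helper names and the sample input line are illustrative only.

```python
import subprocess

MIN_DRIVER_MAJOR = 580   # from the requirement above: driver r580 or later
MIN_COMPUTE_MAJOR = 10   # assumption: Blackwell reports compute capability >= 10.x


def parse_gpu_line(line):
    """Parse a CSV line such as '580.65.06, 10.0' into (driver_major, compute_major)."""
    driver, compute_cap = [field.strip() for field in line.split(",")]
    return int(driver.split(".")[0]), int(compute_cap.split(".")[0])


def meets_requirements(line):
    """Return True if the driver and GPU described by `line` pass both thresholds."""
    driver_major, compute_major = parse_gpu_line(line)
    return driver_major >= MIN_DRIVER_MAJOR and compute_major >= MIN_COMPUTE_MAJOR


def query_gpus():
    """Return one CSV line per GPU, as printed by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version,compute_cap", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


# Usage on a machine with an NVIDIA driver installed:
# for line in query_gpus():
#     print(line, "->", "OK" if meets_requirements(line) else "not supported")
```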

Installing from PyPI

cuTile Python is published on PyPI under the cuda-tile package name and can be installed with pip:

pip install cuda-tile

Currently, CUDA Toolkit 13.1+ is required and must be installed separately. On a Debian-based system, if you wish to avoid installing the full CUDA Toolkit, use apt-get install cuda-tileiras-13.1 cuda-compiler-13.1 instead of apt-get install cuda-toolkit-13.1.
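
To confirm that the package was installed into the active environment, one option is to query its metadata with the standard library. This is a small sketch; `installed_version` is a hypothetical helper name, and it only checks the Python package, not the separately installed CUDA Toolkit.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("cuda-tile:", installed_version("cuda-tile") or "not installed")
```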

Building from Source

cuTile is written mostly in Python, but includes a C++ extension which needs to be built. You will need:

  • A C++17-capable compiler, such as GNU C++ or MSVC;
  • CMake 3.18+;
  • GNU Make on Linux or msbuild on Windows;
  • Python 3.10+ with development headers (venv module is recommended but optional);
  • CUDA Toolkit 13.1+

On an Ubuntu system, the first four dependencies can be installed with APT:

sudo apt-get update && sudo apt-get install build-essential cmake python3-dev python3-venv

The CMakeLists.txt script will also automatically download the DLPack dependency from GitHub. If you wish to disable this behavior and provide your own copy of DLPack, set the CUDA_TILE_CMAKE_DLPACK_PATH environment variable to a local path to the DLPack source tree.

Unless you are already using a Python virtual environment, it is recommended to create one in order to avoid installing cuTile globally:

python3 -m venv env
source env/bin/activate

Once the build dependencies are in place, the simplest way to build cuTile is to install it in editable mode by running the following command in the source root directory:

pip install -e .

This creates the build directory and invokes the CMake-based build process. In editable mode, the compiled extension module is placed in the build directory, and a symbolic link to it is created in the source directory. As a result, the pip install -e . command above is needed only once: after changing the C++ code, the extension can be recompiled with make -C build, which is much faster. This logic is defined in setup.py.
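
The build steps described in this section can also be driven from a Python script. The sketch below only assembles the commands and environment; the function names are hypothetical, the DLPack path in the usage comment is a placeholder, and converting the path to an absolute one is a choice made here rather than a documented requirement.

```python
import os
import sys


def editable_install_command(dlpack_path=None):
    """Return (argv, env) for `pip install -e .`, optionally using a local DLPack tree."""
    env = dict(os.environ)
    if dlpack_path is not None:
        # Tells the CMake script to use this source tree instead of downloading DLPack.
        env["CUDA_TILE_CMAKE_DLPACK_PATH"] = os.path.abspath(dlpack_path)
    argv = [sys.executable, "-m", "pip", "install", "-e", "."]
    return argv, env


def rebuild_command():
    """Return the argv for recompiling the C++ extension after the first install (Linux)."""
    return ["make", "-C", "build"]


# Usage, run from the cuTile source root:
# import subprocess
# argv, env = editable_install_command("path/to/dlpack")  # placeholder path
# subprocess.run(argv, env=env, check=True)   # first build
# subprocess.run(rebuild_command(), check=True)  # later C++ rebuilds
```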

Experimental Features (Optional)

cuTile now provides an experimental package containing APIs that are still under active development. These are not part of the stable cuda.tile API and may change.

To enable the experimental features when working from a source checkout, install the experimental package from the repository root:

pip install ./experimental

You can also install it directly from a GitHub repository subdirectory:

pip install \
  "git+https://github.com/NVIDIA/cutile-python.git#egg=cuda-tile-experimental&subdirectory=experimental"

For example, once the experimental package is installed, the autotuner can be imported from the experimental namespace:

from cuda.tile_experimental import autotune_launch, clear_autotune_cache
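
Because the experimental package is optional, code that uses it may want to degrade gracefully when it is missing. A minimal sketch of a guarded import follows; `load_autotuner` is a hypothetical helper, and only the two names from the import above are assumed to exist.

```python
def load_autotuner():
    """Return (autotune_launch, clear_autotune_cache), or None if the package is absent."""
    try:
        from cuda.tile_experimental import autotune_launch, clear_autotune_cache
    except ImportError:
        return None
    return autotune_launch, clear_autotune_cache


autotuner = load_autotuner()
if autotuner is None:
    print("cuda.tile_experimental is not installed; experimental features are disabled")
```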

Running Tests

cuTile uses the pytest framework for testing. Tests have extra dependencies, such as PyTorch, which can be installed with:

pip install -r test/requirements.txt

The tests are located in the test/ directory. To run a specific test file, for example test_copy.py, use the following command:

pytest test/test_copy.py

Copyright and License Information

Copyright © 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

cuTile-Python is licensed under the Apache 2.0 license. See the LICENSES folder for the full license text.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • cuda_tile-1.2.0-cp313-cp313-win_amd64.whl (203.4 kB): CPython 3.13, Windows x86-64
  • cuda_tile-1.2.0-cp313-cp313-manylinux2014_x86_64.whl (209.2 kB): CPython 3.13
  • cuda_tile-1.2.0-cp313-cp313-manylinux2014_aarch64.whl (207.8 kB): CPython 3.13
  • cuda_tile-1.2.0-cp312-cp312-win_amd64.whl (203.4 kB): CPython 3.12, Windows x86-64
  • cuda_tile-1.2.0-cp312-cp312-manylinux2014_x86_64.whl (209.1 kB): CPython 3.12
  • cuda_tile-1.2.0-cp312-cp312-manylinux2014_aarch64.whl (207.8 kB): CPython 3.12
  • cuda_tile-1.2.0-cp311-cp311-win_amd64.whl (203.3 kB): CPython 3.11, Windows x86-64
  • cuda_tile-1.2.0-cp311-cp311-manylinux2014_x86_64.whl (208.8 kB): CPython 3.11
  • cuda_tile-1.2.0-cp311-cp311-manylinux2014_aarch64.whl (207.6 kB): CPython 3.11
  • cuda_tile-1.2.0-cp310-cp310-win_amd64.whl (203.4 kB): CPython 3.10, Windows x86-64
  • cuda_tile-1.2.0-cp310-cp310-manylinux2014_x86_64.whl (208.8 kB): CPython 3.10
  • cuda_tile-1.2.0-cp310-cp310-manylinux2014_aarch64.whl (207.8 kB): CPython 3.10

File details

Details for the file cuda_tile-1.2.0-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: cuda_tile-1.2.0-cp313-cp313-win_amd64.whl
  • Size: 203.4 kB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for cuda_tile-1.2.0-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 1fa83534190b4c8f6d7f94cad886dba2b3f10151bbd5f6773df5b0b16b05f666
MD5 934500ab2e8fcb0ee94a97c270eef7d5
BLAKE2b-256 e9f87ba62912cfd0d072c6bb0fdff2e87b14d62c18a48d3c222ad0d4641e5c43

See more details on using hashes here.
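
The digests listed in these sections can be used to check a downloaded wheel before installing it. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the file path in the usage comment assumes the wheel was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected_sha256):
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Usage, with the SHA256 value listed above for the CPython 3.13 Windows wheel:
# ok = verify_wheel(
#     "cuda_tile-1.2.0-cp313-cp313-win_amd64.whl",
#     "1fa83534190b4c8f6d7f94cad886dba2b3f10151bbd5f6773df5b0b16b05f666",
# )
```

SHA256 is used here rather than MD5 because MD5 is not collision resistant and is unsuitable for integrity checks against tampering.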

File details

Details for the file cuda_tile-1.2.0-cp313-cp313-manylinux2014_x86_64.whl.

File hashes

Hashes for cuda_tile-1.2.0-cp313-cp313-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1fe9efd6567062b10b89dd5611c524bdd2bcf57ac97d0f4c1aabd981eddc5dff
MD5 9acc16761eac98ab3e9ab286b3e66aca
BLAKE2b-256 06dfe87db07ed37e9931e5ebfb7f95496a5cc17b7f94ea16f13ad0f97ed107b5

File details

Details for the file cuda_tile-1.2.0-cp313-cp313-manylinux2014_aarch64.whl.

File hashes

Hashes for cuda_tile-1.2.0-cp313-cp313-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 11eda902e27a6763fc35d69937c237cf88848f91f360b0b420c4ff4e9bddf9fb
MD5 de7f6179e59b53f68b84b106bb13bf74
BLAKE2b-256 f01c2039585d2cef99ce9e863fcf5dc96847b4c193e3c6962cfb25385c9edcfe

File details

Details for the file cuda_tile-1.2.0-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: cuda_tile-1.2.0-cp312-cp312-win_amd64.whl
  • Size: 203.4 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for cuda_tile-1.2.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 3579808af467e4a72c800354092c9e95e73a12e899868a53b0a1e922f1cb4d53
MD5 7866749a37d9390c09958baa1fb5857e
BLAKE2b-256 308ab90ccb7e189bd9b6179e335bf5dd2be15c02b56ed0d2c2c204cc50207a03

File details

Details for the file cuda_tile-1.2.0-cp312-cp312-manylinux2014_x86_64.whl.

File hashes

Hashes for cuda_tile-1.2.0-cp312-cp312-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6e2f3a5476104a33856242299e98ebdccb976267e790f1d25de82d0f120b93b5
MD5 020ba9ff4313e34a2898ccc7173f20c4
BLAKE2b-256 3231ca73deda50367be3b7f595d3d4412c120454eec9ea161492476bbebcd6fd

File details

Details for the file cuda_tile-1.2.0-cp312-cp312-manylinux2014_aarch64.whl.

File hashes

Hashes for cuda_tile-1.2.0-cp312-cp312-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 52b63c4ef25dfabdae6ba0ee135ba6195741f0642d498648ab66c5c6202b5533
MD5 af59300cda345c82549805b6d6442fa2
BLAKE2b-256 d90b0ae1d8b86c2baf5c9994e1b697af98e41e5e5bcec1ebbb5fdd6e2de482b1

File details

Details for the file cuda_tile-1.2.0-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: cuda_tile-1.2.0-cp311-cp311-win_amd64.whl
  • Size: 203.3 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for cuda_tile-1.2.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 f63bad8fa0dff539fdcd6f8e121824b40a92de7e38171bacff8900b50de0fd81
MD5 ee0903fbe7d1f9a55a1f3991a76e2161
BLAKE2b-256 2d24854a43c45f513182bc466fe8b2c4cff49173d4b52ed9d3414537f8b48704

File details

Details for the file cuda_tile-1.2.0-cp311-cp311-manylinux2014_x86_64.whl.

File hashes

Hashes for cuda_tile-1.2.0-cp311-cp311-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 014e85fe8300d0db8281ae0e39e301877712319872a6b98554e646184100533c
MD5 48313446a9fc0e63cd8ba770514bf1b1
BLAKE2b-256 1896acba2a2594daf8b914ac7281dfb93f0730691762b360ebd01d3d37b6db40

File details

Details for the file cuda_tile-1.2.0-cp311-cp311-manylinux2014_aarch64.whl.

File hashes

Hashes for cuda_tile-1.2.0-cp311-cp311-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 9c942450e6b1833ba263f9cfdd77e1e82ba5bdcfca01679184bef1ee1efae387
MD5 e70feb52db9b84ab8a6a7578e81d583a
BLAKE2b-256 3ad1b51bc9f5c3953bf539f08db38fea3bea590d82ed973dfbba33c3e6e1adf1

File details

Details for the file cuda_tile-1.2.0-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: cuda_tile-1.2.0-cp310-cp310-win_amd64.whl
  • Size: 203.4 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for cuda_tile-1.2.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 577a2c5f536c4c3d00e7217dcbda682224f5754ec51fa4aa2441418fe05241c4
MD5 0461005a29d6003d98ef4fd1f0f8aa95
BLAKE2b-256 c0bf1596be787dc19f0ff8e44fc08e869493f5a4477cb050147f5ab24f336234

File details

Details for the file cuda_tile-1.2.0-cp310-cp310-manylinux2014_x86_64.whl.

File hashes

Hashes for cuda_tile-1.2.0-cp310-cp310-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 11b8975055a018633f54b941f8645298d6077cca720da6d9751606154f0a48b2
MD5 85b9f6828cfa717a94c4745843611135
BLAKE2b-256 460ecddeccb05c02b6bf7d23f7b440fce5336a552fea91d6b8433c4dcbdfe0b4

File details

Details for the file cuda_tile-1.2.0-cp310-cp310-manylinux2014_aarch64.whl.

File hashes

Hashes for cuda_tile-1.2.0-cp310-cp310-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 cd4307cb607d6ff4daa8275a7dfd034b06c5f117e0e7f762a591027216a9d8d6
MD5 579aa28ffe25c4126beb71061d497cff
BLAKE2b-256 12504341fb87d43d6ea83538e6a9ad45aa6dc1493f5cc178fd3ec3766851161c

