CUDA Tile Compiler

cuTile Python

cuTile Python is a programming language for NVIDIA GPUs. The official documentation can be found on docs.nvidia.com, or built from the sources in the docs folder.

Example

# This example uses CuPy, which can be installed via `pip install cupy-cuda13x`
# Make sure CUDA Toolkit 13.1+ is installed: https://developer.nvidia.com/cuda-downloads

import cuda.tile as ct
import cupy
import numpy as np

TILE_SIZE = 16

# cuTile kernel for adding two dense vectors. It runs in parallel on the GPU.
@ct.kernel
def vector_add_kernel(a, b, result):
    block_id = ct.bid(0)
    a_tile = ct.load(a, index=(block_id,), shape=(TILE_SIZE,))
    b_tile = ct.load(b, index=(block_id,), shape=(TILE_SIZE,))
    result_tile = a_tile + b_tile
    ct.store(result, index=(block_id,), tile=result_tile)

# Generate input arrays
rng = cupy.random.default_rng()
a = rng.random(128)
b = rng.random(128)
expected = cupy.asnumpy(a) + cupy.asnumpy(b)

# Allocate an output array and launch the kernel
result = cupy.zeros_like(a)
grid = (ct.cdiv(a.shape[0], TILE_SIZE), 1, 1)
ct.launch(cupy.cuda.get_current_stream(), grid, vector_add_kernel, (a, b, result))

# Verify the results
result_np = cupy.asnumpy(result)
np.testing.assert_array_almost_equal(result_np, expected)
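To make the launch geometry in the example above easier to follow, here is a minimal CPU-only sketch of the same computation. It assumes that the `index` passed to `ct.load` and `ct.store` counts tiles rather than elements, and that `ct.cdiv` is ceiling division. The helper names `cdiv` and `emulate_vector_add` are hypothetical and not part of cuTile.

```python
TILE_SIZE = 16


def cdiv(n, d):
    """Ceiling division for positive integers."""
    return (n + d - 1) // d


def emulate_vector_add(a, b, tile_size=TILE_SIZE):
    """Add two equal-length lists one tile at a time, as the kernel grid would."""
    n = len(a)
    result = [0.0] * n
    num_blocks = cdiv(n, tile_size)  # same value as grid[0] in the example
    for block_id in range(num_blocks):  # on the GPU, blocks run in parallel
        # Assumption: block `block_id` owns elements [start, stop).
        start = block_id * tile_size
        stop = min(start + tile_size, n)  # clamping is a choice of this sketch
        for i in range(start, stop):
            result[i] = a[i] + b[i]
    return result, num_blocks
```

With 128 elements and a tile size of 16, the loop visits 8 blocks, which matches the first grid dimension computed in the example.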

More examples can be found at Samples and TileGym.

System Requirements

cuTile Python generates kernels based on Tile IR, which requires NVIDIA Driver r580 or later to run. Furthermore, the tileiras compiler (version 13.2) supports only Blackwell and Ampere/Ada GPUs. Hopper GPUs will be supported in upcoming versions. Check out the prerequisites for the full list of requirements.
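The driver requirement can be checked from a script. The sketch below is not part of cuTile; it assumes that "r580" corresponds to a driver version whose major component is 580, and that `nvidia-smi` is available on `PATH`. The function names are hypothetical.

```python
import shutil
import subprocess

MIN_DRIVER_MAJOR = 580  # assumption: "r580" means driver major version 580


def driver_major(version):
    """Return the major component of a driver version string such as '580.65.06'."""
    return int(version.strip().split(".")[0])


def driver_is_new_enough(version, minimum=MIN_DRIVER_MAJOR):
    """True if the driver's major version is at least `minimum`."""
    return driver_major(version) >= minimum


def query_driver_version():
    """Ask nvidia-smi for the driver version; return None if it is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    proc = subprocess.run(
        [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    lines = proc.stdout.strip().splitlines()
    if proc.returncode != 0 or not lines:
        return None
    return lines[0].strip()  # first GPU; all GPUs share one driver


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("Could not query the driver version (is nvidia-smi on PATH?)")
    else:
        status = "OK" if driver_is_new_enough(version) else "too old"
        print(f"Driver {version}: {status}")
```

This checks only the driver version. It does not verify the GPU architecture, which still needs to be compared against the supported list above.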

Installing from PyPI

cuTile Python is published on PyPI under the cuda-tile package name and can be installed with pip:

pip install cuda-tile[tileiras]

The optional tileiras dependency installs the tileiras compiler directly into your Python environment.

If you do not want to have tileiras inside the Python environment, run

pip install cuda-tile

and install CUDA Toolkit 13.1+ separately.

On a Debian-based system, use apt-get install cuda-tileiras-13.2 cuda-compiler-13.2 instead of apt-get install cuda-toolkit-13.2 if you wish to avoid installing the full CUDA Toolkit.
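After installing, it can be useful to confirm what actually ended up in the environment. The following sketch uses only the standard library. It reports the installed version of the `cuda-tile` distribution and looks for a compiler executable on `PATH`. The executable name `tileiras` is an assumption based on the compiler's name, and the helper functions are hypothetical.

```python
import shutil
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def describe_install():
    """Collect a small report about the cuTile installation."""
    return {
        "cuda-tile": installed_version("cuda-tile"),
        # Assumption: the compiler executable is named "tileiras".
        "tileiras on PATH": shutil.which("tileiras"),
    }


if __name__ == "__main__":
    for name, value in describe_install().items():
        print(f"{name}: {value if value is not None else 'not found'}")
```

A `None` entry for the compiler does not necessarily mean it is missing, since the pip extra may place it somewhere other than `PATH`.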

Building from Source

cuTile is written mostly in Python, but includes a C++ extension which needs to be built. You will need:

  • A C++17-capable compiler, such as GNU C++ or MSVC;
  • CMake 3.18+;
  • GNU Make on Linux or msbuild on Windows;
  • Python 3.10+ with development headers (venv module is recommended but optional);
  • CUDA Toolkit 13.1+

On an Ubuntu system, the first four dependencies can be installed with APT:

sudo apt-get update && sudo apt-get install build-essential cmake python3-dev python3-venv

The CMakeLists.txt script will also automatically download the DLPack dependency from GitHub. If you wish to disable this behavior and provide your own copy of DLPack, set the CUDA_TILE_CMAKE_DLPACK_PATH environment variable to a local path to the DLPack source tree.

Unless you are already using a Python virtual environment, it is recommended to create one in order to avoid installing cuTile globally:

python3 -m venv env
source env/bin/activate

Once the build dependencies are in place, the simplest way to build cuTile is to install it in editable mode by running the following command in the source root directory:

pip install -e .

This will create the build directory and invoke the CMake-based build process. In editable mode, the compiled extension module is placed in the build directory, and a symbolic link to it is created in the source directory. As a result, the pip install -e . command above needs to be run only once; after changing the C++ code, the extension can be recompiled with make -C build, which is much faster. This logic is defined in setup.py.
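The build steps above, including the optional DLPack override, can be scripted. The sketch below only assembles the command and environment for the editable install; it runs nothing by itself. The function name `editable_install_command` and the example path are hypothetical, while the `CUDA_TILE_CMAKE_DLPACK_PATH` variable and the `pip install -e .` command come from the text above.

```python
import os
import sys


def editable_install_command(dlpack_path=None, base_env=None):
    """Build the (command, environment) pair for an editable install.

    If `dlpack_path` is given, CUDA_TILE_CMAKE_DLPACK_PATH is set so that
    the build uses a local DLPack source tree instead of downloading one.
    """
    env = dict(os.environ if base_env is None else base_env)
    if dlpack_path is not None:
        env["CUDA_TILE_CMAKE_DLPACK_PATH"] = os.path.abspath(dlpack_path)
    # Use the current interpreter's pip so the active venv is respected.
    cmd = [sys.executable, "-m", "pip", "install", "-e", "."]
    return cmd, env
```

To run it from the source root, pass the result to `subprocess.run(cmd, env=env, check=True)`. For example, `editable_install_command("third_party/dlpack")` would point the build at a hypothetical local checkout in `third_party/dlpack`.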

Experimental Features (Optional)

cuTile now provides an experimental package containing APIs that are still under active development. These are not part of the stable cuda.tile API and may change.

To enable the experimental features when working from a source checkout, install the experimental package from the repository root:

pip install ./experimental

You can also install it directly from a GitHub repository subdirectory:

pip install \
  "git+https://github.com/NVIDIA/cutile-python.git#egg=cuda-tile-experimental&subdirectory=experimental"

For example, once the package is installed, the autotuner can be imported from the experimental namespace:

from cuda.tile_experimental import autotune_launch, clear_autotune_cache
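Because the experimental package is optional, code that depends on it may want to degrade gracefully when it is absent. The following is a minimal sketch of a guarded import; the helper name `load_experimental` is hypothetical, and only the module name `cuda.tile_experimental` is taken from the text above.

```python
import importlib


def load_experimental():
    """Return the cuda.tile_experimental module, or None if it is not installed."""
    try:
        return importlib.import_module("cuda.tile_experimental")
    except ImportError:
        # Covers both a missing `cuda` package and a missing experimental add-on.
        return None


experimental = load_experimental()
if experimental is None:
    print("cuda.tile_experimental is not installed; skipping autotuning")
else:
    print("experimental APIs are available")
```

Since these APIs may change, keeping their use behind a single helper like this also limits how much code needs updating later.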

Running Tests

cuTile uses the pytest framework for testing. Tests have extra dependencies, such as PyTorch, which can be installed with

pip install -r test/requirements.txt

The tests are located in the test/ directory. To run a specific test file, for example test_copy.py, use the following command:

pytest test/test_copy.py

Copyright and License Information

Copyright © 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

cuTile-Python is licensed under the Apache 2.0 license. See the LICENSES folder for the full license text.
