CUDA Tile Compiler

cuTile Python

cuTile Python is a programming language for NVIDIA GPUs. The official documentation is available on docs.nvidia.com and can also be built from the sources in the docs folder.

Example

# This example uses CuPy, which can be installed via `pip install cupy-cuda13x`
# Make sure CUDA Toolkit 13.1+ is installed: https://developer.nvidia.com/cuda-downloads

import cuda.tile as ct
import cupy
import numpy as np

TILE_SIZE = 16

# cuTile kernel for adding two dense vectors. It runs in parallel on the GPU.
@ct.kernel
def vector_add_kernel(a, b, result):
    block_id = ct.bid(0)
    a_tile = ct.load(a, index=(block_id,), shape=(TILE_SIZE,))
    b_tile = ct.load(b, index=(block_id,), shape=(TILE_SIZE,))
    result_tile = a_tile + b_tile
    ct.store(result, index=(block_id,), tile=result_tile)

# Generate input arrays
a = cupy.random.uniform(-5, 5, 128)
b = cupy.random.uniform(-5, 5, 128)
expected = cupy.asnumpy(a) + cupy.asnumpy(b)

# Allocate an output array and launch the kernel
result = cupy.zeros_like(a)
grid = (ct.cdiv(a.shape[0], TILE_SIZE), 1, 1)
ct.launch(cupy.cuda.get_current_stream(), grid, vector_add_kernel, (a, b, result))

# Verify the results
result_np = cupy.asnumpy(result)
np.testing.assert_array_almost_equal(result_np, expected)

More examples can be found at Samples and TileGym.
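To make the tiling in the vector-add example concrete, the following is a minimal CPU-only sketch that emulates it with NumPy. It is not cuTile code: the helper names cdiv and emulate_vector_add are hypothetical, and it assumes (as the example suggests) that the index passed to ct.load and ct.store counts tiles rather than elements. Each loop iteration stands in for one block of the launch grid.

```python
import numpy as np

TILE_SIZE = 16


def cdiv(n: int, d: int) -> int:
    """Ceiling division: how many tiles of size d are needed to cover n elements."""
    return (n + d - 1) // d


def emulate_vector_add(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """CPU stand-in for the kernel: one loop iteration per block in the grid."""
    result = np.zeros_like(a)
    num_blocks = cdiv(a.shape[0], TILE_SIZE)
    for block_id in range(num_blocks):
        # Tile number block_id covers elements
        # [block_id * TILE_SIZE, (block_id + 1) * TILE_SIZE).
        tile = slice(block_id * TILE_SIZE, (block_id + 1) * TILE_SIZE)
        result[tile] = a[tile] + b[tile]
    return result


rng = np.random.default_rng(0)
a = rng.uniform(-5, 5, 128)
b = rng.uniform(-5, 5, 128)
np.testing.assert_array_almost_equal(emulate_vector_add(a, b), a + b)
print(cdiv(a.shape[0], TILE_SIZE))  # 8 blocks for 128 elements
```

With 128 elements and a tile size of 16, the grid has 8 blocks, which matches the first grid dimension computed in the example.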

System Requirements

cuTile Python generates kernels based on Tile IR, which requires NVIDIA Driver r580 or later to run. Furthermore, in the 13.1 release the tileiras compiler supports only Blackwell GPUs; this restriction will be removed in upcoming versions. Check out the prerequisites for the full list of requirements.
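As a rough way to check the driver requirement before installing, the sketch below compares the driver's major version against 580. It assumes nvidia-smi is on the PATH and uses its driver_version query field; the helper names are hypothetical, and the check does not cover the GPU architecture requirement.

```python
import subprocess
from typing import Optional

MIN_DRIVER_MAJOR = 580  # taken from the requirement stated above


def driver_major(version: str) -> int:
    """Extract the major component of a driver version string such as '580.65.06'."""
    return int(version.strip().split(".")[0])


def driver_is_supported(version: str) -> bool:
    """True if the driver's major version meets the r580 minimum."""
    return driver_major(version) >= MIN_DRIVER_MAJOR


def query_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; return None if it cannot be queried."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = out.strip().splitlines()
    return lines[0] if lines else None  # one line per GPU; the first is enough


version = query_driver_version()
if version is None:
    print("Could not query the driver version (is nvidia-smi installed?)")
else:
    status = "OK" if driver_is_supported(version) else "too old"
    print(f"Driver {version}: {status}")
```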

Installing from PyPI

cuTile Python is published on PyPI under the cuda-tile package name and can be installed with pip:

pip install cuda-tile

CUDA Toolkit 13.1 or later is currently required and must be installed separately. On a Debian-based system, if you wish to avoid installing the full CUDA Toolkit, use apt-get install cuda-tileiras-13.1 cuda-compiler-13.1 instead of apt-get install cuda-toolkit-13.1.
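After installation, a quick sanity check can confirm that both pieces are visible from Python. This is only a sketch: the helper names are hypothetical, and it assumes the Tile IR compiler is exposed as an executable named tileiras on the PATH. cuTile may locate the compiler by other means, so a "no" on that line is a hint rather than a verdict.

```python
import importlib.util
import shutil


def module_available(name: str) -> bool:
    """Return True if `name` can be located.

    Note that find_spec imports parent packages (here `cuda`) while searching.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no valid spec.
        return False


def check_install() -> dict:
    """Collect simple yes/no facts about the local installation."""
    return {
        "cuda.tile importable": module_available("cuda.tile"),
        # Assumption: the compiler binary is called `tileiras` and is on PATH.
        "tileiras on PATH": shutil.which("tileiras") is not None,
    }


for item, ok in check_install().items():
    print(f"{item}: {'yes' if ok else 'no'}")
```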

Building from Source

cuTile is written mostly in Python, but includes a C++ extension which needs to be built. You will need:

  • A C++17-capable compiler, such as GNU C++ or MSVC;
  • CMake 3.18+;
  • GNU Make on Linux or msbuild on Windows;
  • Python 3.10+ with development headers (venv module is recommended but optional);
  • CUDA Toolkit 13.1+

On an Ubuntu system, the first four dependencies can be installed with APT:

sudo apt-get update && sudo apt-get install build-essential cmake python3-dev python3-venv

The CMakeLists.txt script will also automatically download the DLPack dependency from GitHub. If you wish to disable this behavior and provide your own copy of DLPack, set the CUDA_TILE_CMAKE_DLPACK_PATH environment variable to a local path to the DLPack source tree.
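For scripted builds, the variable can be set from Python before invoking the editable install described later in this section. The following is a minimal sketch: build_env and install_editable are hypothetical helpers, and the paths in the usage comment are placeholders.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_env(dlpack_path: str) -> dict:
    """Return a copy of the current environment with CUDA_TILE_CMAKE_DLPACK_PATH set."""
    path = Path(dlpack_path).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"DLPack source tree not found: {path}")
    env = dict(os.environ)
    env["CUDA_TILE_CMAKE_DLPACK_PATH"] = str(path)
    return env


def install_editable(source_root: str, dlpack_path: str) -> None:
    """Run `pip install -e .` in source_root using the local DLPack copy."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-e", "."],
        cwd=source_root,
        env=build_env(dlpack_path),
        check=True,
    )


# Example usage (placeholder paths, adjust to your checkout):
# install_editable("/path/to/cutile-python", "/path/to/dlpack")
```

Validating the path up front surfaces a mistyped location as a Python error rather than as a failure partway through the CMake build.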

Unless you are already using a Python virtual environment, it is recommended to create one in order to avoid installing cuTile globally:

python3 -m venv env
source env/bin/activate

Once the build dependencies are in place, the simplest way to build cuTile is to install it in editable mode by running the following command in the source root directory:

pip install -e .

This creates the build directory and invokes the CMake-based build process. In editable mode, the compiled extension module is placed in the build directory, and a symbolic link to it is created in the source directory. As a result, the pip install -e . command above needs to be run only once; after changing the C++ code, the extension can be recompiled with make -C build, which is much faster. This logic is defined in setup.py.

Experimental Features (Optional)

cuTile now provides an experimental package containing APIs that are still under active development. These are not part of the stable cuda.tile API and may change.

To enable the experimental features when working from a source checkout, install the experimental package from the repository root:

pip install ./experimental

You can also install it directly from a GitHub repository subdirectory:

pip install \
  "git+https://github.com/NVIDIA/cutile-python.git#egg=cuda-tile-experimental&subdirectory=experimental"

For example, this makes the experimental namespace available for the autotuner:

from cuda.tile_experimental import autotune_launch, clear_autotune_cache

Running Tests

cuTile uses the pytest framework for testing. The tests have extra dependencies, such as PyTorch, which can be installed with:

pip install -r test/requirements.txt

The tests are located in the test/ directory. To run a specific test file, for example test_copy.py, use the following command:

pytest test/test_copy.py

Copyright and License Information

Copyright © 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

cuTile-Python is licensed under the Apache 2.0 license. See the LICENSES folder for the full license text.

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • cuda_tile-1.1.0-cp313-cp313-win_amd64.whl (182.1 kB): CPython 3.13, Windows x86-64
  • cuda_tile-1.1.0-cp313-cp313-manylinux2014_x86_64.whl (188.0 kB): CPython 3.13
  • cuda_tile-1.1.0-cp313-cp313-manylinux2014_aarch64.whl (186.6 kB): CPython 3.13
  • cuda_tile-1.1.0-cp312-cp312-win_amd64.whl (182.1 kB): CPython 3.12, Windows x86-64
  • cuda_tile-1.1.0-cp312-cp312-manylinux2014_x86_64.whl (187.9 kB): CPython 3.12
  • cuda_tile-1.1.0-cp312-cp312-manylinux2014_aarch64.whl (186.6 kB): CPython 3.12
  • cuda_tile-1.1.0-cp311-cp311-win_amd64.whl (181.9 kB): CPython 3.11, Windows x86-64
  • cuda_tile-1.1.0-cp311-cp311-manylinux2014_x86_64.whl (187.5 kB): CPython 3.11
  • cuda_tile-1.1.0-cp311-cp311-manylinux2014_aarch64.whl (186.3 kB): CPython 3.11
  • cuda_tile-1.1.0-cp310-cp310-win_amd64.whl (182.0 kB): CPython 3.10, Windows x86-64
  • cuda_tile-1.1.0-cp310-cp310-manylinux2014_x86_64.whl (187.6 kB): CPython 3.10
  • cuda_tile-1.1.0-cp310-cp310-manylinux2014_aarch64.whl (186.5 kB): CPython 3.10
