CUDA Tile Compiler

cuTile Python

cuTile Python is a programming language for NVIDIA GPUs. The official documentation is available on docs.nvidia.com and can also be built from the sources in the docs folder.

Example

# This example uses CuPy, which can be installed via `pip install cupy-cuda13x`.
# Make sure CUDA Toolkit 13.1+ is installed: https://developer.nvidia.com/cuda-downloads

import cuda.tile as ct
import cupy
import numpy as np

TILE_SIZE = 16

# cuTile kernel for adding two dense vectors. It runs in parallel on the GPU.
@ct.kernel
def vector_add_kernel(a, b, result):
    block_id = ct.bid(0)
    a_tile = ct.load(a, index=(block_id,), shape=(TILE_SIZE,))
    b_tile = ct.load(b, index=(block_id,), shape=(TILE_SIZE,))
    result_tile = a_tile + b_tile
    ct.store(result, index=(block_id,), tile=result_tile)

# Generate input arrays
rng = cupy.random.default_rng()
a = rng.random(128)
b = rng.random(128)
expected = cupy.asnumpy(a) + cupy.asnumpy(b)

# Allocate an output array and launch the kernel
result = cupy.zeros_like(a)
grid = (ct.cdiv(a.shape[0], TILE_SIZE), 1, 1)
ct.launch(cupy.cuda.get_current_stream(), grid, vector_add_kernel, (a, b, result))

# Verify the results
result_np = cupy.asnumpy(result)
np.testing.assert_array_almost_equal(result_np, expected)

More examples can be found at Samples and TileGym.
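
To make the tiling arithmetic in the example above concrete, here is a plain-Python sketch that runs on the CPU and makes no cuTile calls. It assumes, as the example suggests, that `index=(block_id,)` addresses whole tiles rather than individual elements, so block `block_id` covers the elements starting at `block_id * TILE_SIZE`. The local `cdiv` is assumed to mirror `ct.cdiv` (ceiling division), and `vector_add_by_tiles` is an illustrative helper, not part of the cuTile API.

```python
TILE_SIZE = 16


def cdiv(n, d):
    """Ceiling division: how many tiles of size d are needed to cover n elements."""
    return (n + d - 1) // d


def vector_add_by_tiles(a, b, tile_size=TILE_SIZE):
    """Add two equal-length sequences one tile at a time (sequential CPU emulation)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    result = [0.0] * len(a)
    num_blocks = cdiv(len(a), tile_size)  # plays the role of grid[0]
    for block_id in range(num_blocks):    # on the GPU these blocks run in parallel
        start = block_id * tile_size
        # Clamping the last tile is a choice made for this sketch only; the
        # source does not say how cuTile treats partial tiles.
        stop = min(start + tile_size, len(a))
        a_tile = a[start:stop]
        b_tile = b[start:stop]
        result[start:stop] = [x + y for x, y in zip(a_tile, b_tile)]
    return result


if __name__ == "__main__":
    a = [float(i) for i in range(128)]
    b = [2.0 * i for i in range(128)]
    assert vector_add_by_tiles(a, b) == [x + y for x, y in zip(a, b)]
    print("blocks launched:", cdiv(len(a), TILE_SIZE))
```

With 128 elements and a tile size of 16 the input divides evenly, which is why the original example needs no special handling for a partial last tile.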

System Requirements

cuTile Python generates kernels based on Tile IR, which requires NVIDIA Driver r580 or later to run. Furthermore, the tileiras compiler (version 13.2) supports only Blackwell and Ampere/Ada GPUs; Hopper GPUs will be supported in upcoming versions. Check out the prerequisites for the full list of requirements.
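
The driver requirement can be checked from Python before launching anything. The sketch below is one possible approach, not an official tool: it assumes `nvidia-smi` is on the PATH and that "r580" means a driver version whose leading number is at least 580. The helper names are hypothetical.

```python
import shutil
import subprocess

MIN_DRIVER_BRANCH = 580  # taken from the requirement "NVIDIA Driver r580 or later"


def driver_branch(version: str) -> int:
    """Return the leading number of a driver version string such as '580.65.06'."""
    return int(version.strip().split(".")[0])


def driver_is_new_enough(version: str) -> bool:
    """True if the driver's release branch meets the assumed minimum."""
    return driver_branch(version) >= MIN_DRIVER_BRANCH


def query_driver_versions() -> list[str]:
    """Ask nvidia-smi for the driver version of each visible GPU.

    Returns an empty list if nvidia-smi is missing or fails.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if proc.returncode != 0:
        return []
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    versions = query_driver_versions()
    if not versions:
        print("nvidia-smi not available; cannot check the driver version")
    for version in versions:
        status = "OK" if driver_is_new_enough(version) else "too old"
        print(f"driver {version}: {status}")
```

This checks only the driver. Whether the GPU architecture is supported by tileiras still has to be confirmed against the prerequisites.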

Installing from PyPI

cuTile Python is published on PyPI under the cuda-tile package name and can be installed with pip:

pip install cuda-tile[tileiras]

The optional tileiras dependency installs the tileiras compiler directly into your Python environment.

If you do not want tileiras inside the Python environment, run

pip install cuda-tile

and install CUDA Toolkit 13.1+ separately.

On a Debian-based system, you can avoid installing the full CUDA Toolkit by running apt-get install cuda-tileiras-13.2 cuda-compiler-13.2 instead of apt-get install cuda-toolkit-13.2.
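
After either installation route, a quick sanity check can confirm that the pieces are visible. The following is a minimal sketch: the module name `cuda.tile` comes from the example above, while the assumption that the compiler is exposed as an executable named `tileiras` on the PATH is not confirmed by this document.

```python
import importlib.util
import shutil


def module_available(name: str) -> bool:
    """True if `name` can be located without importing it fully."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example `cuda`) is missing.
        return False


if __name__ == "__main__":
    print("cuda.tile importable:", module_available("cuda.tile"))
    # Assumption: the compiler binary is called `tileiras`.
    print("tileiras on PATH:", shutil.which("tileiras") is not None)
```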

Building from Source

cuTile is written mostly in Python, but includes a C++ extension which needs to be built. You will need:

  • A C++17-capable compiler, such as GNU C++ or MSVC;
  • CMake 3.18+;
  • GNU Make on Linux or msbuild on Windows;
  • Python 3.10+ with development headers (venv module is recommended but optional);
  • CUDA Toolkit 13.1+.

On an Ubuntu system, the first four dependencies can be installed with APT:

sudo apt-get update && sudo apt-get install build-essential cmake python3-dev python3-venv

The CMakeLists.txt script will also automatically download the DLPack dependency from GitHub. If you wish to disable this behavior and provide your own copy of DLPack, set the CUDA_TILE_CMAKE_DLPACK_PATH environment variable to a local path to the DLPack source tree.
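
As an illustration of the DLPack override, the sketch below prepares an environment with `CUDA_TILE_CMAKE_DLPACK_PATH` set and then runs the editable install in it. The helper names and the `third_party/dlpack` path are hypothetical; only the environment variable name comes from the text above.

```python
import os
import subprocess
import sys


def build_env_with_dlpack(dlpack_path, base_env=None):
    """Return a copy of the environment with the DLPack source path set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_TILE_CMAKE_DLPACK_PATH"] = os.path.abspath(dlpack_path)
    return env


def install_editable(dlpack_path, source_root="."):
    """Run `pip install -e .` in source_root using a local DLPack checkout."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-e", "."],
        cwd=source_root,
        env=build_env_with_dlpack(dlpack_path),
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical location of a local DLPack source tree.
    env = build_env_with_dlpack("third_party/dlpack")
    print(env["CUDA_TILE_CMAKE_DLPACK_PATH"])
```

Setting the variable in the shell before running pip achieves the same thing; the Python form is just convenient in build scripts.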

Unless you are already using a Python virtual environment, it is recommended to create one in order to avoid installing cuTile globally:

python3 -m venv env
source env/bin/activate

Once the build dependencies are in place, the simplest way to build cuTile is to install it in editable mode by running the following command in the source root directory:

pip install -e .

This creates the build directory and invokes the CMake-based build process. In editable mode, the compiled extension module is placed in the build directory, and a symbolic link to it is created in the source directory. As a result, the pip install -e . command above is needed only once: after changing the C++ code, you can recompile the extension with make -C build, which is much faster. This logic is defined in setup.py.

Experimental Features (Optional)

cuTile now provides an experimental package containing APIs that are still under active development. These are not part of the stable cuda.tile API and may change.

To enable the experimental features when working from a source checkout, install the experimental package from the repository root:

pip install ./experimental/tile_experimental

You can also install it directly from a GitHub repository subdirectory:

pip install \
  "git+https://github.com/NVIDIA/cutile-python.git#egg=cuda-tile-experimental&subdirectory=experimental/tile_experimental"

Once the package is installed, the experimental namespace can be imported, for example to use the autotuner:

from cuda.tile_experimental import autotune_launch, clear_autotune_cache
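
Because the experimental package is optional, code that uses it may want to degrade gracefully when it is absent. A minimal sketch of a guarded import follows; the `HAVE_EXPERIMENTAL` flag is a hypothetical name, and the signatures of the two imported functions are not shown here because this document does not describe them.

```python
try:
    from cuda.tile_experimental import autotune_launch, clear_autotune_cache
    HAVE_EXPERIMENTAL = True
except ImportError:
    # The experimental package (or cuTile itself) is not installed.
    autotune_launch = None
    clear_autotune_cache = None
    HAVE_EXPERIMENTAL = False

if __name__ == "__main__":
    if HAVE_EXPERIMENTAL:
        print("experimental autotuner available")
    else:
        print("experimental package not installed; falling back to ct.launch")
```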

Running Tests

cuTile uses the pytest framework for testing. Tests have extra dependencies, such as PyTorch, which can be installed as follows.

For a standard (non-free-threading) Python build:

pip install -r test/requirements.txt

Or, for a free-threading Python build:

pip install -r test/requirements-ft.txt
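
If you are unsure which kind of interpreter you are running, the choice between the two requirements files can be automated. The sketch below relies on `sysconfig.get_config_var("Py_GIL_DISABLED")`, which is truthy on free-threading CPython builds and `None` or `0` otherwise. The helper name is hypothetical.

```python
import sysconfig


def test_requirements_file() -> str:
    """Pick the requirements file that matches the running interpreter."""
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    if free_threaded:
        return "test/requirements-ft.txt"
    return "test/requirements.txt"


if __name__ == "__main__":
    print("pip install -r", test_requirements_file())
```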

The tests are located in the test/ directory. To run a specific test file, for example test_copy.py, use the following command:

pytest test/test_copy.py

Copyright and License Information

Copyright © 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

cuTile-Python is licensed under the Apache 2.0 license. See the LICENSES folder for the full license text.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • cuda_tile-1.4.0-cp314-cp314t-win_amd64.whl (275.9 kB): CPython 3.14t, Windows x86-64
  • cuda_tile-1.4.0-cp314-cp314t-manylinux2014_x86_64.whl (284.3 kB): CPython 3.14t
  • cuda_tile-1.4.0-cp314-cp314t-manylinux2014_aarch64.whl (283.2 kB): CPython 3.14t
  • cuda_tile-1.4.0-cp314-cp314-win_amd64.whl (270.6 kB): CPython 3.14, Windows x86-64
  • cuda_tile-1.4.0-cp314-cp314-manylinux2014_x86_64.whl (282.6 kB): CPython 3.14
  • cuda_tile-1.4.0-cp314-cp314-manylinux2014_aarch64.whl (281.2 kB): CPython 3.14
  • cuda_tile-1.4.0-cp313-cp313-win_amd64.whl (269.8 kB): CPython 3.13, Windows x86-64
  • cuda_tile-1.4.0-cp313-cp313-manylinux2014_x86_64.whl (282.5 kB): CPython 3.13
  • cuda_tile-1.4.0-cp313-cp313-manylinux2014_aarch64.whl (281.0 kB): CPython 3.13
  • cuda_tile-1.4.0-cp312-cp312-win_amd64.whl (269.8 kB): CPython 3.12, Windows x86-64
  • cuda_tile-1.4.0-cp312-cp312-manylinux2014_x86_64.whl (282.5 kB): CPython 3.12
  • cuda_tile-1.4.0-cp312-cp312-manylinux2014_aarch64.whl (281.0 kB): CPython 3.12
  • cuda_tile-1.4.0-cp311-cp311-win_amd64.whl (269.4 kB): CPython 3.11, Windows x86-64
  • cuda_tile-1.4.0-cp311-cp311-manylinux2014_x86_64.whl (282.1 kB): CPython 3.11
  • cuda_tile-1.4.0-cp311-cp311-manylinux2014_aarch64.whl (280.9 kB): CPython 3.11
  • cuda_tile-1.4.0-cp310-cp310-win_amd64.whl (269.8 kB): CPython 3.10, Windows x86-64
  • cuda_tile-1.4.0-cp310-cp310-manylinux2014_x86_64.whl (282.6 kB): CPython 3.10
  • cuda_tile-1.4.0-cp310-cp310-manylinux2014_aarch64.whl (281.2 kB): CPython 3.10

File details

Details for the file cuda_tile-1.4.0-cp314-cp314t-win_amd64.whl.

File metadata

  • Download URL: cuda_tile-1.4.0-cp314-cp314t-win_amd64.whl
  • Upload date:
  • Size: 275.9 kB
  • Tags: CPython 3.14t, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for cuda_tile-1.4.0-cp314-cp314t-win_amd64.whl
Algorithm Hash digest
SHA256 3f58eac5577ea3ed7c17bfcab015a506fd2cf61f8848407c5b403f1bf46c55ca
MD5 6c5952a01be31e9029ba426739c7e0c5
BLAKE2b-256 0dab0883194457932150a5ad334d609ac17bd704345974d21c8bae6ea251e7ed

See more details on using hashes here.
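
A downloaded wheel can be checked against its published SHA256 digest with the standard library. The sketch below is a generic illustration rather than a PyPI-specific tool. The helper names are hypothetical, and the expected digest should be copied from the listing for the exact file you downloaded.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with the published value (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("cuda_tile-1.4.0-cp314-cp314t-win_amd64.whl", published_sha256)` would return True only if the local file is byte-for-byte identical to the published one, where `published_sha256` holds the SHA256 value listed for that file.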

File details

Details for the file cuda_tile-1.4.0-cp314-cp314t-manylinux2014_x86_64.whl.

File hashes

Hashes for cuda_tile-1.4.0-cp314-cp314t-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 675b2afff62af5d4e72c34bc72d0be27b0933a44933b8a449f590fbded8c1107
MD5 f20ad46fccea4377e8089fa30af2ce71
BLAKE2b-256 18c0fee527a085fca414fc993769912eb8ba2e15ce388f3168b868706e6d4c61

File details

Details for the file cuda_tile-1.4.0-cp314-cp314t-manylinux2014_aarch64.whl.

File hashes

Hashes for cuda_tile-1.4.0-cp314-cp314t-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 b3cbeffbe0fedac4936edcf00b6ba13ab5ddb74d3b7ce4a287dfc04491b5f6af
MD5 ac997927308ca202b632394c6d9c8998
BLAKE2b-256 abdff7f1dfa4d1ee7cc5b69e11d756be6ffec1561a5c7e3836fd0f71ca49adcf

File details

Details for the file cuda_tile-1.4.0-cp314-cp314-win_amd64.whl.

File metadata

  • Download URL: cuda_tile-1.4.0-cp314-cp314-win_amd64.whl
  • Upload date:
  • Size: 270.6 kB
  • Tags: CPython 3.14, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for cuda_tile-1.4.0-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 c19e10fe70ba92709b6ca446d1c52a8a346b56f4f8ad7c8941736f60e32f3c87
MD5 f43d0a0ec8fbdf9c6721b486ccd3175e
BLAKE2b-256 61bb211c0d5121230ee76cfc1a9ee107ec28aaae9e6ffb43a04aa172d0d4f4dc

File details

Details for the file cuda_tile-1.4.0-cp314-cp314-manylinux2014_x86_64.whl.

File hashes

Hashes for cuda_tile-1.4.0-cp314-cp314-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 4b1a591c26836a550c2bf87c22d31c4716e5f83d24d255f843d9429625cca973
MD5 d149167648fd4b1552c8c97d6ea28932
BLAKE2b-256 8ffbbf3849ad68b1858ba50e6992863d266892d7d7db02d11c485c26cd090a1b

File details

Details for the file cuda_tile-1.4.0-cp314-cp314-manylinux2014_aarch64.whl.

File hashes

Hashes for cuda_tile-1.4.0-cp314-cp314-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 738593650784ebb3c601486914b563e7569144fe596048766ea9e12280ac3bb9
MD5 5bd52b36a4b38ca455be098511135294
BLAKE2b-256 0dc646a329f4c56ce54471784366394e235804423df2531307e14112e4636c76

File details

Details for the file cuda_tile-1.4.0-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: cuda_tile-1.4.0-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 269.8 kB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for cuda_tile-1.4.0-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 edd1df4d7955032c7be2a26c6d7e47261415ba7c87587705e0f4f1fd0d61650a
MD5 bac940e5f92738d5e4fdf8b515a850d0
BLAKE2b-256 a16741f1acdf21bf6214a3a1c3b46d39b8eb0f9eba7aecc6b57005db35d56f9a

File details

Details for the file cuda_tile-1.4.0-cp313-cp313-manylinux2014_x86_64.whl.

File hashes

Hashes for cuda_tile-1.4.0-cp313-cp313-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 45be74f6568c440446f510bc7799b953858e64c6abf26e96f2c9598a79084860
MD5 e94a881f5a2274642dee4fdf3e47c7c7
BLAKE2b-256 110b4770f9e36b8108ce8c9078f71eb21c65e594d79c0770dd38daa045cfbd6c

File details

Details for the file cuda_tile-1.4.0-cp313-cp313-manylinux2014_aarch64.whl.

File hashes

Hashes for cuda_tile-1.4.0-cp313-cp313-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 fc74185efd81f6153af0a19549d111dec6861ee9b9bc27927a2cef6e19173eb5
MD5 4255744117b1617858cb4cceff87cadd
BLAKE2b-256 5ead42f0655e6aee5c59015634b46d7f13bc22e74af28d10fb2008a062b37349

File details

Details for the file cuda_tile-1.4.0-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: cuda_tile-1.4.0-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 269.8 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for cuda_tile-1.4.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 616f13cbc7af6caa7b92430b85ba0a429d1f96ca9e7e04a29d89114cfe859663
MD5 2d1abe525b85bad5cc3dbff7ccbfb64e
BLAKE2b-256 6fbb4152dc08a8de5bcdc4b9d80b6917216289526f6e786b09ee80d4df27bcfb

File details

Details for the file cuda_tile-1.4.0-cp312-cp312-manylinux2014_x86_64.whl.

File hashes

Hashes for cuda_tile-1.4.0-cp312-cp312-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1d9d99b6fa57366af3f8707ac4fd91411275af2ee736996a60620240fcf92070
MD5 251ef9cfc60dee3524b35474000ca927
BLAKE2b-256 d79a7fbdbdb30c375f80818941165adfc4f1dc6cebaf937c6a9081a02d5871f0

File details

Details for the file cuda_tile-1.4.0-cp312-cp312-manylinux2014_aarch64.whl.

File hashes

Hashes for cuda_tile-1.4.0-cp312-cp312-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 9e358a85a153820aa0a51d0e09346d884a3c14b88c0313d20d0fb9f53952abae
MD5 3566cc70c5235c265789925691111c98
BLAKE2b-256 429364ef40d3982dcda7a97ebfa3e3bb9045b573d4eb3877fa5d1fa3cd2541d3

File details

Details for the file cuda_tile-1.4.0-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: cuda_tile-1.4.0-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 269.4 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for cuda_tile-1.4.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 9825202c1175dcac6c2daafd176da4444801e13049b4e2688d92f5b582f6ea6d
MD5 f5c7690e55a1995137608eb77b1a7eba
BLAKE2b-256 1fc91b8df78c55f7ef544e7b7aab381a99495650d49c65e3ba85189e888b89b7

File details

Details for the file cuda_tile-1.4.0-cp311-cp311-manylinux2014_x86_64.whl.

File hashes

Hashes for cuda_tile-1.4.0-cp311-cp311-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5741a789aaff85e3b2417b8611f0d11f967b9bac567432f0057b2b8bf72259ac
MD5 8a11c20e2d2a69bae110449663817ecd
BLAKE2b-256 c61b575207f424c75e7b1608e056a89c15bfc3750f6178e5c659f693a7e822b1

File details

Details for the file cuda_tile-1.4.0-cp311-cp311-manylinux2014_aarch64.whl.

File hashes

Hashes for cuda_tile-1.4.0-cp311-cp311-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 da2649de97cbaf886d564a9f75b3bd2fb112999c99c58a27c817a46bb8725f29
MD5 5c7561d862e5cfa503715b2fc0a2098a
BLAKE2b-256 aed4a5849ee8ee58d0275c9e7738aa5b16d1ad669ed5aa4d1b26af683eda065e

File details

Details for the file cuda_tile-1.4.0-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: cuda_tile-1.4.0-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 269.8 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for cuda_tile-1.4.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 bbfaa33f86d93c7a0ae1f82b626a9f442cd6ff5f991c8a4e7214abd170cf3d21
MD5 9961cb796aaf9e1dfc5e80a6a58f27d3
BLAKE2b-256 cfc2bbf50ef77786c8b4d5160786d4ab205f567eec0e80dc93906b3340bd7edd

File details

Details for the file cuda_tile-1.4.0-cp310-cp310-manylinux2014_x86_64.whl.

File hashes

Hashes for cuda_tile-1.4.0-cp310-cp310-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 63fc9edd9376a196870ff9dcd9cdb9a7e4d2f2271a4e531c390b232bc60ee0f0
MD5 d73029d52790b1f00179f29c431ff31a
BLAKE2b-256 f06fa20b815a20b578c42bd056e40749477efc33dc098cb04dc3c7a3417b17b3

File details

Details for the file cuda_tile-1.4.0-cp310-cp310-manylinux2014_aarch64.whl.

File hashes

Hashes for cuda_tile-1.4.0-cp310-cp310-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 8758454ae3971cd8dc195a61210e42ad3696528a9da6a4a429da4cdbbf305d80
MD5 c0f6430e7180b2462672cfb5fbc74b0b
BLAKE2b-256 c2df0bb510403487484bed0a844793a1e8a043e3c4fe35f92f6e1cb5de3bd35e
