CUDA Tile Compiler

cuTile Python

cuTile Python is a programming language for NVIDIA GPUs. The official documentation is available on docs.nvidia.com, or can be built from the sources in the docs folder.

Example

# This example uses CuPy, which can be installed via `pip install cupy-cuda13x`.
# Make sure CUDA Toolkit 13.1+ is installed: https://developer.nvidia.com/cuda-downloads

import cuda.tile as ct
import cupy
import numpy as np

TILE_SIZE = 16

# cuTile kernel for adding two dense vectors. It runs in parallel on the GPU.
@ct.kernel
def vector_add_kernel(a, b, result):
    block_id = ct.bid(0)
    a_tile = ct.load(a, index=(block_id,), shape=(TILE_SIZE,))
    b_tile = ct.load(b, index=(block_id,), shape=(TILE_SIZE,))
    result_tile = a_tile + b_tile
    ct.store(result, index=(block_id,), tile=result_tile)

# Generate input arrays
a = cupy.random.uniform(-5, 5, 128)
b = cupy.random.uniform(-5, 5, 128)
expected = cupy.asnumpy(a) + cupy.asnumpy(b)

# Allocate an output array and launch the kernel
result = cupy.zeros_like(a)
grid = (ct.cdiv(a.shape[0], TILE_SIZE), 1, 1)
ct.launch(cupy.cuda.get_current_stream(), grid, vector_add_kernel, (a, b, result))

# Verify the results
result_np = cupy.asnumpy(result)
np.testing.assert_array_almost_equal(result_np, expected)
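
To make the tiling in the example easier to follow, here is a CPU-only sketch that emulates the same decomposition with NumPy. It assumes that index=(block_id,) selects the block_id-th tile of TILE_SIZE elements, as the grid computation in the example suggests. The helper names cdiv and vector_add_reference are hypothetical and are not part of cuTile; the loop only stands in for the blocks that the GPU would run in parallel.

```python
import numpy as np

TILE_SIZE = 16


def cdiv(n: int, d: int) -> int:
    """Ceiling division: number of size-d tiles needed to cover n elements."""
    return (n + d - 1) // d


def vector_add_reference(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """CPU emulation of the kernel: one loop iteration per block, one tile per block."""
    result = np.zeros_like(a)
    for block_id in range(cdiv(a.shape[0], TILE_SIZE)):
        # Assumption: the tile index is in units of tiles, so block_id covers
        # elements [block_id * TILE_SIZE, (block_id + 1) * TILE_SIZE).
        tile = slice(block_id * TILE_SIZE, (block_id + 1) * TILE_SIZE)
        result[tile] = a[tile] + b[tile]
    return result


rng = np.random.default_rng(0)
a = rng.uniform(-5, 5, 128)
b = rng.uniform(-5, 5, 128)

# 128 elements split into tiles of 16 gives 8 blocks, matching the grid above.
num_blocks = cdiv(a.shape[0], TILE_SIZE)
np.testing.assert_array_almost_equal(vector_add_reference(a, b), a + b)
```

A reference like this can also serve as the expected value when testing a real kernel on inputs copied back from the GPU.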

More examples can be found at Samples and TileGym.

System Requirements

cuTile Python generates kernels based on Tile IR, which requires NVIDIA Driver r580 or later to run. Furthermore, in the 13.1 release the tileiras compiler supports only Blackwell GPUs; this restriction will be removed in upcoming versions. Check out the prerequisites for the full list of requirements.
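
One way to check the driver requirement before installing is sketched below. It assumes nvidia-smi is on the PATH and compares only the major version against 580. The function names are hypothetical, and the version string in the comment is just an example of the format. The sketch does not check the GPU architecture.

```python
import subprocess
from typing import Optional

MIN_DRIVER_MAJOR = 580  # from the requirement above (r580 or later)


def driver_major(version: str) -> int:
    """Return the major component of a driver version string such as '580.65.06'."""
    return int(version.strip().split(".")[0])


def driver_is_supported(version: str) -> bool:
    """True if the driver's major version meets the r580 requirement."""
    return driver_major(version) >= MIN_DRIVER_MAJOR


def query_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; return None if it is unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    lines = out.strip().splitlines()
    # With several GPUs there is one line per GPU; the driver is the same for all.
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("Could not query the driver (is nvidia-smi installed?)")
    elif driver_is_supported(version):
        print(f"Driver {version} meets the r{MIN_DRIVER_MAJOR} requirement")
    else:
        print(f"Driver {version} is older than r{MIN_DRIVER_MAJOR}")
```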

Installing from PyPI

cuTile Python is published on PyPI under the cuda-tile package name and can be installed with pip:

pip install cuda-tile

Currently, CUDA Toolkit 13.1+ is required and must be installed separately. On a Debian-based system, you can avoid installing the full CUDA Toolkit by running apt-get install cuda-tileiras-13.1 cuda-compiler-13.1 instead of apt-get install cuda-toolkit-13.1.
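
After installation, a quick way to confirm that pip registered the package is to query its metadata. This is a minimal sketch using only the standard library; the installed_version helper is hypothetical. It checks only that the cuda-tile distribution is present, not that the CUDA Toolkit or a supported GPU is available.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "cuda-tile") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("cuda-tile is not installed; run: pip install cuda-tile")
else:
    print(f"cuda-tile {version} is installed")
```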

Building from Source

cuTile is written mostly in Python, but includes a C++ extension which needs to be built. You will need:

  • A C++17-capable compiler, such as GNU C++ or MSVC;
  • CMake 3.18+;
  • GNU Make on Linux or msbuild on Windows;
  • Python 3.10+ with development headers (venv module is recommended but optional);
  • CUDA Toolkit 13.1+.

On an Ubuntu system, the first four dependencies can be installed with APT:

sudo apt-get update && sudo apt-get install build-essential cmake python3-dev python3-venv

The CMakeLists.txt script will also automatically download the DLPack dependency from GitHub. If you wish to disable this behavior and provide your own copy of DLPack, set the CUDA_TILE_CMAKE_DLPACK_PATH environment variable to a local path to the DLPack source tree.
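
As an illustration of the DLPack override, the sketch below prepares an environment with CUDA_TILE_CMAKE_DLPACK_PATH set and runs the editable install with it. The helper names are hypothetical. The check for include/dlpack/dlpack.h reflects the layout of the upstream DLPack repository and is only a sanity check, not something the build is known to require.

```python
import os
import subprocess
import sys
from pathlib import Path

DLPACK_ENV_VAR = "CUDA_TILE_CMAKE_DLPACK_PATH"  # variable named in the text above


def build_env_with_dlpack(dlpack_root: str) -> dict:
    """Return a copy of the environment that points the build at a local DLPack tree."""
    root = Path(dlpack_root).resolve()
    header = root / "include" / "dlpack" / "dlpack.h"
    if not header.is_file():
        raise FileNotFoundError(f"{header} not found; is {root} a DLPack source tree?")
    env = dict(os.environ)
    env[DLPACK_ENV_VAR] = str(root)
    return env


def install_editable(dlpack_root: str, source_root: str = ".") -> None:
    """Run `pip install -e .` in source_root using the local DLPack copy."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-e", "."],
        cwd=source_root,
        env=build_env_with_dlpack(dlpack_root),
        check=True,
    )
```

For example, install_editable("/path/to/dlpack") would be called from a script in the cuTile source root, where /path/to/dlpack is a placeholder for your local checkout.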

Unless you are already using a Python virtual environment, it is recommended to create one in order to avoid installing cuTile globally:

python3 -m venv env
source env/bin/activate

Once the build dependencies are in place, the simplest way to build cuTile is to install it in editable mode by running the following command in the source root directory:

pip install -e .

This will create the build directory and invoke the CMake-based build process. In editable mode, the compiled extension module is placed in the build directory, and a symbolic link to it is created in the source directory. As a result, the pip install -e . command above needs to be run only once: after changing the C++ code, the extension can be recompiled with make -C build, which is much faster. This logic is defined in setup.py.

Experimental Features (Optional)

cuTile now provides an experimental package containing APIs that are still under active development. These are not part of the stable cuda.tile API and may change.

To enable the experimental features when working from a source checkout, install the experimental package from the repository root:

pip install ./experimental

You can also install it directly from a GitHub repository subdirectory:

pip install \
  "git+https://github.com/NVIDIA/cutile-python.git#egg=cuda-tile-experimental&subdirectory=experimental"

Once installed, the experimental namespace can be imported, for example to use the autotuner:

from cuda.tile_experimental import autotune_launch, clear_autotune_cache
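
Because the experimental package is optional, code that should also run without it may want to guard the import. The following is a minimal sketch of that pattern; the HAVE_EXPERIMENTAL flag is a hypothetical name, and nothing here calls the autotuner itself, since its signatures are not described in this document.

```python
# Guarded import: fall back gracefully when cuda-tile-experimental is absent.
try:
    from cuda.tile_experimental import autotune_launch, clear_autotune_cache
    HAVE_EXPERIMENTAL = True
except ImportError:
    # Covers both a missing experimental package and a missing cuda.tile install.
    autotune_launch = None
    clear_autotune_cache = None
    HAVE_EXPERIMENTAL = False

if HAVE_EXPERIMENTAL:
    print("Experimental autotuner APIs are available")
else:
    print("Experimental package not installed; using the stable cuda.tile API only")
```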

Running Tests

cuTile uses the pytest framework for testing. Tests have extra dependencies, such as PyTorch, which can be installed with:

pip install -r test/requirements.txt

The tests are located in the test/ directory. To run a specific test file, for example test_copy.py, use the following command:

pytest test/test_copy.py

Copyright and License Information

Copyright © 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

cuTile-Python is licensed under the Apache 2.0 license. See the LICENSES folder for the full license text.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • cuda_tile-1.0.1-cp313-cp313-win_amd64.whl (168.2 kB): CPython 3.13, Windows x86-64
  • cuda_tile-1.0.1-cp313-cp313-manylinux2014_x86_64.whl (174.4 kB): CPython 3.13, manylinux2014 x86-64
  • cuda_tile-1.0.1-cp313-cp313-manylinux2014_aarch64.whl (173.3 kB): CPython 3.13, manylinux2014 aarch64
  • cuda_tile-1.0.1-cp312-cp312-win_amd64.whl (168.2 kB): CPython 3.12, Windows x86-64
  • cuda_tile-1.0.1-cp312-cp312-manylinux2014_x86_64.whl (174.4 kB): CPython 3.12, manylinux2014 x86-64
  • cuda_tile-1.0.1-cp312-cp312-manylinux2014_aarch64.whl (173.3 kB): CPython 3.12, manylinux2014 aarch64
  • cuda_tile-1.0.1-cp311-cp311-win_amd64.whl (168.0 kB): CPython 3.11, Windows x86-64
  • cuda_tile-1.0.1-cp311-cp311-manylinux2014_x86_64.whl (173.7 kB): CPython 3.11, manylinux2014 x86-64
  • cuda_tile-1.0.1-cp311-cp311-manylinux2014_aarch64.whl (173.2 kB): CPython 3.11, manylinux2014 aarch64
  • cuda_tile-1.0.1-cp310-cp310-win_amd64.whl (168.1 kB): CPython 3.10, Windows x86-64
  • cuda_tile-1.0.1-cp310-cp310-manylinux2014_x86_64.whl (173.9 kB): CPython 3.10, manylinux2014 x86-64
  • cuda_tile-1.0.1-cp310-cp310-manylinux2014_aarch64.whl (173.3 kB): CPython 3.10, manylinux2014 aarch64

