A simple CUDA/OpenCL kernel tuner in Python
The goal of this project is to provide a tool for tuning CUDA and OpenCL kernels that is as simple as possible. This implies that any CUDA or OpenCL kernel can be tuned without requiring extensive changes to the original kernel code.
A very common problem in GPU programming is that some combination of thread block dimensions and other kernel parameters, like tiling or unrolling factors, results in dramatically better performance than other kernel configurations. The goal of auto-tuning is to automate the process of finding the best performing configuration for a given device.
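As a minimal sketch of what such a search involves (not the Kernel Tuner's actual implementation), the following brute-force loop enumerates every combination of tunable parameters and keeps the fastest one. The `benchmark` function, its made-up cost model, and the `tile_size` parameter are hypothetical stand-ins for compiling and timing a real kernel.

```python
import itertools

def benchmark(config):
    """Hypothetical stand-in for compiling and timing a kernel; returns a made-up time."""
    # Made-up cost model: pretends 128 threads per block and a tiling factor of 4 are ideal.
    return abs(config["block_size_x"] - 128) / 32 + abs(config["tile_size"] - 4)

def brute_force_tune(tune_params, benchmark_fn):
    """Try every combination of parameter values and return (best_config, best_time)."""
    names = list(tune_params)
    best_config, best_time = None, float("inf")
    for values in itertools.product(*(tune_params[name] for name in names)):
        config = dict(zip(names, values))
        elapsed = benchmark_fn(config)
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time

tune_params = {"block_size_x": [32, 64, 128, 256, 512], "tile_size": [1, 2, 4, 8]}
best_config, best_time = brute_force_tune(tune_params, benchmark)
print(best_config, best_time)
```

The number of configurations grows with the product of the list lengths (here 5 x 4 = 20), which is why automating the search pays off quickly.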
The Kernel Tuner aims to let you use the tuned kernel directly, without introducing any new dependencies. The tuned kernels can afterwards be used independently of the programming environment, whether that is C, C++, Java, Fortran, or Python.
The kernel_tuner module currently contains only one main function, tune_kernel, to which you pass at least the kernel name, a string containing the kernel code, the problem size, a list of kernel function arguments, and a dictionary of tunable parameters. There are also many optional parameters; for a complete list, see the full documentation of tune_kernel.
Documentation
The full documentation is available here.
Installation
The easiest way to install the Kernel Tuner is using pip:
pip install kernel_tuner
But you can also install from the git repository. This way you also get the examples and the tutorials:
git clone https://github.com/benvanwerkhoven/kernel_tuner.git
cd kernel_tuner
pip install -r requirements.txt
pip install .
To tune CUDA kernels:
First, make sure you have the CUDA Toolkit installed
You can install PyCUDA using pip install pycuda
To tune OpenCL kernels:
First, make sure you have an OpenCL compiler for your intended OpenCL platform
You can install PyOpenCL using pip install pyopencl
If you need more information about how to install the Kernel Tuner and all dependencies, see the installation guide.
Example usage
The following shows a simple example for tuning a CUDA kernel:
import numpy
from kernel_tuner import tune_kernel

kernel_string = """
__global__ void vector_add(float *c, float *a, float *b, int n) {
    int i = blockIdx.x * block_size_x + threadIdx.x;
    if (i<n) {
        c[i] = a[i] + b[i];
    }
}
"""
size = 10000000
a = numpy.random.randn(size).astype(numpy.float32)
b = numpy.random.randn(size).astype(numpy.float32)
c = numpy.zeros_like(b)
n = numpy.int32(size)
args = [c, a, b, n]
tune_params = dict()
tune_params["block_size_x"] = [32, 64, 128, 256, 512]
tune_kernel("vector_add", kernel_string, size, args, tune_params)
The exact same Python code can be used to tune an OpenCL kernel:
kernel_string = """
__kernel void vector_add(__global float *c, __global float *a, __global float *b, int n) {
    int i = get_global_id(0);
    if (i<n) {
        c[i] = a[i] + b[i];
    }
}
"""
You can even tune a plain C function; see the example here.
You can find these and many more extensive examples in the examples directory.
See the full documentation for several highly detailed tutorial-style explanations of example kernels and the scripts to tune them.
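Note that the CUDA vector_add kernel above uses block_size_x without declaring it, so a value has to be supplied for each configuration before compiling. One common way to do this, sketched below, is to prepend preprocessor defines to the kernel source and derive the number of thread blocks from the problem size. The helper names `prepend_defines` and `grid_size` are hypothetical, and this illustrates the general technique under that assumption rather than documenting the Kernel Tuner's internals.

```python
def prepend_defines(kernel_string, config):
    """Return the kernel source with one #define line per tunable parameter."""
    defines = "".join(f"#define {name} {value}\n" for name, value in config.items())
    return defines + kernel_string

def grid_size(problem_size, block_size):
    """Number of thread blocks needed to cover problem_size elements (ceiling division)."""
    return (problem_size + block_size - 1) // block_size

# Abbreviated kernel body; only the injected define matters for this sketch.
kernel_string = "__global__ void vector_add(float *c, float *a, float *b, int n) { /* ... */ }"
config = {"block_size_x": 128}

source = prepend_defines(kernel_string, config)
blocks = grid_size(10000000, config["block_size_x"])

print(source.splitlines()[0])
print(blocks)
```

Because the value is fixed at compile time, each configuration is compiled separately, and the tuned value can later be hard-coded in the same way without any dependency on the tuner.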
Tuning host and kernel code
It is possible to tune for combinations of tunable parameters in both host and kernel code. This allows for a number of powerful things, such as tuning the number of streams for a kernel that uses CUDA Streams or OpenCL Command Queues to overlap transfers between host and device with kernel execution. This can be done in combination with tuning the parameters inside the kernel code. See the convolution_streams example code and the documentation for a detailed explanation of the kernel tuner Python script.
Correctness verification
Optionally, you can let the kernel tuner verify the output of every kernel it compiles and benchmarks, by passing an answer list. This list matches the list of arguments to the kernel, but contains the expected output of the kernel. Input arguments are replaced with None.
answer = [a+b, None, None, None]  # one entry per argument in args, in the same order; inputs are None
tune_kernel("vector_add", kernel_string, size, args, tune_params, answer=answer)
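The comparison this implies can be sketched as follows. This is an illustrative pure-Python version that uses lists instead of NumPy arrays so it runs without dependencies; it is not the Kernel Tuner's implementation, and the `verify_outputs` function and its tolerance are hypothetical.

```python
import math

def verify_outputs(answer, outputs, abs_tol=1e-6):
    """Compare each expected output with the actual one, skipping None entries."""
    if len(answer) != len(outputs):
        raise ValueError("answer must have one entry per kernel argument")
    for expected, actual in zip(answer, outputs):
        if expected is None:
            continue  # input argument: nothing to check
        if len(expected) != len(actual):
            return False
        if not all(math.isclose(e, x, abs_tol=abs_tol) for e, x in zip(expected, actual)):
            return False
    return True

a = [1.0, 2.0, 3.0]
b = [0.5, 0.5, 0.5]
c = [x + y for x, y in zip(a, b)]  # what a correct vector_add would write into c

answer = [[1.5, 2.5, 3.5], None, None, None]
ok = verify_outputs(answer, [c, a, b, 3])
bad = verify_outputs(answer, [[0.0, 0.0, 0.0], a, b, 3])
print(ok, bad)
```

Floating-point results should be compared with a tolerance rather than exact equality, since different kernel configurations may accumulate rounding errors differently.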
Contributing
Please see the Contributions Guide.
Citation
A scientific paper about the Kernel Tuner is in preparation. In the meantime, please cite the Kernel Tuner as follows:
@misc{
author = {Ben van Werkhoven},
title = {Kernel Tuner: A simple CUDA/OpenCL Kernel Tuner in Python},
year = {2017}
}