A code generator for array-based code on CPUs and GPUs

## Project description

Loopy lets you easily generate the tedious, complicated code that is necessary to get good performance out of GPUs and multi-core CPUs. Loopy’s core idea is that a computation should be described simply and then transformed into a version that gets high performance. This transformation takes place under user control, from within Python.

Loopy can express the following types of optimizations:

• Vector and multi-core parallelism in the OpenCL/CUDA model

• Data layout transformations (structure of arrays to array of structures)

• Loop unrolling

• Loop tiling with efficient handling of boundary cases

• Prefetching/copy optimizations

• Instruction level parallelism

• and many more
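To make "loop tiling with efficient handling of boundary cases" concrete, here is a minimal hand-written sketch in plain Python. It does not use Loopy's API. The function names `scale_simple` and `scale_tiled` and the tile size are hypothetical, chosen only for illustration. The first function is the simple description of the computation. The second is the kind of restructured loop nest that a tiling transformation is meant to produce automatically, with the leftover elements handled in a separate loop so the full tiles need no bounds check.

```python
def scale_simple(a, factor):
    """Simple description: one flat loop over every element."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = factor * a[i]
    return out


def scale_tiled(a, factor, tile=4):
    """Hand-tiled version of scale_simple (illustrative, not Loopy output)."""
    n = len(a)
    out = [0.0] * n
    full_tiles = n // tile

    # Full tiles: the inner loop always runs exactly `tile` iterations,
    # so no per-element bounds check is needed here.
    for i_outer in range(full_tiles):
        base = i_outer * tile
        for i_inner in range(tile):
            out[base + i_inner] = factor * a[base + i_inner]

    # Boundary case: the remaining n % tile elements that do not fill a tile.
    for i in range(full_tiles * tile, n):
        out[i] = factor * a[i]

    return out
```

Both functions return the same result for any input length, including lengths that are not a multiple of the tile size. Writing and maintaining the second form by hand for every kernel is the tedious part that a code generator is intended to take over.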

Loopy targets array-type computations, such as the following:

• dense linear algebra,

• convolutions,

• n-body interactions,

• PDE solvers, such as finite element, finite difference, and fast-multipole-type methods

It is not (and does not want to be) a general-purpose programming language.

To install Loopy, run `pip install loopy`.

In addition, Loopy is compatible with and enhances pyopencl.

