GPGPU algorithms for PyCUDA and PyOpenCL
Main design goals:
- separation of computation cores (matrix multiplication, random number generation, etc.) from simple transformations on their input and output values (scaling, typecasting, etc.);
- separation of the preparation and execution stages, maximizing the performance of the execution stage at the expense of the preparation stage (in other words, targeting large simulations);
- partial abstraction from CUDA/OpenCL.
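The first two points can be illustrated with a minimal pure-Python sketch. It does not use this library's API, and it runs on the CPU with plain lists rather than on a GPU. Every name in it (`matmul_core`, `scale`, `typecast`, `prepare`) is hypothetical and chosen only for illustration. The core does the heavy computation, the transformations are attached separately, and all setup happens once in a preparation step that returns a cheap callable for repeated execution.

```python
# Illustrative sketch only: these names are hypothetical, not the library's API.

def matmul_core(a, b):
    """Computation core: matrix product of two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def scale(factor):
    """Simple transformation: multiply every element by a constant."""
    def apply(matrix):
        return [[factor * x for x in row] for row in matrix]
    return apply


def typecast(dtype):
    """Simple transformation: convert every element to the given type."""
    def apply(matrix):
        return [[dtype(x) for x in row] for row in matrix]
    return apply


def prepare(core, input_transforms=(), output_transforms=()):
    """Preparation stage: do the one-off setup and return the execution callable.

    Here the setup only freezes the transformation chains. In a GPU setting
    this is where expensive work such as kernel compilation would presumably go.
    """
    input_transforms = tuple(input_transforms)
    output_transforms = tuple(output_transforms)

    def execute(*inputs):
        # Execution stage: apply input transformations, run the core,
        # then apply output transformations.
        transformed = []
        for value in inputs:
            for transform in input_transforms:
                value = transform(value)
            transformed.append(value)
        result = core(*transformed)
        for transform in output_transforms:
            result = transform(result)
        return result

    return execute


# Prepare once, then call as many times as needed.
scaled_matmul = prepare(
    matmul_core,
    input_transforms=[typecast(float)],
    output_transforms=[scale(0.5)],
)
result = scaled_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Because the transformations live outside the core, the same `matmul_core` can be reused with a different scaling or type conversion without being rewritten.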
To run the tests, install Py.Test and run py.test from the test folder (run py.test --help to get the list of options).