An easy way to run OpenCL kernel files
OpenCL Kernel Python Wrapper
Install
Requirements
- OpenCL-capable GPU hardware
- numpy
- cmake (if compiling from source)
Install from wheel
pip install pyoclk
or download a wheel from the release page and install it.
Compile from source
Clone this repository together with its submodules
Over HTTPS:
git clone --recursive https://github.com/jinmingyi1998/opencl_kernels.git
Over SSH:
git clone --recursive git@github.com:jinmingyi1998/opencl_kernels.git
Install
cd opencl_kernels
python setup.py install
DO NOT move this directory after installation.
Usage
Kernel file
A file named add.cl:
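kernel void add(global float *a, global float *out, int int_arg, float float_arg) {
    // The examples launch a 2-D range (global_work_size = a.shape), so flatten the
    // 2-D work-item id into a row-major index covering the whole array. This is also
    // correct for 1-D launches, where get_global_size(1) is 1 and get_global_id(1) is 0.
    int x = get_global_id(0) * get_global_size(1) + get_global_id(1);
    if (x == 0) {
        printf("accept int arg: %d, accept float arg: %f\n", int_arg, float_arg);
    }
    out[x] = a[x] * float_arg + int_arg;
}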
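To check what the GPU returns, it helps to have the same arithmetic on the CPU. The sketch below is a numpy reference for add.cl; the function name add_reference is hypothetical and not part of oclk. It applies out[x] = a[x] * float_arg + int_arg to every element and keeps the scalars in float32, as the kernel does.

```python
import numpy as np


def add_reference(a: np.ndarray, int_arg: int, float_arg: float) -> np.ndarray:
    """Hypothetical CPU version of add.cl: a * float_arg + int_arg, element-wise."""
    a = np.asarray(a, dtype=np.float32)
    # Cast the scalars to float32 so the arithmetic stays in single precision,
    # like the float/int parameters of the kernel.
    return a * np.float32(float_arg) + np.float32(int_arg)


a = np.random.rand(100, 100).reshape([10, -1]).astype(np.float32)
expected = add_reference(a, 1, 12.34)
print(expected.dtype, expected.shape)  # float32 (10, 1000)
```

After a run, a comparison such as np.allclose(out, add_reference(a, 1, 12.34), rtol=1e-5) is safer than exact equality, since a device compiler may fuse the multiply and the add and round slightly differently.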
Python Code
OOP Style
import numpy as np
import oclk
a = np.random.rand(100, 100).reshape([10, -1]).astype(np.float32)
out = np.zeros_like(a)
runner = oclk.Runner()
runner.load_kernel("add.cl", "add", "")
timer = oclk.TimerArgs(
    enable=True,
    warmup=10,
    repeat=50,
    name='add_kernel'
)
runner.run(
    kernel_name="add",
    input=[
        {"name": "a", "value": a},
        {"name": "out", "value": out},
        {"name": "int_arg", "value": 1, "type": "int"},
        {"name": "float_arg", "value": 12.34}
    ],
    output=['out'],
    local_work_size=[1, 1],
    global_work_size=a.shape,
    timer=timer
)
# check result
a = a.reshape([-1])
out = out.reshape([-1])
print(a[:8])
print(out[:8])
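Which elements of out get written depends on how the kernel turns its 2-D work-item ids into an index. The examples launch a 2-D range (global_work_size=a.shape, i.e. 10 × 1000 work-items), so an index taken from get_global_id(0) alone reaches only the first 10 elements, whereas the row-major flattening get_global_id(0) * get_global_size(1) + get_global_id(1) reaches all 10 000. The sketch below checks this by emulating the NDRange with nested Python loops; covered_indices is a hypothetical helper, not an OpenCL or oclk function.

```python
import numpy as np


def covered_indices(global_work_size, index_fn):
    """Emulate a 2-D NDRange on the CPU and return the set of linear indices written.

    index_fn(gid0, gid1, gsize1) stands in for the kernel's index computation:
    gid0/gid1 play get_global_id(0)/(1) and gsize1 plays get_global_size(1).
    """
    gsize0, gsize1 = global_work_size
    written = set()
    for gid0 in range(gsize0):
        for gid1 in range(gsize1):
            written.add(index_fn(gid0, gid1, gsize1))
    return written


shape = (10, 1000)  # a.shape in the examples above
dim0_only = covered_indices(shape, lambda g0, g1, s1: g0)
flattened = covered_indices(shape, lambda g0, g1, s1: g0 * s1 + g1)
print(len(dim0_only), len(flattened))  # 10 10000

# The flattened index matches numpy's own row-major (C-order) flattening.
assert np.ravel_multi_index((3, 7), shape) == 3 * shape[1] + 7
```

Printing only out[:8] cannot tell the two cases apart, because both cover the first few elements; comparing the whole array against a CPU result does.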
Call with Functions
import numpy as np
import oclk
a = np.random.rand(100, 100).reshape([10, -1]).astype(np.float32)
out = np.zeros_like(a)
oclk.init()
oclk.load_kernel("add.cl", "add", "")
r = oclk.run(
    kernel_name="add",
    input=[
        {"name": "a", "value": a},
        {"name": "out", "value": out},
        # the kernel declares int_arg as int, but a bare Python int is passed as C long
        {"name": "int_arg", "value": 1, "type": "int"},
        {"name": "float_arg", "value": 12.34}
    ],
    output=['out'],
    local_work_size=[1, 1],
    global_work_size=a.shape
)
# check result
a = a.reshape([-1])
out = out.reshape([-1])
print(a[:8])
print(out[:8])
Python API Usage
API
load_kernel
def load_kernel(
    cl_file: str, kernel_name: str, compile_option: Union[str, List[str]]
) -> int: ...
- cl_file: path to the kernel file, absolute or relative
- kernel_name: name of the kernel function to load
- compile_option: build options, either a single string such as -DMY_DEF=1 or a list of such strings; the -D prefix is required
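Because compile_option accepts either a string or a list of strings, macro definitions can be generated from a dict. define_flags below is a hypothetical helper, not part of oclk; its result would be passed as the third argument, e.g. load_kernel("add.cl", "add", define_flags({"MY_DEF": 1})), assuming the list form behaves as the signature suggests.

```python
from typing import Dict, List, Union


def define_flags(defines: Dict[str, Union[int, float, str, None]]) -> List[str]:
    """Hypothetical helper: {"MY_DEF": 1} -> ["-DMY_DEF=1"]; None gives a bare "-DNAME"."""
    flags = []
    for name, value in defines.items():
        # Every macro definition carries the -D prefix the build options require.
        flags.append(f"-D{name}" if value is None else f"-D{name}={value}")
    return flags


opts = define_flags({"MY_DEF": 1, "USE_FAST_PATH": None})
print(opts)            # ['-DMY_DEF=1', '-DUSE_FAST_PATH']
print(" ".join(opts))  # -DMY_DEF=1 -DUSE_FAST_PATH
```

Joining with spaces gives the single-string form, for the case where a plain string is preferred.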
release_kernel
def release_kernel(kernel_name: str) -> int: ...
Unloads a kernel from the context. Kernel names must be unique, so to reload a kernel you have to release it first.
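A reload therefore takes two calls. The sketch wraps them in a hypothetical reload_kernel helper. It assumes release_kernel is reachable as oclk.release_kernel, the same way load_kernel is reachable as oclk.load_kernel, and it does not inspect the integer return codes because their meaning is not documented here. The backend parameter exists only so the function can be exercised without a GPU.

```python
import importlib


def reload_kernel(cl_file, kernel_name, compile_option="", backend=None):
    """Hypothetical helper: release kernel_name, then load it again from cl_file.

    backend defaults to the oclk module (assumed to be initialised already);
    any object exposing release_kernel/load_kernel with the documented
    signatures can be passed instead.
    """
    if backend is None:
        backend = importlib.import_module("oclk")
    # Kernel names must be unique in the context, so drop the old kernel first.
    backend.release_kernel(kernel_name)
    return backend.load_kernel(cl_file, kernel_name, compile_option)
```

A typical use is editing add.cl and then calling reload_kernel("add.cl", "add") before the next run.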
run
def run(*, kernel_name: str,
        input: List[Dict[str, Union[int, float, np.ndarray]]],
        output: List[str],
        local_work_size: List[int],
        global_work_size: List[int],
        wait: bool = True,
        timer: Union[Dict, TimerArgs] = TimerArgsDisabled) -> List[np.ndarray]: ...
- input: list of dictionaries, one per kernel argument, in the same order as the kernel function's parameters
  - array arguments must be contiguous numpy arrays
  - constant (scalar) arguments:
    - Python float -> C float
    - Python int -> C long
    - or specify the C type with a "type" field; supported types:
      - [unsigned] int
      - [unsigned] long
      - float
      - double
- output: list of argument names whose arrays are copied back from the GPU buffers
- local_work_size / global_work_size: lists of integers giving the work sizes. local_work_size can be set to [-1], in which case nullptr is passed to clEnqueueNDRangeKernel and the OpenCL implementation chooses the work-group size
- wait: optional, default True; whether to wait for the GPU to finish
- timer: optional, arguments that set up a timer for benchmarking kernels
  - enable: set to True to turn the timer on
  - warmup: number of runs executed before timing starts
  - repeat: number of timed runs; the reported result is the average, i.e. elapsed time / repeat
  - name: name of a global timer
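The scalar rules matter because an OpenCL long is always 64 bits while an int is 32: a bare Python 1 for an int parameter would be passed as an 8-byte long where the kernel expects 4 bytes, which is why the examples add "type": "int". Below is a sketch with two hypothetical helpers, array_arg and scalar_arg (not part of oclk), that build input dictionaries in this shape. It treats "contiguous" as C-contiguous, and the spelling of the unsigned types ("unsigned int", "unsigned long") is an assumption.

```python
import numpy as np

# C types accepted in the "type" field, per the list above; the spelling of the
# unsigned variants is an assumption.
SUPPORTED_TYPES = {"int", "unsigned int", "long", "unsigned long", "float", "double"}


def array_arg(name, value):
    """Hypothetical helper: describe an array argument, rejecting non-contiguous buffers."""
    if not isinstance(value, np.ndarray) or not value.flags["C_CONTIGUOUS"]:
        raise ValueError(f"{name!r} must be a C-contiguous ndarray; try np.ascontiguousarray()")
    return {"name": name, "value": value}


def scalar_arg(name, value, ctype=None):
    """Hypothetical helper: describe a scalar argument, optionally pinning its C type."""
    arg = {"name": name, "value": value}
    if ctype is not None:
        if ctype not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported type {ctype!r}")
        arg["type"] = ctype
    return arg


a = np.random.rand(10, 1000).astype(np.float32)
out = np.zeros_like(a)
inputs = [                            # same order as the add.cl parameters
    array_arg("a", a),
    array_arg("out", out),
    scalar_arg("int_arg", 1, "int"),  # kernel expects a 32-bit int, not the default long
    scalar_arg("float_arg", 12.34),   # a Python float maps to C float by default
]
```

Raising instead of silently copying a non-contiguous array is deliberate: a copy would no longer alias the caller's array.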
Example, for a kernel named add that takes five arguments (a, b, int_arg, float_arg, c) and has already been loaded:
import numpy as np
from oclk import TimerArgs, run

a = np.zeros([16, 16, 16], dtype=np.float32)
b = np.zeros([16, 16, 16], dtype=np.float32)
c = np.zeros([16, 16, 16], dtype=np.float32)
timer = TimerArgs(enable=True,
                  warmup=10,
                  repeat=100,
                  name='timer_name'
                  )
run(kernel_name='add',
    input=[
        {"name": "a", "value": a},
        {"name": "b", "value": b},
        {"name": "int_arg", "value": 1, "type": "int"},
        {"name": "float_arg", "value": 12.34},
        {"name": "c", "value": c}
    ],
    output=['c'],
    local_work_size=[1, 1, 1],
    global_work_size=a.shape,
    timer=timer
    )
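How oclk's timer measures time is not described here (it may use OpenCL events rather than the host clock), but the warmup/repeat averaging it reports can be sketched on the host. benchmark is a hypothetical helper that runs warmup untimed calls, then repeat timed calls, and returns elapsed time / repeat, the same quantity the repeat field describes.

```python
import time

import numpy as np


def benchmark(fn, warmup=10, repeat=100):
    """Hypothetical helper: average wall-clock seconds per call of fn.

    Mirrors the TimerArgs fields: `warmup` untimed calls first, then `repeat`
    timed calls, reporting elapsed time / repeat.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(repeat):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / repeat


# CPU-only usage so the sketch runs anywhere. For a GPU kernel, fn must block until
# the kernel has finished (e.g. wait=True), otherwise only the enqueue is timed.
a = np.random.rand(16, 16, 16).astype(np.float32)
per_call = benchmark(lambda: a * np.float32(12.34) + np.float32(1), warmup=10, repeat=100)
print(f"{per_call * 1e6:.1f} us per call")
```

Averaging over many repeats smooths out per-launch jitter; the warmup runs keep one-off costs such as first-call setup out of the average.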
Download files
Source Distributions
No source distribution files are available for this release.
Built Distributions
Hashes for pyoclk-1.1.1-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 2061fe4c0a31ad23c45e46f611e950dd78c9edef54d34b52955cac1ff0c6454f
MD5 | 5ab2932f50c08ffd0cda834b2a654b54
BLAKE2b-256 | f16e497cd3d1ca00bb5c96ca03e7daf8965519e05a9246a169d9e2ed5a1d26e1
Hashes for pyoclk-1.1.1-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 2ef7025889b7743f50dd9790de8819b454c9f3861fd082b204ee9b90006e6952
MD5 | 181f17e99ce77a07eb25fd05ad160e2a
BLAKE2b-256 | 2680dcc6f729e5d8391dd4d26ded157a09d58ed7e2b59dab2454c9b721e92ec0
Hashes for pyoclk-1.1.1-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 5d703c1fdc786c98a842b7e4ee4e414155de51f7eaff9427564626b65ee240a9
MD5 | 95086ebee02cc6fbd0c60eab9b1fc639
BLAKE2b-256 | eefc418fa0182e6ab9e27628bd712d308069aadef77fcf33a5316995b113cf5a
Hashes for pyoclk-1.1.1-cp39-cp39-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | ca3c39456af2ef00a3b4fb2c49b6ac38216ddcc36bea25a73aa35605b50f8479
MD5 | ff3c68342e9ce14ec8676e70576b4311
BLAKE2b-256 | d88bdda7e636e7f08b6d0eba1e66ac5e8a7a8b43d4c6c5c3f3437c49d1f16cfc
Hashes for pyoclk-1.1.1-cp38-cp38-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 32aaae80630c8aaa615e99f37b8abd1b9f097d2c3ab855f6fd2f539ee2350a96
MD5 | dba55481832fa92770d3db2181262111
BLAKE2b-256 | d4950c1e69bc45a478d7ea4056b3f50550133fa25bffe1c676a1c2e6bb117fe0
Hashes for pyoclk-1.1.1-cp37-cp37m-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 9b94c9c332010c6bde8bacc4eec190212f5cdc07dbdb466a16728bf7207b9354
MD5 | b185c965c6a149cee64857d40b0f05fd
BLAKE2b-256 | d3a9a332fe8afc5f975bf4f688b13f167c5c54efdb49c10024f66536bd8d8f77
Hashes for pyoclk-1.1.1-cp36-cp36m-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 1516c22e268e11f07dc5acd7e29b8f25a2b591bd25ec09d17b6c0226a53f7a1d
MD5 | caad2d5422d8d8add8f2fd5035f5ac22
BLAKE2b-256 | a647ff78c21cf3b2e2852ce90c7b2103b9f69898fffc2639b576f8fcd0c7cfb4
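The SHA256 digests above can be checked locally after downloading a wheel. This sketch uses only hashlib from the standard library; file_sha256 and verify_wheel are hypothetical helper names.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected_sha256):
    """True if the file at path matches a SHA256 digest copied from the tables above."""
    return file_sha256(path) == expected_sha256.strip().lower()
```

For example, verify_wheel("pyoclk-1.1.1-cp312-cp312-manylinux_2_28_x86_64.whl", "2061fe4c0a31ad23c45e46f611e950dd78c9edef54d34b52955cac1ff0c6454f") should return True for an intact download of the CPython 3.12 wheel. pip can also enforce this itself in hash-checking mode, via a requirements line of the form pyoclk==1.1.1 --hash=sha256:<digest>.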