A tool for calling operators written in AscendC
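1 Overview
Running an operator during AscendC operator development is fairly cumbersome. To simplify this, this package turns an operator into a function that can be called directly from Python.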
2 Installation
pip install l0n0lacl
3 Example: running an operator
3.1 First switch to the CANN environment. For example, in my environment:
source /home/HwHiAiUser/Ascend/ascend-toolkit/set_env.sh
3.2 Install the custom operator package you built
bash custom_opp_xxx_aarch64.run
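After installation it can help to confirm that the package landed where the runner will look for it. The sketch below assumes the common CANN layout `<ASCEND_OPP_PATH>/vendors/<vendor_name>/op_api/lib/libcust_opapi.so` and that `set_env.sh` exported `ASCEND_OPP_PATH`; both are assumptions about your installation, and `find_custom_opapi_lib` is a hypothetical helper, not part of l0n0lacl.

```python
import os
from pathlib import Path
from typing import Optional


def find_custom_opapi_lib(vendor_name: str = "customize",
                          opp_path: Optional[str] = None) -> Optional[Path]:
    """Return the path of libcust_opapi.so for a vendor, or None if it is absent.

    Assumed layout: <opp_path>/vendors/<vendor_name>/op_api/lib/libcust_opapi.so
    """
    # Fall back to ASCEND_OPP_PATH, assumed to be exported by set_env.sh
    base = opp_path or os.environ.get("ASCEND_OPP_PATH")
    if not base:
        return None
    lib = Path(base) / "vendors" / vendor_name / "op_api" / "lib" / "libcust_opapi.so"
    return lib if lib.is_file() else None
```

For example, `print(find_custom_opapi_lib("customize"))` prints the library path if the `.run` package was installed under that vendor name, and `None` otherwise.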
3.3 Create an operator runner
from l0n0lacl import *
ascendc_gelu = OpRunner("Gelu", op_path_prefix='customize')
3.4 Call the operator
3.4.1 Check the argument order
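After the operator project is built, code is generated. In the operator project directory, the file
${operator_dir}/build_out/autogen/aclnn_xxx.h
contains the aclnnXXXGetWorkspaceSize
function. Taking Gelu as an example: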
__attribute__((visibility("default")))
aclnnStatus aclnnGeluGetWorkspaceSize(
const aclTensor *x,
const aclTensor *out,
uint64_t *workspaceSize,
aclOpExecutor **executor);
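The parameters are x, out, workspaceSize, and executor. You do not pass workspaceSize or executor yourself; pass the remaining parameters (here x, then out) in the order they are declared.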
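Rather than reading the header by eye, the argument order can also be extracted with a small script. `parse_workspace_signature` is a hypothetical helper (not part of l0n0lacl) and assumes the declaration looks like the generated Gelu one above: a single `aclnn<Op>GetWorkspaceSize(...)` with comma-separated parameters that contain no parentheses.

```python
import re
from typing import List, Tuple

# Parameters of the aclnn two-phase API that the caller does not pass
HIDDEN_PARAMS = {"workspaceSize", "executor"}


def parse_workspace_signature(header_text: str, op_name: str) -> List[Tuple[str, str]]:
    """Return (c_type, name) pairs of aclnn<op_name>GetWorkspaceSize, minus hidden ones."""
    pattern = rf"aclnn{re.escape(op_name)}GetWorkspaceSize\s*\((.*?)\)\s*;"
    match = re.search(pattern, header_text, re.S)
    if match is None:
        raise ValueError(f"aclnn{op_name}GetWorkspaceSize not found")
    params = []
    for raw in match.group(1).split(","):
        decl = " ".join(raw.split())  # collapse newlines and repeated spaces
        # The trailing identifier is the name; everything before it is the type
        m = re.match(r"(.*?)([A-Za-z_]\w*)$", decl)
        if m is None:
            continue
        c_type, name = m.group(1).strip(), m.group(2)
        if name not in HIDDEN_PARAMS:
            params.append((c_type, name))
    return params
```

On the Gelu declaration above, `parse_workspace_signature(header_text, "Gelu")` returns `[("const aclTensor *", "x"), ("const aclTensor *", "out")]`, i.e. call the runner with `x` and then `out`.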
aclTensor* corresponds to numpy.ndarray.
- Other parameter types: see the corresponding ctypes types.
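If you ever need to build the `argtypes` list by hand (see `__call__` below), the C types from the header can be mapped to ctypes mechanically. This sketch covers only an illustrative subset of types and treats every pointer as `ctypes.c_void_p`, mirroring the Cumsum `argtypes` example in the API reference; it is not how l0n0lacl itself infers types, and `c_type_to_ctypes` / `build_argtypes` are hypothetical names.

```python
import ctypes
from typing import List

# Illustrative subset of scalar C types and their ctypes equivalents
_SCALARS = {
    "bool": ctypes.c_bool,
    "int32_t": ctypes.c_int32,
    "int64_t": ctypes.c_int64,
    "uint64_t": ctypes.c_uint64,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
}


def c_type_to_ctypes(c_type: str):
    """Map a C parameter type from aclnn_xxx.h to a ctypes type."""
    t = " ".join(c_type.replace("const", "").split())
    if "*" in t:
        # aclTensor*, uint64_t*, aclOpExecutor** ... are all passed as opaque addresses
        return ctypes.c_void_p
    if t not in _SCALARS:
        raise KeyError(f"no ctypes mapping for {c_type!r}")
    return _SCALARS[t]


def build_argtypes(c_types: List[str]) -> list:
    """Build an argtypes list covering every parameter, including workspaceSize/executor."""
    return [c_type_to_ctypes(t) for t in c_types]
```

For the Cumsum declaration this yields `[c_void_p, c_void_p, c_bool, c_bool, c_void_p, c_void_p, c_void_p]`, the same list the Cumsum example passes explicitly.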
3.4.2 Call the operator
import torch
from l0n0lacl import *
ascendc_gelu = OpRunner("Gelu", op_path_prefix='customize')
target_dtype = torch.float
shape = [8, 2048]  # example shape; use one your operator supports
x = torch.empty(shape, dtype=target_dtype).uniform_(-1, 1)
y = torch.empty(shape, dtype=target_dtype).zero_()
# Arguments follow aclnnGeluGetWorkspaceSize: x, then out
out = ascendc_gelu(x.numpy(), y.numpy()).to_cpu()
print(out)
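To check the NPU result you need a CPU reference ("golden") to compare against, as the Cumsum examples below do with TensorFlow. A minimal numpy sketch follows; which GELU variant your Gelu kernel implements (exact erf form or tanh approximation) is an assumption, so both are given. `gelu_reference` and `gelu_tanh_reference` are hypothetical helpers.

```python
import math

import numpy as np

# Vectorized math.erf; otypes avoids numpy probing the first element
_erf = np.vectorize(math.erf, otypes=[np.float64])


def gelu_reference(x: np.ndarray) -> np.ndarray:
    """Exact GELU, 0.5 * x * (1 + erf(x / sqrt(2))), computed in float64."""
    x64 = x.astype(np.float64)
    return (0.5 * x64 * (1.0 + _erf(x64 / math.sqrt(2.0)))).astype(x.dtype)


def gelu_tanh_reference(x: np.ndarray) -> np.ndarray:
    """Tanh approximation of GELU, also computed in float64."""
    x64 = x.astype(np.float64)
    inner = math.sqrt(2.0 / math.pi) * (x64 + 0.044715 * x64 ** 3)
    return (0.5 * x64 * (1.0 + np.tanh(inner))).astype(x.dtype)
```

Assuming the result is written back into `y` (the Cumsum examples print `y` after `to_cpu()`, which suggests this), a check could be `verify_result(y.numpy(), gelu_reference(x.numpy()))`.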
4 API reference
4.1 AclNDTensor
import numpy as np

class AclNDTensor:
    def __init__(self, np_array: np.ndarray):
        pass
    def to_cpu(self):
        pass
A bridge between a numpy ndarray and an Ascend ND tensor.
4.1.1 __init__
np_array: the numpy array to wrap
4.1.2 to_cpu
Copies the result from the NPU back to the CPU.
4.2 OpRunner
from typing import List, Union

class OpRunner:
    def __init__(self, name, op_path_prefix='customize', op_path=None, device_id=0) -> None:
        pass
    def __call__(self, *args, outCout=1, argtypes=None, stream=None) -> Union[AclNDTensor, List[AclNDTensor]]:
        pass
    def sync_stream(self) -> None:
        pass
4.2.1 __init__
name: the operator name.
op_path_prefix: the value of vendor_name in the operator project's CMakePresets.json file. Defaults to customize, so it can usually be omitted:
"vendor_name": {
"type": "STRING",
"value": "customize"
},
op_path: the absolute path of the operator's libcust_opapi.so library. Usually left unset.
device_id: the device ID. Defaults to 0.
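Since op_path_prefix must match vendor_name in CMakePresets.json, reading it from the file avoids a mismatch when you build under a different vendor name. The sketch searches the parsed JSON recursively for a `vendor_name` entry shaped like the snippet above, so it does not assume where in the file that entry sits; `find_vendor_name` and `read_vendor_name` are hypothetical helpers.

```python
import json
from typing import Any, Optional


def find_vendor_name(node: Any) -> Optional[str]:
    """Recursively search parsed CMakePresets.json content for vendor_name.value."""
    if isinstance(node, dict):
        entry = node.get("vendor_name")
        if isinstance(entry, dict) and "value" in entry:
            return entry["value"]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_vendor_name(child)
        if found is not None:
            return found
    return None


def read_vendor_name(path: str, default: str = "customize") -> str:
    """Read vendor_name from a CMakePresets.json file, falling back to the default."""
    with open(path, encoding="utf-8") as f:
        return find_vendor_name(json.load(f)) or default
```

Usage, run from the operator project directory: `OpRunner("Gelu", op_path_prefix=read_vendor_name("CMakePresets.json"))`.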
4.2.2 __call__
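args: the arguments passed to aclnnXXXGetWorkspaceSize, excluding workspaceSize and executor.
outCout: the number of operator outputs. If there is 1 output, a single AclNDTensor is returned; if there are more than 1, a List[AclNDTensor] is returned.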
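Because the return type depends on `outCout`, code that handles both cases may want to normalize it. A small hypothetical helper, assuming only the behavior documented above (a single AclNDTensor for one output, a list otherwise):

```python
from typing import Any, List


def as_output_list(result: Any, out_count: int) -> List[Any]:
    """Wrap a single output in a list, pass a list through, and check its length."""
    outputs = result if isinstance(result, list) else [result]
    if len(outputs) != out_count:
        raise ValueError(f"expected {out_count} outputs, got {len(outputs)}")
    return outputs
```

For a hypothetical two-output operator: `for t in as_output_list(runner(x, y1, y2, outCout=2), 2): t.to_cpu()`.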
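argtypes: the ctypes types of the aclnnXXXGetWorkspaceSize parameters. For particularly complex operators, if the call fails, you can specify the types manually. For example (for illustration only: normally you can omit it and the automatically inferred types work; specify them yourself only when the call misbehaves), given: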
__attribute__((visibility("default")))
aclnnStatus aclnnCumsumGetWorkspaceSize(
const aclTensor *x,
const aclTensor *axis,
bool exclusiveOptional,
bool reverseOptional,
const aclTensor *out,
uint64_t *workspaceSize,
aclOpExecutor **executor);
import ctypes
import numpy as np
import tensorflow as tf
from l0n0lacl import *
ascendc_cumsum = OpRunner("Cumsum")
target_dtype = np.float32
data_range = (-10, 10)
shape = [100, 3, 2304]
axis_py = 1
exclusive = True
reverse = False
x = np.random.uniform(*data_range, shape).astype(target_dtype)
axis = np.array([axis_py]).astype(np.int32)
# CPU reference result from TensorFlow
golden: np.ndarray = tf.cumsum(x, axis_py, exclusive, reverse).numpy()
# Output buffer, pre-filled with a sentinel value
y = np.ones_like(golden, golden.dtype) * 123
# argtypes lists every aclnnCumsumGetWorkspaceSize parameter, including workspaceSize and executor
ascendc_cumsum(x, axis, exclusive, reverse, y, argtypes=[
    ctypes.c_void_p,  # x
    ctypes.c_void_p,  # axis
    ctypes.c_bool,    # exclusiveOptional
    ctypes.c_bool,    # reverseOptional
    ctypes.c_void_p,  # out
    ctypes.c_void_p,  # workspaceSize
    ctypes.c_void_p,  # executor
]).to_cpu()
print(y)
stream: with multiple streams, you can specify which stream to run on. For example:
import ctypes
import numpy as np
import tensorflow as tf
from l0n0lacl import *
ascendc_cumsum = OpRunner("Cumsum")
target_dtype = np.float32
data_range = (-10, 10)
shape = [100, 3, 2304]
axis_py = 1
exclusive = True
reverse = False
x = np.random.uniform(*data_range, shape).astype(target_dtype)
axis = np.array([axis_py]).astype(np.int32)
golden: np.ndarray = tf.cumsum(x, axis_py, exclusive, reverse).numpy()
y = np.ones_like(golden, golden.dtype) * 123
# To run on a specific stream, also pass stream=<your stream handle> to this call
ascendc_cumsum(x, axis, exclusive, reverse, y, argtypes=[
    ctypes.c_void_p,  # x
    ctypes.c_void_p,  # axis
    ctypes.c_bool,    # exclusiveOptional
    ctypes.c_bool,    # reverseOptional
    ctypes.c_void_p,  # out
    ctypes.c_void_p,  # workspaceSize
    ctypes.c_void_p,  # executor
]).to_cpu()
verify_result(y, golden)
print(y)
4.2.3 sync_stream
Synchronizes the stream.
4.3 verify_result
import numpy

def verify_result(real_result: numpy.ndarray, golden: numpy.ndarray):
    pass
Checks whether the precision is within tolerance: float16: 1/1000; float32: 1/10000; int16, int32, int8: 0 (exact match).
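The tolerances above can be approximated with `numpy.allclose`. How `verify_result` actually measures error (per-element relative error, a ratio of failing elements, or something else) is not documented here, so this sketch uses an assumed metric: every element must satisfy |real - golden| <= tol + tol * |golden|. `check_precision` is a hypothetical name.

```python
import numpy as np

# Tolerance per golden dtype, from the list above; integer types must match exactly
TOLERANCES = {
    np.dtype(np.float16): 1e-3,
    np.dtype(np.float32): 1e-4,
    np.dtype(np.int8): 0.0,
    np.dtype(np.int16): 0.0,
    np.dtype(np.int32): 0.0,
}


def check_precision(real: np.ndarray, golden: np.ndarray) -> bool:
    """Return True if real matches golden within the dtype's tolerance (assumed metric)."""
    if real.shape != golden.shape:
        return False
    tol = TOLERANCES[np.dtype(golden.dtype)]
    if tol == 0.0:
        return bool(np.array_equal(real, golden))
    # Compare in float64 so the check itself adds no rounding error
    r = real.astype(np.float64)
    g = golden.astype(np.float64)
    return bool(np.allclose(r, g, rtol=tol, atol=tol))
```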
Source Distribution
l0n0lacl-0.0.1.tar.gz (12.0 kB)
Built Distribution
l0n0lacl-0.0.1-py3-none-any.whl (8.8 kB)
File details
Details for the file l0n0lacl-0.0.1.tar.gz.
File metadata
- Download URL: l0n0lacl-0.0.1.tar.gz
- Size: 12.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 601e48aa15a6fb9dfdaf0e55680680e167d8c3992445e44350b65e724b4eea22
MD5 | 80bf280f9293fddbb72e04f7715d43e4
BLAKE2b-256 | b8b1f427f509dc590e276feb9a5d18d7b43f3b1e7df344dd89ea79398cd56fdc
File details
Details for the file l0n0lacl-0.0.1-py3-none-any.whl.
File metadata
- Download URL: l0n0lacl-0.0.1-py3-none-any.whl
- Size: 8.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 30d1085ac48b00c8882e85034b057ac4a0981d334228e9841af03c70d646c5b6
MD5 | 8527adaf5fd0cb3f7d716d8333f37ab9
BLAKE2b-256 | e7a7f11a77a0390c560cdba6c63040367c6b45aeb328b96d547d3b1f48b0504f