For calling operators written in Ascend C
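1 Description
Running an operator during Ascend C operator development is fairly involved. To simplify it, this package turns running an operator into a function that can be called directly from Python.
2 Installation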
pip install l0n0lacl
3 Running an operator: example
3.1 Switch to the CANN environment first. For example, on my machine:
source /home/HwHiAiUser/Ascend/ascend-toolkit/set_env.sh
3.2 Install the custom operator package you built
bash custom_opp_xxx_aarch64.run
3.3 Create an operator runner
from l0n0lacl import *
ascendc_gelu = OpRunner("Gelu", op_path_prefix='customize')
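3.4 Call the operator
3.4.1 Check the argument order first
Building the operator project generates code. In the operator project directory, the file
${operator_dir}/build_out/autogen/aclnn_xxx.h
declares the aclnnXXXGetWorkspaceSize function. Taking Gelu as an example:
__attribute__((visibility("default")))
aclnnStatus aclnnGeluGetWorkspaceSize(
    const aclTensor *x,
    const aclTensor *out,
    uint64_t *workspaceSize,
    aclOpExecutor **executor);
The parameters are x, out, workspaceSize, and executor. Of these, workspaceSize and executor need no attention: you do not pass them.
- aclTensor* corresponds to numpy.ndarray
- For other parameter types, see the ctypes types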
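To make the argument order concrete, here is a minimal sketch that reads a generated declaration and lists the parameters together with a plausible ctypes type for each. The names `parse_params` and `SCALAR_CTYPES` are hypothetical and are not part of l0n0lacl; treating every pointer parameter as `ctypes.c_void_p` is an assumption that matches the argtypes example later in this document.

```python
import ctypes
import re

# Hypothetical mapping for by-value parameters; pointers are handled separately.
SCALAR_CTYPES = {
    "bool": ctypes.c_bool,
    "int32_t": ctypes.c_int32,
    "int64_t": ctypes.c_int64,
    "uint64_t": ctypes.c_uint64,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
}


def parse_params(declaration):
    """Return [(name, ctypes type)] for an aclnnXXXGetWorkspaceSize declaration."""
    match = re.search(r"aclnn\w+GetWorkspaceSize\s*\((.*)\)\s*;", declaration, re.S)
    if match is None:
        raise ValueError("no aclnnXXXGetWorkspaceSize declaration found")
    params = []
    for raw in match.group(1).split(","):
        raw = raw.strip()
        name_match = re.search(r"(\w+)$", raw)  # the last identifier is the name
        c_type = raw[: name_match.start()].replace("const", "").strip()
        if c_type.endswith("*"):
            # Assumption: every pointer (aclTensor*, uint64_t*, ...) is an opaque handle.
            params.append((name_match.group(1), ctypes.c_void_p))
        else:
            params.append((name_match.group(1), SCALAR_CTYPES[c_type]))
    return params


GELU_DECL = """
__attribute__((visibility("default")))
aclnnStatus aclnnGeluGetWorkspaceSize(
    const aclTensor *x,
    const aclTensor *out,
    uint64_t *workspaceSize,
    aclOpExecutor **executor);
"""

params = parse_params(GELU_DECL)
# Drop the trailing workspaceSize and executor: the rest is what the caller passes.
user_args = [name for name, _ in params[:-2]]
print(user_args)  # ['x', 'out']
```

The printed list is the order in which arguments would be passed when calling the operator from Python.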
3.4.2 Call the operator
import torch
from l0n0lacl import *
ascendc_gelu = OpRunner("Gelu", op_path_prefix='customize')
target_dtype = torch.float
shape = [10, 10]  # example shape
x = torch.empty(shape, dtype=target_dtype).uniform_(-1, 1)
y = torch.empty(shape, dtype=target_dtype).zero_()
# Arguments follow the header order: x, then out
out = ascendc_gelu(x.numpy(), y.numpy()).to_cpu()
print(out)
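To judge whether the printed result is plausible, a reference value computed on the CPU helps. The sketch below uses the exact erf-based GELU formula with the standard library only; `gelu_reference` is a hypothetical helper, and whether the custom operator implements the erf form or the tanh approximation is not stated here, so small differences are possible.

```python
import math


def gelu_reference(values):
    """Element-wise erf-based GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return [0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in values]


golden = gelu_reference([-1.0, 0.0, 1.0])
print(golden)
```

The same inputs can be fed to the operator and the two results compared within a tolerance.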
4 API reference
4.1 AclNDTensor
class AclNDTensor:
    def __init__(self, np_array: np.ndarray):
        pass
    def to_cpu(self):
        pass
A bridge between a numpy ndarray and an Ascend ND tensor.
4.1.1 __init__
np_array: the numpy array to wrap
4.1.2 to_cpu
Copies the result of the computation from the NPU to the CPU.
4.2 OpRunner
class OpRunner:
    def __init__(self, name, op_path_prefix='customize', op_path=None, device_id=0) -> None:
        pass
    def __call__(self, *args, outCout=1, argtypes=None, stream=None) -> Union[AclNDTensor, List[AclNDTensor]]:
        pass
    def sync_stream(self)->None:
        pass
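4.2.1 __init__
name: the operator name.
op_path_prefix: the value of vendor_name in the operator project's CMakePresets.json file. Defaults to customize, so it can be omitted.
"vendor_name": {
    "type": "STRING",
    "value": "customize"
},
op_path: the absolute path of the operator's libcust_opapi.so library. Normally not passed.
device_id: the device ID. Defaults to 0.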
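The relationship between op_path_prefix and op_path can be pictured as a path lookup. The sketch below is an assumption, not the package's confirmed behavior: it supposes that custom operators are installed under `$ASCEND_OPP_PATH/vendors/<vendor_name>/op_api/lib/`, which is the layout commonly produced by CANN custom operator packages. `guess_op_api_lib` is a hypothetical name.

```python
import os


def guess_op_api_lib(vendor_name="customize", opp_root=None):
    """Build the assumed location of libcust_opapi.so for a vendor name."""
    if opp_root is None:
        # ASCEND_OPP_PATH is normally exported by the CANN set_env.sh script.
        opp_root = os.environ.get("ASCEND_OPP_PATH", "")
    return os.path.join(
        opp_root, "vendors", vendor_name, "op_api", "lib", "libcust_opapi.so"
    )


print(guess_op_api_lib("customize", "/usr/local/Ascend/opp"))
```

If the library lives somewhere else, passing its absolute path as op_path is the documented way to override the lookup.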
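4.2.2 __call__
args: the arguments passed to aclnnXXXGetWorkspaceSize, excluding workspaceSize and executor.
outCout: the number of operator outputs. If it is 1, an AclNDTensor is returned. If it is greater than 1, a List[AclNDTensor] is returned.
argtypes: the ctypes parameter types of aclnnXXXGetWorkspaceSize. For particularly complex operators, you can specify the types manually if a call misbehaves. The following is for illustration only: argtypes can normally be omitted because automatic inference is enough to run the operator, but you can set it yourself when a call fails. For: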
__attribute__((visibility("default")))
aclnnStatus aclnnCumsumGetWorkspaceSize(
    const aclTensor *x,
    const aclTensor *axis,
    bool exclusiveOptional,
    bool reverseOptional,
    const aclTensor *out,
    uint64_t *workspaceSize,
    aclOpExecutor **executor);
import ctypes
import numpy as np
import tensorflow as tf
from l0n0lacl import *
ascendc_cumsum = OpRunner("Cumsum")
target_dtype = np.float32
data_range = (-10, 10)
shape = [100, 3, 2304]
axis_py = 1
exclusive = True
reverse = False
x = np.random.uniform(*data_range, shape).astype(target_dtype)
axis = np.array([axis_py]).astype(np.int32)
# Golden result computed with TensorFlow
golden: np.ndarray = tf.cumsum(x, axis_py, exclusive, reverse).numpy()
y = np.ones_like(golden, golden.dtype) * 123
# argtypes is an argument of the operator call: one entry per C parameter
ascendc_cumsum(x, axis, exclusive, reverse, y, argtypes=[
    ctypes.c_void_p,  # x
    ctypes.c_void_p,  # axis
    ctypes.c_bool,  # exclusiveOptional
    ctypes.c_bool,  # reverseOptional
    ctypes.c_void_p,  # out
    ctypes.c_void_p,  # workspaceSize
    ctypes.c_void_p,  # executor
]).to_cpu()
print(y)
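The exclusive and reverse flags in this example are easy to mix up. As a small illustration of what the golden result contains, here is a pure-Python 1-D cumulative sum with the same two flags, following the semantics documented for `tf.cumsum`. `cumsum_reference` is a hypothetical helper, not something the package provides.

```python
def cumsum_reference(values, exclusive=False, reverse=False):
    """1-D cumulative sum with tf.cumsum-style exclusive/reverse flags."""
    items = list(reversed(values)) if reverse else list(values)
    result, running = [], 0
    for v in items:
        if exclusive:
            # Exclusive: emit the sum of the elements before this one.
            result.append(running)
            running += v
        else:
            # Inclusive: emit the sum up to and including this one.
            running += v
            result.append(running)
    return list(reversed(result)) if reverse else result


print(cumsum_reference([1, 2, 3]))                                # [1, 3, 6]
print(cumsum_reference([1, 2, 3], exclusive=True))                # [0, 1, 3]
print(cumsum_reference([1, 2, 3], reverse=True))                  # [6, 5, 3]
print(cumsum_reference([1, 2, 3], exclusive=True, reverse=True))  # [5, 3, 0]
```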
stream: when working with multiple streams, you can specify the stream yourself. For example:
import numpy as np
from l0n0lacl import *
ascendc_gelu = OpRunner("Gelu", op_path_prefix='customize')
target_dtype = np.float32
shape = [10, 10]
x = np.random.uniform(-10, 10, shape).astype(target_dtype)
y = np.zeros_like(x, dtype=target_dtype)
with AclStream(0) as stream:
    out = ascendc_gelu(x, y, stream=stream).to_cpu()
print(out)
4.2.3 sync_stream
Synchronizes the stream.
4.3 verify_result
def verify_result(real_result:numpy.ndarray, golden:numpy.ndarray):
    pass
Checks whether the precision is acceptable. Tolerances:
float16: one in a thousand (1e-3)
float32: one in ten thousand (1e-4)
int16, int32, int8: 0
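How the tolerances are applied internally is not described here. The following is a minimal sketch of one plausible reading, assuming the error is measured per element and scaled by the magnitude of the golden value. `TOLERANCE` and `within_tolerance` are hypothetical names, and the real verify_result may use a different rule.

```python
# Thresholds copied from the list above, keyed by dtype name.
TOLERANCE = {"float16": 1e-3, "float32": 1e-4, "int8": 0, "int16": 0, "int32": 0}


def within_tolerance(real, golden, dtype_name):
    """Return True if every element of real is close enough to golden."""
    if len(real) != len(golden):
        return False
    tol = TOLERANCE[dtype_name]
    for r, g in zip(real, golden):
        # Assumption: absolute error for |golden| <= 1, relative error above that.
        if abs(r - g) > tol * max(1.0, abs(g)):
            return False
    return True


print(within_tolerance([1.00005, 2.0], [1.0, 2.0], "float32"))  # True
print(within_tolerance([1.001, 2.0], [1.0, 2.0], "float32"))    # False
print(within_tolerance([3, 4], [3, 5], "int32"))                # False
```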
4.4 AclArray
class AclArray:
    def __init__(self, np_array: np.ndarray):
        pass
Example:
__attribute__((visibility("default")))
aclnnStatus aclnnEyeGetWorkspaceSize(
    aclTensor *yRef,
    int64_t numRows,
    int64_t numColumnsOptional,
    const aclIntArray *batchShapeOptional,
    int64_t dtypeOptional,
    uint64_t *workspaceSize,
    aclOpExecutor **executor);
import numpy as np
import tensorflow as tf
from l0n0lacl import *
ascendc_fn = OpRunner("Eye")
for i, target_dtype in enumerate([np.float16, np.float32]):
    numRows = 2
    numColumnsOptional = 3
    batchShapeOptional = 0
    dtypeOptional = 0
    shape = [numRows * numColumnsOptional]
    for value_range in [(-1, 1), (1, 10), (-1000, 1000)]:
        y = np.zeros(shape, dtype=target_dtype)
        # aclIntArray* parameters are passed as AclArray
        batchShape = AclArray(np.array([1, 2, 3], dtype=np.int64))
        output = ascendc_fn(y, numRows, numColumnsOptional, batchShape, 0, outCout=5)
        # yRef is the first parameter, so its result is read from output[0]
        output[0].to_cpu()
        golden = tf.eye(numRows, numColumnsOptional)
        print(y)
        print(golden)
        print(value_range)
        verify_result(y, golden.numpy().reshape(shape))
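AclArray stands in for aclIntArray* parameters such as batchShapeOptional above. Its internals are not documented here. As a rough illustration of the data such a wrapper has to hand to C, the sketch below packs Python integers into a contiguous int64_t buffer plus a length, which is the shape of input the ACL C API expects when creating an aclIntArray. `to_int64_buffer` is a hypothetical helper and does not call into ACL.

```python
import ctypes


def to_int64_buffer(values):
    """Pack Python ints into a contiguous C int64_t array and its length."""
    buffer = (ctypes.c_int64 * len(values))(*values)
    return buffer, ctypes.c_uint64(len(values))


buffer, size = to_int64_buffer([1, 2, 3])
# Three elements of 8 bytes each occupy 24 bytes.
print(list(buffer), size.value, ctypes.sizeof(buffer))  # [1, 2, 3] 3 24
```

This is also why the example builds the array with dtype=np.int64: the element width has to match int64_t.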