A Python package for invoking operators written with Ascend C

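1 Overview

Running an operator during Ascend C operator development is fairly involved. To simplify this, the package wraps operator execution in functions that can be called directly from Python.

2 Installation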

pip install l0n0lacl

3 Example: running an operator

3.1 Switch to the CANN environment first. For example, on my machine:

source /home/HwHiAiUser/Ascend/ascend-toolkit/set_env.sh

3.2 Install the custom operator package we built

bash custom_opp_xxx_aarch64.run

3.3 Create an operator runner

from l0n0lacl import *
ascendc_gelu = OpRunner("Gelu", op_path_prefix='customize')

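3.4 Call the operator

3.4.1 Check the argument order first

Building the operator project generates code. The aclnnXXXGetWorkspaceSize function can be found in ${operator directory}/build_out/autogen/aclnn_xxx.h inside the operator project. Taking Gelu as an example: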

__attribute__((visibility("default")))
aclnnStatus aclnnGeluGetWorkspaceSize(
    const aclTensor *x,
    const aclTensor *out,
    uint64_t *workspaceSize,
    aclOpExecutor **executor);

The parameters are x, out, workspaceSize, and executor. Of these, workspaceSize and executor do not need to be passed.
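To make the argument order concrete, the sketch below pulls the parameter names out of a generated prototype and drops the two trailing parameters that the caller does not supply. `user_params` is a hypothetical helper written for illustration, not part of l0n0lacl, and its regular expressions only handle simple prototypes like the one above.

```python
import re
from typing import List

# Parameters the caller does not pass, per the text above.
MANAGED = {"workspaceSize", "executor"}

def user_params(prototype: str) -> List[str]:
    """Return the parameter names the caller must pass, in order.

    Hypothetical helper for illustration; not part of l0n0lacl.
    """
    # Capture the parameter list of the ...GetWorkspaceSize declaration.
    found = re.search(r"GetWorkspaceSize\s*\((.*?)\)\s*;", prototype, re.S)
    if found is None:
        raise ValueError("no GetWorkspaceSize prototype found")
    names = []
    for decl in found.group(1).split(","):
        # The parameter name is the last identifier in each declaration.
        ident = re.search(r"(\w+)\s*$", decl.strip())
        if ident and ident.group(1) not in MANAGED:
            names.append(ident.group(1))
    return names

GELU_PROTOTYPE = """
__attribute__((visibility("default")))
aclnnStatus aclnnGeluGetWorkspaceSize(
    const aclTensor *x,
    const aclTensor *out,
    uint64_t *workspaceSize,
    aclOpExecutor **executor);
"""

print(user_params(GELU_PROTOTYPE))  # ['x', 'out']
```

The resulting list is the order in which arguments are handed to the runner in section 3.4.2.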

aclTensor* corresponds to numpy.ndarray
For other parameter types, see the ctypes types
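The mapping above can be pictured as a small lookup from Python values to ctypes types. This is a sketch of how such automatic inference might work, not l0n0lacl's actual implementation: `guess_argtype` is a hypothetical name, the choice of `c_int64` and `c_float` for Python ints and floats is an assumption, and array-like objects are detected through the `__array_interface__` attribute that numpy.ndarray exposes.

```python
import ctypes

def guess_argtype(value):
    """Guess a ctypes type for one argument (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return ctypes.c_bool
    if isinstance(value, int):
        return ctypes.c_int64   # assumption: integer attributes are int64_t
    if isinstance(value, float):
        return ctypes.c_float   # assumption: float attributes are C float
    # Array-like objects stand in for aclTensor*, which crosses the C
    # boundary as an opaque pointer.
    if hasattr(value, "__array_interface__"):
        return ctypes.c_void_p
    raise TypeError(f"no ctypes mapping for {type(value).__name__}")

class FakeArray:
    """Stand-in for numpy.ndarray so the sketch runs without numpy."""
    __array_interface__ = {"shape": (2,), "typestr": "<f4", "version": 3}

args = (FakeArray(), 1, True, 0.5)
print([guess_argtype(a) for a in args])
```

When inference like this picks the wrong type for an unusual operator, the `argtypes` parameter described in section 4.2.2 lets you state the types explicitly.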

3.4.2 Call the operator

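import torch
from l0n0lacl import *
ascendc_gelu = OpRunner("Gelu", op_path_prefix='customize')
target_dtype = torch.float
shape = [10, 10]  # example shape
# Input filled with uniform random values in [-1, 1); output buffer zeroed.
x = torch.empty(shape, dtype=target_dtype).uniform_(-1, 1)
y = torch.empty(shape, dtype=target_dtype).zero_()
# Arguments follow the order of aclnnGeluGetWorkspaceSize: x, then out.
out = ascendc_gelu(x.numpy(), y.numpy()).to_cpu()
print(out)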
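To sanity-check the printed result, it can help to have a CPU reference. The sketch below gives the exact erf-based GELU, 0.5 * x * (1 + erf(x / sqrt(2))), next to the common tanh approximation. Which of the two a particular custom operator implements is an assumption to confirm against the operator's own source; `gelu_reference` and `gelu_tanh` are hypothetical names.

```python
import math

def gelu_reference(x: float) -> float:
    """Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Widely used tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

# Print both variants side by side for a few sample inputs.
for v in (-1.0, 0.0, 0.5, 1.0):
    print(v, gelu_reference(v), gelu_tanh(v))
```

The two variants differ slightly, so the choice of reference matters when the result is compared at float32 tolerance.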

4. API reference

4.1 AclNDTensor

class AclNDTensor:
    def __init__(self, np_array: np.ndarray):
        pass
    def to_cpu(self):
        pass

A bridge between a numpy ndarray and an Ascend ND tensor.

4.1.1 `__init__`

np_array: the numpy array to wrap

4.1.2 `to_cpu`

Copies the computation result from the NPU back to the CPU.

4.2 OpRunner

class OpRunner:
    def __init__(self, name, op_path_prefix='customize', op_path=None, device_id=0) -> None:
        pass
    def __call__(self, *args, outCout=1, argtypes=None, stream=None) -> Union[AclNDTensor, List[AclNDTensor]]:
        pass
    def sync_stream(self)->None:
        pass

4.2.1 `__init__`

name: the operator name.
op_path_prefix: the value of vendor_name in the operator project's CMakePresets.json file. Defaults to customize and can be omitted.

"vendor_name": {
    "type": "STRING",
    "value": "customize"
},

op_path: absolute path to the operator's libcust_opapi.so library. Normally not passed.
device_id: the device ID. Defaults to 0.
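For orientation, the sketch below shows how the operator name and vendor name could translate into the symbols and the library that get loaded. The symbol names follow the two-phase aclnn pattern (aclnnXXXGetWorkspaceSize, then aclnnXXX). The directory layout (`vendors/<vendor_name>/op_api/lib/libcust_opapi.so` under the OPP root) and the `ASCEND_OPP_PATH` environment variable are assumptions based on a typical CANN custom-operator install, and `aclnn_symbols` and `default_op_api_path` are hypothetical helpers, not l0n0lacl functions.

```python
import os
from pathlib import PurePosixPath

def aclnn_symbols(name: str):
    """Return the two-phase aclnn entry points for an operator name."""
    return (f"aclnn{name}GetWorkspaceSize", f"aclnn{name}")

def default_op_api_path(vendor_name: str = "customize", opp_root=None) -> str:
    """Guess where libcust_opapi.so lives (assumed layout, see above)."""
    if opp_root is None:
        # Assumption: set_env.sh exports ASCEND_OPP_PATH.
        opp_root = os.environ["ASCEND_OPP_PATH"]
    return str(PurePosixPath(opp_root) / "vendors" / vendor_name
               / "op_api" / "lib" / "libcust_opapi.so")

print(aclnn_symbols("Gelu"))
print(default_op_api_path("customize", "/opt/opp"))  # "/opt/opp" is a made-up root
```

If the library sits somewhere else, passing `op_path` explicitly sidesteps any such guess.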

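4.2.2 `__call__`

args: the arguments passed to aclnnXXXGetWorkspaceSize, excluding workspaceSize and executor.
outCout: the number of outputs the operator produces. If it is 1, a single AclNDTensor is returned. If it is greater than 1, a List[AclNDTensor] is returned.
argtypes: the ctypes types of the aclnnXXXGetWorkspaceSize parameters. For particularly complex operators, if a call misbehaves, the types can be specified manually. The example below is for illustration only: argtypes can normally be omitted because automatic inference is enough, and it only needs to be given when a call fails. For: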

__attribute__((visibility("default")))
aclnnStatus aclnnCumsumGetWorkspaceSize(
    const aclTensor *x,
    const aclTensor *axis,
    bool exclusiveOptional,
    bool reverseOptional,
    const aclTensor *out,
    uint64_t *workspaceSize,
    aclOpExecutor **executor);

import ctypes
import numpy as np
import tensorflow as tf
from l0n0lacl import *
ascendc_cumsum = OpRunner("Cumsum")
target_dtype = np.float32
data_range = (-10, 10)
shape = [100, 3, 2304]
axis_py = 1
exclusive = True
reverse = False
x = np.random.uniform(*data_range, shape).astype(target_dtype)
axis = np.array([axis_py]).astype(np.int32)
# Golden result computed on the CPU with TensorFlow.
golden: np.ndarray = tf.cumsum(x, axis_py, exclusive, reverse).numpy()
y = np.ones_like(golden, golden.dtype) * 123
# argtypes mirrors the aclnnCumsumGetWorkspaceSize signature, one entry per parameter.
ascendc_cumsum(x, axis, exclusive, reverse, y, argtypes=[
    ctypes.c_void_p, # x
    ctypes.c_void_p, # axis
    ctypes.c_bool,   # exclusiveOptional
    ctypes.c_bool,   # reverseOptional
    ctypes.c_void_p, # out
    ctypes.c_void_p, # workspaceSize
    ctypes.c_void_p, # executor
]).to_cpu()
print(y)
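The `exclusive` and `reverse` flags are easy to mix up, so here is a plain-Python reference for a 1-D input that follows the semantics documented for tf.cumsum, which the example uses as its golden. `cumsum_reference` is a hypothetical helper for checking small cases by hand.

```python
from typing import List

def cumsum_reference(values: List[float], exclusive: bool = False,
                     reverse: bool = False) -> List[float]:
    """1-D cumulative sum with tf.cumsum-style exclusive/reverse flags."""
    # reverse=True accumulates from the last element toward the first.
    seq = values[::-1] if reverse else list(values)
    out, running = [], 0
    for v in seq:
        if exclusive:
            out.append(running)   # sum of elements strictly before v
            running += v
        else:
            running += v
            out.append(running)   # sum up to and including v
    return out[::-1] if reverse else out

data = [1, 2, 3]
print(cumsum_reference(data))                                # [1, 3, 6]
print(cumsum_reference(data, exclusive=True))                # [0, 1, 3]
print(cumsum_reference(data, reverse=True))                  # [6, 5, 3]
print(cumsum_reference(data, exclusive=True, reverse=True))  # [5, 3, 0]
```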

stream: when working with multiple streams, you can specify the stream yourself. For example:

import numpy as np
from l0n0lacl import *
ascendc_gelu = OpRunner("Gelu", op_path_prefix='customize')
target_dtype = np.float32
shape = [10, 10]
x = np.random.uniform(-10, 10, shape).astype(target_dtype)
y = np.zeros_like(x, dtype=target_dtype)
with AclStream(0) as stream:
    out = ascendc_gelu(x, y, stream=stream).to_cpu()
print(out)

4.2.3 `sync_stream`

Synchronizes the stream.

4.3 verify_result

Adapted from: https://gitee.com/ascend/samples/blob/master/operator/AddCustomSample/KernelLaunch/AddKernelInvocationNeo/scripts/verify_result.py

def verify_result(real_result:numpy.ndarray, golden:numpy.ndarray):
    pass

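Checks whether the result meets the precision requirement: float16: one part in a thousand (1e-3); float32: one part in ten thousand (1e-4); int16, int32, int8: 0.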
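The thresholds above can be read as a per-dtype tolerance table. The sketch below is a simplified, pure-Python illustration of such a check, not the implementation of `verify_result` or of the referenced sample script: `within_tolerance` is a hypothetical name, and applying the same value as both relative and absolute tolerance is an assumption.

```python
import math
from typing import Sequence

# Tolerances as stated above, keyed by dtype name.
TOLERANCE = {"float16": 1e-3, "float32": 1e-4, "int8": 0, "int16": 0, "int32": 0}

def within_tolerance(real: Sequence[float], golden: Sequence[float],
                     dtype_name: str) -> bool:
    """Element-wise check against the per-dtype tolerance (a sketch)."""
    if len(real) != len(golden):
        return False
    tol = TOLERANCE[dtype_name]
    # An element passes if it is close in relative OR absolute terms.
    return all(math.isclose(r, g, rel_tol=tol, abs_tol=tol)
               for r, g in zip(real, golden))

print(within_tolerance([1.0001, 2.0], [1.0, 2.0], "float16"))  # True
print(within_tolerance([1.01, 2.0], [1.0, 2.0], "float32"))    # False
print(within_tolerance([3, 4], [3, 4], "int32"))               # True
```

With a tolerance of 0, integer results must match exactly.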

4.4 AclArray

class AclArray:
    def __init__(self, np_array: np.ndarray):
        pass

Example:

__attribute__((visibility("default")))
aclnnStatus aclnnEyeGetWorkspaceSize(
    aclTensor *yRef,
    int64_t numRows,
    int64_t numColumnsOptional,
    const aclIntArray *batchShapeOptional,
    int64_t dtypeOptional,
    uint64_t *workspaceSize,
    aclOpExecutor **executor);

import numpy as np
import tensorflow as tf
from l0n0lacl import *
ascendc_fn = OpRunner("Eye")
for i, target_dtype in enumerate([np.float16, np.float32]):
    numRows = 2
    numColumnsOptional = 3
    batchShapeOptional = 0
    dtypeOptional = 0
    shape = [numRows * numColumnsOptional]
    for value_range in [(-1, 1), (1, 10), (-1000, 1000)]:
        y = np.zeros(shape, dtype=target_dtype)
        batchShape = AclArray(np.array([1, 2, 3], dtype=np.int64))
        output = ascendc_fn(y, numRows, numColumnsOptional, batchShape, 0, outCout=5)
        output[0].to_cpu()
        golden = tf.eye(numRows, numColumnsOptional)
        print(y)
        print(golden)
        print(value_range)
        verify_result(y, golden.numpy().reshape(shape))

l0n0lacl 1.0.4
