A tool for debugging Ascend C operators on the CPU

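1 Overview

Running an operator during Ascend C operator development is fairly involved. To simplify this, the tool wraps operator execution in a function that can be called directly from Python.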

2 Installation

pip install l0n0lacltester

3 Running an operator example

3.1 First switch to the CANN environment. For example, mine is:

source /home/HwHiAiUser/Ascend/ascend-toolkit/set_env.sh

4 Creating a test-case project

4.1 Command-line arguments

l0n0lacltester -h
usage: l0n0lacltester [-h] op_path test_path

Create a test project

positional arguments:
  op_path     Ascend C operator directory
  test_path   test project directory

optional arguments:
  -h, --help  show this help message and exit
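
The help text above has the shape that Python's argparse produces for two required positional arguments. If you want to wrap the same interface in your own scripts, a minimal sketch under that assumption could look like this. It is not the tool's actual source, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction: two required positional arguments,
    # mirroring the usage line "l0n0lacltester [-h] op_path test_path".
    parser = argparse.ArgumentParser(
        prog="l0n0lacltester",
        description="Create a test project",
    )
    parser.add_argument("op_path", help="Ascend C operator directory")
    parser.add_argument("test_path", help="test project directory")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# can be tried without the tool installed. Both paths are placeholders.
namespace = build_parser().parse_args(["/path/to/op", "/path/to/test"])
print(namespace.op_path, namespace.test_path)
```

Both arguments are positional, so the order matters: the operator directory comes first and the test project directory second.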

4.2 Example

l0n0lacltester <operator directory> <test project directory>

4.2.1 Project structure:

cmake
    - cpu_lib.cmake
    - npu_lib.cmake
include
    - *.h
.gitignore
CMakeLists.txt
gen_code.py
run.py
run.sh    
tiling_context.cpp

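Of the files above, only gen_code.py and run.py need your attention.

4.2.2 Operator project settings

4.2.2.1 Setting the tiling namespace

By default, the optiling namespace looks like this: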

namespace optiling {
static ge::graphStatus TilingFunc(gert::TilingContext *context) {
  return ge::GRAPH_SUCCESS;
}
}

Because this tool locates code with regular expressions from Python's re module, you need to add // namespace optiling at the end of the namespace:

namespace optiling {
static ge::graphStatus TilingFunc(gert::TilingContext *context) {
  return ge::GRAPH_SUCCESS;
}
} // namespace optiling
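
To see why the closing comment matters, consider how a regular expression can find the end of a namespace. Braces nest, so a pattern cannot tell which `}` closes the namespace unless there is a distinctive marker after it. The sketch below is only an illustration of that idea; the pattern and the name `extract_optiling_body` are assumptions and not the tool's real implementation.

```python
import re

SOURCE = """
namespace optiling {
static ge::graphStatus TilingFunc(gert::TilingContext *context) {
  return ge::GRAPH_SUCCESS;
}
} // namespace optiling
"""

# Hypothetical pattern: lazily capture everything between the opening brace
# and the first "} // namespace optiling" marker.
NAMESPACE_RE = re.compile(
    r"namespace\s+optiling\s*\{(?P<body>.*?)\}\s*//\s*namespace\s+optiling",
    re.DOTALL,
)


def extract_optiling_body(source):
    """Return the namespace body, or None when the end marker is missing."""
    match = NAMESPACE_RE.search(source)
    return match.group("body").strip() if match else None


print(extract_optiling_body(SOURCE))
```

With the marker removed, the same pattern finds nothing, which is the failure mode the comment is meant to prevent.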

4.3 gen_code.py

import os
from l0n0lacltester.gen_cpu_call_code import generate_all_codes
current_path = os.path.split(__file__)[0]
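# For example ['/mnt/code/a','/mnt/code/b']
include_dirs=[
   
]
# enum class KernelMode {
#     MIX_MODE = 0, # Mixed mode: enables both the vector units (aiv) and the matrix unit (aic). One block has one aic and n aiv (n >= 1).
#     AIC_MODE, # Matrix unit (aic) only. One block contains exactly one aic.
#     AIV_MODE, # Vector unit (aiv) only. One block contains exactly one aiv.
#     MIX_AIC_1_1, # Matrix unit (aic) and vector unit (aiv) paired 1:1 into one block. One block contains one aiv and one aic.
# };
CPU_KERNEL_MODE='KernelMode::AIV_MODE'
generate_all_codes(f'absolute path of the operator directory', '.', include_dirs, CPU_KERNEL_MODE)

The parts to pay attention to are include_dirs and the first argument of generate_all_codes.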

4.3.1 include_dirs

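If the operator project uses .h files from outside the operator project directory, add the absolute path of that include directory to include_dirs.

For example: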

include_dirs=[
 '/mnt/code/a',
 '/mnt/code/b'  
]
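
Since the entries must be absolute paths, it can help to check them before generating code. The helper below is a small sketch for that purpose; `check_include_dirs` is a hypothetical name and is not part of l0n0lacltester.

```python
import os


def check_include_dirs(include_dirs):
    """Return the entries that are not absolute paths to existing directories."""
    problems = []
    for path in include_dirs:
        if not os.path.isabs(path) or not os.path.isdir(path):
            problems.append(path)
    return problems


# A relative entry is reported; the current working directory
# (absolute and existing) is accepted.
print(check_include_dirs(["relative/include", os.getcwd()]))
```

Calling it at the top of gen_code.py and aborting when the returned list is non-empty gives a clearer error than a failed compile later on.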

4.3.2 CPU_KERNEL_MODE

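Only takes effect in cpu mode. The available options are:

enum class KernelMode {
    MIX_MODE = 0, // Mixed mode: enables both the vector units (aiv) and the matrix unit (aic). One block has one aic and n aiv (n >= 1).
    AIC_MODE, // Matrix unit (aic) only. One block contains exactly one aic.
    AIV_MODE, // Vector unit (aiv) only. One block contains exactly one aiv.
    MIX_AIC_1_1, // Matrix unit (aic) and vector unit (aiv) paired 1:1 into one block. One block contains one aiv and one aic.
};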

4.3.3 generate_all_codes

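generate_all_codes generates the code required by the cpu|sim run modes.

The first argument of generate_all_codes is the absolute path of the operator project.

For example:

generate_all_codes(f'absolute path of the operator directory', '.', include_dirs, CPU_KERNEL_MODE)

current_path is the directory that contains gen_code.py.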
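
One way to obtain the required absolute path is to resolve the operator directory relative to the script location. The sketch below assumes the operator project sits next to the test project; `resolve_op_dir` and the `'../my_op'` location are hypothetical and only serve as illustration.

```python
import os


def resolve_op_dir(script_file, relative_op_dir):
    """Build an absolute operator path relative to the given script file."""
    # abspath guards against a relative script path, which can occur
    # depending on the Python version and how the script is launched.
    current_path = os.path.dirname(os.path.abspath(script_file))
    return os.path.normpath(os.path.join(current_path, relative_op_dir))


# In gen_code.py you would pass __file__ as the first argument.
op_dir = resolve_op_dir("/mnt/code/test/gen_code.py", "../my_op")
print(op_dir)
```

The result can then be passed as the first argument of generate_all_codes in place of a hard-coded string.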

4.4 run.py

Initially, the file contains:

import sys
import numpy as np
import l0n0lacltester as tester
from op_args import AscendCOpArgs
if sys.argv[1] == 'cpu' or sys.argv[1] == 'sim':
    from op_cpu import AscendCOp
else:
    from op_npu import AscendCOp
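
The import switch above reads the run mode from sys.argv[1] and raises an IndexError when run.py is started without arguments. A slightly more defensive variant is sketched below. It is stricter than the generated code: defaulting to cpu and rejecting unknown modes are assumptions of this sketch, and `backend_module` is a hypothetical helper.

```python
VALID_MODES = ("cpu", "sim", "npu")


def backend_module(argv):
    """Map the run mode in argv[1] to the module that provides AscendCOp."""
    # Assumption: fall back to cpu when no mode is given on the command line.
    mode = argv[1] if len(argv) > 1 else "cpu"
    if mode not in VALID_MODES:
        raise ValueError(f"unknown run mode: {mode!r}")
    # cpu and sim share the same generated module; npu uses its own.
    return "op_cpu" if mode in ("cpu", "sim") else "op_npu"


print(backend_module(["run.py", "sim"]))
print(backend_module(["run.py", "npu"]))
```

Inside run.py the returned name could be loaded with `importlib.import_module(backend_module(sys.argv)).AscendCOp`.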

4.4.1 `AscendCOp` can be used to call the operator

b = 8
c = 32
ignore_index = -100
reduction='sum'
x_shape = [b, c]
target_shape = [b]
weight_shape = [c]
input_x = np.random.uniform(-5, 5, x_shape).astype(np.float32)
input_target = np.random.uniform(0, 31, target_shape).astype(np.int32)
input_weight = np.random.uniform(0, 1, weight_shape).astype(np.float32)
y = np.random.uniform(0, 1, [1]).astype(np.float32)
op = AscendCOp(reduction, ignore_index)
op(input_x, input_target, input_weight, y)
print('y = ', y)
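
The shapes and attributes in this example (reduction, ignore_index, a per-class weight) suggest an NLL loss operator. To judge whether the printed y is plausible, a NumPy reference can be computed on the same inputs. The sketch below assumes the operator follows the usual NLL loss definition with reduction='sum', as in torch.nn.functional.nll_loss; that assumption is not confirmed by this document, and `nll_loss_sum` is a hypothetical helper.

```python
import numpy as np


def nll_loss_sum(x, target, weight, ignore_index=-100):
    """Reference NLL loss with reduction='sum'.

    x: [b, c] scores, target: [b] class indices, weight: [c] class weights.
    """
    keep = target != ignore_index      # rows that contribute to the loss
    rows = np.nonzero(keep)[0]
    cols = target[keep]
    # Each kept row contributes -weight[t] * x[row, t].
    return -np.sum(weight[cols] * x[rows, cols])


x = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
target = np.array([1, 0], dtype=np.int32)
weight = np.array([0.5, 2.0], dtype=np.float32)
print(nll_loss_sum(x, target, weight))  # -(2.0 * 2.0 + 0.5 * 3.0) = -5.5
```

Comparing this value against y[0] with a tolerance gives a quick sanity check before setting up a full test case.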

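4.4.2 `AscendCOpArgs` stores the arguments and can also be used to call `AscendCOp`

The basic pattern is:

# Create the test case
args = AscendCOpArgs('saved_file.json')
# Try to load 'saved_file.json'
if not args.try_load():
  # Generate test data
  pass
# Call the operator
args.run_op(AscendCOp)
# Check accuracy
if accuracy_check_passed:  # placeholder for your own check
  tester.print_green("Passed")
  # Remove the stored test data
  args.remove_record()
else:
  tester.print_red("Failed")
  # Save the test data to 'saved_file.json'
  args.save()

Example: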

import sys
import torch
import numpy as np
import l0n0lacltester as tester
from op_args import AscendCOpArgs
if sys.argv[1] == 'cpu' or sys.argv[1] == 'sim':
    from op_cpu import AscendCOp
else:
    from op_npu import AscendCOp
# name, x_shape and reference_op are placeholders that you define yourself
args = AscendCOpArgs(name)
if not args.try_load():
    input_x = np.random.uniform(-5, 5, x_shape)
    golden = reference_op(input_x)  # reference (golden) operator
    args.set_x(input_x)
    args.set_golden(golden)
args.run_op(AscendCOp)
output = torch.tensor(args.get_y())
golden = torch.tensor(args.get_golden())
if torch.allclose(output, golden, rtol=1e-4, atol=1e-4):
    tester.print_green('Passed')
    args.remove_record()
else:
    tester.print_red("Failed")
    args.save()
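
When the accuracy check fails, it is useful to know which element is furthest out of tolerance. The sketch below applies the same criterion as the allclose call, |output - golden| <= atol + rtol * |golden|, using NumPy only, and also reports the worst element. `compare` is a hypothetical helper and not part of l0n0lacltester; it expects non-empty inputs.

```python
import numpy as np


def compare(output, golden, rtol=1e-4, atol=1e-4):
    """Return (passed, flat index of worst element, its absolute error)."""
    output = np.asarray(output, dtype=np.float64)
    golden = np.asarray(golden, dtype=np.float64)
    err = np.abs(output - golden)
    limit = atol + rtol * np.abs(golden)
    passed = bool(np.all(err <= limit))
    # The worst element is the one that exceeds its own limit the most.
    worst = int(np.argmax(err - limit))
    return passed, worst, float(err.flat[worst])


print(compare([1.0, 2.0], [1.0, 2.00001]))
print(compare([1.0, 2.0], [1.0, 2.1]))
```

Printing the worst index alongside the failure message makes it easier to relate a mismatch to a particular tile or block.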

4.5 About the COMMON_TILING macro definition

The COMMON_TILING macro definition lets you reuse a group of fields across tiling structure definitions.

The pattern is:

#define COMMON_TILING_XXX(arg) \
  ...
// END COMMON_TILING_XXX

Note that // END COMMON_TILING_XXX is required; it is used for regular-expression matching.

For example, suppose I have the following tiling macro definition:

#define COMMON_TILING_FILED_DEF(prefix)                                        \
  TILING_DATA_FIELD_DEF(int64_t, prefix##TileLength);                          \
  TILING_DATA_FIELD_DEF(int64_t, prefix##FormerNum);                           \
  TILING_DATA_FIELD_DEF(int64_t, prefix##FormerLength);                        \
  TILING_DATA_FIELD_DEF(int64_t, prefix##FormerFinalCalcCount);                \
  TILING_DATA_FIELD_DEF(int64_t, prefix##TailLength);                          \
  TILING_DATA_FIELD_DEF(int64_t, prefix##TailFinalCalcCount);                  \
  TILING_DATA_FIELD_DEF(int64_t, prefix##FinalKernelFinalCalcCount);           \
  TILING_DATA_FIELD_DEF(int64_t, prefix##KernelCount);
// END COMMON_TILING_FILED_DEF
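
The END comment gives a regular expression an unambiguous stopping point for a multi-line macro. The sketch below shows one way such a match could work, using a back-reference so that the END comment must name the same macro. The pattern and `find_common_tiling_macros` are assumptions for illustration and not the tool's actual code; the header text is shortened to two fields.

```python
import re

HEADER = r"""
#define COMMON_TILING_FILED_DEF(prefix)                  \
  TILING_DATA_FIELD_DEF(int64_t, prefix##TileLength);    \
  TILING_DATA_FIELD_DEF(int64_t, prefix##KernelCount);
// END COMMON_TILING_FILED_DEF
"""

# Hypothetical pattern: macro name, single parameter, then a lazy body
# that stops at "// END <same macro name>".
MACRO_RE = re.compile(
    r"#define\s+(?P<name>COMMON_TILING_\w+)\((?P<arg>\w+)\)"
    r"(?P<body>.*?)//\s*END\s+(?P=name)",
    re.DOTALL,
)


def find_common_tiling_macros(source):
    """Map each macro name to its (parameter, body) pair."""
    return {
        m.group("name"): (m.group("arg"), m.group("body"))
        for m in MACRO_RE.finditer(source)
    }


for name, (arg, body) in find_common_tiling_macros(HEADER).items():
    print(name, arg, body.count("TILING_DATA_FIELD_DEF"))
```

Without the END comment the pattern has no match at all, which is consistent with the comment being mandatory.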

tiling.h can then use it:

#include "register/tilingdata_base.h"
#include "tiling_defines.h"
namespace optiling {
BEGIN_TILING_DATA_DEF(NLLLossTilingData)
  TILING_DATA_FIELD_DEF(uint64_t, b);
  TILING_DATA_FIELD_DEF(uint64_t, c);
  TILING_DATA_FIELD_DEF(uint64_t, d);
  TILING_DATA_FIELD_DEF(int64_t, reduction);
  TILING_DATA_FIELD_DEF(int64_t, ignore_index);
  COMMON_TILING_FILED_DEF(b); 
  TILING_DATA_FIELD_DEF(int32_t, dimFlag);
END_TILING_DATA_DEF;

REGISTER_TILING_DATA_CLASS(NLLLoss, NLLLossTilingData)
} // namespace optiling
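
To make the effect of COMMON_TILING_FILED_DEF(b) concrete, the following sketch mimics the preprocessor's ## token pasting in Python and lists the fields that the macro contributes to NLLLossTilingData. It is an illustration only; `expand_fields` is a hypothetical helper and the real expansion is done by the C++ preprocessor.

```python
import re

MACRO_BODY = """
TILING_DATA_FIELD_DEF(int64_t, prefix##TileLength);
TILING_DATA_FIELD_DEF(int64_t, prefix##FormerNum);
TILING_DATA_FIELD_DEF(int64_t, prefix##FormerLength);
TILING_DATA_FIELD_DEF(int64_t, prefix##FormerFinalCalcCount);
TILING_DATA_FIELD_DEF(int64_t, prefix##TailLength);
TILING_DATA_FIELD_DEF(int64_t, prefix##TailFinalCalcCount);
TILING_DATA_FIELD_DEF(int64_t, prefix##FinalKernelFinalCalcCount);
TILING_DATA_FIELD_DEF(int64_t, prefix##KernelCount);
"""


def expand_fields(body, param, value):
    """Return (type, name) pairs after pasting `value` in place of `param##`."""
    pasted = re.sub(rf"\b{re.escape(param)}\s*##\s*", lambda _: value, body)
    return re.findall(r"TILING_DATA_FIELD_DEF\(\s*(\w+)\s*,\s*(\w+)\s*\)", pasted)


for field_type, field_name in expand_fields(MACRO_BODY, "prefix", "b"):
    print(field_type, field_name)
```

The pasted names such as bTileLength are distinct from the plain field b declared earlier in the same structure, so the two do not clash.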

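5 Running

# bash run.sh -h
run.sh [option]
-v chip model (default Ascend910B1)
-r run mode (cpu [default]|sim|npu)
-n do not recompile the code in cpu|sim mode
-h show this help

Examples

The default is Ascend910B1 in cpu mode, so the following two commands are equivalent: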

bash run.sh

bash run.sh -v Ascend910B1 -r cpu
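
If you want to launch several runs from a script, for instance cpu first and then sim, the command line can be assembled in Python. The sketch below uses only the options listed in the help text; treating -n as a flag without a value is an assumption, and `build_run_command` and `run` are hypothetical helpers.

```python
import subprocess


def build_run_command(chip="Ascend910B1", mode="cpu", rebuild=True):
    """Assemble the argument list for run.sh."""
    if mode not in ("cpu", "sim", "npu"):
        raise ValueError(f"unknown run mode: {mode!r}")
    cmd = ["bash", "run.sh", "-v", chip, "-r", mode]
    # -n only applies to cpu|sim: skip recompiling the generated code.
    if not rebuild and mode in ("cpu", "sim"):
        cmd.append("-n")
    return cmd


def run(**kwargs):
    """Execute run.sh in the current directory and fail on a non-zero exit."""
    return subprocess.run(build_run_command(**kwargs), check=True)


print(build_run_command())
print(build_run_command(mode="sim", rebuild=False))
```

Calling `run(mode="sim", rebuild=False)` from the test project directory would then reuse the previous build.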



