
mock cuda runtime api


cuda-rt-hook(cuda_mock)


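cuda-rt-hook (cuda_mock) is a Python library for intercepting calls to CUDA/XPU Runtime interfaces (for example, cudaMalloc and xpu_malloc). It intercepts calls dynamically by modifying the PLT (Procedure Linkage Table), so complex frameworks such as PyTorch and Paddle do not need to be recompiled; it works as soon as it is installed. It is useful for call-stack tracing, call-time statistics, and accuracy debugging and performance optimization of Paddle/PyTorch training and inference.

The project was inspired by the plthook project. Its original goal was to intercept CUDA Runtime calls and redirect them to mock functions, so that projects such as triton could be run and debugged without CUDA or a GPU, hence the name cuda_mock. Several features were added later, so cuda_mock can now also be used for model debugging and performance analysis.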
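The PLT rewrite itself happens in native code, but the idea can be illustrated in Python. The sketch below is only an analogy and not how cuda_mock is implemented: a dictionary stands in for the PLT, and `plt`, `stats`, `install_hook` and `framework_code` are hypothetical names chosen for the illustration. Callers look a function up through the table on every call, so swapping the table entry redirects them without changing the caller.

```python
import time
from collections import defaultdict

# Stand-in for a PLT: callers resolve the function by name at call time.
plt = {"xpu_malloc": lambda size: bytearray(size)}

# Per-symbol call count and accumulated time, similar in spirit to Feature 1.
stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def install_hook(table, symbol):
    """Replace table[symbol] with a timing wrapper and return the original."""
    original = table[symbol]

    def hooked(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            stats[symbol]["calls"] += 1
            stats[symbol]["seconds"] += time.perf_counter() - start

    table[symbol] = hooked
    return original


def framework_code():
    """Unmodified caller: it only knows the table entry, not the hook."""
    return plt["xpu_malloc"](16)


original = install_hook(plt, "xpu_malloc")
buf = framework_code()
```

The caller is untouched and still receives the result of the original function; only the table entry changed, which is why the approach does not require rebuilding the framework.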

Installation

Direct install (recommended)

pip install cuda_mock

Build from source

git clone --recursive https://github.com/lipracer/cuda-rt-hook
cd cuda-rt-hook

python setup.py sdist bdist_wheel
pip install dist/*.whl

# Or:
# python setup.py install

Quick start

Find the entry point of your Paddle/PyTorch training or inference script and add the following code after the first import torch / import paddle:

import paddle
import cuda_mock; cuda_mock.xpu_initialize()  # add this line

Or

import torch
import cuda_mock; cuda_mock.xpu_initialize()  # add this line

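Set the cuda_mock feature environment variables according to your needs and scenario (see the feature demonstrations section), then run the script the same way you normally would.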
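If the same script also has to run on machines where cuda_mock is not installed, the initialization can be guarded. This is a minimal sketch: `init_cuda_mock` is a hypothetical helper name, and the only cuda_mock call it uses is `xpu_initialize()` from the snippet above.

```python
import importlib


def init_cuda_mock():
    """Initialize cuda_mock if it is installed; return True on success."""
    try:
        cuda_mock = importlib.import_module("cuda_mock")
    except ImportError:
        # cuda_mock is not available: run the script without hooks.
        return False
    cuda_mock.xpu_initialize()
    return True


# In a real script, call this right after `import torch` or `import paddle`:
#     enabled = init_cuda_mock()
```

The helper should still be called only after the framework import, since the quick start requires cuda_mock to be initialized after torch or paddle has been imported.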

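Feature demonstrations

Feature 1: Count how many times each shared library (.so) calls the Runtime interfaces and the total time spent

LOG_LEVEL=WARN python run.py

After the program finishes, output like the following is shown:

(screenshot: runtime_api_counts)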

Feature 2: Print the C++, C and Python call stacks of xpu_wait

HOOK_ENABLE_TRACE="xpu_wait=1" python run.py

After the program finishes, output like the following is shown:

(screenshot: backtrace)

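Feature 3: Measure peak memory during model training/inference

LOG_LEVEL=WARN python run.py

After the program finishes, output like the following is shown:

(screenshot: memory_peaks)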

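Feature 4: Show information about every memory allocation

LOG_LEVEL=MEMORY=INFO python run.py

While the program is running, output like the following is shown:

(screenshot: memory_allocation)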

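Feature 5: Print the time taken by Runtime interfaces

LOG_SYNC_MODE=1 LOG_LEVEL=PROFILE=INFO python run.py

While the program is running, output like the following is shown:

(screenshot: time_statistic)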

Feature 6: Print the arguments of Runtime calls

HOOK_ENABLE_TRACE=xpu_malloc=0b10 python run.py
HOOK_ENABLE_TRACE=xpu_malloc=0x2 python run.py

While the program is running, output like the following is shown:

(screenshot: print_args)

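Feature 7: Collect call stacks of CUDA operators

  • Locate the nvcc installation path: which nvcc
  • Replace the system nvcc with ours (we only added -g to the compile options):
    mv /usr/local/bin/nvcc /usr/local/bin/nvcc_b
    chmod 777 tools/nvcc
    cp tools/nvcc /usr/local/bin/nvcc
  • Build and install PyTorch
  • Build and install cuda_mock
  • Make sure to import cuda_mock after import torch
  • Run your training script
  • The call stacks are printed to the console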
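The wrapper idea in the steps above can be sketched in Python. This is not the contents of tools/nvcc, which are not shown here; it only illustrates a wrapper that adds -g and delegates to the renamed compiler. `build_command` and `main` are hypothetical names, and the nvcc_b path is taken from the mv command above.

```python
import os
import sys

# Path of the original compiler after the rename step above.
REAL_NVCC = "/usr/local/bin/nvcc_b"


def build_command(args, real_nvcc=REAL_NVCC):
    """Return the real nvcc command line with -g added if it is missing."""
    args = list(args)
    if "-g" not in args:
        args.insert(0, "-g")
    return [real_nvcc, *args]


def main(argv):
    """Replace the current process with the real nvcc invocation."""
    command = build_command(argv[1:])
    os.execv(command[0], command)
```

A real wrapper script would end with `main(sys.argv)`; it is left out here so the sketch can be imported without launching a compiler.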

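Environment variables

| Environment variable | Default | Short description |
| --- | --- | --- |
| LOG_LEVEL | WARN | Sets the log level globally and per log module |
| HOOK_ENABLE_TRACE | 0 for every interface (backtrace off) | Whether to enable backtrace or argument printing |
| LOG_OUTPUT_PATH | "" | Whether to redirect logs to a file |
| LOG_SYNC_MODE | 0 | Whether to use synchronous log output |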

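LOG_LEVEL

  • Usage example: export LOG_LEVEL=WARN,TRACE=INFO
  • Allowed values:
    • Log levels: INFO, WARN, ERROR, FATAL
    • Log modules: PROFILE, TRACE, HOOK, PYTHON, MEMORY
  • Defaults:
    • Global log level: WARN
    • Default log level of each log module: WARN
  • Description: sets the log level globally and per log module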
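To make the LOG_LEVEL syntax concrete, the sketch below splits a value into a global level and per-module levels. It is an illustration only, not the library's parser: `parse_log_level` is a hypothetical name, and keeping module defaults independent of the global level is an assumption, since the interaction between the two is not specified here.

```python
LEVELS = ("INFO", "WARN", "ERROR", "FATAL")
MODULES = ("PROFILE", "TRACE", "HOOK", "PYTHON", "MEMORY")


def parse_log_level(value):
    """Return (global_level, {module: level}) for a LOG_LEVEL string."""
    global_level = "WARN"  # documented default
    module_levels = {module: "WARN" for module in MODULES}
    for item in filter(None, (part.strip() for part in value.split(","))):
        if "=" in item:
            # MODULE=LEVEL sets the level of one log module.
            module, level = (s.strip() for s in item.split("=", 1))
            if module not in MODULES or level not in LEVELS:
                raise ValueError(f"unsupported LOG_LEVEL entry: {item!r}")
            module_levels[module] = level
        else:
            # A bare level sets the global log level.
            if item not in LEVELS:
                raise ValueError(f"unsupported LOG_LEVEL entry: {item!r}")
            global_level = item
    return global_level, module_levels


global_level, module_levels = parse_log_level("WARN,TRACE=INFO")
```

With the usage example from the list, the global level stays at WARN and only the TRACE module is raised to INFO.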

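HOOK_ENABLE_TRACE

  • Usage example: export HOOK_ENABLE_TRACE='xpu_memcpy=1,xpu_set_device=0,xpu_wait=0x1'
  • Allowed values: xpu_malloc, xpu_free, xpu_wait, xpu_memcpy, xpu_set_device, xpu_current_device, xpu_launch_async
  • Default: 0 for every interface, that is, backtrace is off for all interfaces
  • Description: whether to enable backtrace and argument printing

HOOK_ENABLE_TRACE accepts decimal, binary and hexadecimal numbers; each bit acts as a separate switch:

| Bit | Switch |
| --- | --- |
| 0 | Enable backtrace |
| 1 | Enable argument printing |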
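The bit layout above can be checked with a few lines of Python. This sketch only decodes the value the way the table describes and is not the library's own parser; `parse_hook_enable_trace` and the two mask constants are hypothetical names.

```python
BACKTRACE_BIT = 0b01   # bit 0: backtrace
PRINT_ARGS_BIT = 0b10  # bit 1: argument printing


def parse_hook_enable_trace(value):
    """Map each interface name to the switches encoded in its number."""
    flags = {}
    for item in filter(None, (part.strip() for part in value.split(","))):
        name, _, number = item.partition("=")
        # Base 0 lets int() accept decimal (1), binary (0b10) and hex (0x2).
        bits = int(number.strip(), 0)
        flags[name.strip()] = {
            "backtrace": bool(bits & BACKTRACE_BIT),
            "print_args": bool(bits & PRINT_ARGS_BIT),
        }
    return flags


flags = parse_hook_enable_trace("xpu_memcpy=1,xpu_set_device=0,xpu_wait=0x1")
```

Because the switches are independent bits, a value of 3 (0b11) would turn on both backtrace and argument printing for an interface, and 0b10 and 0x2 are two spellings of the same setting.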

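LOG_OUTPUT_PATH

  • Usage example: export LOG_OUTPUT_PATH='/tmp/'
  • Allowed values: a directory for log output
  • Default: ""
  • Description: whether to redirect logs to a file; by default logs go to standard output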

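LOG_SYNC_MODE

  • Usage example: export LOG_SYNC_MODE=1
  • Allowed values: 0 or 1
  • Default: 0
  • Description: whether to use synchronous log output. Synchronous output may slow down the main thread, but it keeps cuda_mock's log lines in order relative to the output of other logging systems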
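The four variables can also be set from a small Python launcher instead of the shell. The sketch below is one possible arrangement, not part of cuda_mock: `build_env` and `run_script` are hypothetical helpers, and only the variable names and value formats come from this section.

```python
import os
import subprocess
import sys


def build_env(log_level=None, hook_trace=None, output_dir=None, sync=False, base=None):
    """Return a copy of the environment with the cuda_mock variables set."""
    env = dict(os.environ if base is None else base)
    if log_level:
        env["LOG_LEVEL"] = log_level
    if hook_trace:
        # hook_trace maps interface names to bit masks, e.g. {"xpu_malloc": 0b10}.
        env["HOOK_ENABLE_TRACE"] = ",".join(
            f"{name}={bits:#x}" for name, bits in hook_trace.items()
        )
    if output_dir:
        env["LOG_OUTPUT_PATH"] = output_dir
    if sync:
        env["LOG_SYNC_MODE"] = "1"
    return env


def run_script(script, **options):
    """Run a training/inference script with the chosen settings."""
    completed = subprocess.run([sys.executable, script], env=build_env(**options))
    return completed.returncode
```

For example, `run_script("run.py", log_level="PROFILE=INFO", sync=True)` would correspond to the Feature 5 command line, assuming run.py is the script that imports and initializes cuda_mock.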

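Advanced features

Note

The hook function must have the same type as the function it replaces, but its name (specifically its mangled name) must be different; otherwise the replacement fails or the hook recurses infinitely. The root cause has not been identified yet.

Implementing a custom hook function

Example of a custom hook installer: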

import cuda_mock

class PythonHookInstaller(cuda_mock.HookInstaller):
    def is_target_lib(self, name):
        return name.find("libcuda_mock_impl.so") != -1

    def is_target_symbol(self, name):
        return name.find("malloc") != -1

# cpp_code holds the C++ source of the replacement functions (defined elsewhere).
lib = cuda_mock.dynamic_obj(cpp_code, True).appen_compile_opts('-g').compile().get_lib()
installer = PythonHookInstaller(lib)

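  • Implement the hook callback interface in PythonHookInstaller
  • The constructor takes the path of the library that contains the custom hook functions (it must be an absolute path). The library must contain functions whose names and types match the functions to be replaced. During hooking, the address of the original function is written into a variable named __origin_ followed by the target symbol, so that you can retrieve the original function address (see the definition at test/py_test/test_import_mock.py:15)
  • is_target_lib: whether this is a library whose calls to the target function should be hooked
  • is_target_symbol: whether this is the name of a function to hook (called only when is_target_lib returned True)
  • new_symbol_name: the name of the new replacement function in the shared library passed to the constructor; the name parameter is the name of the function currently about to be replaced
  • dynamic_obj: compiles C++ code at runtime; the code can reference all modules: logger, statistics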
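The order in which the two filter callbacks are consulted can be shown without cuda_mock installed. The sketch below is a stand-in under stated assumptions: `MallocFilter` mirrors the two methods of the example but does not derive from `cuda_mock.HookInstaller`, `select_hooks` is a hypothetical driver, and the library paths are made up.

```python
class MallocFilter:
    """Stand-in with the same two callbacks as the example installer."""

    def is_target_lib(self, name):
        return "libcuda_mock_impl.so" in name

    def is_target_symbol(self, name):
        return "malloc" in name


def select_hooks(installer, imports):
    """Return the (library, symbol) pairs that both callbacks accept."""
    selected = []
    for lib, symbols in imports.items():
        # is_target_symbol is consulted only for libraries that passed is_target_lib.
        if not installer.is_target_lib(lib):
            continue
        selected.extend((lib, sym) for sym in symbols if installer.is_target_symbol(sym))
    return selected


# Hypothetical libraries and the symbols they call.
imports = {
    "/opt/demo/libcuda_mock_impl.so": ["malloc", "free", "xpu_malloc"],
    "/opt/demo/libother.so": ["malloc"],
}
selected = select_hooks(MallocFilter(), imports)
```

Note that a substring test such as "malloc" also matches xpu_malloc, so a stricter comparison may be preferable when only one symbol should be replaced.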

Contributing

Debug build

# Build
cmake -S . -B build -DCMAKE_INSTALL_PREFIX="$(pwd)/build" -DENABLE_BUILD_WITH_GTEST=ON -GNinja
cmake --build build

# Run unit tests (add -R <regex> to run only matching tests)
cd build
ctest
