mock cuda runtime api

cuda-rt-hook(cuda_mock)

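cuda-rt-hook (cuda_mock) is a Python library that intercepts CUDA/XPU Runtime API calls (for example, cudaMalloc and xpu_malloc). It intercepts calls dynamically by modifying the PLT (Procedure Linkage Table), so complex frameworks such as PyTorch and Paddle do not need to be recompiled; it works as soon as it is installed. It is useful for call-stack tracing, call-time statistics, and accuracy debugging and performance optimization of Paddle/PyTorch training and inference.

The project was inspired by the plthook project. Its original goal was to redirect CUDA Runtime calls to mock functions so that projects such as triton could be run and debugged without CUDA or a GPU, hence the name cuda_mock. Several features were added later, so cuda_mock can also be used for model debugging and performance analysis.

Installation

Install directly (recommended)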

pip install cuda_mock

Build from source

git clone --recursive https://github.com/lipracer/cuda-rt-hook
cd cuda-rt-hook

python setup.py sdist bdist_wheel
pip install dist/*.whl

# Or:
# python setup.py install

Quick start

Find the entry point of your Paddle/PyTorch training or inference script and add the following code after the first import torch / import paddle:

import paddle
import cuda_mock; cuda_mock.xpu_initialize() # add this line

Or

import torch
import cuda_mock; cuda_mock.xpu_initialize() # add this line

Set the cuda_mock feature environment variables according to your needs (see the Feature demos section), then run the script the same way as before.
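As a minimal sketch of that last step, the helper below starts a script in a child Python process with extra environment variables applied only to that process. The name run_with_env is hypothetical and not part of cuda_mock; the variable names in the usage comment are the ones documented in this README, and run.py stands for your own entry script.

```python
import os
import subprocess
import sys


def run_with_env(script_args, **env_overrides):
    """Run `python <script_args...>` with extra environment variables.

    Hypothetical helper, not part of cuda_mock: it copies the current
    environment, applies the overrides, and starts a child interpreter.
    """
    env = dict(os.environ)
    env.update(env_overrides)
    # check=False: the caller inspects returncode instead of getting an exception
    return subprocess.run([sys.executable, *script_args], env=env, check=False)


# Example (assumes run.py is your training/inference entry point):
# run_with_env(["run.py"], LOG_LEVEL="WARN", HOOK_ENABLE_TRACE="xpu_wait=1")
```

Setting the variables per child process keeps them out of your interactive shell, which makes it easier to compare runs with different settings.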

Feature demos

The Runtime interfaces that currently support the features below are:

  • xpu_malloc
  • xpu_free
  • xpu_current_device
  • xpu_set_device
  • xpu_wait
  • xpu_memcpy
  • xpu_launch_async
  • xpu_stream_create
  • xpu_stream_destroy
  • cudaMalloc
  • cudaFree
  • cudaMemcpy
  • cudaSetDevice
  • cudaGetDevice

For details on what is supported, see XpuRuntimeApiHook in xpu_mock.cpp.

Feature 1: Count the calls each .so library makes to Runtime interfaces and the total time spent

LOG_LEVEL=WARN python run.py

After the program finishes, it prints:

[Image: runtime_api_counts]

Feature 2: Print the C++, C, and Python call stacks of xpu_wait

HOOK_ENABLE_TRACE="xpu_wait=1" python run.py

After the program finishes, it prints:

[Image: backtrace]

Feature 3: Report peak memory usage during model training/inference

LOG_LEVEL=WARN python run.py

After the program finishes, it prints:

[Image: memory_peaks]

Feature 4: Show information about every memory allocation

LOG_LEVEL=MEMORY=INFO python run.py

While the program is running, it prints:

[Image: memory_allocation]

Feature 5: Print the time spent in Runtime interfaces

LOG_SYNC_MODE=1 LOG_LEVEL=PROFILE=INFO python run.py

While the program is running, it prints:

[Image: time_statistic]

Feature 6: Print the arguments of Runtime calls

HOOK_ENABLE_TRACE=xpu_malloc=0b10 python run.py
HOOK_ENABLE_TRACE=xpu_malloc=0x2 python run.py

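While the program is running, it prints:

[Image: print_args]

Feature 7: Collect CUDA operator call stacks

  • Find the nvcc installation path: which nvcc
  • Replace the system nvcc with ours (ours only adds -g to the compile options):
    mv /usr/local/bin/nvcc /usr/local/bin/nvcc_b
    chmod 777 tools/nvcc
    cp tools/nvcc /usr/local/bin/nvcc
  • Build and install pytorch
  • Build and install cuda_mock
  • Make sure to import cuda_mock after import torch
  • Run your training script
  • The call stacks are printed to the console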

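Environment variables

Variable            Default                                  Description
LOG_LEVEL           WARN                                     Log level for the global logger and for each log module
HOOK_ENABLE_TRACE   0 for every interface (backtrace off)    Whether to enable backtrace or argument printing
LOG_OUTPUT_PATH     ""                                       Whether to redirect logs to a file
LOG_SYNC_MODE       0                                        Whether to use synchronous log output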

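LOG_LEVEL

  • Example: export LOG_LEVEL=WARN,TRACE=INFO
  • Allowed values:
    • Log levels: INFO, WARN, ERROR, FATAL
    • Log modules: PROFILE, TRACE, HOOK, PYTHON, MEMORY
  • Defaults:
    • Global log level: WARN
    • Default log level of each log module: WARN
  • Description: sets the log level globally and for each log module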
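To make the value format concrete, here is a small sketch that splits a LOG_LEVEL string into a global level and per-module overrides. It only illustrates the syntax shown in the example above; parse_log_level is a hypothetical name, and this is not the parser cuda_mock itself uses. How a module without an explicit override relates to the global level is left out on purpose, since this README does not specify it.

```python
LEVELS = ("INFO", "WARN", "ERROR", "FATAL")
MODULES = ("PROFILE", "TRACE", "HOOK", "PYTHON", "MEMORY")


def parse_log_level(value, default="WARN"):
    """Return (global_level, {module: level}) for a LOG_LEVEL string.

    Illustration only (assumed syntax: comma-separated items, each either
    LEVEL or MODULE=LEVEL); not cuda_mock's own parser.
    """
    global_level = default
    overrides = {}
    for item in filter(None, (part.strip() for part in value.split(","))):
        if "=" in item:
            module, level = (s.strip() for s in item.split("=", 1))
            if module not in MODULES:
                raise ValueError(f"unknown log module: {module!r}")
        else:
            module, level = None, item
        if level not in LEVELS:
            raise ValueError(f"unknown log level: {level!r}")
        if module is None:
            global_level = level
        else:
            overrides[module] = level
    return global_level, overrides


print(parse_log_level("WARN,TRACE=INFO"))  # ('WARN', {'TRACE': 'INFO'})
print(parse_log_level("MEMORY=INFO"))      # ('WARN', {'MEMORY': 'INFO'})
```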

HOOK_ENABLE_TRACE

  • Example: export HOOK_ENABLE_TRACE='xpu_memcpy=1,xpu_set_device=0,xpu_wait=0x1'
  • Allowed keys: xpu_malloc, xpu_free, xpu_wait, xpu_memcpy, xpu_set_device, xpu_current_device, xpu_launch_async
  • Default: 0 for every interface, that is, backtrace is off for all interfaces
  • Description: whether to enable backtrace and argument printing

HOOK_ENABLE_TRACE accepts decimal, binary, and hexadecimal numbers; each bit acts as a separate switch.

Bit   Switch
0     Enable backtrace
1     Enable argument printing
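The bit layout above can be illustrated with a short decoder. This is a sketch of the documented semantics only, not cuda_mock's own parsing code; decode_hook_enable_trace and the dictionary keys are hypothetical names chosen for the example.

```python
BACKTRACE_BIT = 0b01   # bit 0: backtrace
PRINT_ARGS_BIT = 0b10  # bit 1: argument printing


def decode_hook_enable_trace(value):
    """Map each interface name to the switches its number turns on.

    Illustration only; assumes comma-separated NAME=NUMBER items as in the
    usage example above.
    """
    switches = {}
    for item in filter(None, (part.strip() for part in value.split(","))):
        name, _, number = item.partition("=")
        # Base 0 lets int() accept decimal (2), binary (0b10), and hex (0x2)
        bits = int(number.strip(), 0)
        switches[name.strip()] = {
            "backtrace": bool(bits & BACKTRACE_BIT),
            "print_args": bool(bits & PRINT_ARGS_BIT),
        }
    return switches


print(decode_hook_enable_trace("xpu_malloc=0b10"))
# {'xpu_malloc': {'backtrace': False, 'print_args': True}}
```

Under this reading, a value of 3 (0b11) would turn on both backtrace and argument printing for an interface.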

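LOG_OUTPUT_PATH

  • Example: export LOG_OUTPUT_PATH='/tmp/'
  • Allowed values: the directory to write log files to
  • Default: ""
  • Description: whether to redirect logs to a file; by default logs go to standard output

LOG_SYNC_MODE

  • Example: export LOG_SYNC_MODE=1
  • Allowed values: 0 or 1
  • Default: 0
  • Description: whether to use synchronous log output. Synchronous output may slow down the main thread, but it keeps CUDA_MOCK's log lines in order relative to the output of other logging systems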

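Advanced features

Note

The hook function must have the same type as the function it replaces, but its name (specifically, its mangled name) must be different; otherwise the replacement fails or the call recurses infinitely. The root cause has not been identified yet.

Implementing a custom hook function

Example of a custom hook installer:

import cuda_mock

# cpp_code: a string of C++ source containing the replacement functions (you define it)
class PythonHookInstaller(cuda_mock.HookInstaller):
    # Return True for libraries whose calls should be hooked
    def is_target_lib(self, name):
        return name.find("libcuda_mock_impl.so") != -1
    # Return True for symbols that should be replaced
    def is_target_symbol(self, name):
        return name.find("malloc") != -1

# Compile cpp_code at runtime with -g and pass the resulting library to the installer
lib = cuda_mock.dynamic_obj(cpp_code, True).appen_compile_opts('-g').compile().get_lib()
installer = PythonHookInstaller(lib)

  • Implement the hook callback interface, PythonHookInstaller.
  • The constructor takes the path (absolute) of the library that contains the custom hook functions. That library must contain functions whose names and types match those of the functions to be replaced. When a hook is installed, the address of the original function is written to a variable whose name is __origin_ followed by the target symbol name, so that you can get the original function's address (see the definition at test/py_test/test_import_mock.py:15).
  • is_target_lib: whether this is the library from which the target function is called.
  • is_target_symbol: whether this is the name of the function to hook (called only when is_target_lib returns True).
  • new_symbol_name: the name of the replacement function in the shared library passed to the constructor; the name argument is the function currently being replaced.
  • dynamic_obj: compiles C++ code at runtime; the code can use all modules: logger, statistics.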
Contributing

Debug build

# Build
cmake -S . -B build -DCMAKE_INSTALL_PREFIX="$(pwd)/build" -DENABLE_BUILD_WITH_GTEST=ON -GNinja
cmake --build build

# Run unit tests (use ctest -R <regex> to run a subset)
cd build
ctest
