mock cuda runtime api
cuda-rt-hook(cuda_mock)
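cuda-rt-hook (cuda_mock) is a Python library that intercepts calls to CUDA/XPU Runtime APIs (for example, cudaMalloc and xpu_malloc). It intercepts calls dynamically by modifying the PLT (Procedure Linkage Table), so complex frameworks such as PyTorch and Paddle do not need to be recompiled: it works as soon as it is installed. It is useful for call-stack tracing, call-latency statistics, and accuracy debugging and performance tuning of Paddle/PyTorch training and inference.
The project was inspired by plthook. Its original goal was to redirect CUDA Runtime calls to mock functions so that projects such as Triton could be run and debugged without CUDA or a GPU, hence the name cuda_mock. Features added since then make cuda_mock useful for model debugging and performance analysis.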
Installation
Install directly (recommended)
pip install cuda_mock
Build from source
git clone --recursive https://github.com/lipracer/cuda-rt-hook
cd cuda-rt-hook
python setup.py sdist bdist_wheel
pip install dist/*.whl
# or:
# python setup.py install
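Quick start
Find the entry point of your Paddle/PyTorch training or inference script and add the following line right after the first import torch / import paddle:
import paddle
import cuda_mock; cuda_mock.xpu_initialize() # add this line
or
import torch
import cuda_mock; cuda_mock.xpu_initialize() # add this line
Set the cuda_mock environment variables for the features you need (see Feature demos), then run the script the same way you normally do.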
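Feature demos
Feature 1: Count Runtime API calls and total call time per .so library
LOG_LEVEL=WARN python run.py
When the program exits, it prints the call counts and total times.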
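The per-library statistic can be pictured with the pure-Python sketch below: every intercepted call is charged to a key made of the calling .so and the API name, and a call count plus accumulated wall-clock time is kept per key. The CallStats class, fake_xpu_wait, and the library names are hypothetical illustrations; cuda_mock collects these numbers inside its native hooks, and its real report format may look different.

```python
import time
from collections import defaultdict


class CallStats:
    """Hypothetical per-(library, API) call counter and timer."""

    def __init__(self):
        self.calls = defaultdict(int)      # (lib, api) -> number of calls
        self.seconds = defaultdict(float)  # (lib, api) -> accumulated wall time

    def timed_call(self, lib, api, fn, *args):
        """Run fn(*args) and charge its duration to (lib, api)."""
        start = time.perf_counter()
        try:
            return fn(*args)
        finally:
            key = (lib, api)
            self.calls[key] += 1
            self.seconds[key] += time.perf_counter() - start

    def report(self):
        for (lib, api), n in sorted(self.calls.items()):
            ms = self.seconds[(lib, api)] * 1e3
            print(f"{lib:<16} {api:<10} calls={n:<4} total={ms:.3f} ms")


def fake_xpu_wait():
    """Stand-in for a runtime call that blocks for about 1 ms."""
    time.sleep(0.001)


stats = CallStats()
stats.timed_call("libpaddle.so", "xpu_wait", fake_xpu_wait)
stats.timed_call("libpaddle.so", "xpu_wait", fake_xpu_wait)
stats.timed_call("libtorch.so", "xpu_wait", fake_xpu_wait)
stats.report()
```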
Feature 2: Print the C++, C, and Python call stacks of xpu_wait
HOOK_ENABLE_TRACE="xpu_wait=1" python run.py
When the program exits, it prints the call stacks.
Feature 3: Report peak memory usage during model training/inference
LOG_LEVEL=WARN python run.py
When the program exits, it prints the peak memory usage.
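Peak memory here means the high-water mark of bytes that have been allocated but not yet freed. The sketch below shows that bookkeeping in plain Python, assuming each xpu_malloc event carries a pointer and a size and each xpu_free event carries the pointer; PeakMemoryTracker and its method names are hypothetical, not part of cuda_mock.

```python
class PeakMemoryTracker:
    """Hypothetical tracker fed with malloc/free events."""

    def __init__(self):
        self.live = {}    # pointer -> size of each outstanding allocation
        self.current = 0  # bytes currently allocated
        self.peak = 0     # high-water mark of self.current

    def on_malloc(self, ptr, size):
        self.live[ptr] = size
        self.current += size
        self.peak = max(self.peak, self.current)

    def on_free(self, ptr):
        # Pointers we never saw (e.g. allocated before tracking began) count as 0.
        self.current -= self.live.pop(ptr, 0)


t = PeakMemoryTracker()
t.on_malloc(0x1000, 256)
t.on_malloc(0x2000, 512)  # current = 768, new peak
t.on_free(0x1000)         # current = 512
t.on_malloc(0x3000, 128)  # current = 640, peak stays 768
print(t.current, t.peak)  # 640 768
```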
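Feature 4: Show information about every memory allocation
LOG_LEVEL=MEMORY=INFO python run.py
The information is printed for each allocation while the program runs.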
Feature 5: Print the time taken by each Runtime API call
LOG_SYNC_MODE=1 LOG_LEVEL=PROFILE=INFO python run.py
The timings are printed while the program runs.
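Feature 6: Print the arguments of Runtime API calls
HOOK_ENABLE_TRACE=xpu_malloc=0b10 python run.py
HOOK_ENABLE_TRACE=xpu_malloc=0x2 python run.py
The two commands are equivalent: 0b10 and 0x2 both set bit 1, which enables argument printing. The arguments are printed while the program runs.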
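Feature 7: Collect call stacks of CUDA operators
- Find where nvcc is installed
which nvcc
- Replace the system nvcc with ours (the only difference is that it adds -g to the compile options). The commands below assume nvcc is /usr/local/bin/nvcc; adjust the path to match the output of which nvcc.
mv /usr/local/bin/nvcc /usr/local/bin/nvcc_b
chmod 777 tools/nvcc
cp tools/nvcc /usr/local/bin/nvcc
- Build and install PyTorch
- Build and install cuda_mock
- Make sure to import cuda_mock after import torch
- Run your training script
- The call stacks are printed to the console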
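Environment variables
Environment variable | Default | Description |
---|---|---|
LOG_LEVEL | WARN | Sets the log level globally and per log module |
HOOK_ENABLE_TRACE | 0 for every API (backtrace off) | Enables backtrace and/or argument printing |
LOG_OUTPUT_PATH | "" | Redirects logs to files |
LOG_SYNC_MODE | 0 | Enables synchronous log output |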
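LOG_LEVEL
- Example:
export LOG_LEVEL=WARN,TRACE=INFO
- Valid values:
  - Log levels: INFO, WARN, ERROR, FATAL
  - Log modules: PROFILE, TRACE, HOOK, PYTHON, MEMORY
- Defaults:
  - Global log level: WARN
  - Log level of each module: WARN
- Description: sets the log level globally and for individual log modules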
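The LOG_LEVEL syntax can be read as a comma-separated list in which a bare level sets the global level and MODULE=LEVEL overrides one module (as in the LOG_LEVEL=MEMORY=INFO and LOG_LEVEL=PROFILE=INFO commands in the feature demos). The parser below is a hypothetical sketch of that reading, not cuda_mock's own code; in particular, whether an unnamed module inherits the global level or keeps its own WARN default is an assumption (this sketch keeps WARN).

```python
LEVELS = ("INFO", "WARN", "ERROR", "FATAL")
MODULES = ("PROFILE", "TRACE", "HOOK", "PYTHON", "MEMORY")


def parse_log_level(spec, default="WARN"):
    """Split a LOG_LEVEL string into (global_level, {module: level}).

    Assumption: an entry without '=' sets the global level, MODULE=LEVEL sets
    one module, and modules that are not named keep the default level.
    """
    global_level = default
    modules = dict.fromkeys(MODULES, default)
    for item in (part.strip() for part in spec.split(",")):
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        if "=" in item:
            module, level = [p.strip() for p in item.split("=", 1)]
            if module not in MODULES:
                raise ValueError(f"unknown log module: {module!r}")
        else:
            module, level = None, item
        if level not in LEVELS:
            raise ValueError(f"unknown log level: {level!r}")
        if module is None:
            global_level = level
        else:
            modules[module] = level
    return global_level, modules


print(parse_log_level("WARN,TRACE=INFO"))
print(parse_log_level("MEMORY=INFO"))
```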
HOOK_ENABLE_TRACE
- Example:
export HOOK_ENABLE_TRACE='xpu_memcpy=1,xpu_set_device=0,xpu_wait=0x1'
- Supported APIs: xpu_malloc, xpu_free, xpu_wait, xpu_memcpy, xpu_set_device, xpu_current_device, xpu_launch_async
- Default: 0 for every API, i.e. backtrace is off for all APIs
- Description: enables backtrace and argument printing
HOOK_ENABLE_TRACE accepts decimal, binary, and hexadecimal numbers; each bit is a separate switch:
Bit | Switch |
---|---|
0 | Enable backtrace |
1 | Enable argument printing |
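Because each bit is an independent switch, the values combine with bitwise OR: 1 enables only the backtrace, 2 (0b10, 0x2) only argument printing, and 3 both. The helpers below are a hypothetical Python illustration of that encoding, handy for generating the variable from a script; they are not part of cuda_mock, and the api=value, comma-separated layout is taken from the usage example above.

```python
BACKTRACE = 1 << 0   # bit 0: print a backtrace for this API
PRINT_ARGS = 1 << 1  # bit 1: print this API's arguments


def decode_flags(text):
    """Interpret one value such as '2', '0b10' or '0x2'.

    int(text, 0) picks the base from the prefix; it only stands in for
    whatever parser cuda_mock actually uses.
    """
    flags = int(text, 0)
    return {"backtrace": bool(flags & BACKTRACE),
            "print_args": bool(flags & PRINT_ARGS)}


def encode_trace(settings):
    """Build a HOOK_ENABLE_TRACE value from {api_name: flag bits}."""
    return ",".join(f"{api}={bin(bits)}" for api, bits in settings.items())


print(decode_flags("0b10") == decode_flags("0x2") == decode_flags("2"))  # True
print(encode_trace({"xpu_malloc": PRINT_ARGS, "xpu_wait": BACKTRACE | PRINT_ARGS}))
# xpu_malloc=0b10,xpu_wait=0b11
```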
LOG_OUTPUT_PATH
- Example:
export LOG_OUTPUT_PATH='/tmp/'
- Valid values: the directory to write log files to
- Default: ""
- Description: redirects logs to files; by default logs go to standard output
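LOG_SYNC_MODE
- Example:
export LOG_SYNC_MODE=1
- Valid values: 0 or 1
- Default: 0
- Description: whether to write logs synchronously. Synchronous logging may slow down the main thread, but it keeps cuda_mock's log lines in order relative to the output of other logging systems.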
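Advanced features
Note
A hook function must have the same type as the function it replaces, but a different name (in particular, a different mangled name); otherwise the replacement fails or the call recurses infinitely. The cause has not been identified yet.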
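One plausible way the infinite recursion arises can be shown with a pure-Python analogy (the note says the cause has not been pinned down, so treat this as an illustration, not a diagnosis). Here a dict plays the PLT: if the hook forwards to the "original" by looking up the same name, it finds itself; forwarding through a separately saved original does not. The names plt, origin, install, bad_hook and good_hook are hypothetical.

```python
# Toy "PLT": callers look functions up here by symbol name at call time.
plt = {}
origin = {}  # saved originals, playing the role of __origin_* variables


def xpu_malloc(size):
    """Stand-in for the real runtime function."""
    return f"buffer[{size}]"


plt["xpu_malloc"] = xpu_malloc


def call(name, *args):
    """Every call is routed through the table, like a call via the PLT."""
    return plt[name](*args)


def install(name, hook):
    """Save the current target, then point the table entry at the hook."""
    origin[name] = plt[name]
    plt[name] = hook


def bad_hook(size):
    # Forwards by name; after patching, that name resolves to bad_hook itself.
    return call("xpu_malloc", size)


def good_hook(size):
    # Forwards through the saved original, so it cannot re-enter itself.
    return "hooked:" + origin["xpu_malloc"](size)


install("xpu_malloc", bad_hook)
try:
    call("xpu_malloc", 8)
except RecursionError:
    print("bad_hook recursed forever")

plt["xpu_malloc"] = origin["xpu_malloc"]  # undo the bad patch
install("xpu_malloc", good_hook)
print(call("xpu_malloc", 8))  # hooked:buffer[8]
```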
Implementing custom hook functions
Example of a custom hook installer:
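import cuda_mock

class PythonHookInstaller(cuda_mock.HookInstaller):
    # Only hook calls made from libcuda_mock_impl.so
    def is_target_lib(self, name):
        return name.find("libcuda_mock_impl.so") != -1

    # Among those, only hook symbols whose name contains "malloc"
    def is_target_symbol(self, name):
        return name.find("malloc") != -1

# cpp_code: a str holding the C++ source of the hook functions, defined elsewhere
lib = cuda_mock.dynamic_obj(cpp_code, True).appen_compile_opts('-g').compile().get_lib()
installer = PythonHookInstaller(lib)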
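- Implement the hook callback interface PythonHookInstaller:
  - Constructor: takes the path of the library containing the custom hook functions. The path must be absolute, and the library must contain functions whose names and types match the functions to be replaced. During hooking, the address of the original function is written into a variable whose name is __origin_ followed by the target symbol name, so you can get at the original function's address (see the definition at test/py_test/test_import_mock.py:15).
  - is_target_lib: whether the given library is the one whose calls to the target function should be hooked.
  - is_target_symbol: whether the given name is a function to hook (called only when is_target_lib returned True).
  - new_symbol_name: provides the name of the replacement function in the shared library passed to the constructor; the parameter name is the name of the function about to be replaced.
- dynamic_obj: compiles C++ code at runtime; the code can reference all modules: logger, statistics.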
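The order of the two filter callbacks can be simulated without cuda_mock: is_target_symbol is consulted only for libraries that pass is_target_lib, and each selected symbol gets a companion __origin_-prefixed name for the original address. ToyInstaller, select_hooks, and the imports table are hypothetical; the exact form of the __origin_ variable name (prefix plus symbol name) is an assumption based on the description above.

```python
class ToyInstaller:
    """Same filters as the PythonHookInstaller example, without cuda_mock."""

    def is_target_lib(self, name):
        return name.find("libcuda_mock_impl.so") != -1

    def is_target_symbol(self, name):
        return name.find("malloc") != -1


def select_hooks(installer, imports):
    """Return (library, symbol, origin_variable) for every call site to patch.

    imports maps a library name to the symbols it imports (hypothetical input).
    """
    selected = []
    for lib, symbols in imports.items():
        if not installer.is_target_lib(lib):
            continue  # is_target_symbol is never asked about this library
        for sym in symbols:
            if installer.is_target_symbol(sym):
                selected.append((lib, sym, "__origin_" + sym))
    return selected


imports = {
    "/opt/lib/libcuda_mock_impl.so": ["xpu_malloc", "xpu_free", "malloc"],
    "/opt/lib/libtorch.so": ["xpu_malloc"],
}
for entry in select_hooks(ToyInstaller(), imports):
    print(entry)
```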
Contributing
Debug build
# Build
cmake -S . -B build -DCMAKE_INSTALL_PREFIX=$(pwd)/build -DENABLE_BUILD_WITH_GTEST=ON -GNinja
cmake --build build
# Run unit tests
cd build
ctest  # add -R <regex> to run only the tests whose names match