A tool that hooks selected API calls at runtime.
Project description
The PLT hooking technique used here is based on plthook.
mock pytorch cuda runtime interface
- update submodules
git submodule update --init --recursive
- build the wheel package
python setup.py sdist bdist_wheel
- install the wheel directly
pip install dist/*.whl
collect cuda operator call stack
- find where nvcc is installed (the commands below assume /usr/local/bin/nvcc)
which nvcc
- replace the system nvcc with our nvcc, keeping the original as nvcc_b
mv /usr/local/bin/nvcc /usr/local/bin/nvcc_b
chmod 777 tools/nvcc
cp tools/nvcc /usr/local/bin/nvcc
- build and install pytorch
- build and install cuda_mock
- import cuda_mock after import torch
- run your torch training script
- the call stacks are dumped to the console
Note: our tools/nvcc differs from the system nvcc only in that it adds the -g compile option.
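To make the nvcc replacement step concrete, here is a minimal sketch of how a wrapper could add -g before handing off to the renamed original compiler. This is not the contents of tools/nvcc; the function name build_command and the de-duplication of -g are assumptions made for illustration.

```python
# Hypothetical sketch of an nvcc wrapper; not the project's tools/nvcc.

# Location of the renamed original compiler (matches the mv step above).
REAL_NVCC = "/usr/local/bin/nvcc_b"


def build_command(args, real_nvcc=REAL_NVCC):
    """Return the argv for the real compiler, adding -g if it is missing."""
    extra = [] if "-g" in args else ["-g"]
    return [real_nvcc, *extra, *args]


# Example: the arguments a build system might pass to nvcc.
cmd = build_command(["-c", "kernel.cu", "-o", "kernel.o"])
print(cmd)
```

A real wrapper would pass the resulting list to something like os.execv(cmd[0], cmd) so that the build system sees the exit status of the original compiler.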
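collect xpu runtime memory allocation statistics and xpu_wait call stacks
- print the xpu_malloc call sequence and track both live memory usage and the historical peak, to help diagnose memory fragmentation
- print xpu_wait call stacks, to help locate where the pipeline stalls
- note: import cuda_mock and call cuda_mock.xpu_initialize() after import torch / import paddle
- usage:
import paddle
import cuda_mock; cuda_mock.xpu_initialize()  # add this line
- disable backtrace printing (capturing backtraces degrades performance considerably)
export HOOK_DISABLE_TRACE='xpuMemcpy=0,xpuSetDevice=0'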
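The live/peak memory statistic described above can be illustrated with a small bookkeeping sketch. It is not cuda_mock's implementation: the class MemoryTracker, its method names, and the pointer values are hypothetical and only show how a malloc/free sequence yields current and peak usage.

```python
class MemoryTracker:
    """Hypothetical bookkeeping for a sequence of malloc/free events."""

    def __init__(self):
        self.live = {}    # pointer -> allocation size in bytes
        self.current = 0  # bytes allocated right now
        self.peak = 0     # largest value `current` has reached

    def on_malloc(self, ptr, size):
        self.live[ptr] = size
        self.current += size
        self.peak = max(self.peak, self.current)

    def on_free(self, ptr):
        # Unknown pointers are ignored rather than treated as errors.
        self.current -= self.live.pop(ptr, 0)


tracker = MemoryTracker()
tracker.on_malloc(0x1000, 256)
tracker.on_malloc(0x2000, 512)
tracker.on_free(0x1000)
tracker.on_malloc(0x3000, 128)
print(tracker.current, tracker.peak)  # prints "640 768"
```

Peak usage that stays well above live usage after frees is one symptom worth checking when investigating fragmentation.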
implement custom hook functions
- example of a custom hook installer:
class PythonHookInstaller(cuda_mock.HookInstaller):
    def is_target_lib(self, name):
        return name.find("libcuda_mock_impl.so") != -1
    def is_target_symbol(self, name):
        return name.find("malloc") != -1
lib = cuda_mock.dynamic_obj(cpp_code, True).appen_compile_opts('-g').compile().get_lib()
installer = PythonHookInstaller(lib)
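- hook callback interface of PythonHookInstaller:
- constructor: takes the absolute path of the library that contains the custom hook functions. The library must contain a function with the same name and type as the function being replaced. When the hook is installed, the address of the original function is written to a variable whose name is __origin_ followed by the target symbol, so user code can reach the original function (see the definition at test/py_test/test_import_mock.py:15)
- is_target_lib: whether this is the library in which the function to hook is called
- is_target_symbol: whether this is the name of the function to hook (called only when is_target_lib returns True)
- new_symbol_name: the name of the replacement function in the shared library passed to the constructor; the parameter name is the name of the function about to be replaced
- dynamic_obj: compiles C++ code at runtime; the code may reference all modules: logger, statistics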
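The callback order described above can be sketched in plain Python. The base class below is a stand-in, not cuda_mock.HookInstaller, and plan_hooks, the library paths, and the default new_symbol_name (returning the same name) are assumptions for illustration only.

```python
class HookInstallerSketch:
    """Stand-in base class; NOT cuda_mock.HookInstaller."""

    def is_target_lib(self, name):
        return False

    def is_target_symbol(self, name):
        return False

    def new_symbol_name(self, name):
        # Assumption: by default the replacement has the same name.
        return name


class MallocHook(HookInstallerSketch):
    def is_target_lib(self, name):
        return "libcuda_mock_impl.so" in name

    def is_target_symbol(self, name):
        return "malloc" in name


def plan_hooks(installer, imports):
    """imports maps a library path to the symbol names it calls."""
    plan = []
    for lib, symbols in imports.items():
        if not installer.is_target_lib(lib):
            continue  # is_target_symbol is only consulted for target libs
        for sym in symbols:
            if installer.is_target_symbol(sym):
                # Replacement name, plus the variable that receives the
                # original function's address (__origin_ + symbol).
                plan.append((lib, sym, installer.new_symbol_name(sym),
                             "__origin_" + sym))
    return plan


imports = {
    "/opt/lib/libcuda_mock_impl.so": ["malloc", "free"],
    "/usr/lib/libother.so": ["malloc"],
}
plan = plan_hooks(MallocHook(), imports)
print(plan)
```

Only malloc in libcuda_mock_impl.so is selected; malloc in the other library is skipped because is_target_lib rejects it first.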
example
python test/test_import_mock.py
debug
export LOG_LEVEL=WARN,TRACE=INFO