mock cuda runtime api
The PLT hooking technique used here is based on plthook.
It mocks the PyTorch CUDA runtime interface.
- update submodule
git submodule update --init --recursive
- build wheel package
python setup.py sdist bdist_wheel
- direct install
pip install dist/*.whl
collect cuda operator call stack
- find the path where nvcc is installed
which nvcc
- replace the system nvcc with our nvcc (ours only adds -g to the compile options)
mv /usr/local/bin/nvcc /usr/local/bin/nvcc_b
chmod 777 tools/nvcc
cp tools/nvcc /usr/local/bin/nvcc
- build and install pytorch
- build and install cuda_mock
- import cuda_mock after import torch
- run your torch training script
- the call stack is dumped to the console
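To make the nvcc replacement step concrete, here is a minimal sketch of what a wrapper of this kind could look like. It is not the contents of tools/nvcc, which this README does not show; the function names `build_command` and `main` are hypothetical, and the only behavior taken from the text is that -g is added and the real compiler was moved to /usr/local/bin/nvcc_b.

```python
import subprocess

# Path of the original compiler after the `mv` step above.
REAL_NVCC = "/usr/local/bin/nvcc_b"


def build_command(args, real_nvcc=REAL_NVCC):
    """Return the command line for the real nvcc with -g added once.

    Hypothetical helper: the actual tools/nvcc script may differ.
    """
    cmd = [real_nvcc]
    if "-g" not in args:
        cmd.append("-g")  # keep debug info so call stacks can be symbolized
    cmd.extend(args)
    return cmd


def main(argv):
    """Forward all arguments to the real nvcc and return its exit code."""
    return subprocess.call(build_command(argv[1:]))
```

A wrapper script would call `main(sys.argv)` under an `if __name__ == "__main__":` guard and pass the result to `sys.exit`.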
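collect xpu runtime memory allocation statistics / xpu_wait call stack
- print the xpu_malloc call sequence, tracking live memory usage and the historical peak, to help diagnose memory fragmentation
- print the xpu_wait call stack, to help locate where the pipeline stalls
- note: run import cuda_mock; cuda_mock.xpu_initialize() after import torch / import paddle
- usage:
import paddle
import cuda_mock; cuda_mock.xpu_initialize()  # add this line
- disable backtrace printing (capturing backtraces degrades performance significantly)
export HOOK_DISABLE_TRACE='xpuMemcpy=0,xpuSetDevice=0'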
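The live-usage and peak bookkeeping mentioned in the first item can be illustrated with a small standalone sketch. It does not use cuda_mock and does not reflect its output format; the `MemoryTracker` class, the `replay` helper, and the `(kind, address, size)` event tuples are all assumptions made for illustration.

```python
class MemoryTracker:
    """Tracks bytes currently allocated and the highest value seen so far."""

    def __init__(self):
        self.live = {}    # address -> size of allocations not yet freed
        self.current = 0  # bytes allocated right now
        self.peak = 0     # historical maximum of `current`

    def on_malloc(self, addr, size):
        self.live[addr] = size
        self.current += size
        self.peak = max(self.peak, self.current)

    def on_free(self, addr):
        # Unknown addresses are ignored rather than treated as errors.
        self.current -= self.live.pop(addr, 0)


def replay(events):
    """Feed a recorded (kind, addr, size) sequence through a tracker."""
    tracker = MemoryTracker()
    for kind, addr, size in events:
        if kind == "malloc":
            tracker.on_malloc(addr, size)
        else:
            tracker.on_free(addr)
    return tracker


events = [
    ("malloc", 0x1000, 256),
    ("malloc", 0x2000, 512),
    ("free", 0x1000, 0),
    ("malloc", 0x3000, 128),
]
tracker = replay(events)
print(tracker.current, tracker.peak)
```

A large gap between peak and current usage, combined with allocation failures, is one hint that fragmentation is worth investigating.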
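implement a custom hook function
- example of a custom hook installer:
class PythonHookInstaller(cuda_mock.HookInstaller):
    def is_target_lib(self, name):
        return name.find("libcuda_mock_impl.so") != -1
    def is_target_symbol(self, name):
        return name.find("malloc") != -1
lib = cuda_mock.dynamic_obj(cpp_code, True).appen_compile_opts('-g').compile().get_lib()
installer = PythonHookInstaller(lib)
- implement the hook callback interface PythonHookInstaller:
- constructor: takes the path of the library that contains the custom hook functions (an absolute path). The library must contain functions whose names and types match the functions being replaced. During hooking, the address of the original function is written into a variable named __origin_ followed by the target symbol, so that user code can reach the original function. See the definition at test/py_test/test_import_mock.py:15.
- is_target_lib: whether this is a library in which calls to the target function should be hooked
- is_target_symbol: whether this is the name of a function to hook (called only when is_target_lib returned True)
- new_symbol_name: returns the name of the replacement function in the shared library passed to the constructor; the parameter name is the name of the function currently about to be replaced
- dynamic_obj: compiles C++ code at runtime; the code may reference all modules: logger, statistics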
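The order in which the callbacks are consulted can be sketched with a pure-Python simulation. This is not how cuda_mock is implemented internally: `DemoInstaller` and `plan_hooks` are hypothetical stand-ins that do not touch any real library, and the `__origin_<symbol>` naming follows the description above.

```python
class DemoInstaller:
    """Stand-in exposing the same three callbacks as PythonHookInstaller."""

    def is_target_lib(self, name):
        return "libcuda_mock_impl.so" in name

    def is_target_symbol(self, name):
        return "malloc" in name

    def new_symbol_name(self, name):
        # Assumption: the replacement function keeps the original name.
        return name


def plan_hooks(installer, libs):
    """Decide which symbols would be replaced.

    libs maps a library path to the symbol names it imports. Returns a list of
    (library, symbol, replacement, origin_variable) tuples.
    """
    plan = []
    for lib, symbols in libs.items():
        if not installer.is_target_lib(lib):
            continue  # is_target_symbol is never asked for this library
        for sym in symbols:
            if installer.is_target_symbol(sym):
                replacement = installer.new_symbol_name(sym)
                plan.append((lib, sym, replacement, "__origin_" + sym))
    return plan


libs = {
    "/opt/libcuda_mock_impl.so": ["malloc", "free"],
    "/lib/libc.so.6": ["malloc"],
}
for entry in plan_hooks(DemoInstaller(), libs):
    print(entry)
```

Only malloc inside libcuda_mock_impl.so is selected here: free fails the symbol filter, and libc.so.6 is rejected by the library filter before its symbols are examined.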
example
python test/test_import_mock.py
debug
export LOG_LEVEL=WARN,TRACE=INFO