mock cuda runtime api
The PLT hooking technique used here is based on plthook.
mock pytorch cuda runtime interface
- update submodule
git submodule update --init --recursive
- build wheel package
python setup.py sdist bdist_wheel
- direct install
pip install dist/*.whl
collect cuda operator call stack
- find the nvcc install path (the commands below assume it is /usr/local/bin/nvcc; adjust them if which reports a different location)
which nvcc
- replace the system nvcc with our nvcc (ours only adds -g to the compile options)
mv /usr/local/bin/nvcc /usr/local/bin/nvcc_b
chmod 777 tools/nvcc
cp tools/nvcc /usr/local/bin/nvcc
- build and install pytorch
- build and install cuda_mock
- import cuda_mock after import torch
- run your torch training script
- the call stacks are dumped to the console
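The nvcc replacement step can be pictured as a thin wrapper that forwards every argument to the renamed original compiler and adds -g. The following is only a minimal sketch of that idea, not the contents of tools/nvcc (which is not shown here); the helper names build_command and main are hypothetical, and the path /usr/local/bin/nvcc_b is taken from the mv command above.

```python
import os

# Location of the original compiler after the "mv" step above (assumed path).
REAL_NVCC = "/usr/local/bin/nvcc_b"


def build_command(args, real_nvcc=REAL_NVCC):
    """Return the argv for the real nvcc, adding -g unless it is already present."""
    cmd = [real_nvcc, *args]
    if "-g" not in args:
        cmd.append("-g")
    return cmd


def main(argv):
    """Hypothetical entry point: hand over to the real compiler with -g added."""
    cmd = build_command(argv[1:])
    # Replace this process with the real nvcc so its exit code passes through.
    os.execv(cmd[0], cmd)
```

A wrapper script built this way would end by calling main(sys.argv). Building with -g keeps host-side debug information, which is what makes the dumped call stacks readable.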
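collect xpu runtime memory allocation statistics / xpu_wait call stacks
- print the xpu_malloc call sequence and track live memory usage and the historical peak, to help diagnose memory fragmentation
- print xpu_wait call stacks, to help locate where the pipeline stalls
- import cuda_mock and call cuda_mock.xpu_initialize() after import torch / import paddle
- usage:
import paddle
import cuda_mock; cuda_mock.xpu_initialize()  # add this line
- disable backtrace printing (capturing backtraces slows execution down considerably)
export HOOK_DISABLE_TRACE='xpuMemcpy=0,xpuSetDevice=0'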
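To make the "live usage and historical peak" statistics concrete, here is a minimal sketch of how such numbers can be derived from a sequence of allocation and free events. It is an illustration only, not cuda_mock's implementation; the MemoryTracker class, its method names, and the sample pointers and sizes are all hypothetical.

```python
class MemoryTracker:
    """Track live and peak bytes from a sequence of malloc/free events."""

    def __init__(self):
        self.live = {}    # pointer -> size of each outstanding allocation
        self.current = 0  # bytes currently allocated
        self.peak = 0     # largest value "current" has ever reached

    def on_malloc(self, ptr, size):
        self.live[ptr] = size
        self.current += size
        self.peak = max(self.peak, self.current)

    def on_free(self, ptr):
        # Frees of pointers that were never recorded are ignored.
        self.current -= self.live.pop(ptr, 0)


tracker = MemoryTracker()
tracker.on_malloc(0x1000, 256)
tracker.on_malloc(0x2000, 512)
tracker.on_free(0x1000)
tracker.on_malloc(0x3000, 128)
print(tracker.current, tracker.peak)  # prints: 640 768
```

The peak stays at 768 bytes even though only 640 bytes are live at the end, which is the kind of gap the printed allocation sequence helps to explain.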
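implement custom hook functions
- example of a custom hook installer:
import cuda_mock

class PythonHookInstaller(cuda_mock.HookInstaller):
    def is_target_lib(self, name):
        return name.find("libcuda_mock_impl.so") != -1

    def is_target_symbol(self, name):
        return name.find("malloc") != -1

# cpp_code is a string of C++ source that defines the replacement functions
lib = cuda_mock.dynamic_obj(cpp_code, True).appen_compile_opts('-g').compile().get_lib()
installer = PythonHookInstaller(lib)
- hook callback interface of PythonHookInstaller:
- constructor: takes the path (absolute) of the library that contains the custom hook functions. That library must contain a function with the same name and type as each function to be replaced. When a hook is installed, the address of the original function is written to the variable whose name is the target symbol prefixed with __origin_, so that user code can reach the original function (see the definition at test/py_test/test_import_mock.py:15)
- is_target_lib: whether this is the library in which calls to the target function should be hooked
- is_target_symbol: whether this is the name of a function to hook (called only when is_target_lib returned True)
- new_symbol_name: the name of the replacement function in the shared library passed to the constructor; the parameter name is the name of the function about to be replaced
- dynamic_obj: compiles C++ code at runtime; the code can reference all modules: logger, statistics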
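The order in which the callbacks are consulted can be sketched in plain Python. This is a simulation under stated assumptions, not cuda_mock's code: HookInstaller below is a stand-in for cuda_mock.HookInstaller so the example runs without the package, select_hooks is a hypothetical driver, the default new_symbol_name returning the unchanged name is assumed, and the library paths and symbol names are made up.

```python
class HookInstaller:
    """Stand-in for cuda_mock.HookInstaller (assumed interface)."""

    def __init__(self, lib):
        self.lib = lib  # path of the library holding the replacement functions

    def is_target_lib(self, name):
        return False

    def is_target_symbol(self, name):
        return False

    def new_symbol_name(self, name):
        # Assumed default: the replacement has the same name as the original.
        return name


class MallocHookInstaller(HookInstaller):
    def is_target_lib(self, name):
        return "libcuda_mock_impl.so" in name

    def is_target_symbol(self, name):
        return "malloc" in name


def select_hooks(installer, symbols_by_lib):
    """Return (lib, symbol, replacement) triples in the documented callback order."""
    selected = []
    for lib, symbols in symbols_by_lib.items():
        if not installer.is_target_lib(lib):
            continue  # is_target_symbol is only consulted for matching libraries
        for sym in symbols:
            if installer.is_target_symbol(sym):
                selected.append((lib, sym, installer.new_symbol_name(sym)))
    return selected


hooks = select_hooks(
    MallocHookInstaller("/abs/path/to/libhooks.so"),
    {
        "/opt/lib/libcuda_mock_impl.so": ["xpu_malloc", "xpu_free"],
        "/opt/lib/libother.so": ["malloc"],
    },
)
print(hooks)
```

Only xpu_malloc in libcuda_mock_impl.so is selected: malloc in libother.so is never offered to is_target_symbol because its library was rejected first.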
example
python test/test_import_mock.py
debug
export LOG_LEVEL=WARN,TRACE=INFO