mock cuda runtime api
The PLT hooking technique used here is based on plthook.
mock pytorch cuda runtime interface
- update submodule
git submodule update --init --recursive
- build wheel package
python setup.py sdist bdist_wheel
- direct install
pip install dist/*.whl
collect cuda operator call stack
- find nvcc installed path
which nvcc
- replace the system nvcc with our nvcc (ours only adds -g to the compile options); adjust /usr/local/bin below to the directory reported by which nvcc
mv /usr/local/bin/nvcc /usr/local/bin/nvcc_b
chmod 777 tools/nvcc
cp tools/nvcc /usr/local/bin/nvcc
- build and install pytorch
- build and install cuda_mock
- import cuda_mock after import torch
- run your torch training script
- the call stacks are printed to the console
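The replacement nvcc is described here only as adding -g to the compile options, and the contents of tools/nvcc are not shown. The following is therefore a hypothetical sketch of what such a wrapper could look like in Python, not the project's script. It assumes the original compiler was renamed to /usr/local/bin/nvcc_b as in the steps above; `build_nvcc_command` and `run_nvcc` are illustrative names.

```python
import subprocess

# Assumption: the original compiler was moved here by the `mv` step.
REAL_NVCC = "/usr/local/bin/nvcc_b"


def build_nvcc_command(args, real_nvcc=REAL_NVCC):
    """Return the command line for the real nvcc, with -g added once."""
    command = [real_nvcc, *args]
    if "-g" not in args:
        command.append("-g")
    return command


def run_nvcc(args):
    """Forward the arguments to the real nvcc and return its exit status."""
    return subprocess.call(build_nvcc_command(args))


# Show the forwarded command without invoking the compiler.
print(build_nvcc_command(["-c", "kernel.cu", "-o", "kernel.o"]))
```

A real wrapper would call `run_nvcc(sys.argv[1:])` and exit with the returned status, so that the PyTorch build sees the same behaviour as the unwrapped compiler.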
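collect xpu runtime memory allocation statistics and xpu_wait call stacks
- print the xpu_malloc call sequence and track current memory usage and the historical peak, to help diagnose memory fragmentation
- print the xpu_wait call stack, to help locate where the pipeline stalls
- note: import cuda_mock and call cuda_mock.xpu_initialize() after import torch / import paddle
- usage:
import paddle
import cuda_mock; cuda_mock.xpu_initialize()  # add this line
- disable backtrace printing (collecting backtraces slows execution down considerably)
export HOOK_DISABLE_TRACE='xpuMemcpy=0,xpuSetDevice=0'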
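To make the memory statistics concrete, here is a minimal pure-Python sketch of how current and peak usage can be derived from a sequence of allocation and free events. It is not cuda_mock's implementation: `AllocationTracker`, its methods, and the example addresses and sizes are all hypothetical.

```python
class AllocationTracker:
    """Track live and peak bytes from a stream of malloc/free events."""

    def __init__(self):
        self.live = {}          # address -> size of each outstanding allocation
        self.current_bytes = 0  # bytes allocated and not yet freed
        self.peak_bytes = 0     # highest value current_bytes has reached

    def on_malloc(self, address, size):
        self.live[address] = size
        self.current_bytes += size
        self.peak_bytes = max(self.peak_bytes, self.current_bytes)

    def on_free(self, address):
        # Frees of unknown addresses are ignored instead of raising.
        size = self.live.pop(address, 0)
        self.current_bytes -= size


tracker = AllocationTracker()
tracker.on_malloc(0x1000, 256)
tracker.on_malloc(0x2000, 512)
tracker.on_free(0x1000)
tracker.on_malloc(0x3000, 128)
print(tracker.current_bytes, tracker.peak_bytes)  # 640 768
```

Keeping the per-address table, rather than only two counters, is what lets a free be matched to the size of the allocation it releases.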
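implement custom hook functions
- example of a custom hook installer:
class PythonHookInstaller(cuda_mock.HookInstaller):
    # Select the library whose calls should be hooked.
    def is_target_lib(self, name):
        return name.find("libcuda_mock_impl.so") != -1

    # Select the symbols to replace inside that library.
    def is_target_symbol(self, name):
        return name.find("malloc") != -1

# cpp_code is the C++ source string that defines the replacement functions.
lib = cuda_mock.dynamic_obj(cpp_code, True).appen_compile_opts('-g').compile().get_lib()
installer = PythonHookInstaller(lib)
- implement the hook callback interface PythonHookInstaller:
- constructor: takes the path (absolute) of the library that contains the custom hook functions. The library must contain functions whose names and types match the functions to be replaced. During hooking, the address of the original function is written into a variable whose name is __origin_ followed by the target symbol, so that the user can obtain the original function address (see the definition at test/py_test/test_import_mock.py:15).
- is_target_lib: whether the given library is one in which the target function is called
- is_target_symbol: whether the given name is a target function to hook (called only when is_target_lib has returned True)
- new_symbol_name: the name of the replacement function in the shared library passed to the constructor; the parameter name is the name of the function about to be replaced
- dynamic_obj: compiles C++ code at runtime; the code may reference all modules: logger, statistics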
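The callback order described above can be modelled without cuda_mock installed. The sketch below is a pure-Python stand-in, not the library's code. `StubHookInstaller`, `MallocHookInstaller`, `resolve_replacement`, the example library paths, and the default behaviour of `new_symbol_name` (returning the name unchanged) are assumptions made for illustration. Only the method names `is_target_lib`, `is_target_symbol`, and `new_symbol_name` and the `__origin_` prefix come from the text, and reading that prefix as "prefix plus target symbol" is an interpretation.

```python
class StubHookInstaller:
    """Hypothetical stand-in for cuda_mock.HookInstaller."""

    def is_target_lib(self, name):
        return False

    def is_target_symbol(self, name):
        return False

    def new_symbol_name(self, name):
        # Assumed default: the replacement has the same name as the original.
        return name


class MallocHookInstaller(StubHookInstaller):
    def is_target_lib(self, name):
        return "libcuda_mock_impl.so" in name

    def is_target_symbol(self, name):
        return "malloc" in name


def resolve_replacement(installer, lib_name, symbol):
    """Return (replacement name, origin variable name), or None if not hooked."""
    if not installer.is_target_lib(lib_name):
        return None
    # The symbol filter is consulted only once the library has been accepted.
    if not installer.is_target_symbol(symbol):
        return None
    return installer.new_symbol_name(symbol), "__origin_" + symbol


installer = MallocHookInstaller()
print(resolve_replacement(installer, "/opt/libcuda_mock_impl.so", "xpu_malloc"))
print(resolve_replacement(installer, "/opt/libother.so", "xpu_malloc"))
```

Note that the substring test is case-sensitive: a symbol such as `cudaMalloc` would not match `"malloc"` in this sketch.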
example
python test/test_import_mock.py
debug
export LOG_LEVEL=WARN,TRACE=INFO
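When a script is launched from Python rather than from a shell, the same variables can be set through `os.environ`. The sketch below assumes that cuda_mock reads LOG_LEVEL and HOOK_DISABLE_TRACE when it is imported or initialized, which this README does not state, so it sets them before the import. `format_pairs` is a hypothetical helper.

```python
import os


def format_pairs(pairs):
    """Join a dict into the comma-separated key=value form used above."""
    return ",".join(f"{key}={value}" for key, value in pairs.items())


# Values copied from the export lines in this README.
os.environ["LOG_LEVEL"] = "WARN,TRACE=INFO"
os.environ["HOOK_DISABLE_TRACE"] = format_pairs({"xpuMemcpy": 0, "xpuSetDevice": 0})

# import torch
# import cuda_mock  # import only after the environment is prepared
print(os.environ["HOOK_DISABLE_TRACE"])  # xpuMemcpy=0,xpuSetDevice=0
```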