Mock CUDA runtime API
The PLT hooking used here is based on plthook.
Mocks the PyTorch CUDA runtime interface.
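For readers new to PLT hooking, the Python toy below models the idea. A library calls an external symbol through a per-library lookup table (the GOT that the PLT stubs jump through), so rewriting one table entry redirects every call without touching the callee. This is only an analogy, not how cuda_mock or plthook are implemented; `call_external`, `replace_entry` and the string-returning `real_malloc` are made up for illustration. As with plthook's `plthook_replace`, `replace_entry` hands back the previous target so the hook can still reach the original.

```python
from typing import Callable, Dict

# Toy stand-in for one library's GOT: calls to external symbols are
# dispatched through this table instead of jumping to the callee directly.
GotTable = Dict[str, Callable[..., str]]


def real_malloc(size: int) -> str:
    """Pretend allocator; returns a string so the call path is visible."""
    return f"real_malloc({size})"


def call_external(got: GotTable, symbol: str, *args: int) -> str:
    """Simulate a PLT stub: fetch the current target from the table and call it."""
    return got[symbol](*args)


def replace_entry(got: GotTable, symbol: str, new_func: Callable[..., str]) -> Callable[..., str]:
    """Patch one table entry and return the previous target."""
    old_func = got[symbol]
    got[symbol] = new_func
    return old_func


# Demo: install a hook that forwards to the saved original.
got: GotTable = {"malloc": real_malloc}
before = call_external(got, "malloc", 16)

saved: Dict[str, Callable[..., str]] = {}  # plays the role of the saved original address


def hooked_malloc(size: int) -> str:
    return "hooked -> " + saved["malloc"](size)


saved["malloc"] = replace_entry(got, "malloc", hooked_malloc)
after = call_external(got, "malloc", 16)
print(before)  # real_malloc(16)
print(after)   # hooked -> real_malloc(16)
```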
- Update submodules:
git submodule update --init --recursive
- Build the wheel package:
python setup.py sdist bdist_wheel
- Install directly:
pip install dist/*.whl
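Collect CUDA operator call stacks
- Find where nvcc is installed:
which nvcc
- Replace the system nvcc with our nvcc wrapper, which only adds -g to the compile options (the commands below assume nvcc is at /usr/local/bin/nvcc; substitute the path printed by which nvcc):
mv /usr/local/bin/nvcc /usr/local/bin/nvcc_b
chmod 777 tools/nvcc
cp tools/nvcc /usr/local/bin/nvcc
- Build and install PyTorch
- Build and install cuda_mock
- Import cuda_mock after import torch
- Run your torch training script
- The call stacks are dumped to the console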
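The wrapper's only job, as described above, is to add -g before handing off to the renamed compiler. The actual tools/nvcc is not shown here, so the following Python sketch is an assumption about what such a wrapper could look like; `REAL_NVCC`, `build_command` and `main` are hypothetical names, and `REAL_NVCC` must match wherever you moved the original nvcc.

```python
import os
import sys
from typing import List

# Hypothetical: where the original compiler was moved by the `mv` step above.
REAL_NVCC = "/usr/local/bin/nvcc_b"


def build_command(argv: List[str], real_nvcc: str = REAL_NVCC) -> List[str]:
    """Return the real nvcc command line with -g appended (once)."""
    args = list(argv)
    if "-g" not in args:
        # -g asks nvcc for host-side debug info, so stacks map back to source lines
        args.append("-g")
    return [real_nvcc] + args


def main() -> None:
    cmd = build_command(sys.argv[1:])
    # Replace this process with the real compiler; its exit status becomes ours.
    os.execv(cmd[0], cmd)
```

Saved as an executable script, the wrapper would end with a call to main(). Forwarding with os.execv leaves nvcc's output and exit code untouched, which matters because build systems rely on the compiler's exit code.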
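Collect XPU runtime memory allocation statistics and xpu_wait call stacks
- Print the xpu_malloc call sequence and track real-time memory usage together with the historical peak, to help diagnose memory fragmentation
- Print xpu_wait call stacks to help locate where the pipeline stalls
- Note: import cuda_mock and call cuda_mock.xpu_initialize() only after import torch / import paddle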
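The statistics described above amount to bookkeeping over the allocation sequence. A minimal sketch, assuming each xpu_malloc/free call is reported as a (pointer, size) event; `MemoryTracker` and its methods are hypothetical and not part of cuda_mock:

```python
from typing import Dict


class MemoryTracker:
    """Track bytes currently allocated and the historical peak."""

    def __init__(self) -> None:
        self.live: Dict[int, int] = {}  # pointer -> size of each outstanding allocation
        self.current = 0
        self.peak = 0

    def on_malloc(self, ptr: int, size: int) -> None:
        self.live[ptr] = size
        self.current += size
        self.peak = max(self.peak, self.current)

    def on_free(self, ptr: int) -> None:
        # Freeing an unknown pointer is ignored instead of raising.
        self.current -= self.live.pop(ptr, 0)


tracker = MemoryTracker()
tracker.on_malloc(0x1000, 256)
tracker.on_malloc(0x2000, 512)
tracker.on_free(0x1000)
tracker.on_malloc(0x3000, 128)
print(tracker.current, tracker.peak)  # 640 768
```

One rough way to use these numbers: if the device runs out of memory while the tracked usage is well below its capacity, fragmentation rather than genuine over-use is the likely cause.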
Usage:
import paddle
import cuda_mock; cuda_mock.xpu_initialize()  # add this line
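The ordering requirement is presumably there because PLT hooks can only patch libraries that are already loaded, so torch or paddle (and their runtime libraries) must be in the process before cuda_mock installs its hooks. A hypothetical guard, not part of cuda_mock, that fails fast when the order is wrong:

```python
import sys
from typing import Iterable, List


def check_import_order(frameworks: Iterable[str] = ("torch", "paddle")) -> List[str]:
    """Return the already-imported frameworks; raise if none is loaded yet."""
    loaded = [name for name in frameworks if name in sys.modules]
    if not loaded:
        raise RuntimeError("import torch or paddle before importing cuda_mock")
    return loaded
```

Called right after import paddle and before import cuda_mock, it returns ['paddle']; called too early, it raises instead of letting the hooks silently miss their targets.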
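Implementing custom hook functions
- Example of a custom hook installer:
import cuda_mock

class PythonHookInstaller(cuda_mock.HookInstaller):
    def is_target_lib(self, name):
        # hook calls made from this library
        return "libcuda_mock_impl.so" in name

    def is_target_symbol(self, name):
        # hook every symbol whose name contains "malloc"
        return "malloc" in name

# cpp_code: a string of C++ source that defines the replacement functions
lib = cuda_mock.dynamic_obj(cpp_code, True).appen_compile_opts('-g').compile().get_lib()
installer = PythonHookInstaller(lib)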
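- Implement the hook callbacks of PythonHookInstaller:
  - Constructor: takes the path of the library that contains the custom hook functions. The path must be absolute, and the library must contain functions with the same names and types as the functions being replaced. While a hook is being installed, the address of the original function is written into a variable named __origin_ followed by the target symbol name, so user code can still call the original function (see the definition at test/py_test/test_import_mock.py:15).
  - is_target_lib: whether name is a library whose calls to the target function should be hooked
  - is_target_symbol: whether name is a function to hook (called only when is_target_lib returned True)
  - new_symbol_name: returns the name of the replacement function in the shared library passed to the constructor; the parameter name is the name of the function about to be replaced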
- dynamic_obj: compiles C++ code at runtime; the code can reference all modules, namely logger and statistics
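The callback order described above can be modeled in plain Python. In this sketch `plan_hooks` and `DemoInstaller` are hypothetical: they do not use cuda_mock.HookInstaller and only show which callbacks run, in which order, and which __origin_ variable would receive each original address (that naming is inferred from the constructor description). `new_symbol_name` here assumes the replacement keeps the original name, consistent with the requirement that the hook library define functions with the same names.

```python
from typing import Dict, List, Tuple


class DemoInstaller:
    """Same predicates as the example installer; not a cuda_mock class."""

    def __init__(self) -> None:
        self.symbol_queries: List[str] = []  # records every is_target_symbol call

    def is_target_lib(self, name: str) -> bool:
        return "libcuda_mock_impl.so" in name

    def is_target_symbol(self, name: str) -> bool:
        self.symbol_queries.append(name)
        return "malloc" in name

    def new_symbol_name(self, name: str) -> str:
        # Assumption: the replacement function keeps the original name.
        return name


def plan_hooks(installer: DemoInstaller, loaded: Dict[str, List[str]]) -> List[Tuple[str, str, str, str]]:
    """Return (library, symbol, replacement, origin variable) for each hook.

    is_target_symbol is only consulted for libraries accepted by is_target_lib.
    """
    plan = []
    for lib, symbols in loaded.items():
        if not installer.is_target_lib(lib):
            continue
        for sym in symbols:
            if installer.is_target_symbol(sym):
                plan.append((lib, sym, installer.new_symbol_name(sym), "__origin_" + sym))
    return plan


loaded = {
    "/opt/x/libcuda_mock_impl.so": ["malloc", "free", "xpu_malloc"],
    "/usr/lib/libfoo.so": ["malloc"],
}
installer = DemoInstaller()
for entry in plan_hooks(installer, loaded):
    print(entry)
```

Only the two malloc-like symbols of libcuda_mock_impl.so end up in the plan; libfoo.so is rejected by is_target_lib, so is_target_symbol is never asked about its symbols.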
Example
python test/test_import_mock.py
Debug
export LOG_LEVEL=WARN,TRACE=INFO
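According to the environment-variable table, LOG_LEVEL combines a bare global level with MODULE=LEVEL pairs, so WARN,TRACE=INFO sets the global level to WARN and the TRACE module to INFO. A hypothetical parser that mirrors the documented defaults; cuda_mock's own parsing may differ, for example in how it treats unknown names, which this sketch simply ignores:

```python
import os
from typing import Dict, Tuple

LEVELS = ("INFO", "WARN", "ERROR", "FATAL")
MODULES = ("PROFILE", "TRACE", "HOOK", "PYTHON", "LAST")


def parse_log_level(value: str) -> Tuple[str, Dict[str, str]]:
    """Split e.g. 'WARN,TRACE=INFO' into a global level and per-module levels."""
    global_level = "WARN"                   # documented global default
    modules = {m: "INFO" for m in MODULES}  # documented per-module default
    for item in filter(None, (part.strip() for part in value.split(","))):
        if "=" in item:
            module, level = (s.strip().upper() for s in item.split("=", 1))
            if module in modules and level in LEVELS:
                modules[module] = level
        elif item.upper() in LEVELS:
            global_level = item.upper()
    return global_level, modules


print(parse_log_level(os.environ.get("LOG_LEVEL", "WARN,TRACE=INFO")))
```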
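Environment variables
Environment variable | Example | Allowed values | Default | Description |
---|---|---|---|---|
LOG_LEVEL | export LOG_LEVEL=WARN,TRACE=INFO | Levels: INFO, WARN, ERROR, FATAL; modules: PROFILE, TRACE, HOOK, PYTHON, LAST | The global log level defaults to WARN; each module defaults to INFO | Global log level and per-module log levels |
HOOK_ENABLE_TRACE | export HOOK_ENABLE_TRACE='xpuMemcpy=1,xpuSetDevice=0' | xpuMalloc, xpuFree, xpuWait, xpuMemcpy, xpuSetDevice, xpuCurrentDeviceId | Every interface defaults to 0, i.e. backtrace is disabled for all interfaces | Whether to enable backtrace for each interface |
LOG_OUTPUT_PATH | export LOG_OUTPUT_PATH='/tmp/' | Log output directory | - | Whether to redirect logs to a file |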
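HOOK_ENABLE_TRACE is a comma-separated list of interface=0|1 switches, with every interface off by default. A hypothetical parser reflecting the table; treating only the value 1 as "on" and ignoring unknown interface names are assumptions of this sketch:

```python
import os
from typing import Dict

TRACEABLE = ("xpuMalloc", "xpuFree", "xpuWait", "xpuMemcpy",
             "xpuSetDevice", "xpuCurrentDeviceId")


def parse_hook_enable_trace(value: str) -> Dict[str, bool]:
    """Turn 'xpuMemcpy=1,xpuSetDevice=0' into {interface: backtrace_enabled}."""
    enabled = {name: False for name in TRACEABLE}  # documented default: all 0
    for item in value.split(","):
        name, sep, flag = item.strip().partition("=")
        if sep and name in enabled:
            enabled[name] = flag.strip() == "1"
    return enabled


settings = parse_hook_enable_trace(os.environ.get("HOOK_ENABLE_TRACE", ""))
print(settings)
```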