Mock CUDA runtime API
The PLT hooking technique used here is based on plthook.
Mocks the CUDA runtime interface used by PyTorch.
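A dynamically linked library calls external functions such as cudaMalloc through its own PLT/GOT entries. plthook works by overwriting one library's GOT slot with a new address and handing back the old one. The Python sketch below is only a toy analogy of that mechanism, not cuda_mock's code. A dict stands in for the GOT, and ImportTable, real_cuda_malloc and mock_cuda_malloc are hypothetical names.

```python
class ImportTable:
    """Toy stand-in for one library's GOT: symbol name -> function it resolves to."""

    def __init__(self, entries):
        self.slots = dict(entries)

    def call(self, name, *args):
        # Calls made by this library are dispatched through its own table, like PLT stubs.
        return self.slots[name](*args)

    def replace(self, name, new_func):
        """Patch one slot and return the previous target (plthook_replace reports it via oldfunc)."""
        old = self.slots[name]
        self.slots[name] = new_func
        return old


def real_cuda_malloc(size):
    # Hypothetical "real" runtime function.
    return f"device buffer of {size} bytes"


torch_table = ImportTable({"cudaMalloc": real_cuda_malloc})
other_table = ImportTable({"cudaMalloc": real_cuda_malloc})
recorded_sizes = []


def mock_cuda_malloc(size):
    # The mock records the call, then forwards to the saved original.
    recorded_sizes.append(size)
    return origin_cuda_malloc(size)


origin_cuda_malloc = torch_table.replace("cudaMalloc", mock_cuda_malloc)

print(torch_table.call("cudaMalloc", 64))  # device buffer of 64 bytes (via the mock)
print(other_table.call("cudaMalloc", 32))  # device buffer of 32 bytes (unpatched library)
print(recorded_sizes)                      # [64]
```

Only the patched library's table is affected. Any other library that imports the same symbol keeps calling the original function.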
- Update submodules:
git submodule update --init --recursive
- Build the wheel package:
python setup.py sdist bdist_wheel
- Install directly:
pip install dist/*.whl
Collect CUDA operator call stacks
- Find where nvcc is installed:
which nvcc
- Replace the system nvcc with our nvcc (it only adds -g to the compile options). The commands below assume nvcc is at /usr/local/bin/nvcc; adjust the paths to match the output of which nvcc:
mv /usr/local/bin/nvcc /usr/local/bin/nvcc_b
chmod 777 tools/nvcc
cp tools/nvcc /usr/local/bin/nvcc
- Build and install PyTorch
- Build and install cuda_mock
- Import cuda_mock after importing torch
- Run your torch training script
- The call stacks are dumped to the console
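The replacement nvcc only needs to forward its arguments to the renamed compiler with -g added. The contents of tools/nvcc are not shown here, so the following Python wrapper is a hypothetical equivalent, not the shipped script. It assumes the original compiler was moved to /usr/local/bin/nvcc_b as in the steps above; build_command and main are illustrative names.

```python
#!/usr/bin/env python3
import os
import sys

# Assumption: the original compiler was renamed to this path (see the mv step above).
REAL_NVCC = "/usr/local/bin/nvcc_b"


def build_command(args, real_nvcc=REAL_NVCC):
    """Return argv for the real nvcc with -g (host debug info) added once."""
    extra = [] if "-g" in args else ["-g"]
    return [real_nvcc, *extra, *args]


def main(argv):
    cmd = build_command(argv)
    # Replace this process with the real compiler so its exit status is returned as-is.
    os.execv(cmd[0], cmd)


# Only exec when there is something to compile; a bare invocation does nothing here.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

If this sketch were installed in place of nvcc, a build step running nvcc -c kernel.cu would execute /usr/local/bin/nvcc_b -g -c kernel.cu. The resulting host debug info lets the dumped stacks resolve to source locations.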
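Collect XPU runtime memory allocation statistics and xpu_wait call stacks
- Print the xpu_malloc call sequence and track real-time memory usage and historical peak memory usage, to troubleshoot memory fragmentation
- Print xpu_wait call stacks, to locate where the pipeline stalls
- Note: import cuda_mock and call cuda_mock.xpu_initialize() after import torch / import paddle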
Usage:
import paddle
import cuda_mock; cuda_mock.xpu_initialize()  # add this line
or
import torch
import cuda_mock; cuda_mock.xpu_initialize()  # add this line
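The real-time and peak memory figures above come down to simple bookkeeping on each malloc/free pair. This is a minimal sketch that assumes the hook sees a pointer and a size for every allocation and a pointer for every free. MemoryTracker and its on_malloc/on_free methods are hypothetical names, not part of cuda_mock's API.

```python
class MemoryTracker:
    """Hypothetical tracker fed by malloc/free hook callbacks."""

    def __init__(self):
        self.live = {}     # pointer -> size of each outstanding allocation
        self.current = 0   # bytes allocated right now
        self.peak = 0      # largest value `current` has ever reached
        self.events = []   # call sequence: (op, pointer, size, current after the op)

    def on_malloc(self, ptr, size):
        self.live[ptr] = size
        self.current += size
        self.peak = max(self.peak, self.current)
        self.events.append(("malloc", ptr, size, self.current))

    def on_free(self, ptr):
        # Unknown pointers are ignored so a stray free cannot corrupt the totals.
        size = self.live.pop(ptr, 0)
        self.current -= size
        self.events.append(("free", ptr, size, self.current))


tracker = MemoryTracker()
tracker.on_malloc(0x1000, 256)
tracker.on_malloc(0x2000, 512)
tracker.on_free(0x1000)
tracker.on_malloc(0x3000, 128)
print(tracker.current, tracker.peak)  # 640 768
for event in tracker.events:
    print(event)
```

Printing tracker.events yields an allocation sequence of the kind described above, with the running total after each call.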
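Implementing custom hook functions
- Example of a custom hook installer:
import cuda_mock

class PythonHookInstaller(cuda_mock.HookInstaller):
    # Hook calls made from libraries whose name contains libcuda_mock_impl.so
    def is_target_lib(self, name):
        return "libcuda_mock_impl.so" in name

    # Within those libraries, hook every symbol whose name contains "malloc"
    def is_target_symbol(self, name):
        return "malloc" in name

# cpp_code: a string of C++ source defining the replacement functions
# Compile it at runtime with -g and get the resulting library
lib = cuda_mock.dynamic_obj(cpp_code, True).appen_compile_opts('-g').compile().get_lib()
installer = PythonHookInstaller(lib)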
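- Hook callback interfaces to implement in PythonHookInstaller:
- Constructor: takes the path of the library containing the custom hook functions (it must be an absolute path). The library must define functions whose names and signatures match the functions being replaced. During hooking, the address of each original function is written into a variable named __origin_ followed by the target symbol name, so you can retrieve the original function address (see the definition at test/py_test/test_import_mock.py:15).
- is_target_lib: returns whether the given library is one whose calls to the target function should be hooked
- is_target_symbol: returns whether the given name is a target function to hook (called only when is_target_lib returned True)
- new_symbol_name: returns the name of the replacement function in the shared library passed to the constructor; parameter name: the name of the function about to be replaced
- dynamic_obj: compiles C++ code at runtime; the code can reference all modules: logger, statistics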
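The callback order described above can be modeled in plain Python. This is a hypothetical sketch, not cuda_mock's implementation. HookInstallerSketch, install and the dict-based symbol tables are assumptions, and a dict of Python functions stands in for the hook library. new_symbol_name is also assumed to default to the original name.

```python
class HookInstallerSketch:
    """Hypothetical model of the HookInstaller callback order (not cuda_mock's code)."""

    def __init__(self, replacements):
        # Stands in for the hook library: replacement name -> function.
        self.replacements = replacements
        # Mirrors the __origin_<symbol> variables: original functions by name.
        self.origins = {}

    def is_target_lib(self, name):
        return False

    def is_target_symbol(self, name):
        return False

    def new_symbol_name(self, name):
        # Assumption: by default the replacement has the same name as the original.
        return name

    def install(self, libraries):
        """Patch libraries in place; each maps library name -> {symbol: function}."""
        for lib_name, table in libraries.items():
            if not self.is_target_lib(lib_name):
                continue  # is_target_symbol is only asked about target libraries
            for symbol in list(table):
                if not self.is_target_symbol(symbol):
                    continue
                new_name = self.new_symbol_name(symbol)
                if new_name in self.replacements:
                    self.origins["__origin_" + symbol] = table[symbol]
                    table[symbol] = self.replacements[new_name]


class MallocInstaller(HookInstallerSketch):
    # Same filters as the PythonHookInstaller example above.
    def is_target_lib(self, name):
        return "libcuda_mock_impl.so" in name

    def is_target_symbol(self, name):
        return "malloc" in name


def real_malloc(size):
    return ("real", size)


def real_free(ptr):
    return ("freed", ptr)


def hooked_malloc(size):
    # Forward to the original through the saved __origin_ entry.
    return ("hooked",) + installer.origins["__origin_malloc"](size)


libraries = {
    "libcuda_mock_impl.so": {"malloc": real_malloc, "free": real_free},
    "libother.so": {"malloc": real_malloc},
}
installer = MallocInstaller({"malloc": hooked_malloc})
installer.install(libraries)
print(libraries["libcuda_mock_impl.so"]["malloc"](8))  # ('hooked', 'real', 8)
```

Here free is left alone because it fails is_target_symbol. libother.so is never asked about its symbols because it fails is_target_lib.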
Example
python test/test_import_mock.py
Debug
export LOG_LEVEL=WARN,TRACE=INFO
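Environment variables
Environment variable | Example | Allowed values | Default | Description |
---|---|---|---|---|
LOG_LEVEL | export LOG_LEVEL=WARN,TRACE=INFO | Log levels: INFO, WARN, ERROR, FATAL; log modules: PROFILE, TRACE, HOOK, PYTHON, LAST | The global log level defaults to WARN; each log module defaults to INFO | Global log level and per-module log levels |
HOOK_ENABLE_TRACE | export HOOK_ENABLE_TRACE='xpuMemcpy=1,xpuSetDevice=0' | xpuMalloc, xpuFree, xpuWait, xpuMemcpy, xpuSetDevice, xpuCurrentDeviceId | Every interface defaults to 0, i.e. backtrace is disabled for all interfaces | Whether to enable backtrace for each interface |
LOG_OUTPUT_PATH | export LOG_OUTPUT_PATH='/tmp/' | Log output directory | - | Whether to redirect logs to files |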
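The comma-separated formats in the table can be read as follows. This is a minimal sketch with three assumptions: a bare level sets the global level, MODULE=LEVEL overrides one module, and unknown entries are ignored. parse_log_level and parse_hook_enable_trace are hypothetical helpers, not functions exported by cuda_mock. The defaults follow the table.

```python
import os

LEVELS = ("INFO", "WARN", "ERROR", "FATAL")
MODULES = ("PROFILE", "TRACE", "HOOK", "PYTHON", "LAST")
TRACEABLE = ("xpuMalloc", "xpuFree", "xpuWait", "xpuMemcpy", "xpuSetDevice", "xpuCurrentDeviceId")


def parse_log_level(value):
    """Parse e.g. 'WARN,TRACE=INFO' into (global_level, {module: level})."""
    global_level = "WARN"                    # documented global default
    modules = {m: "INFO" for m in MODULES}   # documented per-module default
    for item in filter(None, (s.strip() for s in value.split(","))):
        if "=" in item:
            module, level = (p.strip() for p in item.split("=", 1))
            if module in MODULES and level in LEVELS:
                modules[module] = level
        elif item in LEVELS:
            global_level = item
    return global_level, modules


def parse_hook_enable_trace(value):
    """Parse e.g. 'xpuMemcpy=1,xpuSetDevice=0' into {api: bool}; all default to off."""
    enabled = {api: False for api in TRACEABLE}
    for item in filter(None, (s.strip() for s in value.split(","))):
        api, _, flag = item.partition("=")
        if api.strip() in enabled:
            enabled[api.strip()] = flag.strip() == "1"
    return enabled


log_level = parse_log_level(os.environ.get("LOG_LEVEL", ""))
trace_flags = parse_hook_enable_trace(os.environ.get("HOOK_ENABLE_TRACE", ""))
print(log_level[0], [api for api, on in trace_flags.items() if on])
```

With both variables unset, this prints WARN [], which matches the documented defaults.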