mock cuda runtime api
The PLT hooking technique used here is based on plthook.
mock pytorch cuda runtime interface
- update submodule
git submodule update --init --recursive
- build wheel package
python setup.py sdist bdist_wheel
- direct install
pip install dist/*.whl
collect cuda operator call stack
- find nvcc installed path
which nvcc
- replace the system nvcc with our nvcc wrapper (it only adds -g to the compile options); adjust the paths below if which nvcc reports a different location
mv /usr/local/bin/nvcc /usr/local/bin/nvcc_b
chmod 777 tools/nvcc
cp tools/nvcc /usr/local/bin/nvcc
- build and install pytorch
- build and install cuda_mock
- import cuda_mock after import torch
- run your torch training script
- the call stacks are dumped to the console
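The import-order requirement above can be enforced with a small guard in the training script. This is a minimal sketch: `frameworks_loaded` and `import_cuda_mock_checked` are hypothetical helper names that are not part of cuda_mock, and the only assumption taken from this README is that torch (or paddle) must already be imported before cuda_mock.

```python
import importlib
import sys


def frameworks_loaded(modules=None, candidates=("torch", "paddle")):
    """Return the candidate frameworks that are already imported."""
    # Hypothetical helper: looks names up in sys.modules unless a mapping is given.
    modules = sys.modules if modules is None else modules
    return [name for name in candidates if name in modules]


def import_cuda_mock_checked(modules=None):
    """Import cuda_mock only if torch or paddle was imported first."""
    if not frameworks_loaded(modules):
        raise RuntimeError("import torch (or paddle) before importing cuda_mock")
    # Raises ImportError if cuda_mock is not installed.
    return importlib.import_module("cuda_mock")
```

In a training script this would be called once, right after `import torch`, in place of a bare `import cuda_mock`.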
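collect xpu runtime memory allocation statistics and xpu_wait call stacks
- print the xpu_malloc call sequence and track current memory usage and the historical peak, to help diagnose memory fragmentation
- print xpu_wait call stacks, to help locate pipeline stalls
- note: run import cuda_mock; cuda_mock.xpu_initialize() after import torch / import paddle
- usage:
import paddle
import cuda_mock; cuda_mock.xpu_initialize()  # add this line
- disable backtrace printing (collecting backtraces slows execution considerably)
export HOOK_DISABLE_TRACE='xpuMemcpy=0,xpuSetDevice=0'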
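To make the "current usage and historical peak" bookkeeping concrete, here is a minimal sketch of how such statistics can be derived from a sequence of allocation and free events. `AllocationTracker` is a hypothetical class written for illustration; it does not reflect how cuda_mock implements its statistics, and the addresses and sizes are made up.

```python
class AllocationTracker:
    """Tracks live and peak bytes from a stream of malloc/free events."""

    def __init__(self):
        self.live = {}     # address -> size of each live allocation
        self.current = 0   # bytes currently allocated
        self.peak = 0      # highest value `current` has reached

    def on_malloc(self, address, size):
        self.live[address] = size
        self.current += size
        self.peak = max(self.peak, self.current)

    def on_free(self, address):
        # Unknown addresses are ignored rather than treated as errors.
        self.current -= self.live.pop(address, 0)


tracker = AllocationTracker()
tracker.on_malloc(0x1000, 256)
tracker.on_malloc(0x2000, 512)
tracker.on_free(0x1000)
tracker.on_malloc(0x3000, 128)
print(tracker.current, tracker.peak)  # prints: 640 768
```

A peak that stays well above the current usage across many iterations is one hint worth checking when investigating fragmentation.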
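implement custom hook functions
- example of a custom hook installer:
class PythonHookInstaller(cuda_mock.HookInstaller):
    def is_target_lib(self, name):
        # hook calls made from this library
        return name.find("libcuda_mock_impl.so") != -1
    def is_target_symbol(self, name):
        # hook every symbol whose name contains "malloc"
        return name.find("malloc") != -1
# cpp_code holds the C++ source of the replacement functions (not shown here)
lib = cuda_mock.dynamic_obj(cpp_code, True).appen_compile_opts('-g').compile().get_lib()
installer = PythonHookInstaller(lib)
- implement the hook callback interface PythonHookInstaller
- constructor: takes the path of the library that contains the custom hook functions. The path must be absolute, and the library must contain functions whose names and types match the functions being replaced. During hooking, the address of the original function is written into a variable whose name is __origin_ followed by the target symbol, so that users can call the original function (see the definition at test/py_test/test_import_mock.py:15)
- is_target_lib: whether the given library is one from which the target function is called
- is_target_symbol: whether the given name is a function to hook (called only when is_target_lib returned True)
- new_symbol_name: the name of the replacement function in the shared library passed to the constructor; the parameter name is the name of the function about to be replaced
- dynamic_obj: compiles C++ code at runtime; the code may reference all modules: logger, statistics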
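The order in which the callbacks are consulted can be illustrated without the real library. The sketch below is a pure-Python stand-in: `HookInstallerSketch`, `MallocInstaller`, `plan_hooks`, the library paths, and the symbol lists are all hypothetical, and the default behaviour of `new_symbol_name` (returning the same name) is an assumption. It only mirrors the sequence described above: `is_target_lib` first, then `is_target_symbol`, then `new_symbol_name`, with the original address stored under an `__origin_`-prefixed name.

```python
class HookInstallerSketch:
    """Stand-in for cuda_mock.HookInstaller; not the real base class."""

    def __init__(self, lib_path):
        self.lib_path = lib_path  # absolute path of the replacement library

    def is_target_lib(self, name):
        return False

    def is_target_symbol(self, name):
        return False

    def new_symbol_name(self, name):
        return name  # assumption: replacement keeps the original name


class MallocInstaller(HookInstallerSketch):
    def is_target_lib(self, name):
        return "libcuda_mock_impl.so" in name

    def is_target_symbol(self, name):
        return "malloc" in name


def plan_hooks(installer, libraries):
    """Return (library, symbol, replacement, origin variable) for each hook."""
    plan = []
    for lib, symbols in libraries.items():
        if not installer.is_target_lib(lib):
            continue  # symbols of non-target libraries are never inspected
        for symbol in symbols:
            if installer.is_target_symbol(symbol):
                replacement = installer.new_symbol_name(symbol)
                plan.append((lib, symbol, replacement, "__origin_" + symbol))
    return plan


libraries = {
    "/opt/lib/libcuda_mock_impl.so": ["xpu_malloc", "xpu_wait"],
    "/usr/lib/libother.so": ["malloc"],
}
plan = plan_hooks(MallocInstaller("/abs/path/to/libhooks.so"), libraries)
print(plan)
```

Note that `malloc` in `libother.so` is not selected, because the library filter runs before the symbol filter.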
example
python test/test_import_mock.py
debug
export LOG_LEVEL=WARN,TRACE=INFO
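The value format of LOG_LEVEL combines one global level with optional per-module overrides. The following is a minimal sketch of how such a string could be interpreted; `parse_log_level` is a hypothetical function, not cuda_mock's parser, and the module names and defaults are taken from the environment variable table in this README.

```python
LOG_MODULES = ("PROFILE", "TRACE", "HOOK", "PYTHON", "LAST")


def parse_log_level(value, default_global="WARN", default_module="INFO"):
    """Split 'WARN,TRACE=INFO' into a global level and per-module levels."""
    global_level = default_global
    module_levels = {module: default_module for module in LOG_MODULES}
    for item in (part.strip() for part in value.split(",")):
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        if "=" in item:
            module, level = item.split("=", 1)
            module_levels[module.strip()] = level.strip()
        else:
            global_level = item  # a bare entry sets the global level
    return global_level, module_levels


global_level, module_levels = parse_log_level("ERROR,HOOK=WARN")
print(global_level, module_levels["HOOK"], module_levels["TRACE"])
```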
environment variables
Environment variable | Example | Allowed values | Default | Description |
---|---|---|---|---|
LOG_LEVEL | export LOG_LEVEL=WARN,TRACE=INFO | Log levels: INFO, WARN, ERROR, FATAL. Log modules: PROFILE, TRACE, HOOK, PYTHON, LAST | Global log level defaults to WARN; each log module defaults to INFO | Global log level and per-module log levels |
HOOK_DISABLE_TRACE | export HOOK_DISABLE_TRACE='xpuMemcpy=0,xpuSetDevice=0' | xpuMalloc, xpuFree, xpuWait, xpuMemcpy, xpuSetDevice, xpuCurrentDeviceId | Every interface defaults to 1, i.e. backtrace is disabled for all interfaces by default | Whether to disable backtrace |
LOG_OUTPUT_PATH | export LOG_OUTPUT_PATH='cuda_mock.log' | File path | - | Whether to redirect logs to a file |
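HOOK_DISABLE_TRACE is a comma-separated list of interface=value pairs layered over per-interface defaults. A minimal sketch of that layering follows; `parse_hook_disable_trace` is a hypothetical function and not the library's own parser, while the interface names and the default of 1 come from the table above.

```python
INTERFACES = (
    "xpuMalloc", "xpuFree", "xpuWait",
    "xpuMemcpy", "xpuSetDevice", "xpuCurrentDeviceId",
)


def parse_hook_disable_trace(value):
    """Return {interface: flag}, starting from the documented default of 1."""
    flags = {name: 1 for name in INTERFACES}
    for item in (part.strip() for part in value.split(",")):
        name, sep, raw = item.partition("=")
        if not sep:
            continue  # skip entries without '=' instead of guessing a value
        flags[name.strip()] = int(raw)
    return flags


flags = parse_hook_disable_trace("xpuMemcpy=0,xpuSetDevice=0")
print(flags["xpuMemcpy"], flags["xpuSetDevice"], flags["xpuMalloc"])  # prints: 0 0 1
```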