mock cuda runtime api
The PLT hook technique used here is based on plthook
mock pytorch cuda runtime interface
- update submodules
git submodule update --init --recursive
- build the wheel package
python setup.py sdist bdist_wheel
- install directly
pip install dist/*.whl
collect cuda operator call stack
- find the path where nvcc is installed
which nvcc
- replace the system nvcc with ours (our version only adds -g to the compile options)
mv /usr/local/bin/nvcc /usr/local/bin/nvcc_b
chmod 777 tools/nvcc
cp tools/nvcc /usr/local/bin/nvcc
- build and install pytorch
- build and install cuda_mock
- import cuda_mock after import torch
- run your torch training script
- the call stack is dumped to the console
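The nvcc replacement step can be pictured with a small Python sketch. It is not the contents of tools/nvcc, which are not shown here; it only illustrates the idea of forwarding every argument to the renamed compiler with -g added. The function name build_nvcc_command is hypothetical, and the path to nvcc_b is taken from the mv command in the steps.

```python
# Assumption: the original compiler was renamed to this path by the mv step.
REAL_NVCC = "/usr/local/bin/nvcc_b"


def build_nvcc_command(args, real_nvcc=REAL_NVCC):
    """Return the argv for the real nvcc, adding -g unless it is already present."""
    extra = [] if "-g" in args else ["-g"]
    return [real_nvcc, *extra, *args]


# Example: the arguments a build system might pass for one source file.
print(build_nvcc_command(["-c", "kernel.cu", "-o", "kernel.o"]))
```

A real wrapper would then execute the returned command, for example with subprocess.run, instead of printing it.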
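collect xpu runtime memory allocation statistics / xpu_wait call stacks
- print the xpu_malloc call sequence and track live memory usage and the historical peak, to help diagnose memory fragmentation
- print the xpu_wait call stack, to help locate where the pipeline stalls
- note: run import cuda_mock; cuda_mock.xpu_initialize() after import torch / import paddle
- usage:
import paddle
import cuda_mock; cuda_mock.xpu_initialize()  # add this line
- disable backtrace printing (capturing backtraces slows execution considerably)
export HOOK_DISABLE_TRACE='xpuMemcpy=0,xpuSetDevice=0'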
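As a rough illustration of what tracking live and peak memory over an allocation sequence means, here is a minimal pure-Python sketch. It does not use cuda_mock and is not how the package implements its statistics; the MemoryTracker class and its method names are hypothetical.

```python
class MemoryTracker:
    """Tracks live and peak bytes from a sequence of allocation events."""

    def __init__(self):
        self.live = {}     # pointer -> size of each outstanding allocation
        self.current = 0   # bytes currently allocated
        self.peak = 0      # highest value `current` has reached so far

    def on_malloc(self, ptr, size):
        self.live[ptr] = size
        self.current += size
        self.peak = max(self.peak, self.current)

    def on_free(self, ptr):
        # Pointers that were never recorded are ignored.
        self.current -= self.live.pop(ptr, 0)


tracker = MemoryTracker()
tracker.on_malloc(0x1000, 256)
tracker.on_malloc(0x2000, 512)
tracker.on_free(0x1000)
tracker.on_malloc(0x3000, 128)
print(tracker.current, tracker.peak)  # prints "640 768"
```

A peak well above current usage after many alloc/free cycles is the kind of pattern that makes the logged call sequence worth inspecting for fragmentation.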
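implement custom hook functions
- example of a custom hook installer:
class PythonHookInstaller(cuda_mock.HookInstaller):
    def is_target_lib(self, name):
        # only hook calls made from this library
        return name.find("libcuda_mock_impl.so") != -1

    def is_target_symbol(self, name):
        # only hook symbols whose name contains "malloc"
        return name.find("malloc") != -1

# cpp_code is a string of C++ source that defines the replacement functions
lib = cuda_mock.dynamic_obj(cpp_code, True).appen_compile_opts('-g').compile().get_lib()
installer = PythonHookInstaller(lib)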
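- implement the hook callback interface PythonHookInstaller
- constructor: takes the absolute path of the library that contains the custom hook functions. That library must contain a function whose name and type match the function being replaced. During hooking, the address of the original function is written into a variable whose name is __origin_ followed by the target symbol, so that the original function's address is available to the user (see the definition at test/py_test/test_import_mock.py:15)
- is_target_lib: whether this is the library from which the function to hook is called
- is_target_symbol: whether this is the name of the function to hook (called only when the interface above returns True)
- new_symbol_name: the name of the replacement function in the shared library passed to the constructor; the name parameter is the name of the function about to be replaced
- dynamic_obj: compiles C++ code at runtime and can reference all modules: logger, statistics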
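The order in which the two filter callbacks are consulted can be simulated in plain Python. This sketch does not import cuda_mock: HookFilter is a stand-in class rather than cuda_mock.HookInstaller, select_symbols is a hypothetical helper, and the library paths are made up. Forming the variable name as "__origin_" plus the symbol name is an assumption based on the description above.

```python
class HookFilter:
    """Stand-in with the two filter callbacks (not cuda_mock's own class)."""

    def is_target_lib(self, name):
        return "libcuda_mock_impl.so" in name

    def is_target_symbol(self, name):
        return "malloc" in name


def select_symbols(hook_filter, libraries):
    """Return (library, symbol, origin_variable) for each symbol that passes both filters."""
    selected = []
    for lib_name, symbols in libraries.items():
        if not hook_filter.is_target_lib(lib_name):
            continue  # symbols of non-target libraries are never offered
        for symbol in symbols:
            if hook_filter.is_target_symbol(symbol):
                selected.append((lib_name, symbol, "__origin_" + symbol))
    return selected


# Made-up libraries and the symbols each one calls.
libraries = {
    "/opt/demo/libcuda_mock_impl.so": ["xpu_malloc", "xpu_free"],
    "/opt/demo/libother.so": ["malloc"],
}
print(select_symbols(HookFilter(), libraries))
```

Only xpu_malloc in libcuda_mock_impl.so is selected: malloc in libother.so is skipped because its library was rejected first, and xpu_free fails the symbol filter.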
example
python test/test_import_mock.py
debug
export LOG_LEVEL=WARN,TRACE=INFO
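To make the LOG_LEVEL format concrete, the following sketch splits a value such as WARN,TRACE=INFO into a global level and per-module levels. It is an interpretation of the format, not the package's own parser: it assumes a bare item sets the global level and a NAME=LEVEL item sets one module's level. The function name parse_log_level is hypothetical.

```python
def parse_log_level(value, default_global="WARN"):
    """Split a LOG_LEVEL string into (global_level, {module: level})."""
    global_level = default_global
    module_levels = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        if "=" in item:
            # NAME=LEVEL sets the level of a single log module.
            module, level = item.split("=", 1)
            module_levels[module.strip()] = level.strip()
        else:
            # A bare item is taken as the global level.
            global_level = item
    return global_level, module_levels


print(parse_log_level("WARN,TRACE=INFO"))  # prints "('WARN', {'TRACE': 'INFO'})"
```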
environment variables
Environment variable | Example | Allowed values | Default | Description |
---|---|---|---|---|
LOG_LEVEL | export LOG_LEVEL=WARN,TRACE=INFO | log levels: INFO, WARN, ERROR, FATAL; log modules: PROFILE, TRACE, HOOK, PYTHON, LAST | the global log level defaults to WARN; each log module defaults to INFO | log level, per-module log level |
HOOK_DISABLE_TRACE | export HOOK_DISABLE_TRACE='xpuMemcpy=0,xpuSetDevice=0' | xpuMalloc, xpuFree, xpuWait, xpuMemcpy, xpuSetDevice, xpuCurrentDeviceId | every interface defaults to 1, i.e. backtrace is disabled for all interfaces by default | whether to disable backtrace |
LOG_OUTPUT_PATH | export LOG_OUTPUT_PATH='cuda_mock.log' | file path | - | whether to redirect logs to a file |
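The HOOK_DISABLE_TRACE value is a comma-separated list of NAME=VALUE pairs. The sketch below shows one way to read such a string, starting from the documented default of 1 for every interface. It is an illustration only, not the package's parser; parse_hook_disable_trace is a hypothetical name, and ignoring unknown names is an assumption.

```python
# Interface names listed in the table above.
INTERFACES = [
    "xpuMalloc", "xpuFree", "xpuWait",
    "xpuMemcpy", "xpuSetDevice", "xpuCurrentDeviceId",
]


def parse_hook_disable_trace(value):
    """Return {interface: flag}, starting from the documented default of 1."""
    flags = {name: 1 for name in INTERFACES}
    for item in value.split(","):
        name, sep, raw = item.strip().partition("=")
        if sep and name in flags:  # assumption: unknown names are ignored
            flags[name] = int(raw)
    return flags


flags = parse_hook_disable_trace("xpuMemcpy=0,xpuSetDevice=0")
print(flags["xpuMemcpy"], flags["xpuSetDevice"], flags["xpuMalloc"])  # prints "0 0 1"
```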