Mock CUDA runtime API
The PLT hooking technique used here is based on plthook.
Mock the PyTorch CUDA runtime interface.
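PLT hooking works by rewriting the entry in a library's global offset table (GOT) that its procedure linkage table (PLT) stub jumps through. Calls to an external function are then redirected to a replacement, while the original address is saved so the replacement can still forward to it. The pure-Python sketch below is only an analogy: a dict stands in for the GOT, and the names `got`, `install_hook` and `call` are hypothetical. They are not part of cuda_mock or plthook.

```python
from typing import Callable, Dict

# A dict standing in for one library's GOT: symbol name -> function "address".
got: Dict[str, Callable] = {}


def cudaMalloc(size: int) -> str:
    """Stand-in for the real runtime function."""
    return f"real allocation of {size} bytes"


got["cudaMalloc"] = cudaMalloc


def install_hook(table: Dict[str, Callable], symbol: str, replacement: Callable) -> Callable:
    """Swap the table entry for `symbol` and return the original function."""
    original = table[symbol]
    table[symbol] = replacement
    return original


def call(symbol: str, *args):
    """Callers always go through the table, like calls through a PLT stub."""
    return got[symbol](*args)


origin_cudaMalloc = None  # filled in by install_hook below


def mock_cudaMalloc(size: int) -> str:
    # Do the mock's own work, then forward to the saved original.
    return "hooked -> " + origin_cudaMalloc(size)


origin_cudaMalloc = install_hook(got, "cudaMalloc", mock_cudaMalloc)
print(call("cudaMalloc", 256))  # hooked -> real allocation of 256 bytes
```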
- Update submodules:
git submodule update --init --recursive
- Build the source distribution and wheel package:
python setup.py sdist bdist_wheel
- Install directly:
pip install dist/*.whl
Collect CUDA operator call stacks
- Find where nvcc is installed:
which nvcc
- Replace the system nvcc with our nvcc (it only adds -g to the compile options; adjust /usr/local/bin if which nvcc reports a different path):
mv /usr/local/bin/nvcc /usr/local/bin/nvcc_b
chmod 777 tools/nvcc
cp tools/nvcc /usr/local/bin/nvcc
- Build and install PyTorch.
- Build and install cuda_mock.
- Import cuda_mock after importing torch.
- Run your torch training script.
- The call stacks are dumped to the console.
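The contents of tools/nvcc are not shown here. According to the description, the wrapper only adds -g (host-side debug info) to the compile options. Below is a hypothetical Python equivalent. It assumes the real compiler was renamed to /usr/local/bin/nvcc_b, as in the steps above. The name `build_argv` is illustrative only.

```python
#!/usr/bin/env python3
"""Hypothetical nvcc wrapper: forwards all arguments to the renamed nvcc and adds -g."""
import os
import sys
from typing import List

REAL_NVCC = "/usr/local/bin/nvcc_b"  # path used by the mv step above


def build_argv(args: List[str]) -> List[str]:
    """Return the real nvcc command line with -g appended (unless already present)."""
    argv = [REAL_NVCC, *args]
    if "-g" not in args:
        argv.append("-g")
    return argv


# Only exec when invoked as "nvcc", i.e. after being copied over /usr/local/bin/nvcc.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "nvcc":
    argv = build_argv(sys.argv[1:])
    # Replace the current process with the real compiler so its exit code is preserved.
    os.execv(argv[0], argv)
```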
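Collect XPU runtime memory allocation statistics and xpu_wait call stacks
- Print the xpu_malloc call sequence and track real-time memory usage and the historical peak memory usage, to troubleshoot memory fragmentation.
- Print xpu_wait call stacks to locate where the pipeline is interrupted.
- Note: import cuda_mock and call cuda_mock.xpu_initialize() after import torch / import paddle.
- Usage:
import paddle
import cuda_mock; cuda_mock.xpu_initialize()  # add this line
- Disable backtrace printing (collecting backtraces degrades performance significantly):
export HOOK_DISABLE_TRACE='xpuMemcpy=0,xpuSetDevice=0'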
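The real-time and peak memory statistics described above amount to a running sum over allocation and free events. Here is a minimal sketch. It assumes a hypothetical event list of ("malloc", ptr, size) and ("free", ptr) tuples, which is not cuda_mock's actual log format. The function `track_memory` is illustrative.

```python
from typing import Dict, Iterable, Tuple


def track_memory(events: Iterable[Tuple]) -> Tuple[int, int]:
    """Return (current_bytes, peak_bytes) after replaying malloc/free events."""
    live: Dict[int, int] = {}  # pointer -> size of each live allocation
    current = peak = 0
    for event in events:
        if event[0] == "malloc":
            _, ptr, size = event
            live[ptr] = size
            current += size
            peak = max(peak, current)
        elif event[0] == "free":
            _, ptr = event
            current -= live.pop(ptr, 0)  # ignore frees of unknown pointers
    return current, peak


events = [
    ("malloc", 0x1000, 512),
    ("malloc", 0x2000, 1024),
    ("free", 0x1000),
    ("malloc", 0x3000, 256),
]
print(track_memory(events))  # (1280, 1536)
```

A large gap between the peak and the current usage, combined with allocation failures, is the kind of pattern that points toward fragmentation rather than genuine exhaustion.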
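Implement custom hook functions
- Example of a custom hook installer:
import cuda_mock

# cpp_code: a C++ source string defining the replacement functions (must be defined beforehand)
class PythonHookInstaller(cuda_mock.HookInstaller):
    def is_target_lib(self, name):
        # hook calls made from libcuda_mock_impl.so
        return name.find("libcuda_mock_impl.so") != -1

    def is_target_symbol(self, name):
        # hook every symbol whose name contains "malloc"
        return name.find("malloc") != -1

lib = cuda_mock.dynamic_obj(cpp_code, True).appen_compile_opts('-g').compile().get_lib()
installer = PythonHookInstaller(lib)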
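- Implement the hook callback interface PythonHookInstaller:
- Constructor: takes the path of the library containing the custom hook functions. The path must be absolute, and the library must contain functions with the same names and signatures as the functions to be replaced. During hooking, the address of the original function is written into a variable named __origin_ followed by the target symbol name, so users can obtain the original function address (see the definition at test/py_test/test_import_mock.py:15).
- is_target_lib: whether this is the library from which the target function is called.
- is_target_symbol: whether this is the name of a function to hook (called only if is_target_lib returned True).
- new_symbol_name: the name of the replacement function in the shared library passed to the constructor; the parameter name is the name of the function currently being replaced.
- dynamic_obj: compiles C++ code at runtime; the code can reference all modules: logger, statistics.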
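The callback order described above can be modeled in plain Python. is_target_lib is consulted first, then is_target_symbol, then new_symbol_name, and the original is saved under `__origin_<symbol>`. This is a simulation under stated assumptions. `FakeHookInstaller`, `install`, and the default `new_symbol_name` returning the same name are hypothetical, and cuda_mock does the real work in native code.

```python
from typing import Callable, Dict


class FakeHookInstaller:
    """Hypothetical stand-in for the cuda_mock.HookInstaller callback protocol."""

    def is_target_lib(self, name: str) -> bool:
        return False

    def is_target_symbol(self, name: str) -> bool:
        return False

    def new_symbol_name(self, name: str) -> str:
        # Assumption: by default the replacement has the same name as the original.
        return name


def install(installer: FakeHookInstaller,
            libs: Dict[str, Dict[str, Callable]],
            hook_lib: Dict[str, Callable]) -> Dict[str, Callable]:
    """Redirect matching symbols to functions from hook_lib; return the saved originals."""
    origins: Dict[str, Callable] = {}
    for lib_name, table in libs.items():
        if not installer.is_target_lib(lib_name):
            continue  # is_target_symbol is only consulted for target libraries
        for symbol in list(table):
            if installer.is_target_symbol(symbol):
                origins["__origin_" + symbol] = table[symbol]
                table[symbol] = hook_lib[installer.new_symbol_name(symbol)]
    return origins


class MallocInstaller(FakeHookInstaller):
    def is_target_lib(self, name):
        return name.find("libcuda_mock_impl.so") != -1

    def is_target_symbol(self, name):
        return name.find("malloc") != -1


libs = {"libcuda_mock_impl.so": {"malloc": lambda n: f"malloc({n})", "free": lambda p: "free"}}
hook_lib = {"malloc": lambda n: f"hooked malloc({n})"}
origins = install(MallocInstaller(), libs, hook_lib)
print(libs["libcuda_mock_impl.so"]["malloc"](8))  # hooked malloc(8)
print(origins["__origin_malloc"](8))              # malloc(8)
```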
Example
python test/test_import_mock.py
Debug
export LOG_LEVEL=WARN,TRACE=INFO
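The format of LOG_LEVEL is not documented here. One plausible reading of WARN,TRACE=INFO is a default level of WARN, with the TRACE module raised to INFO. The hypothetical parser below implements only that assumed reading, and the INFO fallback is also an assumption. It is not cuda_mock's actual logic.

```python
from typing import Dict, Tuple


def parse_log_level(value: str, default: str = "INFO") -> Tuple[str, Dict[str, str]]:
    """Split e.g. 'WARN,TRACE=INFO' into a global level and per-module overrides (assumed format)."""
    global_level = default
    overrides: Dict[str, str] = {}
    for item in filter(None, (part.strip() for part in value.split(","))):
        if "=" in item:
            module, level = item.split("=", 1)
            overrides[module.strip()] = level.strip().upper()
        else:
            global_level = item.upper()
    return global_level, overrides


print(parse_log_level("WARN,TRACE=INFO"))  # ('WARN', {'TRACE': 'INFO'})
```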
Built Distributions
Hashes for cuda_mock-0.1.2-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 707c0b6548dffaaae34bee2029c2845acc237e791881332dd182297495ac4838
MD5 | 3b7d363aab56f1d4416f9e2c6ef53f8b
BLAKE2b-256 | ede168fe8dea43a39d3efe9b28ef11b8264729c6aada9a2b8adfb48591c80d11
Hashes for cuda_mock-0.1.2-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 5161ed9e92baa8ceef9529485ff71c3caaa09b754424020a4b234b43c19060a1
MD5 | 9feaa3a45964fb3c6750b7fd04c1fd83
BLAKE2b-256 | 4e6d6423ede9dd0129eeb486777f9520c6a0c45fc189ad0183ca1084a0251fa2
Hashes for cuda_mock-0.1.2-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 1f9706dac15371a843675c604a75c726ebd57cb4f12800f87bfdf107827eff9c
MD5 | 7195dbacdeeed062628a55aaa0117281
BLAKE2b-256 | e5c5111c8a6576c94fab7c314794cacd9bff009308b1f010a578ff7a232710dc
Hashes for cuda_mock-0.1.2-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | b6cd4769f9628442ea40cf2311ac82151e20c92a7f95139a57b2df1cfb26fb87
MD5 | d629403e80fe7da505ae4588c5d0c10b
BLAKE2b-256 | a05b1a1c2319574ec3682d8e6b073cf234ceb6982b28ebb207b27786d6891148
Hashes for cuda_mock-0.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | d6f0d862a08f5560d42926c25d800c882869c2a62c28cf30e7f32934dd9497dd
MD5 | e966fc187e25c00fa8f17e2606b8bfde
BLAKE2b-256 | 9782e7325a39155b7520d67534792e54a5a3e3e9e79cef332c317ff50f180cad
Hashes for cuda_mock-0.1.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | bcd3516e7f79f04ec1c41567d98a3b05d0873168c4b66ba871f384a11df7b364
MD5 | d488a26405be21a427b4978b21e593e2
BLAKE2b-256 | b00eab493b2b46dfd4014ac07ba3459570c08e3e444e071348eb1897a2801142
Hashes for cuda_mock-0.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 4cfeefeac94e0c73f0ac25009807b9db3b8f74e1f479c1d8f15c517d9af9494e
MD5 | 4b8bb25904be66a0acb82d9bbecf57f5
BLAKE2b-256 | 358bb32fb79d76bfd806b02dda0e82261c069dee431cee6fc0a187779bc3dbed
Hashes for cuda_mock-0.1.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | e1b33e169f63fcbf1ec19c15eeca755c211a8958352be15cd302fa773e2e68c7
MD5 | 217afb4ae52fe90248f4b3caf987dc48
BLAKE2b-256 | 315df52bc7727abae07505fb6e86fa8913f4c01dcff2b8169e7b274ba708a040
Hashes for cuda_mock-0.1.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 174c1bcc4914a9b9ac3930e39cbdb2ba693a67181abd6178e34d6f171b52e31e
MD5 | 2a99d87c7d94fd7c791fd482c5e69ffb
BLAKE2b-256 | 18b4f32c2651e16a711402e64e4fa26ac115bc8e1f7d0eda2710f75074c92a09
Hashes for cuda_mock-0.1.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | ae07aa3d03c04630fca8c755f28f3f063d2f1d18e64570a6e7ce68f33b06e1b8
MD5 | 5b9cafcd936bed2e9c6d0c2372ac42c5
BLAKE2b-256 | 44459230c6d01295887701b28027195896693e441252f92db30516244defbb4a
Hashes for cuda_mock-0.1.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 0700ed1df85a5fd591e9846f55fd24199d7d43970df3683521a3cb6432355c6f
MD5 | 51ec1b7f5d84c3da83a92bd4be7fa439
BLAKE2b-256 | fe459102a4072e8e81d87db4bb30b2a11107446dfffd6804017a5901ed4bea86
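Python's standard hashlib module is enough to check a downloaded wheel against the SHA256 digests listed above. Below is a minimal sketch. The script name verify_wheel.py and the helper names are hypothetical.

```python
import hashlib
import sys
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 with a digest copied from the tables above."""
    with open(path, "rb") as f:
        return sha256_of_stream(f) == expected_sha256.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python verify_wheel.py <wheel file> <expected sha256>
    ok = verify(sys.argv[1], sys.argv[2])
    print("OK" if ok else "MISMATCH")
    sys.exit(0 if ok else 1)
```

Hashing in fixed-size chunks keeps memory use constant regardless of wheel size.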