A tool that hooks API calls at runtime.
Project description
The PLT hooking technique is based on plthook.
Mock the PyTorch CUDA runtime interface
- update the submodules
git submodule update --init --recursive
- build the wheel package
python setup.py sdist bdist_wheel
- install the wheel directly
pip install dist/*.whl
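After the install step it can be useful to confirm that the package is visible to the interpreter before adding it to a training script. The following is a minimal sketch, not part of the project: the helper name `is_installed` is hypothetical, and it only checks that a module called `cuda_mock` can be located, without importing it.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "cuda_mock" is the module name used by the import statements in this README.
    print("cuda_mock installed:", is_installed("cuda_mock"))
```

Because `find_spec` does not execute the module, this check is safe to run even on a machine without a working CUDA or XPU runtime.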
Collect CUDA operator call stacks
- find the nvcc installation path
which nvcc
- replace the system nvcc with our nvcc (ours only adds the -g compile option); adjust the paths below if which nvcc reports a different location
mv /usr/local/bin/nvcc /usr/local/bin/nvcc_b
chmod 777 tools/nvcc
cp tools/nvcc /usr/local/bin/nvcc
- build and install PyTorch
- build and install cuda_mock
- import cuda_mock after importing torch
- run your torch training script
- the call stacks are dumped to the console
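The backup-and-replace step for nvcc can also be scripted. The sketch below is an illustration under stated assumptions rather than a tool shipped with the project: the function name `swap_nvcc` and its parameters are hypothetical, it refuses to overwrite an existing backup, and it sets mode 755 on the installed copy instead of the 777 used in the commands above.

```python
import os
import shutil
from pathlib import Path


def swap_nvcc(system_nvcc: Path, wrapper: Path, backup_suffix: str = "_b") -> Path:
    """Back up the system nvcc and install the wrapper in its place.

    Returns the path of the backup so the change can be undone later.
    """
    backup = system_nvcc.with_name(system_nvcc.name + backup_suffix)
    if backup.exists():
        # Avoid clobbering the real compiler if the swap was already done.
        raise FileExistsError(f"backup already exists: {backup}")
    shutil.move(str(system_nvcc), str(backup))    # mv nvcc nvcc_b
    shutil.copy(str(wrapper), str(system_nvcc))   # cp tools/nvcc nvcc
    os.chmod(system_nvcc, 0o755)                  # make the copy executable
    return backup
```

A call such as `swap_nvcc(Path("/usr/local/bin/nvcc"), Path("tools/nvcc"))` mirrors the three shell commands and usually needs root privileges. To restore the original compiler, move the returned backup path back over the wrapper.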
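Collect XPU runtime memory allocation statistics and xpu_wait call stacks
- print the xpu_malloc call sequence and track both current memory usage and the historical peak, to help investigate memory fragmentation
- print the xpu_wait call stacks, to help investigate where the pipeline stalls
- note: run import cuda_mock; cuda_mock.xpu_initialize() after import torch / import paddle
- usage:
import paddle
import cuda_mock; cuda_mock.xpu_initialize() # add this line
- disable backtrace printing (capturing backtraces degrades performance significantly)
export HOOK_DISABLE_TRACE='xpuMemcpy=0,xpuSetDevice=0'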
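The HOOK_DISABLE_TRACE setting can also be prepared from Python instead of the shell. This is a rough sketch with two assumptions that the README does not confirm: that cuda_mock reads the variable when `xpu_initialize()` runs, so it has to be set beforehand, and that a value of `0` switches the backtrace off for the named function. The helper names `format_disable_trace` and `configure_trace` are hypothetical.

```python
import os


def format_disable_trace(symbols):
    """Build a value such as 'xpuMemcpy=0,xpuSetDevice=0' from function names."""
    return ",".join(f"{name}=0" for name in symbols)


def configure_trace(symbols, env=os.environ):
    """Store the HOOK_DISABLE_TRACE value in `env` and return it."""
    value = format_disable_trace(symbols)
    env["HOOK_DISABLE_TRACE"] = value
    return value


# Intended order in a training script (assumption: the variable is read
# during initialization, so it is set first):
#   configure_trace(["xpuMemcpy", "xpuSetDevice"])
#   import paddle
#   import cuda_mock; cuda_mock.xpu_initialize()
```

Passing a plain dictionary as `env` makes it possible to inspect the generated value without touching the real process environment.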
example
python test/test_import_mock.py
debug
export LOG_LEVEL=WARN,TRACE=INFO