TorchWrapper is a deep learning helper.
Project description
TensorWrapper
TensorWrapper is a extension library for PyTorch framework. It aims to supplement a few of common components: newest optimizer, opeartors, utils, drawer, common structure and etc.
Installation
# install 3rd pip depedency.
pip install cython matplotlib opencv-python numpy tensorboard future memory_profiler profilehooks tqdm scipy scikit-image
HOROVOD_GPU_OPERATIONS=NCCL pip install horovod
Distributed Train/Val
Install openmpi
# install openmpi 4.0 version
curl -O -L https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.1.tar.gz
tar xvzf openmpi-4.0.1.tar.gz
./configure --prefix=/usr/local
make all && sudo make install
export PATH=/usr/local/bin:$PATH
# or via conda
conda install openmpi
Install NCCL
# download nccl library: https://developer.nvidia.com/nccl/nccl-legacy-downloads
# O/S agnostic local installer
# e.g. nccl_2.6.4-1+cuda10.0_x86_64.txz
# or using deb fashion
# https://developer.nvidia.com/compute/machine-learning/nccl/secure/v2.6/prod/nccl-repo-ubuntu1604-2.6.4-ga-cuda10.0_1-1_amd64.deb
sudo apt install libnccl2=2.6.4-1+cuda10.0 libnccl-dev=2.6.4-1+cuda10.0
sudo apt install libnccl2=2.6.4-1+cuda10.1 libnccl-dev=2.6.4-1+cuda10.1
export LD_LIBRARY_PATH=`pwd`/nccl_2.6.4-1+cuda10.0_x86_64/lib:$LD_LIBRARY_PATH
Install Horovod
HOROVOD_GPU_OPERATIONS=NCCL pip install horovod --no-cache-dir
git config --global user.email "atranitell@gmail.com" && git config --global user.name "jk"
Install CMake
# install cmake
# https://cmake.org/files/v3.14/
Train
# demo for verificaiton distributed traning
cd research/Classifier
# execute single node for mnist, note that batch size is set to 128
python Classifier.py --config configs/Classifier_Mnist_LeNet.py
# execute 4 node with 4 gpu, note that batch size should be set to 32
python -m tw.api.launch --np 4 --device cuda python Classifier.py --config configs/Classifier_Mnist_LeNet.py
# monitor the validation result, the test error should be similiar.
Usage
# dist train
python -m tw.api.launch --np 2 --dev cuda python research/classification/Classifier.py --config research/classification/configs/Classifier_ImageNet_AlexNet.py --task train
# dist eval
python -m tw.api.launch --np 2 --dev cuda python research/classification/Classifier.py --config research/classification/configs/Classifier_ImageNet_AlexNet.py --task test
# single train
python research/classification/Classifier.py --config research/classification/configs/Classifier_ImageNet_AlexNet.py --task train
# single eval
python research/classification/Classifier.py --config research/classification/configs/Classifier_ImageNet_AlexNet.py --task test
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
tw-3.0.1.tar.gz
(151.4 kB
view details)
File details
Details for the file tw-3.0.1.tar.gz
.
File metadata
- Download URL: tw-3.0.1.tar.gz
- Upload date:
- Size: 151.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.6.13
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | abefb631648940989f50a05fb139e862f00577df55f6df7ba353381157ea56dc |
|
MD5 | 713929e396728477bf6d660cefaafc1b |
|
BLAKE2b-256 | 9405522954a3fbdb699a17933b87128d5d55a8b7fc97ba0e55b2ad224c994f86 |