Skip to main content

TorchWrapper is a deep learning helper.

Project description

TensorWrapper

TensorWrapper is a extension library for PyTorch framework. It aims to supplement a few of common components: newest optimizer, opeartors, utils, drawer, common structure and etc.

Installation

# install 3rd pip depedency.
pip install cython matplotlib opencv-python numpy tensorboard future memory_profiler profilehooks tqdm scipy scikit-image
HOROVOD_GPU_OPERATIONS=NCCL pip install horovod

Distributed Train/Val

Install openmpi

# install openmpi 4.0 version
curl -O -L https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.1.tar.gz
tar xvzf openmpi-4.0.1.tar.gz
./configure --prefix=/usr/local
make all && sudo make install
export PATH=/usr/local/bin:$PATH

# or via conda
conda install openmpi

Install NCCL

# download nccl library: https://developer.nvidia.com/nccl/nccl-legacy-downloads
# O/S agnostic local installer
# e.g. nccl_2.6.4-1+cuda10.0_x86_64.txz

# or using deb fashion
# https://developer.nvidia.com/compute/machine-learning/nccl/secure/v2.6/prod/nccl-repo-ubuntu1604-2.6.4-ga-cuda10.0_1-1_amd64.deb
sudo apt install libnccl2=2.6.4-1+cuda10.0 libnccl-dev=2.6.4-1+cuda10.0
sudo apt install libnccl2=2.6.4-1+cuda10.1 libnccl-dev=2.6.4-1+cuda10.1
export LD_LIBRARY_PATH=`pwd`/nccl_2.6.4-1+cuda10.0_x86_64/lib:$LD_LIBRARY_PATH

Install Horovod

HOROVOD_GPU_OPERATIONS=NCCL pip install horovod --no-cache-dir

git config --global user.email "atranitell@gmail.com" && git config --global user.name "jk"

Install CMake

# install cmake
# https://cmake.org/files/v3.14/

Train

# demo for verificaiton distributed traning
cd research/Classifier

# execute single node for mnist, note that batch size is set to 128
python Classifier.py --config configs/Classifier_Mnist_LeNet.py

# execute 4 node with 4 gpu, note that batch size should be set to 32
python -m tw.api.launch --np 4 --device cuda python Classifier.py --config configs/Classifier_Mnist_LeNet.py

# monitor the validation result, the test error should be similiar.

Usage

# dist train
python -m tw.api.launch --np 2 --dev cuda python research/classification/Classifier.py --config research/classification/configs/Classifier_ImageNet_AlexNet.py --task train

# dist eval
python -m tw.api.launch --np 2 --dev cuda python research/classification/Classifier.py --config research/classification/configs/Classifier_ImageNet_AlexNet.py --task test

# single train
python research/classification/Classifier.py --config research/classification/configs/Classifier_ImageNet_AlexNet.py --task train

# single eval
python research/classification/Classifier.py --config research/classification/configs/Classifier_ImageNet_AlexNet.py --task test

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tw-3.8.1.tar.gz (283.0 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page