TorchWrapper is a deep learning helper.

Project description

TensorWrapper

TensorWrapper is an extension library for the PyTorch framework. It supplements PyTorch with several common components: recent optimizers, operators, utilities, drawing tools, common structures, and more.

Installation

# install third-party pip dependencies
pip install cython matplotlib opencv-python numpy tensorboard future memory_profiler profilehooks tqdm scipy scikit-image
HOROVOD_GPU_OPERATIONS=NCCL pip install horovod
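After running the install commands above, it can help to confirm which packages are actually importable. The following is a minimal sketch, not part of TensorWrapper: the mapping from pip names to import names (for example `opencv-python` to `cv2`) is an assumption based on the usual import names of these packages, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# pip name -> import name (assumed; the pip names come from the install commands above)
REQUIRED = {
    "cython": "Cython",
    "matplotlib": "matplotlib",
    "opencv-python": "cv2",
    "numpy": "numpy",
    "tensorboard": "tensorboard",
    "future": "future",
    "memory_profiler": "memory_profiler",
    "profilehooks": "profilehooks",
    "tqdm": "tqdm",
    "scipy": "scipy",
    "scikit-image": "skimage",
    "horovod": "horovod",
}


def missing_packages(required=REQUIRED):
    """Return the sorted pip names whose import module cannot be located."""
    return sorted(
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only looks up whether each module can be found; it does not import the packages, so it will not detect a Horovod build that installed without NCCL support.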

Distributed Train/Val

Install openmpi

# install Open MPI 4.0.1 from source
curl -O -L https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.1.tar.gz
tar xvzf openmpi-4.0.1.tar.gz
cd openmpi-4.0.1
./configure --prefix=/usr/local
make all && sudo make install
export PATH=/usr/local/bin:$PATH

# or via conda
conda install openmpi

Install NCCL

# download the NCCL library: https://developer.nvidia.com/nccl/nccl-legacy-downloads
# option 1: O/S agnostic local installer
# e.g. nccl_2.6.4-1+cuda10.0_x86_64.txz
# after extracting the archive, add its lib directory to the library search path
export LD_LIBRARY_PATH=`pwd`/nccl_2.6.4-1+cuda10.0_x86_64/lib:$LD_LIBRARY_PATH

# option 2: deb packages
# https://developer.nvidia.com/compute/machine-learning/nccl/secure/v2.6/prod/nccl-repo-ubuntu1604-2.6.4-ga-cuda10.0_1-1_amd64.deb
# run only the line that matches your CUDA version
sudo apt install libnccl2=2.6.4-1+cuda10.0 libnccl-dev=2.6.4-1+cuda10.0  # CUDA 10.0
sudo apt install libnccl2=2.6.4-1+cuda10.1 libnccl-dev=2.6.4-1+cuda10.1  # CUDA 10.1
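When NCCL comes from the local installer archive, the `LD_LIBRARY_PATH` export is what makes the library visible. A small sketch like the one below can confirm that a `libnccl.so*` file is actually reachable through that variable. `find_nccl` is a hypothetical helper, and it only scans the directories listed in `LD_LIBRARY_PATH`; a deb-based install places the library in a system directory, which this check will not see.

```python
import os
from pathlib import Path


def find_nccl(ld_library_path=None):
    """Return the libnccl shared objects found in the given search path.

    If no path is passed, the current LD_LIBRARY_PATH value is used.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for entry in ld_library_path.split(os.pathsep):
        if not entry:
            continue  # skip empty segments such as a trailing ':'
        directory = Path(entry)
        if directory.is_dir():
            hits.extend(sorted(str(p) for p in directory.glob("libnccl.so*")))
    return hits


if __name__ == "__main__":
    found = find_nccl()
    print("\n".join(found) if found else "no libnccl.so* found on LD_LIBRARY_PATH")
```

An empty result after the export usually means the archive was extracted somewhere other than the current directory.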

Install Horovod

HOROVOD_GPU_OPERATIONS=NCCL pip install horovod --no-cache-dir


Install CMake

# install cmake
# https://cmake.org/files/v3.14/
conda env create --file environment.yml
sudo apt-get install libsparsehash-dev

Train

# demo for verifying distributed training
cd research/Classifier

# run on a single node with MNIST; note that the batch size is set to 128
python Classifier.py --config configs/Classifier_Mnist_LeNet.py

# run on 4 GPUs (--np 4); note that the batch size should be set to 32
python -m tw.api.launch --np 4 --device cuda python Classifier.py --config configs/Classifier_Mnist_LeNet.py

# monitor the validation results; the test error of the two runs should be similar.
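The batch sizes in the comments above (128 for one process, 32 for four) suggest that the configured value is per process and that the global batch size is meant to stay constant. Under that assumption, the arithmetic can be sketched as follows; `per_process_batch_size` is a hypothetical helper and not part of TensorWrapper.

```python
def per_process_batch_size(global_batch_size, num_processes):
    """Split a global batch size evenly across the launched processes."""
    if num_processes <= 0:
        raise ValueError("num_processes must be positive")
    if global_batch_size % num_processes != 0:
        raise ValueError("global batch size must be divisible by num_processes")
    return global_batch_size // num_processes


if __name__ == "__main__":
    # single-process run: the whole batch goes to one process
    print(per_process_batch_size(128, 1))  # 128
    # 4-process run (--np 4): each process gets a quarter of the batch
    print(per_process_batch_size(128, 4))  # 32
```

Keeping the global batch size fixed is what makes the test error of the single-process and distributed runs comparable.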

Usage

# distributed training
python -m tw.api.launch --np 2 --dev cuda python research/classification/Classifier.py --config research/classification/configs/Classifier_ImageNet_AlexNet.py --task train

# distributed evaluation
python -m tw.api.launch --np 2 --dev cuda python research/classification/Classifier.py --config research/classification/configs/Classifier_ImageNet_AlexNet.py --task test

# single-process training
python research/classification/Classifier.py --config research/classification/configs/Classifier_ImageNet_AlexNet.py --task train

# single-process evaluation
python research/classification/Classifier.py --config research/classification/configs/Classifier_ImageNet_AlexNet.py --task test
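The four commands above differ only in the task and in whether the `tw.api.launch` prefix is present. A hedged sketch of composing them programmatically is shown below. `build_command` is a hypothetical helper; the flags and paths are copied from the commands above, and restricting `task` to `train` and `test` is an assumption based on the only two values shown.

```python
import shlex

# script and config paths copied from the usage commands above
SCRIPT = "research/classification/Classifier.py"
CONFIG = "research/classification/configs/Classifier_ImageNet_AlexNet.py"


def build_command(task, num_processes=1):
    """Return the argv list for a single-process or distributed run."""
    if task not in ("train", "test"):  # assumed set of tasks
        raise ValueError("task must be 'train' or 'test'")
    base = ["python", SCRIPT, "--config", CONFIG, "--task", task]
    if num_processes > 1:
        launcher = ["python", "-m", "tw.api.launch",
                    "--np", str(num_processes), "--dev", "cuda"]
        return launcher + base
    return base


if __name__ == "__main__":
    # print the distributed training command for 2 processes
    print(shlex.join(build_command("train", num_processes=2)))
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues if a config path ever contains spaces.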
