
Easy Neural Network Experiments with PyTorch


A very lightweight framework on top of PyTorch that retains PyTorch's full functionality.



  • Introduces two extra data-handling hooks on top of PyTorch's data transforms:

    • Pooled runs, which combine multiple datasets without moving them from their original locations.
    • Data specifications (dataspecs), which describe each dataset and its augmentations.
  • Introduces two extra multi-processing features for fast training by extending the easytorch.ETDataset class:

    • Multi-threaded data pre-loading.
    • Disk caching for faster access.
from easytorch import ETDataset

class MyDataset(ETDataset):
    def load_index(self, dataset_name, file):
        """(Optional) Load/process something and add it to the disk cache as
        self.diskcache.add(file, value). This method runs in multiple
        processes by default."""
        self.indices.append([dataset_name, file])

    def __getitem__(self, index):
        dataset_name, file = self.indices[index]
        dataspec = self.dataspecs[dataset_name]
        # (Optional) Retrieve a cached value with self.diskcache.get(file).

        image = ...  # TODO: load the file/image.
        label = ...  # TODO: load the corresponding label.

        # Extra preprocessing, if needed.
        # Apply transforms, if needed.

        return image, label
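
For concreteness, here is a minimal sketch of what the two TODOs might look like for an image/label folder dataset. It assumes the dataspec carries the data_dir, label_dir, and label_getter fields described in the dataspec section below, that file is a filename relative to data_dir, and that PIL and NumPy are used for loading; these are illustrative choices, not requirements of easytorch:

import os

import numpy as np
from PIL import Image

from easytorch import ETDataset


class MyImageDataset(ETDataset):
    def load_index(self, dataset_name, file):
        # One index entry per image file; heavy preprocessing could be cached here.
        self.indices.append([dataset_name, file])

    def __getitem__(self, index):
        dataset_name, file = self.indices[index]
        dataspec = self.dataspecs[dataset_name]

        # Load the image and its label using the paths/getter from the dataspec.
        image = np.asarray(Image.open(os.path.join(dataspec['data_dir'], file)))
        label_file = dataspec['label_getter'](file)
        label = np.asarray(Image.open(os.path.join(dataspec['label_dir'], label_file)))

        return image, label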

Installation

  1. pip install --upgrade pip
  2. Install the latest pytorch and torchvision from the PyTorch website
  3. pip install easytorch

Let's start with something simple like MNIST digit classification:

from easytorch import EasyTorch, ETTrainer, ConfusionMatrix, ETMeter
from torchvision import datasets, transforms
import torch.nn.functional as F
import torch
from examples.models import MNISTNet

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])


class MNISTTrainer(ETTrainer):
    def _init_nn_model(self):
        self.nn['model'] = MNISTNet()

    def iteration(self, batch):
        inputs, labels = batch[0].to(self.device['gpu']).float(), batch[1].to(self.device['gpu']).long()

        out = self.nn['model'](inputs)
        loss = F.nll_loss(out, labels)
        _, pred = torch.max(out, 1)
        
        meter = self.new_meter()
        meter.averages.add(loss.item(), len(inputs))
        meter.metrics['cfm'].add(pred, labels.float())

        return {'loss': loss, 'meter': meter, 'predictions': pred}

    def init_experiment_cache(self):
        self.cache['log_header'] = 'Loss|Accuracy,F1,Precision,Recall'
        self.cache.update(monitor_metric='f1', metric_direction='maximize')

    def new_meter(self):
        return ETMeter(
            cfm=ConfusionMatrix(num_classes=10)
        )


if __name__ == "__main__":
    train_dataset = datasets.MNIST('../data', train=True, download=True, transform=transform)
    val_dataset = datasets.MNIST('../data', train=False, transform=transform)

    dataloader_args = {'train': {'dataset': train_dataset}, 'validation': {'dataset': val_dataset}}
    runner = EasyTorch(phase='train', batch_size=512,
                       epochs=10, gpus=[0], dataloader_args=dataloader_args)
    runner.run(MNISTTrainer)
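
After training, the same trainer can run only the test step by switching the phase. The following is a minimal sketch that reuses the transform and MNISTTrainer defined above and assumes dataloader_args accepts a 'test' key analogous to 'train' and 'validation':

from torchvision import datasets

from easytorch import EasyTorch

if __name__ == "__main__":
    test_dataset = datasets.MNIST('../data', train=False, transform=transform)

    # Assumption: a 'test' key mirrors the 'train'/'validation' keys used above.
    dataloader_args = {'test': {'dataset': test_dataset}}
    runner = EasyTorch(phase='test', batch_size=512, gpus=[0],
                       dataloader_args=dataloader_args)
    runner.run(MNISTTrainer)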

General use case:

1. Define your trainer

from easytorch import ETTrainer, Prf1a, ETMeter, AUCROCMetrics


class MyTrainer(ETTrainer):

    def _init_nn_model(self):
        self.nn['model'] = NeuralNetModel(out_size=self.args['num_class'])

    def iteration(self, batch):
        """Handle a single batch. Must return at least 'loss' and 'meter'."""
        return {'loss': ..., 'meter': ..., 'predictions': ...}

    def new_meter(self):
        return ETMeter(
            num_averages=1,
            prf1a=Prf1a(),
            auc=AUCROCMetrics()
        )

    def init_experiment_cache(self):
        """Will plot Loss in one plot, and Accuracy,F1_score in another."""
        self.cache['log_header'] = 'Loss|Accuracy,F1_score'
        
        """Model selection using validation set if present"""
        self.cache.update(monitor_metric='f1', metric_direction='maximize')

  • The method new_meter() returns an ETMeter that takes any implementation of easytorch.meter.ETMetrics. Provided implementations:
    • easytorch.metrics.Prf1a() for binary classification; computes accuracy, F1, precision, recall, and overlap/IOU.
    • easytorch.metrics.ConfusionMatrix(num_classes=...) for multiclass classification; also computes global accuracy, F1, precision, and recall.
    • easytorch.metrics.AUCROCMetrics for the binary ROC-AUC score.
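
A minimal sketch of how such a meter might be filled inside iteration(), following the same averages.add / metrics[...].add pattern as the MNIST example above. The 'prf1a' key assumes the keyword passed to ETMeter in new_meter() becomes the key in meter.metrics, as 'cfm' does in the MNIST example; the loss and prediction rule are illustrative:

import torch
import torch.nn.functional as F

# A possible body for MyTrainer.iteration above.
def iteration(self, batch):
    inputs = batch[0].to(self.device['gpu']).float()
    labels = batch[1].to(self.device['gpu']).long()

    out = self.nn['model'](inputs)
    loss = F.cross_entropy(out, labels)
    _, pred = torch.max(out, 1)

    meter = self.new_meter()
    meter.averages.add(loss.item(), len(inputs))  # running loss (num_averages=1)
    meter.metrics['prf1a'].add(pred, labels)      # binary predictions vs. targets
    # meter.metrics['auc'] could be fed similarly, e.g. with class scores.

    return {'loss': loss, 'meter': meter, 'predictions': pred}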

2. Define specifications for your datasets:

import os

def get_label(x):
    return x.split('_')[0] + '_label.png'

sep = os.sep
MYDATA = {
    'name': 'mydata',
    'data_dir': 'MYDATA' + sep + 'images',
    'label_dir': 'MYDATA' + sep + 'labels',
    'label_getter': get_label
}

MyOTHERDATA = {
    'name': 'otherdata',
    'data_dir': 'OTHERDATA' + sep + 'images',
    'label_dir': 'OTHERDATA' + sep + 'labels',
    'label_getter': get_label
}

  • EasyTorch automatically splits the training data in 'data_dir' as specified (split_ratio, or num_folds in the EasyTorch module as below).
  • One can also provide custom splits (JSON files with train, validation, and test file lists) in the directory specified by split_dir in the dataspec.
  • One can give a path to a .txt file with a list of image paths for the test (inference) phase in the split_dir field of the dataspec.
  • Additional options in dataspecs (see the sketch after this list):
    • Load from sub-folders, "sub_folders": ["class0", "class1", ... "class_K"]
    • Load recursively, "recursive": True
    • Filter by an extension, "extension": "png"
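
Putting these options together, a dataspec might look like the following sketch. The directory names and the particular combination of options are illustrative; only the keys themselves come from the documentation above:

import os

sep = os.sep

CLASSIFICATION_DATA = {
    'name': 'myclasses',
    'data_dir': 'MYCLASSES' + sep + 'images',
    # Load images grouped into one sub-folder per class.
    'sub_folders': ['class0', 'class1', 'class2'],
    # Walk data_dir recursively and keep only .png files.
    'recursive': True,
    'extension': 'png',
    # Optional: directory with custom train/validation/test split files.
    'split_dir': 'MYCLASSES' + sep + 'splits'
}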

3. Entry point (say main.py)

from easytorch import EasyTorch

data_specifications = [MYDATA, MyOTHERDATA]
runner = EasyTorch(data_specifications,
                   phase="train", batch_size=4, epochs=21,
                   num_channel=1, num_class=2,
                   split_ratio=[0.6, 0.2, 0.2])  # or num_folds=5 (exclusive with split_ratio)

if __name__ == "__main__":
    runner.run(MyTrainer, MyDataset)  # Train an individual model for each dataset.
    runner.run_pooled(MyTrainer, MyDataset)  # Train a single model on the pooled datasets.

Run from the command line:

python main.py -ph train -b 4 -e 21 -spl 0.6 0.2 0.2

Note: arguments given directly to the EasyTorch constructor take precedence over command-line arguments. See below for the list of default arguments.


All the best! Cheers! 🎉

Cite the following papers if you use this library:

@article{deepdyn_10.3389/fcomp.2020.00035,
	title   = {Dynamic Deep Networks for Retinal Vessel Segmentation},
	author  = {Khanal, Aashis and Estrada, Rolando},
	year    = {2020},
	journal = {Frontiers in Computer Science},
	volume  = {2},
	pages   = {35},
	doi     = {10.3389/fcomp.2020.00035},
	issn    = {2624-9898}
}

@misc{2202.02382,
	title   = {Fully Automated Tree Topology Estimation and Artery-Vein Classification},
	author  = {Aashis Khanal and Saeid Motevali and Rolando Estrada},
	year    = {2022},
	eprint  = {arXiv:2202.02382}
}

Feature Highlights:

  • Minimal configuration to set up any simple or complex experiment (single-GPU, DP, and DDP usage).
  • A DataHandle that is always available and decoupled from other modules, enabling easy customization (ETDataHandle).
    • Use custom and complex data-handling mechanisms.
    • Load folder datasets.
    • Load large datasets recursively with multiple threads.
  • Full support for splitting images into patches and rejoining/merging them to recover the complete prediction image, as in U-Net (usually needed when input images are large and of different shapes), thanks to sparse data loaders.
  • Limit data loading: restrict the amount of data loaded to debug the pipeline without moving data from its original place (thanks to load_limit).
  • Heterogeneous dataset handling: use many dataset folders in a single experiment just by defining dataspecs (thanks to pooled runs).
  • Automatic k-fold cross-validation/automatic dataset split (example: num_folds=10, or split_ratio=[0.6, 0.2, 0.2]).
  • Simple, lightweight logger/plotter.
    • Plot: set log_header = 'Loss,F1,Accuracy' to plot all three in the same plot, or log_header = 'Loss|F1,Accuracy' to plot Loss in one plot and F1, Accuracy in another.
    • Logs: all arguments and generated data are saved in a logs.json file after the experiment finishes.
  • Gradient accumulation, automatic logging/plotting, and model checkpointing.
  • Multiple metric implementations in easytorch.metrics: precision, recall, accuracy, overlap, F1, ROC-AUC, confusion matrix, and more.
  • Support for advanced training with multiple networks and complex training steps.
  • Custom metrics can also be implemented.

Default arguments [default value]. Custom arguments can be added easily (see the sketch after this list).

  • -ph/--phase [Required]
    • Which phase to run: 'train' (runs the train, validation, and test steps) or 'test' (runs only the test step).
  • -b/--batch_size [4]
  • -ep/--epochs [11]
  • -lr/--learning_rate [0.001]
  • -gpus/--gpus [0]
    • List of GPUs to be used, e.g. [0], [1], [0, 1].
  • -nw/--num_workers [0]
    • Number of workers for data loading, so the CPU can keep up with the GPU when loading mini-batches.
  • -lim/--load-limit [None]
    • Limit on the number of images/files to load, for pipeline debugging.
  • -nf/--num_folds [None]
    • Number of folds in k-fold cross-validation (an integer such as 5 or 10).
  • -spl/--split_ratio [None]
    • Split ratio for train, validation, and test sets if three items are given; train and test if two items are given; train only if one item is given.
  • ...see more (DDP args)
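
A minimal sketch of adding a custom argument, assuming (as with num_class in the examples above) that extra keyword arguments passed to the EasyTorch constructor become available to the trainer through self.args; dropout_rate and NeuralNetModel are illustrative placeholders, not part of easytorch:

from easytorch import EasyTorch, ETTrainer

# dropout_rate is a hypothetical custom argument, not a built-in easytorch flag.
runner = EasyTorch(data_specifications, phase='train', batch_size=8, epochs=31,
                   num_class=2, dropout_rate=0.3)


class MyTrainer(ETTrainer):
    def _init_nn_model(self):
        # Custom arguments are read back the same way as built-in ones.
        self.nn['model'] = NeuralNetModel(out_size=self.args['num_class'],
                                          dropout=self.args['dropout_rate'])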
