Just Assemble IT! - A LEGO-style & PyTorch-based Deep Learning Library

jai - Just Assemble It!

Author: Jia Geng

Email: jxg570@miami.edu | gjia0214@hotmail.com

PyPI: https://pypi.org/project/jai/

Introduction

Deep learning is fun. What is not fun is the pipeline digging and rigging. Why can't we just enjoy exploring all kinds of SOTA techniques on interesting datasets, instead of wasting our coffee on boring things like implementing the sockets for them?

jai is a LEGO-style, PyTorch-based deep learning library. The main idea behind jai is to reduce the time spent building all sorts of pipelines and sockets for plugging in those fancy deep learning tricks. This project also aims to provide some handy toolkits for Kaggle.

Dev. Plan

Implement anything that pops into my head whenever I have time and coffee...

Installation

pip install jai

The library is still at an early stage. Many more functions and tools will be implemented and tested in the future.

Library Walk Through

jai.dataset.py provides abstract dataset classes that inherit from the PyTorch Dataset class. The difference is that jai.dataset supports data augmentation and preprocessing.

jai.improc.py provides some handy image processing functions, which can be injected into the jai dataset as image preprocessing functions or into the augmentation classes as data augmentation functions.

jai.augments.py provides the augmentation classes that can be attached to the jai dataset classes. It (will) also provide implementations of some advanced augmentation techniques.

jai.trainer.py provides a trainer class that supports the classic PyTorch-style deep learning training pipeline. It has some specific requirements on the implementation of the dataset object.

jai.logger.py provides the result/performance logger classes. These logger classes can be attached to the trainer during the training stage and can export and report all kinds of metrics related to model performance.

jai.arch.py (will) provide a handy way to modify popular and vanilla deep learning architectures to make them compatible with the jai framework.

jai.kaggler (will) provide data pipelining solutions and toolboxes for general or selected Kaggle project development. It will also collect some useful tools/models from Kagglers.

jai.sota (will) provide some state-of-the-art techniques, such as optimizers and schedulers, that are compatible with the jai framework.

Things to Prepare before Use (not fully tested)

Learn how to use partial(), as it is crucial for this library.
```
from functools import partial
```
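As a quick refresher, partial() freezes some arguments of a function and returns a new callable that takes the remaining ones. The sketch below uses a made-up `threshold` function (not the one in jai.improc) to show how a function with hyper-parameters becomes a one-argument function:

```python
from functools import partial


def threshold(pixels, low=0, high=255):
    """Hypothetical helper: clamp every value into [low, high]."""
    return [min(max(p, low), high) for p in pixels]


# Freeze the hyper-parameters now; the data argument is supplied later.
clamp = partial(threshold, low=15, high=200)

print(clamp([0, 100, 250]))  # [15, 100, 200]
```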
Prepare/implement your architecture and loss function. Some examples can be found in jai.kaggler.from_kagglers. Both need to be in torch.nn.Module style. If you only need a vanilla architecture, just grab a model from torchvision.models and a loss function from torch.nn. E.g.
```
import torchvision.models as model
import torch.nn as nn

arch = model.resnet18()
loss = nn.CrossEntropyLoss()
```

Implement the dataset class. Some examples can be found in jai.kaggler.kaggle_data. The key is to inherit from the jai.dataset.JaiDataset class and include the following code at the end of the __getitem__() method. The JaiDataset constructor accepts two arguments for preprocessing and augmentation: tsfms= and augments=.

# do whatever is necessary to get the input and the ground truth for the given idx
# img_id is optional, but if you provide it, the logger can collect misclassified samples during evaluation
# img and y need to be converted to Tensors with the correct dimensions
# img dim: CxHxW; y dim: Bx1 (single output) or BxK (multiple outputs if you need to predict different things)

(whatever you implemented) ...
-> img_id, img, y

# preprocess the image
img = self.prepro(img)

# augment the image during training time
img = self.augment(img)

# the output needs to be a dictionary as follows
# "id" can be omitted
return {"id": img_id, "x": img, "y": y}
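
To make the contract concrete, here is a minimal, dependency-free sketch of the shape of such a `__getitem__`. It does not inherit from JaiDataset and uses plain lists instead of tensors; `ToyDataset`, its constructor arguments, and the two lambdas are stand-ins invented for illustration. Only the keys of the returned dictionary come from the description above.

```python
class ToyDataset:
    """Stand-in for a JaiDataset subclass (hypothetical, no PyTorch needed)."""

    def __init__(self, samples, prepro, augment):
        self.samples = samples  # list of (img_id, img, y) tuples
        self.prepro = prepro    # callable applied to every image
        self.augment = augment  # callable applied after preprocessing

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        img_id, img, y = self.samples[idx]
        img = self.prepro(img)   # preprocess first
        img = self.augment(img)  # then augment
        return {"id": img_id, "x": img, "y": y}


data = ToyDataset(
    samples=[("a", [2, 4], 0), ("b", [6, 8], 1)],
    prepro=lambda img: [v / 2 for v in img],
    augment=lambda img: img[::-1],
)
item = data[1]
print(item)  # {'id': 'b', 'x': [4.0, 3.0], 'y': 1}
```

In a real subclass, "x" and "y" would be tensors in the dimensions described above.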

Prepare preprocessing and augmentation. For preprocessing, just wrap the functions from jai.improc in a list. The list must end with the to_tensor function. The list elements must be the functions themselves, not function calls. Most functions take only an image as input. For functions that take hyper-parameters, use partial(func, ...) to specify them.

E.g.
```
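from functools import partial
from jai.improc import *

# pass the functions themselves; partial() pins the hyper-parameters of threshold
tsfms = [denoise, partial(threshold, low=15, adaptive_ksize=(13, 13), C=-10), centralize_object,
         rescale, standardize, to_tensor]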
```
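Conceptually, a list like this is applied to the image one function at a time, in order. The sketch below shows that idea in plain Python. `apply_pipeline`, `scale`, and `shift` are illustrative names that are not part of jai, and the real dataset class may apply the list differently.

```python
from functools import partial, reduce


def scale(values, factor=1.0):
    """Hypothetical step with one hyper-parameter."""
    return [v * factor for v in values]


def shift(values, offset=0.0):
    """Hypothetical step with one hyper-parameter."""
    return [v + offset for v in values]


def apply_pipeline(funcs, data):
    """Feed data through each function in list order."""
    return reduce(lambda acc, f: f(acc), funcs, data)


# Functions are passed uncalled; partial() pins their hyper-parameters.
# The final tuple() plays the role of a closing conversion step.
steps = [partial(scale, factor=2.0), partial(shift, offset=1.0), tuple]

result = apply_pipeline(steps, [1.0, 2.0])
print(result)  # (3.0, 5.0)
```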
For augmentation, create a jai.augments.FuncAugmentator object. The FuncAugmentator takes a starting probability and a max probability for applying the augmentation during training. It also takes an augmentation function that processes the image. func= takes the function itself rather than a function call, and the function should have only one required argument, i.e., the input data. Use partial() to fix the hyper-parameters. jai.augments.AugF (will) provide some advanced augmentations. E.g.:
```
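from functools import partial
from jai.augments import *

# apply GridMask with a probability that starts at 0.1 and can grow up to 1
gridmask = FuncAugmentator(p_start=0.1, p_end=1, func=partial(AugF.grid_mask, d1=96, d2=244))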
```
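The idea of a starting and a max probability can be sketched as follows. This is not FuncAugmentator's implementation: `RampedAugment`, its `progress` attribute, the `mask` function, and the linear ramp between the two probabilities are all assumptions made for illustration.

```python
import random
from functools import partial


class RampedAugment:
    """Hypothetical stand-in: apply func with a probability that grows over time."""

    def __init__(self, p_start, p_end, func, seed=None):
        self.p_start, self.p_end, self.func = p_start, p_end, func
        self.rng = random.Random(seed)
        self.progress = 0.0  # 0.0 = start of training, 1.0 = end of training

    def probability(self):
        # Linear interpolation is an assumption made for this sketch.
        t = self.progress
        return (1 - t) * self.p_start + t * self.p_end

    def __call__(self, data):
        if self.rng.random() < self.probability():
            return self.func(data)
        return data


def mask(values, every=2):
    """Toy augmentation: zero out every n-th value."""
    return [0 if i % every == 0 else v for i, v in enumerate(values)]


aug = RampedAugment(p_start=0.1, p_end=1, func=partial(mask, every=2), seed=0)
aug.progress = 1.0         # end of training: the probability is 1
print(aug([5, 6, 7, 8]))   # [0, 6, 0, 8]
```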
Prepare the optimizer and scheduler. The easiest way is to grab the optimizer and scheduler from PyTorch. You can also implement your own, but make sure to follow the PyTorch style. Also, wrap them with partial() if you need to specify hyper-parameters!

E.g.
```
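from functools import partial

from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# only the hyper-parameters are fixed here; no parameters or optimizer are passed yet
optimizer = partial(AdamW, betas=(0.9, 0.999))
scheduler = partial(CosineAnnealingLR, T_max=100)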
```
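The reason for partial() here is that an optimizer cannot be built until the model's parameters exist, so the trainer presumably completes the construction itself. The sketch below mimics that hand-off with toy classes. `ToyOptimizer` and `ToyTrainer` are invented for illustration and say nothing about how BasicTrainer is actually implemented.

```python
from functools import partial


class ToyOptimizer:
    """Hypothetical optimizer: needs the parameters plus hyper-parameters."""

    def __init__(self, params, lr=0.001, betas=(0.9, 0.999)):
        self.params, self.lr, self.betas = list(params), lr, betas


class ToyTrainer:
    """Hypothetical trainer that completes the optimizer later."""

    def __init__(self, params, optimizer_factory):
        self.params = params
        self.optimizer_factory = optimizer_factory
        self.optimizer = None

    def initialize(self):
        # The missing argument (the parameters) is supplied only here.
        self.optimizer = self.optimizer_factory(self.params)


factory = partial(ToyOptimizer, lr=0.01)
trainer = ToyTrainer(params=[1.0, 2.0], optimizer_factory=factory)
trainer.initialize()
print(trainer.optimizer.lr, trainer.optimizer.betas)  # 0.01 (0.9, 0.999)
```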
Prepare the jai.dataset.Evaluator. It is used to generate logs with the specified encoding and scoring system.
- names= gives the names used to hash (identify) the predictors.
- n_classes= indicates the number of possible classes for each predictor.
- criteria= indicates which criterion is used for calculating scores (precision, recall, accuracy).
- avg= indicates how to average the scores across classes (micro, macro).
- weights= is used when your model has multiple output nodes and specifies how to combine the scores of the predictors.

E.g., suppose your model predicts the type of dog in an image and whether the dog is being walked by a human.
```
from jai.dataset import *

# say your training data has 10 types of dogs and a binary label for whether a human is in the image
# you want to use macro precision and are more concerned about has_human
evaluator = Evaluator(names=['dog_type', 'has_human'], n_classes=[10, 2],
                      criteria='precision', avg='macro', weights=[1, 2])
```
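To illustrate what the avg= and weights= options mean, here is a small sketch using the standard definitions of micro- and macro-averaged precision. The function names are hypothetical, and combining predictor scores by a weighted mean is an assumption; the source does not show how Evaluator computes these internally.

```python
def count_predictions(y_true, y_pred, n_classes):
    """Return per-class (true positives, predicted positives) counts."""
    tp = [0] * n_classes
    predicted = [0] * n_classes
    for t, p in zip(y_true, y_pred):
        predicted[p] += 1
        if t == p:
            tp[p] += 1
    return tp, predicted


def precision(y_true, y_pred, n_classes, avg="macro"):
    tp, predicted = count_predictions(y_true, y_pred, n_classes)
    if avg == "micro":
        # Pool the counts over all classes, then divide once.
        return sum(tp) / sum(predicted)
    # Macro: mean of per-class precisions (0.0 if a class is never predicted).
    per_class = [t / p if p else 0.0 for t, p in zip(tp, predicted)]
    return sum(per_class) / n_classes


def combine(scores, weights):
    """Weighted mean of per-predictor scores (assumed combination rule)."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 1, 1]
macro = precision(y_true, y_pred, n_classes=2, avg="macro")
micro = precision(y_true, y_pred, n_classes=2, avg="micro")
print(macro, micro)  # 0.75 0.8

# Two predictors scored 0.5 and 0.875, with the second weighted twice as much.
print(combine([0.5, 0.875], weights=[1, 2]))  # 0.75
```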
Prepare the Logger. You need a clean directory for receiving the log files, a prefix string for identifying your trial, and an Evaluator. resume=False tells the library that you are training a new model, so it will create a new batch of log files. resume=True tells the library that you are continuing to train your model, so it will write to the old log files. keep='one_best' exports only the best model and overwrites the previous one. keep='all_best' exports all best models encountered.

E.g.
```
from jai.logger import *

# log_dst is the clean log directory and prefix is the trial prefix you prepared
# keep all best models along the training process
logger = BasicLogger(log_dst, prefix, evaluator, resume=False, keep='all_best')
```
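The difference between the two keep modes can be sketched with a toy bookkeeping class. `BestKeeper` is hypothetical and only tracks epoch numbers; it assumes a higher score is better and does not reflect how BasicLogger stores files.

```python
class BestKeeper:
    """Hypothetical checkpoint bookkeeping for the two keep modes."""

    def __init__(self, keep="one_best"):
        if keep not in ("one_best", "all_best"):
            raise ValueError(f"unknown keep mode: {keep}")
        self.keep = keep
        self.best_score = float("-inf")
        self.saved = []  # epochs whose checkpoints are still kept

    def update(self, epoch, score):
        """Record a checkpoint if the score improves; return True if kept."""
        if score <= self.best_score:
            return False
        self.best_score = score
        if self.keep == "one_best":
            self.saved = [epoch]      # overwrite the previous best
        else:
            self.saved.append(epoch)  # keep every best seen so far
        return True


scores = [0.60, 0.72, 0.70, 0.81]
one, every = BestKeeper("one_best"), BestKeeper("all_best")
for epoch, score in enumerate(scores):
    one.update(epoch, score)
    every.update(epoch, score)

print(one.saved)    # [3]
print(every.saved)  # [0, 1, 3]
```

With 'one_best' only the latest best checkpoint survives, while 'all_best' keeps one checkpoint per improvement.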

Just Assemble It!

Now we have all we need. The next step is to just assemble it!

We have

# model
model = model.resnet18()
loss = nn.CrossEntropyLoss()

# dataset
tsfms = [denoise, partial(threshold, low=15, adaptive_ksize=(13, 13), C=-10), centralize_object, rescale, standardize, to_tensor]
gridmask = FuncAugmentator(p_start=0.1, p_end=1, func=partial(AugF.grid_mask, d1=96, d2=244))
dataset = YourJaiDataset(*args, tsfms=tsfms, augments=gridmask)

# optimizer
optimizer = partial(AdamW, betas=(0.9, 0.999))
scheduler = partial(CosineAnnealingLR, T_max=100)

# evaluator and logger
evaluator = Evaluator(names=['dog_type'], n_classes=[10], criteria='precision', avg='macro', weights=[1])
logger = BasicLogger(log_dst, prefix, evaluator, resume=False, keep='all_best')

To Train Your Model:

from torch.utils.data import DataLoader
from jai.trainer import *

train_set, eval_set = dataset.split(train_ratio=0.8, seed=2020)  # split 0.8 : 0.2 with seed 2020
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
eval_loader = DataLoader(eval_set, batch_size=32, shuffle=False)
trainer = BasicTrainer(model, optimizer, scheduler)

trainer.initialize()

trainer.train(train_loader, eval_loader, epochs=50, loss_func=loss, logger=logger)

Now you are:

- training your deep learning model with AdamW and the CosineAnnealingLR scheduler
- using image preprocessing and the GridMask augmentation
- searching for the best model based on the evaluation performance
- recording and exporting the training logs, such as
  - batch loss
  - epoch loss and model train/eval accuracy
  - confusion matrix of your best model(s)
  - model parameters and optimizer & scheduler states whenever a better model is found
  - the best model's failed detections during the eval phase

After the training is done, you can call logger.plot('loss') to check your training progress.

Just Re-Assemble It!

Often you might want to continue the training process. You can do so as follows:

import torch

# read all the state dicts (find them under your log_dst/model)
model_state = torch.load(model_path)
optimizer_state = torch.load(optimizer_path)
scheduler_state = torch.load(scheduler_path)

# load the checkpoints
trainer.load_model_state(model_state)
trainer.initialize(optimizer_state, scheduler_state)

# prepare a logger with the same log_dst, but set resume to True
logger = BasicLogger(log_dst, prefix, evaluator, resume=True, keep='all_best')

# continue training your model
trainer.train(train_loader, eval_loader, epochs=50, loss_func=loss, logger=logger)

Release history

- 0.0.9.81: Mar 30, 2020 (this version)
- 0.0.9.8: Mar 30, 2020
- 0.0.9.7: Mar 25, 2020
- 0.0.9.6: Mar 25, 2020
- 0.0.9.5: Feb 24, 2020
- 0.0.9.4: Feb 23, 2020
- 0.0.9.3: Feb 22, 2020
- 0.0.9.2: Feb 22, 2020
- 0.0.9.1: Feb 22, 2020
- 0.0.9: Feb 22, 2020
- 0.0.8.9: Feb 22, 2020
- 0.0.8.8: Feb 22, 2020
- 0.0.8.7: Feb 22, 2020
- 0.0.8.6: Feb 22, 2020
- 0.0.8.5: Feb 22, 2020
- 0.0.8.4: Feb 22, 2020
- 0.0.8.3: Feb 21, 2020
- 0.0.8.2: Feb 21, 2020
- 0.0.8.1: Feb 21, 2020
- 0.0.8: Feb 21, 2020
- 0.0.7: Feb 21, 2020
- 0.0.6: Feb 20, 2020
- 0.0.5: Feb 20, 2020
- 0.0.4: Feb 20, 2020
- 0.0.2: Feb 20, 2020
- 0.0.1: Feb 20, 2020
- 0.0.0: Feb 17, 2020

Download files

Source Distribution

jai-0.0.9.81.tar.gz (28.2 kB)

Uploaded Mar 30, 2020 Source

Built Distribution

jai-0.0.9.81-py3-none-any.whl (28.9 kB)

Uploaded Mar 30, 2020 Python 3

Hashes for jai-0.0.9.81.tar.gz
Algorithm	Hash digest
SHA256	`616fa8816f291cfffedeaf1dc31a83ce3cfaebb88d89174e447f110c4154f6bc`
MD5	`4e93116b3e8b3328bf14342092a58db6`
BLAKE2b-256	`01096a528295cb2f0921489b8d6fa77948879539d4e7699047baaca0d7d73a89`

Hashes for jai-0.0.9.81-py3-none-any.whl
Algorithm	Hash digest
SHA256	`464c1bb14fdf0757de19cfe781be182f8a0c4ff08e4620c9780f6d36437a47e2`
MD5	`5afeeba2d2f8ebda87cbb2f7720ed8d2`
BLAKE2b-256	`a8f2271a9dd946bb66bd8d1c376981b582f14bf718bbbd1de34ce4a687d95336`