
DNN.Cool: Multi-task learning for Deep Neural Networks (DNN).


DNN Cool: Deep Neural Networks for Conditional Objectives Oriented Learning

A bunch of utilities for multi-task learning; not much for now. To get a better idea of dnn_cool's goals, read the motivational story below.

Installation is as usual:

pip install dnn_cool

Motivational story

Let's say that you are building a home security system. You have collected a bunch of images, and have annotated them. So you have a few columns:

  • camera_blocked - whether the camera is blocked
  • door_open - whether the door of the apartment is open (has meaning only when the camera is not blocked)
  • door_locked - whether the door of the apartment is locked (has meaning only when the door is closed)
  • person_present - whether a person is present (has meaning only when the door is open)
  • face_x1, face_y1, face_w, face_h - the coordinates of the face of the person (has meaning only when a person is present)
  • body_x1, body_y1, body_w, body_h - the coordinates of the body of the person (has meaning only when a person is present)
  • facial_characteristics - represents characteristics of the face, like color, etc. (has meaning only when a person is present)
  • shirt_type - represents a specific type of shirt (e.g. a white shirt)

You also could not annotate all of the images, so nan is used wherever an annotation is missing.

Here's an example of the dataframe with the data you have obtained:

|    | camera_blocked | door_open | person_present | door_locked | face_x1 | face_y1 | face_w | face_h | facial_characteristics | body_x1 | body_y1 | body_w | body_h | shirt_type | img    |
|----|----------------|-----------|----------------|-------------|---------|---------|--------|--------|------------------------|---------|---------|--------|--------|------------|--------|
| 0  | nan            | 0         | 0              | 1           | nan     | nan     | nan    | nan    | nan                    | nan     | nan     | nan    | nan    | nan        | 0.jpg  |
| 1  | nan            | 1         | 1              | nan         | 22      | 4       | 12     | 12     | 0,1                    | 22      | 15      | 12     | 29     | 2          | 1.jpg  |
| 2  | nan            | 0         | 0              | 0           | nan     | nan     | nan    | nan    | nan                    | nan     | nan     | nan    | nan    | nan        | 2.jpg  |
| 3  | nan            | 1         | 1              | nan         | 20      | 3       | 12     | 12     | 0,1                    | 21      | 15      | 10     | 29     | 1          | 3.jpg  |
| 4  | nan            | 1         | 0              | nan         | nan     | nan     | nan    | nan    | nan                    | nan     | nan     | nan    | nan    | nan        | 4.jpg  |
| 5  | nan            | 1         | 1              | nan         | 22      | 2       | 12     | 12     | 0,1                    | 22      | 13      | 11     | 30     | 2          | 5.jpg  |
| 6  | 1              | nan       | nan            | nan         | nan     | nan     | nan    | nan    | nan                    | nan     | nan     | nan    | nan    | nan        | 6.jpg  |
| 7  | 0              | 1         | 1              | nan         | 24      | 2       | 12     | 12     | 2                      | 25      | 12      | 13     | 31     | 2          | 7.jpg  |
| 8  | 0              | 1         | 0              | nan         | nan     | nan     | nan    | nan    | nan                    | nan     | nan     | nan    | nan    | nan        | 8.jpg  |
| 9  | 0              | 1         | 1              | nan         | 20      | 0       | 12     | 12     | 0,2                    | 20      | 10      | 10     | 28     | 4          | 9.jpg  |
| 10 | 0              | 1         | 1              | nan         | 24      | 0       | 12     | 12     | 0                      | 25      | 13      | 11     | 31     | 3          | 10.jpg |
| 11 | 0              | 1         | 1              | nan         | 23      | 6       | 12     | 12     | 0,1                    | 23      | 19      | 11     | 31     | 1          | 11.jpg |
| 12 | 0              | 0         | 0              | 0           | nan     | nan     | nan    | nan    | nan                    | nan     | nan     | nan    | nan    | nan        | 12.jpg |
| 13 | 0              | 1         | 1              | nan         | 22      | 1       | 12     | 12     | 2                      | 20      | 11      | 13     | 29     | 2          | 13.jpg |
| 14 | 1              | nan       | nan            | nan         | nan     | nan     | nan    | nan    | nan                    | nan     | nan     | nan    | nan    | nan        | 14.jpg |
| 15 | 0              | 1         | 0              | nan         | nan     | nan     | nan    | nan    | nan                    | nan     | nan     | nan    | nan    | nan        | 15.jpg |
| 16 | 0              | 1         | 1              | nan         | 22      | 0       | 12     | 12     | 0                      | 22      | 10      | 11     | 28     | 1          | 16.jpg |
| 17 | 0              | 1         | 0              | nan         | nan     | nan     | nan    | nan    | nan                    | nan     | nan     | nan    | nan    | nan        | 17.jpg |
| 18 | 0              | 1         | 1              | nan         | 24      | 1       | 12     | 12     |                        | 22      | 11      | 13     | 28     | 2          | 18.jpg |
| 19 | 0              | 0         | 0              | 0           | nan     | nan     | nan    | nan    | nan                    | nan     | nan     | nan    | nan    | nan        | 19.jpg |

Now, how would you approach this? Let's summarize the options:

  1. Train a separate neural network for every small task - this will definitely work, but it is extremely heavy: you end up with 14 different neural networks, every one of which extracts similar low-level features. The memory and computational requirements are huge, so this option ends up very expensive.
  2. Use a pretrained face/body detector and train a neural network for every other task - this suffers from the same issues: there are still a lot of neural networks, and you don't know whether a pretrained detector suits your use case, since detectors are usually pretrained on a large number of classes. Also, notice that after detecting the face we still want to predict the facial characteristics, and after detecting the body we want to predict the shirt type - so these would have to be trained separately, or we would have to understand the detector in enough detail to modify it.
  3. A multi-task learning approach, where shared features are extracted first and, from a certain point on, every task gets its own branch. This is the most lightweight option and usually the best in terms of quality, since multi-task learning tends to improve generalization, but it is not trivial to implement. Notice, for example, that when the camera is blocked we should not backpropagate the losses of the other tasks, since there is no way to know where the person would be. The model has to learn conditional objectives, which are not easy to keep track of (see the loss-masking sketch after this list).
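
To make the conditional-objective idea concrete, here is a minimal sketch of masking a child task's loss by its precondition, written in plain PyTorch; this illustrates the concept only and is not dnn_cool's actual implementation:

import torch.nn.functional as F

def masked_bce_loss(logits, targets, precondition):
    # Backpropagate only through samples whose precondition holds;
    # `precondition` is a bool tensor, e.g. ~camera_blocked for door_open.
    per_sample = F.binary_cross_entropy_with_logits(logits, targets,
                                                    reduction='none')
    mask = precondition.float()
    # Guard against batches where no sample satisfies the precondition.
    return (per_sample * mask).sum() / mask.sum().clamp(min=1.0)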

This library, dnn_cool, aims to make the conditional-objectives approach much easier, which is why it is named Deep Neural Networks for Conditional Objectives Oriented Learning.

Assume that the variable df holds the annotated data. We then create a dnn_cool.project.Project, where we first specify which columns should be treated as output tasks and what type of values each column holds.

output_col = ['camera_blocked', 'door_open', 'person_present', 'door_locked',
              'face_x1', 'face_y1', 'face_w', 'face_h',
              'body_x1', 'body_y1', 'body_w', 'body_h',
              'facial_characteristics', 'shirt_type']

converters = Converters()
type_guesser = converters.type
type_guesser.type_mapping['camera_blocked'] = 'binary'
type_guesser.type_mapping['door_open'] = 'binary'
type_guesser.type_mapping['person_present'] = 'binary'
type_guesser.type_mapping['door_locked'] = 'binary'
type_guesser.type_mapping['face_x1'] = 'continuous'
type_guesser.type_mapping['face_y1'] = 'continuous'
type_guesser.type_mapping['face_w'] = 'continuous'
type_guesser.type_mapping['face_h'] = 'continuous'
type_guesser.type_mapping['body_x1'] = 'continuous'
type_guesser.type_mapping['body_y1'] = 'continuous'
type_guesser.type_mapping['body_w'] = 'continuous'
type_guesser.type_mapping['body_h'] = 'continuous'
type_guesser.type_mapping['img'] = 'img'
type_guesser.type_mapping['shirt_type'] = 'category'
type_guesser.type_mapping['facial_characteristics'] = 'multilabel'

Then we have to specify how the values from a given dataframe column are converted to actual tensor values. For example, the column with the image filename is converted into a tensor by reading the file from disk, binary values are converted to a bool tensor (with missing values marked), and continuous values are normalized into [0, 1]. Note that we can also specify a converter directly for a given column (skipping its type) by using values_converter.col_mapping, but we will describe this in more detail later.

values_converter = converters.values
values_converter.type_mapping['img'] = imgs_from_disk_converter
values_converter.type_mapping['binary'] = binary_value_converter
values_converter.type_mapping['continuous'] = bounded_regression_converter
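
The converter callables referenced above are not defined in this snippet. Purely as an assumption about their contract (raw column values in, a tensor plus an availability mask out), a minimal sketch of two of them could look like this:

import numpy as np
import torch

# Hypothetical implementations; the exact contract dnn_cool expects may differ.
def binary_value_converter(column):
    # Map a column of 0/1/nan values to a bool tensor, with a mask
    # marking which samples actually have an annotation.
    raw = torch.from_numpy(np.asarray(column, dtype=np.float32))
    available = ~torch.isnan(raw)
    return raw.nan_to_num(0.0).bool(), available

def bounded_regression_converter(column):
    # Normalize continuous values into [0, 1] using the observed range.
    raw = torch.from_numpy(np.asarray(column, dtype=np.float32))
    available = ~torch.isnan(raw)
    observed = raw[available]
    scaled = (raw - observed.min()) / (observed.max() - observed.min())
    return scaled, available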

task_converter = converters.task

Now we specify how a given type of values is converted to a Task. A Task bundles a PyTorch nn.Module that holds the weights specific to the task, together with the task's activation, decoder, and loss function.

task_converter.type_mapping['binary'] = binary_classification_task
task_converter.type_mapping['continuous'] = bounded_regression_task
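
To make this concrete, here is a purely illustrative sketch of the pieces such a task bundles; the class below is an assumption for exposition, not dnn_cool's actual Task API:

import torch.nn as nn

# Illustrative only: a binary classification task groups task-specific
# weights, an activation, a decoder, and a loss function together.
class IllustrativeBinaryTask:
    def __init__(self, in_features):
        self.module = nn.Linear(in_features, 1)  # task-specific weights
        self.activation = nn.Sigmoid()           # logits -> probability
        self.loss = nn.BCEWithLogitsLoss()       # loss computed on logits

    def decode(self, activated):
        # The decoder turns activated outputs into final decisions.
        return activated > 0.5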

from dnn_cool.project import Project

project = Project(df,
                  input_col='img',
                  output_col=output_col,
                  converters=converters,
                  project_dir='./high-level-project')

Perfect! Now it's time to start adding TaskFlows - a TaskFlow is basically a function that describes the dependencies between tasks. TaskFlow actually extends Task, so you can use a TaskFlow inside another TaskFlow, and so on. Let's start by creating a TaskFlow for the face-related tasks:

@project.add_flow
def face_regression(flow, x, out):
    out += flow.face_x1(x.face_localization)
    out += flow.face_y1(x.face_localization)
    out += flow.face_w(x.face_localization)
    out += flow.face_h(x.face_localization)
    out += flow.facial_characteristics(x.features)
    return out

Here, flow is the object that holds all tasks added so far, x is a dict-like object that holds the features for the different branches (its keys, such as features and face_localization, correspond to the outputs produced by the model's forward method, shown later), and out is the variable in which the final result is accumulated.

Now we can add a TaskFlow for body-related tasks as well:

@project.add_flow
def body_regression(flow, x, out):
    out += flow.body_x1(x.body_localization)
    out += flow.body_y1(x.body_localization)
    out += flow.body_w(x.body_localization)
    out += flow.body_h(x.body_localization)
    out += flow.shirt_type(x.features)
    return out

Since these two flows are already added, let's group them into a TaskFlow for person-related tasks:

@project.add_flow
def person_regression(flow, x, out):
    out += flow.face_regression(x)
    out += flow.body_regression(x)
    return out

And now let's implement the full flow of tasks:

@project.add_flow
def full_flow(flow, x, out):
    out += flow.camera_blocked(x.features)
    out += flow.door_open(x.features) | (~out.camera_blocked)
    out += flow.door_locked(x.features) | (~out.door_open)
    out += flow.person_present(x.features) | out.door_open
    out += flow.person_regression(x) | out.person_present
    return out

Here you can notice the | operator, which is used to express a precondition (read it as "given", e.g. "door open given that the camera is not blocked"). Let's now get the full flow for the project and have a look at some of its methods:

flow = project.get_full_flow()
flow.get_loss()                 # returns a loss function that uses the children's loss functions
flow.get_per_sample_loss()      # returns a loss function that will return a loss item for every sample; useful for interpreting results
flow.torch()                    # returns a `nn.Module` that uses the children's modules, according to the logic described in the task flow
flow.get_dataset()              # returns a PyTorch `Dataset`, which includes the preconditions needed to know which weights should be updated
flow.get_metrics()              # returns a list of metrics, which consists of the metrics of its children
flow.get_treelib_explainer()    # returns an object that when called, draws a tree of the decision making, based on the task flow
flow.get_decoder()              # returns a decoder, which decodes all children tasks
flow.get_activation()           # returns an activation function, which invokes the activations of its children
flow.get_evaluator()            # returns an evaluator, which evaluates every task (given that its respective precondition is satisfied)

Now let's create a model and use the module provided by the flow.

import torch.nn as nn

class SecurityModule(nn.Module):

    def __init__(self):
        super().__init__()
        self.seq = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(2),
            nn.Conv2d(128, 128, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, kernel_size=5),
            nn.AvgPool2d(2),
            nn.ReLU(inplace=True),
        )

        self.features_seq = nn.Sequential(
            nn.Conv2d(256, 256, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

        self.face_localization_seq = nn.Sequential(
            nn.Conv2d(256, 256, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

        self.body_localization_seq = nn.Sequential(
            nn.Conv2d(256, 256, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

        self.flow_module = flow.torch()

    def forward(self, x):
        res = {}
        common = self.seq(x['img'])
        res['features'] = self.features_seq(common)
        res['face_localization'] = self.face_localization_seq(common)
        res['body_localization'] = self.body_localization_seq(common)
        res['gt'] = x.get('gt')
        return self.flow_module(res)

As you can see, the model starts with shared features, then splits into three branches: features, face_localization, and body_localization. After that, it uses the module obtained from flow.torch() to produce the final result, passing the ground truth along for the preconditions (only needed during training).

Now, to train the model you can use any framework you like (or a plain PyTorch training loop), but dnn_cool has a utility for working with Catalyst, so we will show how to use it:

model = SecurityModule()
runner = project.runner(model=model, runner_name='experiment_run')
runner.train(num_epochs=10)
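
If you prefer to stay framework-free, a plain PyTorch loop could look roughly like the sketch below; the exact batch format produced by flow.get_dataset() is an assumption here, so treat this as an outline rather than working glue code:

import torch
from torch.utils.data import DataLoader

model = SecurityModule()
criterion = flow.get_loss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Assumes the dataset yields (inputs, targets) pairs; adjust to the
# actual structure returned by flow.get_dataset().
loader = DataLoader(flow.get_dataset(), batch_size=32, shuffle=True)

for epoch in range(10):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()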

Great, we have found the best weights for the model! But for binary classification, multi-label classification, and similar tasks, we still have to tune the thresholds applied after the sigmoid. To do that, just call:

runner.infer()                  # Dumps the predictions, targets and per-task interpretations in the directory for the run
tuned_params = runner.tune()    # Tunes the thresholds per task

This calls the tuner for every specific task and selects the best threshold for each task.
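
Conceptually, tuning a binary threshold is just a sweep over candidate values, keeping the one that maximizes a validation metric such as F1. A minimal sketch of the idea (not the library's actual tuner):

import numpy as np
from sklearn.metrics import f1_score

def tune_threshold(probabilities, targets):
    # Sweep candidate thresholds, keep the one with the best F1 score.
    candidates = np.linspace(0.0, 1.0, 101)
    scores = [f1_score(targets, probabilities > t) for t in candidates]
    return candidates[int(np.argmax(scores))]

Great, now let's evaluate the model with the best-found thresholds: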

evaluation_df = runner.evaluate()
evaluation_df

Here's an example output on a synthetic dataset:

|    | task_path                                                | metric_name         | metric_res | num_samples |
|----|----------------------------------------------------------|---------------------|------------|-------------|
| 0  | camera_blocked                                           | accuracy            | 1          | 996         |
| 1  | camera_blocked                                           | f1_score            | 1          | 996         |
| 2  | camera_blocked                                           | precision           | 1          | 996         |
| 3  | camera_blocked                                           | recall              | 1          | 996         |
| 4  | door_open                                                | accuracy            | 1          | 902         |
| 5  | door_open                                                | f1_score            | 1          | 902         |
| 6  | door_open                                                | precision           | 1          | 902         |
| 7  | door_open                                                | recall              | 1          | 902         |
| 8  | door_locked                                              | accuracy            | 1          | 201         |
| 9  | door_locked                                              | f1_score            | 1          | 201         |
| 10 | door_locked                                              | precision           | 1          | 201         |
| 11 | door_locked                                              | recall              | 1          | 201         |
| 12 | person_present                                           | accuracy            | 1          | 701         |
| 13 | person_present                                           | f1_score            | 1          | 701         |
| 14 | person_present                                           | precision           | 1          | 701         |
| 15 | person_present                                           | recall              | 1          | 701         |
| 16 | person_regression.face_regression.face_x1                | mean_absolute_error | 0.013935   | 611         |
| 17 | person_regression.face_regression.face_y1                | mean_absolute_error | 0.0268986  | 611         |
| 18 | person_regression.face_regression.face_w                 | mean_absolute_error | 0.0115682  | 611         |
| 19 | person_regression.face_regression.face_h                 | mean_absolute_error | 0.0121426  | 611         |
| 20 | person_regression.face_regression.facial_characteristics | accuracy            | 0.996727   | 611         |
| 21 | person_regression.body_regression.body_x1                | mean_absolute_error | 0.00877354 | 611         |
| 22 | person_regression.body_regression.body_y1                | mean_absolute_error | 0.0188446  | 611         |
| 23 | person_regression.body_regression.body_w                 | mean_absolute_error | 0.020874   | 611         |
| 24 | person_regression.body_regression.body_h                 | mean_absolute_error | 0.0145986  | 611         |
| 25 | person_regression.body_regression.shirt_type             | accuracy_1          | 1          | 611         |
| 26 | person_regression.body_regression.shirt_type             | accuracy_3          | 1          | 611         |
| 27 | person_regression.body_regression.shirt_type             | accuracy_5          | 1          | 611         |
| 28 | person_regression.body_regression.shirt_type             | f1_score            | 1          | 611         |
| 29 | person_regression.body_regression.shirt_type             | precision           | 1          | 611         |
| 30 | person_regression.body_regression.shirt_type             | recall              | 1          | 611         |

Features

Main features are:

  • Support for task preconditioning - you can say that a task is a precondition to another task
  • Support for nested tasks - tasks which contain other tasks
  • Support for handling missing values
  • Treelib explanation generator, for example:
├── inp 1
│   └── camera_blocked | decoded: [False], activated: [0.], logits: [-117.757324]
│       └── door_open | decoded: [ True], activated: [1.], logits: [41.11258]
│           └── person_present | decoded: [ True], activated: [1.], logits: [60.38873]
│               └── person_regression
│                   ├── body_regression
│                   │   ├── body_h | decoded: [29.672623], activated: [0.46363473], logits: [-0.14571853]
│                   │   ├── body_w | decoded: [12.86382], activated: [0.20099719], logits: [-1.3800735]
│                   │   ├── body_x1 | decoded: [21.34288], activated: [0.3334825], logits: [-0.69247603]
│                   │   ├── body_y1 | decoded: [18.468979], activated: [0.2885778], logits: [-0.9023013]
│                   │   └── shirt_type | decoded: [6 1 0 4 2 5 3], activated: [4.1331367e-23 3.5493638e-17 3.1328378e-26 5.6903808e-30 2.4471377e-25
 2.8071076e-29 1.0000000e+00], logits: [-20.549513  -6.88627  -27.734364 -36.34787  -25.6788   -34.751904
  30.990908]
│                   └── face_regression
│                       ├── face_h | decoded: [11.265154], activated: [0.17601803], logits: [-1.5435623]
│                       ├── face_w | decoded: [12.225838], activated: [0.19102871], logits: [-1.4433397]
│                       ├── face_x1 | decoded: [21.98834], activated: [0.34356782], logits: [-0.64743483]
│                       ├── face_y1 | decoded: [3.2855165], activated: [0.0513362], logits: [-2.9166584]
│                       └── facial_characteristics | decoded: [ True False  True], activated: [9.9999940e-01 1.2074912e-12 9.9999833e-01], logits: [ 14.240071 -27.442476  13.27557 ]

But if the model thinks the camera is blocked, the explanation would be:

├── inp 2
│   └── camera_blocked | decoded: [ True], activated: [1.], logits: [76.367676]

How does it work?

I will tell you soon!
