VISAR Tutorial
This project trains neural network QSAR models on compound-protein interaction data and helps interpret the trained models by interactively displaying the transformed chemical landscape and visualizing the SAR of chemicals of interest.
In this notebook, we walk through a typical VISAR workflow: training neural network QSAR models and analyzing the trained models.
Train single-task regression model
import os
from model_training_utils_v2 import ST_model_hyperparam_screen, ST_model_training
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # select which GPU to run on
model setup
# initialize parameters
task_names = ['T107', 'T108']
MT_dat_name = './data/MT_data_clean_June28.csv'
FP_type = 'Circular_2048'
params_dict = {
    "n_tasks": [1],
    "n_features": [2048],  # must match the chosen fingerprint type
    "activation": ['relu'],
    "momentum": [.9],
    "batch_size": [128],
    "init": ['glorot_uniform'],
    "learning_rate": [0.01],
    "decay": [1e-6],
    "nb_epoch": [30],
    "dropouts": [.2, .4],
    "nb_layers": [1],
    "batchnorm": [False],
    "layer_sizes": [(1024, 512), (1024, 128), (512, 128), (512, 64), (128, 64), (64, 32),
                    (1024, 512, 128), (512, 128, 64), (128, 64, 32)],
    "penalty": [0.1]
}
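As a quick sanity check (not part of VISAR itself), note that the screening step enumerates the Cartesian product of all list-valued entries in params_dict, so this grid yields 2 dropout values x 9 layer-size tuples = 18 candidate models per task and repeat:
n_models = 1
for values in params_dict.values():
    n_models *= len(values)  # only 'dropouts' and 'layer_sizes' contribute more than one option here
print(n_models)  # 18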
# initialize model setup
import random
import time
random_seed = random.randint(0, 1000)
local_time = time.localtime(time.time())
log_path = './logs/'
RUN_KEY = 'ST_%d_%d_%d_%d' % (local_time.tm_year, local_time.tm_mon,
                              local_time.tm_mday, random_seed)
os.system('mkdir %s%s' % (log_path, RUN_KEY))
print(RUN_KEY)
ST_2019_8_14_986
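The os.system('mkdir ...') call above relies on a Unix-like shell; if you prefer a pure-Python equivalent (an optional substitution, not part of the original tutorial), os.makedirs does the same job:
os.makedirs(os.path.join(log_path, RUN_KEY), exist_ok=True)  # create ./logs/<RUN_KEY> without shelling out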
hyperparameter screening
# hyperparameter screening using deepchem
log_output = ST_model_hyperparam_screen(MT_dat_name, task_names, FP_type, params_dict,
                                        log_path='./logs/' + RUN_KEY)
----------------------------------------------
Extracted dataset shape: (2951, 3)
Loading raw samples now.
shard_size: 8192
About to start loading CSV from ./logs/ST_2019_8_14_986/temp.csv
Loading shard 1 of size 8192.
Featurizing sample 0
Featurizing sample 1000
Featurizing sample 2000
TIMING: featurizing shard 0 took 11.429 s
TIMING: dataset construction took 11.677 s
Loading dataset from disk.
Preparing dataset for T107 of rep 0...
Computing train/valid/test indices
TIMING: dataset construction took 0.315 s
Loading dataset from disk.
TIMING: dataset construction took 0.126 s
Loading dataset from disk.
TIMING: dataset construction took 0.140 s
Loading dataset from disk.
Hyperprameter screening ...
Fitting model 1/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 512), 'n_tasks': 1}
computed_metrics: [0.5481104503085001]
Model 1/18, Metric r2_score, Validation set 0: 0.548110
best_validation_score so far: 0.548110
Fitting model 2/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 512), 'n_tasks': 1}
computed_metrics: [0.6237788574566245]
Model 2/18, Metric r2_score, Validation set 1: 0.623779
best_validation_score so far: 0.623779
Fitting model 3/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 128), 'n_tasks': 1}
computed_metrics: [0.5835653981206139]
Model 3/18, Metric r2_score, Validation set 2: 0.583565
best_validation_score so far: 0.623779
Fitting model 4/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 128), 'n_tasks': 1}
computed_metrics: [0.6196603911294876]
Model 4/18, Metric r2_score, Validation set 3: 0.619660
best_validation_score so far: 0.623779
Fitting model 5/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 128), 'n_tasks': 1}
computed_metrics: [0.6374032122095533]
Model 5/18, Metric r2_score, Validation set 4: 0.637403
best_validation_score so far: 0.637403
Fitting model 6/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 128), 'n_tasks': 1}
computed_metrics: [0.6187663912157835]
Model 6/18, Metric r2_score, Validation set 5: 0.618766
best_validation_score so far: 0.637403
Fitting model 7/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 64), 'n_tasks': 1}
computed_metrics: [0.6298660644966039]
Model 7/18, Metric r2_score, Validation set 6: 0.629866
best_validation_score so far: 0.637403
Fitting model 8/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 64), 'n_tasks': 1}
computed_metrics: [0.6406574576550392]
Model 8/18, Metric r2_score, Validation set 7: 0.640657
best_validation_score so far: 0.640657
Fitting model 9/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (128, 64), 'n_tasks': 1}
computed_metrics: [0.6110877363928156]
Model 9/18, Metric r2_score, Validation set 8: 0.611088
best_validation_score so far: 0.640657
Fitting model 10/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (128, 64), 'n_tasks': 1}
computed_metrics: [0.602689374923115]
Model 10/18, Metric r2_score, Validation set 9: 0.602689
best_validation_score so far: 0.640657
Fitting model 11/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (64, 32), 'n_tasks': 1}
computed_metrics: [0.5979003156666713]
Model 11/18, Metric r2_score, Validation set 10: 0.597900
best_validation_score so far: 0.640657
Fitting model 12/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (64, 32), 'n_tasks': 1}
computed_metrics: [0.6163441339563296]
Model 12/18, Metric r2_score, Validation set 11: 0.616344
best_validation_score so far: 0.640657
Fitting model 13/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 512, 128), 'n_tasks': 1}
computed_metrics: [0.5797812103668558]
Model 13/18, Metric r2_score, Validation set 12: 0.579781
best_validation_score so far: 0.640657
Fitting model 14/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 512, 128), 'n_tasks': 1}
computed_metrics: [0.6082225807486198]
Model 14/18, Metric r2_score, Validation set 13: 0.608223
best_validation_score so far: 0.640657
Fitting model 15/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 128, 64), 'n_tasks': 1}
computed_metrics: [0.6186375458180398]
Model 15/18, Metric r2_score, Validation set 14: 0.618638
best_validation_score so far: 0.640657
Fitting model 16/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 128, 64), 'n_tasks': 1}
computed_metrics: [0.6552503861859487]
Model 16/18, Metric r2_score, Validation set 15: 0.655250
best_validation_score so far: 0.655250
Fitting model 17/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (128, 64, 32), 'n_tasks': 1}
computed_metrics: [0.6310180282170208]
Model 17/18, Metric r2_score, Validation set 16: 0.631018
best_validation_score so far: 0.655250
Fitting model 18/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (128, 64, 32), 'n_tasks': 1}
computed_metrics: [0.6084419728173178]
Model 18/18, Metric r2_score, Validation set 17: 0.608442
best_validation_score so far: 0.655250
computed_metrics: [0.9163785773039165]
Best hyperparameters: ((512, 128, 64), 0.1, 'relu', 2048, 1e-06, 0.01, 0.9, 128, 1, 'glorot_uniform', 30, 0.4, 1, False)
train_score: 0.916379
validation_score: 0.655250
Generate performace report ...
Preparing dataset for T107 of rep 1...
Computing train/valid/test indices
TIMING: dataset construction took 0.320 s
Loading dataset from disk.
TIMING: dataset construction took 0.149 s
Loading dataset from disk.
TIMING: dataset construction took 0.121 s
Loading dataset from disk.
Hyperprameter screening ...
Fitting model 1/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 512), 'n_tasks': 1}
computed_metrics: [0.6141334216423628]
Model 1/18, Metric r2_score, Validation set 0: 0.614133
best_validation_score so far: 0.614133
Fitting model 2/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 512), 'n_tasks': 1}
computed_metrics: [0.6456750069826567]
Model 2/18, Metric r2_score, Validation set 1: 0.645675
best_validation_score so far: 0.645675
Fitting model 3/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 128), 'n_tasks': 1}
computed_metrics: [0.6689664106003053]
Model 3/18, Metric r2_score, Validation set 2: 0.668966
best_validation_score so far: 0.668966
Fitting model 4/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 128), 'n_tasks': 1}
computed_metrics: [0.6880955534249764]
Model 4/18, Metric r2_score, Validation set 3: 0.688096
best_validation_score so far: 0.688096
Fitting model 5/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 128), 'n_tasks': 1}
computed_metrics: [0.6884896776847982]
Model 5/18, Metric r2_score, Validation set 4: 0.688490
best_validation_score so far: 0.688490
Fitting model 6/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 128), 'n_tasks': 1}
computed_metrics: [0.6844000694925106]
Model 6/18, Metric r2_score, Validation set 5: 0.684400
best_validation_score so far: 0.688490
Fitting model 7/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 64), 'n_tasks': 1}
computed_metrics: [0.6963359115130865]
Model 7/18, Metric r2_score, Validation set 6: 0.696336
best_validation_score so far: 0.696336
Fitting model 8/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 64), 'n_tasks': 1}
computed_metrics: [0.6850005173519169]
Model 8/18, Metric r2_score, Validation set 7: 0.685001
best_validation_score so far: 0.696336
Fitting model 9/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (128, 64), 'n_tasks': 1}
computed_metrics: [0.6529657912138516]
Model 9/18, Metric r2_score, Validation set 8: 0.652966
best_validation_score so far: 0.696336
Fitting model 10/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (128, 64), 'n_tasks': 1}
computed_metrics: [0.6943907300870579]
Model 10/18, Metric r2_score, Validation set 9: 0.694391
best_validation_score so far: 0.696336
Fitting model 11/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (64, 32), 'n_tasks': 1}
computed_metrics: [0.6882941803739605]
Model 11/18, Metric r2_score, Validation set 10: 0.688294
best_validation_score so far: 0.696336
Fitting model 12/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (64, 32), 'n_tasks': 1}
computed_metrics: [0.6688188952045332]
Model 12/18, Metric r2_score, Validation set 11: 0.668819
best_validation_score so far: 0.696336
Fitting model 13/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 512, 128), 'n_tasks': 1}
computed_metrics: [0.663746368808691]
Model 13/18, Metric r2_score, Validation set 12: 0.663746
best_validation_score so far: 0.696336
Fitting model 14/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 512, 128), 'n_tasks': 1}
computed_metrics: [0.6436014936686325]
Model 14/18, Metric r2_score, Validation set 13: 0.643601
best_validation_score so far: 0.696336
Fitting model 15/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 128, 64), 'n_tasks': 1}
computed_metrics: [0.7199618702370533]
Model 15/18, Metric r2_score, Validation set 14: 0.719962
best_validation_score so far: 0.719962
Fitting model 16/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 128, 64), 'n_tasks': 1}
computed_metrics: [0.7155456170754025]
Model 16/18, Metric r2_score, Validation set 15: 0.715546
best_validation_score so far: 0.719962
Fitting model 17/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (128, 64, 32), 'n_tasks': 1}
computed_metrics: [0.6898074732971973]
Model 17/18, Metric r2_score, Validation set 16: 0.689807
best_validation_score so far: 0.719962
Fitting model 18/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (128, 64, 32), 'n_tasks': 1}
computed_metrics: [0.6894147493435725]
Model 18/18, Metric r2_score, Validation set 17: 0.689415
best_validation_score so far: 0.719962
computed_metrics: [0.9520767295678436]
Best hyperparameters: ((512, 128, 64), 0.1, 'relu', 2048, 1e-06, 0.01, 0.9, 128, 1, 'glorot_uniform', 30, 0.2, 1, False)
train_score: 0.952077
validation_score: 0.719962
Generate performace report ...
Preparing dataset for T107 of rep 2...
Computing train/valid/test indices
TIMING: dataset construction took 0.293 s
Loading dataset from disk.
TIMING: dataset construction took 0.115 s
Loading dataset from disk.
TIMING: dataset construction took 0.116 s
Loading dataset from disk.
Hyperprameter screening ...
Fitting model 1/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 512), 'n_tasks': 1}
computed_metrics: [0.4930277822893324]
Model 1/18, Metric r2_score, Validation set 0: 0.493028
best_validation_score so far: 0.493028
Fitting model 2/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 512), 'n_tasks': 1}
computed_metrics: [0.505910666768453]
Model 2/18, Metric r2_score, Validation set 1: 0.505911
best_validation_score so far: 0.505911
Fitting model 3/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 128), 'n_tasks': 1}
computed_metrics: [0.531299026515801]
Model 3/18, Metric r2_score, Validation set 2: 0.531299
best_validation_score so far: 0.531299
Fitting model 4/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 128), 'n_tasks': 1}
computed_metrics: [0.5740658340870408]
Model 4/18, Metric r2_score, Validation set 3: 0.574066
best_validation_score so far: 0.574066
Fitting model 5/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 128), 'n_tasks': 1}
computed_metrics: [0.5305247291640403]
Model 5/18, Metric r2_score, Validation set 4: 0.530525
best_validation_score so far: 0.574066
Fitting model 6/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 128), 'n_tasks': 1}
computed_metrics: [0.5153835975720726]
Model 6/18, Metric r2_score, Validation set 5: 0.515384
best_validation_score so far: 0.574066
Fitting model 7/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 64), 'n_tasks': 1}
computed_metrics: [0.5228210218689161]
Model 7/18, Metric r2_score, Validation set 6: 0.522821
best_validation_score so far: 0.574066
Fitting model 8/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 64), 'n_tasks': 1}
computed_metrics: [0.5418859521801785]
Model 8/18, Metric r2_score, Validation set 7: 0.541886
best_validation_score so far: 0.574066
Fitting model 9/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (128, 64), 'n_tasks': 1}
computed_metrics: [0.4595043813250017]
Model 9/18, Metric r2_score, Validation set 8: 0.459504
best_validation_score so far: 0.574066
Fitting model 10/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (128, 64), 'n_tasks': 1}
computed_metrics: [0.49186165422505523]
Model 10/18, Metric r2_score, Validation set 9: 0.491862
best_validation_score so far: 0.574066
Fitting model 11/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (64, 32), 'n_tasks': 1}
computed_metrics: [0.5167497160336099]
Model 11/18, Metric r2_score, Validation set 10: 0.516750
best_validation_score so far: 0.574066
Fitting model 12/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (64, 32), 'n_tasks': 1}
computed_metrics: [0.5196863583860438]
Model 12/18, Metric r2_score, Validation set 11: 0.519686
best_validation_score so far: 0.574066
Fitting model 13/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 512, 128), 'n_tasks': 1}
computed_metrics: [0.5085581237282972]
Model 13/18, Metric r2_score, Validation set 12: 0.508558
best_validation_score so far: 0.574066
Fitting model 14/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 512, 128), 'n_tasks': 1}
computed_metrics: [0.5375331002631859]
Model 14/18, Metric r2_score, Validation set 13: 0.537533
best_validation_score so far: 0.574066
Fitting model 15/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 128, 64), 'n_tasks': 1}
computed_metrics: [0.5469272539992392]
Model 15/18, Metric r2_score, Validation set 14: 0.546927
best_validation_score so far: 0.574066
Fitting model 16/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 128, 64), 'n_tasks': 1}
computed_metrics: [0.5650936130734651]
Model 16/18, Metric r2_score, Validation set 15: 0.565094
best_validation_score so far: 0.574066
Fitting model 17/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (128, 64, 32), 'n_tasks': 1}
computed_metrics: [0.48516907723167413]
Model 17/18, Metric r2_score, Validation set 16: 0.485169
best_validation_score so far: 0.574066
Fitting model 18/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (128, 64, 32), 'n_tasks': 1}
computed_metrics: [0.5617374658837493]
Model 18/18, Metric r2_score, Validation set 17: 0.561737
best_validation_score so far: 0.574066
computed_metrics: [0.8890936408066037]
Best hyperparameters: ((1024, 128), 0.1, 'relu', 2048, 1e-06, 0.01, 0.9, 128, 1, 'glorot_uniform', 30, 0.4, 1, False)
train_score: 0.889094
validation_score: 0.574066
Generate performace report ...
----------------------------------------------
Extracted dataset shape: (2063, 3)
Loading raw samples now.
shard_size: 8192
About to start loading CSV from ./logs/ST_2019_8_14_986/temp.csv
Loading shard 1 of size 8192.
Featurizing sample 0
Featurizing sample 1000
Featurizing sample 2000
TIMING: featurizing shard 0 took 8.051 s
TIMING: dataset construction took 8.207 s
Loading dataset from disk.
Preparing dataset for T108 of rep 0...
Computing train/valid/test indices
TIMING: dataset construction took 0.209 s
Loading dataset from disk.
TIMING: dataset construction took 0.084 s
Loading dataset from disk.
TIMING: dataset construction took 0.084 s
Loading dataset from disk.
Hyperprameter screening ...
Fitting model 1/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 512), 'n_tasks': 1}
computed_metrics: [0.4868601404823667]
Model 1/18, Metric r2_score, Validation set 0: 0.486860
best_validation_score so far: 0.486860
Fitting model 2/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 512), 'n_tasks': 1}
computed_metrics: [0.5342579656611066]
Model 2/18, Metric r2_score, Validation set 1: 0.534258
best_validation_score so far: 0.534258
Fitting model 3/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 128), 'n_tasks': 1}
computed_metrics: [0.6270370260083612]
Model 3/18, Metric r2_score, Validation set 2: 0.627037
best_validation_score so far: 0.627037
Fitting model 4/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 128), 'n_tasks': 1}
computed_metrics: [0.6428942871629781]
Model 4/18, Metric r2_score, Validation set 3: 0.642894
best_validation_score so far: 0.642894
Fitting model 5/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 128), 'n_tasks': 1}
computed_metrics: [0.6158495290980834]
Model 5/18, Metric r2_score, Validation set 4: 0.615850
best_validation_score so far: 0.642894
Fitting model 6/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 128), 'n_tasks': 1}
computed_metrics: [0.639602824852064]
Model 6/18, Metric r2_score, Validation set 5: 0.639603
best_validation_score so far: 0.642894
Fitting model 7/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 64), 'n_tasks': 1}
computed_metrics: [0.5983536776309266]
Model 7/18, Metric r2_score, Validation set 6: 0.598354
best_validation_score so far: 0.642894
Fitting model 8/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 64), 'n_tasks': 1}
computed_metrics: [0.6262399422329217]
Model 8/18, Metric r2_score, Validation set 7: 0.626240
best_validation_score so far: 0.642894
Fitting model 9/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (128, 64), 'n_tasks': 1}
computed_metrics: [0.5898805159564117]
Model 9/18, Metric r2_score, Validation set 8: 0.589881
best_validation_score so far: 0.642894
Fitting model 10/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (128, 64), 'n_tasks': 1}
computed_metrics: [0.6361380166768911]
Model 10/18, Metric r2_score, Validation set 9: 0.636138
best_validation_score so far: 0.642894
Fitting model 11/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (64, 32), 'n_tasks': 1}
computed_metrics: [0.590457088062357]
Model 11/18, Metric r2_score, Validation set 10: 0.590457
best_validation_score so far: 0.642894
Fitting model 12/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (64, 32), 'n_tasks': 1}
computed_metrics: [0.577278038754506]
Model 12/18, Metric r2_score, Validation set 11: 0.577278
best_validation_score so far: 0.642894
Fitting model 13/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 512, 128), 'n_tasks': 1}
computed_metrics: [0.5676634412049765]
Model 13/18, Metric r2_score, Validation set 12: 0.567663
best_validation_score so far: 0.642894
Fitting model 14/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 512, 128), 'n_tasks': 1}
computed_metrics: [0.4004630563531284]
Model 14/18, Metric r2_score, Validation set 13: 0.400463
best_validation_score so far: 0.642894
Fitting model 15/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 128, 64), 'n_tasks': 1}
computed_metrics: [0.606689529089742]
Model 15/18, Metric r2_score, Validation set 14: 0.606690
best_validation_score so far: 0.642894
Fitting model 16/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 128, 64), 'n_tasks': 1}
computed_metrics: [0.6226891620843279]
Model 16/18, Metric r2_score, Validation set 15: 0.622689
best_validation_score so far: 0.642894
Fitting model 17/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (128, 64, 32), 'n_tasks': 1}
computed_metrics: [0.5759372539391608]
Model 17/18, Metric r2_score, Validation set 16: 0.575937
best_validation_score so far: 0.642894
Fitting model 18/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (128, 64, 32), 'n_tasks': 1}
computed_metrics: [0.6011259388892263]
Model 18/18, Metric r2_score, Validation set 17: 0.601126
best_validation_score so far: 0.642894
computed_metrics: [0.878111896316526]
Best hyperparameters: ((1024, 128), 0.1, 'relu', 2048, 1e-06, 0.01, 0.9, 128, 1, 'glorot_uniform', 30, 0.4, 1, False)
train_score: 0.878112
validation_score: 0.642894
Generate performace report ...
Preparing dataset for T108 of rep 1...
Computing train/valid/test indices
TIMING: dataset construction took 0.234 s
Loading dataset from disk.
TIMING: dataset construction took 0.088 s
Loading dataset from disk.
TIMING: dataset construction took 0.083 s
Loading dataset from disk.
Hyperprameter screening ...
Fitting model 1/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 512), 'n_tasks': 1}
computed_metrics: [0.4761562050362087]
Model 1/18, Metric r2_score, Validation set 0: 0.476156
best_validation_score so far: 0.476156
Fitting model 2/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 512), 'n_tasks': 1}
computed_metrics: [0.5603185611901793]
Model 2/18, Metric r2_score, Validation set 1: 0.560319
best_validation_score so far: 0.560319
Fitting model 3/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 128), 'n_tasks': 1}
computed_metrics: [0.6139116143568223]
Model 3/18, Metric r2_score, Validation set 2: 0.613912
best_validation_score so far: 0.613912
Fitting model 4/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 128), 'n_tasks': 1}
computed_metrics: [0.5952160917993221]
Model 4/18, Metric r2_score, Validation set 3: 0.595216
best_validation_score so far: 0.613912
Fitting model 5/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 128), 'n_tasks': 1}
computed_metrics: [0.6196258627688307]
Model 5/18, Metric r2_score, Validation set 4: 0.619626
best_validation_score so far: 0.619626
Fitting model 6/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 128), 'n_tasks': 1}
computed_metrics: [0.6157923884866285]
Model 6/18, Metric r2_score, Validation set 5: 0.615792
best_validation_score so far: 0.619626
Fitting model 7/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 64), 'n_tasks': 1}
computed_metrics: [0.6619856420412034]
Model 7/18, Metric r2_score, Validation set 6: 0.661986
best_validation_score so far: 0.661986
Fitting model 8/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 64), 'n_tasks': 1}
computed_metrics: [0.6730749328874235]
Model 8/18, Metric r2_score, Validation set 7: 0.673075
best_validation_score so far: 0.673075
Fitting model 9/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (128, 64), 'n_tasks': 1}
computed_metrics: [0.6280516778814411]
Model 9/18, Metric r2_score, Validation set 8: 0.628052
best_validation_score so far: 0.673075
Fitting model 10/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (128, 64), 'n_tasks': 1}
computed_metrics: [0.6754045394217645]
Model 10/18, Metric r2_score, Validation set 9: 0.675405
best_validation_score so far: 0.675405
Fitting model 11/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (64, 32), 'n_tasks': 1}
computed_metrics: [0.6099613250901108]
Model 11/18, Metric r2_score, Validation set 10: 0.609961
best_validation_score so far: 0.675405
Fitting model 12/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (64, 32), 'n_tasks': 1}
computed_metrics: [0.6657732916243124]
Model 12/18, Metric r2_score, Validation set 11: 0.665773
best_validation_score so far: 0.675405
Fitting model 13/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 512, 128), 'n_tasks': 1}
computed_metrics: [0.5409865108703251]
Model 13/18, Metric r2_score, Validation set 12: 0.540987
best_validation_score so far: 0.675405
Fitting model 14/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 512, 128), 'n_tasks': 1}
computed_metrics: [0.5270717282697137]
Model 14/18, Metric r2_score, Validation set 13: 0.527072
best_validation_score so far: 0.675405
Fitting model 15/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 128, 64), 'n_tasks': 1}
computed_metrics: [0.6137628239191688]
Model 15/18, Metric r2_score, Validation set 14: 0.613763
best_validation_score so far: 0.675405
Fitting model 16/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 128, 64), 'n_tasks': 1}
computed_metrics: [0.6438303000670302]
Model 16/18, Metric r2_score, Validation set 15: 0.643830
best_validation_score so far: 0.675405
Fitting model 17/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (128, 64, 32), 'n_tasks': 1}
computed_metrics: [0.6337577807823557]
Model 17/18, Metric r2_score, Validation set 16: 0.633758
best_validation_score so far: 0.675405
Fitting model 18/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (128, 64, 32), 'n_tasks': 1}
computed_metrics: [0.6327424936791699]
Model 18/18, Metric r2_score, Validation set 17: 0.632742
best_validation_score so far: 0.675405
computed_metrics: [0.9318955372212835]
Best hyperparameters: ((128, 64), 0.1, 'relu', 2048, 1e-06, 0.01, 0.9, 128, 1, 'glorot_uniform', 30, 0.4, 1, False)
train_score: 0.931896
validation_score: 0.675405
Generate performace report ...
Preparing dataset for T108 of rep 2...
Computing train/valid/test indices
TIMING: dataset construction took 0.202 s
Loading dataset from disk.
TIMING: dataset construction took 0.085 s
Loading dataset from disk.
TIMING: dataset construction took 0.084 s
Loading dataset from disk.
Hyperprameter screening ...
Fitting model 1/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 512), 'n_tasks': 1}
computed_metrics: [0.4635391750402559]
Model 1/18, Metric r2_score, Validation set 0: 0.463539
best_validation_score so far: 0.463539
Fitting model 2/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 512), 'n_tasks': 1}
computed_metrics: [0.4592714074438822]
Model 2/18, Metric r2_score, Validation set 1: 0.459271
best_validation_score so far: 0.463539
Fitting model 3/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 128), 'n_tasks': 1}
computed_metrics: [0.5656331184616337]
Model 3/18, Metric r2_score, Validation set 2: 0.565633
best_validation_score so far: 0.565633
Fitting model 4/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 128), 'n_tasks': 1}
computed_metrics: [0.5585518225147716]
Model 4/18, Metric r2_score, Validation set 3: 0.558552
best_validation_score so far: 0.565633
Fitting model 5/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 128), 'n_tasks': 1}
computed_metrics: [0.573778460798537]
Model 5/18, Metric r2_score, Validation set 4: 0.573778
best_validation_score so far: 0.573778
Fitting model 6/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 128), 'n_tasks': 1}
computed_metrics: [0.5614257800104939]
Model 6/18, Metric r2_score, Validation set 5: 0.561426
best_validation_score so far: 0.573778
Fitting model 7/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 64), 'n_tasks': 1}
computed_metrics: [0.4895738450317698]
Model 7/18, Metric r2_score, Validation set 6: 0.489574
best_validation_score so far: 0.573778
Fitting model 8/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 64), 'n_tasks': 1}
computed_metrics: [0.5554727008272755]
Model 8/18, Metric r2_score, Validation set 7: 0.555473
best_validation_score so far: 0.573778
Fitting model 9/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (128, 64), 'n_tasks': 1}
computed_metrics: [0.5324274902691197]
Model 9/18, Metric r2_score, Validation set 8: 0.532427
best_validation_score so far: 0.573778
Fitting model 10/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (128, 64), 'n_tasks': 1}
computed_metrics: [0.5710144564196316]
Model 10/18, Metric r2_score, Validation set 9: 0.571014
best_validation_score so far: 0.573778
Fitting model 11/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (64, 32), 'n_tasks': 1}
computed_metrics: [0.5176407099870516]
Model 11/18, Metric r2_score, Validation set 10: 0.517641
best_validation_score so far: 0.573778
Fitting model 12/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (64, 32), 'n_tasks': 1}
computed_metrics: [0.5586115495820354]
Model 12/18, Metric r2_score, Validation set 11: 0.558612
best_validation_score so far: 0.573778
Fitting model 13/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (1024, 512, 128), 'n_tasks': 1}
computed_metrics: [0.5543710979710084]
Model 13/18, Metric r2_score, Validation set 12: 0.554371
best_validation_score so far: 0.573778
Fitting model 14/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (1024, 512, 128), 'n_tasks': 1}
computed_metrics: [0.6054202284109258]
Model 14/18, Metric r2_score, Validation set 13: 0.605420
best_validation_score so far: 0.605420
Fitting model 15/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (512, 128, 64), 'n_tasks': 1}
computed_metrics: [0.5903675109864684]
Model 15/18, Metric r2_score, Validation set 14: 0.590368
best_validation_score so far: 0.605420
Fitting model 16/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (512, 128, 64), 'n_tasks': 1}
computed_metrics: [0.6108050572454018]
Model 16/18, Metric r2_score, Validation set 15: 0.610805
best_validation_score so far: 0.610805
Fitting model 17/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.2, 'nb_epoch': 30, 'layer_sizes': (128, 64, 32), 'n_tasks': 1}
computed_metrics: [0.522967609358389]
Model 17/18, Metric r2_score, Validation set 16: 0.522968
best_validation_score so far: 0.610805
Fitting model 18/18
hyperparameters: {'penalty': 0.1, 'activation': 'relu', 'n_features': 2048, 'decay': 1e-06, 'batchnorm': False, 'learning_rate': 0.01, 'momentum': 0.9, 'batch_size': 128, 'nb_layers': 1, 'init': 'glorot_uniform', 'dropouts': 0.4, 'nb_epoch': 30, 'layer_sizes': (128, 64, 32), 'n_tasks': 1}
computed_metrics: [0.6275553205084103]
Model 18/18, Metric r2_score, Validation set 17: 0.627555
best_validation_score so far: 0.627555
computed_metrics: [0.9142279933418767]
Best hyperparameters: ((128, 64, 32), 0.1, 'relu', 2048, 1e-06, 0.01, 0.9, 128, 1, 'glorot_uniform', 30, 0.4, 1, False)
train_score: 0.914228
validation_score: 0.627555
Generate performace report ...
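Before picking the training parameters below, it can help to load the screening log written to the RUN_KEY directory and rank the settings by validation score. The file and column names in this sketch are assumptions (check what ST_model_hyperparam_screen actually saved under ./logs/<RUN_KEY>/); it is not the exact VISAR output format:
import pandas as pd
# hypothetical file and column names -- adjust to the log file actually produced by the screen
hyperparam_df = pd.read_csv('./logs/' + RUN_KEY + '/hyperparam_log.csv')
print(hyperparam_df.sort_values('valid_score', ascending=False).head(10))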
# manually pick the training parameters, referring to the hyperparam_log saved in the RUN_KEY directory
best_hyperparams = {'T107': [(512, 64, 1), 0.4],
                    'T108': [(512, 128, 1), 0.2]}
model training
MT_dat_name = './data/MT_data_clean_June28.csv'
FP_type = 'Circular_2048'
RUN_KEY = 'ST_2019_8_14_986'
# model training
output_df = ST_model_training(MT_dat_name, FP_type,
                              best_hyperparams, result_path='./logs/' + RUN_KEY)
----------------------------------------------
Extracted dataset shape: (2063, 3)
Loading raw samples now.
shard_size: 8192
About to start loading CSV from ./logs/ST_2019_8_14_986/temp.csv
Loading shard 1 of size 8192.
Featurizing sample 0
Featurizing sample 1000
Featurizing sample 2000
TIMING: featurizing shard 0 took 8.254 s
TIMING: dataset construction took 8.417 s
Loading dataset from disk.
Preparing dataset for T108 of rep 0...
Computing train/valid/test indices
TIMING: dataset construction took 0.211 s
Loading dataset from disk.
TIMING: dataset construction took 0.098 s
Loading dataset from disk.
Model training ...
WARNING:tensorflow:From /root/anaconda3/envs/deepchem/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:1108: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 4.5879
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 1.1412
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.7993
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.6275
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.5385
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 0.0605
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 0.0595
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.0688
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.0618
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.0603
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 0.0463
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 0.0485
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.0494
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.0474
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.0476
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 0.0399
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 0.0407
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.0406
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.0408
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.0365
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 0.0359
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 0.0370
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.0334
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.0368
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.0380
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 0.0320
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 0.0338
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.0335
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.0336
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.0327
Training baseline models ...
Saving metrics ...
Preparing dataset for T108 of rep 1...
Computing train/valid/test indices
TIMING: dataset construction took 0.187 s
Loading dataset from disk.
TIMING: dataset construction took 0.103 s
Loading dataset from disk.
Model training ...
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 4.3973
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 1.1658
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.8354
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.6408
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.5431
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 0.0624
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 0.0663
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.0614
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.0589
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.0613
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 0.0494
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 0.0490
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.0460
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.0514
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.0473
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 0.0440
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 0.0419
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.0411
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.0422
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.0417
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 0.0348
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 0.0359
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.0364
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.0380
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.0353
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 0.0328
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 0.0319
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.0323
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.0348
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.0314
Training baseline models ...
Saving metrics ...
Preparing dataset for T108 of rep 2...
Computing train/valid/test indices
TIMING: dataset construction took 0.195 s
Loading dataset from disk.
TIMING: dataset construction took 0.098 s
Loading dataset from disk.
Model training ...
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 6.0135
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 1.3239
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.8725
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.7057
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.5775
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 0.0595
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 0.0606
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.0582
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.0632
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.0626
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 0.0482
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 0.0468
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.0476
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.0470
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.0477
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 0.0349
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 0.0360
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.0362
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.0352
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.0357
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 0.0301
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 0.0302
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.0336
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.0299
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.0300
Epoch 1/5
1650/1650 [==============================] - 0s - loss: 0.0278
Epoch 2/5
1650/1650 [==============================] - 0s - loss: 0.0263
Epoch 3/5
1650/1650 [==============================] - 0s - loss: 0.0283
Epoch 4/5
1650/1650 [==============================] - 0s - loss: 0.0284
Epoch 5/5
1650/1650 [==============================] - 0s - loss: 0.0278
Training baseline models ...
Saving metrics ...
Generate performance report ...
----------------------------------------------
Extracted dataset shape: (2951, 3)
Loading raw samples now.
shard_size: 8192
About to start loading CSV from ./logs/ST_2019_8_14_986/temp.csv
Loading shard 1 of size 8192.
Featurizing sample 0
Featurizing sample 1000
Featurizing sample 2000
TIMING: featurizing shard 0 took 11.116 s
TIMING: dataset construction took 11.333 s
Loading dataset from disk.
Preparing dataset for T107 of rep 0...
Computing train/valid/test indices
TIMING: dataset construction took 0.281 s
Loading dataset from disk.
TIMING: dataset construction took 0.151 s
Loading dataset from disk.
Model training ...
Epoch 1/5
2360/2360 [==============================] - 1s - loss: 5.0688
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 1.6524
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 1.2350
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 1.0929
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.9332
Epoch 1/5
2360/2360 [==============================] - 0s - loss: 0.1423
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 0.1396
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 0.1356
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 0.1390
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.1316
Epoch 1/5
2360/2360 [==============================] - 0s - loss: 0.0837
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 0.0798
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 0.0770
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 0.0780
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.0785
Epoch 1/5
2360/2360 [==============================] - 0s - loss: 0.0667
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 0.0675
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 0.0642
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 0.0671
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.0677
Epoch 1/5
2360/2360 [==============================] - 0s - loss: 0.0647
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 0.0652
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 0.0613
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 0.0630
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.0589
Epoch 1/5
2360/2360 [==============================] - 0s - loss: 0.0618
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 0.0586
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 0.0642
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 0.0614
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.0619
Training baseline models ...
Saving metrics ...
Preparing dataset for T107 of rep 1...
Computing train/valid/test indices
TIMING: dataset construction took 0.276 s
Loading dataset from disk.
TIMING: dataset construction took 0.139 s
Loading dataset from disk.
Model training ...
Epoch 1/5
2360/2360 [==============================] - 1s - loss: 5.2055
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 1.5749
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 1.1795
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 1.0271
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.9604
Epoch 1/5
2360/2360 [==============================] - 0s - loss: 0.1215
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 0.1220
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 0.1236
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 0.1186
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.1264
Epoch 1/5
2360/2360 [==============================] - 0s - loss: 0.0737
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 0.0689
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 0.0699
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 0.0697
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.0697
Epoch 1/5
2360/2360 [==============================] - 0s - loss: 0.0600
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 0.0597
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 0.0590
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 0.0643
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.0590
Epoch 1/5
2360/2360 [==============================] - 0s - loss: 0.0592
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 0.0580
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 0.0579
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 0.0554
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.0595
Epoch 1/5
2360/2360 [==============================] - 0s - loss: 0.0566
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 0.0574
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 0.0563
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 0.0582
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.0548
Training baseline models ...
Saving metrics ...
Preparing dataset for T107 of rep 2...
Computing train/valid/test indices
TIMING: dataset construction took 0.274 s
Loading dataset from disk.
TIMING: dataset construction took 0.140 s
Loading dataset from disk.
Model training ...
Epoch 1/5
2360/2360 [==============================] - 1s - loss: 5.5089
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 1.7023
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 1.3171
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 1.1166
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.9882
Epoch 1/5
2360/2360 [==============================] - 0s - loss: 0.1386
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 0.1187
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 0.1288
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 0.1254
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.1222
Epoch 1/5
2360/2360 [==============================] - 0s - loss: 0.0751
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 0.0730
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 0.0685
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 0.0707
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.0743
Epoch 1/5
2360/2360 [==============================] - 0s - loss: 0.0622
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 0.0587
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 0.0625
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 0.0604
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.0575
Epoch 1/5
2360/2360 [==============================] - 0s - loss: 0.0586
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 0.0577
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 0.0571
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 0.0587
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.0589
Epoch 1/5
2360/2360 [==============================] - 0s - loss: 0.0543
Epoch 2/5
2360/2360 [==============================] - 0s - loss: 0.0565
Epoch 3/5
2360/2360 [==============================] - 0s - loss: 0.0528
Epoch 4/5
2360/2360 [==============================] - 0s - loss: 0.0548
Epoch 5/5
2360/2360 [==============================] - 0s - loss: 0.0576
Training baseline models ...
Saving metrics ...
Generate performance report ...
from VISAR_model_utils_v2 import generate_performance_plot_ST
import seaborn as sns
plot_df = generate_performance_plot_ST('./logs/ST_2019_8_14_986/performance_metrics.csv')
g = sns.catplot(x = 'task', y = 'value', hue = 'method',
col = 'tt', row = 'performance',
data = plot_df, kind = 'bar')
process trained results for VISAR analysis
from VISAR_model_utils_v2 import generate_RUNKEY_dataframe_ST
RUN_KEY = 'ST_2019_8_14_986'
log_path = './logs/'
prev_model = log_path + RUN_KEY + '/T107_rep2_50.hdf5'
output_prefix = 'T107_rep2_50_'
task_list = ['T107']
add_features = None
dataset_file = log_path + RUN_KEY + '/temp.csv'
FP_type = 'Circular_2048'
generate_RUNKEY_dataframe_ST(prev_model, output_prefix, task_list, dataset_file, FP_type, add_features,
n_layer = 1)
------------- Loading dataset --------------------
Extracted dataset shape: (2951, 3)
Loading raw samples now.
shard_size: 8192
About to start loading CSV from ./logs/ST_2019_8_14_986/temp.csv
Loading shard 1 of size 8192.
Featurizing sample 0
Featurizing sample 1000
Featurizing sample 2000
TIMING: featurizing shard 0 took 11.564 s
TIMING: dataset construction took 11.787 s
Loading dataset from disk.
------------- Loading previous trained models ------------------
WARNING:tensorflow:From /root/anaconda3/envs/deepchem/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:1108: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
------------- Prepare information for chemicals ------------------
------------- Prepare information for minibatches ------------------
------------- Prepare information for tasks ------------------
------- Generate color labels with default K of 5 --------
-------------- Saving datasets ----------------
import pandas as pd
import numpy as np
n_tasks = 1
batch_df = pd.read_csv('T107_rep2_50_batch_df.csv')
X = np.asarray(np.matrix(batch_df)[:,0:n_tasks]).reshape(-1)
order_x = np.asarray(np.argsort(X)).reshape(-1)
fit_data = X[order_x].T
[min_value, max_value] = [fit_data.min(), fit_data.max()]
rows_new = ['T107']
cols = batch_df['Label_id'].tolist()
cols_new = [str(cols[idx]) for idx in order_x]
plot_dat = pd.DataFrame(np.asmatrix(fit_data))
plot_dat['task'] = rows_new
plot_dat = plot_dat.set_index('task')
plot_dat.columns = cols_new
plot_dat.columns.name = 'label'
plot_df = pd.DataFrame(plot_dat.stack(), columns = ['value']).reset_index()
plot_df.head()
| | task | label | value |
| --- | --- | --- | --- |
| 0 | T107 | 15 | 0.0265525 |
| 1 | T107 | 77 | 0.644419 |
| 2 | T107 | 14 | 0.792068 |
| 3 | T107 | 47 | 0.978062 |
| 4 | T107 | 94 | 0.987944 |
batch_selected = 5
plot_df.index[plot_df['label'] == str(batch_selected)].tolist()[0]
[95]
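As a quick sanity check outside the web app, the ranked per-batch predictions built above can be drawn as a one-row heatmap. This is only an illustration (it assumes plot_dat from the cell above) and is not part of the original VISAR workflow:
import matplotlib.pyplot as plt
import seaborn as sns
# one-row heatmap of the batch-level predictions, sorted by predicted value
fig, ax = plt.subplots(figsize=(12, 1.5))
sns.heatmap(plot_dat, cmap='viridis', cbar_kws={'label': 'predicted value'}, ax=ax)
ax.set_xlabel('batch label (sorted by predicted value)')
plt.tight_layout()
plt.show()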
Next:
- copy the output files (including output_compound_df, output_batch_df and output_task_df) to a data directory, and clear the VISAR_webapp static directory if necessary (a staging sketch follows below);
- start the app from a command prompt with 'bokeh serve --show VISAR_webapp'
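A minimal sketch of that staging step, assuming the web app reads its inputs from a VISAR_webapp/data directory and serves images from VISAR_webapp/static; these paths and the file prefix are assumptions, so adjust them to your checkout:
import glob, os, shutil
# Hypothetical webapp layout -- adjust to where your VISAR_webapp checkout lives.
webapp_data = './VISAR_webapp/data/'
webapp_static = './VISAR_webapp/static/'
os.makedirs(webapp_data, exist_ok=True)
# copy the compound/batch/task dataframes produced above
for f in glob.glob('T107_rep2_50_*_df.csv'):
    shutil.copy(f, webapp_data)
# clear stale files from the static directory if necessary
for f in glob.glob(os.path.join(webapp_static, '*')):
    if os.path.isfile(f):
        os.remove(f)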
Train robust multitask regressor model
import os
import deepchem as dc
import numpy as np
import pandas as pd
import tensorflow as tf
os.environ['CUDA_VISIBLE_DEVICES']='1'
model setup
from model_training_utils_v2 import prepare_dataset
import os
import pandas as pd
RUN_KEY = 'Serotonin_Aug14'
log_path = './logs/' + RUN_KEY
os.system('mkdir %s' % log_path)
dataset_file = '%s/raw_data.csv' % (log_path)
MT_dat_name = './data/MT_data_clean_June28.csv'
FP_type = 'Circular_2048'
task_list = ['T51', 'T106','T107','T227', 'T108'] # 5HT-1a/1b/2a/2b/2c
#add_features = ['MW', 'logP', 'BertzCT', 'TPSA']
n_features = 2048
layer_sizes = [512, 64]
bypass_layer_sizes=[128]
bypass_dropouts = [.5]
dropout = 0.5
lr = 0.0005
model training
from keras.layers import Dense, Input
from keras.layers.core import Dropout
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint
from model_training_utils_v2 import prepare_dataset
def RobustMT_model_training(MT_dat_name, FP_type, task_list, log_path, epoch_num = 10,
                            n_features = 2048, layer_sizes = [512, 64],
                            bypass_layer_sizes = [128], bypass_dropouts = [.5],
                            dropout = 0.5, lr = 0.0005,
                            N_test = 500.0, add_features = None, n_epoch = 40):
    # featurize the raw data and build a deepchem dataset
    dataset_file = '%s/raw_data.csv' % (log_path)
    if len(task_list) > 1:
        model_flag = 'MT'
    else:
        model_flag = 'ST'
    dataset, df = prepare_dataset(MT_dat_name, task_list, dataset_file, FP_type,
                                  smiles_field = 'canonical_smiles',
                                  add_features = add_features,
                                  id_field = 'chembl_id', model_flag = model_flag)
    # calculate the ratio of missing values in the multitask setting
    weights = dataset.w
    true_cnt = sum(sum(weights))
    missing_ratio = 1 - (true_cnt / (weights.shape[0] * weights.shape[1]))
    print('Missing ratio of the dataset is %.2f' % missing_ratio)
    print('Number of valid samples is %d' % int(true_cnt))
    # split dataset, holding out N_test compounds for testing
    frac_train = 1 - N_test / dataset.X.shape[0]
    splitter = dc.splits.RandomSplitter(dataset_file)
    train_dataset, test_dataset = splitter.train_test_split(dataset, frac_train = frac_train)
    metric = dc.metrics.Metric(
        dc.metrics.r2_score, np.mean, mode = 'regression')
    # model training
    n_tasks = len(task_list)
    model = dc.models.RobustMultitaskRegressor(n_tasks = n_tasks, n_features = n_features,
                                               layer_sizes = layer_sizes,
                                               bypass_layer_sizes = bypass_layer_sizes,
                                               bypass_dropouts = bypass_dropouts,
                                               dropout = dropout, learning_rate = lr)
    model.save_file = log_path + '/model'
    train_evaluation = []
    test_evaluation = []
    for iteration in range(epoch_num):
        model.fit(train_dataset, nb_epoch = n_epoch, max_checkpoints_to_keep = 1, checkpoint_interval = 20)
        print('======== Iteration %d ======' % iteration)
        print("Evaluating model")
        train_scores = model.evaluate(train_dataset, [metric], [], per_task_metrics = True)
        train_evaluation.append(train_scores[1]["mean-r2_score"])
        print("Training R2 score: %f" % train_scores[0]["mean-r2_score"])
        test_scores = model.evaluate(test_dataset, [metric], [], per_task_metrics = True)
        test_evaluation.append(test_scores[1]["mean-r2_score"])
        print("Test R2 score: %f" % test_scores[0]["mean-r2_score"])
    # save evaluation scores (need test!)
    train_df = pd.DataFrame(np.array(train_evaluation))
    test_df = pd.DataFrame(np.array(test_evaluation))
    train_df.to_csv(model.save_file + '_train_log.csv')
    test_df.to_csv(model.save_file + '_test_log.csv')
    return
RobustMT_model_training(MT_dat_name, FP_type, task_list, log_path, epoch_num = 10, N_test = 500.0, add_features = None, n_epoch = 40)
Extracted dataset shape: (6451, 7)
Loading raw samples now.
shard_size: 8192
About to start loading CSV from ./logs/Serotonin_Aug14/raw_data.csv
Loading shard 1 of size 8192.
Featurizing sample 0
Featurizing sample 1000
Featurizing sample 2000
Featurizing sample 3000
Featurizing sample 4000
Featurizing sample 5000
Featurizing sample 6000
TIMING: featurizing shard 0 took 25.922 s
TIMING: dataset construction took 26.403 s
Loading dataset from disk.
Missing ratio of the dataset is 0.69
Number of valid samples is 10088
Computing train/valid/test indices
TIMING: dataset construction took 0.634 s
Loading dataset from disk.
TIMING: dataset construction took 0.258 s
Loading dataset from disk.
======== Iteration 0 ======
Evaluating model
computed_metrics: [0.759661163147638, 0.8487385832755133, 0.7927909892421446, 0.7862606713724265, 0.7956715188464687]
Training R2 score: 0.796625
computed_metrics: [0.5752939701171875, 0.6945875534170977, 0.6066652533645858, 0.37844448486926474, 0.6625924044094114]
Test R2 score: 0.583517
======== Iteration 1 ======
Evaluating model
computed_metrics: [0.8158505208724209, 0.8933779909691956, 0.861604485570545, 0.863322781629866, 0.8612575332340614]
Training R2 score: 0.859083
computed_metrics: [0.6115972878457685, 0.679947886147258, 0.6161329373560849, 0.34145438760506797, 0.6974210306560764]
Test R2 score: 0.589311
======== Iteration 2 ======
Evaluating model
computed_metrics: [0.8548260530951476, 0.9267161642164854, 0.8862006766204017, 0.9050664020898176, 0.9002187486642536]
Training R2 score: 0.894606
computed_metrics: [0.6032906919973067, 0.6598824576132678, 0.618635857697012, 0.3220733219525531, 0.7186496734891725]
Test R2 score: 0.584506
======== Iteration 3 ======
Evaluating model
computed_metrics: [0.864321517262086, 0.9397616613245016, 0.9095721444843354, 0.9238929926683908, 0.9167479937892477]
Training R2 score: 0.910859
computed_metrics: [0.6027677773255737, 0.6413089451338263, 0.5980630288323545, 0.32247924099147907, 0.7210531590609532]
Test R2 score: 0.577134
======== Iteration 4 ======
Evaluating model
computed_metrics: [0.8953533936281479, 0.9534278679065692, 0.9215748228809671, 0.9290959487772327, 0.9301747706191565]
Training R2 score: 0.925925
computed_metrics: [0.6155427222670634, 0.6820633227742317, 0.6073417748208587, 0.3246368070900857, 0.7268722175042581]
Test R2 score: 0.591291
======== Iteration 5 ======
Evaluating model
computed_metrics: [0.9029854417257765, 0.9604575943054542, 0.9331062337742813, 0.9284163299055174, 0.9388175501780647]
Training R2 score: 0.932757
computed_metrics: [0.6069236074254009, 0.714621862394654, 0.5836234677856831, 0.28948690011193723, 0.7211811441306561]
Test R2 score: 0.583167
======== Iteration 6 ======
Evaluating model
computed_metrics: [0.9171476271850806, 0.9653620693031328, 0.9429793121516797, 0.9437126330935732, 0.943749722922384]
Training R2 score: 0.942590
computed_metrics: [0.6105522209362191, 0.7095486611172115, 0.5890714681689975, 0.31531457007535124, 0.709942904059075]
Test R2 score: 0.586886
======== Iteration 7 ======
Evaluating model
computed_metrics: [0.926968619508657, 0.9638583261828477, 0.9480784696063547, 0.9463554566177619, 0.9519935893626453]
Training R2 score: 0.947451
computed_metrics: [0.6206203893849938, 0.7097037146131558, 0.5916103154555746, 0.2890225797452647, 0.7278600725600175]
Test R2 score: 0.587763
======== Iteration 8 ======
Evaluating model
computed_metrics: [0.9344494765668044, 0.9707340451387145, 0.95448494227031, 0.9407190704283092, 0.9540568289914396]
Training R2 score: 0.950889
computed_metrics: [0.6148857396669338, 0.7325904076146383, 0.5878479022455009, 0.2916763885710445, 0.7200587355168194]
Test R2 score: 0.589412
======== Iteration 9 ======
Evaluating model
computed_metrics: [0.9366193912320537, 0.9579353393749691, 0.9569519113034609, 0.9525346894456956, 0.9535840932986404]
Training R2 score: 0.951525
computed_metrics: [0.6153497824407556, 0.6753012257001534, 0.5834952261590962, 0.3384884190099773, 0.7063542992615486]
Test R2 score: 0.583798
# visualize the evaluation scores along the training process
import seaborn as sns
import pandas as pd
from VISAR_model_utils_v2 import generate_performance_plot_RobustMT
plot_df = generate_performance_plot_RobustMT(train_file = './logs/Serotonin_Aug14/model_train_log.csv',
test_file = './logs/Serotonin_Aug14/model_test_log.csv',
task_list = ['T51', 'T106','T107','T227', 'T108'])
import matplotlib.pyplot as plt
g = sns.FacetGrid(plot_df, col = 'tt', hue = 'tasks')
g = (g.map(plt.plot, 'step', 'R2', marker = '.')).add_legend()
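The prev_model checkpoint used in the next step can be chosen from these curves. Below is a minimal sketch for finding the iteration with the best mean test R2 from the test log written by RobustMT_model_training; mapping that iteration to a checkpoint file such as 'model-1200' depends on the checkpoint_interval and batch count used during training, so treat this as a guide rather than a rule:
import pandas as pd
# rows are evaluation iterations, columns are the per-task test R2 scores
test_log = pd.read_csv('./logs/Serotonin_Aug14/model_test_log.csv', index_col=0)
mean_r2 = test_log.mean(axis=1)
print('Best iteration: %d (mean test R2 = %.3f)' % (mean_r2.idxmax(), mean_r2.max()))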
process trained results for VISAR analysis
from VISAR_model_utils_v2 import generate_RUNKEY_dataframe_RobustMT
prev_model = './logs/Serotonin_Aug14/model-1200'
RUNKEY = './logs/Serotonin_Aug14/'
task_list = ['T51', 'T106','T107','T227', 'T108'] # 5HT-1a/1b/2a/2b/2c
#add_features = ['MW','logP','BertzCT','TPSA']
dataset_file = '%s/raw_data.csv' % (RUNKEY)
MT_dat_name = './data/MT_data_clean_June28.csv'
FP_type = 'Circular_2048'
model_flag = 'MT'
n_features = 2048
layer_sizes = [512, 64]
bypass_layer_sizes=[128]
bypass_dropouts = [.5]
dropout = 0.5
learning_rate = 0.001
n_layer = 2
n_bypass = 2
add_features = None
output_prefix = RUNKEY + '/RobustMT_serotonin_output_'
generate_RUNKEY_dataframe_RobustMT(prev_model, output_prefix, task_list, dataset_file, FP_type, add_features,
n_features, layer_sizes, bypass_layer_sizes, model_flag, n_bypass, n_layer = n_layer)
------------- Loading dataset --------------------
Extracted dataset shape: (6451, 7)
Loading raw samples now.
shard_size: 8192
About to start loading CSV from ./logs/Serotonin_Aug14//raw_data.csv
Loading shard 1 of size 8192.
Featurizing sample 0
Featurizing sample 1000
Featurizing sample 2000
Featurizing sample 3000
Featurizing sample 4000
Featurizing sample 5000
Featurizing sample 6000
TIMING: featurizing shard 0 took 24.339 s
TIMING: dataset construction took 24.805 s
Loading dataset from disk.
------------- Loading previous trained models ------------------
INFO:tensorflow:Restoring parameters from ./logs/Serotonin_Aug14/model-1200
------------- Prepare information for chemicals ------------------
INFO:tensorflow:Restoring parameters from ./logs/Serotonin_Aug14/model-1200
WARNING:tensorflow:From /root/anaconda3/envs/deepchem/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:1108: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
------------- Prepare information for minibatches ------------------
------------- Prepare information for tasks ------------------
INFO:tensorflow:Restoring parameters from ./logs/Serotonin_Aug14/model-1200
INFO:tensorflow:Restoring parameters from ./logs/Serotonin_Aug14/model-1200
INFO:tensorflow:Restoring parameters from ./logs/Serotonin_Aug14/model-1200
INFO:tensorflow:Restoring parameters from ./logs/Serotonin_Aug14/model-1200
INFO:tensorflow:Restoring parameters from ./logs/Serotonin_Aug14/model-1200
INFO:tensorflow:Restoring parameters from ./logs/Serotonin_Aug14/model-1200
------- Generate color labels with default K of 5 --------
-------------- Saving datasets ----------------
Next:
- copy the output files (including output_compound_df, output_batch_df and output_task_df) to the VISAR_webapp data directory, and clear the static directory if necessary;
- start the app from a command prompt with 'bokeh serve --show VISAR_webapp'
pharmacophore model analysis for selected batches
import os
import numpy as np
import pandas as pd
generate sdf file of selected batches
compound_df = pd.read_csv('T107_rep2_50_compound_df.csv')
selected_batch = 0
select_df = compound_df.loc[compound_df['label'] == selected_batch]
select_df.head()
| | T107 | chembl_id | canonical_smiles | x | y | pred_T107 | label | batch_label_color | batch_label | label_color |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1800 | 3.753245 | CHEMBL3664821 | O=C(CC1CC1)N[C@@H]2CC[C@@H](CCN3CCC(CC3)c4cccc... | -39.537960 | -6.644773 | 3.695881 | 0 | #1f77b4 | 2 | #393b79 |
| 1852 | 3.662531 | CHEMBL3664872 | O=C(N[C@@H]1CC[C@@H](CCN2CCC(CC2)c3cccc4OCOc34... | -39.001064 | -1.747338 | 3.654126 | 0 | #1f77b4 | 2 | #393b79 |
| 2306 | 3.787004 | CHEMBL3697951 | CC(=O)N[C@@H]1CC[C@@H](CCN2CCN(CC2)c3cccc4OCOc... | -39.745975 | 2.005099 | 3.765032 | 0 | #1f77b4 | 2 | #393b79 |
| 2307 | 3.673449 | CHEMBL3697952 | COCCC(=O)N[C@@H]1CC[C@@H](CCN2CCN(CC2)c3cccc4O... | -37.736317 | 2.435640 | 3.635353 | 0 | #1f77b4 | 2 | #393b79 |
| 2311 | 3.780144 | CHEMBL3697956 | O=C(CC1CCCO1)N[C@@H]2CC[C@@H](CCN3CCN(CC3)c4cc... | -40.699070 | 0.664541 | 3.700971 | 0 | #1f77b4 | 2 | #393b79 |
from rdkit import Chem
from rdkit.Chem import PandasTools
def df2sdf(df, output_sdf_name,
           smiles_field = 'canonical_smiles', id_field = 'chembl_id',
           selected_batch = None):
    '''
    Pack a pd.DataFrame into an sdf file, optionally keeping only one batch.
    '''
    if selected_batch is not None:
        df = df.loc[df['label'] == selected_batch]
    # build RDKit molecules from the SMILES column and write them out with all properties
    PandasTools.AddMoleculeColumnToFrame(df, smiles_field, 'ROMol')
    PandasTools.WriteSDF(df, output_sdf_name, idName=id_field, properties=list(df.columns))
    return
df2sdf(select_df, 'T107_rep2_50_batch0.sdf', selected_batch = 0)
Building pharmacophore models using Align-it
prepare 3D coordinates for ligands
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
raw_sdf_file = 'T107_rep2_50_batch0.sdf'
# drop molecules that RDKit fails to parse
ms = [x for x in Chem.SDMolSupplier(raw_sdf_file) if x is not None]
n_conf = 5
w = Chem.SDWriter('T107_rep2_50_batch0_rdkit_conf.sdf')
# write n_conf independently embedded, MMFF-minimized 3D structures per molecule
for i in range(n_conf):
    ms_addH = [Chem.AddHs(m) for m in ms]
    for m in ms_addH:
        AllChem.EmbedMolecule(m)
        AllChem.MMFFOptimizeMoleculeConfs(m)
        w.write(m)
w.close()
from prepared 3D ligands to representative pharmacophores
from align_it_utils import proceed_pharmacophore
import os
home_dir = './data/'
result_dir = home_dir + 'Label13_rdkit_phars/'
sdf_file = '/Users/dingqy14/Desktop/writing/P2_data_Dec27/code_utils_v1/data/Label13_rdkit_conf.sdf' # an absolute path is preferred!
output_name = 'Cluster13_'
proceed_pharmacophore(home_dir, sdf_file, result_dir, output_name)
You can also build pharmacophore models with the TeachOpenCADD platform, or analyze the selected ligands with other cheminformatics tools, e.g. DataWarrior or Schrodinger.