Skip to main content

gcn for prediction of protein interactions

Project description

gcn for prediction of protein interactions

利用各种图神经网络进行link prediction of protein interactions。

Guide

Intro

目前主要实现基于【data/yeast/yeast.edgelist】下的蛋白质数据进行link prediction。

Model

模型

模型主要使用图神经网络,如gae、vgae等

  • 1.GCNModelVAE(src/vgae):图卷积自编码和变分图卷积自编码(config中可配置使用自编码或变分自编码),利用gae/vgae作为编码器,InnerProductDecoder作解码器。 Variational Graph Auto-Encoders

    image

  • 2.GCNModelARGA(src/arga):对抗正则化图自编码,利用gae/vgae作为生成器;一个三层前馈网络作判别器。 Adversarially Regularized Graph Autoencoder for Graph Embedding

    image

  • 3.GATModelVAE(src/graph_att_gae):基于图注意力的图卷积自编码和变分图卷积自编码(config中可配置使用自编码或变分自编码),利用gae/vgae作为编码器,InnerProductDecoder作解码器。这是我在以上【1】方法的基础上加入了一层图注意力层,关于图注意力可见【Reference】中的【GRAPH ATTENTION NETWORKS】。

  • 4.GATModelGAN(src/graph_att_gan):基于图注意力的对抗正则化图自编码,利用gae/vgae作为生成器;一个三层前馈网络作判别器,这是我在以上【2】方法的基础上加入了一层图注意力层,关于图注意力可见【Reference】中的【GRAPH ATTENTION NETWORKS】。

  • 5.NHGATModelVAE(src/graph_nheads_att_gae):基于图多头注意力的图卷积自编码和变分图卷积自编码(config中可配置使用自编码或变分自编码),利用gae/vgae作为编码器,InnerProductDecoder作解码器。此方法是在【3】方法的基础上将图注意力层改为多头注意力层。

  • 6.NHGATModelGAN(src/graph_nheads_att_gan):基于图多头注意力的对抗正则化图自编码,利用gae/vgae作为生成器;一个三层前馈网络作判别器,此方法在【4】方法的基础上将图注意力层改为多头注意力层。

Usage

  • 相关参数的配置config见每个模型文件夹中的config.cfg文件,训练和预测时会加载此文件。

  • 训练及预测

    1.GCNModelVAE(src/vgae)

    (1).训练

    from src.vgae.train import Train
    train = Train()
    train.train_model('config.cfg')
    
      Epoch: 0001 train_loss =  1.84734 val_roc_score =  0.76573 average_precision_score =  0.68083 time= 0.80005
      Epoch: 0002 train_loss =  1.83824 val_roc_score =  0.87289 average_precision_score =  0.86317 time= 0.80361
      Epoch: 0003 train_loss =  1.80761 val_roc_score =  0.87641 average_precision_score =  0.86590 time= 0.80121
      Epoch: 0004 train_loss =  1.77976 val_roc_score =  0.87737 average_precision_score =  0.86656 time= 0.79843
      Epoch: 0005 train_loss =  1.76685 val_roc_score =  0.87759 average_precision_score =  0.86664 time= 0.79843
      Epoch: 0006 train_loss =  1.71661 val_roc_score =  0.87767 average_precision_score =  0.86667 time= 0.80479
      Epoch: 0007 train_loss =  1.67656 val_roc_score =  0.87775 average_precision_score =  0.86670 time= 0.80509
      Epoch: 0008 train_loss =  1.62324 val_roc_score =  0.87785 average_precision_score =  0.86679 time= 0.80446
      Epoch: 0009 train_loss =  1.57730 val_roc_score =  0.87781 average_precision_score =  0.86680 time= 0.80424
      Epoch: 0010 train_loss =  1.51882 val_roc_score =  0.87789 average_precision_score =  0.86675 time= 0.80852
      Epoch: 0011 train_loss =  1.46346 val_roc_score =  0.87792 average_precision_score =  0.86678 time= 0.80625
      Epoch: 0012 train_loss =  1.37688 val_roc_score =  0.87795 average_precision_score =  0.86684 time= 0.80474
      Epoch: 0013 train_loss =  1.31243 val_roc_score =  0.87795 average_precision_score =  0.86685 time= 0.80574
      Epoch: 0014 train_loss =  1.25133 val_roc_score =  0.87791 average_precision_score =  0.86677 time= 0.80267
      Epoch: 0015 train_loss =  1.19762 val_roc_score =  0.87802 average_precision_score =  0.86693 time= 0.80540
      Epoch: 0016 train_loss =  1.15079 val_roc_score =  0.87812 average_precision_score =  0.86698 time= 0.80784
      Epoch: 0017 train_loss =  1.09600 val_roc_score =  0.87802 average_precision_score =  0.86688 time= 0.79920
      Epoch: 0018 train_loss =  1.05011 val_roc_score =  0.87820 average_precision_score =  0.86711 time= 0.80777
      Epoch: 0019 train_loss =  1.00610 val_roc_score =  0.87840 average_precision_score =  0.86714 time= 0.80412
      Epoch: 0020 train_loss =  0.95014 val_roc_score =  0.87838 average_precision_score =  0.86713 time= 0.80210
      
      test roc score: 0.8814614254330005
      test ap score: 0.8708329314774368
    

    (2).预测

    from src.vgae.predict import Predict
    
    predict = Predict()
    predict.load_model_adj('config_cfg')
    # 会返回原始的图邻接矩阵和经过模型编码后的hidden embedding经过内积解码的邻接矩阵,可以对这两个矩阵进行比对,得出link prediction.
    adj_orig, adj_rec = predict.predict()
    
    2.GCNModelARGA(src/arga)

    (1).训练

    from src.arga.train import Train
    train = Train()
    train.train_model('config.cfg')
    
      Epoch: 0001 train_loss =  2.08252 val_roc_score =  0.75422 average_precision_score =  0.66179 time= 0.80230
      Epoch: 0002 train_loss =  2.03940 val_roc_score =  0.86953 average_precision_score =  0.85636 time= 0.79571
      Epoch: 0003 train_loss =  2.00348 val_roc_score =  0.87872 average_precision_score =  0.86847 time= 0.79245
      Epoch: 0004 train_loss =  1.97120 val_roc_score =  0.87997 average_precision_score =  0.86995 time= 0.79640
      Epoch: 0005 train_loss =  1.93477 val_roc_score =  0.88017 average_precision_score =  0.87027 time= 0.79548
      Epoch: 0006 train_loss =  1.89215 val_roc_score =  0.88046 average_precision_score =  0.87038 time= 0.79972
      Epoch: 0007 train_loss =  1.84537 val_roc_score =  0.88072 average_precision_score =  0.87058 time= 0.79561
      Epoch: 0008 train_loss =  1.78754 val_roc_score =  0.88063 average_precision_score =  0.87049 time= 0.79802
      Epoch: 0009 train_loss =  1.72469 val_roc_score =  0.88053 average_precision_score =  0.87043 time= 0.79486
      Epoch: 0010 train_loss =  1.65402 val_roc_score =  0.88063 average_precision_score =  0.87049 time= 0.79423
      Epoch: 0011 train_loss =  1.57884 val_roc_score =  0.88052 average_precision_score =  0.87045 time= 0.79348
      Epoch: 0012 train_loss =  1.49870 val_roc_score =  0.88049 average_precision_score =  0.87046 time= 0.79649
      Epoch: 0013 train_loss =  1.42083 val_roc_score =  0.88056 average_precision_score =  0.87046 time= 0.79063
      Epoch: 0014 train_loss =  1.34764 val_roc_score =  0.88060 average_precision_score =  0.87056 time= 0.79889
      Epoch: 0015 train_loss =  1.27635 val_roc_score =  0.88038 average_precision_score =  0.87043 time= 0.79485
      Epoch: 0016 train_loss =  1.20521 val_roc_score =  0.88050 average_precision_score =  0.87058 time= 0.79927
      Epoch: 0017 train_loss =  1.13763 val_roc_score =  0.88035 average_precision_score =  0.87045 time= 0.79072
      Epoch: 0018 train_loss =  1.07326 val_roc_score =  0.88035 average_precision_score =  0.87049 time= 0.79284
      Epoch: 0019 train_loss =  1.01548 val_roc_score =  0.88023 average_precision_score =  0.87044 time= 0.78869
      Epoch: 0020 train_loss =  0.96069 val_roc_score =  0.88014 average_precision_score =  0.87037 time= 0.79441
     
      test roc score: 0.8798092171308727
      test ap score: 0.8700487009596252
    

    (2).预测

    from src.arga.predict import Predict
    
    predict = Predict()
    predict.load_model_adj('config_cfg')
    # 会返回原始的图邻接矩阵和经过模型编码后的hidden embedding经过内积解码的邻接矩阵,可以对这两个矩阵进行比对,得出link prediction.
    adj_orig, adj_rec = predict.predict()
    
    3.GATModelVAE(src/graph_att_gae)

    (1).训练

    from src.graph_att_gae.train import Train
    train = Train()
    train.train_model('config.cfg')
    
    Epoch: 0001 train_loss =  1.83611 val_roc_score =  0.73571 average_precision_score =  0.62940 time= 0.81406
    Epoch: 0002 train_loss =  1.83237 val_roc_score =  0.87094 average_precision_score =  0.85831 time= 0.81499
    Epoch: 0003 train_loss =  1.82761 val_roc_score =  0.87429 average_precision_score =  0.86431 time= 0.81297
    Epoch: 0004 train_loss =  1.78672 val_roc_score =  0.87509 average_precision_score =  0.86525 time= 0.80870
    Epoch: 0005 train_loss =  1.76815 val_roc_score =  0.87523 average_precision_score =  0.86550 time= 0.81497
    Epoch: 0006 train_loss =  1.72495 val_roc_score =  0.87523 average_precision_score =  0.86551 time= 0.81070
    Epoch: 0007 train_loss =  1.69047 val_roc_score =  0.87593 average_precision_score =  0.86601 time= 0.80948
    Epoch: 0008 train_loss =  1.63153 val_roc_score =  0.87573 average_precision_score =  0.86593 time= 0.80709
    Epoch: 0009 train_loss =  1.57143 val_roc_score =  0.87551 average_precision_score =  0.86580 time= 0.80653
    Epoch: 0010 train_loss =  1.50240 val_roc_score =  0.87587 average_precision_score =  0.86594 time= 0.81233
    Epoch: 0011 train_loss =  1.44139 val_roc_score =  0.87567 average_precision_score =  0.86589 time= 0.80861
    Epoch: 0012 train_loss =  1.37266 val_roc_score =  0.87557 average_precision_score =  0.86571 time= 0.80932
    Epoch: 0013 train_loss =  1.32811 val_roc_score =  0.87578 average_precision_score =  0.86597 time= 0.80686
    Epoch: 0014 train_loss =  1.30064 val_roc_score =  0.87607 average_precision_score =  0.86603 time= 0.80962
    Epoch: 0015 train_loss =  1.25788 val_roc_score =  0.87592 average_precision_score =  0.86611 time= 0.80796
    Epoch: 0016 train_loss =  1.23810 val_roc_score =  0.87607 average_precision_score =  0.86617 time= 0.80750
    Epoch: 0017 train_loss =  1.18570 val_roc_score =  0.87594 average_precision_score =  0.86613 time= 0.80911
    Epoch: 0018 train_loss =  1.14961 val_roc_score =  0.87607 average_precision_score =  0.86626 time= 0.81035
    Epoch: 0019 train_loss =  1.10372 val_roc_score =  0.87593 average_precision_score =  0.86598 time= 0.81094
    Epoch: 0020 train_loss =  1.05262 val_roc_score =  0.87605 average_precision_score =  0.86613 time= 0.81442
    
    test roc score: 0.8758194438300309
    test ap score: 0.8629482273490456
    

    (2).预测

    from src.graph_att_gae.predict import Predict
    
    predict = Predict()
    predict.load_model_adj('config_cfg')
    # 会返回原始的图邻接矩阵和经过模型编码后的hidden embedding经过内积解码的邻接矩阵,可以对这两个矩阵进行比对,得出link prediction.
    adj_orig, adj_rec = predict.predict()
    
    4.GATModelGAN(src/graph_att_gan)

    (1).训练

    from src.graph_att_gan.train import Train
    train = Train()
    train.train_model('config.cfg')
    
    Epoch: 0001 train_loss =  3.24637 val_roc_score =  0.77403 average_precision_score =  0.68203 time= 0.81267
    Epoch: 0002 train_loss =  3.21157 val_roc_score =  0.87269 average_precision_score =  0.86088 time= 0.81181
    Epoch: 0003 train_loss =  3.15047 val_roc_score =  0.87391 average_precision_score =  0.86203 time= 0.81182
    Epoch: 0004 train_loss =  3.08302 val_roc_score =  0.87457 average_precision_score =  0.86271 time= 0.81055
    Epoch: 0005 train_loss =  3.03024 val_roc_score =  0.87410 average_precision_score =  0.86226 time= 0.81125
    Epoch: 0006 train_loss =  2.95011 val_roc_score =  0.87450 average_precision_score =  0.86264 time= 0.81162
    Epoch: 0007 train_loss =  2.82191 val_roc_score =  0.87460 average_precision_score =  0.86275 time= 0.81088
    Epoch: 0008 train_loss =  2.73079 val_roc_score =  0.87442 average_precision_score =  0.86256 time= 0.80648
    Epoch: 0009 train_loss =  2.61711 val_roc_score =  0.87454 average_precision_score =  0.86268 time= 0.81021
    Epoch: 0010 train_loss =  2.50720 val_roc_score =  0.87480 average_precision_score =  0.86288 time= 0.80921
    Epoch: 0011 train_loss =  2.42761 val_roc_score =  0.87506 average_precision_score =  0.86298 time= 0.81137
    Epoch: 0012 train_loss =  2.36874 val_roc_score =  0.87497 average_precision_score =  0.86282 time= 0.81466
    Epoch: 0013 train_loss =  2.29911 val_roc_score =  0.87504 average_precision_score =  0.86291 time= 0.81193
    Epoch: 0014 train_loss =  2.21190 val_roc_score =  0.87526 average_precision_score =  0.86297 time= 0.80965
    Epoch: 0015 train_loss =  2.12611 val_roc_score =  0.87511 average_precision_score =  0.86290 time= 0.81013
    Epoch: 0016 train_loss =  2.03527 val_roc_score =  0.87528 average_precision_score =  0.86314 time= 0.81365
    Epoch: 0017 train_loss =  1.96965 val_roc_score =  0.87524 average_precision_score =  0.86309 time= 0.81125
    Epoch: 0018 train_loss =  1.90381 val_roc_score =  0.87515 average_precision_score =  0.86312 time= 0.80971
    Epoch: 0019 train_loss =  1.85955 val_roc_score =  0.87487 average_precision_score =  0.86288 time= 0.80996
    Epoch: 0020 train_loss =  1.81664 val_roc_score =  0.87483 average_precision_score =  0.86293 time= 0.81270
    
    test roc score: 0.8826745834179653
    test ap score: 0.8715261230395998
    

    (2).预测

    from src.graph_att_gan.predict import Predict
    
    predict = Predict()
    predict.load_model_adj('config_cfg')
    # 会返回原始的图邻接矩阵和经过模型编码后的hidden embedding经过内积解码的邻接矩阵,可以对这两个矩阵进行比对,得出link prediction.
    adj_orig, adj_rec = predict.predict()
    
    5.NHGATModelVAE(src/graph_nheads_att_gae)

    (1).训练

    from src.graph_nheads_att_gae.train import Train
    train = Train()
    train.train_model('config.cfg')
    
    Epoch: 0001 train_loss =  1.85570 val_roc_score =  0.80750 average_precision_score =  0.72917 time= 0.84645
    Epoch: 0002 train_loss =  1.78607 val_roc_score =  0.88103 average_precision_score =  0.87114 time= 0.84186
    Epoch: 0003 train_loss =  1.68021 val_roc_score =  0.88117 average_precision_score =  0.87144 time= 0.84135
    Epoch: 0004 train_loss =  1.52555 val_roc_score =  0.88115 average_precision_score =  0.87141 time= 0.84212
    Epoch: 0005 train_loss =  1.38254 val_roc_score =  0.88070 average_precision_score =  0.87098 time= 0.83917
    Epoch: 0006 train_loss =  1.40003 val_roc_score =  0.88106 average_precision_score =  0.87134 time= 0.84185
    Epoch: 0007 train_loss =  1.31239 val_roc_score =  0.88081 average_precision_score =  0.87110 time= 0.83766
    Epoch: 0008 train_loss =  1.17827 val_roc_score =  0.88102 average_precision_score =  0.87134 time= 0.84063
    Epoch: 0009 train_loss =  1.08710 val_roc_score =  0.88086 average_precision_score =  0.87126 time= 0.84173
    Epoch: 0010 train_loss =  1.01816 val_roc_score =  0.88136 average_precision_score =  0.87162 time= 0.84121
    Epoch: 0011 train_loss =  0.95128 val_roc_score =  0.88128 average_precision_score =  0.87133 time= 0.84128
    Epoch: 0012 train_loss =  0.87212 val_roc_score =  0.88127 average_precision_score =  0.87142 time= 0.84218
    Epoch: 0013 train_loss =  0.80497 val_roc_score =  0.88134 average_precision_score =  0.87154 time= 0.84077
    Epoch: 0014 train_loss =  0.75538 val_roc_score =  0.88088 average_precision_score =  0.87120 time= 0.83701
    Epoch: 0015 train_loss =  0.70903 val_roc_score =  0.88063 average_precision_score =  0.87073 time= 0.83698
    Epoch: 0016 train_loss =  0.68525 val_roc_score =  0.88035 average_precision_score =  0.87055 time= 0.83837
    Epoch: 0017 train_loss =  0.66079 val_roc_score =  0.87995 average_precision_score =  0.87053 time= 0.83806
    Epoch: 0018 train_loss =  0.65187 val_roc_score =  0.87924 average_precision_score =  0.86958 time= 0.84210
    Epoch: 0019 train_loss =  0.64572 val_roc_score =  0.87929 average_precision_score =  0.86995 time= 0.84069
    Epoch: 0020 train_loss =  0.64103 val_roc_score =  0.87951 average_precision_score =  0.87026 time= 0.83967
    
    test roc score: 0.877033361471422
    test ap score: 0.867286248500891
    

    (2).预测

    from src.graph_nheads_att_gae.predict import Predict
    
    predict = Predict()
    predict.load_model_adj('config_cfg')
    # 会返回原始的图邻接矩阵和经过模型编码后的hidden embedding经过内积解码的邻接矩阵,可以对这两个矩阵进行比对,得出link prediction.
    adj_orig, adj_rec = predict.predict()
    
    6.NHGATModelGAN(src/graph_nheads_att_gan)

    (1).训练

    from src.graph_nheads_att_gan.train import Train
    train = Train()
    train.train_model('config.cfg')
    
    Epoch: 0001 train_loss =  3.24091 val_roc_score =  0.77050 average_precision_score =  0.66992 time= 0.85475
    Epoch: 0002 train_loss =  3.18022 val_roc_score =  0.87671 average_precision_score =  0.86657 time= 0.84643
    Epoch: 0003 train_loss =  3.09047 val_roc_score =  0.87715 average_precision_score =  0.86704 time= 0.84354
    Epoch: 0004 train_loss =  2.95696 val_roc_score =  0.87695 average_precision_score =  0.86698 time= 0.84279
    Epoch: 0005 train_loss =  2.87052 val_roc_score =  0.87747 average_precision_score =  0.86741 time= 0.84714
    Epoch: 0006 train_loss =  2.88739 val_roc_score =  0.87742 average_precision_score =  0.86727 time= 0.84777
    Epoch: 0007 train_loss =  2.78251 val_roc_score =  0.87757 average_precision_score =  0.86748 time= 0.84134
    Epoch: 0008 train_loss =  2.65458 val_roc_score =  0.87766 average_precision_score =  0.86745 time= 0.84429
    Epoch: 0009 train_loss =  2.60484 val_roc_score =  0.87798 average_precision_score =  0.86780 time= 0.84680
    Epoch: 0010 train_loss =  2.56642 val_roc_score =  0.87806 average_precision_score =  0.86766 time= 0.84952
    Epoch: 0011 train_loss =  2.49832 val_roc_score =  0.87826 average_precision_score =  0.86771 time= 0.84535
    Epoch: 0012 train_loss =  2.38511 val_roc_score =  0.87799 average_precision_score =  0.86763 time= 0.84903
    Epoch: 0013 train_loss =  2.28920 val_roc_score =  0.87781 average_precision_score =  0.86762 time= 0.84161
    Epoch: 0014 train_loss =  2.23039 val_roc_score =  0.87791 average_precision_score =  0.86761 time= 0.84422
    Epoch: 0015 train_loss =  2.14044 val_roc_score =  0.87782 average_precision_score =  0.86750 time= 0.84063
    Epoch: 0016 train_loss =  2.05134 val_roc_score =  0.87774 average_precision_score =  0.86754 time= 0.84043
    Epoch: 0017 train_loss =  1.95402 val_roc_score =  0.87745 average_precision_score =  0.86740 time= 0.84461
    Epoch: 0018 train_loss =  1.89405 val_roc_score =  0.87714 average_precision_score =  0.86720 time= 0.84435
    Epoch: 0019 train_loss =  1.83182 val_roc_score =  0.87690 average_precision_score =  0.86693 time= 0.84567
    Epoch: 0020 train_loss =  1.74144 val_roc_score =  0.87683 average_precision_score =  0.86717 time= 0.84130
    
    test roc score: 0.8767371798715641
    test ap score: 0.8680650766563964
    

    (2).预测

    from src.graph_nheads_att_gan.predict import Predict
    
    predict = Predict()
    predict.load_model_adj('config_cfg')
    # 会返回原始的图邻接矩阵和经过模型编码后的hidden embedding经过内积解码的邻接矩阵,可以对这两个矩阵进行比对,得出link prediction.
    adj_orig, adj_rec = predict.predict()
    

Dataset

数据来自酵母蛋白质相互作用yeast 。 数据集的格式如下,具体可见data

 YLR418C	YOL145C
 YOL145C	YLR418C
 YLR418C	YOR123C
 YOR123C	YLR418C
 ......         ......

Install

  • 安装:pip install GCN4LP
  • 下载源码:
git clone https://github.com/jiangnanboy/gcn_for_prediction_of_protein_interactions.git
cd gcn_for_prediction_of_protein_interactions
python setup.py install

通过以上两种方法的任何一种完成安装都可以。如果不想安装,可以下载github源码包

Cite

如果你在研究中使用了GCN4LP,请按如下格式引用:

@software{GCN4LP,
  author = {Shi Yan},
  title = {GCN4LP: gcn for prediction of protein interactions},
  year = {2021},
  url = {https://github.com/jiangnanboy/gcn_for_prediction_of_protein_interactions},
}

Reference

Project details


Release history Release notifications | RSS feed

This version

0.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

GCN4LP-0.1-py3-none-any.whl (45.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page