
An Open Source Library for uncertain Knowledge Reasoning


unKR

uncertain Knowledge Graph

Phase 1: Reproduce UKGE

  1. Data processing
  2. Model construction
  3. Integrate the pieces and run end to end

Phase 2: Reproduce the other models

| # | Model | Paper | 🐮🐴 | Notes |
|---|---|---|---|---|
| 1 | UKGE | To be continued... | zst | |
| 2 | PASSLEAF | To be continued... | zst | |
| 3 | BEUrRE | To be continued... | lyc | 🧈 |
| 4 | FocusE | To be continued... | lyc | |
| 5 | UKGsE | To be continued... | lw | |
| 6 | GTransE | To be continued... | lw | |
| 7 | GMUC | To be continued... | csl | 💡 |
| 8 | GMUC+ | To be continued... | csl | 💡 |
| 9 | UPGAT | To be continued... | xjy | |

Model implementation progress and open issues

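This section records problems that came up while implementing the other models and could not yet be resolved. They are listed here so the group can review them.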

BEUrRE

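  1. The metric logic in BaseLitModel is worth revisiting. Not every model needs to compute every metric; for example, the expensive nDCG could be disabled by default.
  2. PyCharm automatically marks src as a sources root, so it can resolve src.unKR. Other editors and servers do not, so should we consider moving unKR out of src so that it replaces the src folder?
  3. Why does Hits@10 rise during training while MAE and MSE also rise at the same time?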
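Regarding question 3, one possible explanation is offered here as a hypothesis; it has not been checked against the unKR training code. Link-prediction metrics only depend on whether the true triple outscores the corrupted candidates. MSE and MAE, by contrast, depend on how close the predicted score is to the gold confidence. Pushing a low-confidence true triple up the ranking can therefore improve Hits@k and worsen MSE at the same time. The toy example below illustrates this. The scores, the checkpoints, and the helper names `rank_of_true` and `squared_error` are all hypothetical.

```python
from typing import Sequence


def rank_of_true(scores: Sequence[float], true_idx: int) -> int:
    """Raw rank (1 = best) of the true candidate when a higher score is better."""
    # Rank = 1 + number of candidates that score strictly higher than the true one.
    return 1 + sum(1 for s in scores if s > scores[true_idx])


def squared_error(pred: float, gold: float) -> float:
    """Contribution of one triple to MSE."""
    return (pred - gold) ** 2


# Hypothetical query: the true tail (index 0) is a LOW-confidence fact (gold 0.3);
# indices 1 and 2 are corrupted candidates whose scores stay the same.
gold_conf = 0.3
checkpoint_a = [0.35, 0.60, 0.20]  # earlier in training
checkpoint_b = [0.90, 0.60, 0.20]  # later: the true tail is scored much higher

for name, scores in [("A", checkpoint_a), ("B", checkpoint_b)]:
    rank = rank_of_true(scores, 0)
    err = squared_error(scores[0], gold_conf)
    print(f"{name}: rank={rank}, Hits@1={int(rank <= 1)}, squared error={err:.4f}")
```

From checkpoint A to checkpoint B, the rank improves from 2 to 1. Over the same step, the squared error grows from 0.0025 to 0.36.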

GMUC & GMUCp

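  1. MSE and MAE for confidence prediction reach the levels reported in the paper. However, link-prediction metrics such as MRR and Hits@1 are abnormal and far from the original results.
  2. The loss in the GMUCp source code drops below 1, but in this framework it stays stuck above 4. The cause has not been found yet, and this is why link prediction performs poorly.
  3. GMUCp gets worse the longer it trains. After only one epoch, Hits@10 reaches about 0.2, but after 600 epochs it is below 0.1.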

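Summary: when training models in this framework, the learning rate should be reduced appropriately to obtain results close to those reported in the papers.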

PASSLEAF

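  1. The order of magnitude of the metrics reported in the paper is questionable.
  2. After RotatE was added, it performs noticeably worse than the other two. The cause still needs to be investigated.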

GeneratePSL

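This component is built around the PSL program developed by the LINQS group. It gives unKR users a PSL interface that fits our data format and lets them define their own relations and predicates. In the example uploaded so far, memory limits mean that obs is somewhat smaller than the real value, and the generated soft logic is small as well. Even so, the full pipeline already runs end to end, and a complete example will follow.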
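PSL relaxes first-order rules with Łukasiewicz logic. The conjunction of truth values a and b is max(0, a + b - 1). A ground rule body -> head is penalised by its distance to satisfaction, max(0, I(body) - I(head)). The sketch below computes that penalty for one hypothetical rule. The helper names and the predicate `related` are illustrative assumptions and are not part of GeneratePSL or the PSL API.

```python
from functools import reduce
from typing import Sequence


def luk_and(truths: Sequence[float]) -> float:
    """Lukasiewicz conjunction: I(a AND b) = max(0, a + b - 1), folded over all atoms."""
    return reduce(lambda a, b: max(0.0, a + b - 1.0), truths)


def distance_to_satisfaction(body: Sequence[float], head: float) -> float:
    """Penalty of a ground rule body -> head: max(0, I(body) - I(head))."""
    return max(0.0, luk_and(body) - head)


# Hypothetical ground rule: related(a, b) AND related(b, c) -> related(a, c)
body = [0.9, 0.8]  # confidences of the two observed body triples
for head in (0.5, 0.9):  # candidate soft truth values for the inferred triple
    print(head, round(distance_to_satisfaction(body, head), 4))
```

Here the body has truth value 0.7. A head value of 0.5 therefore leaves a penalty of 0.2, while any head value of at least 0.7 satisfies the rule. PSL inference picks truth values that minimise a weighted sum of such penalties.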

Phase 3: Model results

NL27k

Link prediction (Hits@1 to nDCG) and confidence prediction (MSE, MAE); "/" means not reported.
| Model | Hits@1 | Hits@5 | Hits@10 | MR | MRR | WMR | WMRR | nDCG | MSE | MAE |
|---|---|---|---|---|---|---|---|---|---|---|
| UKGE | 0.3650 | 0.5390 | 0.6110 | 155.9530 | 0.4510 | 134.0200 | 0.4810 | 0.7538 | 0.0310 | 0.0720 |
| UKGE_PSL | 0.4350 | 0.6270 | 0.7050 | 154.2640 | 0.5280 | 131.4950 | 0.5610 | 0.8267 | 0.0290 | 0.0680 |
| PASSLEAF_DistMult | 0.3240 | 0.5040 | 0.5840 | 256.2660 | 0.4110 | 222.6580 | 0.4630 | 0.7393 | 0.0220 | 0.0530 |
| PASSLEAF_ComplEx | 0.3280 | 0.5060 | 0.5850 | 267.5960 | 0.4150 | 232.4220 | 0.4660 | 0.7450 | 0.0230 | 0.0520 |
| PASSLEAF_RotatE | 0.5620 | 0.7540 | 0.8010 | 128.8960 | 0.6510 | 111.2940 | 0.6970 | 0.8983 | 0.0330 | 0.0820 |
| BEUrRE | 0.2430 | 0.5210 | 0.5990 | 597.7600 | 0.2890 | 466.1090 | 0.3179 | 0.7083 | 0.0872 | 0.2179 |
| FocusE(DistMult) | 0.7459 | 0.8919 | 0.9419 | 408.2340 | 0.8180 | 366.8210 | 0.8429 | 0.9332 | / | / |
| UKGsE | 0.0030 | 0.0110 | 0.0230 | 1822.4110 | 0.0120 | 1018.4010 | 0.0410 | 0.3201 | 0.1100 | 0.2630 |
| GTransE | 0.1740 | 0.3920 | 0.4620 | 1230.9091 | 0.2790 | 1159.0580 | 0.3000 | 0.6026 | / | / |
| GMUC | 0.3589 | 0.5189 | 0.5960 | 67.2689 | 0.4399 | 67.1389 | 0.4420 | 0.6349 | 0.0280 | 0.1209 |
| GMUC+ | 0.3499 | 0.5490 | 0.6480 | 44.2350 | 0.4480 | 44.0069 | 0.4490 | 0.6309 | 0.0149 | 0.0970 |
| UPGAT | | | | | | | | | | |

CN15k

Link prediction (Hits@1 to nDCG) and confidence prediction (MSE, MAE); "/" means not reported.
| Model | Hits@1 | Hits@5 | Hits@10 | MR | MRR | WMR | WMRR | nDCG | MSE | MAE |
|---|---|---|---|---|---|---|---|---|---|---|
| UKGE | 0.0800 | 0.1650 | 0.2090 | 2044.1680 | 0.1240 | 1876.6320 | 0.1370 | 0.6140 | 0.2160 | 0.3830 |
| UKGE_PSL | 0.0800 | 0.1740 | 0.2240 | 1906.1050 | 0.1290 | 1738.6910 | 0.1420 | 0.6211 | 0.2190 | 0.3830 |
| PASSLEAF_DistMult | 0.0590 | 0.1510 | 0.1930 | 1629.4900 | 0.1060 | 1484.8370 | 0.1200 | 0.5771 | 0.1800 | 0.3390 |
| PASSLEAF_ComplEx | 0.0740 | 0.1650 | 0.2040 | 1897.3080 | 0.1210 | 1727.7020 | 0.1360 | 0.5988 | 0.1750 | 0.3400 |
| PASSLEAF_RotatE | 0.0780 | 0.2150 | 0.2570 | 1631.1870 | 0.1440 | 1446.1200 | 0.1620 | 0.7010 | | |
| BEUrRE | 0.0439 | 0.2409 | 0.2920 | 1136.7280 | 0.1390 | 1055.7650 | 0.1500 | 0.7309 | 0.1800 | 0.3529 |
| FocusE(DistMult) | 0.1640 | 0.2806 | 0.3429 | 2025.5419 | 0.2259 | 1890.2019 | 0.2489 | 0.8325 | / | / |
| UKGsE | 0.0030 | 0.0110 | 0.0210 | 1858.2120 | 0.0110 | 1068.9890 | 0.0370 | 0.3124 | 0.1060 | 0.2570 |
| GTransE | 0.0490 | 0.1710 | 0.2240 | 1090.5320 | 0.1110 | 959.9320 | 0.1190 | 0.4907 | / | / |
| GMUC | 0.0120 | 0.0890 | 0.1539 | 94.0910 | 0.0630 | 93.8160 | 0.0619 | 0.2070 | 0.0299 | 0.1599 |
| UPGAT | | | | | | | | | | |

PPI5k

Link prediction (Hits@1 to nDCG) and confidence prediction (MSE, MAE); "/" means not reported.
| Model | Hits@1 | Hits@5 | Hits@10 | MR | MRR | WMR | WMRR | nDCG | MSE | MAE |
|---|---|---|---|---|---|---|---|---|---|---|
| UKGE | 0.6090 | 0.9090 | 0.9560 | 22.1730 | 0.7430 | 16.2180 | 0.7680 | 0.9836 | 0.0030 | 0.0240 |
| UKGE_PSL | 0.6150 | 0.9180 | 0.9630 | 20.9030 | 0.7500 | 15.4580 | 0.7740 | 0.9839 | 0.0030 | 0.0230 |
| PASSLEAF_DistMult | 0.3560 | 0.7900 | 0.9010 | 11.5460 | 0.5480 | 11.2380 | 0.5840 | 0.9544 | 0.0030 | 0.0220 |
| PASSLEAF_ComplEx | 0.4440 | 0.8350 | 0.9300 | 9.0550 | 0.6200 | 8.3860 | 0.6530 | 0.9654 | 0.0030 | 0.0220 |
| PASSLEAF_RotatE | 0.3670 | 0.8780 | 0.9410 | 7.7170 | 0.6000 | 6.9570 | 0.6210 | 0.9611 | 0.0040 | 0.0290 |
| BEUrRE | 0.0 | 0.8190 | 0.9190 | 9.8590 | 0.3689 | 7.5050 | 0.4050 | 0.9160 | 0.0170 | 0.0820 |
| FocusE(DistMult) | 0.9359 | 0.9840 | 0.9929 | 4.6939 | 0.9610 | 5.3559 | 0.9679 | 0.9483 | / | / |
| UKGsE | 0.1980 | 0.4480 | 0.6080 | 58.6620 | 0.3300 | 39.7310 | 0.3940 | 0.9124 | 0.0080 | 0.0530 |
| GTransE | 0.0120 | 0.2070 | 0.3120 | 160.7230 | 0.1210 | 109.0610 | 0.1780 | 0.7669 | / | / |
| UPGAT | | | | | | | | | | |

Evaluation tasks and metrics

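  1. Confidence prediction task:

    1) MSE is the mean squared error, i.e. the average of the squared differences between predicted and true values: $$\mathrm{MSE}=\frac{1}{m} \sum_{i=1}^m\left(y_i-\hat{y}_i\right)^2$$

    2) MAE is the mean absolute error, i.e. the average of the absolute differences between predicted and true values: $$\mathrm{MAE}=\frac{1}{m} \sum_{i=1}^m\left|\hat{y}_i-y_i\right|$$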

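  2. Link prediction task:

    1) Hits@k: higher is better. It is the fraction of test triples whose rank in link prediction is at most k. Here S is the set of test triples and $\mathbb{I}$ is the indicator function: $$\mathrm{Hits}@k=\frac{1}{|S|} \sum_{i=1}^{|S|} \mathbb{I}\left(\mathrm{rank}_i \leqslant k\right)$$

    2) MR (mean rank): lower is better. The higher the true triples are ranked, the smaller their ranks and therefore the smaller the mean: $$\mathrm{MR}=\frac{1}{|S|} \sum_{i=1}^{|S|} \mathrm{rank}_i$$

    3) MRR (mean reciprocal rank): higher is better. The higher a true triple is ranked, the larger its reciprocal rank, so a larger mean is better: $$\mathrm{MRR}=\frac{1}{|S|} \sum_{i=1}^{|S|} \frac{1}{\mathrm{rank}_i}$$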

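    4) nDCG: normalized discounted cumulative gain

    Gain: the relevance score of an item in the list, where rel(i) is the relevance score of item i: $$\mathrm{Gain}=\mathrm{rel}(i)$$

    Cumulative Gain (CG): the sum of the gains of the top k items: $$\mathrm{CG}_k=\sum_{i=1}^k \mathrm{rel}(i)$$

    Discounted Cumulative Gain (DCG): takes the ranking order into account by giving higher-ranked items more weight and discounting lower-ranked ones: $$\mathrm{DCG}_k=\sum_{i=1}^k \frac{\mathrm{rel}(i)}{\log_2(i+1)}$$

    Ideal DCG (IDCG): the DCG of the ideal ordering. Here $REL_k$ is the list of relevant items sorted by relevance in descending order and truncated to the top k: $$\mathrm{IDCG}_k=\sum_{i=1}^{|REL_k|} \frac{\mathrm{rel}(i)}{\log_2(i+1)}$$

    Normalized DCG (nDCG): DCG divided by the ideal DCG, so that a perfect ordering scores 1: $$\mathrm{nDCG}_k=\frac{\mathrm{DCG}_k}{\mathrm{IDCG}_k}$$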

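    5) WMR (weighted mean rank): lower is better. It is the mean rank weighted by the confidence $c_i$ of each test triple: $$\mathrm{WMR}=\frac{1}{\sum_i c_i}\sum_i c_i\cdot \mathrm{rank}_i$$

    6) WMRR (weighted mean reciprocal rank): higher is better. It is the mean reciprocal rank weighted by the same confidences $c_i$: $$\mathrm{WMRR}=\frac{1}{\sum_i c_i}\sum_i c_i\cdot \frac{1}{\mathrm{rank}_i}$$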
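The metrics above can be sketched in plain Python as follows. This is a minimal illustration, not unKR's implementation, and the function names are hypothetical. Ranks are taken as given (1 = best, no filtering), and the confidences c_i are passed in explicitly. For nDCG, the sketch assumes the relevance scores are already listed in predicted order and builds the ideal ordering from that same list. The README does not say which relevance scores unKR uses.

```python
import math
from typing import Sequence


# ---- Confidence prediction ----
def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between gold and predicted confidences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between gold and predicted confidences."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


# ---- Link prediction (ranks are 1-based, 1 = best) ----
def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test triples ranked at position k or better."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_rank(ranks: Sequence[int]) -> float:
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    return sum(1.0 / r for r in ranks) / len(ranks)


def weighted_mean_rank(ranks: Sequence[int], conf: Sequence[float]) -> float:
    """WMR: each rank weighted by the confidence c_i of its test triple."""
    return sum(c * r for r, c in zip(ranks, conf)) / sum(conf)


def weighted_mrr(ranks: Sequence[int], conf: Sequence[float]) -> float:
    """WMRR: each reciprocal rank weighted by the confidence c_i."""
    return sum(c / r for r, c in zip(ranks, conf)) / sum(conf)


def dcg_at_k(rels: Sequence[float], k: int) -> float:
    """DCG_k = sum over positions i = 1..k of rel(i) / log2(i + 1)."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels[:k], start=1))


def ndcg_at_k(rels: Sequence[float], k: int) -> float:
    """nDCG_k = DCG_k / IDCG_k; the ideal list is rels sorted in descending order."""
    idcg = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical ranks and confidences for four test triples.
    ranks = [1, 3, 10, 2]
    conf = [0.9, 0.5, 0.2, 0.8]
    print("Hits@3:", hits_at_k(ranks, 3))
    print("MR:", mean_rank(ranks), "MRR:", round(mean_reciprocal_rank(ranks), 4))
    print("WMR:", round(weighted_mean_rank(ranks, conf), 4))
    print("WMRR:", round(weighted_mrr(ranks, conf), 4))
    # Hypothetical relevance scores of candidates, listed in predicted order.
    print("nDCG@3:", round(ndcg_at_k([0.2, 0.9, 0.5], 3), 4))
    print("MSE:", round(mse([0.8, 0.3], [0.7, 0.5]), 4),
          "MAE:", round(mae([0.8, 0.3], [0.7, 0.5]), 4))
```

Weighting by c_i makes WMR and WMRR emphasise high-confidence test triples. For example, a poor rank on a triple with confidence 0.2 costs less than the same rank on a triple with confidence 0.9.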

Datasets

| Dataset | Original Dataset | Entities | Relations | Uncertain Relation Facts |
|---|---|---|---|---|
| CN15K | ConceptNet | 15000 | 36 | 241158 |
| NL27K | NELL | 27221 | 404 | 175412 |
| PPI5K | STRING | 4999 | 7 | 271666 |




Download files


Source Distribution

unKR-0.1.tar.gz (61.9 kB)


File details

Details for the file unKR-0.1.tar.gz.

File metadata

  • Download URL: unKR-0.1.tar.gz
  • Upload date:
  • Size: 61.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.18

File hashes

Hashes for unKR-0.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 861392fda263a0e2bd4925f9d7a97fcc75294c3e5ea49dd472a2c7b154a99e13 |
| MD5 | 0beec3351647ca11d829433f5f6f56d3 |
| BLAKE2b-256 | 615d9bebf9a39de7a6793d19e7210db0af1cc4ac02b7c33de605b03aa01ac2f9 |

