An Open Source Library for uncertain Knowledge Reasoning
unKR
uncertain Knowledge Graph
Stage 1: Reproduce UKGE
- Data processing
- Model construction
- Integration and end-to-end run
Stage 2: Reproduce the other models
# | Model | Paper | Assignee | Notes |
---|---|---|---|---|
1. | UKGE | To be continued... | zst | |
2. | PASSLEAF | To be continued... | zst | |
3. | BEUrRE | To be continued... | lyc | 🧈 |
4. | FocusE | To be continued... | lyc | |
5. | UKGsE | To be continued... | lw | |
6. | GTransE | To be continued... | lw | |
7. | GMUC | To be continued... | csl | 💡 |
8. | GMUC+ | To be continued... | csl | 💡 |
9. | UPGAT | To be continued... | xjy | |
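Model implementation progress and issues
This section records problems encountered while implementing the other models that cannot be solved for now, so that the group can review them later ...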
BEUrRE
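- The metric logic in BaseLitModel deserves further discussion: not every model needs to measure every metric. For example, the time-consuming NDCG could be disabled by default.
- PyCharm automatically treats src as a source root, so it can resolve src.unKR. Other IDEs and servers cannot, so should we consider moving unKR out so that it replaces the src folder?
- Why does Hit@10 sometimes rise during training while MAE and MSE rise at the same time?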
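On the last question above, one general property of the metrics may be relevant. This is a sketch, not a diagnosis of the BEUrRE runs: Hits@k depends only on the order of the scores, while MSE and MAE depend on their absolute values, so the two kinds of metric can move in opposite directions. The toy example below uses made-up scores and a hypothetical helper `rank_of_true`; none of it is taken from unKR.

```python
from typing import Sequence


def rank_of_true(true_score: float, negative_scores: Sequence[float]) -> int:
    """1-based rank of the true triple; a higher score ranks first."""
    return 1 + sum(1 for s in negative_scores if s > true_score)


# Made-up numbers for a single test triple (assumptions, not unKR output).
true_confidence = 0.60      # ground-truth confidence of the true triple
negatives = [0.70, 0.65]    # predicted scores of two corrupted triples

early_score = 0.60          # prediction for the true triple early in training
late_score = 0.95           # prediction for the true triple later in training

early_rank = rank_of_true(early_score, negatives)
late_rank = rank_of_true(late_score, negatives)

early_sq_err = (early_score - true_confidence) ** 2
late_sq_err = (late_score - true_confidence) ** 2

# The rank improves (smaller is better) while the squared error grows.
print(early_rank, late_rank)
print(early_sq_err, late_sq_err)
```

If the loss pushes the scores of true triples up past their ground-truth confidence, ranking metrics can keep improving while the confidence errors get worse. Whether that is what happens here would have to be checked against the actual predictions.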
GMUC & GMUCp
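- The MSE and MAE results for confidence prediction reach the level reported in the paper, but link prediction metrics such as MRR and Hits@1 are abnormal and far from the paper's values.
- In the GMUCp source code the loss can drop below 1, but in this framework it stays stuck above 4. The cause has not been found yet, and it leads to poor link prediction results.
- GMUCp gets worse the longer it trains: after a single epoch Hits@10 reaches about 0.2, but after 600 epochs it falls below 0.1.
Summary: when training models in this framework, the learning rate must be reduced appropriately to obtain results close to those in the paper.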
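PASSLEAF
1. The order of magnitude of the metrics reported in the paper is questionable.
2. After RotatE was added, its performance is noticeably worse than the other two; the cause needs to be investigated.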
GeneratePSL
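This part is built around the PSL program developed by the LINQS team. It gives unKR users a PSL interface that fits our data format and lets them define custom relations and predicates. In the example uploaded so far, memory limits mean that obs is slightly smaller than the actual value and the generated softlogic is also small, but the whole pipeline already runs end to end. A complete example will be produced later.
Stage 3: Model results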
NL27k
Link prediction metrics: Hits@1, Hits@5, Hits@10, MR, MRR, WMR, WMRR, nDCG. Confidence prediction metrics: MSE, MAE.
Model | Hits@1 | Hits@5 | Hits@10 | MR | MRR | WMR | WMRR | nDCG | MSE | MAE |
---|---|---|---|---|---|---|---|---|---|---|
UKGE | 0.3650 | 0.5390 | 0.6110 | 155.9530 | 0.4510 | 134.0200 | 0.4810 | 0.7538 | 0.0310 | 0.0720 |
UKGE_PSL | 0.4350 | 0.6270 | 0.7050 | 154.2640 | 0.5280 | 131.4950 | 0.5610 | 0.8267 | 0.0290 | 0.0680 |
PASSLEAF_DistMult | 0.3240 | 0.5040 | 0.5840 | 256.2660 | 0.4110 | 222.6580 | 0.4630 | 0.7393 | 0.0220 | 0.0530 |
PASSLEAF_ComplEx | 0.3280 | 0.5060 | 0.5850 | 267.5960 | 0.4150 | 232.4220 | 0.4660 | 0.7450 | 0.0230 | 0.0520 |
PASSLEAF_RotatE | 0.5620 | 0.7540 | 0.8010 | 128.8960 | 0.6510 | 111.2940 | 0.6970 | 0.8983 | 0.0330 | 0.0820 |
BEUrRE | 0.2430 | 0.5210 | 0.5990 | 597.7600 | 0.2890 | 466.1090 | 0.3179 | 0.7083 | 0.0872 | 0.2179 |
FocusE(DistMult) | 0.7459 | 0.8919 | 0.9419 | 408.2340 | 0.8180 | 366.8210 | 0.8429 | 0.9332 | / | / |
UKGsE | 0.0030 | 0.0110 | 0.0230 | 1822.4110 | 0.0120 | 1018.4010 | 0.0410 | 0.3201 | 0.1100 | 0.2630 |
GTransE | 0.1740 | 0.3920 | 0.4620 | 1230.9091 | 0.2790 | 1159.0580 | 0.3000 | 0.6026 | / | / |
GMUC | 0.3589 | 0.5189 | 0.5960 | 67.2689 | 0.4399 | 67.1389 | 0.4420 | 0.6349 | 0.0280 | 0.1209 |
GMUC+ | 0.3499 | 0.5490 | 0.6480 | 44.2350 | 0.4480 | 44.0069 | 0.4490 | 0.6309 | 0.0149 | 0.0970 |
UPGAT |
CN15k
Link prediction metrics: Hits@1, Hits@5, Hits@10, MR, MRR, WMR, WMRR, nDCG. Confidence prediction metrics: MSE, MAE.
Model | Hits@1 | Hits@5 | Hits@10 | MR | MRR | WMR | WMRR | nDCG | MSE | MAE |
---|---|---|---|---|---|---|---|---|---|---|
UKGE | 0.0800 | 0.1650 | 0.2090 | 2044.1680 | 0.1240 | 1876.6320 | 0.1370 | 0.6140 | 0.2160 | 0.3830 |
UKGE_PSL | 0.0800 | 0.1740 | 0.2240 | 1906.1050 | 0.1290 | 1738.6910 | 0.1420 | 0.6211 | 0.2190 | 0.3830 |
PASSLEAF_DistMult | 0.0590 | 0.1510 | 0.1930 | 1629.4900 | 0.1060 | 1484.8370 | 0.1200 | 0.5771 | 0.1800 | 0.3390 |
PASSLEAF_ComplEx | 0.0740 | 0.1650 | 0.2040 | 1897.3080 | 0.1210 | 1727.7020 | 0.1360 | 0.5988 | 0.1750 | 0.3400 |
PASSLEAF_RotatE | 0.0780 | 0.2150 | 0.2570 | 1631.1870 | 0.1440 | 1446.1200 | 0.1620 | 0.7010 | ||
BEUrRE | 0.0439 | 0.2409 | 0.2920 | 1136.7280 | 0.1390 | 1055.7650 | 0.1500 | 0.7309 | 0.1800 | 0.3529 |
FocusE(DistMult) | 0.1640 | 0.2806 | 0.3429 | 2025.5419 | 0.2259 | 1890.2019 | 0.2489 | 0.8325 | / | / |
UKGsE | 0.0030 | 0.0110 | 0.0210 | 1858.2120 | 0.0110 | 1068.9890 | 0.0370 | 0.3124 | 0.1060 | 0.2570 |
GTransE | 0.0490 | 0.1710 | 0.2240 | 1090.5320 | 0.1110 | 959.9320 | 0.1190 | 0.4907 | / | / |
GMUC | 0.0120 | 0.0890 | 0.1539 | 94.0910 | 0.0630 | 93.8160 | 0.0619 | 0.2070 | 0.0299 | 0.1599 |
UPGAT |
PPI5k
Link prediction metrics: Hits@1, Hits@5, Hits@10, MR, MRR, WMR, WMRR, nDCG. Confidence prediction metrics: MSE, MAE.
Model | Hits@1 | Hits@5 | Hits@10 | MR | MRR | WMR | WMRR | nDCG | MSE | MAE |
---|---|---|---|---|---|---|---|---|---|---|
UKGE | 0.6090 | 0.9090 | 0.9560 | 22.1730 | 0.7430 | 16.2180 | 0.7680 | 0.9836 | 0.0030 | 0.0240 |
UKGE_PSL | 0.6150 | 0.9180 | 0.9630 | 20.9030 | 0.7500 | 15.4580 | 0.7740 | 0.9839 | 0.0030 | 0.0230 |
PASSLEAF_DistMult | 0.3560 | 0.7900 | 0.9010 | 11.5460 | 0.5480 | 11.2380 | 0.5840 | 0.9544 | 0.0030 | 0.0220 |
PASSLEAF_ComplEx | 0.4440 | 0.8350 | 0.9300 | 9.0550 | 0.6200 | 8.3860 | 0.6530 | 0.9654 | 0.0030 | 0.0220 |
PASSLEAF_RotatE | 0.3670 | 0.8780 | 0.9410 | 7.7170 | 0.6000 | 6.9570 | 0.6210 | 0.9611 | 0.0040 | 0.0290 |
BEUrRE | 0.0 | 0.8190 | 0.9190 | 9.8590 | 0.3689 | 7.5050 | 0.4050 | 0.9160 | 0.0170 | 0.0820 |
FocusE(DistMult) | 0.9359 | 0.9840 | 0.9929 | 4.6939 | 0.9610 | 5.3559 | 0.9679 | 0.9483 | / | / |
UKGsE | 0.1980 | 0.4480 | 0.6080 | 58.6620 | 0.3300 | 39.7310 | 0.3940 | 0.9124 | 0.0080 | 0.0530 |
GTransE | 0.0120 | 0.2070 | 0.3120 | 160.7230 | 0.1210 | 109.0610 | 0.1780 | 0.7669 | / | / |
UPGAT |
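Evaluation tasks and metrics
- Confidence prediction task:
1) MSE is the mean squared error (the mean of the squared differences between predicted and true values) $$\mathrm{MSE}=\frac{1}{m} \sum_{i=1}^m\left(y_i-\hat{y}_i\right)^2$$
2) MAE is the mean absolute error (the mean of the absolute differences between predicted and true values) $$\mathrm{MAE}=\frac{1}{m} \sum_{i=1}^m\left|y_i-\hat{y}_i\right|$$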
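The two formulas above can be sketched in plain Python as follows. The function names `mse` and `mae` and the sample values are illustrative assumptions, not the unKR implementation.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between true and predicted confidences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between true and predicted confidences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up confidences for three triples.
y_true = [1.0, 0.5, 0.0]
y_pred = [0.5, 0.5, 0.5]
print(mse(y_true, y_pred), mae(y_true, y_pred))
```

Both metrics are lower-is-better. Because MSE squares each error, a few large errors raise it more than they raise MAE.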
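- Link prediction task:
1) Hits@k: higher is better (the proportion of test triples whose rank in link prediction is at most k) $$\mathrm{Hits}@k=\frac{1}{|S|} \sum_{i=1}^{|S|} \mathbb{I}\left(\mathrm{rank}_i \leqslant k\right)$$
2) MR: lower is better (the better the prediction, the smaller the rank, so the sum is smaller) $$\mathrm{MR}=\frac{1}{|S|} \sum_{i=1}^{|S|} \mathrm{rank}_i$$
3) MRR: higher is better (the better the predicted rank, the larger its reciprocal, so the sum is larger) $$\mathrm{MRR}=\frac{1}{|S|} \sum_{i=1}^{|S|} \frac{1}{\mathrm{rank}_i}$$
4) nDCG is the normalized discounted cumulative gain
Gain: the relevance score of each item in a list, where rel(i) is the relevance score of item i $$\mathrm{Gain}=\mathrm{rel}(i)$$
Cumulative Gain (CG): the sum of the Gain over K items $$\mathrm{CG}_k=\sum_{i=1}^k \mathrm{rel}(i)$$
Discounted Cumulative Gain (DCG): takes the ranking order into account, so that highly ranked items contribute more gain and lower-ranked items are discounted $$\mathrm{DCG}_k=\sum_{i=1}^k \frac{\mathrm{rel}(i)}{\log _2(i+1)}$$
Ideal DCG (IDCG): the DCG of the ideal ordering, where $REL_k$ is the list of the top k items sorted by relevance in descending order $$\mathrm{IDCG}_k=\sum_{i=1}^{|REL_k|} \frac{\mathrm{rel}(i)}{\log _2(i+1)}$$
Normalized DCG (nDCG): the normalized discounted cumulative gain $$\mathrm{nDCG}_k=\frac{\mathrm{DCG}_k}{\mathrm{IDCG}_k}$$
5) WMR: lower is better (the better the prediction, the smaller the rank, so the weighted sum is smaller); each rank is weighted by the confidence $c_i$ of its triple $$\mathrm{WMR}=\frac{1}{\sum_i c_i}\sum_i c_i\cdot \mathrm{rank}_i$$
6) WMRR: higher is better (the better the predicted rank, the larger its reciprocal, so the weighted sum is larger) $$\mathrm{WMRR}=\frac{1}{\sum_i c_i}\sum_i c_i\cdot \frac{1}{\mathrm{rank}_i}$$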
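As a minimal sketch of the link prediction metrics above, assuming the 1-based ranks of the test triples have already been computed, the formulas translate to Python roughly as follows. All function names and sample values are illustrative assumptions; how unKR itself computes these metrics (for example which items enter the nDCG list) is not shown here.

```python
import math
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test triples whose rank is at most k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_rank(ranks: Sequence[int]) -> float:
    """MR: average rank."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """MRR: average of 1 / rank."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def weighted_mean_rank(ranks: Sequence[int], conf: Sequence[float]) -> float:
    """WMR: ranks averaged with confidence weights c_i."""
    return sum(c * r for r, c in zip(ranks, conf)) / sum(conf)


def weighted_mean_reciprocal_rank(ranks: Sequence[int], conf: Sequence[float]) -> float:
    """WMRR: reciprocal ranks averaged with confidence weights c_i."""
    return sum(c / r for r, c in zip(ranks, conf)) / sum(conf)


def dcg_at_k(rels: Sequence[float], k: int) -> float:
    """DCG_k over relevance scores listed in predicted order."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels[:k], start=1))


def ndcg_at_k(rels: Sequence[float], k: int) -> float:
    """nDCG_k = DCG_k / IDCG_k; IDCG_k uses the scores sorted in descending order."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


# Made-up ranks and confidences for three test triples.
ranks = [1, 2, 10]
confidences = [1.0, 0.5, 0.5]
print(hits_at_k(ranks, 1), hits_at_k(ranks, 10))
print(mean_rank(ranks), mean_reciprocal_rank(ranks))
print(weighted_mean_rank(ranks, confidences))
print(weighted_mean_reciprocal_rank(ranks, confidences))

# Relevance scores in predicted order; a perfectly sorted list gives nDCG = 1.
print(ndcg_at_k([0.5, 1.0, 0.0], 3), ndcg_at_k([1.0, 0.5, 0.0], 3))
```

In this sample the weighted variants give the poorly ranked low-confidence triple (rank 10, confidence 0.5) less influence, so WMR comes out below MR.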
Datasets
Dataset | Original Dataset | Entities | Relations | Uncertain Relation Facts |
---|---|---|---|---|
CN15K | ConceptNet | 15000 | 36 | 241158 |
NL27K | NELL | 27221 | 404 | 175412 |
PPI5K | STRING | 4999 | 7 | 271666 |
Reference work
- NeuralKG (Zhang Wen, Zhejiang University). NeuralKG: An Open Source Library for Diverse Representation Learning of Knowledge Graphs. SIGIR 2022 Demo. https://arxiv.org/pdf/2202.12571.pdf
- NeuralKG-ind: A Python Library for Inductive Knowledge Graph Representation Learning. SIGIR 2023 Demo. https://arxiv.org/pdf/2304.14678.pdf
- GitHub page: https://github.com/zjukg/NeuralKG