A unified library for fake news detection.
Project description
FaKnow
FaKnow (Fake Know) is a unified fake news detection library based on PyTorch, designed for reproducing and developing fake news detection algorithms. It includes 22 models (see Integrated Models below), covering two categories:
- content-based
- social context-based
Features
- Unified Framework: provides a unified interface that covers the whole algorithm development process, including data processing, model development, training, and evaluation
- Generic Data Structure: uses JSON as the input file format to fit the structure of crawled data, allowing users to customize how different fields are processed
- Diverse Models: contains a number of representative fake news detection algorithms published in conferences or journals in recent years, including a variety of content-based and social context-based models
- Convenient Usability: the PyTorch-based style makes the library easy to use, with rich auxiliary functions such as loss visualization, logging, and parameter saving
- Great Scalability: users can focus on the exposed API and inherit built-in classes to reuse most of the functionality, writing only a little new code to meet new requirements, as sketched below
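As an illustration of the scalability point, a custom model can reuse the whole training and evaluation pipeline by inheriting the built-in base class. The sketch below is a minimal, hedged example: it assumes custom models subclass faknow.model.model.AbstractModel and override calculate_loss and predict, as the integrated models do, and that batches arrive as dicts; check the API docs for the exact hook signatures.

# minimal sketch of plugging a custom model into FaKnow's pipeline;
# the AbstractModel hooks (calculate_loss, predict) and the batch layout
# below are assumptions based on the integrated models, not a verified API
import torch
import torch.nn as nn
from faknow.model.model import AbstractModel

class MyDetector(AbstractModel):
    def __init__(self, vocab_size=30522, embed_dim=256):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim)
        self.classifier = nn.Linear(embed_dim, 2)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, token_id):
        # token_id: (batch_size, max_len) tensor of token ids
        return self.classifier(self.embedding(token_id))

    def calculate_loss(self, data):
        # data: a batch dict produced by the dataset,
        # e.g. {'text': {'token_id': ..., 'mask': ...}, 'label': ...}
        output = self.forward(data['text']['token_id'])
        return self.loss_fn(output, data['label'])

    def predict(self, data_without_label):
        output = self.forward(data_without_label['text']['token_id'])
        return torch.softmax(output, dim=-1)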
Installation
FaKnow is available for Python 3.8 and higher.
Make sure PyTorch (including torch and torchvision) and PyG (including torch_geometric and its optional dependencies) are already installed.
- from pip
pip install faknow
- from source
git clone https://github.com/NPURG/FaKnow.git && cd FaKnow
pip install -e . --verbose
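After installing from either source, you can verify the package with the standard library alone (no FaKnow-specific API is assumed here):

# verify the installation (standard library only)
from importlib.metadata import version
import faknow
print(version('faknow'))  # e.g. 0.0.4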
Usage Examples
Quick Start
We provide several methods to run integrated models quickly by passing only a few arguments. For hyperparameters such as the learning rate, the default values are taken from the open-source code of the corresponding paper. You can also pass your own hyperparameters to these methods.
run
You can use the run and run_from_yaml methods to run integrated models. The former receives the parameters as dict keyword arguments, while the latter reads them from a YAML configuration file.
- run from kargs
from faknow.run import run
model = 'mdfend' # lowercase short name of models
kargs = {'train_path': 'train.json', 'test_path': 'test.json'} # dict arguments
run(model, **kargs)
The JSON file for mdfend should look like this:
[
    {
        "text": "this is a sentence.",
        "domain": 9,
        "label": 1
    },
    {
        "text": "this is a sentence.",
        "domain": 1,
        "label": 0
    }
]
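Because the input is plain JSON, it is easy to sanity-check a data file against this schema before training. The helper below is a hypothetical snippet using only the standard library; it is not part of FaKnow:

# sanity-check a data file against the mdfend schema (stdlib only, not part of FaKnow)
import json

with open('train.json', encoding='utf-8') as f:
    samples = json.load(f)

required = {'text': str, 'domain': int, 'label': int}
for i, sample in enumerate(samples):
    for field, field_type in required.items():
        assert isinstance(sample.get(field), field_type), \
            f'sample {i}: field {field!r} missing or not {field_type.__name__}'
print(f'{len(samples)} samples look valid')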
- run from yaml
# demo.py
from faknow.run import run_from_yaml
model = 'mdfend' # lowercase short name of models
config_path = 'mdfend.yaml' # config file path
run_from_yaml(model, config_path)
Your YAML config file should look like this:
# mdfend.yaml
train_path: train.json # the path of training set file
test_path: test.json # the path of testing set file
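Since your own hyperparameters can also be passed to these run methods, they may be set in the same YAML file. The extra keys below are illustrative assumptions; check the signature of the corresponding run function (here run_mdfend) for the parameter names it actually accepts:

# mdfend.yaml with optional hyperparameters (key names are assumptions,
# verify them against the run_mdfend signature)
train_path: train.json
test_path: test.json
batch_size: 64
lr: 0.00005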
run specific models
You can also run specific models using the run_$model$ and run_$model$_from_yaml methods, where $model$ is the lowercase name of the integrated model you want to use. The usage is the same as for run and run_from_yaml.
The following is an example of running mdfend.
from faknow.run.content_based.run_mdfend import run_mdfend, run_mdfend_from_yaml
# run from kargs
kargs = {'train_path': 'train.json', 'test_path': 'test.json'} # dict training arguments
run_mdfend(**kargs)
# or run from yaml
config_path = 'mdfend.yaml' # config file path
run_mdfend_from_yaml(config_path)
Run From Scratch
The following is an example of running mdfend from scratch.
from faknow.data.dataset.text import TextDataset
from faknow.data.process.text_process import TokenizerFromPreTrained
from faknow.evaluate.evaluator import Evaluator
from faknow.model.content_based.mdfend import MDFEND
from faknow.train.trainer import BaseTrainer
import torch
from torch.utils.data import DataLoader
# tokenizer for MDFEND
max_len, bert = 170, 'bert-base-uncased'
tokenizer = TokenizerFromPreTrained(max_len, bert)
# dataset
batch_size = 64
train_path, test_path, validate_path = 'train.json', 'test.json', 'val.json'
train_set = TextDataset(train_path, ['text'], tokenizer)
train_loader = DataLoader(train_set, batch_size, shuffle=True)
validate_set = TextDataset(validate_path, ['text'], tokenizer)
val_loader = DataLoader(validate_set, batch_size, shuffle=False)
test_set = TextDataset(test_path, ['text'], tokenizer)
test_loader = DataLoader(test_set, batch_size, shuffle=False)
# prepare model
domain_num = 9
model = MDFEND(bert, domain_num)
# optimizer and lr scheduler
lr, weight_decay, step_size, gamma = 0.00005, 5e-5, 100, 0.98
optimizer = torch.optim.Adam(params=model.parameters(),
                             lr=lr,
                             weight_decay=weight_decay)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma)
# metrics to evaluate the model performance
evaluator = Evaluator()
# train and validate
num_epochs, device = 50, 'cpu'
trainer = BaseTrainer(model, evaluator, optimizer, scheduler, device=device)
trainer.fit(train_loader, num_epochs, validate_loader=val_loader)
# show test result
print(trainer.evaluate(test_loader))
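The same pipeline makes it easy to vary the evaluation setup. The snippet below is a hedged variation of the example above: it assumes Evaluator accepts a list of metric names; verify the supported names in the API docs.

# optional variation: pick metrics explicitly and train on GPU if available
# (the metric-name list passed to Evaluator is an assumption)
evaluator = Evaluator(['accuracy', 'precision', 'recall', 'f1'])
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
trainer = BaseTrainer(model, evaluator, optimizer, scheduler, device=device)
trainer.fit(train_loader, num_epochs, validate_loader=val_loader)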
Integrated Models
The full list of 22 integrated models, with the papers they reproduce, is available in the project repository.
Citation
@misc{faknow,
    title = {{{FaKnow}}: {{A Unified Library}} for {{Fake News Detection}}},
    shorttitle = {{{FaKnow}}},
    author = {Zhu, Yiyuan and Li, Yongjun and Wang, Jialiang and Gao, Ming and Wei, Jiali},
    year = {2024},
    month = jan,
    number = {arXiv:2401.16441},
    eprint = {2401.16441},
    primaryclass = {cs},
    publisher = {{arXiv}},
    archiveprefix = {arxiv},
    keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning}
}
License
FaKnow has an MIT-style license, as found in the LICENSE file.
Project details
Download files
Source Distribution
File details
Details for the file faknow-0.0.4.tar.gz.
File metadata
- Download URL: faknow-0.0.4.tar.gz
- Upload date:
- Size: 103.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | f2c156736c72361cd9eac5e83b61daf87add0cc036ee0d03d17bfe4bf07f4b27
MD5 | 598057211ac1094ec02c89eea40af84a
BLAKE2b-256 | e4d9f94b85743c5b115f5b4a89db5ea388aed953e78af36a5f4b6b0b29e24f3e