Skip to main content

Simple fake news classifier package created for bachelor's thesis 'Detection of Fake News Using Machine Learning'.

Project description

Detection of Fake News Using Machine Learning

Bachelor's thesis

Author: Matej Koreň, 2022

Brno University of Technology

Faculty of Information Technology

Introduction

This project focuses on the use of machine learning in fake news detection. For this purpose, four models have been selected -- Bayesian, Decision Tree, Support Vector Machine and a Neural network. In five experiments on various datasets, these models were trained, tested, evaluated and compared with state-of-the-art methods. Final implementation is in the form of a package (or console application), which allows it's users to replicate this procedure with their own data.

Installing final product directly

  1. Unzip FakeNewsCLassifier project package
  2. Open terminal
  3. Install requirements

pip3 install -r requirements.txt

note: additional NLP libraries will install automatically with first run.

Alternative - installing FakeNewsClassifier package

To use these models in different project, this package can be installed from official PyPi index:

pip3 install FakeNewsClassifier

which checks for dependencies and loads all needed modules. Importing the module

from FakeNewsClassifier import models

Running scripts directly

To run the program, input arguments must be used:

  • -m / -–model = model selection, options:
    • SNN = Sequential neural network,
    • NB = Naive Bayes,
    • SVM = Support vector machine,
    • DT = Decision tree,
  • -p / -–pretrained = skips the training, pretrained model is used,
  • -url / -–url [Online article] = article to parse and evaluate,
  • –v / –-verbose = verbose mode, showing graphs and explanations.

Using the FakeNewsClassifier package

  1. Object initialization, which takes two arguments

    classifier = FakeNewsClassifier.models.FakeNewsClassifier(model:str, dataset:str)

    where model is one of [SNN, NB, SVM, DT], and dataset is a relative path to dataset in csv format (by default it's Dezinfo SK included in the package).

  2. Model training, which has optional "verbose" parameter:

    classifier.train(verbose:bool)

  3. Article evaluation, also with "verbose" and "url" argument:

    classifier.evaluate(verbose:bool, url:str)

    where url is a link to an online article or full-text to evaluate.

Code snippets and commands

Minimalistic python script to achieve functionality would look like this:

from FakeNewsClassifier import models as FNC

article_url="https://www.ta3.com/clanok/265328/na-matovica-podali-pre-potycku-na-pochode-za-prava-trans-udi-trestne-oznamenie"

lol = FNC.FakeNewsClassifier('NB')  # unused argument 'dataset=' is by default looking for dezinfo_sk dataset within package
lol.train(verbose=True)             # specified model training, optional verbosity
lol.evaluate(verbose=True,url=article_url)  # article evluation on trained model, optional verbosity

If the train() method is not called, evaluate() will use default pretrained models within package.

When running scripts directly, same funcionality can be obtained like this:

python3 main.py -m NB -v

which trains NB model on default dezinfo_sk dataset within package. Alternatively, you can specify it using: -d path/data.csv.

python3 main.py -m NB -v -p -url https://www.ta3.com/clanok/265328/na-matovica-podali-pre-potycku-na-pochode-za-prava-trans-udi-trestne-oznamenie

which evaluates article from -url argument.

Note: if verbosity is allowed, matrices are stored in figures/ folder and Lime analytics in html/ folder.

Additional requirements and warnings

Dataset structure

Datasets to perform training on need to be in a standard csv file format and must contain collumns

  • label with two values, e.g. "Fake"/"True" or "0/1",
  • text or title (or both) with article titles and bodies.

Other collumns are neglected.

Recomended english datasets

WELFake dataset (Saurabh Shahane): https://www.kaggle.com/datasets/saurabhshahane/fake-news-classification

Fake and Real news dataset (Clément Bisaillon): https://kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset

Real & Fake news (nop.ai): https://www.kaggle.com/datasets/nopdev/real-and-fake-news-dataset

Online articles

When provided with url, buiilt-in scraper will try to parse given page and extract the title and body. However, many websites are protected against automated scraping and bot attacks, which makes the online evaluation impossible. Alternatively, full-text can be used as the article to evaluate.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

FakeNewsClassifier-0.0.22.tar.gz (10.1 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

FakeNewsClassifier-0.0.22-py3-none-any.whl (10.1 MB view details)

Uploaded Python 3

File details

Details for the file FakeNewsClassifier-0.0.22.tar.gz.

File metadata

  • Download URL: FakeNewsClassifier-0.0.22.tar.gz
  • Upload date:
  • Size: 10.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.12

File hashes

Hashes for FakeNewsClassifier-0.0.22.tar.gz
Algorithm Hash digest
SHA256 84e3e6412267cd7650e291c6eb02ca445b9c3cd18f8e287c22b7e0428ce42495
MD5 2ca7e5a088d7e277dba3134bd241f971
BLAKE2b-256 3a60371a44dc4041c148ebe379a82c7a75ccf74c3f29f06c807cb2a4dcf232dc

See more details on using hashes here.

File details

Details for the file FakeNewsClassifier-0.0.22-py3-none-any.whl.

File metadata

File hashes

Hashes for FakeNewsClassifier-0.0.22-py3-none-any.whl
Algorithm Hash digest
SHA256 d6c08b3be200a49954585886ad6c62c4acc792b46a4419eec02667af1a8755bf
MD5 d56b49f835631bc2bc3df191f5aedaab
BLAKE2b-256 1f89294ebf0030ac7dde9e36cbed53bba1c247a948749d574fdcec9c55cf0f1a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page